* Re: [ammarfaizi2-block:dhowells/linux-fs/fscache-fixes] [mm, netfs, fscache] 6919cda8e0: canonical_address#:#[##]
From: Linus Torvalds @ 2022-12-11 18:27 UTC (permalink / raw)
To: kernel test robot
Cc: David Howells, oe-lkp, lkp, Rohith Surabattula, Matthew Wilcox,
Steve French, Shyam Prasad N, Dave Wysochanski,
Dominique Martinet, Ilya Dryomov, Ammar Faizi,
GNU/Weeb Mailing List, v9fs-developer, linux-afs, linux-cachefs,
ceph-devel, linux-cifs, samba-technical, linux-fsdevel, linux-mm
The disassembly isn't great, because the test robot doesn't try to
find where the instructions start, but before that
> 4: 48 8b 57 18 mov 0x18(%rdi),%rdx
instruction we also had a
    mov (%rdi),%rax
and it looks like this is the very top of 'filemap_release_folio()',
so '%rdi' contains the folio pointer coming into this.
End result:
On Sun, Dec 11, 2022 at 6:27 AM kernel test robot wrote:
>
> 4: 48 8b 57 18 mov 0x18(%rdi),%rdx
> 8: 83 e0 01 and $0x1,%eax
> b: 74 59 je 0x66
The
    and $0x1,%eax
    je 0x66
above is the test for
    BUG_ON(!folio_test_locked(folio));
where it's jumping out to the 'ud2' in case the lock bit (bit #0) isn't set.
Then we have this:
> d: 48 f7 07 00 60 00 00 testq $0x6000,(%rdi)
> 14: 74 22 je 0x38
Which is testing PG_private | PG_private2, and jumping out (which we
also don't do) if neither is set.
And then we have:
> 16: 48 8b 07 mov (%rdi),%rax
> 19: f6 c4 80 test $0x80,%ah
> 1c: 75 32 jne 0x50
Which is checking for PG_writeback.
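The three flag tests decoded so far can be sketched in C as follows. This is only an illustration: the mask values are read straight off the quoted disassembly, while the macro names, the enum, and classify_flags() are hypothetical and are not kernel definitions.

```c
#include <stdint.h>

/* Masks taken from the disassembly above; the names are illustrative only. */
#define MASK_LOCKED            0x0001ULL /* and $0x1,%eax */
#define MASK_PRIVATE_OR_PRIV2  0x6000ULL /* testq $0x6000,(%rdi) */
#define MASK_WRITEBACK         0x8000ULL /* test $0x80,%ah: bit 7 of %ah is bit 15 of %rax */

enum flag_path {
    PATH_BUG,        /* je 0x66: lock bit clear, lands on the ud2 */
    PATH_NO_PRIVATE, /* je 0x38: neither private bit set */
    PATH_WRITEBACK,  /* jne 0x50: writeback bit set */
    PATH_CONTINUE    /* falls through to the 'mapping' test */
};

/* Mirror the order of the three tests in the instruction stream. */
static enum flag_path classify_flags(uint64_t flags)
{
    if (!(flags & MASK_LOCKED))
        return PATH_BUG;
    if (!(flags & MASK_PRIVATE_OR_PRIV2))
        return PATH_NO_PRIVATE;
    if (flags & MASK_WRITEBACK)
        return PATH_WRITEBACK;
    return PATH_CONTINUE;
}
```

In the oops, none of the three branches was taken, which corresponds to PATH_CONTINUE in this sketch.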
So then we get to
    if (mapping && mapping->a_ops->release_folio)
        return mapping->a_ops->release_folio(folio, gfp);
which is this:
> 1e: 48 85 d2 test %rdx,%rdx
> 21: 74 34 je 0x57
This %rdx value is the early load from the top of the function, and
the test is checking 'mapping' for NULL.
It's not NULL, but it's some odd value according to the oops report:
    RDX: ffff889f03987f71
which doesn't look like it's valid (well, it's a valid kernel pointer,
but it's not aligned like a 'mapping' pointer should be).
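The alignment observation can be checked mechanically. A minimal sketch, assuming that a pointer to a structure holding pointers is at least 8-byte aligned on x86-64; is_aligned() is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}
```

For the RDX value from the oops, 0xffff889f03987f71, the low three bits are 001, so is_aligned(0xffff889f03987f71ULL, 8) is false.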
So now when we're going to load 'a_ops' from there, we load another
garbage value:
> 23: 48 8b 82 90 00 00 00 mov 0x90(%rdx),%rax
and we now have RAX: b000000000000000
and then the 'a_ops->release_folio' access will trap:
> 2a:* 48 8b 40 48 mov 0x48(%rax),%rax <-- trapping instruction
> 2e: 48 85 c0 test %rax,%rax
> 31: 74 24 je 0x57
The above is the "load a_ops->release_folio and test it for NULL", but
the load took a page fault because RAX was garbage.
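The "canonical_address" in the subject line hints at why that load could not succeed: the address formed from RAX + 0x48 is not a canonical x86-64 address. A small sketch of that check, assuming 48-bit virtual addresses (4-level paging); is_canonical_48() is a hypothetical helper, not something from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * With 48-bit virtual addresses, bits 63..47 must all be equal
 * (all zero or all one) for an address to be canonical.
 */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47; /* the upper 17 bits */

    return top == 0 || top == 0x1ffff;
}
```

Under that assumption, 0xb000000000000000 + 0x48 fails the check, while the odd-looking RDX value 0xffff889f03987f71 passes it, which matches the remark that it is a valid kernel pointer, just not a plausible 'mapping'.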
But RAX was garbage because we already had a bogus "mapping" pointer earlier.
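The chain of dependent loads can be modelled with mock structures. This is a sketch only: the offsets 0x18, 0x90 and 0x48 come from the disassembly, but the struct names, the padding members, and mock_release_tail() are made up for illustration and do not reflect the real kernel layouts beyond those three offsets. It assumes a 64-bit target and a C11 compiler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mock_folio;

struct mock_a_ops {
    char pad[0x48];                       /* unknown members, padding only */
    bool (*release_folio)(struct mock_folio *folio, unsigned int gfp);
};

struct mock_mapping {
    char pad[0x90];                       /* unknown members, padding only */
    const struct mock_a_ops *a_ops;
};

struct mock_folio {
    uint64_t flags;                       /* mov (%rdi),%rax */
    char pad[0x10];
    struct mock_mapping *mapping;         /* mov 0x18(%rdi),%rdx */
};

_Static_assert(offsetof(struct mock_folio, mapping) == 0x18, "mapping offset");
_Static_assert(offsetof(struct mock_mapping, a_ops) == 0x90, "a_ops offset");
_Static_assert(offsetof(struct mock_a_ops, release_folio) == 0x48, "release_folio offset");

/* Stand-in callback so the happy path can be exercised. */
static bool mock_always_release(struct mock_folio *folio, unsigned int gfp)
{
    (void)folio;
    (void)gfp;
    return true;
}

/* Returns -1 where the real function would take its fallback path (not modelled). */
static int mock_release_tail(struct mock_folio *folio, unsigned int gfp)
{
    struct mock_mapping *mapping = folio->mapping;   /* a bogus value here ... */

    if (mapping && mapping->a_ops->release_folio)    /* ... poisons both loads here */
        return mapping->a_ops->release_folio(folio, gfp);
    return -1;
}
```

Each load uses the result of the previous one as its base address, so a single bad 'mapping' value is enough to turn the next two loads into garbage.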
Now, why 'mapping' was bogus, I don't know. Maybe that page wasn't a
page cache page at all? The mapping field is in a union and can
contain other things.
So I have no explanation for the oops, but I thought I'd just post the
decoding of the instruction stream in case that helps somebody else to
figure it out.
Linus