From: Al Viro <[email protected]>
To: Christian Brauner <[email protected]>
Cc: Jens Axboe <[email protected]>,
syzbot <[email protected]>,
[email protected], [email protected],
[email protected], [email protected]
Subject: Re: [syzbot] WARNING in mntput_no_expire (2)
Date: Mon, 5 Apr 2021 16:18:58 +0000 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <20210405114437.hjcojekyp5zt6huu@wittgenstein>
On Mon, Apr 05, 2021 at 01:44:37PM +0200, Christian Brauner wrote:
> On Sun, Apr 04, 2021 at 08:17:21PM +0000, Al Viro wrote:
> > On Sun, Apr 04, 2021 at 06:50:10PM +0000, Al Viro wrote:
> >
> > > > Yeah, I have at least namei.o
> > > >
> > > > https://drive.google.com/file/d/1AvO1St0YltIrA86DXjp1Xg3ojtS9owGh/view?usp=sharing
> > >
> > > *grumble*
> > >
> > > Is it reproducible without KASAN? Would be much easier to follow the produced
> > > asm...
> >
> > Looks like inode_permission(_, NULL, _) from may_lookup(nd). I.e.
> > nd->inode == NULL.
>
> Yeah, I already saw that.
>
> >
> > Mind slapping BUG_ON(!nd->inode) right before may_lookup() call in
> > link_path_walk() and trying to reproduce that oops?
>
> Yep, no problem. If you run the reproducer in a loop for a little while
> you eventually trigger the BUG_ON() and then you get the following splat
> (and then an endless loop) in [1] with nd->inode NULL.
>
> _But_ I managed to debug this further and was able to trigger the BUG_ON()
> directly in path_init() in the AT_FDCWD branch (after all its AT_FDCWD(./file0)
> with the patch in [3] (it's in LOOKUP_RCU) the corresponding splat is in [2].
> So the crash happens for a PF_IO_WORKER thread with a NULL nd->inode for the
> PF_IO_WORKER's pwd (The PF_IO_WORKER seems to be in async context.).
So we find current->fs->pwd.dentry negative, with current->fs->seq sampled
equal before and after that? Lovely... The only places where we assign
anything to ->pwd.dentry are
void set_fs_pwd(struct fs_struct *fs, const struct path *path)
{
struct path old_pwd;
path_get(path);
spin_lock(&fs->lock);
write_seqcount_begin(&fs->seq);
old_pwd = fs->pwd;
fs->pwd = *path;
write_seqcount_end(&fs->seq);
spin_unlock(&fs->lock);
if (old_pwd.dentry)
path_put(&old_pwd);
}
where we have ->seq bumped between dget new/assignment/ dput old,
copy_fs_struct() where we have
spin_lock(&old->lock);
fs->root = old->root;
path_get(&fs->root);
fs->pwd = old->pwd;
path_get(&fs->pwd);
spin_unlock(&old->lock);
fs being freshly allocated instance that couldn't have been observed
by anyone and chroot_fs_refs(), where we have
spin_lock(&fs->lock);
write_seqcount_begin(&fs->seq);
hits += replace_path(&fs->root, old_root, new_root);
hits += replace_path(&fs->pwd, old_root, new_root);
write_seqcount_end(&fs->seq);
while (hits--) {
count++;
path_get(new_root);
}
spin_unlock(&fs->lock);
...
static inline int replace_path(struct path *p, const struct path *old, const struct path *new)
{
if (likely(p->dentry != old->dentry || p->mnt != old->mnt))
return 0;
*p = *new;
return 1;
}
Here we have new_root->dentry pinned from the very beginning,
and assignments are wrapped into bumps of ->seq. Moreover,
we are holding ->lock through that sequence (as all writers
do), so these references can't be dropped before path_get()
bumps new_root->dentry refcount.
chroot_fs_refs() is called only by pivot_root(2):
chroot_fs_refs(&root, &new);
and there new is set by
error = user_path_at(AT_FDCWD, new_root,
LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &new);
if (error)
goto out0;
which pins new.dentry *and* verifies that it's positive and a directory,
at that. Since pinned positive dentry can't be made negative by anybody
else, we know it will remain in that state until
path_put(&new);
well downstream of chroot_fs_refs(). In copy_fs_struct() we are
copying someone's ->pwd, so it's also pinned positive. And it
won't be dropped outside of old->lock, so by the time somebody
manages to drop the reference in old, path_get() effects will be
visible (old->lock serving as a barrier).
That leaves set_fs_pwd() calls:
fs/init.c:54: set_fs_pwd(current->fs, &path);
init_chdir(), path set by LOOKUP_DIRECTORY patwalk. Pinned positive.
fs/namespace.c:4207: set_fs_pwd(current->fs, &root);
init_mount_tree(), root.dentry being ->mnt_root of rootfs. Pinned
positive (and it would've oopsed much earlier had that been it)
fs/namespace.c:4485: set_fs_pwd(fs, &root);
mntns_install(), root filled by successful LOOKUP_DOWN for "/"
from mnt_ns->root. Should be pinned positive.
fs/open.c:501: set_fs_pwd(current->fs, &path);
chdir(2), path set by LOOKUP_DIRECTORY pathwalk. Pinned positive.
fs/open.c:528: set_fs_pwd(current->fs, &f.file->f_path);
fchdir(2), file->f_path of any opened file. Pinned positive.
kernel/usermode_driver.c:130: set_fs_pwd(current->fs, &umd_info->wd);
umd_setup(), ->wd.dentry equal to ->wd.mnt->mnt_root, should be pinned positive.
kernel/nsproxy.c:509: set_fs_pwd(me->fs, &nsset->fs->pwd);
commit_nsset(). Let's see what's going on there...
if ((flags & CLONE_NEWNS) && (flags & ~CLONE_NEWNS)) {
set_fs_root(me->fs, &nsset->fs->root);
set_fs_pwd(me->fs, &nsset->fs->pwd);
}
In those conditions nsset.fs has come from copy_fs_struct() done in
prepare_nsset(). And the only thing that might've been done to it
would be those set_fs_pwd() in mntns_install() (I'm not fond of the
entire nsset->fs thing - looks like papering over bad calling
conventions, but anyway)
Now, I might've missed some insanity (direct assignments to ->pwd.dentry,
etc. - wouldn't be the first time io_uring folks went "layering? wassat?
we'll just poke in whatever we can reach"), but I don't see anything
obvious of that sort in the area...
OK, how about this: in path_init(), right after
do {
seq = read_seqcount_begin(&fs->seq);
nd->path = fs->pwd;
nd->inode = nd->path.dentry->d_inode;
nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq);
} while (read_seqcount_retry(&fs->seq, seq));
slap
if (!nd->inode) {
// should never, ever happen
struct dentry *fucked = nd->path.dentry;
printk(KERN_ERR "%pd4 %d %x %p %d %d", fucked, d_count(fucked),
fucked->d_flags, fs, fs->users, seq);
BUG_ON(1);
return ERR_PTR(-EINVAL);
}
and see what it catches?
next prev parent reply other threads:[~2021-04-05 16:19 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <[email protected]>
2021-04-01 15:45 ` [syzbot] WARNING in mntput_no_expire (2) Christian Brauner
2021-04-01 16:09 ` Jens Axboe
2021-04-01 17:46 ` Christian Brauner
2021-04-01 17:59 ` Christian Brauner
2021-04-01 19:11 ` Al Viro
2021-04-04 2:34 ` Al Viro
2021-04-04 2:38 ` Al Viro
2021-04-04 11:34 ` Christian Brauner
2021-04-04 15:56 ` Al Viro
2021-04-04 16:40 ` Christian Brauner
2021-04-04 16:44 ` Al Viro
2021-04-04 17:05 ` Christian Brauner
2021-04-04 18:50 ` Al Viro
2021-04-04 20:17 ` Al Viro
2021-04-05 11:44 ` Christian Brauner
2021-04-05 16:18 ` Al Viro [this message]
2021-04-05 17:08 ` Christian Brauner
2021-04-05 18:23 ` Al Viro
2021-04-05 18:28 ` Al Viro
2021-04-05 20:07 ` Christian Brauner
2021-04-06 1:38 ` Al Viro
2021-04-06 2:24 ` Al Viro
2021-04-06 12:35 ` Christian Brauner
2021-04-06 13:13 ` Al Viro
2021-04-06 13:22 ` Christian Brauner
2021-04-06 14:15 ` Al Viro
2021-04-06 14:23 ` Al Viro
2021-04-06 15:37 ` Jens Axboe
2021-04-06 14:46 ` Christian Brauner
2021-04-04 16:52 ` Christian Brauner
2021-04-04 16:55 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox