[RFC] struct filename, io_uring and audit troubles

public inbox for [email protected]
 help / color / mirror / Atom feed

* [RFC] struct filename, io_uring and audit troubles
@ 2024-09-22  0:49 Al Viro
  2024-09-22  4:10 ` Al Viro
  2024-09-23  1:50 ` Al Viro
  0 siblings, 2 replies; 23+ messages in thread
From: Al Viro @ 2024-09-22  0:49 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: audit, io-uring

	Looks like things like async unlink might fuck the audit
very badly.  io_uring does getname() in originating thread and uses the
result at the time of operation, which can happen in a different thread.
Moreover, by that time the original syscall might have very well returned.

	The trouble is, getname() establishes linkage between the struct
filename and struct audit_name; filename->aname and audit_name->name
respectively.  struct filename can get moved from one thread to another;
struct audit_name is very much tied to audit_context, which is per-thread
- first few (5, currently) audit_name instances are embedded into
audit_context.	The rest gets allocated dynamically, but all of them
are placed on audit_context::names_list.

	At audit_free_names() they are all wiped out - references
back to filename instances are dropped, dynamically allocated ones
are freed, and while embedded ones survive, they are zeroed out on
reuse by audit_alloc_name().  audit_free_names() is called on each
audit_reset_context(), which is done by __audit_syscall_exit() and
(in states other than AUDIT_SYSCALL_CONTEXT) __audit_uring_exit().

	Linkage from filename to audit_name is used by __audit_inode().
It definitely expects the reference back to filename to be stable.
And in situation when io_uring has offloaded a directory operation to
helper thread, that is not guaranteed.

	Another fun bit is that both audit_inode() and audit_inode_child()
may bump the refcount on struct filename.  Which can get really fishy
if they get called by helper thread while the originator is exiting the
syscall - putname() from audit_free_names() in originator vs. refcount
increment in helper is Not Nice(tm), what with the refcount not being
atomic.

	Potential solutions:

* don't bother with audit_name creation and linkage in getname(); do that
when we start using the sucker.  Doing that from __set_nameidata() will
catch the majority of the stuff that ever gets audit_inode* called for it
(the only exceptions are mq_open(2) and mq_unlink(2)).  Unfortunately,
each audit_name instance gets spewed into logs, so we would need to
bring the rest of that shite in, including the things like symlink
bodies (note that for io_uring-originating symlink we'd need that done
in do_symlinkat()), etc.  Unpleasant, that.

* make refcount atomic, add a pointer to audit_context or even task_struct
in audit_name, have the "use name->aname if the type is acceptable"
logics in audit_inode dependent upon the name->aname->owner matching
what we want.  With some locking to make the check itself safe.

* make refcount atomic, get rid of ->aname and have audit_inode() scan
the names_list for entries with matching ->name and type - and that
before the existing scan with ->name->name comparisons.

* something else?

	Suggestions _not_ involving creative uses of TARDIS would
be welcome.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-22  0:49 [RFC] struct filename, io_uring and audit troubles Al Viro
@ 2024-09-22  4:10 ` Al Viro
  2024-09-22 15:09   ` Al Viro
  2024-09-23  1:50 ` Al Viro
  1 sibling, 1 reply; 23+ messages in thread
From: Al Viro @ 2024-09-22  4:10 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: audit, io-uring

On Sun, Sep 22, 2024 at 01:49:01AM +0100, Al Viro wrote:

> * don't bother with audit_name creation and linkage in getname(); do that
> when we start using the sucker.  Doing that from __set_nameidata() will
> catch the majority of the stuff that ever gets audit_inode* called for it
> (the only exceptions are mq_open(2) and mq_unlink(2)).  Unfortunately,
> each audit_name instance gets spewed into logs, so we would need to
> bring the rest of that shite in, including the things like symlink
> bodies (note that for io_uring-originating symlink we'd need that done
> in do_symlinkat()), etc.  Unpleasant, that.

BTW, how much is exact order and number of PATH items in audit logs cast
in stone?

For example,
        char s[2][20] = {"/tmp/blah/x", "/tmp/blah/x"};
	rename(s[0], s[1]);
	rename(s[0], s[0]);

produces 2 items (both for /tmp/blah) on the first call, and only 1 on
the second.  Does anything care about preserving that distinction?

And what in audit_inode{,_child}() behaviour can be changed?  I mean, does
anything care about the loop directions when we pick the candidate
audit_name for conversion, etc.?

It's been a long time since I've touched audit, and I have done my best
to purge memories of the specifications ;-/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-22  4:10 ` Al Viro
@ 2024-09-22 15:09   ` Al Viro
  0 siblings, 0 replies; 23+ messages in thread
From: Al Viro @ 2024-09-22 15:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: audit, io-uring

On Sun, Sep 22, 2024 at 05:10:06AM +0100, Al Viro wrote:
> On Sun, Sep 22, 2024 at 01:49:01AM +0100, Al Viro wrote:
> 
> > * don't bother with audit_name creation and linkage in getname(); do that
> > when we start using the sucker.  Doing that from __set_nameidata() will
> > catch the majority of the stuff that ever gets audit_inode* called for it
> > (the only exceptions are mq_open(2) and mq_unlink(2)).  Unfortunately,
> > each audit_name instance gets spewed into logs, so we would need to
> > bring the rest of that shite in, including the things like symlink
> > bodies (note that for io_uring-originating symlink we'd need that done
> > in do_symlinkat()), etc.  Unpleasant, that.
> 
> BTW, how much is exact order and number of PATH items in audit logs cast
> in stone?
> 
> For example,
>         char s[2][20] = {"/tmp/blah/x", "/tmp/blah/x"};
> 	rename(s[0], s[1]);
> 	rename(s[0], s[0]);
> 
> produces 2 items (both for /tmp/blah) on the first call, and only 1 on
> the second.  Does anything care about preserving that distinction?
> 
> And what in audit_inode{,_child}() behaviour can be changed?  I mean, does
> anything care about the loop directions when we pick the candidate
> audit_name for conversion, etc.?
> 
> It's been a long time since I've touched audit, and I have done my best
> to purge memories of the specifications ;-/

Speaking of which, is there anything in specs that would require -F obj_uid=0
to give a match on symlink("foo", "bar") done by non-root in non-root-owned
directory, but not on symlink("foo", "foo") in the same conditions?

Put it another way, could we possibly make the predicates that refer to inode
state *not* evaluate to true on audit_name instances that had never been
associated with any inodes?  Currently, symlink body has such an audit_name
instance, except when that instance had been picked by audit_inode_parent()
for conversion (or had been shared from the very beginning in case of symlink(2)
that had identical pointers passed in both arguments).

Al, carefully abstaining from any comments on the sanity of said specifications -
it's a government document, after all, so those should be obvious...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-22  0:49 [RFC] struct filename, io_uring and audit troubles Al Viro
  2024-09-22  4:10 ` Al Viro
@ 2024-09-23  1:50 ` Al Viro
  2024-09-23  6:30   ` Jens Axboe
  1 sibling, 1 reply; 23+ messages in thread
From: Al Viro @ 2024-09-23  1:50 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: audit, io-uring

On Sun, Sep 22, 2024 at 01:49:01AM +0100, Al Viro wrote:

> 	Another fun bit is that both audit_inode() and audit_inode_child()
> may bump the refcount on struct filename.  Which can get really fishy
> if they get called by helper thread while the originator is exiting the
> syscall - putname() from audit_free_names() in originator vs. refcount
> increment in helper is Not Nice(tm), what with the refcount not being
> atomic.

*blink*

OK, I really wonder which version had I been reading at the time; refcount
is, indeed, atomic these days.

Other problems (->aname pointing to other thread's struct audit_names
and outliving reuse of those, as well as insane behaviour of audit predicates
on symlink(2)) are, unfortunately, quite real - on the current mainline.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-23  1:50 ` Al Viro
@ 2024-09-23  6:30   ` Jens Axboe
  2024-09-23 12:54     ` Paul Moore
  2024-09-23 15:07     ` Al Viro
  0 siblings, 2 replies; 23+ messages in thread
From: Jens Axboe @ 2024-09-23  6:30 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel; +Cc: audit, io-uring

On 9/22/24 7:50 PM, Al Viro wrote:
> On Sun, Sep 22, 2024 at 01:49:01AM +0100, Al Viro wrote:
> 
>> 	Another fun bit is that both audit_inode() and audit_inode_child()
>> may bump the refcount on struct filename.  Which can get really fishy
>> if they get called by helper thread while the originator is exiting the
>> syscall - putname() from audit_free_names() in originator vs. refcount
>> increment in helper is Not Nice(tm), what with the refcount not being
>> atomic.
> 
> *blink*
> 
> OK, I really wonder which version had I been reading at the time; refcount
> is, indeed, atomic these days.
> 
> Other problems (->aname pointing to other thread's struct audit_names
> and outliving reuse of those, as well as insane behaviour of audit predicates
> on symlink(2)) are, unfortunately, quite real - on the current mainline.

Traveling but took a quick look. As far as I can tell, for the "reuse
someone elses aname", we could do either:

1) Just don't reuse the entry. Then we can drop the struct
   filename->aname completely as well. Yes that might incur an extra
   alloc for the odd case of audit_enabled and being deep enough that
   the preallocated names have been used, but doesn't anyone really
   care? It'll be noise in the overhead anyway. Side note - that would
   unalign struct filename again. Would be nice to drop audit_names from
   a core fs struct...

2) Add a ref to struct audit_names, RCU kfree it when it drops to zero.
   This would mean dropping struct audit_context->preallocated_names, as
   otherwise we'd run into trouble there if a context gets blown away
   while someone else has a ref to that audit_names struct. We could do
   this without a ref as well, as long as we can store an audit_context
   pointer in struct audit_names and be able to validate it under RCU.
   If ctx doesn't match, don't use it.

And probably other ways too, those were just the two immediate ones I
thought it. Seems like option 1 is simpler and just fine? Quick hack:

diff --git a/fs/namei.c b/fs/namei.c
index 891b169e38c9..11263f779b96 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -206,7 +206,6 @@ getname_flags(const char __user *filename, int flags)
 
 	atomic_set(&result->refcnt, 1);
 	result->uptr = filename;
-	result->aname = NULL;
 	audit_getname(result);
 	return result;
 }
@@ -254,7 +253,6 @@ getname_kernel(const char * filename)
 	}
 	memcpy((char *)result->name, filename, len);
 	result->uptr = NULL;
-	result->aname = NULL;
 	atomic_set(&result->refcnt, 1);
 	audit_getname(result);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0df3e5f0dd2b..859244c877b4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2685,10 +2685,8 @@ struct filename {
 	const char		*name;	/* pointer to actual string */
 	const __user char	*uptr;	/* original userland pointer */
 	atomic_t		refcnt;
-	struct audit_names	*aname;
 	const char		iname[];
 };
-static_assert(offsetof(struct filename, iname) % sizeof(long) == 0);
 
 static inline struct mnt_idmap *file_mnt_idmap(const struct file *file)
 {
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index cd57053b4a69..09caf8408225 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2240,7 +2240,6 @@ void __audit_getname(struct filename *name)
 
 	n->name = name;
 	n->name_len = AUDIT_NAME_FULL;
-	name->aname = n;
 	atomic_inc(&name->refcnt);
 }
 
@@ -2325,22 +2324,6 @@ void __audit_inode(struct filename *name, const struct dentry *dentry,
 	if (!name)
 		goto out_alloc;
 
-	/*
-	 * If we have a pointer to an audit_names entry already, then we can
-	 * just use it directly if the type is correct.
-	 */
-	n = name->aname;
-	if (n) {
-		if (parent) {
-			if (n->type == AUDIT_TYPE_PARENT ||
-			    n->type == AUDIT_TYPE_UNKNOWN)
-				goto out;
-		} else {
-			if (n->type != AUDIT_TYPE_PARENT)
-				goto out;
-		}
-	}
-
 	list_for_each_entry_reverse(n, &context->names_list, list) {
 		if (n->ino) {
 			/* valid inode number, use that for the comparison */

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-23  6:30   ` Jens Axboe
@ 2024-09-23 12:54     ` Paul Moore
  2024-09-23 14:48       ` Al Viro
  2024-09-23 15:07     ` Al Viro
  1 sibling, 1 reply; 23+ messages in thread
From: Paul Moore @ 2024-09-23 12:54 UTC (permalink / raw)
  To: Jens Axboe, Al Viro; +Cc: linux-fsdevel, audit, io-uring

On Mon, Sep 23, 2024 at 2:31 AM Jens Axboe <[email protected]> wrote:
> On 9/22/24 7:50 PM, Al Viro wrote:
> > On Sun, Sep 22, 2024 at 01:49:01AM +0100, Al Viro wrote:
> >
> >>      Another fun bit is that both audit_inode() and audit_inode_child()
> >> may bump the refcount on struct filename.  Which can get really fishy
> >> if they get called by helper thread while the originator is exiting the
> >> syscall - putname() from audit_free_names() in originator vs. refcount
> >> increment in helper is Not Nice(tm), what with the refcount not being
> >> atomic.
> >
> > *blink*
> >
> > OK, I really wonder which version had I been reading at the time; refcount
> > is, indeed, atomic these days.
> >
> > Other problems (->aname pointing to other thread's struct audit_names
> > and outliving reuse of those, as well as insane behaviour of audit predicates
> > on symlink(2)) are, unfortunately, quite real - on the current mainline.
>
> Traveling but took a quick look. As far as I can tell, for the "reuse
> someone elses aname", we could do either:
>
> 1) Just don't reuse the entry. Then we can drop the struct
>    filename->aname completely as well. Yes that might incur an extra
>    alloc for the odd case of audit_enabled and being deep enough that
>    the preallocated names have been used, but doesn't anyone really
>    care? It'll be noise in the overhead anyway. Side note - that would
>    unalign struct filename again. Would be nice to drop audit_names from
>    a core fs struct...
>
> 2) Add a ref to struct audit_names, RCU kfree it when it drops to zero.
>    This would mean dropping struct audit_context->preallocated_names, as
>    otherwise we'd run into trouble there if a context gets blown away
>    while someone else has a ref to that audit_names struct. We could do
>    this without a ref as well, as long as we can store an audit_context
>    pointer in struct audit_names and be able to validate it under RCU.
>    If ctx doesn't match, don't use it.
>
> And probably other ways too, those were just the two immediate ones I
> thought it. Seems like option 1 is simpler and just fine? Quick hack:

[Sorry for the delay, between flying back home, and just not wanting
to think about the kernel for a day, I took the weekend "off".]

Jens and I have talked about similar issues in the past, and I think
the only real solution to ensure the correctness of the audit records
and provide some consistency between the io_uring approach and
traditional syscalls, is to introduce a mechanism where we
create/clone an audit_context in the io_uring prep stage to capture
things like PATH records, stash that audit_context in the io_kiocb
struct, and then restore it later when io_uring does/finishes the
operation.  I'm reasonably confident that we don't need to do it for
all of the io_uring ops, just the !audit_skip case.

I'm always open to ideas, but everything else I can think of is either
far too op-specific to be maintainable long term, a performance
nightmare, or just plain wrong with respect to the audit records.

I keep hoping to have some time to code it up properly, but so far
this year has been an exercise in "I'll just put this fire over here
with the other fire".  Believe it or not, this is at the top of my
TODO list, perhaps this week I can dedicate some time to this.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-23 12:54     ` Paul Moore
@ 2024-09-23 14:48       ` Al Viro
  2024-09-23 16:14         ` Paul Moore
  0 siblings, 1 reply; 23+ messages in thread
From: Al Viro @ 2024-09-23 14:48 UTC (permalink / raw)
  To: Paul Moore; +Cc: Jens Axboe, linux-fsdevel, audit, io-uring

On Mon, Sep 23, 2024 at 08:54:03AM -0400, Paul Moore wrote:
> [Sorry for the delay, between flying back home, and just not wanting
> to think about the kernel for a day, I took the weekend "off".]
> 
> Jens and I have talked about similar issues in the past, and I think
> the only real solution to ensure the correctness of the audit records
> and provide some consistency between the io_uring approach and
> traditional syscalls, is to introduce a mechanism where we
> create/clone an audit_context in the io_uring prep stage to capture
> things like PATH records, stash that audit_context in the io_kiocb
> struct, and then restore it later when io_uring does/finishes the
> operation.  I'm reasonably confident that we don't need to do it for
> all of the io_uring ops, just the !audit_skip case.
> 
> I'm always open to ideas, but everything else I can think of is either
> far too op-specific to be maintainable long term, a performance
> nightmare, or just plain wrong with respect to the audit records.
> 
> I keep hoping to have some time to code it up properly, but so far
> this year has been an exercise in "I'll just put this fire over here
> with the other fire".  Believe it or not, this is at the top of my
> TODO list, perhaps this week I can dedicate some time to this.

What are the requirements regarding the order of audit_names in
the ->names_list?  I really don't like the idea of having struct filename
tied to audit_context - io_uring is not the only context where it might
make sense to treat struct filename as first-class citizens.

And having everything that passed through getname()/getname_kernel()
shoved into ->names_list leads to very odd behaviour, especially with
audit_names conversions in audit_inode()/audit_inode_child().

Look at the handling of AUDIT_DEV{MAJOR,MINOR} or AUDIT_OBJ_{UID,GID}
or AUDIT_COMPARE_..._TO_OBJ; should they really apply to audit_names
resulting from copying the symlink body into the kernel?  And if they
should be applied to audit_names instance that had never been associated
with any inode, should that depend upon the string in those being
equal to another argument of the same syscall?

I'm going through the kernel/auditsc.c right now, but it's more of
a "document what it does" - I don't have the specs and I certainly
don't remember such details.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-23 14:48       ` Al Viro
@ 2024-09-23 16:14         ` Paul Moore
  2024-09-23 18:17           ` Al Viro
  2024-09-23 20:36           ` Al Viro
  0 siblings, 2 replies; 23+ messages in thread
From: Paul Moore @ 2024-09-23 16:14 UTC (permalink / raw)
  To: Al Viro; +Cc: Jens Axboe, linux-fsdevel, audit, io-uring

On Mon, Sep 23, 2024 at 10:48 AM Al Viro <[email protected]> wrote:
> On Mon, Sep 23, 2024 at 08:54:03AM -0400, Paul Moore wrote:
> > [Sorry for the delay, between flying back home, and just not wanting
> > to think about the kernel for a day, I took the weekend "off".]
> >
> > Jens and I have talked about similar issues in the past, and I think
> > the only real solution to ensure the correctness of the audit records
> > and provide some consistency between the io_uring approach and
> > traditional syscalls, is to introduce a mechanism where we
> > create/clone an audit_context in the io_uring prep stage to capture
> > things like PATH records, stash that audit_context in the io_kiocb
> > struct, and then restore it later when io_uring does/finishes the
> > operation.  I'm reasonably confident that we don't need to do it for
> > all of the io_uring ops, just the !audit_skip case.
> >
> > I'm always open to ideas, but everything else I can think of is either
> > far too op-specific to be maintainable long term, a performance
> > nightmare, or just plain wrong with respect to the audit records.
> >
> > I keep hoping to have some time to code it up properly, but so far
> > this year has been an exercise in "I'll just put this fire over here
> > with the other fire".  Believe it or not, this is at the top of my
> > TODO list, perhaps this week I can dedicate some time to this.
>
> What are the requirements regarding the order of audit_names in
> the ->names_list?

Uncertain.  As things currently stand there isn't really an explicit
ordering between the PATH records and the syscall, there is the
implicit order in which the PATH records appear in the event, but I
don't know that I would read too much into that.

From my point of view, stuff like that is largely driven by enterprise
distros chasing 3rd party security certifications so they can sell
products/services to a certain class of users.  These are the same
enterprise distros that haven't really bothered to contribute a lot to
the upstream Linux kernel's audit subsystem lately so I'm not going to
worry too much about them at this point.

> I really don't like the idea of having struct filename
> tied to audit_context - io_uring is not the only context where it might
> make sense to treat struct filename as first-class citizens.

From an audit perspective this is larger than just path walks, if I
recall previous conversations correctly, there was also an issue with
sockaddrs (slightly different problem, same solution).

I'm not opposed to seeing better support for struct filename, I think
in a lot of cases that would make things easier for audit (especially
where I would like to take audit ... eventually).  Assuming your ideas
for struct filename don't significantly break audit you can consider
me supportive so long as we still have a way to take a struct filename
reference inside the audit_context; we need to keep that ref until
syscall/io_uring-op exit time as we can't be certain if we need to log
the PATH until we know the success/fail status of the operation (among
other things).

> And having everything that passed through getname()/getname_kernel()
> shoved into ->names_list leads to very odd behaviour, especially with
> audit_names conversions in audit_inode()/audit_inode_child().
>
> Look at the handling of AUDIT_DEV{MAJOR,MINOR} or AUDIT_OBJ_{UID,GID}
> or AUDIT_COMPARE_..._TO_OBJ; should they really apply to audit_names
> resulting from copying the symlink body into the kernel?  And if they
> should be applied to audit_names instance that had never been associated
> with any inode, should that depend upon the string in those being
> equal to another argument of the same syscall?
>
> I'm going through the kernel/auditsc.c right now, but it's more of
> a "document what it does" - I don't have the specs and I certainly
> don't remember such details.

My approach to audit is "do what makes sense for a normal person", if
somebody needs silly behavior to satisfy some security cert then they
can get involved in upstream development and send me patches that
don't suck.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-23 16:14         ` Paul Moore
@ 2024-09-23 18:17           ` Al Viro
  2024-09-23 23:49             ` Paul Moore
  2024-09-23 20:36           ` Al Viro
  1 sibling, 1 reply; 23+ messages in thread
From: Al Viro @ 2024-09-23 18:17 UTC (permalink / raw)
  To: Paul Moore; +Cc: Jens Axboe, linux-fsdevel, audit, io-uring

On Mon, Sep 23, 2024 at 12:14:29PM -0400, Paul Moore wrote:

> > And having everything that passed through getname()/getname_kernel()
> > shoved into ->names_list leads to very odd behaviour, especially with
> > audit_names conversions in audit_inode()/audit_inode_child().
> >
> > Look at the handling of AUDIT_DEV{MAJOR,MINOR} or AUDIT_OBJ_{UID,GID}
> > or AUDIT_COMPARE_..._TO_OBJ; should they really apply to audit_names
> > resulting from copying the symlink body into the kernel?  And if they
> > should be applied to audit_names instance that had never been associated
> > with any inode, should that depend upon the string in those being
> > equal to another argument of the same syscall?
> >
> > I'm going through the kernel/auditsc.c right now, but it's more of
> > a "document what it does" - I don't have the specs and I certainly
> > don't remember such details.
> 
> My approach to audit is "do what makes sense for a normal person", if
> somebody needs silly behavior to satisfy some security cert then they
> can get involved in upstream development and send me patches that
> don't suck.

root@kvm1:/tmp# auditctl -l
-a always,exit -S all -F dir=/tmp/blah -F perm=rwxa -F obj_uid=0
root@kvm1:/tmp# su - al
al@kvm1:~$ cd /tmp/blah
al@kvm1:/tmp/blah$ ln -s a a
al@kvm1:/tmp/blah$ ln -s c b
al@kvm1:/tmp/blah$ ls -l
total 0
lrwxrwxrwx 1 al al 1 Sep 23 13:44 a -> a
lrwxrwxrwx 1 al al 1 Sep 23 13:44 b -> c
al@kvm1:/tmp/blah$ ls -ld 
drwxr-xr-x 2 al al 4096 Sep 23 13:44 .

resulting in
type=USER_START msg=audit(1727113434.684:497): pid=3433 uid=0 auid=0 ses=1 subj=unconfined msg='op=PAM:session_open grantors=pam_key
init,pam_env,pam_env,pam_mail,pam_limits,pam_permit,pam_unix,pam_ecryptfs acct="al" exe="/usr/bin/su" hostname=? addr=? terminal=/de
v/pts/0 res=success'^]UID="root" AUID="root"
type=SYSCALL msg=audit(1727113466.464:498): arch=c000003e syscall=266 success=yes exit=0 a0=7ffc441ad843 a1=ffffff9c a2=7ffc441ad845 a3=7f10e4816a90 items=3 ppid=3434 pid=3449 auid=0 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=1 comm="ln" exe="/usr/bin/ln" subj=unconfined key=(null)^]ARCH=x86_64 SYSCALL=symlinkat AUID="root" UID="al" GID="al" EUID="al" SUID="al" FSUID="al" EGID="al" SGID="al" FSGID="al"
type=CWD msg=audit(1727113466.464:498): cwd="/tmp/blah"
type=PATH msg=audit(1727113466.464:498): item=0 name="/tmp/blah" inode=135460 dev=08:14 mode=041777 ouid=1000 ogid=1000 rdev=00:00 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0^]OUID="al" OGID="al"
type=PATH msg=audit(1727113466.464:498): item=1 name="c" nametype=UNKNOWN cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1727113466.464:498): item=2 name="b" inode=135497 dev=08:14 mode=0120777 ouid=1000 ogid=1000 rdev=00:00 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0^]OUID="al" OGID="al"
type=PROCTITLE msg=audit(1727113466.464:498): proctitle=6C6E002D7300630062
type=USER_END msg=audit(1727113560.852:499): pid=3433 uid=0 auid=0 ses=1 subj=unconfined msg='op=PAM:session_close grantors=pam_keyinit,pam_env,pam_env,pam_mail,pam_limits,pam_permit,pam_unix,pam_ecryptfs acct="al" exe="/usr/bin/su" hostname=? addr=? terminal=/dev/pts/0 res=success'^]UID="root" AUID="root"

Note that
	* none of the filesystem objects involved have UID 0
	* of two calls of symlinkat(), only the second one triggers the rule
... and removing the -F obj_uid=0 from the rule catches both (as expected),
with the first one getting *two* items instead of 3 - there's PARENT (identical
to the second call), there's CREATE (for "a" instead of "b", obviously) and
there's no UNKNOWN with the symlink body.

Does the behaviour above make sense for a normal person, by your definition?

What happens is that we start with two UNKNOWN (for oldname and newname arguments
of symlinkat(); incidentally, for the order is undefined - it's up to compiler
which argument of do_symlinkat(getname(oldname), newdfd, getname(newname)) gets
evaluated first).  When we get through the call of __filename_parentat(), we get
the newname's audit_names instance hit by audit_inode(...., AUDIT_INODE_PARENT).
That converts the one for newname from UNKNOWN to PARENT.  Then, in may_create(),
we hit it with audit_inode_child() for AUDIT_TYPE_CHILD_CREATING.  Which finds
the parent (newname's audit_names, converted to PARENT by that point) and goes
looking for child.

That's where the execution paths diverge - for symlink("a", "a") the oldname's
audit_names instance looks like a valid candidate and gets converted to CHILD_CREATING.
For symlink("c", "b") such candidate is there, so we get a new item allocated -
CHILD_CREATING for "b".  Name reference gets cloned from PARENT.

Now, ->symlink() is done, it succeeds and fsnotify_create(dir, dentry) gets
called.  We get an identical audit_inode_child() call, except that now dentry
is positive.  It finds PARENT instance, finds CHILD_CREATING one, and slaps
the inode metadata into it, setting ->uid to non-zero.

As the result, in one case you are left with PARENT, UNKNOWN, CHILD_CREATING and
in another - PARENT, CHILD_CREATING.  Modulo reordering first two items if
compiler decides to evaluate the first argument of do_symlinkat() call before
the third one...  In any case, AUDIT_OBJ_UID(0) gives a match on UNKNOWN -
if it's there.  Since both PARENT and CHILD_CREATING refer to non-root-owned
inodes, AUDIT_OBJ_UID(0) does not match either...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-23 18:17           ` Al Viro
@ 2024-09-23 23:49             ` Paul Moore
  0 siblings, 0 replies; 23+ messages in thread
From: Paul Moore @ 2024-09-23 23:49 UTC (permalink / raw)
  To: Al Viro; +Cc: Jens Axboe, linux-fsdevel, audit, io-uring

On Mon, Sep 23, 2024 at 2:17 PM Al Viro <[email protected]> wrote:
> On Mon, Sep 23, 2024 at 12:14:29PM -0400, Paul Moore wrote:
>
> > > And having everything that passed through getname()/getname_kernel()
> > > shoved into ->names_list leads to very odd behaviour, especially with
> > > audit_names conversions in audit_inode()/audit_inode_child().
> > >
> > > Look at the handling of AUDIT_DEV{MAJOR,MINOR} or AUDIT_OBJ_{UID,GID}
> > > or AUDIT_COMPARE_..._TO_OBJ; should they really apply to audit_names
> > > resulting from copying the symlink body into the kernel?  And if they
> > > should be applied to audit_names instance that had never been associated
> > > with any inode, should that depend upon the string in those being
> > > equal to another argument of the same syscall?
> > >
> > > I'm going through the kernel/auditsc.c right now, but it's more of
> > > a "document what it does" - I don't have the specs and I certainly
> > > don't remember such details.
> >
> > My approach to audit is "do what makes sense for a normal person", if
> > somebody needs silly behavior to satisfy some security cert then they
> > can get involved in upstream development and send me patches that
> > don't suck.
>
> root@kvm1:/tmp# auditctl -l
> -a always,exit -S all -F dir=/tmp/blah -F perm=rwxa -F obj_uid=0
> root@kvm1:/tmp# su - al
> al@kvm1:~$ cd /tmp/blah
> al@kvm1:/tmp/blah$ ln -s a a
> al@kvm1:/tmp/blah$ ln -s c b
> al@kvm1:/tmp/blah$ ls -l
> total 0
> lrwxrwxrwx 1 al al 1 Sep 23 13:44 a -> a
> lrwxrwxrwx 1 al al 1 Sep 23 13:44 b -> c
> al@kvm1:/tmp/blah$ ls -ld
> drwxr-xr-x 2 al al 4096 Sep 23 13:44 .

...

> Note that
>         * none of the filesystem objects involved have UID 0
>         * of two calls of symlinkat(), only the second one triggers the rule
> ... and removing the -F obj_uid=0 from the rule catches both (as expected),
> with the first one getting *two* items instead of 3 - there's PARENT (identical
> to the second call), there's CREATE (for "a" instead of "b", obviously) and
> there's no UNKNOWN with the symlink body.
>
> Does the behaviour above make sense for a normal person, by your definition?

I never said that there aren't plenty of bugs in the current
implementation.  Really.  Audit is a train wreck for so many reasons,
most of my time spent with audit is spent making sure it doesn't
panic, or some other subsystem hasn't wrecked it even further.

If you're looking for someone to justify audit's behavior here, you'll
need to keep looking.  Personally I've got plans to burn the whole
thing down if I ever get enough time to work on it, but in the
meantime I want to try and make it as usable as possible.  Patches are
always welcome.


-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-23 16:14         ` Paul Moore
  2024-09-23 18:17           ` Al Viro
@ 2024-09-23 20:36           ` Al Viro
  2024-09-24  0:11             ` Paul Moore
  2024-09-24 21:40             ` Al Viro
  1 sibling, 2 replies; 23+ messages in thread
From: Al Viro @ 2024-09-23 20:36 UTC (permalink / raw)
  To: Paul Moore; +Cc: Jens Axboe, linux-fsdevel, audit, io-uring

On Mon, Sep 23, 2024 at 12:14:29PM -0400, Paul Moore wrote:

[ordering and number of PATH items for syscall]

> >From my point of view, stuff like that is largely driven by enterprise
> distros chasing 3rd party security certifications so they can sell
> products/services to a certain class of users.  These are the same
> enterprise distros that haven't really bothered to contribute a lot to
> the upstream Linux kernel's audit subsystem lately so I'm not going to
> worry too much about them at this point.

Umm...  IIRC, sgrubb had been involved in the spec-related horrors, but
that was a long time ago...

> where I would like to take audit ... eventually).  Assuming your ideas
> for struct filename don't significantly break audit you can consider
> me supportive so long as we still have a way to take a struct filename
> reference inside the audit_context; we need to keep that ref until
> syscall/io_uring-op exit time as we can't be certain if we need to log
> the PATH until we know the success/fail status of the operation (among
> other things).

OK...  As for what I would like to do:

	* go through the VFS side of things and make sure we have a consistent
set of helpers that would take struct filename * - *not* the ad-hoc mix we
have right now, when that's basically driven by io_uring borging them in
one by one - or duplicates them without bothering to share helpers.
E.g. mkdirat(2) does getname() and passes it to do_mkdirat(), which
consumes the sucker; so does mknodat(2), but do_mknodat() is static.  OTOH,
path_setxattr() does setxattr_copy(), then retry_estale loop with
user_path_at() + mnt_want_write() + do_setxattr() + mnt_drop_write() + path_put()
as a body, while on io_uring side we have retry_estale loop with filename_lookup() +
(io_uring helper that does mnt_want_write() + do_setxattr() + mnt_drop_write()) +
path_put().
	Sure, that user_path_at() call is getname() + filename_lookup() + putname(),
so they are equivalent, but keeping that shite in sync is going to be trouble.

	* get rid of the "repeated getname() on the same address is going to
give you the same object" - that can't be relied upon without audit, for one
thing and for another... having a syscall that takes two pathnames that gives
different audit log (if not predicate evaluation) in cases when those are
identical pointers vs. strings with identical contenst is, IMO, somewhat
undesirable.  That kills filename->uaddr.

	* looking at the users of that stuff, I would probably prefer to
separate getname*() from insertion into audit context.  It's not that
tricky - __set_nameidata() catches *everything* that uses nd->name (i.e.
all that audit_inode() calls in fs/namei.c use).  What remains is
	do_symlinkat() for symlink body
	fs_index() on the argument (if we want to bother - it's a part
of weird Missed'em'V sysfs(2) syscall; I sincerely doubt that there's
anybody who'd use it)
	fsconfig(2) FSCONFIG_SET_PATH handling.
	mq_open(2) and mq_unlink(2) - those bypass the normal pathwalk
logics, so __set_nameidata() won't catch them.
	_maybe_ alpha osf_mount(2) devname argument; or we could get rid
of that stupidity and have it use copy_mount_string() like mount(2) does,
instead of messing with getname().
	That's all it takes.  With that done, we can kill ->aname;
just look in the ->names_list for the first entry with given ->name -
as in, given struct filename * value, no need to look inside.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-23 20:36           ` Al Viro
@ 2024-09-24  0:11             ` Paul Moore
  2024-09-24  7:01               ` Al Viro
  2024-09-25 20:44               ` Al Viro
  2024-09-24 21:40             ` Al Viro
  1 sibling, 2 replies; 23+ messages in thread
From: Paul Moore @ 2024-09-24  0:11 UTC (permalink / raw)
  To: Al Viro; +Cc: Jens Axboe, linux-fsdevel, audit, io-uring

On Mon, Sep 23, 2024 at 4:37 PM Al Viro <[email protected]> wrote:
> On Mon, Sep 23, 2024 at 12:14:29PM -0400, Paul Moore wrote:
>
> [ordering and number of PATH items for syscall]
>
> > >From my point of view, stuff like that is largely driven by enterprise
> > distros chasing 3rd party security certifications so they can sell
> > products/services to a certain class of users.  These are the same
> > enterprise distros that haven't really bothered to contribute a lot to
> > the upstream Linux kernel's audit subsystem lately so I'm not going to
> > worry too much about them at this point.
>
> Umm...  IIRC, sgrubb had been involved in the spec-related horrors, but
> that was a long time ago...

Yep, he was.  Last I spoke to Steve a year or so ago, audit was no
longer part of his job description; Steve still maintains his
userspace audit tools, but that is a nights/weekends job as far as I
understand.

The last time I was involved in any audit/CC spec related work was
well over a decade ago now, and all of those CC protection profiles
have long since expired and been replaced.

> > where I would like to take audit ... eventually).  Assuming your ideas
> > for struct filename don't significantly break audit you can consider
> > me supportive so long as we still have a way to take a struct filename
> > reference inside the audit_context; we need to keep that ref until
> > syscall/io_uring-op exit time as we can't be certain if we need to log
> > the PATH until we know the success/fail status of the operation (among
> > other things).
>
> OK...  As for what I would like to do:
>
>         * go through the VFS side of things and make sure we have a consistent
> set of helpers that would take struct filename * - *not* the ad-hoc mix we
> have right now, when that's basically driven by io_uring borging them in
> one by one - or duplicates them without bothering to share helpers.
> E.g. mkdirat(2) does getname() and passes it to do_mkdirat(), which
> consumes the sucker; so does mknodat(2), but do_mknodat() is static.  OTOH,
> path_setxattr() does setxattr_copy(), then retry_estale loop with
> user_path_at() + mnt_want_write() + do_setxattr() + mnt_drop_write() + path_put()
> as a body, while on io_uring side we have retry_estale loop with filename_lookup() +
> (io_uring helper that does mnt_want_write() + do_setxattr() + mnt_drop_write()) +
> path_put().
>         Sure, that user_path_at() call is getname() + filename_lookup() + putname(),
> so they are equivalent, but keeping that shite in sync is going to be trouble.

I obviously trust you to do the right thing with the VFS bits, and
having a well defined struct filename interface sounds like a good
thing from an audit perspective.  I don't believe it completely solves
the audit/io_uring issue, but it should make things easier and
hopefully will result in less chance of breakage in the future.

>         * get rid of the "repeated getname() on the same address is going to
> give you the same object" - that can't be relied upon without audit, for one
> thing and for another... having a syscall that takes two pathnames that gives
> different audit log (if not predicate evaluation) in cases when those are
> identical pointers vs. strings with identical contenst is, IMO, somewhat
> undesirable.  That kills filename->uaddr.

/uaddr/uptr/ if I'm following you correctly, but yeah, that all seems good.

>         * looking at the users of that stuff, I would probably prefer to
> separate getname*() from insertion into audit context.  It's not that
> tricky - __set_nameidata() catches *everything* that uses nd->name (i.e.
> all that audit_inode() calls in fs/namei.c use).

That should be a pretty significant simplification, that sounds good to me.

> ... What remains is
>         do_symlinkat() for symlink body
>         fs_index() on the argument (if we want to bother - it's a part
> of weird Missed'em'V sysfs(2) syscall; I sincerely doubt that there's
> anybody who'd use it)

We probably should bother, folks that really care about audit don't
like blind spots.  Perhaps make it a separate patch if it isn't too
ugly to split it out.

>         fsconfig(2) FSCONFIG_SET_PATH handling.
>         mq_open(2) and mq_unlink(2) - those bypass the normal pathwalk
> logics, so __set_nameidata() won't catch them.
>         _maybe_ alpha osf_mount(2) devname argument; or we could get rid
> of that stupidity and have it use copy_mount_string() like mount(2) does,
> instead of messing with getname().
>         That's all it takes.  With that done, we can kill ->aname;
> just look in the ->names_list for the first entry with given ->name -
> as in, given struct filename * value, no need to look inside.

Seems reasonable to me.  I can't imagine these special cases being any
worse than what we have now in fs/namei.c, and if nothing else having
a single catch point for the bulk of the VFS lookups makes it worth it
as far as I'm concerned.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-24  0:11             ` Paul Moore
@ 2024-09-24  7:01               ` Al Viro
  2024-09-24 23:17                 ` Paul Moore
  2024-09-25 20:44               ` Al Viro
  1 sibling, 1 reply; 23+ messages in thread
From: Al Viro @ 2024-09-24  7:01 UTC (permalink / raw)
  To: Paul Moore; +Cc: Jens Axboe, linux-fsdevel, audit, io-uring

On Mon, Sep 23, 2024 at 08:11:51PM -0400, Paul Moore wrote:
> > Umm...  IIRC, sgrubb had been involved in the spec-related horrors, but
> > that was a long time ago...
> 
> Yep, he was.  Last I spoke to Steve a year or so ago, audit was no
> longer part of his job description; Steve still maintains his
> userspace audit tools, but that is a nights/weekends job as far as I
> understand.
> 
> The last time I was involved in any audit/CC spec related work was
> well over a decade ago now, and all of those CC protection profiles
> have long since expired and been replaced.

Interesting...  I guess eparis would be the next victim^Wpossible source
of information.

> >         * get rid of the "repeated getname() on the same address is going to
> > give you the same object" - that can't be relied upon without audit, for one
> > thing and for another... having a syscall that takes two pathnames that gives
> > different audit log (if not predicate evaluation) in cases when those are
> > identical pointers vs. strings with identical contenst is, IMO, somewhat
> > undesirable.  That kills filename->uaddr.
> 
> /uaddr/uptr/ if I'm following you correctly, but yeah, that all seems good.

*nod*

> >         * looking at the users of that stuff, I would probably prefer to
> > separate getname*() from insertion into audit context.  It's not that
> > tricky - __set_nameidata() catches *everything* that uses nd->name (i.e.
> > all that audit_inode() calls in fs/namei.c use).
> 
> That should be a pretty significant simplification, that sounds good to me.
> 
> > ... What remains is
> >         do_symlinkat() for symlink body
> >         fs_index() on the argument (if we want to bother - it's a part
> > of weird Missed'em'V sysfs(2) syscall; I sincerely doubt that there's
> > anybody who'd use it)
> 
> We probably should bother, folks that really care about audit don't
> like blind spots.  Perhaps make it a separate patch if it isn't too
> ugly to split it out.

Heh...  I suggest you to look at the manpage of that thing.

sysfs(1, "ext2") => echo $((`sed -ne "/\text2$/=" </proc/filesystems` - 1))
sysfs(2, 10) => sed -ne "11s/.*\t//p" </proc/filesystems
sysfs(3) => wc -l </proc/filesystems

Yes, really - find position of filesystem type in the list of registered
filesystems by name (0-based numeration), find the name of filesystem
type by position and find the number of registered filesystem types.
Missed'em'V had no synthetic filesystems...

And the string is, of course, not a pathname of any sort, so I'd argue that
spewing PATH record into audit log is a bug.  Not that the number you get
from sysfs(1, something) had been usable for anything other than passing it
to sysfs(2, number) and getting the same string back - you can't pass that
number to mount(2) or anything of that kind.  I suspect that the only
reason this syscall exists is some binary emulation - introduced in 1.1.11,
not supported by glibc at least since 2014 (and almost certainly way before
that).

> >         That's all it takes.  With that done, we can kill ->aname;
> > just look in the ->names_list for the first entry with given ->name -
> > as in, given struct filename * value, no need to look inside.
> 
> Seems reasonable to me.  I can't imagine these special cases being any
> worse than what we have now in fs/namei.c, and if nothing else having
> a single catch point for the bulk of the VFS lookups makes it worth it
> as far as I'm concerned.

Huh?  Right now we allocate audit_names at getname_flags()/getname_kernel()
time; grep for audit_getname() - that's as centralized as it gets.
What I want to do is somewhat _de_centralize it; that way they would not
go anywhere other than audit_context of the thread actually doing the
work.

There is a lot of calls of audit_inode(), but I'm not planning to touch
any of those.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-24  7:01               ` Al Viro
@ 2024-09-24 23:17                 ` Paul Moore
  0 siblings, 0 replies; 23+ messages in thread
From: Paul Moore @ 2024-09-24 23:17 UTC (permalink / raw)
  To: Al Viro; +Cc: Jens Axboe, linux-fsdevel, audit, io-uring

On Tue, Sep 24, 2024 at 3:01 AM Al Viro <[email protected]> wrote:
> On Mon, Sep 23, 2024 at 08:11:51PM -0400, Paul Moore wrote:
> > > Umm...  IIRC, sgrubb had been involved in the spec-related horrors, but
> > > that was a long time ago...
> >
> > Yep, he was.  Last I spoke to Steve a year or so ago, audit was no
> > longer part of his job description; Steve still maintains his
> > userspace audit tools, but that is a nights/weekends job as far as I
> > understand.
> >
> > The last time I was involved in any audit/CC spec related work was
> > well over a decade ago now, and all of those CC protection profiles
> > have long since expired and been replaced.
>
> Interesting...  I guess eparis would be the next victim^Wpossible source
> of information.

Eric was the one who dumped the audit subsystem in my lap ~10 years
ago so he could run off and play with containers.  Fortunately for
Eric I was much more trusting back then and didn't read all the fine
print before agreeing to look after audit.

> > >         * looking at the users of that stuff, I would probably prefer to
> > > separate getname*() from insertion into audit context.  It's not that
> > > tricky - __set_nameidata() catches *everything* that uses nd->name (i.e.
> > > all that audit_inode() calls in fs/namei.c use).
> >
> > That should be a pretty significant simplification, that sounds good to me.
> >
> > > ... What remains is
> > >         do_symlinkat() for symlink body
> > >         fs_index() on the argument (if we want to bother - it's a part
> > > of weird Missed'em'V sysfs(2) syscall; I sincerely doubt that there's
> > > anybody who'd use it)
> >
> > We probably should bother, folks that really care about audit don't
> > like blind spots.  Perhaps make it a separate patch if it isn't too
> > ugly to split it out.
>
> Heh...  I suggest you to look at the manpage of that thing.

Ooof.  That's something isn't it?  Yeah, that's ugly, and since it
doesn't really return any user or sensitive system information I think
it's safe to skip - my mistake.

> > >         That's all it takes.  With that done, we can kill ->aname;
> > > just look in the ->names_list for the first entry with given ->name -
> > > as in, given struct filename * value, no need to look inside.
> >
> > Seems reasonable to me.  I can't imagine these special cases being any
> > worse than what we have now in fs/namei.c, and if nothing else having
> > a single catch point for the bulk of the VFS lookups makes it worth it
> > as far as I'm concerned.
>
> Huh?  Right now we allocate audit_names at getname_flags()/getname_kernel()
> time; grep for audit_getname() - that's as centralized as it gets.
> What I want to do is somewhat _de_centralize it; that way they would not
> go anywhere other than audit_context of the thread actually doing the
> work.
>
> There is a lot of calls of audit_inode(), but I'm not planning to touch
> any of those.

Sorry.  I'm a bit foggy from traveling, and trying to play catch-up
back at home.  I'm sure it will make more sense once I see the
patches.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-24  0:11             ` Paul Moore
  2024-09-24  7:01               ` Al Viro
@ 2024-09-25 20:44               ` Al Viro
  2024-09-25 20:58                 ` Paul Moore
  1 sibling, 1 reply; 23+ messages in thread
From: Al Viro @ 2024-09-25 20:44 UTC (permalink / raw)
  To: Paul Moore; +Cc: Jens Axboe, linux-fsdevel, audit, io-uring

On Mon, Sep 23, 2024 at 08:11:51PM -0400, Paul Moore wrote:

> >         * get rid of the "repeated getname() on the same address is going to
> > give you the same object" - that can't be relied upon without audit, for one
> > thing and for another... having a syscall that takes two pathnames that gives
> > different audit log (if not predicate evaluation) in cases when those are
> > identical pointers vs. strings with identical contenst is, IMO, somewhat
> > undesirable.  That kills filename->uaddr.
> 
> /uaddr/uptr/ if I'm following you correctly, but yeah, that all seems good.

BTW, what should we do when e.g. mkdir(2) manages to get to the parent, calls
audit_inode() to memorize that one and then gets -ESTALE from nfs_mkdir()?
We repeat the pathwalk, this time with LOOKUP_REVAL (i.e. make sure to ask
the server about each NFS directory we are visiting, even if it had been seen
recently) and arrive to a different directory, which is not stale and where
subdirectory creation succeeds.

The thing is, we call audit_inode(...., AUDIT_INODE_PARENT) twice.  With the
same name, but with different inodes.  Should we log both, or should the
latter call cannibalize the audit_names instance from the earlier?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-25 20:44               ` Al Viro
@ 2024-09-25 20:58                 ` Paul Moore
  0 siblings, 0 replies; 23+ messages in thread
From: Paul Moore @ 2024-09-25 20:58 UTC (permalink / raw)
  To: Al Viro; +Cc: Jens Axboe, linux-fsdevel, audit, io-uring

On Wed, Sep 25, 2024 at 4:44 PM Al Viro <[email protected]> wrote:
> On Mon, Sep 23, 2024 at 08:11:51PM -0400, Paul Moore wrote:
>
> > >         * get rid of the "repeated getname() on the same address is going to
> > > give you the same object" - that can't be relied upon without audit, for one
> > > thing and for another... having a syscall that takes two pathnames that gives
> > > different audit log (if not predicate evaluation) in cases when those are
> > > identical pointers vs. strings with identical contenst is, IMO, somewhat
> > > undesirable.  That kills filename->uaddr.
> >
> > /uaddr/uptr/ if I'm following you correctly, but yeah, that all seems good.
>
> BTW, what should we do when e.g. mkdir(2) manages to get to the parent, calls
> audit_inode() to memorize that one and then gets -ESTALE from nfs_mkdir()?
> We repeat the pathwalk, this time with LOOKUP_REVAL (i.e. make sure to ask
> the server about each NFS directory we are visiting, even if it had been seen
> recently) and arrive to a different directory, which is not stale and where
> subdirectory creation succeeds.

Ah, that's fun.  I'm guessing we could run into similar issues with
other network filesystems, or is this specific to NFS?

> The thing is, we call audit_inode(...., AUDIT_INODE_PARENT) twice.  With the
> same name, but with different inodes.  Should we log both, or should the
> latter call cannibalize the audit_names instance from the earlier?

I think the proper behavior is to have the second call cannibalize the
state from the first.  The intent of logging is to capture the state
when/where the new directory is created, since we never created a
directory off the -ESTALE path I don't see why we would need to log
it.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-23 20:36           ` Al Viro
  2024-09-24  0:11             ` Paul Moore
@ 2024-09-24 21:40             ` Al Viro
  2024-09-25  6:01               ` Jens Axboe
  1 sibling, 1 reply; 23+ messages in thread
From: Al Viro @ 2024-09-24 21:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, io-uring

On Mon, Sep 23, 2024 at 09:36:59PM +0100, Al Viro wrote:

> 	* go through the VFS side of things and make sure we have a consistent
> set of helpers that would take struct filename * - *not* the ad-hoc mix we
> have right now, when that's basically driven by io_uring borging them in
> one by one - or duplicates them without bothering to share helpers.
> E.g. mkdirat(2) does getname() and passes it to do_mkdirat(), which
> consumes the sucker; so does mknodat(2), but do_mknodat() is static.  OTOH,
> path_setxattr() does setxattr_copy(), then retry_estale loop with
> user_path_at() + mnt_want_write() + do_setxattr() + mnt_drop_write() + path_put()
> as a body, while on io_uring side we have retry_estale loop with filename_lookup() +
> (io_uring helper that does mnt_want_write() + do_setxattr() + mnt_drop_write()) +
> path_put().
> 	Sure, that user_path_at() call is getname() + filename_lookup() + putname(),
> so they are equivalent, but keeping that shite in sync is going to be trouble.

BTW, re mess around xattr:
static int __io_getxattr_prep(struct io_kiocb *req,
                              const struct io_uring_sqe *sqe)
{
...
        ix->ctx.cvalue = u64_to_user_ptr(READ_ONCE(sqe->addr2));
	ix->ctx.size = READ_ONCE(sqe->len);
...
        ret = strncpy_from_user(ix->ctx.kname->name, name,
				sizeof(ix->ctx.kname->name));

}

int io_fgetxattr(struct io_kiocb *req, unsigned int issue_flags)
{
...
        ret = do_getxattr(file_mnt_idmap(req->file),
			req->file->f_path.dentry,
			&ix->ctx);
...
}

ssize_t
do_getxattr(struct mnt_idmap *idmap, struct dentry *d,
        struct xattr_ctx *ctx)
{
...
        if (error > 0) {
		if (ctx->size && copy_to_user(ctx->value, ctx->kvalue, error))
...
}

and we have
struct xattr_ctx {
        /* Value of attribute */
	union {
		const void __user *cvalue;
		void __user *value;
	};
	...
}

Undefined behaviour aside, there's something odd going on here:
why do we bother with copy-in in ->prep() when we do copy-out in
->issue() anyway?  ->issue() does run with initiator's ->mm in use,
right?

IOW, what's the io_uring policy on what gets copied in ->prep() vs.
in ->issue()?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-24 21:40             ` Al Viro
@ 2024-09-25  6:01               ` Jens Axboe
  2024-09-25 17:39                 ` Al Viro
  2024-09-26  3:56                 ` Al Viro
  0 siblings, 2 replies; 23+ messages in thread
From: Jens Axboe @ 2024-09-25  6:01 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, io-uring

On 9/24/24 3:40 PM, Al Viro wrote:
> On Mon, Sep 23, 2024 at 09:36:59PM +0100, Al Viro wrote:
> 
>> 	* go through the VFS side of things and make sure we have a consistent
>> set of helpers that would take struct filename * - *not* the ad-hoc mix we
>> have right now, when that's basically driven by io_uring borging them in
>> one by one - or duplicates them without bothering to share helpers.
>> E.g. mkdirat(2) does getname() and passes it to do_mkdirat(), which
>> consumes the sucker; so does mknodat(2), but do_mknodat() is static.  OTOH,
>> path_setxattr() does setxattr_copy(), then retry_estale loop with
>> user_path_at() + mnt_want_write() + do_setxattr() + mnt_drop_write() + path_put()
>> as a body, while on io_uring side we have retry_estale loop with filename_lookup() +
>> (io_uring helper that does mnt_want_write() + do_setxattr() + mnt_drop_write()) +
>> path_put().
>> 	Sure, that user_path_at() call is getname() + filename_lookup() + putname(),
>> so they are equivalent, but keeping that shite in sync is going to be trouble.
> 
> BTW, re mess around xattr:
> static int __io_getxattr_prep(struct io_kiocb *req,
>                               const struct io_uring_sqe *sqe)
> {
> ...
>         ix->ctx.cvalue = u64_to_user_ptr(READ_ONCE(sqe->addr2));
> 	ix->ctx.size = READ_ONCE(sqe->len);
> ...
>         ret = strncpy_from_user(ix->ctx.kname->name, name,
> 				sizeof(ix->ctx.kname->name));
> 
> }
> 
> int io_fgetxattr(struct io_kiocb *req, unsigned int issue_flags)
> {
> ...
>         ret = do_getxattr(file_mnt_idmap(req->file),
> 			req->file->f_path.dentry,
> 			&ix->ctx);
> ...
> }
> 
> ssize_t
> do_getxattr(struct mnt_idmap *idmap, struct dentry *d,
>         struct xattr_ctx *ctx)
> {
> ...
>         if (error > 0) {
> 		if (ctx->size && copy_to_user(ctx->value, ctx->kvalue, error))
> ...
> }
> 
> and we have
> struct xattr_ctx {
>         /* Value of attribute */
> 	union {
> 		const void __user *cvalue;
> 		void __user *value;
> 	};
> 	...
> }
> 
> Undefined behaviour aside, there's something odd going on here:
> why do we bother with copy-in in ->prep() when we do copy-out in
> ->issue() anyway?  ->issue() does run with initiator's ->mm in use,
> right?
> 
> IOW, what's the io_uring policy on what gets copied in ->prep() vs.
> in ->issue()?

The normal policy is that anything that is read-only should remain
stable after ->prep() has been called, so that ->issue() can use it.
That means the application can keep it on-stack as long as it's valid
until io_uring_submit() returns. For structs/buffers that are copied to
after IO, those the application obviously need to keep around until they
see a completion for that request. So yes, for the xattr cases where the
struct is copied to at completion time, those do not need to be stable
after ->prep(), could be handled purely on the ->issue() side.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-25  6:01               ` Jens Axboe
@ 2024-09-25 17:39                 ` Al Viro
  2024-09-25 17:58                   ` Jens Axboe
  2024-09-26  3:56                 ` Al Viro
  1 sibling, 1 reply; 23+ messages in thread
From: Al Viro @ 2024-09-25 17:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, io-uring

On Wed, Sep 25, 2024 at 12:01:01AM -0600, Jens Axboe wrote:

> The normal policy is that anything that is read-only should remain
> stable after ->prep() has been called, so that ->issue() can use it.
> That means the application can keep it on-stack as long as it's valid
> until io_uring_submit() returns. For structs/buffers that are copied to
> after IO, those the application obviously need to keep around until they
> see a completion for that request.  So yes, for the xattr cases where the
> struct is copied to at completion time, those do not need to be stable
> after ->prep(), could be handled purely on the ->issue() side.

Hmm...  Nothing in xattr is copied in both directions, actually.

AFAICS, the only copy-in you leave to ->issue() is the data for write
and sendmsg and ->msg_control for sendmsg.  Wait, there's that ioctl-like
mess you've got, so anything that feels like doing (seems to include
at least setsockopt)...  Oh, well...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-25 17:39                 ` Al Viro
@ 2024-09-25 17:58                   ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2024-09-25 17:58 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, io-uring

On 9/25/24 11:39 AM, Al Viro wrote:
> On Wed, Sep 25, 2024 at 12:01:01AM -0600, Jens Axboe wrote:
> 
>> The normal policy is that anything that is read-only should remain
>> stable after ->prep() has been called, so that ->issue() can use it.
>> That means the application can keep it on-stack as long as it's valid
>> until io_uring_submit() returns. For structs/buffers that are copied to
>> after IO, those the application obviously need to keep around until they
>> see a completion for that request.  So yes, for the xattr cases where the
>> struct is copied to at completion time, those do not need to be stable
>> after ->prep(), could be handled purely on the ->issue() side.
> 
> Hmm...  Nothing in xattr is copied in both directions, actually.

Even if it's just copied out at the end, it does not need to remain
stable after ->prep(). But most things will still do prep appropriate
things at prep time, just to keep it consistent across different op
types. Regardless, other things read from the sqe need to be stable
after prep anyway, so it's just the best place to do copy-in of structs
too, even if they are written at completion time and could defer reading
it in until issue. The ->issue() side is mostly just about that, issuing
a request and posting a completion - or deferring a retry if it's not
ready for issue just yet.

> AFAICS, the only copy-in you leave to ->issue() is the data for write
> and sendmsg and ->msg_control for sendmsg.  Wait, there's that ioctl-like
> mess you've got, so anything that feels like doing (seems to include
> at least setsockopt)...  Oh, well...

Right, things that obviously write to buffers will do that there, as
that's the context where the completion happens anyway. The setsockopt
parts should just copy out at issue time.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-25  6:01               ` Jens Axboe
  2024-09-25 17:39                 ` Al Viro
@ 2024-09-26  3:56                 ` Al Viro
  1 sibling, 0 replies; 23+ messages in thread
From: Al Viro @ 2024-09-26  3:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, io-uring

On Wed, Sep 25, 2024 at 12:01:01AM -0600, Jens Axboe wrote:

> The normal policy is that anything that is read-only should remain
> stable after ->prep() has been called, so that ->issue() can use it.
> That means the application can keep it on-stack as long as it's valid
> until io_uring_submit() returns. For structs/buffers that are copied to
> after IO, those the application obviously need to keep around until they
> see a completion for that request. So yes, for the xattr cases where the
> struct is copied to at completion time, those do not need to be stable
> after ->prep(), could be handled purely on the ->issue() side.

Looks like io_fsetxattr() was missing audit_file()... Anyway, in a local
branch I've added two helpers -

int file_setxattr(struct file *file, struct xattr_ctx *ctx);
int filename_setxattr(int dfd, struct filename *filename,
                      struct xattr_ctx *ctx, unsigned int lookup_flags);

and converted both fs/xattr.c and io_uring/xattr.c to those.

Completely untested delta follows; it's _not_ in the final form,
it misses getxattr side, etc.

BTW, I think fs/internal.h is a very wrong place for that, as well as
for do_mkdirat() et.al.  include/linux/marshalled_syscalls.h, perhaps?

diff --git a/fs/internal.h b/fs/internal.h
index 8cf42b327e5e..e39f80201ff8 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -285,8 +285,9 @@ ssize_t do_getxattr(struct mnt_idmap *idmap,
 		    struct xattr_ctx *ctx);
 
 int setxattr_copy(const char __user *name, struct xattr_ctx *ctx);
-int do_setxattr(struct mnt_idmap *idmap, struct dentry *dentry,
-		struct xattr_ctx *ctx);
+int file_setxattr(struct file *file, struct xattr_ctx *ctx);
+int filename_setxattr(int dfd, struct filename *filename,
+		      struct xattr_ctx *ctx, unsigned int lookup_flags);
 int may_write_xattr(struct mnt_idmap *idmap, struct inode *inode);
 
 #ifdef CONFIG_FS_POSIX_ACL
diff --git a/fs/xattr.c b/fs/xattr.c
index 0fc813cb005c..fc6409181c46 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -619,7 +619,7 @@ int setxattr_copy(const char __user *name, struct xattr_ctx *ctx)
 	return error;
 }
 
-int do_setxattr(struct mnt_idmap *idmap, struct dentry *dentry,
+static int do_setxattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		struct xattr_ctx *ctx)
 {
 	if (is_posix_acl_xattr(ctx->kname->name))
@@ -630,32 +630,31 @@ int do_setxattr(struct mnt_idmap *idmap, struct dentry *dentry,
 			ctx->kvalue, ctx->size, ctx->flags);
 }
 
-static int path_setxattr(const char __user *pathname,
-			 const char __user *name, const void __user *value,
-			 size_t size, int flags, unsigned int lookup_flags)
+int file_setxattr(struct file *f, struct xattr_ctx *ctx)
+{
+	int error = mnt_want_write_file(f);
+
+	if (!error) {
+		audit_file(f);
+		error = do_setxattr(file_mnt_idmap(f), f->f_path.dentry, ctx);
+		mnt_drop_write_file(f);
+	}
+	return error;
+}
+
+int filename_setxattr(int dfd, struct filename *filename,
+		      struct xattr_ctx *ctx, unsigned int lookup_flags)
 {
-	struct xattr_name kname;
-	struct xattr_ctx ctx = {
-		.cvalue   = value,
-		.kvalue   = NULL,
-		.size     = size,
-		.kname    = &kname,
-		.flags    = flags,
-	};
 	struct path path;
 	int error;
 
-	error = setxattr_copy(name, &ctx);
-	if (error)
-		return error;
-
 retry:
-	error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path);
+	error = filename_lookup(dfd, filename, lookup_flags, &path, NULL);
 	if (error)
 		goto out;
 	error = mnt_want_write(path.mnt);
 	if (!error) {
-		error = do_setxattr(mnt_idmap(path.mnt), path.dentry, &ctx);
+		error = do_setxattr(mnt_idmap(path.mnt), path.dentry, ctx);
 		mnt_drop_write(path.mnt);
 	}
 	path_put(&path);
@@ -665,6 +664,30 @@ static int path_setxattr(const char __user *pathname,
 	}
 
 out:
+	putname(filename);
+	return error;
+}
+
+static int path_setxattr(const char __user *pathname,
+			 const char __user *name, const void __user *value,
+			 size_t size, int flags, unsigned int lookup_flags)
+{
+	struct xattr_name kname;
+	struct xattr_ctx ctx = {
+		.cvalue   = value,
+		.kvalue   = NULL,
+		.size     = size,
+		.kname    = &kname,
+		.flags    = flags,
+	};
+	int error;
+
+	error = setxattr_copy(name, &ctx);
+	if (error)
+		return error;
+
+	error = filename_setxattr(AT_FDCWD, getname(pathname),
+				  &ctx, lookup_flags);
 	kvfree(ctx.kvalue);
 	return error;
 }
@@ -700,17 +723,11 @@ SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
 
 	if (fd_empty(f))
 		return -EBADF;
-	audit_file(fd_file(f));
 	error = setxattr_copy(name, &ctx);
 	if (error)
 		return error;
 
-	error = mnt_want_write_file(fd_file(f));
-	if (!error) {
-		error = do_setxattr(file_mnt_idmap(fd_file(f)),
-				    fd_file(f)->f_path.dentry, &ctx);
-		mnt_drop_write_file(fd_file(f));
-	}
+	error = file_setxattr(fd_file(f), &ctx);
 	kvfree(ctx.kvalue);
 	return error;
 }
diff --git a/io_uring/xattr.c b/io_uring/xattr.c
index 13e8d7d2cdc2..702d5981fd63 100644
--- a/io_uring/xattr.c
+++ b/io_uring/xattr.c
@@ -203,28 +203,14 @@ int io_fsetxattr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return __io_setxattr_prep(req, sqe);
 }
 
-static int __io_setxattr(struct io_kiocb *req, unsigned int issue_flags,
-			const struct path *path)
-{
-	struct io_xattr *ix = io_kiocb_to_cmd(req, struct io_xattr);
-	int ret;
-
-	ret = mnt_want_write(path->mnt);
-	if (!ret) {
-		ret = do_setxattr(mnt_idmap(path->mnt), path->dentry, &ix->ctx);
-		mnt_drop_write(path->mnt);
-	}
-
-	return ret;
-}
-
 int io_fsetxattr(struct io_kiocb *req, unsigned int issue_flags)
 {
+	struct io_xattr *ix = io_kiocb_to_cmd(req, struct io_xattr);
 	int ret;
 
 	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
 
-	ret = __io_setxattr(req, issue_flags, &req->file->f_path);
+	ret = file_setxattr(req->file, &ix->ctx);
 	io_xattr_finish(req, ret);
 	return IOU_OK;
 }
@@ -232,22 +218,11 @@ int io_fsetxattr(struct io_kiocb *req, unsigned int issue_flags)
 int io_setxattr(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_xattr *ix = io_kiocb_to_cmd(req, struct io_xattr);
-	unsigned int lookup_flags = LOOKUP_FOLLOW;
-	struct path path;
 	int ret;
 
 	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
 
-retry:
-	ret = filename_lookup(AT_FDCWD, ix->filename, lookup_flags, &path, NULL);
-	if (!ret) {
-		ret = __io_setxattr(req, issue_flags, &path);
-		path_put(&path);
-		if (retry_estale(ret, lookup_flags)) {
-			lookup_flags |= LOOKUP_REVAL;
-			goto retry;
-		}
-	}
+	ret = filename_setxattr(AT_FDCWD, ix->filename, &ix->ctx, LOOKUP_FOLLOW);
 
 	io_xattr_finish(req, ret);
 	return IOU_OK;

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-23  6:30   ` Jens Axboe
  2024-09-23 12:54     ` Paul Moore
@ 2024-09-23 15:07     ` Al Viro
  2024-09-24 11:15       ` Jens Axboe
  1 sibling, 1 reply; 23+ messages in thread
From: Al Viro @ 2024-09-23 15:07 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, audit, io-uring

On Mon, Sep 23, 2024 at 12:30:48AM -0600, Jens Axboe wrote:

> 1) Just don't reuse the entry. Then we can drop the struct
>    filename->aname completely as well. Yes that might incur an extra
>    alloc for the odd case of audit_enabled and being deep enough that
>    the preallocated names have been used, but doesn't anyone really
>    care? It'll be noise in the overhead anyway. Side note - that would
>    unalign struct filename again. Would be nice to drop audit_names from
>    a core fs struct...

You'll get different output in logs, though.  Whether that breaks userland
setups/invalidates certifications/etc.... fuck knows.

If anything, a loop through the list, searching for matching entry would
be safer in that respect.  Order of the items... might or might not be
an issue - see above.

> 2) Add a ref to struct audit_names, RCU kfree it when it drops to zero.
>    This would mean dropping struct audit_context->preallocated_names, as

Costly, that.

>    otherwise we'd run into trouble there if a context gets blown away
>    while someone else has a ref to that audit_names struct. We could do
>    this without a ref as well, as long as we can store an audit_context
>    pointer in struct audit_names and be able to validate it under RCU.
>    If ctx doesn't match, don't use it.

That's one of the variants I mentioned upthread...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] struct filename, io_uring and audit troubles
  2024-09-23 15:07     ` Al Viro
@ 2024-09-24 11:15       ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2024-09-24 11:15 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, audit, io-uring

On 9/23/24 9:07 AM, Al Viro wrote:
> On Mon, Sep 23, 2024 at 12:30:48AM -0600, Jens Axboe wrote:
> 
>> 1) Just don't reuse the entry. Then we can drop the struct
>>    filename->aname completely as well. Yes that might incur an extra
>>    alloc for the odd case of audit_enabled and being deep enough that
>>    the preallocated names have been used, but doesn't anyone really
>>    care? It'll be noise in the overhead anyway. Side note - that would
>>    unalign struct filename again. Would be nice to drop audit_names from
>>    a core fs struct...
> 
> You'll get different output in logs, though.  Whether that breaks userland
> setups/invalidates certifications/etc.... fuck knows.

No idea about that... But I'd say without strong evidence that this
breaks userland for something as odd as audit, well... And honestly
really a layering problem that struct filename has an audit link in
there.

> If anything, a loop through the list, searching for matching entry would
> be safer in that respect.  Order of the items... might or might not be
> an issue - see above.
> 
>> 2) Add a ref to struct audit_names, RCU kfree it when it drops to zero.
>>    This would mean dropping struct audit_context->preallocated_names, as
> 
> Costly, that.

For sure. And you could keep preallocated_names if you rcu free the
context too. But I strongly believe that approach #1 is, by far, the
cheaper alternative. If we can tolerate the ordering potentially
changing.

>>    otherwise we'd run into trouble there if a context gets blown away
>>    while someone else has a ref to that audit_names struct. We could do
>>    this without a ref as well, as long as we can store an audit_context
>>    pointer in struct audit_names and be able to validate it under RCU.
>>    If ctx doesn't match, don't use it.
> 
> That's one of the variants I mentioned upthread...

Sorry, still away on travels and conferences, so haven't been keeping up
on replies.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2024-09-26  3:56 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-22  0:49 [RFC] struct filename, io_uring and audit troubles Al Viro
2024-09-22  4:10 ` Al Viro
2024-09-22 15:09   ` Al Viro
2024-09-23  1:50 ` Al Viro
2024-09-23  6:30   ` Jens Axboe
2024-09-23 12:54     ` Paul Moore
2024-09-23 14:48       ` Al Viro
2024-09-23 16:14         ` Paul Moore
2024-09-23 18:17           ` Al Viro
2024-09-23 23:49             ` Paul Moore
2024-09-23 20:36           ` Al Viro
2024-09-24  0:11             ` Paul Moore
2024-09-24  7:01               ` Al Viro
2024-09-24 23:17                 ` Paul Moore
2024-09-25 20:44               ` Al Viro
2024-09-25 20:58                 ` Paul Moore
2024-09-24 21:40             ` Al Viro
2024-09-25  6:01               ` Jens Axboe
2024-09-25 17:39                 ` Al Viro
2024-09-25 17:58                   ` Jens Axboe
2024-09-26  3:56                 ` Al Viro
2024-09-23 15:07     ` Al Viro
2024-09-24 11:15       ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox