From: Jens Axboe <[email protected]>
To: Christian Brauner <[email protected]>,
Dan Clash <[email protected]>
Cc: [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected]
Subject: Re: [PATCH] audit,io_uring: io_uring openat triggers audit reference count underflow
Date: Fri, 13 Oct 2023 08:21:32 -0600
Message-ID: <[email protected]>
In-Reply-To: <20231013-insofern-gegolten-75ca48b24cf5@brauner>
On 10/13/23 2:24 AM, Christian Brauner wrote:
> On Thu, Oct 12, 2023 at 02:55:18PM -0700, Dan Clash wrote:
>> An io_uring openat operation can update an audit reference count
>> from multiple threads resulting in the call trace below.
>>
>> A call to io_uring_submit() with a single openat op with a flag of
>> IOSQE_ASYNC results in the following reference count updates.
>>
>> The first part of the system call performs two increments that do not race.
>>
>> do_syscall_64()
>> __do_sys_io_uring_enter()
>> io_submit_sqes()
>> io_openat_prep()
>> __io_openat_prep()
>> getname()
>> getname_flags() /* update 1 (increment) */
>> __audit_getname() /* update 2 (increment) */
>>
>> The openat op is queued to an io_uring worker thread which starts the
>> opportunity for a race. The system call exit performs one decrement.
>>
>> do_syscall_64()
>> syscall_exit_to_user_mode()
>> syscall_exit_to_user_mode_prepare()
>> __audit_syscall_exit()
>> audit_reset_context()
>> putname() /* update 3 (decrement) */
>>
>> The io_uring worker thread performs one increment and two decrements.
>> These updates can race with the system call decrement.
>>
>> io_wqe_worker()
>> io_worker_handle_work()
>> io_wq_submit_work()
>> io_issue_sqe()
>> io_openat()
>> io_openat2()
>> do_filp_open()
>> path_openat()
>> __audit_inode() /* update 4 (increment) */
>> putname() /* update 5 (decrement) */
>> __audit_uring_exit()
>> audit_reset_context()
>> putname() /* update 6 (decrement) */
>>
>> The fix is to change the refcnt member of struct audit_names
>> from int to atomic_t.
>>
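To make the arithmetic in the description above concrete, here is a
minimal userspace sketch, assuming C11 <stdatomic.h> in place of the
kernel's atomic_t. The names struct name_ref, name_get(), name_put()
and replay() are hypothetical stand-ins, not kernel code. It replays
updates 1-6 with the syscall-exit decrement (update 3) placed at each
possible point relative to the worker's three updates. This is a
single-threaded replay, so it only shows that the count balances in
every ordering once each update is indivisible; it does not reproduce
the race itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct filename; only the counter matters. */
struct name_ref {
	atomic_int refcnt;
	int frees;	/* how many times the last-reference path ran */
};

static void name_get(struct name_ref *n)
{
	atomic_fetch_add(&n->refcnt, 1);
}

static void name_put(struct name_ref *n)
{
	/* atomic_fetch_sub() returns the old value; 1 means last ref dropped. */
	if (atomic_fetch_sub(&n->refcnt, 1) == 1)
		n->frees++;
}

/*
 * Replay updates 1-6 with update 3 (the syscall-exit decrement) placed
 * just before worker step `slot` (0..3, where 3 means after the worker
 * has finished). Returns true if the name is released exactly once and
 * the count ends at zero.
 */
static bool replay(int slot)
{
	struct name_ref n = { .frees = 0 };

	atomic_init(&n.refcnt, 1);	/* update 1: getname_flags() */
	name_get(&n);			/* update 2: __audit_getname() */

	for (int step = 0; step <= 3; step++) {
		if (step == slot)
			name_put(&n);	/* update 3: syscall exit */
		if (step == 0)
			name_get(&n);	/* update 4: __audit_inode() */
		else if (step == 1 || step == 2)
			name_put(&n);	/* updates 5 and 6: worker putname() */
	}
	return n.frees == 1 && atomic_load(&n.refcnt) == 0;
}
```

Calling replay() with each slot from 0 to 3 covers every position the
syscall-exit decrement can take relative to the worker's updates.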
>> kernel BUG at fs/namei.c:262!
>> Call Trace:
>> ...
>> ? putname+0x68/0x70
>> audit_reset_context.part.0.constprop.0+0xe1/0x300
>> __audit_uring_exit+0xda/0x1c0
>> io_issue_sqe+0x1f3/0x450
>> ? lock_timer_base+0x3b/0xd0
>> io_wq_submit_work+0x8d/0x2b0
>> ? __try_to_del_timer_sync+0x67/0xa0
>> io_worker_handle_work+0x17c/0x2b0
>> io_wqe_worker+0x10a/0x350
>>
>> Cc: <[email protected]>
>> Link: https://lore.kernel.org/lkml/MW2PR2101MB1033FFF044A258F84AEAA584F1C9A@MW2PR2101MB1033.namprd21.prod.outlook.com/
>> Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
>> Signed-off-by: Dan Clash <[email protected]>
>> ---
>> fs/namei.c | 9 +++++----
>> include/linux/fs.h | 2 +-
>> kernel/auditsc.c | 8 ++++----
>> 3 files changed, 10 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 567ee547492b..94565bd7e73f 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -188,7 +188,7 @@ getname_flags(const char __user *filename, int flags, int *empty)
>> }
>> }
>>
>> - result->refcnt = 1;
>> + atomic_set(&result->refcnt, 1);
>> /* The empty path is special. */
>> if (unlikely(!len)) {
>> if (empty)
>> @@ -249,7 +249,7 @@ getname_kernel(const char * filename)
>> memcpy((char *)result->name, filename, len);
>> result->uptr = NULL;
>> result->aname = NULL;
>> - result->refcnt = 1;
>> + atomic_set(&result->refcnt, 1);
>> audit_getname(result);
>>
>> return result;
>> @@ -261,9 +261,10 @@ void putname(struct filename *name)
>> if (IS_ERR(name))
>> return;
>>
>> - BUG_ON(name->refcnt <= 0);
>> + if (WARN_ON_ONCE(!atomic_read(&name->refcnt)))
>> + return;
>>
>> - if (--name->refcnt > 0)
>> + if (!atomic_dec_and_test(&name->refcnt))
>> return;
>
> Fine by me. I'd write this as:
>
> count = atomic_dec_if_positive(&name->refcnt);
> if (WARN_ON_ONCE(unlikely(count < 0)))
> return;
> if (count > 0)
> return;

That would be fine too. My suspicion was that most archs don't
implement a primitive for atomic_dec_if_positive(), and hence it might
be more expensive than atomic_read()/atomic_dec_and_test(), which most
archs do implement. But I haven't looked at the code generation. The
dec_if_positive degenerates to an atomic cmpxchg for most cases.
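
For illustration, the two shapes being compared can be sketched in
userspace C11. This is a rough sketch under stated assumptions, not the
kernel's implementation: dec_if_positive() and read_then_dec_and_test()
are hypothetical names, and atomic_int stands in for atomic_t. The
first function shows the compare-exchange loop that a dec-if-positive
turns into when there is no native primitive; the second shows the
read followed by decrement-and-test form used in the patch.

```c
#include <stdatomic.h>

/*
 * Decrement *v only if the result stays non-negative. Returns the old
 * value minus one either way, so a negative return means the counter
 * was already zero (or below) and was left untouched.
 */
static int dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);
	int dec;

	do {
		dec = old - 1;
		if (dec < 0)
			break;	/* not positive: leave the counter alone */
		/* On failure, `old` is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(v, &old, dec));

	return dec;
}

/*
 * Read-then-decrement form. Returns -1 if the counter was already
 * zero, 1 if this call dropped the last reference, 0 otherwise.
 * The read and the decrement are two separate atomic operations.
 */
static int read_then_dec_and_test(atomic_int *v)
{
	if (atomic_load(v) == 0)
		return -1;
	return atomic_fetch_sub(v, 1) == 1;
}
```

The loop in the first function may retry under contention, while the
second form is a plain load followed by a single read-modify-write.
The trade-off is that the second form's zero check is a separate read,
so it acts as a sanity check rather than a guarantee against underflow.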
--
Jens Axboe
Thread overview: 14+ messages
2023-10-12 21:55 [PATCH] audit,io_uring: io_uring openat triggers audit reference count underflow Dan Clash
2023-10-12 22:07 ` Jens Axboe
2023-10-13 8:24 ` Christian Brauner
2023-10-13 14:21 ` Jens Axboe [this message]
2023-10-13 15:43 ` Paul Moore
2023-10-13 20:06 ` Dan Clash
2023-10-13 15:44 ` Christian Brauner
2023-10-13 15:53 ` Jens Axboe
2023-10-13 16:03 ` Christian Brauner
2023-10-13 15:56 ` Paul Moore
2023-10-13 16:00 ` Jens Axboe
2023-10-13 16:05 ` Paul Moore
2023-10-13 16:22 ` Christian Brauner
2023-10-13 16:38 ` Paul Moore