From: Jens Axboe <[email protected]>
To: Olivier Langlois <[email protected]>,
Pavel Begunkov <[email protected]>,
[email protected]
Subject: Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
Date: Mon, 12 Aug 2024 14:40:47 -0600 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 8/12/24 2:15 PM, Olivier Langlois wrote:
> On Mon, 2024-08-12 at 12:11 -0600, Jens Axboe wrote:
>> On 8/12/24 12:10 PM, Jens Axboe wrote:
>>> On 8/11/24 7:00 PM, Olivier Langlois wrote:
>>>> On Sun, 2024-08-11 at 20:34 -0400, Olivier Langlois wrote:
>>>>> io_napi_entry() has 2 calling sites. One of them is unlikely to
>>>>> find
>>>>> an
>>>>> entry and if it does, the timeout should arguable not be
>>>>> updated.
>>>>>
>>>>> The other io_napi_entry() calling site is overwriting the
>>>>> update made
>>>>> by io_napi_entry() so the io_napi_entry() timeout value update
>>>>> has no
>>>>> or
>>>>> little value and therefore is removed.
>>>>>
>>>>> Signed-off-by: Olivier Langlois <[email protected]>
>>>>> ---
>>>>> io_uring/napi.c | 1 -
>>>>> 1 file changed, 1 deletion(-)
>>>>>
>>>>> diff --git a/io_uring/napi.c b/io_uring/napi.c
>>>>> index 73c4159e8405..1de1d4d62925 100644
>>>>> --- a/io_uring/napi.c
>>>>> +++ b/io_uring/napi.c
>>>>> @@ -26,7 +26,6 @@ static struct io_napi_entry
>>>>> *io_napi_hash_find(struct hlist_head *hash_list,
>>>>> hlist_for_each_entry_rcu(e, hash_list, node) {
>>>>> if (e->napi_id != napi_id)
>>>>> continue;
>>>>> - e->timeout = jiffies + NAPI_TIMEOUT;
>>>>> return e;
>>>>> }
>>>>>
>>>> I am commenting my own patch because I found something curious
>>>> that I
>>>> was not sure about when I was reviewing the code.
>>>>
>>>> Should the remaining e->timeout assignation be wrapped with a
>>>> WRITE_ONCE() macro to ensure an atomic store?
>>>
>>> I think that makes sense to do as lookup can be within rcu, and
>>> hence we have nothing serializing it. Not for torn writes, but to
>>> ensure that the memory sanitizer doesn't complain. I can just make
>>> this change while applying, or send a v2.
>>
>> As a separate patch I mean, not a v2. That part can wait until 6.12.
>>
> ok. np. I'll look into it soon.
>
> In the meantime, I have detected few suspicious things in the napi
> code.
>
> I am reporting them here to have few extra eye balls looking at them to
> be sure that everything is fine or not.
>
> 1. in __io_napi_remove_stale(),
>
> is it ok to use hash_for_each() instead of hash_for_each_safe()?
>
> it might be ok because it is a hash_del_rcu() and not a simple
> hash_del() but I have little experience with possible RCU shortcuts so
> I am unsure on this one...
Should use hash_for_each_rcu(), and I think for good measure, we sould
just keep it inside the RCU region. Then we know for a fact that the
deletion doesn't run.
> 2. in io_napi_free()
>
> list_del(&e->list); is not called. Can the only reason be that
> io_napi_free() is called as part of the ring destruction so it is an
> optimization to not clear the list since it is not expected to be
> reused?
>
> would calling INIT_LIST_HEAD() before exiting as an extra precaution to
> make the function is future proof in case it is reused in another
> context than the ring destruction be a good idea?
I think that's just an oversight, and doesn't matter since it's all
going away anyway. But it would be prudent to delete it regardless!
> 3. I am surprised to notice that in __io_napi_do_busy_loop(),
> list_for_each_entry_rcu() is called to traverse the list but the
> regular methods list_del() and list_add_tail() are called to update the
> list instead of their RCU variant.
Should all just use rcu variants.
Here's a mashup of the changes. Would be great if you can test - I'll do
some too, but always good with more than one person testing as it tends
to hit more cases.
diff --git a/io_uring/napi.c b/io_uring/napi.c
index d0cf694d0172..6251111a7e1f 100644
--- a/io_uring/napi.c
+++ b/io_uring/napi.c
@@ -81,7 +81,7 @@ void __io_napi_add(struct io_ring_ctx *ctx, struct socket *sock)
}
hlist_add_tail_rcu(&e->node, hash_list);
- list_add_tail(&e->list, &ctx->napi_list);
+ list_add_tail_rcu(&e->list, &ctx->napi_list);
spin_unlock(&ctx->napi_lock);
}
@@ -91,9 +91,9 @@ static void __io_napi_remove_stale(struct io_ring_ctx *ctx)
unsigned int i;
spin_lock(&ctx->napi_lock);
- hash_for_each(ctx->napi_ht, i, e, node) {
+ hash_for_each_rcu(ctx->napi_ht, i, e, node) {
if (time_after(jiffies, e->timeout)) {
- list_del(&e->list);
+ list_del_rcu(&e->list);
hash_del_rcu(&e->node);
kfree_rcu(e, rcu);
}
@@ -174,9 +174,8 @@ static void io_napi_blocking_busy_loop(struct io_ring_ctx *ctx,
do {
is_stale = __io_napi_do_busy_loop(ctx, loop_end_arg);
} while (!io_napi_busy_loop_should_end(iowq, start_time) && !loop_end_arg);
- rcu_read_unlock();
-
io_napi_remove_stale(ctx, is_stale);
+ rcu_read_unlock();
}
/*
@@ -209,6 +208,7 @@ void io_napi_free(struct io_ring_ctx *ctx)
spin_lock(&ctx->napi_lock);
hash_for_each(ctx->napi_ht, i, e, node) {
hash_del_rcu(&e->node);
+ list_del_rcu(&e->list);
kfree_rcu(e, rcu);
}
spin_unlock(&ctx->napi_lock);
@@ -309,9 +309,8 @@ int io_napi_sqpoll_busy_poll(struct io_ring_ctx *ctx)
rcu_read_lock();
is_stale = __io_napi_do_busy_loop(ctx, NULL);
- rcu_read_unlock();
-
io_napi_remove_stale(ctx, is_stale);
+ rcu_read_unlock();
return 1;
}
--
Jens Axboe
next prev parent reply other threads:[~2024-08-12 20:40 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-12 0:34 [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation Olivier Langlois
2024-08-12 1:00 ` Olivier Langlois
2024-08-12 18:10 ` Jens Axboe
2024-08-12 18:11 ` Jens Axboe
2024-08-12 20:15 ` Olivier Langlois
2024-08-12 20:40 ` Jens Axboe [this message]
2024-08-12 21:39 ` Olivier Langlois
2024-08-12 21:45 ` Olivier Langlois
2024-08-12 21:50 ` Jens Axboe
2024-08-13 17:22 ` Olivier Langlois
2024-08-13 18:35 ` Jens Axboe
2024-08-14 0:09 ` Olivier Langlois
2024-08-14 0:31 ` Jens Axboe
2024-08-14 0:44 ` Pavel Begunkov
2024-08-12 18:11 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox