public inbox for [email protected]
 help / color / mirror / Atom feed
From: Jens Axboe <[email protected]>
To: Olivier Langlois <[email protected]>,
	Pavel Begunkov <[email protected]>,
	[email protected]
Subject: Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
Date: Mon, 12 Aug 2024 14:40:47 -0600	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 8/12/24 2:15 PM, Olivier Langlois wrote:
> On Mon, 2024-08-12 at 12:11 -0600, Jens Axboe wrote:
>> On 8/12/24 12:10 PM, Jens Axboe wrote:
>>> On 8/11/24 7:00 PM, Olivier Langlois wrote:
>>>> On Sun, 2024-08-11 at 20:34 -0400, Olivier Langlois wrote:
>>>>> io_napi_entry() has 2 calling sites. One of them is unlikely to
>>>>> find
>>>>> an
>>>>> entry and if it does, the timeout should arguable not be
>>>>> updated.
>>>>>
>>>>> The other io_napi_entry() calling site is overwriting the
>>>>> update made
>>>>> by io_napi_entry() so the io_napi_entry() timeout value update
>>>>> has no
>>>>> or
>>>>> little value and therefore is removed.
>>>>>
>>>>> Signed-off-by: Olivier Langlois <[email protected]>
>>>>> ---
>>>>>  io_uring/napi.c | 1 -
>>>>>  1 file changed, 1 deletion(-)
>>>>>
>>>>> diff --git a/io_uring/napi.c b/io_uring/napi.c
>>>>> index 73c4159e8405..1de1d4d62925 100644
>>>>> --- a/io_uring/napi.c
>>>>> +++ b/io_uring/napi.c
>>>>> @@ -26,7 +26,6 @@ static struct io_napi_entry
>>>>> *io_napi_hash_find(struct hlist_head *hash_list,
>>>>>  	hlist_for_each_entry_rcu(e, hash_list, node) {
>>>>>  		if (e->napi_id != napi_id)
>>>>>  			continue;
>>>>> -		e->timeout = jiffies + NAPI_TIMEOUT;
>>>>>  		return e;
>>>>>  	}
>>>>>  
>>>> I am commenting my own patch because I found something curious
>>>> that I
>>>> was not sure about when I was reviewing the code.
>>>>
>>>> Should the remaining e->timeout assignation be wrapped with a
>>>> WRITE_ONCE() macro to ensure an atomic store?
>>>
>>> I think that makes sense to do as lookup can be within rcu, and
>>> hence we have nothing serializing it. Not for torn writes, but to
>>> ensure that the memory sanitizer doesn't complain. I can just make
>>> this change while applying, or send a v2.
>>
>> As a separate patch I mean, not a v2. That part can wait until 6.12.
>>
> ok. np. I'll look into it soon.
> 
> In the meantime, I have detected few suspicious things in the napi
> code.
> 
> I am reporting them here to have few extra eye balls looking at them to
> be sure that everything is fine or not.
> 
> 1. in __io_napi_remove_stale(),
> 
> is it ok to use hash_for_each() instead of hash_for_each_safe()?
> 
> it might be ok because it is a hash_del_rcu() and not a simple
> hash_del() but I have little experience with possible RCU shortcuts so
> I am unsure on this one...

Should use hash_for_each_rcu(), and I think for good measure, we sould
just keep it inside the RCU region. Then we know for a fact that the
deletion doesn't run.

> 2. in io_napi_free()
> 
> list_del(&e->list); is not called. Can the only reason be that
> io_napi_free() is called as part of the ring destruction so it is an
> optimization to not clear the list since it is not expected to be
> reused?
> 
> would calling INIT_LIST_HEAD() before exiting as an extra precaution to
> make the function is future proof in case it is reused in another
> context than the ring destruction be a good idea?

I think that's just an oversight, and doesn't matter since it's all
going away anyway. But it would be prudent to delete it regardless!

> 3. I am surprised to notice that in __io_napi_do_busy_loop(),
> list_for_each_entry_rcu() is called to traverse the list but the
> regular methods list_del() and list_add_tail() are called to update the
> list instead of their RCU variant.

Should all just use rcu variants.

Here's a mashup of the changes. Would be great if you can test - I'll do
some too, but always good with more than one person testing as it tends
to hit more cases.


diff --git a/io_uring/napi.c b/io_uring/napi.c
index d0cf694d0172..6251111a7e1f 100644
--- a/io_uring/napi.c
+++ b/io_uring/napi.c
@@ -81,7 +81,7 @@ void __io_napi_add(struct io_ring_ctx *ctx, struct socket *sock)
 	}
 
 	hlist_add_tail_rcu(&e->node, hash_list);
-	list_add_tail(&e->list, &ctx->napi_list);
+	list_add_tail_rcu(&e->list, &ctx->napi_list);
 	spin_unlock(&ctx->napi_lock);
 }
 
@@ -91,9 +91,9 @@ static void __io_napi_remove_stale(struct io_ring_ctx *ctx)
 	unsigned int i;
 
 	spin_lock(&ctx->napi_lock);
-	hash_for_each(ctx->napi_ht, i, e, node) {
+	hash_for_each_rcu(ctx->napi_ht, i, e, node) {
 		if (time_after(jiffies, e->timeout)) {
-			list_del(&e->list);
+			list_del_rcu(&e->list);
 			hash_del_rcu(&e->node);
 			kfree_rcu(e, rcu);
 		}
@@ -174,9 +174,8 @@ static void io_napi_blocking_busy_loop(struct io_ring_ctx *ctx,
 	do {
 		is_stale = __io_napi_do_busy_loop(ctx, loop_end_arg);
 	} while (!io_napi_busy_loop_should_end(iowq, start_time) && !loop_end_arg);
-	rcu_read_unlock();
-
 	io_napi_remove_stale(ctx, is_stale);
+	rcu_read_unlock();
 }
 
 /*
@@ -209,6 +208,7 @@ void io_napi_free(struct io_ring_ctx *ctx)
 	spin_lock(&ctx->napi_lock);
 	hash_for_each(ctx->napi_ht, i, e, node) {
 		hash_del_rcu(&e->node);
+		list_del_rcu(&e->list);
 		kfree_rcu(e, rcu);
 	}
 	spin_unlock(&ctx->napi_lock);
@@ -309,9 +309,8 @@ int io_napi_sqpoll_busy_poll(struct io_ring_ctx *ctx)
 
 	rcu_read_lock();
 	is_stale = __io_napi_do_busy_loop(ctx, NULL);
-	rcu_read_unlock();
-
 	io_napi_remove_stale(ctx, is_stale);
+	rcu_read_unlock();
 	return 1;
 }
 

-- 
Jens Axboe


  reply	other threads:[~2024-08-12 20:40 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-12  0:34 [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation Olivier Langlois
2024-08-12  1:00 ` Olivier Langlois
2024-08-12 18:10   ` Jens Axboe
2024-08-12 18:11     ` Jens Axboe
2024-08-12 20:15       ` Olivier Langlois
2024-08-12 20:40         ` Jens Axboe [this message]
2024-08-12 21:39           ` Olivier Langlois
2024-08-12 21:45           ` Olivier Langlois
2024-08-12 21:50             ` Jens Axboe
2024-08-13 17:22           ` Olivier Langlois
2024-08-13 18:35             ` Jens Axboe
2024-08-14  0:09               ` Olivier Langlois
2024-08-14  0:31                 ` Jens Axboe
2024-08-14  0:44                 ` Pavel Begunkov
2024-08-12 18:11 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox