public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
@ 2024-08-12  0:34 Olivier Langlois
  2024-08-12  1:00 ` Olivier Langlois
  2024-08-12 18:11 ` Jens Axboe
  0 siblings, 2 replies; 15+ messages in thread
From: Olivier Langlois @ 2024-08-12  0:34 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring

io_napi_entry() has 2 calling sites. One of them is unlikely to find an
entry and if it does, the timeout should arguable not be updated.

The other io_napi_entry() calling site is overwriting the update made
by io_napi_entry() so the io_napi_entry() timeout value update has no or
little value and therefore is removed.

Signed-off-by: Olivier Langlois <[email protected]>
---
 io_uring/napi.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/io_uring/napi.c b/io_uring/napi.c
index 73c4159e8405..1de1d4d62925 100644
--- a/io_uring/napi.c
+++ b/io_uring/napi.c
@@ -26,7 +26,6 @@ static struct io_napi_entry *io_napi_hash_find(struct hlist_head *hash_list,
 	hlist_for_each_entry_rcu(e, hash_list, node) {
 		if (e->napi_id != napi_id)
 			continue;
-		e->timeout = jiffies + NAPI_TIMEOUT;
 		return e;
 	}
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-12  0:34 [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation Olivier Langlois
@ 2024-08-12  1:00 ` Olivier Langlois
  2024-08-12 18:10   ` Jens Axboe
  2024-08-12 18:11 ` Jens Axboe
  1 sibling, 1 reply; 15+ messages in thread
From: Olivier Langlois @ 2024-08-12  1:00 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring

On Sun, 2024-08-11 at 20:34 -0400, Olivier Langlois wrote:
> io_napi_entry() has 2 calling sites. One of them is unlikely to find
> an
> entry and if it does, the timeout should arguable not be updated.
> 
> The other io_napi_entry() calling site is overwriting the update made
> by io_napi_entry() so the io_napi_entry() timeout value update has no
> or
> little value and therefore is removed.
> 
> Signed-off-by: Olivier Langlois <[email protected]>
> ---
>  io_uring/napi.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/io_uring/napi.c b/io_uring/napi.c
> index 73c4159e8405..1de1d4d62925 100644
> --- a/io_uring/napi.c
> +++ b/io_uring/napi.c
> @@ -26,7 +26,6 @@ static struct io_napi_entry
> *io_napi_hash_find(struct hlist_head *hash_list,
>  	hlist_for_each_entry_rcu(e, hash_list, node) {
>  		if (e->napi_id != napi_id)
>  			continue;
> -		e->timeout = jiffies + NAPI_TIMEOUT;
>  		return e;
>  	}
>  
I am commenting my own patch because I found something curious that I
was not sure about when I was reviewing the code.

Should the remaining e->timeout assignation be wrapped with a
WRITE_ONCE() macro to ensure an atomic store?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-12  1:00 ` Olivier Langlois
@ 2024-08-12 18:10   ` Jens Axboe
  2024-08-12 18:11     ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2024-08-12 18:10 UTC (permalink / raw)
  To: Olivier Langlois, Pavel Begunkov, io-uring

On 8/11/24 7:00 PM, Olivier Langlois wrote:
> On Sun, 2024-08-11 at 20:34 -0400, Olivier Langlois wrote:
>> io_napi_entry() has 2 calling sites. One of them is unlikely to find
>> an
>> entry and if it does, the timeout should arguable not be updated.
>>
>> The other io_napi_entry() calling site is overwriting the update made
>> by io_napi_entry() so the io_napi_entry() timeout value update has no
>> or
>> little value and therefore is removed.
>>
>> Signed-off-by: Olivier Langlois <[email protected]>
>> ---
>>  io_uring/napi.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/io_uring/napi.c b/io_uring/napi.c
>> index 73c4159e8405..1de1d4d62925 100644
>> --- a/io_uring/napi.c
>> +++ b/io_uring/napi.c
>> @@ -26,7 +26,6 @@ static struct io_napi_entry
>> *io_napi_hash_find(struct hlist_head *hash_list,
>>  	hlist_for_each_entry_rcu(e, hash_list, node) {
>>  		if (e->napi_id != napi_id)
>>  			continue;
>> -		e->timeout = jiffies + NAPI_TIMEOUT;
>>  		return e;
>>  	}
>>  
> I am commenting my own patch because I found something curious that I
> was not sure about when I was reviewing the code.
> 
> Should the remaining e->timeout assignation be wrapped with a
> WRITE_ONCE() macro to ensure an atomic store?

I think that makes sense to do as lookup can be within rcu, and
hence we have nothing serializing it. Not for torn writes, but to
ensure that the memory sanitizer doesn't complain. I can just make
this change while applying, or send a v2.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-12 18:10   ` Jens Axboe
@ 2024-08-12 18:11     ` Jens Axboe
  2024-08-12 20:15       ` Olivier Langlois
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2024-08-12 18:11 UTC (permalink / raw)
  To: Olivier Langlois, Pavel Begunkov, io-uring

On 8/12/24 12:10 PM, Jens Axboe wrote:
> On 8/11/24 7:00 PM, Olivier Langlois wrote:
>> On Sun, 2024-08-11 at 20:34 -0400, Olivier Langlois wrote:
>>> io_napi_entry() has 2 calling sites. One of them is unlikely to find
>>> an
>>> entry and if it does, the timeout should arguable not be updated.
>>>
>>> The other io_napi_entry() calling site is overwriting the update made
>>> by io_napi_entry() so the io_napi_entry() timeout value update has no
>>> or
>>> little value and therefore is removed.
>>>
>>> Signed-off-by: Olivier Langlois <[email protected]>
>>> ---
>>>  io_uring/napi.c | 1 -
>>>  1 file changed, 1 deletion(-)
>>>
>>> diff --git a/io_uring/napi.c b/io_uring/napi.c
>>> index 73c4159e8405..1de1d4d62925 100644
>>> --- a/io_uring/napi.c
>>> +++ b/io_uring/napi.c
>>> @@ -26,7 +26,6 @@ static struct io_napi_entry
>>> *io_napi_hash_find(struct hlist_head *hash_list,
>>>  	hlist_for_each_entry_rcu(e, hash_list, node) {
>>>  		if (e->napi_id != napi_id)
>>>  			continue;
>>> -		e->timeout = jiffies + NAPI_TIMEOUT;
>>>  		return e;
>>>  	}
>>>  
>> I am commenting my own patch because I found something curious that I
>> was not sure about when I was reviewing the code.
>>
>> Should the remaining e->timeout assignation be wrapped with a
>> WRITE_ONCE() macro to ensure an atomic store?
> 
> I think that makes sense to do as lookup can be within rcu, and
> hence we have nothing serializing it. Not for torn writes, but to
> ensure that the memory sanitizer doesn't complain. I can just make
> this change while applying, or send a v2.

As a separate patch I mean, not a v2. That part can wait until 6.12.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-12  0:34 [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation Olivier Langlois
  2024-08-12  1:00 ` Olivier Langlois
@ 2024-08-12 18:11 ` Jens Axboe
  1 sibling, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2024-08-12 18:11 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, Olivier Langlois


On Sun, 11 Aug 2024 20:34:46 -0400, Olivier Langlois wrote:
> io_napi_entry() has 2 calling sites. One of them is unlikely to find an
> entry and if it does, the timeout should arguable not be updated.
> 
> The other io_napi_entry() calling site is overwriting the update made
> by io_napi_entry() so the io_napi_entry() timeout value update has no or
> little value and therefore is removed.
> 
> [...]

Applied, thanks!

[1/1] io_uring/napi: remove duplicate io_napi_entry timeout assignation
      commit: 48cc7ecd3a68e0fbfa281ef1ed6f6b6cb7638390

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-12 18:11     ` Jens Axboe
@ 2024-08-12 20:15       ` Olivier Langlois
  2024-08-12 20:40         ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Olivier Langlois @ 2024-08-12 20:15 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring

On Mon, 2024-08-12 at 12:11 -0600, Jens Axboe wrote:
> On 8/12/24 12:10 PM, Jens Axboe wrote:
> > On 8/11/24 7:00 PM, Olivier Langlois wrote:
> > > On Sun, 2024-08-11 at 20:34 -0400, Olivier Langlois wrote:
> > > > io_napi_entry() has 2 calling sites. One of them is unlikely to
> > > > find
> > > > an
> > > > entry and if it does, the timeout should arguable not be
> > > > updated.
> > > > 
> > > > The other io_napi_entry() calling site is overwriting the
> > > > update made
> > > > by io_napi_entry() so the io_napi_entry() timeout value update
> > > > has no
> > > > or
> > > > little value and therefore is removed.
> > > > 
> > > > Signed-off-by: Olivier Langlois <[email protected]>
> > > > ---
> > > >  io_uring/napi.c | 1 -
> > > >  1 file changed, 1 deletion(-)
> > > > 
> > > > diff --git a/io_uring/napi.c b/io_uring/napi.c
> > > > index 73c4159e8405..1de1d4d62925 100644
> > > > --- a/io_uring/napi.c
> > > > +++ b/io_uring/napi.c
> > > > @@ -26,7 +26,6 @@ static struct io_napi_entry
> > > > *io_napi_hash_find(struct hlist_head *hash_list,
> > > >  	hlist_for_each_entry_rcu(e, hash_list, node) {
> > > >  		if (e->napi_id != napi_id)
> > > >  			continue;
> > > > -		e->timeout = jiffies + NAPI_TIMEOUT;
> > > >  		return e;
> > > >  	}
> > > >  
> > > I am commenting my own patch because I found something curious
> > > that I
> > > was not sure about when I was reviewing the code.
> > > 
> > > Should the remaining e->timeout assignation be wrapped with a
> > > WRITE_ONCE() macro to ensure an atomic store?
> > 
> > I think that makes sense to do as lookup can be within rcu, and
> > hence we have nothing serializing it. Not for torn writes, but to
> > ensure that the memory sanitizer doesn't complain. I can just make
> > this change while applying, or send a v2.
> 
> As a separate patch I mean, not a v2. That part can wait until 6.12.
> 
ok. np. I'll look into it soon.

In the meantime, I have detected few suspicious things in the napi
code.

I am reporting them here to have few extra eye balls looking at them to
be sure that everything is fine or not.

1. in __io_napi_remove_stale(),

is it ok to use hash_for_each() instead of hash_for_each_safe()?

it might be ok because it is a hash_del_rcu() and not a simple
hash_del() but I have little experience with possible RCU shortcuts so
I am unsure on this one...

2. in io_napi_free()

list_del(&e->list); is not called. Can the only reason be that
io_napi_free() is called as part of the ring destruction so it is an
optimization to not clear the list since it is not expected to be
reused?

would calling INIT_LIST_HEAD() before exiting as an extra precaution to
make the function is future proof in case it is reused in another
context than the ring destruction be a good idea?

3. I am surprised to notice that in __io_napi_do_busy_loop(),
list_for_each_entry_rcu() is called to traverse the list but the
regular methods list_del() and list_add_tail() are called to update the
list instead of their RCU variant.

Is this ok?

if it is, the only plausible explanation that can think of is that it
is paired with another RCU hash table update... I guess maybe this is a
common RCU idiom that I am unaware of and this is why this is even not
derserving a small comment to address this point, I would think that
this might deserve one. This setup leaves me perplexed...


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-12 20:15       ` Olivier Langlois
@ 2024-08-12 20:40         ` Jens Axboe
  2024-08-12 21:39           ` Olivier Langlois
                             ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Jens Axboe @ 2024-08-12 20:40 UTC (permalink / raw)
  To: Olivier Langlois, Pavel Begunkov, io-uring

On 8/12/24 2:15 PM, Olivier Langlois wrote:
> On Mon, 2024-08-12 at 12:11 -0600, Jens Axboe wrote:
>> On 8/12/24 12:10 PM, Jens Axboe wrote:
>>> On 8/11/24 7:00 PM, Olivier Langlois wrote:
>>>> On Sun, 2024-08-11 at 20:34 -0400, Olivier Langlois wrote:
>>>>> io_napi_entry() has 2 calling sites. One of them is unlikely to
>>>>> find
>>>>> an
>>>>> entry and if it does, the timeout should arguable not be
>>>>> updated.
>>>>>
>>>>> The other io_napi_entry() calling site is overwriting the
>>>>> update made
>>>>> by io_napi_entry() so the io_napi_entry() timeout value update
>>>>> has no
>>>>> or
>>>>> little value and therefore is removed.
>>>>>
>>>>> Signed-off-by: Olivier Langlois <[email protected]>
>>>>> ---
>>>>>  io_uring/napi.c | 1 -
>>>>>  1 file changed, 1 deletion(-)
>>>>>
>>>>> diff --git a/io_uring/napi.c b/io_uring/napi.c
>>>>> index 73c4159e8405..1de1d4d62925 100644
>>>>> --- a/io_uring/napi.c
>>>>> +++ b/io_uring/napi.c
>>>>> @@ -26,7 +26,6 @@ static struct io_napi_entry
>>>>> *io_napi_hash_find(struct hlist_head *hash_list,
>>>>>  	hlist_for_each_entry_rcu(e, hash_list, node) {
>>>>>  		if (e->napi_id != napi_id)
>>>>>  			continue;
>>>>> -		e->timeout = jiffies + NAPI_TIMEOUT;
>>>>>  		return e;
>>>>>  	}
>>>>>  
>>>> I am commenting my own patch because I found something curious
>>>> that I
>>>> was not sure about when I was reviewing the code.
>>>>
>>>> Should the remaining e->timeout assignation be wrapped with a
>>>> WRITE_ONCE() macro to ensure an atomic store?
>>>
>>> I think that makes sense to do as lookup can be within rcu, and
>>> hence we have nothing serializing it. Not for torn writes, but to
>>> ensure that the memory sanitizer doesn't complain. I can just make
>>> this change while applying, or send a v2.
>>
>> As a separate patch I mean, not a v2. That part can wait until 6.12.
>>
> ok. np. I'll look into it soon.
> 
> In the meantime, I have detected few suspicious things in the napi
> code.
> 
> I am reporting them here to have few extra eye balls looking at them to
> be sure that everything is fine or not.
> 
> 1. in __io_napi_remove_stale(),
> 
> is it ok to use hash_for_each() instead of hash_for_each_safe()?
> 
> it might be ok because it is a hash_del_rcu() and not a simple
> hash_del() but I have little experience with possible RCU shortcuts so
> I am unsure on this one...

Should use hash_for_each_rcu(), and I think for good measure, we sould
just keep it inside the RCU region. Then we know for a fact that the
deletion doesn't run.

> 2. in io_napi_free()
> 
> list_del(&e->list); is not called. Can the only reason be that
> io_napi_free() is called as part of the ring destruction so it is an
> optimization to not clear the list since it is not expected to be
> reused?
> 
> would calling INIT_LIST_HEAD() before exiting as an extra precaution to
> make the function is future proof in case it is reused in another
> context than the ring destruction be a good idea?

I think that's just an oversight, and doesn't matter since it's all
going away anyway. But it would be prudent to delete it regardless!

> 3. I am surprised to notice that in __io_napi_do_busy_loop(),
> list_for_each_entry_rcu() is called to traverse the list but the
> regular methods list_del() and list_add_tail() are called to update the
> list instead of their RCU variant.

Should all just use rcu variants.

Here's a mashup of the changes. Would be great if you can test - I'll do
some too, but always good with more than one person testing as it tends
to hit more cases.


diff --git a/io_uring/napi.c b/io_uring/napi.c
index d0cf694d0172..6251111a7e1f 100644
--- a/io_uring/napi.c
+++ b/io_uring/napi.c
@@ -81,7 +81,7 @@ void __io_napi_add(struct io_ring_ctx *ctx, struct socket *sock)
 	}
 
 	hlist_add_tail_rcu(&e->node, hash_list);
-	list_add_tail(&e->list, &ctx->napi_list);
+	list_add_tail_rcu(&e->list, &ctx->napi_list);
 	spin_unlock(&ctx->napi_lock);
 }
 
@@ -91,9 +91,9 @@ static void __io_napi_remove_stale(struct io_ring_ctx *ctx)
 	unsigned int i;
 
 	spin_lock(&ctx->napi_lock);
-	hash_for_each(ctx->napi_ht, i, e, node) {
+	hash_for_each_rcu(ctx->napi_ht, i, e, node) {
 		if (time_after(jiffies, e->timeout)) {
-			list_del(&e->list);
+			list_del_rcu(&e->list);
 			hash_del_rcu(&e->node);
 			kfree_rcu(e, rcu);
 		}
@@ -174,9 +174,8 @@ static void io_napi_blocking_busy_loop(struct io_ring_ctx *ctx,
 	do {
 		is_stale = __io_napi_do_busy_loop(ctx, loop_end_arg);
 	} while (!io_napi_busy_loop_should_end(iowq, start_time) && !loop_end_arg);
-	rcu_read_unlock();
-
 	io_napi_remove_stale(ctx, is_stale);
+	rcu_read_unlock();
 }
 
 /*
@@ -209,6 +208,7 @@ void io_napi_free(struct io_ring_ctx *ctx)
 	spin_lock(&ctx->napi_lock);
 	hash_for_each(ctx->napi_ht, i, e, node) {
 		hash_del_rcu(&e->node);
+		list_del_rcu(&e->list);
 		kfree_rcu(e, rcu);
 	}
 	spin_unlock(&ctx->napi_lock);
@@ -309,9 +309,8 @@ int io_napi_sqpoll_busy_poll(struct io_ring_ctx *ctx)
 
 	rcu_read_lock();
 	is_stale = __io_napi_do_busy_loop(ctx, NULL);
-	rcu_read_unlock();
-
 	io_napi_remove_stale(ctx, is_stale);
+	rcu_read_unlock();
 	return 1;
 }
 

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-12 20:40         ` Jens Axboe
@ 2024-08-12 21:39           ` Olivier Langlois
  2024-08-12 21:45           ` Olivier Langlois
  2024-08-13 17:22           ` Olivier Langlois
  2 siblings, 0 replies; 15+ messages in thread
From: Olivier Langlois @ 2024-08-12 21:39 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring

On Mon, 2024-08-12 at 14:40 -0600, Jens Axboe wrote:
> 
> > 
> > 1. in __io_napi_remove_stale(),
> > 
> > is it ok to use hash_for_each() instead of hash_for_each_safe()?
> > 
> > it might be ok because it is a hash_del_rcu() and not a simple
> > hash_del() but I have little experience with possible RCU shortcuts
> > so
> > I am unsure on this one...
> 
> Should use hash_for_each_rcu(), and I think for good measure, we
> sould
> just keep it inside the RCU region. Then we know for a fact that the
> deletion doesn't run.

ok, after confirming that it was NOT paranoia, I did push further the
investigation.

hash_for_each_rcu() is meant for the readers. A writer can modify the
container as long as it protects the container against concurrent
writes with locks.

in include/linux/hashtable.h:
static inline void hash_del_rcu(struct hlist_node *node)
{
	hlist_del_init_rcu(node);
}

in include/linux/rculist.h:
static inline void hlist_del_init_rcu(struct hlist_node *n)
{
        if (!hlist_unhashed(n)) {
                __hlist_del(n);
                WRITE_ONCE(n->pprev, NULL);
        }
}

in include/linux/list.h:
static inline void __hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	WRITE_ONCE(*pprev, next);
	if (next)
		WRITE_ONCE(next->pprev, pprev);
}

it seems we are ok since the deleted node next pointer is not reset.
the problem might arise if the next action after removing the node is
to kfree it. we are fine because kfree_rcu() must be defering the
actual free.

That being said, I recommend to replace hash_for_each_rcu() with
hash_for_each_safe() to make everyone sleep better at night...

based on this new info, I'll let you decide which macro to use and I
volunteer to make that change.

> 
> > 2. in io_napi_free()
> > 
> > list_del(&e->list); is not called. Can the only reason be that
> > io_napi_free() is called as part of the ring destruction so it is
> > an
> > optimization to not clear the list since it is not expected to be
> > reused?
> > 
> > would calling INIT_LIST_HEAD() before exiting as an extra
> > precaution to
> > make the function is future proof in case it is reused in another
> > context than the ring destruction be a good idea?
> 
> I think that's just an oversight, and doesn't matter since it's all
> going away anyway. But it would be prudent to delete it regardless!

ok. I did ask because I am in the process of adding a call to
io_napi_free() in the context of a super new cool patch that I am
working on.. I'll call INIT_LIST_HEAD() to reset the list (or its RCU
variant if it exists).
> 
> > 3. I am surprised to notice that in __io_napi_do_busy_loop(),
> > list_for_each_entry_rcu() is called to traverse the list but the
> > regular methods list_del() and list_add_tail() are called to update
> > the
> > list instead of their RCU variant.
> 
> Should all just use rcu variants.
> 
> Here's a mashup of the changes. Would be great if you can test - I'll
> do
> some too, but always good with more than one person testing as it
> tends
> to hit more cases.

I am so glag to have asked. It is going to be a pleasure to test this
and report back the result.

but it appears that my napi_list is very *static* (wink, wink)



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-12 20:40         ` Jens Axboe
  2024-08-12 21:39           ` Olivier Langlois
@ 2024-08-12 21:45           ` Olivier Langlois
  2024-08-12 21:50             ` Jens Axboe
  2024-08-13 17:22           ` Olivier Langlois
  2 siblings, 1 reply; 15+ messages in thread
From: Olivier Langlois @ 2024-08-12 21:45 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring

On Mon, 2024-08-12 at 14:40 -0600, Jens Axboe wrote:
> 
> @@ -174,9 +174,8 @@ static void io_napi_blocking_busy_loop(struct
> io_ring_ctx *ctx,
>  	do {
>  		is_stale = __io_napi_do_busy_loop(ctx,
> loop_end_arg);
>  	} while (!io_napi_busy_loop_should_end(iowq, start_time) &&
> !loop_end_arg);
> -	rcu_read_unlock();
> -
>  	io_napi_remove_stale(ctx, is_stale);
> +	rcu_read_unlock();
>  }
> 
> @@ -309,9 +309,8 @@ int io_napi_sqpoll_busy_poll(struct io_ring_ctx
> *ctx)
>  
>  	rcu_read_lock();
>  	is_stale = __io_napi_do_busy_loop(ctx, NULL);
> -	rcu_read_unlock();
> -
>  	io_napi_remove_stale(ctx, is_stale);
> +	rcu_read_unlock();
>  	return 1;
>  }
>  
> 
Jens,

I have big doubts that moving the rcu_read_unlock() call is correct.
The read-only list access if performed by the busy loops block.

io_napi_remove_stale() is then modifying the list after having acquired
the spinlock. IMHO, you should not hold the RCU read lock when you are
updating the data. I even wonder is this could not be a possible
livelock cause...


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-12 21:45           ` Olivier Langlois
@ 2024-08-12 21:50             ` Jens Axboe
  0 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2024-08-12 21:50 UTC (permalink / raw)
  To: Olivier Langlois, Pavel Begunkov, io-uring

On 8/12/24 3:45 PM, Olivier Langlois wrote:
> On Mon, 2024-08-12 at 14:40 -0600, Jens Axboe wrote:
>>
>> @@ -174,9 +174,8 @@ static void io_napi_blocking_busy_loop(struct
>> io_ring_ctx *ctx,
>>  	do {
>>  		is_stale = __io_napi_do_busy_loop(ctx,
>> loop_end_arg);
>>  	} while (!io_napi_busy_loop_should_end(iowq, start_time) &&
>> !loop_end_arg);
>> -	rcu_read_unlock();
>> -
>>  	io_napi_remove_stale(ctx, is_stale);
>> +	rcu_read_unlock();
>>  }
>>
>> @@ -309,9 +309,8 @@ int io_napi_sqpoll_busy_poll(struct io_ring_ctx
>> *ctx)
>>  
>>  	rcu_read_lock();
>>  	is_stale = __io_napi_do_busy_loop(ctx, NULL);
>> -	rcu_read_unlock();
>> -
>>  	io_napi_remove_stale(ctx, is_stale);
>> +	rcu_read_unlock();
>>  	return 1;
>>  }
>>  
>>
> Jens,
> 
> I have big doubts that moving the rcu_read_unlock() call is correct.
> The read-only list access if performed by the busy loops block.
> 
> io_napi_remove_stale() is then modifying the list after having acquired
> the spinlock. IMHO, you should not hold the RCU read lock when you are
> updating the data. I even wonder is this could not be a possible
> livelock cause...

You can certainly nest a spinlock inside the rcu read side section,
that's not an issue. Only thing that matters here is the list
modification is done inside the lock that protects it, the rcu is just
for reader side protection (and to make the reader side cheaper). The
spinlock itself is reader side RCU protection as well, so the change
isn't strictly needed to begin with.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-12 20:40         ` Jens Axboe
  2024-08-12 21:39           ` Olivier Langlois
  2024-08-12 21:45           ` Olivier Langlois
@ 2024-08-13 17:22           ` Olivier Langlois
  2024-08-13 18:35             ` Jens Axboe
  2 siblings, 1 reply; 15+ messages in thread
From: Olivier Langlois @ 2024-08-13 17:22 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring

On Mon, 2024-08-12 at 14:40 -0600, Jens Axboe wrote:
> 
> 
> > 3. I am surprised to notice that in __io_napi_do_busy_loop(),
> > list_for_each_entry_rcu() is called to traverse the list but the
> > regular methods list_del() and list_add_tail() are called to update
> > the
> > list instead of their RCU variant.
> 
> Should all just use rcu variants.
> 
> Here's a mashup of the changes. Would be great if you can test - I'll
> do
> some too, but always good with more than one person testing as it
> tends
> to hit more cases.
> 
Jens,

I have integrated our RCU corrections into
https://lore.kernel.org/io-uring/5fc9dd07e48a7178f547ed1b2aaa0814607fa246.1723567469.git.olivier@trillion01.com/T/#u

and my testing so far is not showing any problems...
but I have a very static setup...
I had no issues too without the corrections...


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-13 17:22           ` Olivier Langlois
@ 2024-08-13 18:35             ` Jens Axboe
  2024-08-14  0:09               ` Olivier Langlois
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2024-08-13 18:35 UTC (permalink / raw)
  To: Olivier Langlois, Pavel Begunkov, io-uring

On 8/13/24 11:22 AM, Olivier Langlois wrote:
> On Mon, 2024-08-12 at 14:40 -0600, Jens Axboe wrote:
>>
>>
>>> 3. I am surprised to notice that in __io_napi_do_busy_loop(),
>>> list_for_each_entry_rcu() is called to traverse the list but the
>>> regular methods list_del() and list_add_tail() are called to update
>>> the
>>> list instead of their RCU variant.
>>
>> Should all just use rcu variants.
>>
>> Here's a mashup of the changes. Would be great if you can test - I'll
>> do
>> some too, but always good with more than one person testing as it
>> tends
>> to hit more cases.
>>
> Jens,
> 
> I have integrated our RCU corrections into
> https://lore.kernel.org/io-uring/5fc9dd07e48a7178f547ed1b2aaa0814607fa246.1723567469.git.olivier@trillion01.com/T/#u
> 
> and my testing so far is not showing any problems...
> but I have a very static setup...
> I had no issues too without the corrections...

Thanks for testing, but regardless of whether that series would go in or
not, I think those rcu changes should be done separately and upfront
rather than be integrated with other changes.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-13 18:35             ` Jens Axboe
@ 2024-08-14  0:09               ` Olivier Langlois
  2024-08-14  0:31                 ` Jens Axboe
  2024-08-14  0:44                 ` Pavel Begunkov
  0 siblings, 2 replies; 15+ messages in thread
From: Olivier Langlois @ 2024-08-14  0:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring

On Tue, 2024-08-13 at 12:35 -0600, Jens Axboe wrote:
> On 8/13/24 11:22 AM, Olivier Langlois wrote:
> > On Mon, 2024-08-12 at 14:40 -0600, Jens Axboe wrote:
> > > 
> > > 
> > > > 3. I am surprised to notice that in __io_napi_do_busy_loop(),
> > > > list_for_each_entry_rcu() is called to traverse the list but
> > > > the
> > > > regular methods list_del() and list_add_tail() are called to
> > > > update
> > > > the
> > > > list instead of their RCU variant.
> > > 
> > > Should all just use rcu variants.
> > > 
> > > Here's a mashup of the changes. Would be great if you can test -
> > > I'll
> > > do
> > > some too, but always good with more than one person testing as it
> > > tends
> > > to hit more cases.
> > > 
> > Jens,
> > 
> > I have integrated our RCU corrections into
> > https://lore.kernel.org/io-uring/5fc9dd07e48a7178f547ed1b2aaa0814607fa246.1723567469.git.olivier@trillion01.com/T/#u
> > 
> > and my testing so far is not showing any problems...
> > but I have a very static setup...
> > I had no issues too without the corrections...
> 
> Thanks for testing, but regardless of whether that series would go in
> or
> not, I think those rcu changes should be done separately and upfront
> rather than be integrated with other changes.
> 
sorry about that...

I am going to share a little bit how I currently feel. I feel
disappointed because when I reread your initial reply, I have not been
able to spot a single positive thing said about my proposal despite
that I have prealably tested the water concerning my idea and the big
lines about how I was planning to design it. All, I have been told from
Pavel that the idea was so great that he was even currently playing
with a prototype around the same concept:
https://lore.kernel.org/io-uring/[email protected]/T/#mc7271764641f9c810ea5438ed3dc0662fbc08cb6

you also have to understand that all the small napi issues that I have
fixed this week are no stranger from me working on this new idea. The
RCU issues that I have reported back have been spotted when I was doing
my final code review before testing my patch before submitting it.

keep in mind that I am by far a git magician. I am a very casual
user... Anything that is outside the usual beaten trails such as
reordoring commits or breaking them down feels perilious to me...

I had 230+ lines changes committed when you confirmed that few lines
should be changed to address this new RCU issue. I did figure that it
would not that big a deal to include them with the rest of my change.

that being said, if my patch submission is acceptable conditional to
needed rework, I am willing to learn how to better use git to meet your
requirements.

Greetings,


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-14  0:09               ` Olivier Langlois
@ 2024-08-14  0:31                 ` Jens Axboe
  2024-08-14  0:44                 ` Pavel Begunkov
  1 sibling, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2024-08-14  0:31 UTC (permalink / raw)
  To: Olivier Langlois, Pavel Begunkov, io-uring

On 8/13/24 6:09 PM, Olivier Langlois wrote:
> On Tue, 2024-08-13 at 12:35 -0600, Jens Axboe wrote:
>> On 8/13/24 11:22 AM, Olivier Langlois wrote:
>>> On Mon, 2024-08-12 at 14:40 -0600, Jens Axboe wrote:
>>>>
>>>>
>>>>> 3. I am surprised to notice that in __io_napi_do_busy_loop(),
>>>>> list_for_each_entry_rcu() is called to traverse the list but
>>>>> the
>>>>> regular methods list_del() and list_add_tail() are called to
>>>>> update
>>>>> the
>>>>> list instead of their RCU variant.
>>>>
>>>> Should all just use rcu variants.
>>>>
>>>> Here's a mashup of the changes. Would be great if you can test -
>>>> I'll
>>>> do
>>>> some too, but always good with more than one person testing as it
>>>> tends
>>>> to hit more cases.
>>>>
>>> Jens,
>>>
>>> I have integrated our RCU corrections into
>>> https://lore.kernel.org/io-uring/5fc9dd07e48a7178f547ed1b2aaa0814607fa246.1723567469.git.olivier@trillion01.com/T/#u
>>>
>>> and my testing so far is not showing any problems...
>>> but I have a very static setup...
>>> I had no issues too without the corrections...
>>
>> Thanks for testing, but regardless of whether that series would go in
>> or
>> not, I think those rcu changes should be done separately and upfront
>> rather than be integrated with other changes.
>>
> sorry about that...
> 
> I am going to share a little bit how I currently feel. I feel
> disappointed because when I reread your initial reply, I have not been
> able to spot a single positive thing said about my proposal despite
> that I have prealably tested the water concerning my idea and the big
> lines about how I was planning to design it. All, I have been told from
> Pavel that the idea was so great that he was even currently playing
> with a prototype around the same concept:
> https://lore.kernel.org/io-uring/[email protected]/T/#mc7271764641f9c810ea5438ed3dc0662fbc08cb6

Sorry if I made you feel like that, was not my intent. I just latched on
to the ops setup and didn't like that at all. I do think
pre-registrering an ID upfront is a good change, as it avoids a bunch of
hot path checking. And for a lot of use cases, single NAPI instance per
ring is indeed the common use case, by far.

Please realize that I get (and answer) a lot of email and also have a
lot of other things on my plate than answer emails, hence replies can
often be short and to the point. If I don't mention a part of your
patch, I either missed it or figured it necessitated a rework based on
the other feedback.

> you also have to understand that all the small napi issues that I have
> fixed this week are no stranger from me working on this new idea. The
> RCU issues that I have reported back have been spotted when I was doing
> my final code review before testing my patch before submitting it.
> 
> keep in mind that I am by far a git magician. I am a very casual
> user... Anything that is outside the usual beaten trails such as
> reordoring commits or breaking them down feels perilious to me...

Reviews and fixes are _always_ appreciated, even if emails don't include
that. They sure could, but then it'd just be a copied blurb, and I don't
think that's very genuine. FWIW, your patch submissions have been fine.
The critique I had for your patch was that you should not include little
fixes with larger changes. That's just not how kernel patches are done.
You do the little fixes first, and then the bigger change on top as
separate changes. The reason for that are two-fold: it makes it easier
to backport them to stable, if that is needed, and it makes it easier to
review the actual functional change. Both of those are really important.

> I had 230+ lines changes committed when you confirmed that few lines
> should be changed to address this new RCU issue. I did figure that it
> would not that big a deal to include them with the rest of my change.

See above, it is a big deal, and honestly something I recommend for any
kind of project, not just the kernel.

> that being said, if my patch submission is acceptable conditional to
> needed rework, I am willing to learn how to better use git to meet your
> requirements.

Your git skills are fine, don't worry about that, there's nothing wrong
with the mechanics of the submission. It's just the changes themselves
that need a splitting up, and rework.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation
  2024-08-14  0:09               ` Olivier Langlois
  2024-08-14  0:31                 ` Jens Axboe
@ 2024-08-14  0:44                 ` Pavel Begunkov
  1 sibling, 0 replies; 15+ messages in thread
From: Pavel Begunkov @ 2024-08-14  0:44 UTC (permalink / raw)
  To: Olivier Langlois, Jens Axboe, io-uring

On 8/14/24 01:09, Olivier Langlois wrote:
> On Tue, 2024-08-13 at 12:35 -0600, Jens Axboe wrote:
>> On 8/13/24 11:22 AM, Olivier Langlois wrote:
>>> On Mon, 2024-08-12 at 14:40 -0600, Jens Axboe wrote:
>>>>
>>>>
>>>>> 3. I am surprised to notice that in __io_napi_do_busy_loop(),
>>>>> list_for_each_entry_rcu() is called to traverse the list but
>>>>> the
>>>>> regular methods list_del() and list_add_tail() are called to
>>>>> update
>>>>> the
>>>>> list instead of their RCU variant.
>>>>
>>>> Should all just use rcu variants.
>>>>
>>>> Here's a mashup of the changes. Would be great if you can test -
>>>> I'll
>>>> do
>>>> some too, but always good with more than one person testing as it
>>>> tends
>>>> to hit more cases.
>>>>
>>> Jens,
>>>
>>> I have integrated our RCU corrections into
>>> https://lore.kernel.org/io-uring/5fc9dd07e48a7178f547ed1b2aaa0814607fa246.1723567469.git.olivier@trillion01.com/T/#u
>>>
>>> and my testing so far is not showing any problems...
>>> but I have a very static setup...
>>> I had no issues too without the corrections...
>>
>> Thanks for testing, but regardless of whether that series would go in
>> or
>> not, I think those rcu changes should be done separately and upfront
>> rather than be integrated with other changes.
>>
> sorry about that...
> 
> I am going to share a little bit how I currently feel. I feel
> disappointed because when I reread your initial reply, I have not been
> able to spot a single positive thing said about my proposal despite
> that I have prealably tested the water concerning my idea and the big
> lines about how I was planning to design it. All, I have been told from
> Pavel that the idea was so great that he was even currently playing
> with a prototype around the same concept:
> https://lore.kernel.org/io-uring/[email protected]/T/#mc7271764641f9c810ea5438ed3dc0662fbc08cb6

I've been playing with it but more of as a mean to some
other ideas. I had hard time to justify to myself to send
anything just to change the scheme, but it doesn't mean it's
a bad idea to get something like that merged. It just needs
some brushing mostly around excessive complexity, and it's
a part of normal dev process.

> you also have to understand that all the small napi issues that I have
> fixed this week are no stranger from me working on this new idea. The
> RCU issues that I have reported back have been spotted when I was doing
> my final code review before testing my patch before submitting it.
> 
> keep in mind that I am by far a git magician. I am a very casual
> user... Anything that is outside the usual beaten trails such as
> reordoring commits or breaking them down feels perilious to me...
> 
> I had 230+ lines changes committed when you confirmed that few lines
> should be changed to address this new RCU issue. I did figure that it
> would not that big a deal to include them with the rest of my change.

The main reason to have fixes in separate commits from new
features is because we're backporting fixes to older kernels.
It's usually just a cherry-pick, but being a part of a larger
commit complicates things a lot. There are also other usual
reasons like patch readability, keeping git history, not
mixing unrelated things together and so on.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-08-14  0:43 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-12  0:34 [PATCH] io_uring/napi: remove duplicate io_napi_entry timeout assignation Olivier Langlois
2024-08-12  1:00 ` Olivier Langlois
2024-08-12 18:10   ` Jens Axboe
2024-08-12 18:11     ` Jens Axboe
2024-08-12 20:15       ` Olivier Langlois
2024-08-12 20:40         ` Jens Axboe
2024-08-12 21:39           ` Olivier Langlois
2024-08-12 21:45           ` Olivier Langlois
2024-08-12 21:50             ` Jens Axboe
2024-08-13 17:22           ` Olivier Langlois
2024-08-13 18:35             ` Jens Axboe
2024-08-14  0:09               ` Olivier Langlois
2024-08-14  0:31                 ` Jens Axboe
2024-08-14  0:44                 ` Pavel Begunkov
2024-08-12 18:11 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox