public inbox for io-uring@vger.kernel.org
* [PATCH 0/2] add proper memory barrier for IOPOLL mode
@ 2020-06-15  9:24 Xiaoguang Wang
  2020-06-15  9:24 ` [PATCH 1/2] io_uring: don't fail links for EAGAIN error in " Xiaoguang Wang
  2020-06-15  9:24 ` [PATCH 2/2] io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed Xiaoguang Wang
  0 siblings, 2 replies; 11+ messages in thread
From: Xiaoguang Wang @ 2020-06-15  9:24 UTC (permalink / raw)
  To: io-uring; +Cc: axboe, joseph.qi, Xiaoguang Wang

The first patch makes io_uring not fail links for an io request that
returns an EAGAIN error; the second patch adds a proper memory barrier
to synchronize io_kiocb's result and iopoll_completed.

Xiaoguang Wang (2):
  io_uring: don't fail links for EAGAIN error in IOPOLL mode
  io_uring: add memory barrier to synchronize io_kiocb's result and
    iopoll_completed

 fs/io_uring.c | 55 +++++++++++++++++++++++++++------------------------
 1 file changed, 29 insertions(+), 26 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] io_uring: don't fail links for EAGAIN error in IOPOLL mode
  2020-06-15  9:24 [PATCH 0/2] add proper memory barrier for IOPOLL mode Xiaoguang Wang
@ 2020-06-15  9:24 ` Xiaoguang Wang
  2020-06-15  9:24 ` [PATCH 2/2] io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed Xiaoguang Wang
  1 sibling, 0 replies; 11+ messages in thread
From: Xiaoguang Wang @ 2020-06-15  9:24 UTC (permalink / raw)
  To: io-uring; +Cc: axboe, joseph.qi, Xiaoguang Wang

In IOPOLL mode, on an EAGAIN error we'll try to submit the io request
again using io-wq, so don't fail the rest of the links if this io
request has links.
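
For illustration only (not part of this patch): a rough liburing-based
sketch of the scenario -- two linked reads on an IOPOLL ring, where the
first may hit an internal -EAGAIN and should be retried via io-wq
instead of cancelling the second. The device path, block size and the
minimal error handling below are assumptions, not taken from the patch.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct iovec iov[2];
	void *buf[2];
	int fd, i;

	/* IOPOLL needs O_DIRECT on a device/file that supports polling */
	fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
	if (fd < 0 || io_uring_queue_init(8, &ring, IORING_SETUP_IOPOLL) < 0)
		return 1;

	for (i = 0; i < 2; i++) {
		if (posix_memalign(&buf[i], 4096, 4096))
			return 1;
		iov[i].iov_base = buf[i];
		iov[i].iov_len = 4096;
	}

	/* first read; IOSQE_IO_LINK chains the next sqe to it */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_readv(sqe, fd, &iov[0], 1, 0);
	sqe->flags |= IOSQE_IO_LINK;

	/* second read; only runs if the first is not marked failed */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_readv(sqe, fd, &iov[1], 1, 4096);

	io_uring_submit(&ring);

	/*
	 * Without this patch, an internal -EAGAIN on the first request
	 * could fail the link and complete the second with -ECANCELED,
	 * even though the first request is re-issued via io-wq.
	 */
	for (i = 0; i < 2; i++) {
		if (io_uring_wait_cqe(&ring, &cqe))
			break;
		printf("cqe %d: res %d\n", i, cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}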

Signed-off-by: Xiaoguang Wang <[email protected]>
---
 fs/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 61fca5afaac8..a3119fecfe40 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1974,7 +1974,7 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res, long res2)
 	if (kiocb->ki_flags & IOCB_WRITE)
 		kiocb_end_write(req);
 
-	if (res != req->result)
+	if (res != -EAGAIN && res != req->result)
 		req_set_fail_links(req);
 	req->result = res;
 	if (res != -EAGAIN)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed
  2020-06-15  9:24 [PATCH 0/2] add proper memory barrier for IOPOLL mode Xiaoguang Wang
  2020-06-15  9:24 ` [PATCH 1/2] io_uring: don't fail links for EAGAIN error in " Xiaoguang Wang
@ 2020-06-15  9:24 ` Xiaoguang Wang
  2020-06-15 14:36   ` Jens Axboe
  1 sibling, 1 reply; 11+ messages in thread
From: Xiaoguang Wang @ 2020-06-15  9:24 UTC (permalink / raw)
  To: io-uring; +Cc: axboe, joseph.qi, Xiaoguang Wang

In io_complete_rw_iopoll(), the stores to io_kiocb's result and
iopoll_completed are two independent store operations. To ensure that
once iopoll_completed is true, req->result is also visible to the cpu
executing io_do_iopoll(), a proper memory barrier should be used.

And in io_do_iopoll(), we check whether req->result is EAGAIN; if it
is, we'll need to issue this io request again using io-wq. In order to
issue just a single smp_rmb() on the completion side, move the
re-submit work to io_iopoll_complete().
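
In short, the intended pairing is roughly the following (a simplified
sketch, not the exact diff; the reader-side check of iopoll_completed
lives in io_do_iopoll(), which collects completed requests for
io_iopoll_complete()):

	/* completion side: io_complete_rw_iopoll() */
	WRITE_ONCE(req->result, res);
	smp_wmb();		/* pairs with smp_rmb() below */
	WRITE_ONCE(req->iopoll_completed, 1);

	/* reap side: io_do_iopoll() -> io_iopoll_complete() */
	if (READ_ONCE(req->iopoll_completed)) {
		/* in the patch, issued once before walking the done list */
		smp_rmb();
		if (READ_ONCE(req->result) == -EAGAIN)
			/* re-submit via io-wq */;
	}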

Signed-off-by: Xiaoguang Wang <[email protected]>
---
 fs/io_uring.c | 53 +++++++++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 25 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a3119fecfe40..5ba4d770081f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1728,6 +1728,18 @@ static int io_put_kbuf(struct io_kiocb *req)
 	return cflags;
 }
 
+static void io_iopoll_queue(struct list_head *again)
+{
+	struct io_kiocb *req;
+
+	do {
+		req = list_first_entry(again, struct io_kiocb, list);
+		list_del(&req->list);
+		refcount_inc(&req->refs);
+		io_queue_async_work(req);
+	} while (!list_empty(again));
+}
+
 /*
  * Find and free completed poll iocbs
  */
@@ -1736,11 +1748,20 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 {
 	struct req_batch rb;
 	struct io_kiocb *req;
+	LIST_HEAD(again);
+
+	/* order with ->result store in io_complete_rw_iopoll() */
+	smp_rmb();
 
 	rb.to_free = rb.need_iter = 0;
 	while (!list_empty(done)) {
 		int cflags = 0;
 
+		if (READ_ONCE(req->result) == -EAGAIN) {
+			req->iopoll_completed = 0;
+			list_move_tail(&req->list, &again);
+			continue;
+		}
 		req = list_first_entry(done, struct io_kiocb, list);
 		list_del(&req->list);
 
@@ -1759,18 +1780,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 	if (ctx->flags & IORING_SETUP_SQPOLL)
 		io_cqring_ev_posted(ctx);
 	io_free_req_many(ctx, &rb);
-}
 
-static void io_iopoll_queue(struct list_head *again)
-{
-	struct io_kiocb *req;
-
-	do {
-		req = list_first_entry(again, struct io_kiocb, list);
-		list_del(&req->list);
-		refcount_inc(&req->refs);
-		io_queue_async_work(req);
-	} while (!list_empty(again));
+	if (!list_empty(&again))
+		io_iopoll_queue(&again);
 }
 
 static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
@@ -1778,7 +1790,6 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
 {
 	struct io_kiocb *req, *tmp;
 	LIST_HEAD(done);
-	LIST_HEAD(again);
 	bool spin;
 	int ret;
 
@@ -1804,13 +1815,6 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
 		if (!list_empty(&done))
 			break;
 
-		if (req->result == -EAGAIN) {
-			list_move_tail(&req->list, &again);
-			continue;
-		}
-		if (!list_empty(&again))
-			break;
-
 		ret = kiocb->ki_filp->f_op->iopoll(kiocb, spin);
 		if (ret < 0)
 			break;
@@ -1823,9 +1827,6 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
 	if (!list_empty(&done))
 		io_iopoll_complete(ctx, nr_events, &done);
 
-	if (!list_empty(&again))
-		io_iopoll_queue(&again);
-
 	return ret;
 }
 
@@ -1976,9 +1977,11 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res, long res2)
 
 	if (res != -EAGAIN && res != req->result)
 		req_set_fail_links(req);
-	req->result = res;
-	if (res != -EAGAIN)
-		WRITE_ONCE(req->iopoll_completed, 1);
+
+	WRITE_ONCE(req->result, res);
+	/* order with io_iopoll_complete() checking ->result */
+	smp_wmb();
+	WRITE_ONCE(req->iopoll_completed, 1);
 }
 
 /*
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed
  2020-06-15  9:24 ` [PATCH 2/2] io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed Xiaoguang Wang
@ 2020-06-15 14:36   ` Jens Axboe
  2020-06-15 14:48     ` Xiaoguang Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2020-06-15 14:36 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: joseph.qi

On 6/15/20 3:24 AM, Xiaoguang Wang wrote:
> In io_complete_rw_iopoll(), stores to io_kiocb's result and iopoll
> completed are two independent store operations, to ensure that once
> iopoll_completed is ture and then req->result must been perceived by
> the cpu executing io_do_iopoll(), proper memory barrier should be used.
> 
> And in io_do_iopoll(), we check whether req->result is EAGAIN, if it is,
> we'll need to issue this io request using io-wq again. In order to just
> issue a single smp_rmb() on the completion side, move the re-submit work
> to io_iopoll_complete().

Did you actually test this one?

> @@ -1736,11 +1748,20 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>  {
>  	struct req_batch rb;
>  	struct io_kiocb *req;
> +	LIST_HEAD(again);
> +
> +	/* order with ->result store in io_complete_rw_iopoll() */
> +	smp_rmb();
>  
>  	rb.to_free = rb.need_iter = 0;
>  	while (!list_empty(done)) {
>  		int cflags = 0;
>  
> +		if (READ_ONCE(req->result) == -EAGAIN) {
> +			req->iopoll_completed = 0;
> +			list_move_tail(&req->list, &again);
> +			continue;
> +		}
>  		req = list_first_entry(done, struct io_kiocb, list);
>  		list_del(&req->list);
>  

You're using 'req' here before you initialize it...
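
Presumably the -EAGAIN check just needs to come after the request has
been fetched from the list -- a sketch only, not a tested fix:

	while (!list_empty(done)) {
		int cflags = 0;

		req = list_first_entry(done, struct io_kiocb, list);
		if (READ_ONCE(req->result) == -EAGAIN) {
			req->iopoll_completed = 0;
			list_move_tail(&req->list, &again);
			continue;
		}
		list_del(&req->list);
		...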

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed
  2020-06-15 14:36   ` Jens Axboe
@ 2020-06-15 14:48     ` Xiaoguang Wang
  2020-06-15 15:02       ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Xiaoguang Wang @ 2020-06-15 14:48 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: joseph.qi

hi,

> On 6/15/20 3:24 AM, Xiaoguang Wang wrote:
>> In io_complete_rw_iopoll(), stores to io_kiocb's result and iopoll
>> completed are two independent store operations, to ensure that once
>> iopoll_completed is ture and then req->result must been perceived by
>> the cpu executing io_do_iopoll(), proper memory barrier should be used.
>>
>> And in io_do_iopoll(), we check whether req->result is EAGAIN, if it is,
>> we'll need to issue this io request using io-wq again. In order to just
>> issue a single smp_rmb() on the completion side, move the re-submit work
>> to io_iopoll_complete().
> 
> Did you actually test this one?
I only run test cases in liburing/test in a vm.

> 
>> @@ -1736,11 +1748,20 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>>   {
>>   	struct req_batch rb;
>>   	struct io_kiocb *req;
>> +	LIST_HEAD(again);
>> +
>> +	/* order with ->result store in io_complete_rw_iopoll() */
>> +	smp_rmb();
>>   
>>   	rb.to_free = rb.need_iter = 0;
>>   	while (!list_empty(done)) {
>>   		int cflags = 0;
>>   
>> +		if (READ_ONCE(req->result) == -EAGAIN) {
>> +			req->iopoll_completed = 0;
>> +			list_move_tail(&req->list, &again);
>> +			continue;
>> +		}
>>   		req = list_first_entry(done, struct io_kiocb, list);
>>   		list_del(&req->list);
>>   
> 
> You're using 'req' here before you initialize it...
Sorry, next time when I submit patches, I'll construct test cases which
will cover my code changes.

Regards,
Xiaoguang Wang
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed
  2020-06-15 14:48     ` Xiaoguang Wang
@ 2020-06-15 15:02       ` Jens Axboe
  2020-06-15 15:09         ` Pavel Begunkov
  2020-06-15 15:32         ` Xiaoguang Wang
  0 siblings, 2 replies; 11+ messages in thread
From: Jens Axboe @ 2020-06-15 15:02 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: joseph.qi

On 6/15/20 8:48 AM, Xiaoguang Wang wrote:
> hi,
> 
>> On 6/15/20 3:24 AM, Xiaoguang Wang wrote:
>>> In io_complete_rw_iopoll(), stores to io_kiocb's result and iopoll
>>> completed are two independent store operations, to ensure that once
>>> iopoll_completed is ture and then req->result must been perceived by
>>> the cpu executing io_do_iopoll(), proper memory barrier should be used.
>>>
>>> And in io_do_iopoll(), we check whether req->result is EAGAIN, if it is,
>>> we'll need to issue this io request using io-wq again. In order to just
>>> issue a single smp_rmb() on the completion side, move the re-submit work
>>> to io_iopoll_complete().
>>
>> Did you actually test this one?
> I only run test cases in liburing/test in a vm.
> 
>>
>>> @@ -1736,11 +1748,20 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>>>   {
>>>   	struct req_batch rb;
>>>   	struct io_kiocb *req;
>>> +	LIST_HEAD(again);
>>> +
>>> +	/* order with ->result store in io_complete_rw_iopoll() */
>>> +	smp_rmb();
>>>   
>>>   	rb.to_free = rb.need_iter = 0;
>>>   	while (!list_empty(done)) {
>>>   		int cflags = 0;
>>>   
>>> +		if (READ_ONCE(req->result) == -EAGAIN) {
>>> +			req->iopoll_completed = 0;
>>> +			list_move_tail(&req->list, &again);
>>> +			continue;
>>> +		}
>>>   		req = list_first_entry(done, struct io_kiocb, list);
>>>   		list_del(&req->list);
>>>   
>>
>> You're using 'req' here before you initialize it...
> Sorry, next time when I submit patches, I'll construct test cases which
> will cover my codes changes.

I'm surprised the compiler didn't complain, or that the regular testing
didn't barf on it.

Don't think you need a new test case for this, the iopoll test case
should cover it, if you limit the queue depth on the device by
setting /sys/block/<dev>/queue/nr_requests to 2 or something like
that.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed
  2020-06-15 15:02       ` Jens Axboe
@ 2020-06-15 15:09         ` Pavel Begunkov
  2020-06-15 15:32         ` Xiaoguang Wang
  1 sibling, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2020-06-15 15:09 UTC (permalink / raw)
  To: Jens Axboe, Xiaoguang Wang, io-uring; +Cc: joseph.qi

On 15/06/2020 18:02, Jens Axboe wrote:
> On 6/15/20 8:48 AM, Xiaoguang Wang wrote:
>> hi,
>>
>>> On 6/15/20 3:24 AM, Xiaoguang Wang wrote:
>>>> In io_complete_rw_iopoll(), stores to io_kiocb's result and iopoll
>>>> completed are two independent store operations, to ensure that once
>>>> iopoll_completed is ture and then req->result must been perceived by
>>>> the cpu executing io_do_iopoll(), proper memory barrier should be used.
>>>>
>>>> And in io_do_iopoll(), we check whether req->result is EAGAIN, if it is,
>>>> we'll need to issue this io request using io-wq again. In order to just
>>>> issue a single smp_rmb() on the completion side, move the re-submit work
>>>> to io_iopoll_complete().
>>>
>>> Did you actually test this one?
>> I only run test cases in liburing/test in a vm.
>>
>>>
>>>> @@ -1736,11 +1748,20 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>>>>   {
>>>>   	struct req_batch rb;
>>>>   	struct io_kiocb *req;
>>>> +	LIST_HEAD(again);
>>>> +
>>>> +	/* order with ->result store in io_complete_rw_iopoll() */
>>>> +	smp_rmb();
>>>>   
>>>>   	rb.to_free = rb.need_iter = 0;
>>>>   	while (!list_empty(done)) {
>>>>   		int cflags = 0;
>>>>   
>>>> +		if (READ_ONCE(req->result) == -EAGAIN) {
>>>> +			req->iopoll_completed = 0;
>>>> +			list_move_tail(&req->list, &again);
>>>> +			continue;
>>>> +		}
>>>>   		req = list_first_entry(done, struct io_kiocb, list);
>>>>   		list_del(&req->list);
>>>>   
>>>
>>> You're using 'req' here before you initialize it...
>> Sorry, next time when I submit patches, I'll construct test cases which
>> will cover my codes changes.
> 
> I'm surprised the compiler didn't complain, or that the regular testing
> didn't barf on it.
> 
> Don't think you need a new test case for this, the iopoll test case
> should cover it, if you limit the queue depth on the device by
> setting /sys/block/<dev>/queue/nr_requests to 2 or something like
> that.

Hmm, nice hint. I hooked a dirty ->iopoll in null_blk with fault
injection for that


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed
  2020-06-15 15:02       ` Jens Axboe
  2020-06-15 15:09         ` Pavel Begunkov
@ 2020-06-15 15:32         ` Xiaoguang Wang
  2020-06-15 15:35           ` Jens Axboe
  1 sibling, 1 reply; 11+ messages in thread
From: Xiaoguang Wang @ 2020-06-15 15:32 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: joseph.qi

hi,

> On 6/15/20 8:48 AM, Xiaoguang Wang wrote:
>> hi,
>>
>>> On 6/15/20 3:24 AM, Xiaoguang Wang wrote:
>>>> In io_complete_rw_iopoll(), stores to io_kiocb's result and iopoll
>>>> completed are two independent store operations, to ensure that once
>>>> iopoll_completed is ture and then req->result must been perceived by
>>>> the cpu executing io_do_iopoll(), proper memory barrier should be used.
>>>>
>>>> And in io_do_iopoll(), we check whether req->result is EAGAIN, if it is,
>>>> we'll need to issue this io request using io-wq again. In order to just
>>>> issue a single smp_rmb() on the completion side, move the re-submit work
>>>> to io_iopoll_complete().
>>>
>>> Did you actually test this one?
>> I only run test cases in liburing/test in a vm.
>>
>>>
>>>> @@ -1736,11 +1748,20 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>>>>    {
>>>>    	struct req_batch rb;
>>>>    	struct io_kiocb *req;
>>>> +	LIST_HEAD(again);
>>>> +
>>>> +	/* order with ->result store in io_complete_rw_iopoll() */
>>>> +	smp_rmb();
>>>>    
>>>>    	rb.to_free = rb.need_iter = 0;
>>>>    	while (!list_empty(done)) {
>>>>    		int cflags = 0;
>>>>    
>>>> +		if (READ_ONCE(req->result) == -EAGAIN) {
>>>> +			req->iopoll_completed = 0;
>>>> +			list_move_tail(&req->list, &again);
>>>> +			continue;
>>>> +		}
>>>>    		req = list_first_entry(done, struct io_kiocb, list);
>>>>    		list_del(&req->list);
>>>>    
>>>
>>> You're using 'req' here before you initialize it...
>> Sorry, next time when I submit patches, I'll construct test cases which
>> will cover my codes changes.
> 
> I'm surprised the compiler didn't complain, or that the regular testing
> didn't barf on it.
I'm also surprised; I'll try to find the reason.
And indeed the iopoll test case failed, but the command below displayed nothing:
[lege@localhost test]$ sudo ./iopoll
So I wrongly assumed this test case had passed.

dmesg shows these errors:
[  127.806945] ==================================================================
[  127.806983] BUG: KASAN: use-after-free in io_iopoll_complete+0xbb/0x980
[  127.806989] Read of size 4 at addr ffff8886e3e98808 by task io_uring-sq/1643

[  127.806999] CPU: 16 PID: 1643 Comm: io_uring-sq Not tainted 5.7.0+ #501
[  127.807013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
[  127.807021] Call Trace:
[  127.807040]  dump_stack+0x75/0xa0
[  127.807047]  ? io_iopoll_complete+0xbb/0x980
[  127.807062]  print_address_description.constprop.6+0x1a/0x220
[  127.807086]  ? _raw_write_lock_irqsave+0xd0/0xd0
[  127.807092]  ? io_free_req_many.part.79+0x208/0x2a0
[  127.807107]  ? __rcu_read_unlock+0x37/0x200
[  127.807112]  ? io_iopoll_complete+0xbb/0x980
[  127.807117]  ? io_iopoll_complete+0xbb/0x980
[  127.807122]  kasan_report.cold.9+0x1f/0x42
[  127.807128]  ? io_iopoll_complete+0xbb/0x980
[  127.807133]  io_iopoll_complete+0xbb/0x980
[  127.807138]  ? io_timeout_fn+0x140/0x140
[  127.807150]  ? __switch_to+0x2e9/0x5a0
[  127.807157]  io_iopoll_getevents+0x287/0x310
[  127.807163]  ? io_iopoll_complete+0x980/0x980
[  127.807172]  ? finish_wait+0xcb/0xf0
[  127.807179]  io_sq_thread+0x1c1/0x600
[  127.807185]  ? __ia32_sys_io_uring_enter+0x450/0x450
[  127.807194]  ? preempt_count_add+0x77/0xd0
[  127.807200]  ? _raw_spin_lock_irqsave+0x84/0xd0
[  127.807206]  ? _raw_write_lock_irqsave+0xd0/0xd0
[  127.807210]  ? finish_wait+0xf0/0xf0
[  127.807215]  ? preempt_count_sub+0x18/0xc0
[  127.807224]  ? __kthread_parkme+0xaf/0xd0
[  127.807231]  ? __ia32_sys_io_uring_enter+0x450/0x450
[  127.807235]  kthread+0x1e4/0x210
[  127.807241]  ? kthread_create_on_node+0xa0/0xa0
[  127.807246]  ret_from_fork+0x22/0x30

> 
> Don't think you need a new test case for this, the iopoll test case
> should cover it, if you limit the queue depth on the device by
> setting /sys/block/<dev>/queue/nr_requests to 2 or something like
> that.
Thanks, I'll try later.

Regards,
Xiaoguang Wang

> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed
  2020-06-15 15:32         ` Xiaoguang Wang
@ 2020-06-15 15:35           ` Jens Axboe
  2020-06-15 16:51             ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2020-06-15 15:35 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: joseph.qi

On 6/15/20 9:32 AM, Xiaoguang Wang wrote:
> hi,
> 
>> On 6/15/20 8:48 AM, Xiaoguang Wang wrote:
>>> hi,
>>>
>>>> On 6/15/20 3:24 AM, Xiaoguang Wang wrote:
>>>>> In io_complete_rw_iopoll(), stores to io_kiocb's result and iopoll
>>>>> completed are two independent store operations, to ensure that once
>>>>> iopoll_completed is ture and then req->result must been perceived by
>>>>> the cpu executing io_do_iopoll(), proper memory barrier should be used.
>>>>>
>>>>> And in io_do_iopoll(), we check whether req->result is EAGAIN, if it is,
>>>>> we'll need to issue this io request using io-wq again. In order to just
>>>>> issue a single smp_rmb() on the completion side, move the re-submit work
>>>>> to io_iopoll_complete().
>>>>
>>>> Did you actually test this one?
>>> I only run test cases in liburing/test in a vm.
>>>
>>>>
>>>>> @@ -1736,11 +1748,20 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>>>>>    {
>>>>>    	struct req_batch rb;
>>>>>    	struct io_kiocb *req;
>>>>> +	LIST_HEAD(again);
>>>>> +
>>>>> +	/* order with ->result store in io_complete_rw_iopoll() */
>>>>> +	smp_rmb();
>>>>>    
>>>>>    	rb.to_free = rb.need_iter = 0;
>>>>>    	while (!list_empty(done)) {
>>>>>    		int cflags = 0;
>>>>>    
>>>>> +		if (READ_ONCE(req->result) == -EAGAIN) {
>>>>> +			req->iopoll_completed = 0;
>>>>> +			list_move_tail(&req->list, &again);
>>>>> +			continue;
>>>>> +		}
>>>>>    		req = list_first_entry(done, struct io_kiocb, list);
>>>>>    		list_del(&req->list);
>>>>>    
>>>>
>>>> You're using 'req' here before you initialize it...
>>> Sorry, next time when I submit patches, I'll construct test cases which
>>> will cover my codes changes.
>>
>> I'm surprised the compiler didn't complain, or that the regular testing
>> didn't barf on it.
> I'm also surprised, will try to find the reason.
> And indeed the iopoll test case failed, but below command displayed nothing:
> [lege@localhost test]$ sudo ./iopoll
> Then I considered this test case pass wrongly.
> 
> dmesg show errors:
> [  127.806945] ==================================================================
> [  127.806983] BUG: KASAN: use-after-free in io_iopoll_complete+0xbb/0x980
> [  127.806989] Read of size 4 at addr ffff8886e3e98808 by task io_uring-sq/1643
> 
> [  127.806999] CPU: 16 PID: 1643 Comm: io_uring-sq Not tainted 5.7.0+ #501
> [  127.807013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
> [  127.807021] Call Trace:
> [  127.807040]  dump_stack+0x75/0xa0
> [  127.807047]  ? io_iopoll_complete+0xbb/0x980
> [  127.807062]  print_address_description.constprop.6+0x1a/0x220
> [  127.807086]  ? _raw_write_lock_irqsave+0xd0/0xd0
> [  127.807092]  ? io_free_req_many.part.79+0x208/0x2a0
> [  127.807107]  ? __rcu_read_unlock+0x37/0x200
> [  127.807112]  ? io_iopoll_complete+0xbb/0x980
> [  127.807117]  ? io_iopoll_complete+0xbb/0x980
> [  127.807122]  kasan_report.cold.9+0x1f/0x42
> [  127.807128]  ? io_iopoll_complete+0xbb/0x980
> [  127.807133]  io_iopoll_complete+0xbb/0x980
> [  127.807138]  ? io_timeout_fn+0x140/0x140
> [  127.807150]  ? __switch_to+0x2e9/0x5a0
> [  127.807157]  io_iopoll_getevents+0x287/0x310
> [  127.807163]  ? io_iopoll_complete+0x980/0x980
> [  127.807172]  ? finish_wait+0xcb/0xf0
> [  127.807179]  io_sq_thread+0x1c1/0x600
> [  127.807185]  ? __ia32_sys_io_uring_enter+0x450/0x450
> [  127.807194]  ? preempt_count_add+0x77/0xd0
> [  127.807200]  ? _raw_spin_lock_irqsave+0x84/0xd0
> [  127.807206]  ? _raw_write_lock_irqsave+0xd0/0xd0
> [  127.807210]  ? finish_wait+0xf0/0xf0
> [  127.807215]  ? preempt_count_sub+0x18/0xc0
> [  127.807224]  ? __kthread_parkme+0xaf/0xd0
> [  127.807231]  ? __ia32_sys_io_uring_enter+0x450/0x450
> [  127.807235]  kthread+0x1e4/0x210
> [  127.807241]  ? kthread_create_on_node+0xa0/0xa0
> [  127.807246]  ret_from_fork+0x22/0x30

There you go, so it did fail, just didn't register as a failure. I should
probably add a dmesg check for the liburing tests, and fail a test if
we trigger a WARNING or BUG condition. I'll look into that.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed
  2020-06-15 15:35           ` Jens Axboe
@ 2020-06-15 16:51             ` Jens Axboe
  2020-06-15 17:53               ` Xiaoguang Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2020-06-15 16:51 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: joseph.qi

On 6/15/20 9:35 AM, Jens Axboe wrote:
> On 6/15/20 9:32 AM, Xiaoguang Wang wrote:
>> hi,
>>
>>> On 6/15/20 8:48 AM, Xiaoguang Wang wrote:
>>>> hi,
>>>>
>>>>> On 6/15/20 3:24 AM, Xiaoguang Wang wrote:
>>>>>> In io_complete_rw_iopoll(), stores to io_kiocb's result and iopoll
>>>>>> completed are two independent store operations, to ensure that once
>>>>>> iopoll_completed is ture and then req->result must been perceived by
>>>>>> the cpu executing io_do_iopoll(), proper memory barrier should be used.
>>>>>>
>>>>>> And in io_do_iopoll(), we check whether req->result is EAGAIN, if it is,
>>>>>> we'll need to issue this io request using io-wq again. In order to just
>>>>>> issue a single smp_rmb() on the completion side, move the re-submit work
>>>>>> to io_iopoll_complete().
>>>>>
>>>>> Did you actually test this one?
>>>> I only run test cases in liburing/test in a vm.
>>>>
>>>>>
>>>>>> @@ -1736,11 +1748,20 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>>>>>>    {
>>>>>>    	struct req_batch rb;
>>>>>>    	struct io_kiocb *req;
>>>>>> +	LIST_HEAD(again);
>>>>>> +
>>>>>> +	/* order with ->result store in io_complete_rw_iopoll() */
>>>>>> +	smp_rmb();
>>>>>>    
>>>>>>    	rb.to_free = rb.need_iter = 0;
>>>>>>    	while (!list_empty(done)) {
>>>>>>    		int cflags = 0;
>>>>>>    
>>>>>> +		if (READ_ONCE(req->result) == -EAGAIN) {
>>>>>> +			req->iopoll_completed = 0;
>>>>>> +			list_move_tail(&req->list, &again);
>>>>>> +			continue;
>>>>>> +		}
>>>>>>    		req = list_first_entry(done, struct io_kiocb, list);
>>>>>>    		list_del(&req->list);
>>>>>>    
>>>>>
>>>>> You're using 'req' here before you initialize it...
>>>> Sorry, next time when I submit patches, I'll construct test cases which
>>>> will cover my codes changes.
>>>
>>> I'm surprised the compiler didn't complain, or that the regular testing
>>> didn't barf on it.
>> I'm also surprised, will try to find the reason.
>> And indeed the iopoll test case failed, but below command displayed nothing:
>> [lege@localhost test]$ sudo ./iopoll
>> Then I considered this test case pass wrongly.
>>
>> dmesg show errors:
>> [  127.806945] ==================================================================
>> [  127.806983] BUG: KASAN: use-after-free in io_iopoll_complete+0xbb/0x980
>> [  127.806989] Read of size 4 at addr ffff8886e3e98808 by task io_uring-sq/1643
>>
>> [  127.806999] CPU: 16 PID: 1643 Comm: io_uring-sq Not tainted 5.7.0+ #501
>> [  127.807013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
>> [  127.807021] Call Trace:
>> [  127.807040]  dump_stack+0x75/0xa0
>> [  127.807047]  ? io_iopoll_complete+0xbb/0x980
>> [  127.807062]  print_address_description.constprop.6+0x1a/0x220
>> [  127.807086]  ? _raw_write_lock_irqsave+0xd0/0xd0
>> [  127.807092]  ? io_free_req_many.part.79+0x208/0x2a0
>> [  127.807107]  ? __rcu_read_unlock+0x37/0x200
>> [  127.807112]  ? io_iopoll_complete+0xbb/0x980
>> [  127.807117]  ? io_iopoll_complete+0xbb/0x980
>> [  127.807122]  kasan_report.cold.9+0x1f/0x42
>> [  127.807128]  ? io_iopoll_complete+0xbb/0x980
>> [  127.807133]  io_iopoll_complete+0xbb/0x980
>> [  127.807138]  ? io_timeout_fn+0x140/0x140
>> [  127.807150]  ? __switch_to+0x2e9/0x5a0
>> [  127.807157]  io_iopoll_getevents+0x287/0x310
>> [  127.807163]  ? io_iopoll_complete+0x980/0x980
>> [  127.807172]  ? finish_wait+0xcb/0xf0
>> [  127.807179]  io_sq_thread+0x1c1/0x600
>> [  127.807185]  ? __ia32_sys_io_uring_enter+0x450/0x450
>> [  127.807194]  ? preempt_count_add+0x77/0xd0
>> [  127.807200]  ? _raw_spin_lock_irqsave+0x84/0xd0
>> [  127.807206]  ? _raw_write_lock_irqsave+0xd0/0xd0
>> [  127.807210]  ? finish_wait+0xf0/0xf0
>> [  127.807215]  ? preempt_count_sub+0x18/0xc0
>> [  127.807224]  ? __kthread_parkme+0xaf/0xd0
>> [  127.807231]  ? __ia32_sys_io_uring_enter+0x450/0x450
>> [  127.807235]  kthread+0x1e4/0x210
>> [  127.807241]  ? kthread_create_on_node+0xa0/0xa0
>> [  127.807246]  ret_from_fork+0x22/0x30
> 
> There you go, so it did fail, just didn't register as a failure. I should
> probably add a dmesg check for the liburing tests, and fail a test if
> we trigger a WARNING or BUG condition. I'll look into that.

I pushed a commit to liburing so that it should now catch dmesg
errors logged while running a test.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed
  2020-06-15 16:51             ` Jens Axboe
@ 2020-06-15 17:53               ` Xiaoguang Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Xiaoguang Wang @ 2020-06-15 17:53 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: joseph.qi

hi,

> On 6/15/20 9:35 AM, Jens Axboe wrote:
>> On 6/15/20 9:32 AM, Xiaoguang Wang wrote:
>>> hi,
>>>
>>>> On 6/15/20 8:48 AM, Xiaoguang Wang wrote:
>>>>> hi,
>>>>>
>>>>>> On 6/15/20 3:24 AM, Xiaoguang Wang wrote:
>>>>>>> In io_complete_rw_iopoll(), stores to io_kiocb's result and iopoll
>>>>>>> completed are two independent store operations, to ensure that once
>>>>>>> iopoll_completed is ture and then req->result must been perceived by
>>>>>>> the cpu executing io_do_iopoll(), proper memory barrier should be used.
>>>>>>>
>>>>>>> And in io_do_iopoll(), we check whether req->result is EAGAIN, if it is,
>>>>>>> we'll need to issue this io request using io-wq again. In order to just
>>>>>>> issue a single smp_rmb() on the completion side, move the re-submit work
>>>>>>> to io_iopoll_complete().
>>>>>>
>>>>>> Did you actually test this one?
>>>>> I only run test cases in liburing/test in a vm.
>>>>>
>>>>>>
>>>>>>> @@ -1736,11 +1748,20 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
>>>>>>>     {
>>>>>>>     	struct req_batch rb;
>>>>>>>     	struct io_kiocb *req;
>>>>>>> +	LIST_HEAD(again);
>>>>>>> +
>>>>>>> +	/* order with ->result store in io_complete_rw_iopoll() */
>>>>>>> +	smp_rmb();
>>>>>>>     
>>>>>>>     	rb.to_free = rb.need_iter = 0;
>>>>>>>     	while (!list_empty(done)) {
>>>>>>>     		int cflags = 0;
>>>>>>>     
>>>>>>> +		if (READ_ONCE(req->result) == -EAGAIN) {
>>>>>>> +			req->iopoll_completed = 0;
>>>>>>> +			list_move_tail(&req->list, &again);
>>>>>>> +			continue;
>>>>>>> +		}
>>>>>>>     		req = list_first_entry(done, struct io_kiocb, list);
>>>>>>>     		list_del(&req->list);
>>>>>>>     
>>>>>>
>>>>>> You're using 'req' here before you initialize it...
>>>>> Sorry, next time when I submit patches, I'll construct test cases which
>>>>> will cover my codes changes.
>>>>
>>>> I'm surprised the compiler didn't complain, or that the regular testing
>>>> didn't barf on it.
>>> I'm also surprised, will try to find the reason.
>>> And indeed the iopoll test case failed, but below command displayed nothing:
>>> [lege@localhost test]$ sudo ./iopoll
>>> Then I considered this test case pass wrongly.
>>>
>>> dmesg show errors:
>>> [  127.806945] ==================================================================
>>> [  127.806983] BUG: KASAN: use-after-free in io_iopoll_complete+0xbb/0x980
>>> [  127.806989] Read of size 4 at addr ffff8886e3e98808 by task io_uring-sq/1643
>>>
>>> [  127.806999] CPU: 16 PID: 1643 Comm: io_uring-sq Not tainted 5.7.0+ #501
>>> [  127.807013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
>>> [  127.807021] Call Trace:
>>> [  127.807040]  dump_stack+0x75/0xa0
>>> [  127.807047]  ? io_iopoll_complete+0xbb/0x980
>>> [  127.807062]  print_address_description.constprop.6+0x1a/0x220
>>> [  127.807086]  ? _raw_write_lock_irqsave+0xd0/0xd0
>>> [  127.807092]  ? io_free_req_many.part.79+0x208/0x2a0
>>> [  127.807107]  ? __rcu_read_unlock+0x37/0x200
>>> [  127.807112]  ? io_iopoll_complete+0xbb/0x980
>>> [  127.807117]  ? io_iopoll_complete+0xbb/0x980
>>> [  127.807122]  kasan_report.cold.9+0x1f/0x42
>>> [  127.807128]  ? io_iopoll_complete+0xbb/0x980
>>> [  127.807133]  io_iopoll_complete+0xbb/0x980
>>> [  127.807138]  ? io_timeout_fn+0x140/0x140
>>> [  127.807150]  ? __switch_to+0x2e9/0x5a0
>>> [  127.807157]  io_iopoll_getevents+0x287/0x310
>>> [  127.807163]  ? io_iopoll_complete+0x980/0x980
>>> [  127.807172]  ? finish_wait+0xcb/0xf0
>>> [  127.807179]  io_sq_thread+0x1c1/0x600
>>> [  127.807185]  ? __ia32_sys_io_uring_enter+0x450/0x450
>>> [  127.807194]  ? preempt_count_add+0x77/0xd0
>>> [  127.807200]  ? _raw_spin_lock_irqsave+0x84/0xd0
>>> [  127.807206]  ? _raw_write_lock_irqsave+0xd0/0xd0
>>> [  127.807210]  ? finish_wait+0xf0/0xf0
>>> [  127.807215]  ? preempt_count_sub+0x18/0xc0
>>> [  127.807224]  ? __kthread_parkme+0xaf/0xd0
>>> [  127.807231]  ? __ia32_sys_io_uring_enter+0x450/0x450
>>> [  127.807235]  kthread+0x1e4/0x210
>>> [  127.807241]  ? kthread_create_on_node+0xa0/0xa0
>>> [  127.807246]  ret_from_fork+0x22/0x30
>>
>> There you go, so it did fail, just didn't register as a failure. I should
>> probably add a dmesg check for the liburing tests, and fail a test if
>> we trigger a WARNING or BUG condition. I'll look into that.
> 
> I pushed a commit to liburing so that it should now catch dmesg
> errors logged while running a test.
Cool, thanks.
Will run the new tests, and if there are no failures, I'll send a V2 soon.

Regards,
Xiaoguang Wang


> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread

Thread overview: 11+ messages
2020-06-15  9:24 [PATCH 0/2] add proper memory barrier for IOPOLL mode Xiaoguang Wang
2020-06-15  9:24 ` [PATCH 1/2] io_uring: don't fail links for EAGAIN error in " Xiaoguang Wang
2020-06-15  9:24 ` [PATCH 2/2] io_uring: add memory barrier to synchronize io_kiocb's result and iopoll_completed Xiaoguang Wang
2020-06-15 14:36   ` Jens Axboe
2020-06-15 14:48     ` Xiaoguang Wang
2020-06-15 15:02       ` Jens Axboe
2020-06-15 15:09         ` Pavel Begunkov
2020-06-15 15:32         ` Xiaoguang Wang
2020-06-15 15:35           ` Jens Axboe
2020-06-15 16:51             ` Jens Axboe
2020-06-15 17:53               ` Xiaoguang Wang
