[PATCH] io_uring: fix race with shadow drain deferrals

public inbox for [email protected]
 help / color / mirror / Atom feed

* [PATCH] io_uring: fix race with shadow drain deferrals
@ 2019-11-20 23:07 Jens Axboe
  2019-11-20 23:58 ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2019-11-20 23:07 UTC (permalink / raw)
  To: io-uring; +Cc: Jackie Liu

When we go and queue requests with drain, we check if we need to defer
based on sequence. This is done safely under the lock, but then we drop
the lock before actually inserting the shadow. If the original request
is found on the deferred list by another completion in the mean time,
it could have been started AND completed by the time we insert the
shadow, which will stall the queue.

After re-grabbing the completion lock, check if the original request is
still in the deferred list. If it isn't, then we know that someone else
already found and issued it. If that happened, then our job is done, we
can simply free the shadow.

Cc: Jackie Liu <[email protected]>
Fixes: 4fe2c963154c ("io_uring: add support for link with drain")
Signed-off-by: Jens Axboe <[email protected]>

---

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6175e2e195c0..6fb25ae53817 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2973,6 +2973,7 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
 	int ret;
 	int need_submit = false;
 	struct io_ring_ctx *ctx = req->ctx;
+	struct io_kiocb *tmp;
 
 	if (unlikely(req->flags & REQ_F_FAIL_LINK)) {
 		ret = -ECANCELED;
@@ -3011,8 +3012,30 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
 
 	/* Insert shadow req to defer_list, blocking next IOs */
 	spin_lock_irq(&ctx->completion_lock);
-	trace_io_uring_defer(ctx, shadow, true);
-	list_add_tail(&shadow->list, &ctx->defer_list);
+	if (ret) {
+		/*
+		 * We dropped the lock since deciding we needed to defer this
+		 * request. We must re-check under the lock, if it's now gone
+		 * from the list, that means that another completion came in
+		 * and submitted it since we decided we needed to defer. If
+		 * that's the case, simply drop the shadow, there's nothing
+		 * more we need to do here.
+		 */
+		list_for_each_entry(tmp, &ctx->defer_list, list) {
+			if (tmp == req)
+				break;
+		}
+		if (tmp != req) {
+			__io_free_req(shadow);
+			shadow = NULL;
+		}
+	}
+	if (shadow) {
+		trace_io_uring_defer(ctx, shadow, true);
+		list_add_tail(&shadow->list, &ctx->defer_list);
+	} else {
+		need_submit = false;
+	}
 	spin_unlock_irq(&ctx->completion_lock);
 
 	if (need_submit)

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] io_uring: fix race with shadow drain deferrals
  2019-11-20 23:07 [PATCH] io_uring: fix race with shadow drain deferrals Jens Axboe
@ 2019-11-20 23:58 ` Jens Axboe
  2019-11-21  1:32   ` Jackie Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2019-11-20 23:58 UTC (permalink / raw)
  To: io-uring; +Cc: Jackie Liu

On 11/20/19 4:07 PM, Jens Axboe wrote:
> When we go and queue requests with drain, we check if we need to defer
> based on sequence. This is done safely under the lock, but then we drop
> the lock before actually inserting the shadow. If the original request
> is found on the deferred list by another completion in the mean time,
> it could have been started AND completed by the time we insert the
> shadow, which will stall the queue.
> 
> After re-grabbing the completion lock, check if the original request is
> still in the deferred list. If it isn't, then we know that someone else
> already found and issued it. If that happened, then our job is done, we
> can simply free the shadow.
> 
> Cc: Jackie Liu <[email protected]>
> Fixes: 4fe2c963154c ("io_uring: add support for link with drain")
> Signed-off-by: Jens Axboe <[email protected]>

BTW, the other solution here is to not release the completion_lock if
we're going to return -EIOCBQUEUED, and let the caller do what it needs
before releasing it. That'd look something like this, with some sparse
annotations to keep things happy.

I think the original I posted here is easier to follow, and the
deferral list is going to be tiny in general so it won't really add
any extra overhead.

Let me know what you think and prefer.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6175e2e195c0..0d1f33bcedc0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2552,6 +2552,11 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	return 0;
 }
 
+/*
+ * Returns with ctx->completion_lock held if -EIOCBQUEUED is returned, so
+ * the caller can make decisions based on the deferral without worrying about
+ * the request being found and issued in the mean time.
+ */
 static int io_req_defer(struct io_kiocb *req)
 {
 	const struct io_uring_sqe *sqe = req->submit.sqe;
@@ -2579,7 +2584,7 @@ static int io_req_defer(struct io_kiocb *req)
 
 	trace_io_uring_defer(ctx, req, false);
 	list_add_tail(&req->list, &ctx->defer_list);
-	spin_unlock_irq(&ctx->completion_lock);
+	__release(&ctx->completion_lock);
 	return -EIOCBQUEUED;
 }
 
@@ -2954,6 +2959,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
 
 static void io_queue_sqe(struct io_kiocb *req)
 {
+	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
 
 	ret = io_req_defer(req);
@@ -2963,6 +2969,9 @@ static void io_queue_sqe(struct io_kiocb *req)
 			if (req->flags & REQ_F_LINK)
 				req->flags |= REQ_F_FAIL_LINK;
 			io_double_put_req(req);
+		} else {
+			__acquire(&ctx->completion_lock);
+			spin_unlock_irq(&ctx->completion_lock);
 		}
 	} else
 		__io_queue_sqe(req);
@@ -3001,16 +3010,17 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
 				__io_free_req(shadow);
 			return;
 		}
+		__acquire(&ctx->completion_lock);
 	} else {
 		/*
 		 * If ret == 0 means that all IOs in front of link io are
 		 * running done. let's queue link head.
 		 */
 		need_submit = true;
+		spin_lock_irq(&ctx->completion_lock);
 	}
 
 	/* Insert shadow req to defer_list, blocking next IOs */
-	spin_lock_irq(&ctx->completion_lock);
 	trace_io_uring_defer(ctx, shadow, true);
 	list_add_tail(&shadow->list, &ctx->defer_list);
 	spin_unlock_irq(&ctx->completion_lock);

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] io_uring: fix race with shadow drain deferrals
  2019-11-20 23:58 ` Jens Axboe
@ 2019-11-21  1:32   ` Jackie Liu
  2019-11-21  1:35     ` Jackie Liu
  2019-11-21  1:39     ` [PATCH] io_uring: fix race with shadow drain deferrals Jens Axboe
  0 siblings, 2 replies; 19+ messages in thread
From: Jackie Liu @ 2019-11-21  1:32 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring

2019年11月21日 07:58，Jens Axboe <[email protected]> 写道：

> 
> On 11/20/19 4:07 PM, Jens Axboe wrote:
>> When we go and queue requests with drain, we check if we need to defer
>> based on sequence. This is done safely under the lock, but then we drop
>> the lock before actually inserting the shadow. If the original request
>> is found on the deferred list by another completion in the mean time,
>> it could have been started AND completed by the time we insert the
>> shadow, which will stall the queue.
>> 
>> After re-grabbing the completion lock, check if the original request is
>> still in the deferred list. If it isn't, then we know that someone else
>> already found and issued it. If that happened, then our job is done, we
>> can simply free the shadow.
>> 
>> Cc: Jackie Liu <[email protected]>
>> Fixes: 4fe2c963154c ("io_uring: add support for link with drain")
>> Signed-off-by: Jens Axboe <[email protected]>
> 
> BTW, the other solution here is to not release the completion_lock if
> we're going to return -EIOCBQUEUED, and let the caller do what it needs
> before releasing it. That'd look something like this, with some sparse
> annotations to keep things happy.
> 
> I think the original I posted here is easier to follow, and the
> deferral list is going to be tiny in general so it won't really add
> any extra overhead.
> 
> Let me know what you think and prefer.
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 6175e2e195c0..0d1f33bcedc0 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2552,6 +2552,11 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
> 	return 0;
> }
> 
> +/*
> + * Returns with ctx->completion_lock held if -EIOCBQUEUED is returned, so
> + * the caller can make decisions based on the deferral without worrying about
> + * the request being found and issued in the mean time.
> + */
> static int io_req_defer(struct io_kiocb *req)
> {
> 	const struct io_uring_sqe *sqe = req->submit.sqe;
> @@ -2579,7 +2584,7 @@ static int io_req_defer(struct io_kiocb *req)
> 
> 	trace_io_uring_defer(ctx, req, false);
> 	list_add_tail(&req->list, &ctx->defer_list);
> -	spin_unlock_irq(&ctx->completion_lock);
> +	__release(&ctx->completion_lock);
> 	return -EIOCBQUEUED;
> }
> 
> @@ -2954,6 +2959,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
> 
> static void io_queue_sqe(struct io_kiocb *req)
> {
> +	struct io_ring_ctx *ctx = req->ctx;
> 	int ret;
> 
> 	ret = io_req_defer(req);
> @@ -2963,6 +2969,9 @@ static void io_queue_sqe(struct io_kiocb *req)
> 			if (req->flags & REQ_F_LINK)
> 				req->flags |= REQ_F_FAIL_LINK;
> 			io_double_put_req(req);
> +		} else {
> +			__acquire(&ctx->completion_lock);
> +			spin_unlock_irq(&ctx->completion_lock);
> 		}
> 	} else
> 		__io_queue_sqe(req);
> @@ -3001,16 +3010,17 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
> 				__io_free_req(shadow);
> 			return;
> 		}
> +		__acquire(&ctx->completion_lock);
> 	} else {
> 		/*
> 		 * If ret == 0 means that all IOs in front of link io are
> 		 * running done. let's queue link head.
> 		 */
> 		need_submit = true;
> +		spin_lock_irq(&ctx->completion_lock);
> 	}
> 
> 	/* Insert shadow req to defer_list, blocking next IOs */
> -	spin_lock_irq(&ctx->completion_lock);
> 	trace_io_uring_defer(ctx, shadow, true);
> 	list_add_tail(&shadow->list, &ctx->defer_list);
> 	spin_unlock_irq(&ctx->completion_lock);

This is indeed a potential lock issue, thanks, I am prefer this solution, clearer than first one.
But It may be a bit difficult for other people who read the code, use 'io_req_defer_may_lock'?

who about this?

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5ad652f..6fdaeb1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2469,7 +2469,7 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
        return 0;
 }

-static int io_req_defer(struct io_kiocb *req)
+static int __io_req_defer(struct io_kiocb *req)
 {
        const struct io_uring_sqe *sqe = req->submit.sqe;
        struct io_uring_sqe *sqe_copy;
@@ -2495,8 +2495,21 @@ static int io_req_defer(struct io_kiocb *req)

        trace_io_uring_defer(ctx, req, false);
        list_add_tail(&req->list, &ctx->defer_list);
+
+       return -EIOCBQUEUED;
+}
+
+static int io_req_defer(struct io_kiocb *req)
+{
+       int ret = __io_req_defer(req);
        spin_unlock_irq(&ctx->completion_lock);
-       return -EIOCBQUEUED;
+       return ret;
+}
+
+static int io_req_defer_may_lock(struct io_kiocb *req)
+{
+       return __io_req_defer(req);
+
 }

 static int __io_submit_sqe(struct io_kiocb *req, struct io_kiocb **nxt,
@@ -2927,7 +2940,7 @@ static int io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
         * list.
         */
        req->flags |= REQ_F_IO_DRAIN;
-       ret = io_req_defer(req);
+       ret = io_req_defer_may_lock(req);
        if (ret) {
                if (ret != -EIOCBQUEUED) {
                        io_cqring_add_event(req, ret);
@@ -2941,10 +2954,10 @@ static int io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
                 * running done. let's queue link head.
                 */
                need_submit = true;
+               spin_lock_irq(&ctx->completion_lock);
        }

        /* Insert shadow req to defer_list, blocking next IOs */
-       spin_lock_irq(&ctx->completion_lock);
        trace_io_uring_defer(ctx, shadow, true);
        list_add_tail(&shadow->list, &ctx->defer_list);
        spin_unlock_irq(&ctx->completion_lock);

--
BR, Jackie Liu




^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] io_uring: fix race with shadow drain deferrals
  2019-11-21  1:32   ` Jackie Liu
@ 2019-11-21  1:35     ` Jackie Liu
  2019-11-21  1:40       ` Jens Axboe
  2019-11-21  1:39     ` [PATCH] io_uring: fix race with shadow drain deferrals Jens Axboe
  1 sibling, 1 reply; 19+ messages in thread
From: Jackie Liu @ 2019-11-21  1:35 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring



> 2019年11月21日 09:32，Jackie Liu <[email protected]> 写道：
> 
> 2019年11月21日 07:58，Jens Axboe <[email protected]> 写道：
> 
>> 
>> On 11/20/19 4:07 PM, Jens Axboe wrote:
>>> When we go and queue requests with drain, we check if we need to defer
>>> based on sequence. This is done safely under the lock, but then we drop
>>> the lock before actually inserting the shadow. If the original request
>>> is found on the deferred list by another completion in the mean time,
>>> it could have been started AND completed by the time we insert the
>>> shadow, which will stall the queue.
>>> 
>>> After re-grabbing the completion lock, check if the original request is
>>> still in the deferred list. If it isn't, then we know that someone else
>>> already found and issued it. If that happened, then our job is done, we
>>> can simply free the shadow.
>>> 
>>> Cc: Jackie Liu <[email protected]>
>>> Fixes: 4fe2c963154c ("io_uring: add support for link with drain")
>>> Signed-off-by: Jens Axboe <[email protected]>
>> 
>> BTW, the other solution here is to not release the completion_lock if
>> we're going to return -EIOCBQUEUED, and let the caller do what it needs
>> before releasing it. That'd look something like this, with some sparse
>> annotations to keep things happy.
>> 
>> I think the original I posted here is easier to follow, and the
>> deferral list is going to be tiny in general so it won't really add
>> any extra overhead.
>> 
>> Let me know what you think and prefer.
>> 
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 6175e2e195c0..0d1f33bcedc0 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -2552,6 +2552,11 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>> 	return 0;
>> }
>> 
>> +/*
>> + * Returns with ctx->completion_lock held if -EIOCBQUEUED is returned, so
>> + * the caller can make decisions based on the deferral without worrying about
>> + * the request being found and issued in the mean time.
>> + */
>> static int io_req_defer(struct io_kiocb *req)
>> {
>> 	const struct io_uring_sqe *sqe = req->submit.sqe;
>> @@ -2579,7 +2584,7 @@ static int io_req_defer(struct io_kiocb *req)
>> 
>> 	trace_io_uring_defer(ctx, req, false);
>> 	list_add_tail(&req->list, &ctx->defer_list);
>> -	spin_unlock_irq(&ctx->completion_lock);
>> +	__release(&ctx->completion_lock);
>> 	return -EIOCBQUEUED;
>> }
>> 
>> @@ -2954,6 +2959,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
>> 
>> static void io_queue_sqe(struct io_kiocb *req)
>> {
>> +	struct io_ring_ctx *ctx = req->ctx;
>> 	int ret;
>> 
>> 	ret = io_req_defer(req);
>> @@ -2963,6 +2969,9 @@ static void io_queue_sqe(struct io_kiocb *req)
>> 			if (req->flags & REQ_F_LINK)
>> 				req->flags |= REQ_F_FAIL_LINK;
>> 			io_double_put_req(req);
>> +		} else {
>> +			__acquire(&ctx->completion_lock);
>> +			spin_unlock_irq(&ctx->completion_lock);
>> 		}
>> 	} else
>> 		__io_queue_sqe(req);
>> @@ -3001,16 +3010,17 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
>> 				__io_free_req(shadow);
>> 			return;
>> 		}
>> +		__acquire(&ctx->completion_lock);
>> 	} else {
>> 		/*
>> 		 * If ret == 0 means that all IOs in front of link io are
>> 		 * running done. let's queue link head.
>> 		 */
>> 		need_submit = true;
>> +		spin_lock_irq(&ctx->completion_lock);
>> 	}
>> 
>> 	/* Insert shadow req to defer_list, blocking next IOs */
>> -	spin_lock_irq(&ctx->completion_lock);
>> 	trace_io_uring_defer(ctx, shadow, true);
>> 	list_add_tail(&shadow->list, &ctx->defer_list);
>> 	spin_unlock_irq(&ctx->completion_lock);
> 
> This is indeed a potential lock issue, thanks, I am prefer this solution, clearer than first one.
> But It may be a bit difficult for other people who read the code, use 'io_req_defer_may_lock'?
> 
> who about this?
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 5ad652f..6fdaeb1 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2469,7 +2469,7 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>        return 0;
> }
> 
> -static int io_req_defer(struct io_kiocb *req)
> +static int __io_req_defer(struct io_kiocb *req)
> {
>        const struct io_uring_sqe *sqe = req->submit.sqe;
>        struct io_uring_sqe *sqe_copy;
> @@ -2495,8 +2495,21 @@ static int io_req_defer(struct io_kiocb *req)
> 
>        trace_io_uring_defer(ctx, req, false);
>        list_add_tail(&req->list, &ctx->defer_list);
> +
> +       return -EIOCBQUEUED;
> +}
> +
> +static int io_req_defer(struct io_kiocb *req)
> +{
> +       int ret = __io_req_defer(req);

There have an problem, need fix.

static int io_req_defer(struct io_kiocb *req)
{
	int ret = __io_req_defer(req);
	if (ret == -EIOCBQUEUED)
		spin_unlock_irq(&ctx->completion_lock);
	return ret;
}

>        spin_unlock_irq(&ctx->completion_lock);
> -       return-EIOCBQUEUED;
> +       return ret;
> +}
> +
> +static int io_req_defer_may_lock(struct io_kiocb *req)
> +{
> +       return __io_req_defer(req);
> +
> }
> 
> static int __io_submit_sqe(struct io_kiocb *req, struct io_kiocb **nxt,
> @@ -2927,7 +2940,7 @@ static int io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
>         * list.
>         */
>        req->flags |= REQ_F_IO_DRAIN;
> -       ret = io_req_defer(req);
> +       ret = io_req_defer_may_lock(req);
>        if (ret) {
>                if (ret != -EIOCBQUEUED) {
>                        io_cqring_add_event(req, ret);
> @@ -2941,10 +2954,10 @@ static int io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
>                 * running done. let's queue link head.
>                 */
>                need_submit = true;
> +               spin_lock_irq(&ctx->completion_lock);
>        }
> 
>        /* Insert shadow req to defer_list, blocking next IOs */
> -       spin_lock_irq(&ctx->completion_lock);
>        trace_io_uring_defer(ctx, shadow, true);
>        list_add_tail(&shadow->list, &ctx->defer_list);
>        spin_unlock_irq(&ctx->completion_lock);
> 
> --
> BR, Jackie Liu


--
BR, Jackie Liu




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] io_uring: fix race with shadow drain deferrals
  2019-11-21  1:35     ` Jackie Liu
@ 2019-11-21  1:40       ` Jens Axboe
  2019-11-21  1:49         ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2019-11-21  1:40 UTC (permalink / raw)
  To: Jackie Liu; +Cc: io-uring

On 11/20/19 6:35 PM, Jackie Liu wrote:
> 
> 
>> 2019年11月21日 09:32，Jackie Liu <[email protected]> 写道：
>>
>> 2019年11月21日 07:58，Jens Axboe <[email protected]> 写道：
>>
>>>
>>> On 11/20/19 4:07 PM, Jens Axboe wrote:
>>>> When we go and queue requests with drain, we check if we need to defer
>>>> based on sequence. This is done safely under the lock, but then we drop
>>>> the lock before actually inserting the shadow. If the original request
>>>> is found on the deferred list by another completion in the mean time,
>>>> it could have been started AND completed by the time we insert the
>>>> shadow, which will stall the queue.
>>>>
>>>> After re-grabbing the completion lock, check if the original request is
>>>> still in the deferred list. If it isn't, then we know that someone else
>>>> already found and issued it. If that happened, then our job is done, we
>>>> can simply free the shadow.
>>>>
>>>> Cc: Jackie Liu <[email protected]>
>>>> Fixes: 4fe2c963154c ("io_uring: add support for link with drain")
>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>
>>> BTW, the other solution here is to not release the completion_lock if
>>> we're going to return -EIOCBQUEUED, and let the caller do what it needs
>>> before releasing it. That'd look something like this, with some sparse
>>> annotations to keep things happy.
>>>
>>> I think the original I posted here is easier to follow, and the
>>> deferral list is going to be tiny in general so it won't really add
>>> any extra overhead.
>>>
>>> Let me know what you think and prefer.
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 6175e2e195c0..0d1f33bcedc0 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -2552,6 +2552,11 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>>> 	return 0;
>>> }
>>>
>>> +/*
>>> + * Returns with ctx->completion_lock held if -EIOCBQUEUED is returned, so
>>> + * the caller can make decisions based on the deferral without worrying about
>>> + * the request being found and issued in the mean time.
>>> + */
>>> static int io_req_defer(struct io_kiocb *req)
>>> {
>>> 	const struct io_uring_sqe *sqe = req->submit.sqe;
>>> @@ -2579,7 +2584,7 @@ static int io_req_defer(struct io_kiocb *req)
>>>
>>> 	trace_io_uring_defer(ctx, req, false);
>>> 	list_add_tail(&req->list, &ctx->defer_list);
>>> -	spin_unlock_irq(&ctx->completion_lock);
>>> +	__release(&ctx->completion_lock);
>>> 	return -EIOCBQUEUED;
>>> }
>>>
>>> @@ -2954,6 +2959,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
>>>
>>> static void io_queue_sqe(struct io_kiocb *req)
>>> {
>>> +	struct io_ring_ctx *ctx = req->ctx;
>>> 	int ret;
>>>
>>> 	ret = io_req_defer(req);
>>> @@ -2963,6 +2969,9 @@ static void io_queue_sqe(struct io_kiocb *req)
>>> 			if (req->flags & REQ_F_LINK)
>>> 				req->flags |= REQ_F_FAIL_LINK;
>>> 			io_double_put_req(req);
>>> +		} else {
>>> +			__acquire(&ctx->completion_lock);
>>> +			spin_unlock_irq(&ctx->completion_lock);
>>> 		}
>>> 	} else
>>> 		__io_queue_sqe(req);
>>> @@ -3001,16 +3010,17 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
>>> 				__io_free_req(shadow);
>>> 			return;
>>> 		}
>>> +		__acquire(&ctx->completion_lock);
>>> 	} else {
>>> 		/*
>>> 		 * If ret == 0 means that all IOs in front of link io are
>>> 		 * running done. let's queue link head.
>>> 		 */
>>> 		need_submit = true;
>>> +		spin_lock_irq(&ctx->completion_lock);
>>> 	}
>>>
>>> 	/* Insert shadow req to defer_list, blocking next IOs */
>>> -	spin_lock_irq(&ctx->completion_lock);
>>> 	trace_io_uring_defer(ctx, shadow, true);
>>> 	list_add_tail(&shadow->list, &ctx->defer_list);
>>> 	spin_unlock_irq(&ctx->completion_lock);
>>
>> This is indeed a potential lock issue, thanks, I am prefer this solution, clearer than first one.
>> But It may be a bit difficult for other people who read the code, use 'io_req_defer_may_lock'?
>>
>> who about this?
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 5ad652f..6fdaeb1 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -2469,7 +2469,7 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>>         return 0;
>> }
>>
>> -static int io_req_defer(struct io_kiocb *req)
>> +static int __io_req_defer(struct io_kiocb *req)
>> {
>>         const struct io_uring_sqe *sqe = req->submit.sqe;
>>         struct io_uring_sqe *sqe_copy;
>> @@ -2495,8 +2495,21 @@ static int io_req_defer(struct io_kiocb *req)
>>
>>         trace_io_uring_defer(ctx, req, false);
>>         list_add_tail(&req->list, &ctx->defer_list);
>> +
>> +       return -EIOCBQUEUED;
>> +}
>> +
>> +static int io_req_defer(struct io_kiocb *req)
>> +{
>> +       int ret = __io_req_defer(req);
> 
> There have an problem, need fix.
> 
> static int io_req_defer(struct io_kiocb *req)
> {
> 	int ret = __io_req_defer(req);
> 	if (ret == -EIOCBQUEUED)
> 		spin_unlock_irq(&ctx->completion_lock);
> 	return ret;
> }

Mid-air collision, indeed.

But as I wrote in the previous email, I don't think this one improves on
the situation... And fwiw, I did test both of mine, both are verified to
fix the issue.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] io_uring: fix race with shadow drain deferrals
  2019-11-21  1:40       ` Jens Axboe
@ 2019-11-21  1:49         ` Jens Axboe
  2019-11-21  1:57           ` Jackie Liu
  2019-11-21  8:54           ` [PATCH] io_uring: drain next sqe instead of shadowing Pavel Begunkov
  0 siblings, 2 replies; 19+ messages in thread
From: Jens Axboe @ 2019-11-21  1:49 UTC (permalink / raw)
  To: Jackie Liu; +Cc: io-uring

On 11/20/19 6:40 PM, Jens Axboe wrote:
> On 11/20/19 6:35 PM, Jackie Liu wrote:
>>
>>
>>> 2019年11月21日 09:32，Jackie Liu <[email protected]> 写道：
>>>
>>> 2019年11月21日 07:58，Jens Axboe <[email protected]> 写道：
>>>
>>>>
>>>> On 11/20/19 4:07 PM, Jens Axboe wrote:
>>>>> When we go and queue requests with drain, we check if we need to defer
>>>>> based on sequence. This is done safely under the lock, but then we drop
>>>>> the lock before actually inserting the shadow. If the original request
>>>>> is found on the deferred list by another completion in the mean time,
>>>>> it could have been started AND completed by the time we insert the
>>>>> shadow, which will stall the queue.
>>>>>
>>>>> After re-grabbing the completion lock, check if the original request is
>>>>> still in the deferred list. If it isn't, then we know that someone else
>>>>> already found and issued it. If that happened, then our job is done, we
>>>>> can simply free the shadow.
>>>>>
>>>>> Cc: Jackie Liu <[email protected]>
>>>>> Fixes: 4fe2c963154c ("io_uring: add support for link with drain")
>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>
>>>> BTW, the other solution here is to not release the completion_lock if
>>>> we're going to return -EIOCBQUEUED, and let the caller do what it needs
>>>> before releasing it. That'd look something like this, with some sparse
>>>> annotations to keep things happy.
>>>>
>>>> I think the original I posted here is easier to follow, and the
>>>> deferral list is going to be tiny in general so it won't really add
>>>> any extra overhead.
>>>>
>>>> Let me know what you think and prefer.
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 6175e2e195c0..0d1f33bcedc0 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -2552,6 +2552,11 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>>>> 	return 0;
>>>> }
>>>>
>>>> +/*
>>>> + * Returns with ctx->completion_lock held if -EIOCBQUEUED is returned, so
>>>> + * the caller can make decisions based on the deferral without worrying about
>>>> + * the request being found and issued in the mean time.
>>>> + */
>>>> static int io_req_defer(struct io_kiocb *req)
>>>> {
>>>> 	const struct io_uring_sqe *sqe = req->submit.sqe;
>>>> @@ -2579,7 +2584,7 @@ static int io_req_defer(struct io_kiocb *req)
>>>>
>>>> 	trace_io_uring_defer(ctx, req, false);
>>>> 	list_add_tail(&req->list, &ctx->defer_list);
>>>> -	spin_unlock_irq(&ctx->completion_lock);
>>>> +	__release(&ctx->completion_lock);
>>>> 	return -EIOCBQUEUED;
>>>> }
>>>>
>>>> @@ -2954,6 +2959,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
>>>>
>>>> static void io_queue_sqe(struct io_kiocb *req)
>>>> {
>>>> +	struct io_ring_ctx *ctx = req->ctx;
>>>> 	int ret;
>>>>
>>>> 	ret = io_req_defer(req);
>>>> @@ -2963,6 +2969,9 @@ static void io_queue_sqe(struct io_kiocb *req)
>>>> 			if (req->flags & REQ_F_LINK)
>>>> 				req->flags |= REQ_F_FAIL_LINK;
>>>> 			io_double_put_req(req);
>>>> +		} else {
>>>> +			__acquire(&ctx->completion_lock);
>>>> +			spin_unlock_irq(&ctx->completion_lock);
>>>> 		}
>>>> 	} else
>>>> 		__io_queue_sqe(req);
>>>> @@ -3001,16 +3010,17 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
>>>> 				__io_free_req(shadow);
>>>> 			return;
>>>> 		}
>>>> +		__acquire(&ctx->completion_lock);
>>>> 	} else {
>>>> 		/*
>>>> 		 * If ret == 0 means that all IOs in front of link io are
>>>> 		 * running done. let's queue link head.
>>>> 		 */
>>>> 		need_submit = true;
>>>> +		spin_lock_irq(&ctx->completion_lock);
>>>> 	}
>>>>
>>>> 	/* Insert shadow req to defer_list, blocking next IOs */
>>>> -	spin_lock_irq(&ctx->completion_lock);
>>>> 	trace_io_uring_defer(ctx, shadow, true);
>>>> 	list_add_tail(&shadow->list, &ctx->defer_list);
>>>> 	spin_unlock_irq(&ctx->completion_lock);
>>>
>>> This is indeed a potential lock issue, thanks, I am prefer this solution, clearer than first one.
>>> But It may be a bit difficult for other people who read the code, use 'io_req_defer_may_lock'?
>>>
>>> who about this?
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 5ad652f..6fdaeb1 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -2469,7 +2469,7 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>>>          return 0;
>>> }
>>>
>>> -static int io_req_defer(struct io_kiocb *req)
>>> +static int __io_req_defer(struct io_kiocb *req)
>>> {
>>>          const struct io_uring_sqe *sqe = req->submit.sqe;
>>>          struct io_uring_sqe *sqe_copy;
>>> @@ -2495,8 +2495,21 @@ static int io_req_defer(struct io_kiocb *req)
>>>
>>>          trace_io_uring_defer(ctx, req, false);
>>>          list_add_tail(&req->list, &ctx->defer_list);
>>> +
>>> +       return -EIOCBQUEUED;
>>> +}
>>> +
>>> +static int io_req_defer(struct io_kiocb *req)
>>> +{
>>> +       int ret = __io_req_defer(req);
>>
>> There have an problem, need fix.
>>
>> static int io_req_defer(struct io_kiocb *req)
>> {
>> 	int ret = __io_req_defer(req);
>> 	if (ret == -EIOCBQUEUED)
>> 		spin_unlock_irq(&ctx->completion_lock);
>> 	return ret;
>> }
> 
> Mid-air collision, indeed.
> 
> But as I wrote in the previous email, I don't think this one improves on
> the situation... And fwiw, I did test both of mine, both are verified to
> fix the issue.

Maybe we can compromise on something like this? Doesn't introduce any
may_lock() naming, just uses the __io_req_defer() to take that blame.
And uses the right sparse annotations to keep things happy with C=2 as
well. Uses your trick to make io_req_defer() do the lock drop for the
other caller.

Ran it through 400x rounds of testing, confirmed as well.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6175e2e195c0..299a218e9552 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2552,7 +2552,12 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	return 0;
 }
 
-static int io_req_defer(struct io_kiocb *req)
+/*
+ * Returns with ctx->completion_lock held if -EIOCBQUEUED is returned, so
+ * the caller can make decisions based on the deferral without worrying about
+ * the request being found and issued in the mean time.
+ */
+static int __io_req_defer(struct io_kiocb *req)
 {
 	const struct io_uring_sqe *sqe = req->submit.sqe;
 	struct io_uring_sqe *sqe_copy;
@@ -2579,10 +2584,23 @@ static int io_req_defer(struct io_kiocb *req)
 
 	trace_io_uring_defer(ctx, req, false);
 	list_add_tail(&req->list, &ctx->defer_list);
-	spin_unlock_irq(&ctx->completion_lock);
+	__release(&ctx->completion_lock);
 	return -EIOCBQUEUED;
 }
 
+static int io_req_defer(struct io_kiocb *req)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	int ret;
+
+	ret = __io_req_defer(req);
+	if (ret == -EIOCBQUEUED) {
+		__acquire(&ctx->completion_lock);
+		spin_unlock_irq(&ctx->completion_lock);
+	}
+	return ret;
+}
+
 static int __io_submit_sqe(struct io_kiocb *req, struct io_kiocb **nxt,
 			   bool force_nonblock)
 {
@@ -2957,15 +2975,14 @@ static void io_queue_sqe(struct io_kiocb *req)
 	int ret;
 
 	ret = io_req_defer(req);
-	if (ret) {
-		if (ret != -EIOCBQUEUED) {
-			io_cqring_add_event(req, ret);
-			if (req->flags & REQ_F_LINK)
-				req->flags |= REQ_F_FAIL_LINK;
-			io_double_put_req(req);
-		}
-	} else
+	if (!ret) {
 		__io_queue_sqe(req);
+	} else if (ret != -EIOCBQUEUED) {
+		io_cqring_add_event(req, ret);
+		if (req->flags & REQ_F_LINK)
+			req->flags |= REQ_F_FAIL_LINK;
+		io_double_put_req(req);
+	}
 }
 
 static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
@@ -2989,7 +3006,7 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
 	 * list.
 	 */
 	req->flags |= REQ_F_IO_DRAIN;
-	ret = io_req_defer(req);
+	ret = __io_req_defer(req);
 	if (ret) {
 		if (ret != -EIOCBQUEUED) {
 err:
@@ -3001,16 +3018,17 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
 				__io_free_req(shadow);
 			return;
 		}
+		__acquire(&ctx->completion_lock);
 	} else {
 		/*
 		 * If ret == 0 means that all IOs in front of link io are
 		 * running done. let's queue link head.
 		 */
 		need_submit = true;
+		spin_lock_irq(&ctx->completion_lock);
 	}
 
 	/* Insert shadow req to defer_list, blocking next IOs */
-	spin_lock_irq(&ctx->completion_lock);
 	trace_io_uring_defer(ctx, shadow, true);
 	list_add_tail(&shadow->list, &ctx->defer_list);
 	spin_unlock_irq(&ctx->completion_lock);

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] io_uring: fix race with shadow drain deferrals
  2019-11-21  1:49         ` Jens Axboe
@ 2019-11-21  1:57           ` Jackie Liu
  2019-11-20 23:14             ` Jens Axboe
  2019-11-21  8:54           ` [PATCH] io_uring: drain next sqe instead of shadowing Pavel Begunkov
  1 sibling, 1 reply; 19+ messages in thread
From: Jackie Liu @ 2019-11-21  1:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring



> 2019年11月21日 09:49，Jens Axboe <[email protected]> 写道：
> 
> On 11/20/19 6:40 PM, Jens Axboe wrote:
>> On 11/20/19 6:35 PM, Jackie Liu wrote:
>>> 
>>> 
>>>> 2019年11月21日 09:32，Jackie Liu <[email protected]> 写道：
>>>> 
>>>> 2019年11月21日 07:58，Jens Axboe <[email protected]> 写道：
>>>> 
>>>>> 
>>>>> On 11/20/19 4:07 PM, Jens Axboe wrote:
>>>>>> When we go and queue requests with drain, we check if we need to defer
>>>>>> based on sequence. This is done safely under the lock, but then we drop
>>>>>> the lock before actually inserting the shadow. If the original request
>>>>>> is found on the deferred list by another completion in the mean time,
>>>>>> it could have been started AND completed by the time we insert the
>>>>>> shadow, which will stall the queue.
>>>>>> 
>>>>>> After re-grabbing the completion lock, check if the original request is
>>>>>> still in the deferred list. If it isn't, then we know that someone else
>>>>>> already found and issued it. If that happened, then our job is done, we
>>>>>> can simply free the shadow.
>>>>>> 
>>>>>> Cc: Jackie Liu <[email protected]>
>>>>>> Fixes: 4fe2c963154c ("io_uring: add support for link with drain")
>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>> 
>>>>> BTW, the other solution here is to not release the completion_lock if
>>>>> we're going to return -EIOCBQUEUED, and let the caller do what it needs
>>>>> before releasing it. That'd look something like this, with some sparse
>>>>> annotations to keep things happy.
>>>>> 
>>>>> I think the original I posted here is easier to follow, and the
>>>>> deferral list is going to be tiny in general so it won't really add
>>>>> any extra overhead.
>>>>> 
>>>>> Let me know what you think and prefer.
>>>>> 
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index 6175e2e195c0..0d1f33bcedc0 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -2552,6 +2552,11 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>>>>> 	return 0;
>>>>> }
>>>>> 
>>>>> +/*
>>>>> + * Returns with ctx->completion_lock held if -EIOCBQUEUED is returned, so
>>>>> + * the caller can make decisions based on the deferral without worrying about
>>>>> + * the request being found and issued in the mean time.
>>>>> + */
>>>>> static int io_req_defer(struct io_kiocb *req)
>>>>> {
>>>>> 	const struct io_uring_sqe *sqe = req->submit.sqe;
>>>>> @@ -2579,7 +2584,7 @@ static int io_req_defer(struct io_kiocb *req)
>>>>> 
>>>>> 	trace_io_uring_defer(ctx, req, false);
>>>>> 	list_add_tail(&req->list, &ctx->defer_list);
>>>>> -	spin_unlock_irq(&ctx->completion_lock);
>>>>> +	__release(&ctx->completion_lock);
>>>>> 	return -EIOCBQUEUED;
>>>>> }
>>>>> 
>>>>> @@ -2954,6 +2959,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
>>>>> 
>>>>> static void io_queue_sqe(struct io_kiocb *req)
>>>>> {
>>>>> +	struct io_ring_ctx *ctx = req->ctx;
>>>>> 	int ret;
>>>>> 
>>>>> 	ret = io_req_defer(req);
>>>>> @@ -2963,6 +2969,9 @@ static void io_queue_sqe(struct io_kiocb *req)
>>>>> 			if (req->flags & REQ_F_LINK)
>>>>> 				req->flags |= REQ_F_FAIL_LINK;
>>>>> 			io_double_put_req(req);
>>>>> +		} else {
>>>>> +			__acquire(&ctx->completion_lock);
>>>>> +			spin_unlock_irq(&ctx->completion_lock);
>>>>> 		}
>>>>> 	} else
>>>>> 		__io_queue_sqe(req);
>>>>> @@ -3001,16 +3010,17 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
>>>>> 				__io_free_req(shadow);
>>>>> 			return;
>>>>> 		}
>>>>> +		__acquire(&ctx->completion_lock);
>>>>> 	} else {
>>>>> 		/*
>>>>> 		 * If ret == 0 means that all IOs in front of link io are
>>>>> 		 * running done. let's queue link head.
>>>>> 		 */
>>>>> 		need_submit = true;
>>>>> +		spin_lock_irq(&ctx->completion_lock);
>>>>> 	}
>>>>> 
>>>>> 	/* Insert shadow req to defer_list, blocking next IOs */
>>>>> -	spin_lock_irq(&ctx->completion_lock);
>>>>> 	trace_io_uring_defer(ctx, shadow, true);
>>>>> 	list_add_tail(&shadow->list, &ctx->defer_list);
>>>>> 	spin_unlock_irq(&ctx->completion_lock);
>>>> 
>>>> This is indeed a potential lock issue, thanks, I am prefer this solution, clearer than first one.
>>>> But It may be a bit difficult for other people who read the code, use 'io_req_defer_may_lock'?
>>>> 
>>>> who about this?
>>>> 
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 5ad652f..6fdaeb1 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -2469,7 +2469,7 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>>>>         return 0;
>>>> }
>>>> 
>>>> -static int io_req_defer(struct io_kiocb *req)
>>>> +static int __io_req_defer(struct io_kiocb *req)
>>>> {
>>>>         const struct io_uring_sqe *sqe = req->submit.sqe;
>>>>         struct io_uring_sqe *sqe_copy;
>>>> @@ -2495,8 +2495,21 @@ static int io_req_defer(struct io_kiocb *req)
>>>> 
>>>>         trace_io_uring_defer(ctx, req, false);
>>>>         list_add_tail(&req->list, &ctx->defer_list);
>>>> +
>>>> +       return -EIOCBQUEUED;
>>>> +}
>>>> +
>>>> +static int io_req_defer(struct io_kiocb *req)
>>>> +{
>>>> +       int ret = __io_req_defer(req);
>>> 
>>> There have an problem, need fix.
>>> 
>>> static int io_req_defer(struct io_kiocb *req)
>>> {
>>> 	int ret = __io_req_defer(req);
>>> 	if (ret == -EIOCBQUEUED)
>>> 		spin_unlock_irq(&ctx->completion_lock);
>>> 	return ret;
>>> }
>> 
>> Mid-air collision, indeed.
>> 
>> But as I wrote in the previous email, I don't think this one improves on
>> the situation... And fwiw, I did test both of mine, both are verified to
>> fix the issue.
> 
> Maybe we can compromise on something like this? Doesn't introduce any
> may_lock() naming, just uses the __io_req_defer() to take that blame.
> And uses the right sparse annotations to keep things happy with C=2 as
> well. Uses your trick to make io_req_defer() do the lock drop for the
> other caller.
> 
> Ran it through 400x rounds of testing, confirmed as well.
> 
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 6175e2e195c0..299a218e9552 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2552,7 +2552,12 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
> 	return 0;
> }
> 
> -static int io_req_defer(struct io_kiocb *req)
> +/*
> + * Returns with ctx->completion_lock held if -EIOCBQUEUED is returned, so
> + * the caller can make decisions based on the deferral without worrying about
> + * the request being found and issued in the mean time.
> + */
> +static int __io_req_defer(struct io_kiocb *req)
> {
> 	const struct io_uring_sqe *sqe = req->submit.sqe;
> 	struct io_uring_sqe *sqe_copy;
> @@ -2579,10 +2584,23 @@ static int io_req_defer(struct io_kiocb *req)
> 
> 	trace_io_uring_defer(ctx, req, false);
> 	list_add_tail(&req->list, &ctx->defer_list);
> -	spin_unlock_irq(&ctx->completion_lock);
> +	__release(&ctx->completion_lock);
> 	return -EIOCBQUEUED;
> }
> 
> +static int io_req_defer(struct io_kiocb *req)
> +{
> +	struct io_ring_ctx *ctx = req->ctx;
> +	int ret;
> +
> +	ret = __io_req_defer(req);
> +	if (ret == -EIOCBQUEUED) {
> +		__acquire(&ctx->completion_lock);
> +		spin_unlock_irq(&ctx->completion_lock);
> +	}
> +	return ret;
> +}
> +
> static int __io_submit_sqe(struct io_kiocb *req, struct io_kiocb **nxt,
> 			   bool force_nonblock)
> {
> @@ -2957,15 +2975,14 @@ static void io_queue_sqe(struct io_kiocb *req)
> 	int ret;
> 
> 	ret = io_req_defer(req);
> -	if (ret) {
> -		if (ret != -EIOCBQUEUED) {
> -			io_cqring_add_event(req, ret);
> -			if (req->flags & REQ_F_LINK)
> -				req->flags |= REQ_F_FAIL_LINK;
> -			io_double_put_req(req);
> -		}
> -	} else
> +	if (!ret) {
> 		__io_queue_sqe(req);
> +	} else if (ret != -EIOCBQUEUED) {
> +		io_cqring_add_event(req, ret);
> +		if (req->flags & REQ_F_LINK)
> +			req->flags |= REQ_F_FAIL_LINK;
> +		io_double_put_req(req);
> +	}
> }

Hmm.. Why we need rewrite there? clear code? Seems to be unrelated to this issue.

> 
> static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
> @@ -2989,7 +3006,7 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
> 	 * list.
> 	 */
> 	req->flags |= REQ_F_IO_DRAIN;
> -	ret = io_req_defer(req);
> +	ret = __io_req_defer(req);
> 	if (ret) {
> 		if (ret != -EIOCBQUEUED) {
> err:
> @@ -3001,16 +3018,17 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
> 				__io_free_req(shadow);
> 			return;
> 		}
> +		__acquire(&ctx->completion_lock);
> 	} else {
> 		/*
> 		 * If ret == 0 means that all IOs in front of link io are
> 		 * running done. let's queue link head.
> 		 */
> 		need_submit = true;
> +		spin_lock_irq(&ctx->completion_lock);
> 	}
> 
> 	/* Insert shadow req to defer_list, blocking next IOs */
> -	spin_lock_irq(&ctx->completion_lock);
> 	trace_io_uring_defer(ctx, shadow, true);
> 	list_add_tail(&shadow->list, &ctx->defer_list);
> 	spin_unlock_irq(&ctx->completion_lock);

--
BR, Jackie Liu




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] io_uring: fix race with shadow drain deferrals
  2019-11-21  1:57           ` Jackie Liu
@ 2019-11-20 23:14             ` Jens Axboe
       [not found]               ` <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2019-11-20 23:14 UTC (permalink / raw)
  To: Jackie Liu; +Cc: io-uring

On 11/20/19 6:57 PM, Jackie Liu wrote:
>> @@ -2957,15 +2975,14 @@ static void io_queue_sqe(struct io_kiocb *req)
>> 	int ret;
>>
>> 	ret = io_req_defer(req);
>> -	if (ret) {
>> -		if (ret != -EIOCBQUEUED) {
>> -			io_cqring_add_event(req, ret);
>> -			if (req->flags & REQ_F_LINK)
>> -				req->flags |= REQ_F_FAIL_LINK;
>> -			io_double_put_req(req);
>> -		}
>> -	} else
>> +	if (!ret) {
>> 		__io_queue_sqe(req);
>> +	} else if (ret != -EIOCBQUEUED) {
>> +		io_cqring_add_event(req, ret);
>> +		if (req->flags & REQ_F_LINK)
>> +			req->flags |= REQ_F_FAIL_LINK;
>> +		io_double_put_req(req);
>> +	}
>> }
> 
> Hmm.. Why we need rewrite there? clear code? Seems to be unrelated to
> this issue.

We don't need to, and the previous patch touched it, but it's much
easier to read after this change. Before it was:

if (ret) {
	if (ret != -EIOCBQUEUED) {
		...
	}
} else {
	...
}

which is now just

if (!ret) {
	...
} else if (ret != -EIOCBQUEUED) {
	...
}

not related to the change really, but kind of silly to make a separate
patch for imho.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <[email protected]>]

* Re: [PATCH] io_uring: fix race with shadow drain deferrals
       [not found]               ` <[email protected]>
@ 2019-11-20 23:03                 ` Jens Axboe
  0 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2019-11-20 23:03 UTC (permalink / raw)
  To: Jackie Liu; +Cc: io-uring

On 11/20/19 7:07 PM, Jackie Liu wrote:
> 
> 
>> 2019年11月21日 07:14，Jens Axboe <[email protected]> 写道：
>>
>> On 11/20/19 6:57 PM, Jackie Liu wrote:
>>>> @@ -2957,15 +2975,14 @@ static void io_queue_sqe(struct io_kiocb *req)
>>>> 	int ret;
>>>>
>>>> 	ret = io_req_defer(req);
>>>> -	if (ret) {
>>>> -		if (ret != -EIOCBQUEUED) {
>>>> -			io_cqring_add_event(req, ret);
>>>> -			if (req->flags & REQ_F_LINK)
>>>> -				req->flags |= REQ_F_FAIL_LINK;
>>>> -			io_double_put_req(req);
>>>> -		}
>>>> -	} else
>>>> +	if (!ret) {
>>>> 		__io_queue_sqe(req);
>>>> +	} else if (ret != -EIOCBQUEUED) {
>>>> +		io_cqring_add_event(req, ret);
>>>> +		if (req->flags & REQ_F_LINK)
>>>> +			req->flags |= REQ_F_FAIL_LINK;
>>>> +		io_double_put_req(req);
>>>> +	}
>>>> }
>>>
>>> Hmm.. Why we need rewrite there? clear code? Seems to be unrelated to
>>> this issue.
>>
>> We don't need to, and the previous patch touched it, but it's much
>> easier to read after this change. Before it was:
>>
>> if (ret) {
>> 	if (ret != -EIOCBQUEUED) {
>> 		...
>> 	}
>> } else {
>> 	...
>> }
>>
>> which is now just
>>
>> if (!ret) {
>> 	...
>> } else if (ret != -EIOCBQUEUED) {
>> 	...
>> }
>>
>> not related to the change really, but kind of silly to make a separate
>> patch for imho.
>>
> 
> Understand, thanks for explaining and fixing this problem,
> And now, please add:
> 
> Reviewed-by: Jackie Liu <[email protected]>

Thanks for reviewing it, I've added your reviewed-by.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] io_uring: drain next sqe instead of shadowing
  2019-11-21  1:49         ` Jens Axboe
  2019-11-21  1:57           ` Jackie Liu
@ 2019-11-21  8:54           ` Pavel Begunkov
       [not found]             ` <[email protected]>
  1 sibling, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-21  8:54 UTC (permalink / raw)
  To: Jens Axboe, io-uring, iuyun01

If there is a DRAIN in the middle of a link, it uses shadow req. Defer
the next request/link instead. This:

Pros:
1. removes semi-duplicated code
2. doesn't allocate memory for shadows
3. works better if only the head marked for drain
4. doesn't need complex synchronisation

Cons:
1. removes shadow->seq = last_draind_in_link->seq optimisation
It shouldn't be a common case, and can be added back

Signed-off-by: Pavel Begunkov <[email protected]>

---
Hmm... How about to go in another way and just remove this shadow
requests? I think this patch makes things easier.


 fs/io_uring.c | 86 +++++++++++----------------------------------------
 1 file changed, 18 insertions(+), 68 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6175e2e195c0..dd220f415c39 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -186,6 +186,7 @@ struct io_ring_ctx {
 		bool			compat;
 		bool			account_mem;
 		bool			cq_overflow_flushed;
+		bool			drain_next;
 
 		/*
 		 * Ring buffer of indices into array of io_uring_sqe, which is
@@ -346,7 +347,7 @@ struct io_kiocb {
 #define REQ_F_LINK		64	/* linked sqes */
 #define REQ_F_LINK_TIMEOUT	128	/* has linked timeout */
 #define REQ_F_FAIL_LINK		256	/* fail rest of links */
-#define REQ_F_SHADOW_DRAIN	512	/* link-drain shadow req */
+#define REQ_F_DRAIN_LINK	512	/* link should be fully drained */
 #define REQ_F_TIMEOUT		1024	/* timeout request */
 #define REQ_F_ISREG		2048	/* regular file */
 #define REQ_F_MUST_PUNT		4096	/* must be punted even for NONBLOCK */
@@ -615,11 +616,6 @@ static void io_commit_cqring(struct io_ring_ctx *ctx)
 	__io_commit_cqring(ctx);
 
 	while ((req = io_get_deferred_req(ctx)) != NULL) {
-		if (req->flags & REQ_F_SHADOW_DRAIN) {
-			/* Just for drain, free it. */
-			__io_free_req(req);
-			continue;
-		}
 		req->flags |= REQ_F_IO_DRAINED;
 		io_queue_async_work(req);
 	}
@@ -2956,6 +2952,12 @@ static void io_queue_sqe(struct io_kiocb *req)
 {
 	int ret;
 
+	if (unlikely(req->ctx->drain_next)) {
+		req->flags |= REQ_F_IO_DRAIN;
+		req->ctx->drain_next = false;
+	}
+	req->ctx->drain_next = (req->flags & REQ_F_DRAIN_LINK);
+
 	ret = io_req_defer(req);
 	if (ret) {
 		if (ret != -EIOCBQUEUED) {
@@ -2968,57 +2970,16 @@ static void io_queue_sqe(struct io_kiocb *req)
 		__io_queue_sqe(req);
 }
 
-static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
+static inline void io_queue_link_head(struct io_kiocb *req)
 {
-	int ret;
-	int need_submit = false;
-	struct io_ring_ctx *ctx = req->ctx;
-
 	if (unlikely(req->flags & REQ_F_FAIL_LINK)) {
-		ret = -ECANCELED;
-		goto err;
-	}
-	if (!shadow) {
+		io_cqring_add_event(req, -ECANCELED);
+		io_double_put_req(req);
+	} else
 		io_queue_sqe(req);
-		return;
-	}
-
-	/*
-	 * Mark the first IO in link list as DRAIN, let all the following
-	 * IOs enter the defer list. all IO needs to be completed before link
-	 * list.
-	 */
-	req->flags |= REQ_F_IO_DRAIN;
-	ret = io_req_defer(req);
-	if (ret) {
-		if (ret != -EIOCBQUEUED) {
-err:
-			io_cqring_add_event(req, ret);
-			if (req->flags & REQ_F_LINK)
-				req->flags |= REQ_F_FAIL_LINK;
-			io_double_put_req(req);
-			if (shadow)
-				__io_free_req(shadow);
-			return;
-		}
-	} else {
-		/*
-		 * If ret == 0 means that all IOs in front of link io are
-		 * running done. let's queue link head.
-		 */
-		need_submit = true;
-	}
-
-	/* Insert shadow req to defer_list, blocking next IOs */
-	spin_lock_irq(&ctx->completion_lock);
-	trace_io_uring_defer(ctx, shadow, true);
-	list_add_tail(&shadow->list, &ctx->defer_list);
-	spin_unlock_irq(&ctx->completion_lock);
-
-	if (need_submit)
-		__io_queue_sqe(req);
 }
 
+
 #define SQE_VALID_FLAGS	(IOSQE_FIXED_FILE|IOSQE_IO_DRAIN|IOSQE_IO_LINK)
 
 static void io_submit_sqe(struct io_kiocb *req, struct io_submit_state *state,
@@ -3055,6 +3016,9 @@ static void io_submit_sqe(struct io_kiocb *req, struct io_submit_state *state,
 		struct io_kiocb *prev = *link;
 		struct io_uring_sqe *sqe_copy;
 
+		if (s->sqe->flags & IOSQE_IO_DRAIN)
+			(*link)->flags |= REQ_F_DRAIN_LINK | REQ_F_IO_DRAIN;
+
 		if (READ_ONCE(s->sqe->opcode) == IORING_OP_LINK_TIMEOUT) {
 			ret = io_timeout_setup(req);
 			/* common setup allows offset being set, we don't */
@@ -3173,7 +3137,6 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
 {
 	struct io_submit_state state, *statep = NULL;
 	struct io_kiocb *link = NULL;
-	struct io_kiocb *shadow_req = NULL;
 	int i, submitted = 0;
 	bool mm_fault = false;
 
@@ -3212,18 +3175,6 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
 
 		sqe_flags = req->submit.sqe->flags;
 
-		if (link && (sqe_flags & IOSQE_IO_DRAIN)) {
-			if (!shadow_req) {
-				shadow_req = io_get_req(ctx, NULL);
-				if (unlikely(!shadow_req))
-					goto out;
-				shadow_req->flags |= (REQ_F_IO_DRAIN | REQ_F_SHADOW_DRAIN);
-				refcount_dec(&shadow_req->refs);
-			}
-			shadow_req->sequence = req->submit.sequence;
-		}
-
-out:
 		req->submit.ring_file = ring_file;
 		req->submit.ring_fd = ring_fd;
 		req->submit.has_user = *mm != NULL;
@@ -3239,14 +3190,13 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
 		 * that's the end of the chain. Submit the previous link.
 		 */
 		if (!(sqe_flags & IOSQE_IO_LINK) && link) {
-			io_queue_link_head(link, shadow_req);
+			io_queue_link_head(link);
 			link = NULL;
-			shadow_req = NULL;
 		}
 	}
 
 	if (link)
-		io_queue_link_head(link, shadow_req);
+		io_queue_link_head(link);
 	if (statep)
 		io_submit_state_end(&state);
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

[parent not found: <[email protected]>]

* Re: [PATCH] io_uring: drain next sqe instead of shadowing
       [not found]             ` <[email protected]>
@ 2019-11-21  9:43               ` Pavel Begunkov
       [not found]                 ` <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-21  9:43 UTC (permalink / raw)
  To: Jackie Liu; +Cc: Jens Axboe, io-uring

On 11/21/2019 12:26 PM, Jackie Liu wrote:
> 2019年11月21日 16:54，Pavel Begunkov <[email protected]> 写道：
>>
>> If there is a DRAIN in the middle of a link, it uses shadow req. Defer
>> the next request/link instead. This:
>>
>> Pros:
>> 1. removes semi-duplicated code
>> 2. doesn't allocate memory for shadows
>> 3. works better if only the head marked for drain
> 
> I thought about this before, just only drain the head, but if the latter IO depends
> on the link-list, then latter IO will run in front of the link-list. If we think it
> is acceptable, then I think it is ok for me.

If I got your point right, latter requests won't run ahead of the
link-list. There shouldn't be change of behaviour.

The purpose of shadow requests is to mark some request right ahead of
the link for draining. This patch uses not a specially added shadow
request, but the following regular one. And, as drained IO shouldn't be
issued until every request behind completed, this should give the same
effect.

Am I missed something?

Just to notice, @drain_next is in @ctx, so it's preserved across
multiple io_enter_uring().
> --
> BR, Jackie Liu
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <[email protected]>]

* Re: [PATCH] io_uring: drain next sqe instead of shadowing
       [not found]                 ` <[email protected]>
@ 2019-11-21 12:40                   ` Pavel Begunkov
       [not found]                     ` <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-21 12:40 UTC (permalink / raw)
  To: Jackie Liu; +Cc: Jens Axboe, io-uring

> 在 2019/11/21 17:43, Pavel Begunkov 写道:
>> On 11/21/2019 12:26 PM, Jackie Liu wrote:
>>> 2019年11月21日 16:54，Pavel Begunkov <[email protected]> 写道：
>>>>
>>>> If there is a DRAIN in the middle of a link, it uses shadow req. Defer
>>>> the next request/link instead. This:
>>>>
>>>> Pros:
>>>> 1. removes semi-duplicated code
>>>> 2. doesn't allocate memory for shadows
>>>> 3. works better if only the head marked for drain
>>>
>>> I thought about this before, just only drain the head, but if the
>>> latter IO depends
>>> on the link-list, then latter IO will run in front of the link-list.
>>> If we think it
>>> is acceptable, then I think it is ok for me.
>>
>> If I got your point right, latter requests won't run ahead of the
>> link-list. There shouldn't be change of behaviour.
>>
>> The purpose of shadow requests is to mark some request right ahead of
>> the link for draining. This patch uses not a specially added shadow
>> request, but the following regular one. And, as drained IO shouldn't be
>> issued until every request behind completed, this should give the same
>> effect.
>>
>> Am I missed something?
> 
> Thanks for explaining. This is also correct, if I understand
> correctly, It seems that other IOs will wait for all the links are
> done. this is a little different, is it?

Yes, you're right, it also was briefly stated in the patch description
(see Cons). I hope, links + drain in the middle is an uncommon case.
But it can be added back, but may become a little bit uglier.

What do you think, should we care about this case?

> 
> -- 
> Jackie Liu
> 
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <[email protected]>]

* Re: [PATCH] io_uring: drain next sqe instead of shadowing
       [not found]                     ` <[email protected]>
@ 2019-11-21 13:47                       ` Jens Axboe
       [not found]                         ` <[email protected]>
       [not found]                         ` <[email protected]>
  0 siblings, 2 replies; 19+ messages in thread
From: Jens Axboe @ 2019-11-21 13:47 UTC (permalink / raw)
  To: Jackie Liu, Pavel Begunkov; +Cc: io-uring

On 11/21/19 5:49 AM, Jackie Liu wrote:
> 
> 
> On 2019/11/21 20:40, Pavel Begunkov wrote:
>>> 在 2019/11/21 17:43, Pavel Begunkov 写道:
>>>> On 11/21/2019 12:26 PM, Jackie Liu wrote:
>>>>> 2019年11月21日 16:54，Pavel Begunkov <[email protected]> 写道：
>>>>>>
>>>>>> If there is a DRAIN in the middle of a link, it uses shadow req. Defer
>>>>>> the next request/link instead. This:
>>>>>>
>>>>>> Pros:
>>>>>> 1. removes semi-duplicated code
>>>>>> 2. doesn't allocate memory for shadows
>>>>>> 3. works better if only the head marked for drain
>>>>>
>>>>> I thought about this before, just only drain the head, but if the
>>>>> latter IO depends
>>>>> on the link-list, then latter IO will run in front of the link-list.
>>>>> If we think it
>>>>> is acceptable, then I think it is ok for me.
>>>>
>>>> If I got your point right, latter requests won't run ahead of the
>>>> link-list. There shouldn't be change of behaviour.
>>>>
>>>> The purpose of shadow requests is to mark some request right ahead of
>>>> the link for draining. This patch uses not a specially added shadow
>>>> request, but the following regular one. And, as drained IO shouldn't be
>>>> issued until every request behind completed, this should give the same
>>>> effect.
>>>>
>>>> Am I missed something?
>>>
>>> Thanks for explaining. This is also correct, if I understand
>>> correctly, It seems that other IOs will wait for all the links are
>>> done. this is a little different, is it?
>>
>> Yes, you're right, it also was briefly stated in the patch description
>> (see Cons). I hope, links + drain in the middle is an uncommon case.
>> But it can be added back, but may become a little bit uglier.
>>
>> What do you think, should we care about this case?
> 
> Yes, this is a very tiny scene. When I first time wrote this part of the
> code, my suggestion was to ban it directly.
> 
> For this patch, I am fine, Jens, what do you think.

I am fine with it as well, it'd be nice to get rid of needing that
extra request.

Was that a reviewed-by from you? It'd be nice to get these more
formally so I can add the attributes. I'll drop the other patch.

> The mailing list always rejects my mail, is my smtp server IP banned?

Probably because you have text/html in your email, the list is pretty
picky when it comes to anything that isn't just text/plain.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <[email protected]>]

* Re: [PATCH] io_uring: drain next sqe instead of shadowing
       [not found]                         ` <[email protected]>
@ 2019-11-21 13:54                           ` Jens Axboe
  0 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2019-11-21 13:54 UTC (permalink / raw)
  To: Jackie Liu, Pavel Begunkov; +Cc: io-uring

On 11/21/19 7:16 AM, Jackie Liu wrote:
> 在 2019/11/21 21:47, Jens Axboe 写道:
>> On 11/21/19 5:49 AM, Jackie Liu wrote:
>>>
>>>
>>> On 2019/11/21 20:40, Pavel Begunkov wrote:
>>>>> 在 2019/11/21 17:43, Pavel Begunkov 写道:
>>>>>> On 11/21/2019 12:26 PM, Jackie Liu wrote:
>>>>>>> 2019年11月21日 16:54，Pavel Begunkov <[email protected]> 写道：
>>>>>>>>
>>>>>>>> If there is a DRAIN in the middle of a link, it uses shadow req. Defer
>>>>>>>> the next request/link instead. This:
>>>>>>>>
>>>>>>>> Pros:
>>>>>>>> 1. removes semi-duplicated code
>>>>>>>> 2. doesn't allocate memory for shadows
>>>>>>>> 3. works better if only the head marked for drain
>>>>>>>
>>>>>>> I thought about this before, just only drain the head, but if the
>>>>>>> latter IO depends
>>>>>>> on the link-list, then latter IO will run in front of the link-list.
>>>>>>> If we think it
>>>>>>> is acceptable, then I think it is ok for me.
>>>>>>
>>>>>> If I got your point right, latter requests won't run ahead of the
>>>>>> link-list. There shouldn't be change of behaviour.
>>>>>>
>>>>>> The purpose of shadow requests is to mark some request right ahead of
>>>>>> the link for draining. This patch uses not a specially added shadow
>>>>>> request, but the following regular one. And, as drained IO shouldn't be
>>>>>> issued until every request behind completed, this should give the same
>>>>>> effect.
>>>>>>
>>>>>> Am I missed something?
>>>>>
>>>>> Thanks for explaining. This is also correct, if I understand
>>>>> correctly, It seems that other IOs will wait for all the links are
>>>>> done. this is a little different, is it?
>>>>
>>>> Yes, you're right, it also was briefly stated in the patch description
>>>> (see Cons). I hope, links + drain in the middle is an uncommon case.
>>>> But it can be added back, but may become a little bit uglier.
>>>>
>>>> What do you think, should we care about this case?
>>>
>>> Yes, this is a very tiny scene. When I first time wrote this part of the
>>> code, my suggestion was to ban it directly.
>>>
>>> For this patch, I am fine, Jens, what do you think.
>>
>> I am fine with it as well, it'd be nice to get rid of needing that
>> extra request.
>>
>> Was that a reviewed-by from you? It'd be nice to get these more
>> formally so I can add the attributes. I'll drop the other patch.
> 
> I want to do more tests. There is no test machine at this time, at least
> until tomorrow morning.

OK, no worries, for the record I did run it through testing this morning
and it passes for me. Before this (or my) patch, we'd stall pretty
quickly in the link_drain tested if run repeatedly.

>>> The mailing list always rejects my mail, is my smtp server IP banned?
>>
>> Probably because you have text/html in your email, the list is pretty
>> picky when it comes to anything that isn't just text/plain.
> 
> I don't know, I use thunderbird to write emails, and it has been set to
> plain text, perhaps because of the signature, I have to check my mail
> client.

Feel free to try and send an email to me personally, then I can see what
it looks like.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <[email protected]>]

* Re: [PATCH] io_uring: drain next sqe instead of shadowing
       [not found]                         ` <[email protected]>
@ 2019-11-21 14:28                           ` Pavel Begunkov
  2019-11-21 13:53                             ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-21 14:28 UTC (permalink / raw)
  To: Jackie Liu, Jens Axboe; +Cc: io-uring

>>
>> Was that a reviewed-by from you? It'd be nice to get these more
>> formally so I can add the attributes. I'll drop the other patch.
> 
> I want to do more tests. There is no test machine at this time, at least
> until tomorrow morning.
> 
BTW, aside from the locking problem, that it fixes another one. If
allocation for a shadow req is failed, io_submit_sqes() just continues
without it ignoring draining.

>>
>>> The mailing list always rejects my mail, is my smtp server IP banned?
>>
>> Probably because you have text/html in your email, the list is pretty
>> picky when it comes to anything that isn't just text/plain.
> 
> I don't know, I use thunderbird to write emails, and it has been set to
> plain text, perhaps because of the signature, I have to check my mail
> client.
> 
> -- 
> Jackie Liu
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] io_uring: drain next sqe instead of shadowing
  2019-11-21 14:28                           ` Pavel Begunkov
@ 2019-11-21 13:53                             ` Jens Axboe
  2019-11-21 15:23                               ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2019-11-21 13:53 UTC (permalink / raw)
  To: Pavel Begunkov, Jackie Liu; +Cc: io-uring

On 11/21/19 7:28 AM, Pavel Begunkov wrote:
>>>
>>> Was that a reviewed-by from you? It'd be nice to get these more
>>> formally so I can add the attributes. I'll drop the other patch.
>>
>> I want to do more tests. There is no test machine at this time, at least
>> until tomorrow morning.
>>
> BTW, aside from the locking problem, that it fixes another one. If
> allocation for a shadow req is failed, io_submit_sqes() just continues
> without it ignoring draining.

Indeed. BTW, your commit message is way too weak for this patch. It
doesn't explain why we're making this change at all. I'm going to fix
it up.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] io_uring: drain next sqe instead of shadowing
  2019-11-21 13:53                             ` Jens Axboe
@ 2019-11-21 15:23                               ` Pavel Begunkov
  2019-11-21 13:50                                 ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-21 15:23 UTC (permalink / raw)
  To: Jens Axboe, Jackie Liu; +Cc: io-uring

On 11/21/2019 4:53 PM, Jens Axboe wrote:
> On 11/21/19 7:28 AM, Pavel Begunkov wrote:
>>>>
>>>> Was that a reviewed-by from you? It'd be nice to get these more
>>>> formally so I can add the attributes. I'll drop the other patch.
>>>
>>> I want to do more tests. There is no test machine at this time, at least
>>> until tomorrow morning.
>>>
>> BTW, aside from the locking problem, that it fixes another one. If
>> allocation for a shadow req is failed, io_submit_sqes() just continues
>> without it ignoring draining.
> 
> Indeed. BTW, your commit message is way too weak for this patch. It
> doesn't explain why we're making this change at all. I'm going to fix
> it up.
> 
Yeah, I were going to do that today. This one is of quick-before-leaving
kind, I didn't even looked up into the problem discussion properly.

It needs 2 problems statements + reported-by

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] io_uring: drain next sqe instead of shadowing
  2019-11-21 15:23                               ` Pavel Begunkov
@ 2019-11-21 13:50                                 ` Jens Axboe
  0 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2019-11-21 13:50 UTC (permalink / raw)
  To: Pavel Begunkov, Jackie Liu; +Cc: io-uring

On 11/21/19 8:23 AM, Pavel Begunkov wrote:
> On 11/21/2019 4:53 PM, Jens Axboe wrote:
>> On 11/21/19 7:28 AM, Pavel Begunkov wrote:
>>>>>
>>>>> Was that a reviewed-by from you? It'd be nice to get these more
>>>>> formally so I can add the attributes. I'll drop the other patch.
>>>>
>>>> I want to do more tests. There is no test machine at this time, at least
>>>> until tomorrow morning.
>>>>
>>> BTW, aside from the locking problem, that it fixes another one. If
>>> allocation for a shadow req is failed, io_submit_sqes() just continues
>>> without it ignoring draining.
>>
>> Indeed. BTW, your commit message is way too weak for this patch. It
>> doesn't explain why we're making this change at all. I'm going to fix
>> it up.
>>
> Yeah, I were going to do that today. This one is of quick-before-leaving
> kind, I didn't even looked up into the problem discussion properly.
> 
> It needs 2 problems statements + reported-by

Here's what I have:

https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.5/io_uring-post&id=b1417baddedff723b22a4817753625350cfa895a


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] io_uring: fix race with shadow drain deferrals
  2019-11-21  1:32   ` Jackie Liu
  2019-11-21  1:35     ` Jackie Liu
@ 2019-11-21  1:39     ` Jens Axboe
  1 sibling, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2019-11-21  1:39 UTC (permalink / raw)
  To: Jackie Liu; +Cc: io-uring

On 11/20/19 6:32 PM, Jackie Liu wrote:
> 2019年11月21日 07:58，Jens Axboe <[email protected]> 写道：
> 
>>
>> On 11/20/19 4:07 PM, Jens Axboe wrote:
>>> When we go and queue requests with drain, we check if we need to defer
>>> based on sequence. This is done safely under the lock, but then we drop
>>> the lock before actually inserting the shadow. If the original request
>>> is found on the deferred list by another completion in the mean time,
>>> it could have been started AND completed by the time we insert the
>>> shadow, which will stall the queue.
>>>
>>> After re-grabbing the completion lock, check if the original request is
>>> still in the deferred list. If it isn't, then we know that someone else
>>> already found and issued it. If that happened, then our job is done, we
>>> can simply free the shadow.
>>>
>>> Cc: Jackie Liu <[email protected]>
>>> Fixes: 4fe2c963154c ("io_uring: add support for link with drain")
>>> Signed-off-by: Jens Axboe <[email protected]>
>>
>> BTW, the other solution here is to not release the completion_lock if
>> we're going to return -EIOCBQUEUED, and let the caller do what it needs
>> before releasing it. That'd look something like this, with some sparse
>> annotations to keep things happy.
>>
>> I think the original I posted here is easier to follow, and the
>> deferral list is going to be tiny in general so it won't really add
>> any extra overhead.
>>
>> Let me know what you think and prefer.
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 6175e2e195c0..0d1f33bcedc0 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -2552,6 +2552,11 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>> 	return 0;
>> }
>>
>> +/*
>> + * Returns with ctx->completion_lock held if -EIOCBQUEUED is returned, so
>> + * the caller can make decisions based on the deferral without worrying about
>> + * the request being found and issued in the mean time.
>> + */
>> static int io_req_defer(struct io_kiocb *req)
>> {
>> 	const struct io_uring_sqe *sqe = req->submit.sqe;
>> @@ -2579,7 +2584,7 @@ static int io_req_defer(struct io_kiocb *req)
>>
>> 	trace_io_uring_defer(ctx, req, false);
>> 	list_add_tail(&req->list, &ctx->defer_list);
>> -	spin_unlock_irq(&ctx->completion_lock);
>> +	__release(&ctx->completion_lock);
>> 	return -EIOCBQUEUED;
>> }
>>
>> @@ -2954,6 +2959,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
>>
>> static void io_queue_sqe(struct io_kiocb *req)
>> {
>> +	struct io_ring_ctx *ctx = req->ctx;
>> 	int ret;
>>
>> 	ret = io_req_defer(req);
>> @@ -2963,6 +2969,9 @@ static void io_queue_sqe(struct io_kiocb *req)
>> 			if (req->flags & REQ_F_LINK)
>> 				req->flags |= REQ_F_FAIL_LINK;
>> 			io_double_put_req(req);
>> +		} else {
>> +			__acquire(&ctx->completion_lock);
>> +			spin_unlock_irq(&ctx->completion_lock);
>> 		}
>> 	} else
>> 		__io_queue_sqe(req);
>> @@ -3001,16 +3010,17 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
>> 				__io_free_req(shadow);
>> 			return;
>> 		}
>> +		__acquire(&ctx->completion_lock);
>> 	} else {
>> 		/*
>> 		 * If ret == 0 means that all IOs in front of link io are
>> 		 * running done. let's queue link head.
>> 		 */
>> 		need_submit = true;
>> +		spin_lock_irq(&ctx->completion_lock);
>> 	}
>>
>> 	/* Insert shadow req to defer_list, blocking next IOs */
>> -	spin_lock_irq(&ctx->completion_lock);
>> 	trace_io_uring_defer(ctx, shadow, true);
>> 	list_add_tail(&shadow->list, &ctx->defer_list);
>> 	spin_unlock_irq(&ctx->completion_lock);
> 
> This is indeed a potential lock issue, thanks, I am prefer this solution, clearer than first one.
> But It may be a bit difficult for other people who read the code, use 'io_req_defer_may_lock'?
> 
> who about this?

I really don't think that improves it, I'm afraid. The ugly part is not
the naming, it's the fact that -EIOCBQUEUED returns with the lock held
and having to deal with that. It would be cleaner to have a helper that
just has the lock held, but that is difficult to do since we also need
the sqe copy allocation.

There's also an issue with your patch and the unconditional unlock...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-11-21 15:26 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-11-20 23:07 [PATCH] io_uring: fix race with shadow drain deferrals Jens Axboe
2019-11-20 23:58 ` Jens Axboe
2019-11-21  1:32   ` Jackie Liu
2019-11-21  1:35     ` Jackie Liu
2019-11-21  1:40       ` Jens Axboe
2019-11-21  1:49         ` Jens Axboe
2019-11-21  1:57           ` Jackie Liu
2019-11-20 23:14             ` Jens Axboe
     [not found]               ` <[email protected]>
2019-11-20 23:03                 ` Jens Axboe
2019-11-21  8:54           ` [PATCH] io_uring: drain next sqe instead of shadowing Pavel Begunkov
     [not found]             ` <[email protected]>
2019-11-21  9:43               ` Pavel Begunkov
     [not found]                 ` <[email protected]>
2019-11-21 12:40                   ` Pavel Begunkov
     [not found]                     ` <[email protected]>
2019-11-21 13:47                       ` Jens Axboe
     [not found]                         ` <[email protected]>
2019-11-21 13:54                           ` Jens Axboe
     [not found]                         ` <[email protected]>
2019-11-21 14:28                           ` Pavel Begunkov
2019-11-21 13:53                             ` Jens Axboe
2019-11-21 15:23                               ` Pavel Begunkov
2019-11-21 13:50                                 ` Jens Axboe
2019-11-21  1:39     ` [PATCH] io_uring: fix race with shadow drain deferrals Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox