public inbox for [email protected]
* [PATCH v2] io_uring: provide fallback request for OOM situations
@ 2019-11-08 21:25 Jens Axboe
  2019-11-18  6:57 ` Bob Liu
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2019-11-08 21:25 UTC (permalink / raw)
  To: io-uring

One thing that really sucks for userspace APIs is if the kernel passes
back -ENOMEM/-EAGAIN for resource shortages. The application really has
no idea of what to do in those cases. Should it try and reap
completions? Probably a good idea. Will it solve the issue? Who knows.

This patch adds a simple fallback mechanism if we fail to allocate
memory for a request. If we fail allocating memory from the slab for a
request, we punt to a pre-allocated request. There's just one of these
per io_ring_ctx, but the important part is that if we ever return
-EBUSY to the application, the application knows it can wait for
events and make forward progress once events have completed.
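
As a sketch of what that buys the application (illustrative only: this
uses the liburing io_uring_submit()/io_uring_wait_cqe() helpers, which
are not part of this patch), the submit loop can simply fall back to
reaping completions on a busy return:

	#include <liburing.h>
	#include <errno.h>

	/*
	 * Submit pending sqes; on -EBUSY/-EAGAIN, wait for a completion
	 * and retry. With the fallback request in place, a busy return
	 * implies requests are in flight, so waiting makes progress.
	 */
	static int submit_or_reap(struct io_uring *ring)
	{
		struct io_uring_cqe *cqe;
		int ret;

		for (;;) {
			ret = io_uring_submit(ring);
			if (ret >= 0 || (ret != -EBUSY && ret != -EAGAIN))
				return ret;
			ret = io_uring_wait_cqe(ring, &cqe);
			if (ret < 0)
				return ret;
			/* handle cqe->res for the reaped request here */
			io_uring_cqe_seen(ring, cqe);
		}
	}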

Signed-off-by: Jens Axboe <[email protected]>

---

Changes since v1:
- Get rid of the GFP_ATOMIC fallback, just provide the fallback. That
  should be plenty, and we probably don't want to dip into the atomic
  pool if GFP_KERNEL failed.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 1e4c1b7eac6e..81457913e9c9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -238,6 +238,9 @@ struct io_ring_ctx {
 	/* 0 is for ctx quiesce/reinit/free, 1 is for sqo_thread started */
 	struct completion	*completions;
 
+	/* if all else fails... */
+	struct io_kiocb		*fallback_req;
+
 #if defined(CONFIG_UNIX)
 	struct socket		*ring_sock;
 #endif
@@ -407,6 +410,10 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	if (!ctx)
 		return NULL;
 
+	ctx->fallback_req = kmem_cache_alloc(req_cachep, GFP_KERNEL);
+	if (!ctx->fallback_req)
+		goto err;
+
 	ctx->completions = kmalloc(2 * sizeof(struct completion), GFP_KERNEL);
 	if (!ctx->completions)
 		goto err;
@@ -432,6 +439,8 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	INIT_LIST_HEAD(&ctx->inflight_list);
 	return ctx;
 err:
+	if (ctx->fallback_req)
+		kmem_cache_free(req_cachep, ctx->fallback_req);
 	kfree(ctx->completions);
 	kfree(ctx);
 	return NULL;
@@ -711,6 +720,23 @@ static void io_cqring_add_event(struct io_kiocb *req, long res)
 	io_cqring_ev_posted(ctx);
 }
 
+static inline bool io_is_fallback_req(struct io_kiocb *req)
+{
+	return req == (struct io_kiocb *)
+			((unsigned long) req->ctx->fallback_req & ~1UL);
+}
+
+static struct io_kiocb *io_get_fallback_req(struct io_ring_ctx *ctx)
+{
+	struct io_kiocb *req;
+
+	req = ctx->fallback_req;
+	if (!test_and_set_bit_lock(0, (unsigned long *) ctx->fallback_req))
+		return req;
+
+	return NULL;
+}
+
 static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
 				   struct io_submit_state *state)
 {
@@ -723,7 +749,7 @@ static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
 	if (!state) {
 		req = kmem_cache_alloc(req_cachep, gfp);
 		if (unlikely(!req))
-			goto out;
+			goto fallback;
 	} else if (!state->free_reqs) {
 		size_t sz;
 		int ret;
@@ -738,7 +764,7 @@ static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
 		if (unlikely(ret <= 0)) {
 			state->reqs[0] = kmem_cache_alloc(req_cachep, gfp);
 			if (!state->reqs[0])
-				goto out;
+				goto fallback;
 			ret = 1;
 		}
 		state->free_reqs = ret - 1;
@@ -750,6 +776,7 @@ static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
 		state->cur_req++;
 	}
 
+got_it:
 	req->file = NULL;
 	req->ctx = ctx;
 	req->flags = 0;
@@ -758,7 +785,10 @@ static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
 	req->result = 0;
 	INIT_IO_WORK(&req->work, io_wq_submit_work);
 	return req;
-out:
+fallback:
+	req = io_get_fallback_req(ctx);
+	if (req)
+		goto got_it;
 	percpu_ref_put(&ctx->refs);
 	return NULL;
 }
@@ -788,7 +818,10 @@ static void __io_free_req(struct io_kiocb *req)
 		spin_unlock_irqrestore(&ctx->inflight_lock, flags);
 	}
 	percpu_ref_put(&ctx->refs);
-	kmem_cache_free(req_cachep, req);
+	if (likely(!io_is_fallback_req(req)))
+		kmem_cache_free(req_cachep, req);
+	else
+		clear_bit_unlock(0, (unsigned long *) ctx->fallback_req);
 }
 
 static bool io_link_cancel_timeout(struct io_kiocb *req)
@@ -1000,8 +1033,8 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 			 * completions for those, only batch free for fixed
 			 * file and non-linked commands.
 			 */
-			if ((req->flags & (REQ_F_FIXED_FILE|REQ_F_LINK)) ==
-			    REQ_F_FIXED_FILE) {
+			if (((req->flags & (REQ_F_FIXED_FILE|REQ_F_LINK)) ==
+			    REQ_F_FIXED_FILE) && !io_is_fallback_req(req)) {
 				reqs[to_free++] = req;
 				if (to_free == ARRAY_SIZE(reqs))
 					io_free_req_many(ctx, reqs, &to_free);
@@ -4119,6 +4152,7 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx)
 				ring_pages(ctx->sq_entries, ctx->cq_entries));
 	free_uid(ctx->user);
 	kfree(ctx->completions);
+	kmem_cache_free(req_cachep, ctx->fallback_req);
 	kfree(ctx);
 }
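
A note on the synchronization: the single fallback request is claimed
with test_and_set_bit_lock() and released in __io_free_req() with
clear_bit_unlock(), so no extra lock is needed around the reserve. A
rough userspace analogue of this single-slot reserve, using C11 atomics
and a dedicated flag word rather than the bit trick above (names are
illustrative, this is not kernel code):

	#include <stdatomic.h>
	#include <stdlib.h>

	struct request {
		atomic_ulong	busy;	/* the fallback slot's in-use flag */
		/* ...request payload... */
	};

	static struct request fallback_req;	/* the one reserve request */

	static struct request *get_req(void)
	{
		struct request *req = malloc(sizeof(*req));

		if (req)
			return req;
		/* acquire pairs with the release in put_req() */
		if (!(atomic_fetch_or_explicit(&fallback_req.busy, 1UL,
					       memory_order_acquire) & 1UL))
			return &fallback_req;
		return NULL;	/* reserve in use: reap completions, retry */
	}

	static void put_req(struct request *req)
	{
		if (req != &fallback_req)
			free(req);
		else
			atomic_fetch_and_explicit(&fallback_req.busy, ~1UL,
						  memory_order_release);
	}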
 
-- 
Jens Axboe



* Re: [PATCH v2] io_uring: provide fallback request for OOM situations
  2019-11-08 21:25 [PATCH v2] io_uring: provide fallback request for OOM situations Jens Axboe
@ 2019-11-18  6:57 ` Bob Liu
  2019-11-18 14:32   ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Bob Liu @ 2019-11-18  6:57 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 11/9/19 5:25 AM, Jens Axboe wrote:
> One thing that really sucks for userspace APIs is if the kernel passes
> back -ENOMEM/-EAGAIN for resource shortages. The application really has
> no idea of what to do in those cases. Should it try and reap
> completions? Probably a good idea. Will it solve the issue? Who knows.
> 
> This patch adds a simple fallback mechanism if we fail to allocate
> memory for a request. If we fail allocating memory from the slab for a
> request, we punt to a pre-allocated request. There's just one of these
> per io_ring_ctx, but the important part is that if we ever return
> -EBUSY to the application, the application knows it can wait for
> events and make forward progress once events have completed.
> 

I'm lost on how -EBUSY will be returned if we're allocating from the
pre-allocated request. Could you please explain a bit more?

Thanks, -Bob


* Re: [PATCH v2] io_uring: provide fallback request for OOM situations
  2019-11-18  6:57 ` Bob Liu
@ 2019-11-18 14:32   ` Jens Axboe
  2019-11-19  9:22     ` Bob Liu
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2019-11-18 14:32 UTC (permalink / raw)
  To: Bob Liu, io-uring

On 11/17/19 11:57 PM, Bob Liu wrote:
> On 11/9/19 5:25 AM, Jens Axboe wrote:
>> One thing that really sucks for userspace APIs is if the kernel passes
>> back -ENOMEM/-EAGAIN for resource shortages. The application really has
>> no idea of what to do in those cases. Should it try and reap
>> completions? Probably a good idea. Will it solve the issue? Who knows.
>>
>> This patch adds a simple fallback mechanism if we fail to allocate
>> memory for a request. If we fail allocating memory from the slab for a
>> request, we punt to a pre-allocated request. There's just one of these
>> per io_ring_ctx, but the important part is that if we ever return
>> -EBUSY to the application, the application knows it can wait for
>> events and make forward progress once events have completed.
>>
> 
> I'm lost on how -EBUSY will be returned if we're allocating from the
> pre-allocated request. Could you please explain a bit more?

The patch actually returns -EAGAIN, not -EBUSY... The last -EBUSY
mention in that commit message should be -EAGAIN.

But the point is that if you get a busy return back, then you know
that things are moving forward, as we have a backup request. This is
a similar concept to the mempools we have in the kernel: having any
kind of reserve guarantees forward progress.
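
For reference, the mempool version of the same guarantee would look
something like this (a sketch only, reusing the req_cachep name from
io_uring; a pool with a one-element reserve plays the role of
fallback_req):

	#include <linux/mempool.h>
	#include <linux/slab.h>

	static mempool_t *req_pool;

	static int req_pool_init(struct kmem_cache *req_cachep)
	{
		/* keep one element in reserve, like the one fallback_req */
		req_pool = mempool_create_slab_pool(1, req_cachep);
		return req_pool ? 0 : -ENOMEM;
	}

	/*
	 * mempool_alloc(req_pool, GFP_KERNEL) then never fails: it dips
	 * into the reserve when the slab is exhausted, and sleeps until
	 * a mempool_free() refills it. The io_uring fallback is the
	 * non-sleeping variant: hand out the reserve if it's free,
	 * otherwise let the application reap completions and retry.
	 */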

-- 
Jens Axboe



* Re: [PATCH v2] io_uring: provide fallback request for OOM situations
  2019-11-18 14:32   ` Jens Axboe
@ 2019-11-19  9:22     ` Bob Liu
  0 siblings, 0 replies; 4+ messages in thread
From: Bob Liu @ 2019-11-19  9:22 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 11/18/19 10:32 PM, Jens Axboe wrote:
> On 11/17/19 11:57 PM, Bob Liu wrote:
>> On 11/9/19 5:25 AM, Jens Axboe wrote:
>>> One thing that really sucks for userspace APIs is if the kernel passes
>>> back -ENOMEM/-EAGAIN for resource shortages. The application really has
>>> no idea of what to do in those cases. Should it try and reap
>>> completions? Probably a good idea. Will it solve the issue? Who knows.
>>>
>>> This patch adds a simple fallback mechanism if we fail to allocate
>>> memory for a request. If we fail allocating memory from the slab for a
>>> request, we punt to a pre-allocated request. There's just one of these
>>> per io_ring_ctx, but the important part is that if we ever return
>>> -EBUSY to the application, the application knows it can wait for
>>> events and make forward progress once events have completed.
>>>
>>
>> I'm lost on how -EBUSY will be returned if we're allocating from the
>> pre-allocated request. Could you please explain a bit more?
> 
> The patch actually returns -EAGAIN, not -EBUSY... The last -EBUSY
> mention in that commit message should be -EAGAIN.
> 
> But the point is that if you get a busy return back, then you know
> that things are moving forward, as we have a backup request. This is
> a similar concept to the mempools we have in the kernel: having any
> kind of reserve guarantees forward progress.
> 

I see.
But there are two more potential places that may fail to allocate
memory: 'shadow_req = io_get_req()' and 'sqe_copy = kmalloc()'.

We may need one more pre-allocated request, and to make sure the
pre-allocated req can't be deferred, so as to guarantee things can
really move forward.

