* Re: [PATCH] io_uring: store back buffer in case of failure
       [not found] <[email protected]>
@ 2021-06-15 10:01 ` Pavel Begunkov
  2021-06-15 21:51   ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Begunkov @ 2021-06-15 10:01 UTC (permalink / raw)
  To: Olivier Langlois, Jens Axboe, io-uring, linux-kernel

On 6/11/21 8:46 PM, Olivier Langlois wrote:
> the man page says the following:
>
> If successful, the resulting CQE will have IORING_CQE_F_BUFFER set
> in the flags part of the struct, and the upper IORING_CQE_BUFFER_SHIFT
> bits will contain the ID of the selected buffers.
>
> in order to respect this contract, the buffer is stored back in case of
> an error. There are several reasons to do that:
>
> 1. doing otherwise is counter-intuitive and error-prone (I cannot think
> of a single example of a syscall failing and still requiring the user to
> free the allocated resources), especially when the documentation
> explicitly mentions that this is the behavior to expect.
>
> 2. it is inefficient because the buffer is unneeded: there is no data
> to transfer back to the user, and the buffer would have to be returned
> to io_uring anyway to avoid a leak.
>
> Signed-off-by: Olivier Langlois <[email protected]>
> ---
>  fs/io_uring.c | 97 +++++++++++++++++++++++++++++++++------------------
>  1 file changed, 64 insertions(+), 33 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 42380ed563c4..502d7cd81a8c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c

[...]

> +static unsigned int io_put_kbuf(struct io_kiocb *req, struct io_buffer *kbuf,
> +				u16 bgid, long res, unsigned int issue_flags)
>  {
>  	unsigned int cflags;
> +	struct io_ring_ctx *ctx = req->ctx;
> +	struct io_buffer *head;
> +	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>
>  	cflags = kbuf->bid << IORING_CQE_BUFFER_SHIFT;
>  	cflags |= IORING_CQE_F_BUFFER;
>  	req->flags &= ~REQ_F_BUFFER_SELECTED;
> -	kfree(kbuf);
> +
> +	/*
> +	 * Theoretically, res == 0 could be included as well, but that would
> +	 * break the contract established in the man page saying that
> +	 * a buffer is returned if the operation is successful.
> +	 */
> +	if (unlikely(res < 0)) {
> +		io_ring_submit_lock(ctx, !force_nonblock);

io_complete_rw() is called from an IRQ context, so it can't sleep/wait.

> +
> +		lockdep_assert_held(&ctx->uring_lock);
> +
> +		head = xa_load(&ctx->io_buffers, bgid);
> +		if (head) {
> +			list_add_tail(&kbuf->list, &head->list);
> +			cflags = 0;
> +		} else {
> +			INIT_LIST_HEAD(&kbuf->list);
> +			if (!xa_insert(&ctx->io_buffers, bgid, kbuf, GFP_KERNEL))
> +				cflags = 0;
> +		}
> +		io_ring_submit_unlock(ctx, !force_nonblock);
> +	}
> +	if (cflags)
> +		kfree(kbuf);
>  	return cflags;
>  }

[...]

> -static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
> +static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret,
> +			      unsigned int issue_flags)
>  {
>  	switch (ret) {
>  	case -EIOCBQUEUED:
> @@ -2728,7 +2775,7 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
>  		ret = -EINTR;
>  		fallthrough;
>  	default:
> -		kiocb->ki_complete(kiocb, ret, 0);
> +		kiocb->ki_complete(kiocb, ret, issue_flags);

I don't remember what the second argument of .ki_complete is for, but it
definitely should not be used for issue_flags, e.g. because it would then
have two different meanings depending on the context -- either a result
from the block layer or issue_flags.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 7+ messages in thread
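For context on that last point: in kernels of that era the completion
hook was declared roughly as below. This is a sketch reproduced from
memory, not an authoritative copy of the header:

	/* include/linux/fs.h, circa 5.13 -- approximate sketch */
	struct kiocb {
		struct file	*ki_filp;
		loff_t		ki_pos;
		void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
		/* ... */
	};

Other callers of ->ki_complete() (e.g. the block layer's completion
paths) pass their own second result there, often just 0, so overloading
the ret2 slot with issue_flags would give the same parameter different
meanings depending on who invoked it -- which is the objection above.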
* Re: [PATCH] io_uring: store back buffer in case of failure
  2021-06-15 10:01 ` [PATCH] io_uring: store back buffer in case of failure Pavel Begunkov
@ 2021-06-15 21:51   ` Jens Axboe
  2021-06-16 13:42     ` Olivier Langlois
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2021-06-15 21:51 UTC (permalink / raw)
  To: Pavel Begunkov, Olivier Langlois, io-uring, linux-kernel

Ditto for this one, don't see it in my email nor on the list.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH] io_uring: store back buffer in case of failure
  2021-06-15 21:51 ` Jens Axboe
@ 2021-06-16 13:42   ` Olivier Langlois
  2021-06-16 14:01     ` Pavel Begunkov
  0 siblings, 1 reply; 7+ messages in thread
From: Olivier Langlois @ 2021-06-16 13:42 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring, linux-kernel

On Tue, 2021-06-15 at 15:51 -0600, Jens Axboe wrote:
> Ditto for this one, don't see it in my email nor on the list.
>
I can resend you a private copy of this one, but as Pavel pointed out,
it contains fatal flaws.

So unless someone can tell me that the idea is interesting and has
potential, and can give me a hint or 2 about how to address the
challenges and fix the current flaws, it is pretty much a show stopper
to me and I think that I am going to let it go...

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH] io_uring: store back buffer in case of failure
  2021-06-16 13:42 ` Olivier Langlois
@ 2021-06-16 14:01   ` Pavel Begunkov
  2021-06-16 14:44     ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Begunkov @ 2021-06-16 14:01 UTC (permalink / raw)
  To: Olivier Langlois, Jens Axboe, io-uring, linux-kernel

On 6/16/21 2:42 PM, Olivier Langlois wrote:
> On Tue, 2021-06-15 at 15:51 -0600, Jens Axboe wrote:
>> Ditto for this one, don't see it in my email nor on the list.
>>
> I can resend you a private copy of this one, but as Pavel pointed out,
> it contains fatal flaws.
>
> So unless someone can tell me that the idea is interesting and has
> potential, and can give me a hint or 2 about how to address the
> challenges and fix the current flaws, it is pretty much a show stopper
> to me and I think that I am going to let it go...

It'd need to go through some other context, e.g. a task context.
task_work_add() + a custom handler would work, or the buf-select
synchronisation could be reworked, but both would rather be bulky
and not great.

I'm more curious whether we ever hit the REQ_F_BUFFER_SELECTED branch
in io_clean_op(), because it would just drop the id on the floor... It
might use a WARN_ONCE.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 7+ messages in thread
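To illustrate the WARN_ONCE idea above, a minimal sketch of what it
could look like in io_clean_op() -- hypothetical placement against the
code of that time, not an actual patch:

	static void io_clean_op(struct io_kiocb *req)
	{
		/* ... existing cleanup ... */
		if (req->flags & REQ_F_BUFFER_SELECTED) {
			/*
			 * A selected buffer id is being dropped here without
			 * being returned to its group; make that visible once.
			 */
			WARN_ONCE(1, "io_uring: dropping a selected buffer\n");
			/* existing per-opcode buffer release would follow */
		}
		/* ... */
	}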
* Re: [PATCH] io_uring: store back buffer in case of failure
  2021-06-16 14:01 ` Pavel Begunkov
@ 2021-06-16 14:44   ` Jens Axboe
  2021-06-16 15:37     ` Pavel Begunkov
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2021-06-16 14:44 UTC (permalink / raw)
  To: Pavel Begunkov, Olivier Langlois, io-uring, linux-kernel

On 6/16/21 8:01 AM, Pavel Begunkov wrote:
> On 6/16/21 2:42 PM, Olivier Langlois wrote:
>> On Tue, 2021-06-15 at 15:51 -0600, Jens Axboe wrote:
>>> Ditto for this one, don't see it in my email nor on the list.
>>>
>> I can resend you a private copy of this one, but as Pavel pointed
>> out, it contains fatal flaws.
>>
>> So unless someone can tell me that the idea is interesting and has
>> potential, and can give me a hint or 2 about how to address the
>> challenges and fix the current flaws, it is pretty much a show
>> stopper to me and I think that I am going to let it go...
>
> It'd need to go through some other context, e.g. a task context.
> task_work_add() + a custom handler would work, or the buf-select
> synchronisation could be reworked, but both would rather be bulky
> and not great.

Indeed - that'd both solve the passing around of locking state, which
I really don't like, and make it much simpler. Just use task work for
the re-insert, and then you can grab the ring lock unconditionally from
there.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH] io_uring: store back buffer in case of failure
  2021-06-16 14:44 ` Jens Axboe
@ 2021-06-16 15:37   ` Pavel Begunkov
  2021-06-18 21:26     ` Olivier Langlois
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Begunkov @ 2021-06-16 15:37 UTC (permalink / raw)
  To: Jens Axboe, Olivier Langlois, io-uring, linux-kernel

On 6/16/21 3:44 PM, Jens Axboe wrote:
> On 6/16/21 8:01 AM, Pavel Begunkov wrote:
>> On 6/16/21 2:42 PM, Olivier Langlois wrote:
>>> On Tue, 2021-06-15 at 15:51 -0600, Jens Axboe wrote:
>>>> Ditto for this one, don't see it in my email nor on the list.
>>>>
>>> I can resend you a private copy of this one, but as Pavel pointed
>>> out, it contains fatal flaws.
>>>
>>> So unless someone can tell me that the idea is interesting and has
>>> potential, and can give me a hint or 2 about how to address the
>>> challenges and fix the current flaws, it is pretty much a show
>>> stopper to me and I think that I am going to let it go...
>>
>> It'd need to go through some other context, e.g. a task context.
>> task_work_add() + a custom handler would work, or the buf-select
>> synchronisation could be reworked, but both would rather be bulky
>> and not great.
>
> Indeed - that'd both solve the passing around of locking state, which
> I really don't like, and make it much simpler. Just use task work for
> the re-insert, and then you can grab the ring lock unconditionally
> from there.

Hmm, it might be much simpler than I thought: allocate a separate
struct callback_head (i.e. a task_work item), queue it with plain
task_work_add() rather than io_req_task_work_add(), and continue with
the request handler.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 7+ messages in thread
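A rough sketch of the callback_head approach being discussed, with
hypothetical names and only minimal error handling -- a starting point
under the assumptions above, not the actual implementation:

	struct io_buf_putback {
		struct callback_head	cb;
		struct io_ring_ctx	*ctx;
		struct io_buffer	*kbuf;
		u16			bgid;
	};

	/* Runs in task context, so taking the ring lock is safe here. */
	static void io_putback_kbuf_tw(struct callback_head *cb)
	{
		struct io_buf_putback *p =
			container_of(cb, struct io_buf_putback, cb);

		mutex_lock(&p->ctx->uring_lock);
		/* re-insert p->kbuf into the p->bgid group, as in the patch */
		mutex_unlock(&p->ctx->uring_lock);
		kfree(p);
	}

	/* Queueing side, callable from the completion path. */
	static int io_queue_kbuf_putback(struct io_kiocb *req,
					 struct io_buffer *kbuf, u16 bgid)
	{
		struct io_buf_putback *p = kmalloc(sizeof(*p), GFP_ATOMIC);

		if (!p)
			return -ENOMEM;
		p->ctx = req->ctx;
		p->kbuf = kbuf;
		p->bgid = bgid;
		init_task_work(&p->cb, io_putback_kbuf_tw);
		/* plain task_work_add(), not io_req_task_work_add() */
		return task_work_add(req->task, &p->cb, TWA_SIGNAL);
	}

If task_work_add() fails (e.g. the task is exiting), the buffer would
still have to be freed or re-inserted by other means; that is the part
a real version would need to get right.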
* Re: [PATCH] io_uring: store back buffer in case of failure
  2021-06-16 15:37 ` Pavel Begunkov
@ 2021-06-18 21:26   ` Olivier Langlois
  0 siblings, 0 replies; 7+ messages in thread
From: Olivier Langlois @ 2021-06-18 21:26 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe, io-uring, linux-kernel

On Wed, 2021-06-16 at 16:37 +0100, Pavel Begunkov wrote:
> On 6/16/21 3:44 PM, Jens Axboe wrote:
> > On 6/16/21 8:01 AM, Pavel Begunkov wrote:
> > > On 6/16/21 2:42 PM, Olivier Langlois wrote:
> > > > On Tue, 2021-06-15 at 15:51 -0600, Jens Axboe wrote:
> > > > > Ditto for this one, don't see it in my email nor on the list.
> > > > >
> > > > I can resend you a private copy of this one, but as Pavel
> > > > pointed out, it contains fatal flaws.
> > > >
> > > > So unless someone can tell me that the idea is interesting and
> > > > has potential, and can give me a hint or 2 about how to address
> > > > the challenges and fix the current flaws, it is pretty much a
> > > > show stopper to me and I think that I am going to let it go...
> > >
> > > It'd need to go through some other context, e.g. a task context.
> > > task_work_add() + a custom handler would work, or the buf-select
> > > synchronisation could be reworked, but both would rather be
> > > bulky and not great.
> >
> > Indeed - that'd both solve the passing around of locking state,
> > which I really don't like, and make it much simpler. Just use task
> > work for the re-insert, and then you can grab the ring lock
> > unconditionally from there.
>
> Hmm, it might be much simpler than I thought: allocate a separate
> struct callback_head (i.e. a task_work item), queue it with plain
> task_work_add() rather than io_req_task_work_add(), and continue
> with the request handler.
>
ok, thx a lot for the excellent suggestions! I think that you have
provided me with everything that I need to give a second version of
this patch a shot.

^ permalink raw reply	[flat|nested] 7+ messages in thread
end of thread, other threads: [~2021-06-18 21:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <[email protected]>
2021-06-15 10:01 ` [PATCH] io_uring: store back buffer in case of failure Pavel Begunkov
2021-06-15 21:51   ` Jens Axboe
2021-06-16 13:42     ` Olivier Langlois
2021-06-16 14:01       ` Pavel Begunkov
2021-06-16 14:44         ` Jens Axboe
2021-06-16 15:37           ` Pavel Begunkov
2021-06-18 21:26             ` Olivier Langlois
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox