From: Ming Lei <[email protected]>
To: Keith Busch <[email protected]>
Cc: Keith Busch <[email protected]>,
[email protected], [email protected],
[email protected], [email protected],
[email protected]
Subject: Re: [PATCHv2 4/6] ublk: zc register/unregister bvec
Date: Wed, 12 Feb 2025 17:24:34 +0800
Message-ID: <Z6xo0mhJDRa0eaxv@fedora>
In-Reply-To: <Z6wfXijUX_6Q3HiC@kbusch-mbp>

On Tue, Feb 11, 2025 at 09:11:10PM -0700, Keith Busch wrote:
> On Wed, Feb 12, 2025 at 10:49:15AM +0800, Ming Lei wrote:
> > On Mon, Feb 10, 2025 at 04:56:44PM -0800, Keith Busch wrote:
> > > From: Keith Busch <[email protected]>
> > >
> > > Provide new operations for the user to request mapping an active request
> > > to an io uring instance's buf_table. The user has to provide the index
> > > it wants to install the buffer.
> > >
> > > A reference count is taken on the request to ensure it can't be
> > > completed while it is active in a ring's buf_table.
> > >
> > > Signed-off-by: Keith Busch <[email protected]>
> > > ---
> > > drivers/block/ublk_drv.c | 145 +++++++++++++++++++++++++---------
> > > include/uapi/linux/ublk_cmd.h | 4 +
> > > 2 files changed, 113 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > index 529085181f355..ccfda7b2c24da 100644
> > > --- a/drivers/block/ublk_drv.c
> > > +++ b/drivers/block/ublk_drv.c
> > > @@ -51,6 +51,9 @@
> > > /* private ioctl command mirror */
> > > #define UBLK_CMD_DEL_DEV_ASYNC _IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
> > >
> > > +#define UBLK_IO_REGISTER_IO_BUF _IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
> > > +#define UBLK_IO_UNREGISTER_IO_BUF _IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
> >
> > UBLK_IO_REGISTER_IO_BUF command may be completed, and buffer isn't used
> > by RW_FIXED yet in the following cases:
> >
> > - application doesn't submit any RW_FIXED consumer OP
> >
> > - io_uring_enter() only issued UBLK_IO_REGISTER_IO_BUF, and the other
> > OPs can't be issued because of out of resource
> >
> > ...
> >
> > Then io_uring_enter() returns, and the application is panic or killed,
> > how to avoid buffer leak?
>
> The death of the uring that registered the node tears down the table
> that it's registered with, which releases its reference. All good.

OK, it looks like I missed the point.

io_sqe_buffers_unregister() is called from io_ring_ctx_free(), which is
when the registered buffer can be released.

However, this may still cause a use-after-free on a request that has
already been failed via io_uring_try_cancel_uring_cmd(). Please see the
following code path:

	io_uring_try_cancel_requests
		io_uring_try_cancel_uring_cmd
			ublk_uring_cmd_cancel_fn
				ublk_abort_requests
					ublk_abort_queue
						__ublk_fail_req
							ublk_put_req_ref

This race needs to be covered.

>
> > It need to deal with in io_uring cancel code for calling ->release() if
> > the kbuffer node isn't released.
>
> There should be no situation here where it isn't released after its use
> is completed. Either the resource was gracefully unregistered or the
> ring close while it was still active, but either one drops its
> reference.
>
> > UBLK_IO_UNREGISTER_IO_BUF still need to call ->release() if the node
> > buffer isn't used.
>
> Only once the last reference is dropped. Which should happen no matter
> which way the node is freed.
>
> > > +static void ublk_io_release(void *priv)
> > > +{
> > > + struct request *rq = priv;
> > > + struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> > > +
> > > + ublk_put_req_ref(ubq, rq);
> > > +}
> >
> > It isn't enough to just get & put request reference here between registering
> > buffer and freeing the registered node buf, because the same reference can be
> > dropped from ublk_commit_completion() which is from queueing
> > UBLK_IO_COMMIT_AND_FETCH_REQ, and buggy app may queue this command multiple
> > times for freeing the request.
> >
> > One solution is to not allow request completion until the ->release() is
> > returned.
>
> Double completions are tricky because the same request id can be reused
> pretty quickly and there's no immediate way to tell if the 2nd
> completion is a double or a genuine completion of the reused request.
>
> We have rotating sequence numbers in the nvme driver to try to detect a
> similar situation. So far it hasn't revealed any real bugs as far as I
> know. This feels like the other side screwed up and that's their fault.

This is not the same as nvme, where the controller won't run DMA on the
buffer after the first completion.

Here the ublk request buffer has been leased to io_uring for running
read_fixed/write_fixed; meanwhile the request is freed and its buffer is
reused by the kernel for other purposes.

As I mentioned, this can be solved by not allowing the IO command to
complete while the buffer is leased to io_uring.

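To make the concern concrete, here is a minimal userspace sketch of the
reference counting discussed above. It is only a model, not driver code:
struct model_req, the model_*() helpers, and the -EBUSY return are all
hypothetical names and choices made up for illustration, and nothing here
is taken from ublk_drv.c. It assumes, as described earlier in this thread,
that registering the buffer takes one extra request reference and that each
commit drops one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a ublk request; not a kernel structure. */
struct model_req {
	int refs;		/* models the request reference count */
	bool buf_leased;	/* buffer currently sits in a ring's buf_table */
	bool freed;		/* last reference dropped, request may be reused */
};

static void model_start(struct model_req *rq)
{
	rq->refs = 1;		/* initial reference owned by the driver */
	rq->buf_leased = false;
	rq->freed = false;
}

static void model_put_ref(struct model_req *rq)
{
	assert(rq->refs > 0);	/* catch refcount underflow in the model */
	if (--rq->refs == 0)
		rq->freed = true;
}

/* Models UBLK_IO_REGISTER_IO_BUF: take a reference and lease the buffer. */
static int model_register_buf(struct model_req *rq)
{
	if (rq->freed || rq->buf_leased)
		return -EINVAL;
	rq->refs++;
	rq->buf_leased = true;
	return 0;
}

/* Models the ->release() callback: end the lease, drop its reference. */
static void model_release_buf(struct model_req *rq)
{
	rq->buf_leased = false;
	model_put_ref(rq);
}

/* Models a commit with no lease check: each call drops one reference. */
static void model_commit_unchecked(struct model_req *rq)
{
	model_put_ref(rq);
}

/* Models the proposed rule: refuse to complete while the buffer is leased. */
static int model_commit_checked(struct model_req *rq)
{
	if (rq->freed)
		return -EINVAL;
	if (rq->buf_leased)
		return -EBUSY;
	model_put_ref(rq);
	return 0;
}
```

In this model, calling model_commit_unchecked() twice after
model_register_buf() drops both references, so the request is marked freed
while buf_leased is still set. With model_commit_checked(), repeated commits
are rejected until model_release_buf() has run, and only then can the
request complete.
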
Thanks,
Ming