* [RFC 00/13] uring-passthru for nvme
       [not found] <CGME20211220142227epcas5p280851b0a62baa78379979eb81af7a096@epcas5p2.samsung.com>
@ 2021-12-20 14:17 ` Kanchan Joshi
       [not found] ` <CGME20211220142228epcas5p2978d92d38f2015148d5f72913d6dbc3e@epcas5p2.samsung.com>
                     ` (13 more replies)
  0 siblings, 14 replies; 25+ messages in thread
From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw)
  To: io-uring, linux-nvme, linux-block
  Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8

Here is a revamped series on uring-passthru which is on top of Jens'
"nvme-passthru-wip.2" branch:
https://git.kernel.dk/cgit/linux-block/commit/?h=nvme-passthru-wip.2

This scales much better than before with the addition of the following:
- plugging
- passthru polling (sync and async; the sync part comes from a patch that
  Keith did earlier)
- bio-cache (this applies regardless of irq/polling since we
  submit/complete in task context anyway; currently it kicks in only when
  the fixed-buffer option is also passed, but that is primarily to keep
  the plumbing simple)

The feedback from Christoph on the previous fixed-buffer series has also
been folded in, which streamlines the plumbing. I look forward to further
feedback/comments.

KIOPS (512b) on P5800x looked like this:

QD      uring   pt      uring-poll      pt-poll
8       538     589     831             902
64      967     1131    1351            1378
256     1043    1230    1376            1429

Here 'uring' is operating on the block interface (nvme0n1) while 'pt'
refers to uring-passthru operating on the char interface (ng0n1).

Perf/testing is with this custom fio that turns regular io into passthru
on supplying the "uring_cmd=1" option:
https://github.com/joshkan/fio/tree/nvme-passthru-wip-polling

Example command-line:
fio -iodepth=256 -rw=randread -ioengine=io_uring -bs=512 -numjobs=1
-runtime=60 -group_reporting -iodepth_batch_submit=64
-iodepth_batch_complete_min=1 -iodepth_batch_complete_max=64
-fixedbufs=1 -hipri=1 -sqthread_poll=0 -filename=/dev/ng0n1
-name=io_uring_256 -uring_cmd=1

Background/context:
https://linuxplumbersconf.org/event/11/contributions/989/attachments/747/1723/lpc-2021-building-a-fast-passthru.pdf

Changes from v5:
https://lore.kernel.org/linux-nvme/[email protected]/
1. Fixed-buffer passthru with same ioctl code + other feedback from hch
2. Plugging (from Jens)
3. Sync polling (from Keith)
4. Async polling via io_uring
5. Enable bio-cache for fixed-buffer passthru

Changes from v4:
https://lore.kernel.org/linux-nvme/[email protected]/
1. Moved to v5 branch of Jens, adapted to task-work changes in io_uring
2. Removed support for block-passthrough (over nvme0n1) for now
3. Added support for char-passthrough (over ng0n1)
4. Added fixed-buffer passthrough in io_uring and nvme plumbing

Anuj Gupta (3):
  io_uring: mark iopoll not supported for uring-cmd
  io_uring: modify unused field in io_uring_cmd to store flags
  io_uring: add support for uring_cmd with fixed-buffer

Jens Axboe (2):
  io_uring: plug for async bypass
  block: wire-up support for plugging

Kanchan Joshi (6):
  io_uring: add infra for uring_cmd completion in submitter-task
  nvme: wire-up support for async-passthru on char-device.
  io_uring: add flag and helper for fixed-buffer uring-cmd
  nvme: enable passthrough with fixed-buffer
  block: factor out helper for bio allocation from cache
  nvme: enable bio-cache for fixed-buffer passthru

Keith Busch (1):
  nvme: allow user passthrough commands to poll

Pankaj Raghav (1):
  nvme: Add async passthru polling support

 block/bio.c                     |  43 +++--
 block/blk-map.c                 |  46 ++++++
 block/blk-mq.c                  |  93 +++++------
 drivers/nvme/host/core.c        |  21 ++-
 drivers/nvme/host/ioctl.c       | 271 ++++++++++++++++++++++++++++----
 drivers/nvme/host/multipath.c   |   2 +
 drivers/nvme/host/nvme.h        |  13 +-
 drivers/nvme/host/pci.c         |   4 +-
 drivers/nvme/target/passthru.c  |   2 +-
 fs/io_uring.c                   | 113 +++++++++++--
 include/linux/bio.h             |   1 +
 include/linux/blk-mq.h          |   4 +
 include/linux/io_uring.h        |  26 ++-
 include/uapi/linux/io_uring.h   |   6 +-
 include/uapi/linux/nvme_ioctl.h |   4 +
 15 files changed, 542 insertions(+), 107 deletions(-)

--
2.25.1

^ permalink raw reply	[flat|nested] 25+ messages in thread
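For reference, the 'uring' vs 'pt' columns above map to the same fio job run
against the two interfaces; roughly (a sketch, with most job parameters from
the example command-line above omitted for brevity):

# block interface ('uring' column)
fio -ioengine=io_uring -bs=512 -rw=randread -iodepth=256 -fixedbufs=1 \
    -filename=/dev/nvme0n1 -name=uring_256

# char interface ('pt' column), needs the custom fio with uring_cmd support
fio -ioengine=io_uring -bs=512 -rw=randread -iodepth=256 -fixedbufs=1 \
    -uring_cmd=1 -filename=/dev/ng0n1 -name=pt_256

# add -hipri=1 to either invocation for the corresponding '-poll' column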
* [RFC 01/13] io_uring: add infra for uring_cmd completion in submitter-task [not found] ` <CGME20211220142228epcas5p2978d92d38f2015148d5f72913d6dbc3e@epcas5p2.samsung.com> @ 2021-12-20 14:17 ` Kanchan Joshi 2022-02-17 2:13 ` Luis Chamberlain 0 siblings, 1 reply; 25+ messages in thread From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw) To: io-uring, linux-nvme, linux-block Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 Completion of a uring_cmd ioctl may involve referencing certain ioctl-specific fields, requiring original submitter context. Export an API that driver can use for this purpose. The API facilitates reusing task-work infra of io_uring, while driver gets to implement cmd-specific handling in a callback. Signed-off-by: Kanchan Joshi <[email protected]> Signed-off-by: Anuj Gupta <[email protected]> --- fs/io_uring.c | 16 ++++++++++++++++ include/linux/io_uring.h | 8 ++++++++ 2 files changed, 24 insertions(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index e96ed3d0385e..246f1085404d 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2450,6 +2450,22 @@ static void io_req_task_submit(struct io_kiocb *req, bool *locked) io_req_complete_failed(req, -EFAULT); } +static void io_uring_cmd_work(struct io_kiocb *req, bool *locked) +{ + req->uring_cmd.driver_cb(&req->uring_cmd); +} + +void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd, + void (*driver_cb)(struct io_uring_cmd *)) +{ + struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd); + + req->uring_cmd.driver_cb = driver_cb; + req->io_task_work.func = io_uring_cmd_work; + io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL)); +} +EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task); + static void io_req_task_queue_fail(struct io_kiocb *req, int ret) { req->result = ret; diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 64e788b39a86..f4b4990a3b62 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -14,11 +14,15 @@ struct io_uring_cmd { __u16 op; __u16 unused; __u32 len; + /* used if driver requires update in task context*/ + void (*driver_cb)(struct io_uring_cmd *cmd); __u64 pdu[5]; /* 40 bytes available inline for free use */ }; #if defined(CONFIG_IO_URING) void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret); +void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd, + void (*driver_cb)(struct io_uring_cmd *)); struct sock *io_uring_get_socket(struct file *file); void __io_uring_cancel(bool cancel_all); void __io_uring_free(struct task_struct *tsk); @@ -42,6 +46,10 @@ static inline void io_uring_free(struct task_struct *tsk) static inline void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret) { } +static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd, + void (*driver_cb)(struct io_uring_cmd *)) +{ +} static inline struct sock *io_uring_get_socket(struct file *file) { return NULL; -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
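A minimal sketch of how a driver is expected to use the two exported helpers
(the io_uring calls are the ones added/declared in this patch; the sample_*
names are illustrative — the nvme patch later in this series follows the same
pattern):

/* runs in the submitter's task context; may touch user memory, copy
 * results back, and finally post the completion to io_uring */
static void sample_drv_task_cb(struct io_uring_cmd *ioucmd)
{
	io_uring_cmd_done(ioucmd, 0 /* command status */);
}

/* called from the driver's completion path, e.g. irq context */
static void sample_drv_complete(struct io_uring_cmd *ioucmd)
{
	/* defer the ioctl-specific completion work to the submitting task */
	io_uring_cmd_complete_in_task(ioucmd, sample_drv_task_cb);
}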
* Re: [RFC 01/13] io_uring: add infra for uring_cmd completion in submitter-task 2021-12-20 14:17 ` [RFC 01/13] io_uring: add infra for uring_cmd completion in submitter-task Kanchan Joshi @ 2022-02-17 2:13 ` Luis Chamberlain 2022-02-17 15:39 ` Kanchan Joshi 0 siblings, 1 reply; 25+ messages in thread From: Luis Chamberlain @ 2022-02-17 2:13 UTC (permalink / raw) To: Kanchan Joshi Cc: io-uring, linux-nvme, linux-block, axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 On Mon, Dec 20, 2021 at 07:47:22PM +0530, Kanchan Joshi wrote: > Completion of a uring_cmd ioctl may involve referencing certain > ioctl-specific fields, requiring original submitter context. > Export an API that driver can use for this purpose. > The API facilitates reusing task-work infra of io_uring, while driver > gets to implement cmd-specific handling in a callback. > > Signed-off-by: Kanchan Joshi <[email protected]> > Signed-off-by: Anuj Gupta <[email protected]> > --- > fs/io_uring.c | 16 ++++++++++++++++ > include/linux/io_uring.h | 8 ++++++++ > 2 files changed, 24 insertions(+) > > diff --git a/fs/io_uring.c b/fs/io_uring.c > index e96ed3d0385e..246f1085404d 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -2450,6 +2450,22 @@ static void io_req_task_submit(struct io_kiocb *req, bool *locked) > io_req_complete_failed(req, -EFAULT); > } > > +static void io_uring_cmd_work(struct io_kiocb *req, bool *locked) > +{ > + req->uring_cmd.driver_cb(&req->uring_cmd); If the callback memory area is gone, boom. > +} > + > +void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd, > + void (*driver_cb)(struct io_uring_cmd *)) Adding kdoc style comment for this would be nice. Please document the context that is allowed. > +{ > + struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd); > + > + req->uring_cmd.driver_cb = driver_cb; > + req->io_task_work.func = io_uring_cmd_work; > + io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL)); This can schedules, and so the callback may go fishing in the meantime. > +} > +EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task); > + > static void io_req_task_queue_fail(struct io_kiocb *req, int ret) > { > req->result = ret; > diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h > index 64e788b39a86..f4b4990a3b62 100644 > --- a/include/linux/io_uring.h > +++ b/include/linux/io_uring.h > @@ -14,11 +14,15 @@ struct io_uring_cmd { > __u16 op; > __u16 unused; > __u32 len; > + /* used if driver requires update in task context*/ By using kdoc above youcan remove this comment. > + void (*driver_cb)(struct io_uring_cmd *cmd); So we'd need a struct module here I think if we're going to defer this into memory which can be removed. Luis ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 01/13] io_uring: add infra for uring_cmd completion in submitter-task 2022-02-17 2:13 ` Luis Chamberlain @ 2022-02-17 15:39 ` Kanchan Joshi 2022-02-17 15:50 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Kanchan Joshi @ 2022-02-17 15:39 UTC (permalink / raw) To: Luis Chamberlain Cc: Kanchan Joshi, io-uring, linux-nvme, linux-block, Jens Axboe, Christoph Hellwig, Keith Busch, Javier González, Anuj Gupta, Pankaj Raghav On Thu, Feb 17, 2022 at 7:43 AM Luis Chamberlain <[email protected]> wrote: > > On Mon, Dec 20, 2021 at 07:47:22PM +0530, Kanchan Joshi wrote: > > Completion of a uring_cmd ioctl may involve referencing certain > > ioctl-specific fields, requiring original submitter context. > > Export an API that driver can use for this purpose. > > The API facilitates reusing task-work infra of io_uring, while driver > > gets to implement cmd-specific handling in a callback. > > > > Signed-off-by: Kanchan Joshi <[email protected]> > > Signed-off-by: Anuj Gupta <[email protected]> > > --- > > fs/io_uring.c | 16 ++++++++++++++++ > > include/linux/io_uring.h | 8 ++++++++ > > 2 files changed, 24 insertions(+) > > > > diff --git a/fs/io_uring.c b/fs/io_uring.c > > index e96ed3d0385e..246f1085404d 100644 > > --- a/fs/io_uring.c > > +++ b/fs/io_uring.c > > @@ -2450,6 +2450,22 @@ static void io_req_task_submit(struct io_kiocb *req, bool *locked) > > io_req_complete_failed(req, -EFAULT); > > } > > > > +static void io_uring_cmd_work(struct io_kiocb *req, bool *locked) > > +{ > > + req->uring_cmd.driver_cb(&req->uring_cmd); > > If the callback memory area is gone, boom. Why will the memory area be gone? Module removal is protected because try_module_get is done anyway when the namespace was opened. > > +} > > + > > +void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd, > > + void (*driver_cb)(struct io_uring_cmd *)) > > Adding kdoc style comment for this would be nice. Please document > the context that is allowed. Sure, for all kdoc style comments. Will add that in the next version. > > +{ > > + struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd); > > + > > + req->uring_cmd.driver_cb = driver_cb; > > + req->io_task_work.func = io_uring_cmd_work; > > + io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL)); > > This can schedules, and so the callback may go fishing in the meantime. io_req_task_work_add is safe to be called in atomic context. FWIW, io_uring uses this for regular (i.e. direct block) io completion too. > > +} > > +EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task); > > + > > static void io_req_task_queue_fail(struct io_kiocb *req, int ret) > > { > > req->result = ret; > > diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h > > index 64e788b39a86..f4b4990a3b62 100644 > > --- a/include/linux/io_uring.h > > +++ b/include/linux/io_uring.h > > @@ -14,11 +14,15 @@ struct io_uring_cmd { > > __u16 op; > > __u16 unused; > > __u32 len; > > + /* used if driver requires update in task context*/ > > By using kdoc above youcan remove this comment. > > > + void (*driver_cb)(struct io_uring_cmd *cmd); > > So we'd need a struct module here I think if we're going to > defer this into memory which can be removed. > Same as the previous module-removal comment.Do we really need that? Thanks, -- ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 01/13] io_uring: add infra for uring_cmd completion in submitter-task 2022-02-17 15:39 ` Kanchan Joshi @ 2022-02-17 15:50 ` Jens Axboe 2022-02-17 17:56 ` Luis Chamberlain 2022-02-17 18:46 ` Luis Chamberlain 0 siblings, 2 replies; 25+ messages in thread From: Jens Axboe @ 2022-02-17 15:50 UTC (permalink / raw) To: Kanchan Joshi, Luis Chamberlain Cc: Kanchan Joshi, io-uring, linux-nvme, linux-block, Christoph Hellwig, Keith Busch, Javier González, Anuj Gupta, Pankaj Raghav On 2/17/22 8:39 AM, Kanchan Joshi wrote: > On Thu, Feb 17, 2022 at 7:43 AM Luis Chamberlain <[email protected]> wrote: >> >> On Mon, Dec 20, 2021 at 07:47:22PM +0530, Kanchan Joshi wrote: >>> Completion of a uring_cmd ioctl may involve referencing certain >>> ioctl-specific fields, requiring original submitter context. >>> Export an API that driver can use for this purpose. >>> The API facilitates reusing task-work infra of io_uring, while driver >>> gets to implement cmd-specific handling in a callback. >>> >>> Signed-off-by: Kanchan Joshi <[email protected]> >>> Signed-off-by: Anuj Gupta <[email protected]> >>> --- >>> fs/io_uring.c | 16 ++++++++++++++++ >>> include/linux/io_uring.h | 8 ++++++++ >>> 2 files changed, 24 insertions(+) >>> >>> diff --git a/fs/io_uring.c b/fs/io_uring.c >>> index e96ed3d0385e..246f1085404d 100644 >>> --- a/fs/io_uring.c >>> +++ b/fs/io_uring.c >>> @@ -2450,6 +2450,22 @@ static void io_req_task_submit(struct io_kiocb *req, bool *locked) >>> io_req_complete_failed(req, -EFAULT); >>> } >>> >>> +static void io_uring_cmd_work(struct io_kiocb *req, bool *locked) >>> +{ >>> + req->uring_cmd.driver_cb(&req->uring_cmd); >> >> If the callback memory area is gone, boom. > > Why will the memory area be gone? > Module removal is protected because try_module_get is done anyway when > the namespace was opened. And the req isn't going away before it's completed. >>> +{ >>> + struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd); >>> + >>> + req->uring_cmd.driver_cb = driver_cb; >>> + req->io_task_work.func = io_uring_cmd_work; >>> + io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL)); >> >> This can schedules, and so the callback may go fishing in the meantime. > > io_req_task_work_add is safe to be called in atomic context. FWIW, > io_uring uses this for regular (i.e. direct block) io completion too. Correct, it doesn't schedule and is safe from irq context as long as the task is pinned (which it is, via the req itself). -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 01/13] io_uring: add infra for uring_cmd completion in submitter-task 2022-02-17 15:50 ` Jens Axboe @ 2022-02-17 17:56 ` Luis Chamberlain 2022-02-18 17:41 ` Kanchan Joshi 2022-02-17 18:46 ` Luis Chamberlain 1 sibling, 1 reply; 25+ messages in thread From: Luis Chamberlain @ 2022-02-17 17:56 UTC (permalink / raw) To: Jens Axboe Cc: Kanchan Joshi, Kanchan Joshi, io-uring, linux-nvme, linux-block, Christoph Hellwig, Keith Busch, Javier González, Anuj Gupta, Pankaj Raghav On Thu, Feb 17, 2022 at 08:50:59AM -0700, Jens Axboe wrote: > On 2/17/22 8:39 AM, Kanchan Joshi wrote: > > On Thu, Feb 17, 2022 at 7:43 AM Luis Chamberlain <[email protected]> wrote: > >> > >> On Mon, Dec 20, 2021 at 07:47:22PM +0530, Kanchan Joshi wrote: > >>> Completion of a uring_cmd ioctl may involve referencing certain > >>> ioctl-specific fields, requiring original submitter context. > >>> Export an API that driver can use for this purpose. > >>> The API facilitates reusing task-work infra of io_uring, while driver > >>> gets to implement cmd-specific handling in a callback. > >>> > >>> Signed-off-by: Kanchan Joshi <[email protected]> > >>> Signed-off-by: Anuj Gupta <[email protected]> > >>> --- > >>> fs/io_uring.c | 16 ++++++++++++++++ > >>> include/linux/io_uring.h | 8 ++++++++ > >>> 2 files changed, 24 insertions(+) > >>> > >>> diff --git a/fs/io_uring.c b/fs/io_uring.c > >>> index e96ed3d0385e..246f1085404d 100644 > >>> --- a/fs/io_uring.c > >>> +++ b/fs/io_uring.c > >>> @@ -2450,6 +2450,22 @@ static void io_req_task_submit(struct io_kiocb *req, bool *locked) > >>> io_req_complete_failed(req, -EFAULT); > >>> } > >>> > >>> +static void io_uring_cmd_work(struct io_kiocb *req, bool *locked) > >>> +{ > >>> + req->uring_cmd.driver_cb(&req->uring_cmd); > >> > >> If the callback memory area is gone, boom. > > > > Why will the memory area be gone? > > Module removal is protected because try_module_get is done anyway when > > the namespace was opened. > > And the req isn't going away before it's completed. Groovy, it would be nice to add a little /* comment */ to just remind the reader? > >>> +{ > >>> + struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd); > >>> + > >>> + req->uring_cmd.driver_cb = driver_cb; > >>> + req->io_task_work.func = io_uring_cmd_work; > >>> + io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL)); > >> > >> This can schedules, and so the callback may go fishing in the meantime. > > > > io_req_task_work_add is safe to be called in atomic context. FWIW, > > io_uring uses this for regular (i.e. direct block) io completion too. > > Correct, it doesn't schedule and is safe from irq context as long as the > task is pinned (which it is, via the req itself). Great, a kdoc explaining the routine and that it can be called from atomic context and the rationale would be very useful to users. And .. so the callback *must* be safe in atomic context too or can it sleep? Luis ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 01/13] io_uring: add infra for uring_cmd completion in submitter-task 2022-02-17 17:56 ` Luis Chamberlain @ 2022-02-18 17:41 ` Kanchan Joshi 0 siblings, 0 replies; 25+ messages in thread From: Kanchan Joshi @ 2022-02-18 17:41 UTC (permalink / raw) To: Luis Chamberlain Cc: Jens Axboe, Kanchan Joshi, io-uring, linux-nvme, linux-block, Christoph Hellwig, Keith Busch, Javier González, Anuj Gupta, Pankaj Raghav On Thu, Feb 17, 2022 at 11:26 PM Luis Chamberlain <[email protected]> wrote: > > On Thu, Feb 17, 2022 at 08:50:59AM -0700, Jens Axboe wrote: > > On 2/17/22 8:39 AM, Kanchan Joshi wrote: > > > On Thu, Feb 17, 2022 at 7:43 AM Luis Chamberlain <[email protected]> wrote: > > >> > > >> On Mon, Dec 20, 2021 at 07:47:22PM +0530, Kanchan Joshi wrote: > > >>> Completion of a uring_cmd ioctl may involve referencing certain > > >>> ioctl-specific fields, requiring original submitter context. > > >>> Export an API that driver can use for this purpose. > > >>> The API facilitates reusing task-work infra of io_uring, while driver > > >>> gets to implement cmd-specific handling in a callback. > > >>> > > >>> Signed-off-by: Kanchan Joshi <[email protected]> > > >>> Signed-off-by: Anuj Gupta <[email protected]> > > >>> --- > > >>> fs/io_uring.c | 16 ++++++++++++++++ > > >>> include/linux/io_uring.h | 8 ++++++++ > > >>> 2 files changed, 24 insertions(+) > > >>> > > >>> diff --git a/fs/io_uring.c b/fs/io_uring.c > > >>> index e96ed3d0385e..246f1085404d 100644 > > >>> --- a/fs/io_uring.c > > >>> +++ b/fs/io_uring.c > > >>> @@ -2450,6 +2450,22 @@ static void io_req_task_submit(struct io_kiocb *req, bool *locked) > > >>> io_req_complete_failed(req, -EFAULT); > > >>> } > > >>> > > >>> +static void io_uring_cmd_work(struct io_kiocb *req, bool *locked) > > >>> +{ > > >>> + req->uring_cmd.driver_cb(&req->uring_cmd); > > >> > > >> If the callback memory area is gone, boom. > > > > > > Why will the memory area be gone? > > > Module removal is protected because try_module_get is done anyway when > > > the namespace was opened. > > > > And the req isn't going away before it's completed. > > Groovy, it would be nice to add a little /* comment */ to just remind > the reader? > > > >>> +{ > > >>> + struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd); > > >>> + > > >>> + req->uring_cmd.driver_cb = driver_cb; > > >>> + req->io_task_work.func = io_uring_cmd_work; > > >>> + io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL)); > > >> > > >> This can schedules, and so the callback may go fishing in the meantime. > > > > > > io_req_task_work_add is safe to be called in atomic context. FWIW, > > > io_uring uses this for regular (i.e. direct block) io completion too. > > > > Correct, it doesn't schedule and is safe from irq context as long as the > > task is pinned (which it is, via the req itself). > > Great, a kdoc explaining the routine and that it can be called from > atomic context and the rationale would be very useful to users. And .. > so the callback *must* be safe in atomic context too or can it sleep? Callback will be invoked in task/process context. Allowing much more to do than we can in atomic context, including sleep if necessary. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 01/13] io_uring: add infra for uring_cmd completion in submitter-task 2022-02-17 15:50 ` Jens Axboe 2022-02-17 17:56 ` Luis Chamberlain @ 2022-02-17 18:46 ` Luis Chamberlain 2022-02-17 18:53 ` Jens Axboe 1 sibling, 1 reply; 25+ messages in thread From: Luis Chamberlain @ 2022-02-17 18:46 UTC (permalink / raw) To: Jens Axboe Cc: Kanchan Joshi, Kanchan Joshi, io-uring, linux-nvme, linux-block, Christoph Hellwig, Keith Busch, Javier González, Anuj Gupta, Pankaj Raghav On Thu, Feb 17, 2022 at 08:50:59AM -0700, Jens Axboe wrote: > On 2/17/22 8:39 AM, Kanchan Joshi wrote: > > On Thu, Feb 17, 2022 at 7:43 AM Luis Chamberlain <[email protected]> wrote: > >> > >> On Mon, Dec 20, 2021 at 07:47:22PM +0530, Kanchan Joshi wrote: > >>> Completion of a uring_cmd ioctl may involve referencing certain > >>> ioctl-specific fields, requiring original submitter context. > >>> Export an API that driver can use for this purpose. > >>> The API facilitates reusing task-work infra of io_uring, while driver > >>> gets to implement cmd-specific handling in a callback. > >>> > >>> Signed-off-by: Kanchan Joshi <[email protected]> > >>> Signed-off-by: Anuj Gupta <[email protected]> > >>> --- > >>> fs/io_uring.c | 16 ++++++++++++++++ > >>> include/linux/io_uring.h | 8 ++++++++ > >>> 2 files changed, 24 insertions(+) > >>> > >>> diff --git a/fs/io_uring.c b/fs/io_uring.c > >>> index e96ed3d0385e..246f1085404d 100644 > >>> --- a/fs/io_uring.c > >>> +++ b/fs/io_uring.c > >>> @@ -2450,6 +2450,22 @@ static void io_req_task_submit(struct io_kiocb *req, bool *locked) > >>> io_req_complete_failed(req, -EFAULT); > >>> } > >>> > >>> +static void io_uring_cmd_work(struct io_kiocb *req, bool *locked) > >>> +{ > >>> + req->uring_cmd.driver_cb(&req->uring_cmd); > >> > >> If the callback memory area is gone, boom. > > > > Why will the memory area be gone? > > Module removal is protected because try_module_get is done anyway when > > the namespace was opened. > > And the req isn't going away before it's completed. Just to be clear, I was thinking outside of the block layer context too. Does this still hold true? Luis ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 01/13] io_uring: add infra for uring_cmd completion in submitter-task 2022-02-17 18:46 ` Luis Chamberlain @ 2022-02-17 18:53 ` Jens Axboe 0 siblings, 0 replies; 25+ messages in thread From: Jens Axboe @ 2022-02-17 18:53 UTC (permalink / raw) To: Luis Chamberlain Cc: Kanchan Joshi, Kanchan Joshi, io-uring, linux-nvme, linux-block, Christoph Hellwig, Keith Busch, Javier González, Anuj Gupta, Pankaj Raghav On 2/17/22 11:46 AM, Luis Chamberlain wrote: > On Thu, Feb 17, 2022 at 08:50:59AM -0700, Jens Axboe wrote: >> On 2/17/22 8:39 AM, Kanchan Joshi wrote: >>> On Thu, Feb 17, 2022 at 7:43 AM Luis Chamberlain <[email protected]> wrote: >>>> >>>> On Mon, Dec 20, 2021 at 07:47:22PM +0530, Kanchan Joshi wrote: >>>>> Completion of a uring_cmd ioctl may involve referencing certain >>>>> ioctl-specific fields, requiring original submitter context. >>>>> Export an API that driver can use for this purpose. >>>>> The API facilitates reusing task-work infra of io_uring, while driver >>>>> gets to implement cmd-specific handling in a callback. >>>>> >>>>> Signed-off-by: Kanchan Joshi <[email protected]> >>>>> Signed-off-by: Anuj Gupta <[email protected]> >>>>> --- >>>>> fs/io_uring.c | 16 ++++++++++++++++ >>>>> include/linux/io_uring.h | 8 ++++++++ >>>>> 2 files changed, 24 insertions(+) >>>>> >>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c >>>>> index e96ed3d0385e..246f1085404d 100644 >>>>> --- a/fs/io_uring.c >>>>> +++ b/fs/io_uring.c >>>>> @@ -2450,6 +2450,22 @@ static void io_req_task_submit(struct io_kiocb *req, bool *locked) >>>>> io_req_complete_failed(req, -EFAULT); >>>>> } >>>>> >>>>> +static void io_uring_cmd_work(struct io_kiocb *req, bool *locked) >>>>> +{ >>>>> + req->uring_cmd.driver_cb(&req->uring_cmd); >>>> >>>> If the callback memory area is gone, boom. >>> >>> Why will the memory area be gone? >>> Module removal is protected because try_module_get is done anyway when >>> the namespace was opened. >> >> And the req isn't going away before it's completed. > > Just to be clear, I was thinking outside of the block layer context too. > Does this still hold true? It has nothing to do with the block layer, the req belongs to io_uring context. io_uring has ownership of it. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
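For the kdoc comment requested above, something along the following lines
would capture the contract discussed in this subthread (a sketch of possible
wording, not the comment that was ultimately merged):

/**
 * io_uring_cmd_complete_in_task - schedule uring_cmd completion in task context
 * @ioucmd:	command to complete
 * @driver_cb:	driver callback performing the ioctl-specific completion work
 *
 * May be called from any context, including hard/soft irq, as it only arms
 * task-work on the submitting task; the req is owned by io_uring and does
 * not go away before it is completed. @driver_cb is later invoked in the
 * submitter's task (process) context, so it is allowed to sleep and to
 * access user memory. The callback must eventually post the completion via
 * io_uring_cmd_done().
 */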
* [RFC 02/13] nvme: wire-up support for async-passthru on char-device. [not found] ` <CGME20211220142231epcas5p1482c78f91feabdbc3e62341790ab22e1@epcas5p1.samsung.com> @ 2021-12-20 14:17 ` Kanchan Joshi 0 siblings, 0 replies; 25+ messages in thread From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw) To: io-uring, linux-nvme, linux-block Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 Introduce handlers for fops->async_cmd(), implementing async passthru on char device (including the multipath one). The handlers supports NVME_IOCTL_IO64_CMD. Signed-off-by: Kanchan Joshi <[email protected]> Signed-off-by: Anuj Gupta <[email protected]> --- drivers/nvme/host/core.c | 1 + drivers/nvme/host/ioctl.c | 147 +++++++++++++++++++++++++++++++--- drivers/nvme/host/multipath.c | 1 + drivers/nvme/host/nvme.h | 5 ++ 4 files changed, 145 insertions(+), 9 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 290f26ed74c2..bce2e93d14a3 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -3670,6 +3670,7 @@ static const struct file_operations nvme_ns_chr_fops = { .release = nvme_ns_chr_release, .unlocked_ioctl = nvme_ns_chr_ioctl, .compat_ioctl = compat_ptr_ioctl, + .async_cmd = nvme_ns_chr_async_cmd, }; static int nvme_add_ns_cdev(struct nvme_ns *ns) diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c index 22314962842d..7d9c51d9c0a8 100644 --- a/drivers/nvme/host/ioctl.c +++ b/drivers/nvme/host/ioctl.c @@ -18,6 +18,84 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval) ptrval = (compat_uptr_t)ptrval; return (void __user *)ptrval; } +/* This overlays struct io_uring_cmd pdu (40 bytes) */ +struct nvme_uring_cmd { + u32 ioctl_cmd; + u32 meta_len; + void __user *argp; + union { + struct bio *bio; + struct request *req; + }; + void *meta; /* kernel-resident buffer */ + void __user *meta_buffer; +}; + +static struct nvme_uring_cmd *nvme_uring_cmd(struct io_uring_cmd *ioucmd) +{ + return (struct nvme_uring_cmd *)&ioucmd->pdu; +} + +static void nvme_pt_task_cb(struct io_uring_cmd *ioucmd) +{ + struct nvme_uring_cmd *cmd = nvme_uring_cmd(ioucmd); + struct nvme_passthru_cmd64 __user *ptcmd64 = cmd->argp; + struct request *req = cmd->req; + int status; + u64 result; + + if (nvme_req(req)->flags & NVME_REQ_CANCELLED) + status = -EINTR; + else + status = nvme_req(req)->status; + result = le64_to_cpu(nvme_req(req)->result.u64); + + /* we can free request */ + blk_mq_free_request(req); + + if (cmd->meta) { + if (status) + if (copy_to_user(cmd->meta_buffer, cmd->meta, cmd->meta_len)) + status = -EFAULT; + kfree(cmd->meta); + } + + if (put_user(result, &ptcmd64->result)) + status = -EFAULT; + io_uring_cmd_done(ioucmd, status); +} + +static void nvme_end_async_pt(struct request *req, blk_status_t err) +{ + struct io_uring_cmd *ioucmd = req->end_io_data; + struct nvme_uring_cmd *cmd = nvme_uring_cmd(ioucmd); + /* extract bio before reusing the same field for request */ + struct bio *bio = cmd->bio; + + cmd->req = req; + /* this takes care of setting up task-work */ + io_uring_cmd_complete_in_task(ioucmd, nvme_pt_task_cb); + blk_rq_unmap_user(bio); +} + +static void nvme_setup_uring_cmd_data(struct request *rq, + struct io_uring_cmd *ioucmd, void *meta, + void __user *meta_buffer, u32 meta_len, bool write) +{ + struct nvme_uring_cmd *cmd = nvme_uring_cmd(ioucmd); + + /* to free bio on completion, as req->bio will be null at that time */ + cmd->bio = rq->bio; + /* meta update is required only for read requests */ + if (meta && !write) { 
+ cmd->meta = meta; + cmd->meta_buffer = meta_buffer; + cmd->meta_len = meta_len; + } else { + cmd->meta = NULL; + } + rq->end_io_data = ioucmd; +} static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf, unsigned len, u32 seed, bool write) @@ -56,7 +134,8 @@ static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf, static int nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd, void __user *ubuffer, unsigned bufflen, void __user *meta_buffer, unsigned meta_len, - u32 meta_seed, u64 *result, unsigned timeout) + u32 meta_seed, u64 *result, unsigned timeout, + struct io_uring_cmd *ioucmd) { bool write = nvme_is_write(cmd); struct nvme_ns *ns = q->queuedata; @@ -92,6 +171,12 @@ static int nvme_submit_user_cmd(struct request_queue *q, req->cmd_flags |= REQ_INTEGRITY; } } + if (ioucmd) { /* async dispatch */ + nvme_setup_uring_cmd_data(req, ioucmd, meta, meta_buffer, + meta_len, write); + blk_execute_rq_nowait(req, 0, nvme_end_async_pt); + return 0; + } ret = nvme_execute_passthru_rq(req); if (result) @@ -170,7 +255,8 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio) return nvme_submit_user_cmd(ns->queue, &c, nvme_to_user_ptr(io.addr), length, - metadata, meta_len, lower_32_bits(io.slba), NULL, 0); + metadata, meta_len, lower_32_bits(io.slba), NULL, 0, + NULL); } static bool nvme_validate_passthru_nsid(struct nvme_ctrl *ctrl, @@ -224,7 +310,7 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns, status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c, nvme_to_user_ptr(cmd.addr), cmd.data_len, nvme_to_user_ptr(cmd.metadata), cmd.metadata_len, - 0, &result, timeout); + 0, &result, timeout, NULL); if (status >= 0) { if (put_user(result, &ucmd->result)) @@ -235,7 +321,8 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns, } static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns, - struct nvme_passthru_cmd64 __user *ucmd) + struct nvme_passthru_cmd64 __user *ucmd, + struct io_uring_cmd *ioucmd) { struct nvme_passthru_cmd64 cmd; struct nvme_command c; @@ -270,9 +357,9 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns, status = nvme_submit_user_cmd(ns ? 
ns->queue : ctrl->admin_q, &c, nvme_to_user_ptr(cmd.addr), cmd.data_len, nvme_to_user_ptr(cmd.metadata), cmd.metadata_len, - 0, &cmd.result, timeout); + 0, &cmd.result, timeout, ioucmd); - if (status >= 0) { + if (!ioucmd && status >= 0) { if (put_user(cmd.result, &ucmd->result)) return -EFAULT; } @@ -296,7 +383,7 @@ static int nvme_ctrl_ioctl(struct nvme_ctrl *ctrl, unsigned int cmd, case NVME_IOCTL_ADMIN_CMD: return nvme_user_cmd(ctrl, NULL, argp); case NVME_IOCTL_ADMIN64_CMD: - return nvme_user_cmd64(ctrl, NULL, argp); + return nvme_user_cmd64(ctrl, NULL, argp, NULL); default: return sed_ioctl(ctrl->opal_dev, cmd, argp); } @@ -340,7 +427,7 @@ static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned int cmd, case NVME_IOCTL_SUBMIT_IO: return nvme_submit_io(ns, argp); case NVME_IOCTL_IO64_CMD: - return nvme_user_cmd64(ns->ctrl, ns, argp); + return nvme_user_cmd64(ns->ctrl, ns, argp, NULL); default: return -ENOTTY; } @@ -369,6 +456,33 @@ long nvme_ns_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg) return __nvme_ioctl(ns, cmd, (void __user *)arg); } +static int nvme_ns_async_ioctl(struct nvme_ns *ns, struct io_uring_cmd *ioucmd) +{ + struct nvme_uring_cmd *cmd = nvme_uring_cmd(ioucmd); + int ret; + + switch (cmd->ioctl_cmd) { + case NVME_IOCTL_IO64_CMD: + ret = nvme_user_cmd64(ns->ctrl, ns, cmd->argp, ioucmd); + break; + default: + ret = -ENOTTY; + } + + if (ret >= 0) + ret = -EIOCBQUEUED; + return ret; +} + +int nvme_ns_chr_async_cmd(struct io_uring_cmd *ioucmd, + enum io_uring_cmd_flags flags) +{ + struct nvme_ns *ns = container_of(file_inode(ioucmd->file)->i_cdev, + struct nvme_ns, cdev); + + return nvme_ns_async_ioctl(ns, ioucmd); +} + #ifdef CONFIG_NVME_MULTIPATH static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd, void __user *argp, struct nvme_ns_head *head, int srcu_idx) @@ -412,6 +526,21 @@ int nvme_ns_head_ioctl(struct block_device *bdev, fmode_t mode, return ret; } +int nvme_ns_head_chr_async_cmd(struct io_uring_cmd *ioucmd, + enum io_uring_cmd_flags flags) +{ + struct cdev *cdev = file_inode(ioucmd->file)->i_cdev; + struct nvme_ns_head *head = container_of(cdev, struct nvme_ns_head, cdev); + int srcu_idx = srcu_read_lock(&head->srcu); + struct nvme_ns *ns = nvme_find_path(head); + int ret = -EWOULDBLOCK; + + if (ns) + ret = nvme_ns_async_ioctl(ns, ioucmd); + srcu_read_unlock(&head->srcu, srcu_idx); + return ret; +} + long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -480,7 +609,7 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd, case NVME_IOCTL_ADMIN_CMD: return nvme_user_cmd(ctrl, NULL, argp); case NVME_IOCTL_ADMIN64_CMD: - return nvme_user_cmd64(ctrl, NULL, argp); + return nvme_user_cmd64(ctrl, NULL, argp, NULL); case NVME_IOCTL_IO_CMD: return nvme_dev_user_cmd(ctrl, argp); case NVME_IOCTL_RESET: diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 13e5d503ed07..1e59c8e06622 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -423,6 +423,7 @@ static const struct file_operations nvme_ns_head_chr_fops = { .release = nvme_ns_head_chr_release, .unlocked_ioctl = nvme_ns_head_chr_ioctl, .compat_ioctl = compat_ptr_ioctl, + .async_cmd = nvme_ns_head_chr_async_cmd, }; static int nvme_add_ns_head_cdev(struct nvme_ns_head *head) diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 9b095ee01364..9a901b954a87 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -16,6 +16,7 @@ #include <linux/rcupdate.h> #include 
<linux/wait.h> #include <linux/t10-pi.h> +#include <linux/io_uring.h> #include <trace/events/block.h> @@ -747,6 +748,10 @@ long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg); long nvme_dev_ioctl(struct file *file, unsigned int cmd, unsigned long arg); +int nvme_ns_chr_async_cmd(struct io_uring_cmd *ucmd, + enum io_uring_cmd_flags flags); +int nvme_ns_head_chr_async_cmd(struct io_uring_cmd *ucmd, + enum io_uring_cmd_flags flags); int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo); extern const struct attribute_group *nvme_ns_id_attr_groups[]; -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
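Summarising the async passthru completion path added by this patch (call-chain
only, with the execution contexts noted per the discussion on patch 01):

blk_execute_rq_nowait(req, 0, nvme_end_async_pt)        /* submission, task ctx */
  -> nvme_end_async_pt(req)                             /* irq/completion ctx   */
       -> io_uring_cmd_complete_in_task(ioucmd, nvme_pt_task_cb)
       -> blk_rq_unmap_user(bio)
  -> nvme_pt_task_cb(ioucmd)                            /* submitter task ctx   */
       -> blk_mq_free_request(req)
       -> copy meta back to user, put_user(result)
       -> io_uring_cmd_done(ioucmd, status)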
* [RFC 03/13] io_uring: mark iopoll not supported for uring-cmd [not found] ` <CGME20211220142233epcas5p3b54aa591fb7b81bfb58bc33b5f92a2d3@epcas5p3.samsung.com> @ 2021-12-20 14:17 ` Kanchan Joshi 2022-02-17 2:16 ` Luis Chamberlain 0 siblings, 1 reply; 25+ messages in thread From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw) To: io-uring, linux-nvme, linux-block Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 From: Anuj Gupta <[email protected]> Currently uring-passthrough doesn't support iopoll. Bail out to avoid the panic. Signed-off-by: Anuj Gupta <[email protected]> --- fs/io_uring.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index 246f1085404d..1061b4cde4be 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -4131,6 +4131,11 @@ static int io_uring_cmd_prep(struct io_kiocb *req, if (!req->file->f_op->async_cmd) return -EOPNOTSUPP; + if (req->ctx->flags & IORING_SETUP_IOPOLL) { + printk_once(KERN_WARNING "io_uring: iopoll not supported!\n"); + return -EOPNOTSUPP; + } + cmd->op = READ_ONCE(csqe->op); cmd->len = READ_ONCE(csqe->len); -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [RFC 03/13] io_uring: mark iopoll not supported for uring-cmd 2021-12-20 14:17 ` [RFC 03/13] io_uring: mark iopoll not supported for uring-cmd Kanchan Joshi @ 2022-02-17 2:16 ` Luis Chamberlain 2022-02-17 2:52 ` Jens Axboe 0 siblings, 1 reply; 25+ messages in thread From: Luis Chamberlain @ 2022-02-17 2:16 UTC (permalink / raw) To: Kanchan Joshi, axboe Cc: io-uring, linux-nvme, linux-block, axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 On Mon, Dec 20, 2021 at 07:47:24PM +0530, Kanchan Joshi wrote: > From: Anuj Gupta <[email protected]> > > Currently uring-passthrough doesn't support iopoll. Bail out to avoid > the panic. > > Signed-off-by: Anuj Gupta <[email protected]> Jens, can you fold this in to your series? Luis ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 03/13] io_uring: mark iopoll not supported for uring-cmd 2022-02-17 2:16 ` Luis Chamberlain @ 2022-02-17 2:52 ` Jens Axboe 0 siblings, 0 replies; 25+ messages in thread From: Jens Axboe @ 2022-02-17 2:52 UTC (permalink / raw) To: Luis Chamberlain, Kanchan Joshi Cc: io-uring, linux-nvme, linux-block, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 On 2/16/22 7:16 PM, Luis Chamberlain wrote: > On Mon, Dec 20, 2021 at 07:47:24PM +0530, Kanchan Joshi wrote: >> From: Anuj Gupta <[email protected]> >> >> Currently uring-passthrough doesn't support iopoll. Bail out to avoid >> the panic. >> >> Signed-off-by: Anuj Gupta <[email protected]> > > Jens, can you fold this in to your series? Yes, we really need to spin a new series with the bits combined. I've got some ideas for that... -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* [RFC 04/13] io_uring: modify unused field in io_uring_cmd to store flags [not found] ` <CGME20211220142235epcas5p3b8d56cd39d9710278ec3360be47f2cca@epcas5p3.samsung.com> @ 2021-12-20 14:17 ` Kanchan Joshi 0 siblings, 0 replies; 25+ messages in thread From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw) To: io-uring, linux-nvme, linux-block Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 From: Anuj Gupta <[email protected]> This patch modifies the unused field in io_uring_cmd structure to store the flags. This is a prep patch for providing fixedbufs support via passthrough ioctls. Signed-off-by: Anuj Gupta <[email protected]> --- include/linux/io_uring.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index f4b4990a3b62..5ab824ced147 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -12,7 +12,7 @@ struct io_uring_cmd { struct file *file; __u16 op; - __u16 unused; + __u16 flags; __u32 len; /* used if driver requires update in task context*/ void (*driver_cb)(struct io_uring_cmd *cmd); -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [RFC 05/13] io_uring: add flag and helper for fixed-buffer uring-cmd [not found] ` <CGME20211220142237epcas5p48729a52293e4f7627e6ec53ca67b9c58@epcas5p4.samsung.com> @ 2021-12-20 14:17 ` Kanchan Joshi 0 siblings, 0 replies; 25+ messages in thread From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw) To: io-uring, linux-nvme, linux-block Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 Add URING_CMD_FIXEDBUFS flag to use fixedbufs enabled passthrough. Refactor the existing code and factor out helper that can be used for passthrough with fixed-buffer. This is a prep patch. Signed-off-by: Kanchan Joshi <[email protected]> Signed-off-by: Anuj Gupta <[email protected]> --- fs/io_uring.c | 20 ++++++++++++++------ include/linux/io_uring.h | 10 ++++++++++ 2 files changed, 24 insertions(+), 6 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 1061b4cde4be..cc6735913c4b 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -3125,12 +3125,10 @@ static void kiocb_done(struct io_kiocb *req, ssize_t ret, } } } - -static int __io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter, - struct io_mapped_ubuf *imu) +static int __io_import_fixed(u64 buf_addr, size_t len, int rw, + struct iov_iter *iter, struct io_mapped_ubuf *imu) { - size_t len = req->rw.len; - u64 buf_end, buf_addr = req->rw.addr; + u64 buf_end; size_t offset; if (unlikely(check_add_overflow(buf_addr, (u64)len, &buf_end))) @@ -3199,8 +3197,18 @@ static int io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter) imu = READ_ONCE(ctx->user_bufs[index]); req->imu = imu; } - return __io_import_fixed(req, rw, iter, imu); + return __io_import_fixed(req->rw.addr, req->rw.len, rw, iter, imu); +} + +int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, + int rw, struct iov_iter *iter, void *ioucmd) +{ + struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd); + struct io_mapped_ubuf *imu = req->imu; + + return __io_import_fixed(ubuf, len, rw, iter, imu); } +EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed); static void io_ring_submit_unlock(struct io_ring_ctx *ctx, bool needs_lock) { diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 5ab824ced147..07732bc850af 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -5,6 +5,9 @@ #include <linux/sched.h> #include <linux/xarray.h> +enum { + URING_CMD_FIXEDBUFS = (1 << 1), +}; /* * Note that the first member here must be a struct file, as the * io_uring command layout depends on that. @@ -20,6 +23,8 @@ struct io_uring_cmd { }; #if defined(CONFIG_IO_URING) +int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, + int rw, struct iov_iter *iter, void *ioucmd); void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret); void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd, void (*driver_cb)(struct io_uring_cmd *)); @@ -50,6 +55,11 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd, void (*driver_cb)(struct io_uring_cmd *)) { } +int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, + int rw, struct iov_iter *iter, void *ioucmd) +{ + return -1; +} static inline struct sock *io_uring_get_socket(struct file *file) { return NULL; -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [RFC 06/13] io_uring: add support for uring_cmd with fixed-buffer [not found] ` <CGME20211220142239epcas5p3efc3c89bd536f3f5d728c81bc550e143@epcas5p3.samsung.com> @ 2021-12-20 14:17 ` Kanchan Joshi 0 siblings, 0 replies; 25+ messages in thread From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw) To: io-uring, linux-nvme, linux-block Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 From: Anuj Gupta <[email protected]> Add IORING_OP_URING_CMD_FIXED opcode that enables performing the operation with previously registered buffers. Signed-off-by: Anuj Gupta <[email protected]> --- fs/io_uring.c | 29 ++++++++++++++++++++++++++++- include/uapi/linux/io_uring.h | 6 +++++- 2 files changed, 33 insertions(+), 2 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index cc6735913c4b..2870a891e441 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1122,6 +1122,10 @@ static const struct io_op_def io_op_defs[] = { .needs_file = 1, .offsets = 1, }, + [IORING_OP_URING_CMD_FIXED] = { + .needs_file = 1, + .offsets = 1, + }, }; /* requests with any of those set should undergo io_disarm_next() */ @@ -4133,6 +4137,7 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_done); static int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { + struct io_ring_ctx *ctx = req->ctx; const struct io_uring_cmd_sqe *csqe = (const void *) sqe; struct io_uring_cmd *cmd = &req->uring_cmd; @@ -4145,7 +4150,13 @@ static int io_uring_cmd_prep(struct io_kiocb *req, } cmd->op = READ_ONCE(csqe->op); - cmd->len = READ_ONCE(csqe->len); + if (req->opcode == IORING_OP_URING_CMD_FIXED) { + req->imu = NULL; + io_req_set_rsrc_node(req, ctx); + req->buf_index = READ_ONCE(csqe->buf_index); + req->uring_cmd.flags |= URING_CMD_FIXEDBUFS; + } else + cmd->len = READ_ONCE(csqe->len); /* * The payload is the last 40 bytes of an io_uring_cmd_sqe, with the @@ -4160,6 +4171,20 @@ static int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags) struct file *file = req->file; int ret; + if (req->opcode == IORING_OP_URING_CMD_FIXED) { + u32 index, buf_index = req->buf_index; + struct io_ring_ctx *ctx = req->ctx; + struct io_mapped_ubuf *imu = req->imu; + + if (likely(!imu)) { + if (unlikely(buf_index >= ctx->nr_user_bufs)) + return -EFAULT; + index = array_index_nospec(buf_index, ctx->nr_user_bufs); + imu = READ_ONCE(ctx->user_bufs[index]); + req->imu = imu; + } + } + ret = file->f_op->async_cmd(&req->uring_cmd, issue_flags); /* queued async, consumer will call io_uring_cmd_done() when complete */ if (ret == -EIOCBQUEUED) @@ -6675,6 +6700,7 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) case IORING_OP_LINKAT: return io_linkat_prep(req, sqe); case IORING_OP_URING_CMD: + case IORING_OP_URING_CMD_FIXED: return io_uring_cmd_prep(req, sqe); } @@ -6960,6 +6986,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) ret = io_linkat(req, issue_flags); break; case IORING_OP_URING_CMD: + case IORING_OP_URING_CMD_FIXED: ret = io_uring_cmd(req, issue_flags); break; default: diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 7191419f2236..cb331f201255 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -79,7 +79,10 @@ struct io_uring_cmd_sqe { __u64 user_data; __u16 op; __u16 personality; - __u32 len; + union { + __u32 len; + __u16 buf_index; + }; __u64 pdu[5]; }; @@ -164,6 +167,7 @@ enum { IORING_OP_SYMLINKAT, IORING_OP_LINKAT, IORING_OP_URING_CMD, + IORING_OP_URING_CMD_FIXED, /* this goes last, obviously */ 
IORING_OP_LAST, -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
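From userspace, usage of the new opcode would look roughly as below. This is a
sketch only: it assumes liburing-style setup helpers, the io_uring_cmd_sqe
layout from the hunk above (buf_index sharing the union with len), and omits
the driver-specific pdu contents (e.g. the nvme command from earlier in this
series); variable declarations and error handling are left out.

struct iovec iov = { .iov_base = buf, .iov_len = BUF_LEN };

/* register the buffer once; commands then reference it by index */
io_uring_register_buffers(&ring, &iov, 1);

sqe = io_uring_get_sqe(&ring);
sqe->opcode = IORING_OP_URING_CMD_FIXED;
sqe->fd = char_dev_fd;                  /* e.g. /dev/ng0n1 */

csqe = (struct io_uring_cmd_sqe *)sqe;
csqe->buf_index = 0;                    /* index into registered buffers,
                                         * instead of a length */
/* csqe->op and csqe->pdu[] carry the driver-specific command */
io_uring_submit(&ring);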
* [RFC 07/13] nvme: enable passthrough with fixed-buffer [not found] ` <CGME20211220142242epcas5p45dddab51a9f20a8ec3d8b8e4f1dda40a@epcas5p4.samsung.com> @ 2021-12-20 14:17 ` Kanchan Joshi 0 siblings, 0 replies; 25+ messages in thread From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw) To: io-uring, linux-nvme, linux-block Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 Add support to carry out passthrough command with pre-mapped buffers. Signed-off-by: Kanchan Joshi <[email protected]> --- block/blk-map.c | 46 +++++++++++++++++++++++++++++++++++++++ drivers/nvme/host/ioctl.c | 30 +++++++++++++++---------- include/linux/blk-mq.h | 3 +++ 3 files changed, 67 insertions(+), 12 deletions(-) diff --git a/block/blk-map.c b/block/blk-map.c index 4526adde0156..9aa9864eab55 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -8,6 +8,7 @@ #include <linux/bio.h> #include <linux/blkdev.h> #include <linux/uio.h> +#include <linux/io_uring.h> #include "blk.h" @@ -577,6 +578,51 @@ int blk_rq_map_user(struct request_queue *q, struct request *rq, } EXPORT_SYMBOL(blk_rq_map_user); +/* Unlike blk_rq_map_user () this is only for fixed-buffer async passthrough. */ +int blk_rq_map_user_fixedb(struct request_queue *q, struct request *rq, + u64 ubuf, unsigned long len, gfp_t gfp_mask, + struct io_uring_cmd *ioucmd) +{ + struct iov_iter iter; + size_t iter_count, nr_segs; + struct bio *bio; + int ret; + + /* + * Talk to io_uring to obtain BVEC iterator for the buffer. + * And use that iterator to form bio/request. + */ + ret = io_uring_cmd_import_fixed(ubuf, len, rq_data_dir(rq), &iter, + ioucmd); + if (unlikely(ret < 0)) + return ret; + iter_count = iov_iter_count(&iter); + nr_segs = iter.nr_segs; + + if (!iter_count || (iter_count >> 9) > queue_max_hw_sectors(q)) + return -EINVAL; + if (nr_segs > queue_max_segments(q)) + return -EINVAL; + /* no iovecs to alloc, as we already have a BVEC iterator */ + bio = bio_alloc(gfp_mask, 0); + if (!bio) + return -ENOMEM; + + bio->bi_opf |= req_op(rq); + ret = bio_iov_iter_get_pages(bio, &iter); + if (ret) + goto out_free; + + blk_rq_bio_prep(rq, bio, nr_segs); + return 0; + +out_free: + bio_release_pages(bio, false); + bio_put(bio); + return ret; +} +EXPORT_SYMBOL(blk_rq_map_user_fixedb); + /** * blk_rq_unmap_user - unmap a request with user data * @bio: start of bio list diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c index 7d9c51d9c0a8..dc6a5f1b81ca 100644 --- a/drivers/nvme/host/ioctl.c +++ b/drivers/nvme/host/ioctl.c @@ -131,8 +131,13 @@ static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf, return ERR_PTR(ret); } +static inline bool nvme_is_fixedb_passthru(struct io_uring_cmd *ioucmd) +{ + return ((ioucmd) && (ioucmd->flags & URING_CMD_FIXEDBUFS)); +} + static int nvme_submit_user_cmd(struct request_queue *q, - struct nvme_command *cmd, void __user *ubuffer, + struct nvme_command *cmd, u64 ubuffer, unsigned bufflen, void __user *meta_buffer, unsigned meta_len, u32 meta_seed, u64 *result, unsigned timeout, struct io_uring_cmd *ioucmd) @@ -154,8 +159,12 @@ static int nvme_submit_user_cmd(struct request_queue *q, nvme_req(req)->flags |= NVME_REQ_USERCMD; if (ubuffer && bufflen) { - ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen, - GFP_KERNEL); + if (likely(!nvme_is_fixedb_passthru(ioucmd))) + ret = blk_rq_map_user(q, req, NULL, nvme_to_user_ptr(ubuffer), + bufflen, GFP_KERNEL); + else + ret = blk_rq_map_user_fixedb(q, req, ubuffer, bufflen, + GFP_KERNEL, ioucmd); if (ret) goto out; bio = req->bio; @@ -254,9 
+263,8 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio) c.rw.appmask = cpu_to_le16(io.appmask); return nvme_submit_user_cmd(ns->queue, &c, - nvme_to_user_ptr(io.addr), length, - metadata, meta_len, lower_32_bits(io.slba), NULL, 0, - NULL); + io.addr, length, metadata, meta_len, + lower_32_bits(io.slba), NULL, 0, NULL); } static bool nvme_validate_passthru_nsid(struct nvme_ctrl *ctrl, @@ -308,9 +316,8 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns, timeout = msecs_to_jiffies(cmd.timeout_ms); status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c, - nvme_to_user_ptr(cmd.addr), cmd.data_len, - nvme_to_user_ptr(cmd.metadata), cmd.metadata_len, - 0, &result, timeout, NULL); + cmd.addr, cmd.data_len, nvme_to_user_ptr(cmd.metadata), + cmd.metadata_len, 0, &result, timeout, NULL); if (status >= 0) { if (put_user(result, &ucmd->result)) @@ -355,9 +362,8 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns, timeout = msecs_to_jiffies(cmd.timeout_ms); status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c, - nvme_to_user_ptr(cmd.addr), cmd.data_len, - nvme_to_user_ptr(cmd.metadata), cmd.metadata_len, - 0, &cmd.result, timeout, ioucmd); + cmd.addr, cmd.data_len, nvme_to_user_ptr(cmd.metadata), + cmd.metadata_len, 0, &cmd.result, timeout, ioucmd); if (!ioucmd && status >= 0) { if (put_user(cmd.result, &ucmd->result)) diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 30e54ab05333..a82b054eebde 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -922,6 +922,9 @@ struct rq_map_data { int blk_rq_map_user(struct request_queue *, struct request *, struct rq_map_data *, void __user *, unsigned long, gfp_t); +int blk_rq_map_user_fixedb(struct request_queue *, struct request *, + u64 ubuf, unsigned long, gfp_t, + struct io_uring_cmd *); int blk_rq_map_user_iov(struct request_queue *, struct request *, struct rq_map_data *, const struct iov_iter *, gfp_t); int blk_rq_unmap_user(struct bio *); -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [RFC 08/13] io_uring: plug for async bypass [not found] ` <CGME20211220142244epcas5p2f311ed168b8f31b9301bcc2002076db4@epcas5p2.samsung.com> @ 2021-12-20 14:17 ` Kanchan Joshi 0 siblings, 0 replies; 25+ messages in thread From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw) To: io-uring, linux-nvme, linux-block Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 From: Jens Axboe <[email protected]> Signed-off-by: Jens Axboe <[email protected]> --- fs/io_uring.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index 2870a891e441..f77dde1bdc75 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1121,10 +1121,12 @@ static const struct io_op_def io_op_defs[] = { [IORING_OP_URING_CMD] = { .needs_file = 1, .offsets = 1, + .plug = 1, }, [IORING_OP_URING_CMD_FIXED] = { .needs_file = 1, .offsets = 1, + .plug = 1, }, }; -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [RFC 09/13] block: wire-up support for plugging [not found] ` <CGME20211220142246epcas5p303c64b6b1b832c7fcd5ac31fc79c91d1@epcas5p3.samsung.com> @ 2021-12-20 14:17 ` Kanchan Joshi 0 siblings, 0 replies; 25+ messages in thread From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw) To: io-uring, linux-nvme, linux-block Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 From: Jens Axboe <[email protected]> Add support to use plugging if it is enabled, else use default path. Signed-off-by: Jens Axboe <[email protected]> --- block/blk-mq.c | 90 ++++++++++++++++++++++++++------------------------ 1 file changed, 47 insertions(+), 43 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 0ebd09492aa5..c77991688bfd 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2326,6 +2326,40 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, blk_mq_hctx_mark_pending(hctx, ctx); } +/* + * Allow 2x BLK_MAX_REQUEST_COUNT requests on plug queue for multiple + * queues. This is important for md arrays to benefit from merging + * requests. + */ +static inline unsigned short blk_plug_max_rq_count(struct blk_plug *plug) +{ + if (plug->multiple_queues) + return BLK_MAX_REQUEST_COUNT * 2; + return BLK_MAX_REQUEST_COUNT; +} + +static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq) +{ + struct request *last = rq_list_peek(&plug->mq_list); + + if (!plug->rq_count) { + trace_block_plug(rq->q); + } else if (plug->rq_count >= blk_plug_max_rq_count(plug) || + (!blk_queue_nomerges(rq->q) && + blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) { + blk_mq_flush_plug_list(plug, false); + trace_block_plug(rq->q); + } + + if (!plug->multiple_queues && last && last->q != rq->q) + plug->multiple_queues = true; + if (!plug->has_elevator && (rq->rq_flags & RQF_ELV)) + plug->has_elevator = true; + rq->rq_next = NULL; + rq_list_add(&plug->mq_list, rq); + plug->rq_count++; +} + /** * blk_mq_request_bypass_insert - Insert a request at dispatch list. * @rq: Pointer to request to be inserted. @@ -2339,16 +2373,20 @@ void blk_mq_request_bypass_insert(struct request *rq, bool at_head, bool run_queue) { struct blk_mq_hw_ctx *hctx = rq->mq_hctx; + struct blk_plug *plug = current->plug; - spin_lock(&hctx->lock); - if (at_head) - list_add(&rq->queuelist, &hctx->dispatch); - else - list_add_tail(&rq->queuelist, &hctx->dispatch); - spin_unlock(&hctx->lock); - - if (run_queue) - blk_mq_run_hw_queue(hctx, false); + if (plug) { + blk_add_rq_to_plug(plug, rq); + } else { + spin_lock(&hctx->lock); + if (at_head) + list_add(&rq->queuelist, &hctx->dispatch); + else + list_add_tail(&rq->queuelist, &hctx->dispatch); + spin_unlock(&hctx->lock); + if (run_queue) + blk_mq_run_hw_queue(hctx, false); + } } void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx, @@ -2658,40 +2696,6 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx, hctx->queue->mq_ops->commit_rqs(hctx); } -/* - * Allow 2x BLK_MAX_REQUEST_COUNT requests on plug queue for multiple - * queues. This is important for md arrays to benefit from merging - * requests. 
- */ -static inline unsigned short blk_plug_max_rq_count(struct blk_plug *plug) -{ - if (plug->multiple_queues) - return BLK_MAX_REQUEST_COUNT * 2; - return BLK_MAX_REQUEST_COUNT; -} - -static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq) -{ - struct request *last = rq_list_peek(&plug->mq_list); - - if (!plug->rq_count) { - trace_block_plug(rq->q); - } else if (plug->rq_count >= blk_plug_max_rq_count(plug) || - (!blk_queue_nomerges(rq->q) && - blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) { - blk_mq_flush_plug_list(plug, false); - trace_block_plug(rq->q); - } - - if (!plug->multiple_queues && last && last->q != rq->q) - plug->multiple_queues = true; - if (!plug->has_elevator && (rq->rq_flags & RQF_ELV)) - plug->has_elevator = true; - rq->rq_next = NULL; - rq_list_add(&plug->mq_list, rq); - plug->rq_count++; -} - static bool blk_mq_attempt_bio_merge(struct request_queue *q, struct bio *bio, unsigned int nr_segs) { -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
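With the hunk above, a passthru request that is bypass-inserted while current->plug is set is appended to the per-task plug list instead of being pushed straight onto hctx->dispatch, and the list is flushed when the plug is finished. A rough sketch of a submitter driving this path, assuming (as in this series) that blk_execute_rq_nowait() reaches blk_mq_request_bypass_insert() for passthrough requests; rq[], nr and done_fn are placeholders:

    struct blk_plug plug;
    int i;

    blk_start_plug(&plug);                                /* current->plug is now set */
    for (i = 0; i < nr; i++)
            blk_execute_rq_nowait(rq[i], false, done_fn); /* lands on plug->mq_list */
    blk_finish_plug(&plug);                               /* flushes the batch via blk_mq_flush_plug_list() */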
* [RFC 10/13] block: factor out helper for bio allocation from cache [not found] ` <CGME20211220142248epcas5p1e5904e10396f8cdea54bbd8d7aeca9a6@epcas5p1.samsung.com> @ 2021-12-20 14:17 ` Kanchan Joshi 0 siblings, 0 replies; 25+ messages in thread From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw) To: io-uring, linux-nvme, linux-block Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 Factor the code to pull out bio_from_cache helper which is not tied to kiocb. This is prep patch. Signed-off-by: Kanchan Joshi <[email protected]> --- block/bio.c | 43 ++++++++++++++++++++++++++----------------- include/linux/bio.h | 1 + 2 files changed, 27 insertions(+), 17 deletions(-) diff --git a/block/bio.c b/block/bio.c index 6fadc977cd7f..46d0f278d3aa 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1682,27 +1682,12 @@ int bioset_init_from_src(struct bio_set *bs, struct bio_set *src) } EXPORT_SYMBOL(bioset_init_from_src); -/** - * bio_alloc_kiocb - Allocate a bio from bio_set based on kiocb - * @kiocb: kiocb describing the IO - * @nr_vecs: number of iovecs to pre-allocate - * @bs: bio_set to allocate from - * - * Description: - * Like @bio_alloc_bioset, but pass in the kiocb. The kiocb is only - * used to check if we should dip into the per-cpu bio_set allocation - * cache. The allocation uses GFP_KERNEL internally. On return, the - * bio is marked BIO_PERCPU_CACHEABLE, and the final put of the bio - * MUST be done from process context, not hard/soft IRQ. - * - */ -struct bio *bio_alloc_kiocb(struct kiocb *kiocb, unsigned short nr_vecs, - struct bio_set *bs) +struct bio *bio_from_cache(unsigned short nr_vecs, struct bio_set *bs) { struct bio_alloc_cache *cache; struct bio *bio; - if (!(kiocb->ki_flags & IOCB_ALLOC_CACHE) || nr_vecs > BIO_INLINE_VECS) + if (nr_vecs > BIO_INLINE_VECS) return bio_alloc_bioset(GFP_KERNEL, nr_vecs, bs); cache = per_cpu_ptr(bs->cache, get_cpu()); @@ -1721,6 +1706,30 @@ struct bio *bio_alloc_kiocb(struct kiocb *kiocb, unsigned short nr_vecs, bio_set_flag(bio, BIO_PERCPU_CACHE); return bio; } +EXPORT_SYMBOL_GPL(bio_from_cache); + +/** + * bio_alloc_kiocb - Allocate a bio from bio_set based on kiocb + * @kiocb: kiocb describing the IO + * @nr_vecs: number of iovecs to pre-allocate + * @bs: bio_set to allocate from + * + * Description: + * Like @bio_alloc_bioset, but pass in the kiocb. The kiocb is only + * used to check if we should dip into the per-cpu bio_set allocation + * cache. The allocation uses GFP_KERNEL internally. On return, the + * bio is marked BIO_PERCPU_CACHEABLE, and the final put of the bio + * MUST be done from process context, not hard/soft IRQ. 
+ * + */ +struct bio *bio_alloc_kiocb(struct kiocb *kiocb, unsigned short nr_vecs, + struct bio_set *bs) +{ + if (!(kiocb->ki_flags & IOCB_ALLOC_CACHE)) + return bio_alloc_bioset(GFP_KERNEL, nr_vecs, bs); + + return bio_from_cache(nr_vecs, bs); +} EXPORT_SYMBOL_GPL(bio_alloc_kiocb); static int __init init_bio(void) diff --git a/include/linux/bio.h b/include/linux/bio.h index fe6bdfbbef66..77eceb2bda4b 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -358,6 +358,7 @@ struct bio *bio_alloc_bioset(gfp_t gfp, unsigned short nr_iovecs, struct bio_set *bs); struct bio *bio_alloc_kiocb(struct kiocb *kiocb, unsigned short nr_vecs, struct bio_set *bs); +struct bio *bio_from_cache(unsigned short nr_vecs, struct bio_set *bs); struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned short nr_iovecs); extern void bio_put(struct bio *); -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
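The new helper lets callers that do not have a kiocb (such as the passthru path in the following patches) still dip into the per-cpu bio cache. A hedged usage sketch; my_bio_set is a placeholder for a bio_set that was initialized with BIOSET_PERCPU_CACHE:

    struct bio *bio;

    /* my_bio_set: previously set up with
     * bioset_init(&my_bio_set, POOL_SZ, 0, BIOSET_NEED_BVECS | BIOSET_PERCPU_CACHE);
     */
    bio = bio_from_cache(0, &my_bio_set);   /* 0 inline vecs: caller supplies a BVEC iterator */
    if (!bio)
            return -ENOMEM;

    /* ... fill and submit ... */

    bio_put(bio);   /* final put of a BIO_PERCPU_CACHE bio must happen in process context */

Unlike bio_alloc_kiocb(), nothing here checks IOCB_ALLOC_CACHE, so the caller is responsible for guaranteeing the process-context completion requirement.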
* [RFC 11/13] nvme: enable bio-cache for fixed-buffer passthru [not found] ` <CGME20211220142250epcas5p34b9d93b1dd3388af6209a4223befe40f@epcas5p3.samsung.com> @ 2021-12-20 14:17 ` Kanchan Joshi 0 siblings, 0 replies; 25+ messages in thread From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw) To: io-uring, linux-nvme, linux-block Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 Since we do submission/completion in task, we can have this up. Add a bio-set for nvme as we need that for bio-cache. Signed-off-by: Kanchan Joshi <[email protected]> Signed-off-by: Anuj Gupta <[email protected]> --- block/blk-map.c | 4 ++-- drivers/nvme/host/core.c | 9 +++++++++ drivers/nvme/host/ioctl.c | 6 ++++-- drivers/nvme/host/nvme.h | 1 + include/linux/blk-mq.h | 2 +- 5 files changed, 17 insertions(+), 5 deletions(-) diff --git a/block/blk-map.c b/block/blk-map.c index 9aa9864eab55..e3e28b628fba 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -580,7 +580,7 @@ EXPORT_SYMBOL(blk_rq_map_user); /* Unlike blk_rq_map_user () this is only for fixed-buffer async passthrough. */ int blk_rq_map_user_fixedb(struct request_queue *q, struct request *rq, - u64 ubuf, unsigned long len, gfp_t gfp_mask, + u64 ubuf, unsigned long len, struct bio_set *bs, struct io_uring_cmd *ioucmd) { struct iov_iter iter; @@ -604,7 +604,7 @@ int blk_rq_map_user_fixedb(struct request_queue *q, struct request *rq, if (nr_segs > queue_max_segments(q)) return -EINVAL; /* no iovecs to alloc, as we already have a BVEC iterator */ - bio = bio_alloc(gfp_mask, 0); + bio = bio_from_cache(0, bs); if (!bio) return -ENOMEM; diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index bce2e93d14a3..0c231946a310 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -30,6 +30,9 @@ #define NVME_MINORS (1U << MINORBITS) +#define NVME_BIO_POOL_SZ (4) +struct bio_set nvme_bio_pool; + unsigned int admin_timeout = 60; module_param(admin_timeout, uint, 0644); MODULE_PARM_DESC(admin_timeout, "timeout in seconds for admin commands"); @@ -4793,6 +4796,11 @@ static int __init nvme_core_init(void) goto unregister_generic_ns; } + result = bioset_init(&nvme_bio_pool, NVME_BIO_POOL_SZ, 0, + BIOSET_NEED_BVECS | BIOSET_PERCPU_CACHE); + if (result < 0) + goto unregister_generic_ns; + return 0; unregister_generic_ns: @@ -4815,6 +4823,7 @@ static int __init nvme_core_init(void) static void __exit nvme_core_exit(void) { + bioset_exit(&nvme_bio_pool); class_destroy(nvme_ns_chr_class); class_destroy(nvme_subsys_class); class_destroy(nvme_class); diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c index dc6a5f1b81ca..013ff9baa78e 100644 --- a/drivers/nvme/host/ioctl.c +++ b/drivers/nvme/host/ioctl.c @@ -43,6 +43,7 @@ static void nvme_pt_task_cb(struct io_uring_cmd *ioucmd) struct request *req = cmd->req; int status; u64 result; + struct bio *bio = req->bio; if (nvme_req(req)->flags & NVME_REQ_CANCELLED) status = -EINTR; @@ -52,6 +53,7 @@ static void nvme_pt_task_cb(struct io_uring_cmd *ioucmd) /* we can free request */ blk_mq_free_request(req); + blk_rq_unmap_user(bio); if (cmd->meta) { if (status) @@ -73,9 +75,9 @@ static void nvme_end_async_pt(struct request *req, blk_status_t err) struct bio *bio = cmd->bio; cmd->req = req; + req->bio = bio; /* this takes care of setting up task-work */ io_uring_cmd_complete_in_task(ioucmd, nvme_pt_task_cb); - blk_rq_unmap_user(bio); } static void nvme_setup_uring_cmd_data(struct request *rq, @@ -164,7 +166,7 @@ static int nvme_submit_user_cmd(struct request_queue *q, 
bufflen, GFP_KERNEL); else ret = blk_rq_map_user_fixedb(q, req, ubuffer, bufflen, - GFP_KERNEL, ioucmd); + &nvme_bio_pool, ioucmd); if (ret) goto out; bio = req->bio; diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 9a901b954a87..6bbb8ed868eb 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -47,6 +47,7 @@ extern unsigned int admin_timeout; extern struct workqueue_struct *nvme_wq; extern struct workqueue_struct *nvme_reset_wq; extern struct workqueue_struct *nvme_delete_wq; +extern struct bio_set nvme_bio_pool; /* * List of workarounds for devices that required behavior not specified in diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index a82b054eebde..e35a5d835b1f 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -923,7 +923,7 @@ struct rq_map_data { int blk_rq_map_user(struct request_queue *, struct request *, struct rq_map_data *, void __user *, unsigned long, gfp_t); int blk_rq_map_user_fixedb(struct request_queue *, struct request *, - u64 ubuf, unsigned long, gfp_t, + u64 ubuf, unsigned long, struct bio_set *, struct io_uring_cmd *); int blk_rq_map_user_iov(struct request_queue *, struct request *, struct rq_map_data *, const struct iov_iter *, gfp_t); -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
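Note that the hunks above also move blk_rq_unmap_user() from nvme_end_async_pt(), which may run in IRQ context, into nvme_pt_task_cb(), which runs in the submitter's task context; the final bio_put() of a per-cpu cached bio must not happen from hard/soft IRQ. A simplified sketch of the ordering the driver relies on (my_end_io/my_task_cb mirror nvme_end_async_pt/nvme_pt_task_cb but are not verbatim, and req/bio/ioucmd are recovered from per-command data):

    static void my_end_io(struct request *req, blk_status_t err)
    {
            /* may run in IRQ context: do not unmap or put the bio here */
            io_uring_cmd_complete_in_task(ioucmd, my_task_cb);
    }

    static void my_task_cb(struct io_uring_cmd *ioucmd)
    {
            /* runs in the submitter's task context */
            blk_mq_free_request(req);
            blk_rq_unmap_user(bio);   /* drops the last reference; safe for cached bios */
    }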
* [RFC 12/13] nvme: allow user passthrough commands to poll [not found] ` <CGME20211220142252epcas5p4611297f9970acbc8ee3b0e325ca5ceec@epcas5p4.samsung.com> @ 2021-12-20 14:17 ` Kanchan Joshi 0 siblings, 0 replies; 25+ messages in thread From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw) To: io-uring, linux-nvme, linux-block Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 From: Keith Busch <[email protected]> The block layer knows how to deal with polled requests. Let the NVMe driver use the previously reserved user "flags" fields to define an option to allocate the request from the polled hardware contexts. If polling is not enabled, then the block layer will automatically fallback to a non-polled request.[1] [1] https://lore.kernel.org/linux-block/[email protected]/ Signed-off-by: Keith Busch <[email protected]> --- drivers/nvme/host/core.c | 10 ++++++---- drivers/nvme/host/ioctl.c | 33 +++++++++++++++++++-------------- drivers/nvme/host/nvme.h | 3 ++- drivers/nvme/host/pci.c | 4 ++-- drivers/nvme/target/passthru.c | 2 +- include/uapi/linux/nvme_ioctl.h | 4 ++++ 6 files changed, 34 insertions(+), 22 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 0c231946a310..5199adf7ae92 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -633,11 +633,13 @@ static inline void nvme_init_request(struct request *req, } struct request *nvme_alloc_request(struct request_queue *q, - struct nvme_command *cmd, blk_mq_req_flags_t flags) + struct nvme_command *cmd, blk_mq_req_flags_t flags, + unsigned int rq_flags) { + unsigned int cmd_flags = nvme_req_op(cmd) | rq_flags; struct request *req; - req = blk_mq_alloc_request(q, nvme_req_op(cmd), flags); + req = blk_mq_alloc_request(q, cmd_flags, flags); if (!IS_ERR(req)) nvme_init_request(req, cmd); return req; @@ -1081,7 +1083,7 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd, int ret; if (qid == NVME_QID_ANY) - req = nvme_alloc_request(q, cmd, flags); + req = nvme_alloc_request(q, cmd, flags, 0); else req = nvme_alloc_request_qid(q, cmd, flags, qid); if (IS_ERR(req)) @@ -1277,7 +1279,7 @@ static void nvme_keep_alive_work(struct work_struct *work) } rq = nvme_alloc_request(ctrl->admin_q, &ctrl->ka_cmd, - BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT); + BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT, 0); if (IS_ERR(rq)) { /* allocation failure, reset the controller */ dev_err(ctrl->device, "keep-alive failed: %ld\n", PTR_ERR(rq)); diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c index 013ff9baa78e..bdaf8f317aa8 100644 --- a/drivers/nvme/host/ioctl.c +++ b/drivers/nvme/host/ioctl.c @@ -142,7 +142,7 @@ static int nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd, u64 ubuffer, unsigned bufflen, void __user *meta_buffer, unsigned meta_len, u32 meta_seed, u64 *result, unsigned timeout, - struct io_uring_cmd *ioucmd) + struct io_uring_cmd *ioucmd, unsigned int rq_flags) { bool write = nvme_is_write(cmd); struct nvme_ns *ns = q->queuedata; @@ -152,7 +152,7 @@ static int nvme_submit_user_cmd(struct request_queue *q, void *meta = NULL; int ret; - req = nvme_alloc_request(q, cmd, 0); + req = nvme_alloc_request(q, cmd, 0, rq_flags); if (IS_ERR(req)) return PTR_ERR(req); @@ -212,11 +212,13 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio) struct nvme_command c; unsigned length, meta_len; void __user *metadata; + unsigned int rq_flags = 0; if (copy_from_user(&io, uio, sizeof(io))) return -EFAULT; - if 
(io.flags) - return -EINVAL; + + if (io.flags & NVME_HIPRI) + rq_flags |= REQ_POLLED; switch (io.opcode) { case nvme_cmd_write: @@ -254,7 +256,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio) memset(&c, 0, sizeof(c)); c.rw.opcode = io.opcode; - c.rw.flags = io.flags; + c.rw.flags = 0; c.rw.nsid = cpu_to_le32(ns->head->ns_id); c.rw.slba = cpu_to_le64(io.slba); c.rw.length = cpu_to_le16(io.nblocks); @@ -266,7 +268,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio) return nvme_submit_user_cmd(ns->queue, &c, io.addr, length, metadata, meta_len, - lower_32_bits(io.slba), NULL, 0, NULL); + lower_32_bits(io.slba), NULL, 0, NULL, rq_flags); } static bool nvme_validate_passthru_nsid(struct nvme_ctrl *ctrl, @@ -288,6 +290,7 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns, { struct nvme_passthru_cmd cmd; struct nvme_command c; + unsigned int rq_flags = 0; unsigned timeout = 0; u64 result; int status; @@ -296,14 +299,14 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns, return -EACCES; if (copy_from_user(&cmd, ucmd, sizeof(cmd))) return -EFAULT; - if (cmd.flags) - return -EINVAL; + if (cmd.flags & NVME_HIPRI) + rq_flags |= REQ_POLLED; if (!nvme_validate_passthru_nsid(ctrl, ns, cmd.nsid)) return -EINVAL; memset(&c, 0, sizeof(c)); c.common.opcode = cmd.opcode; - c.common.flags = cmd.flags; + c.common.flags = 0; c.common.nsid = cpu_to_le32(cmd.nsid); c.common.cdw2[0] = cpu_to_le32(cmd.cdw2); c.common.cdw2[1] = cpu_to_le32(cmd.cdw3); @@ -319,7 +322,7 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns, status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c, cmd.addr, cmd.data_len, nvme_to_user_ptr(cmd.metadata), - cmd.metadata_len, 0, &result, timeout, NULL); + cmd.metadata_len, 0, &result, timeout, NULL, rq_flags); if (status >= 0) { if (put_user(result, &ucmd->result)) @@ -335,6 +338,7 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns, { struct nvme_passthru_cmd64 cmd; struct nvme_command c; + unsigned int rq_flags = 0; unsigned timeout = 0; int status; @@ -342,14 +346,15 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns, return -EACCES; if (copy_from_user(&cmd, ucmd, sizeof(cmd))) return -EFAULT; - if (cmd.flags) - return -EINVAL; + if (cmd.flags & NVME_HIPRI) + rq_flags |= REQ_POLLED; + if (!nvme_validate_passthru_nsid(ctrl, ns, cmd.nsid)) return -EINVAL; memset(&c, 0, sizeof(c)); c.common.opcode = cmd.opcode; - c.common.flags = cmd.flags; + c.common.flags = 0; c.common.nsid = cpu_to_le32(cmd.nsid); c.common.cdw2[0] = cpu_to_le32(cmd.cdw2); c.common.cdw2[1] = cpu_to_le32(cmd.cdw3); @@ -365,7 +370,7 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns, status = nvme_submit_user_cmd(ns ? 
ns->queue : ctrl->admin_q, &c, cmd.addr, cmd.data_len, nvme_to_user_ptr(cmd.metadata), - cmd.metadata_len, 0, &cmd.result, timeout, ioucmd); + cmd.metadata_len, 0, &cmd.result, timeout, ioucmd, rq_flags); if (!ioucmd && status >= 0) { if (put_user(cmd.result, &ucmd->result)) diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 6bbb8ed868eb..56a7cc8421fc 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -696,7 +696,8 @@ void nvme_start_freeze(struct nvme_ctrl *ctrl); #define NVME_QID_ANY -1 struct request *nvme_alloc_request(struct request_queue *q, - struct nvme_command *cmd, blk_mq_req_flags_t flags); + struct nvme_command *cmd, blk_mq_req_flags_t flags, + unsigned int rq_flags); void nvme_cleanup_cmd(struct request *req); blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req); blk_status_t nvme_fail_nonready_command(struct nvme_ctrl *ctrl, diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 50deb8b69c40..3d013a88af9d 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1429,7 +1429,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved) req->tag, nvmeq->qid); abort_req = nvme_alloc_request(dev->ctrl.admin_q, &cmd, - BLK_MQ_REQ_NOWAIT); + BLK_MQ_REQ_NOWAIT, 0); if (IS_ERR(abort_req)) { atomic_inc(&dev->ctrl.abort_limit); return BLK_EH_RESET_TIMER; @@ -2475,7 +2475,7 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode) cmd.delete_queue.opcode = opcode; cmd.delete_queue.qid = cpu_to_le16(nvmeq->qid); - req = nvme_alloc_request(q, &cmd, BLK_MQ_REQ_NOWAIT); + req = nvme_alloc_request(q, &cmd, BLK_MQ_REQ_NOWAIT, 0); if (IS_ERR(req)) return PTR_ERR(req); diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c index 9e5b89ae29df..2a9e2fd3b137 100644 --- a/drivers/nvme/target/passthru.c +++ b/drivers/nvme/target/passthru.c @@ -253,7 +253,7 @@ static void nvmet_passthru_execute_cmd(struct nvmet_req *req) timeout = nvmet_req_subsys(req)->admin_timeout; } - rq = nvme_alloc_request(q, req->cmd, 0); + rq = nvme_alloc_request(q, req->cmd, 0, 0); if (IS_ERR(rq)) { status = NVME_SC_INTERNAL; goto out_put_ns; diff --git a/include/uapi/linux/nvme_ioctl.h b/include/uapi/linux/nvme_ioctl.h index d99b5a772698..df2c138c38d9 100644 --- a/include/uapi/linux/nvme_ioctl.h +++ b/include/uapi/linux/nvme_ioctl.h @@ -9,6 +9,10 @@ #include <linux/types.h> +enum nvme_io_flags { + NVME_HIPRI = 1 << 0, /* use polling queue if available */ +}; + struct nvme_user_io { __u8 opcode; __u8 flags; -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
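From userspace, polling is now opted into per command by setting NVME_HIPRI in the previously-reserved flags field; the kernel clears the flag before the command reaches the device. A hedged example of a synchronous polled read through the existing ioctl path (fd, buf, the namespace and the 512-byte LBA format are assumptions for the sketch):

    /* needs <sys/ioctl.h> and <linux/nvme_ioctl.h>; fd = open("/dev/ng0n1", O_RDWR) */
    struct nvme_passthru_cmd64 cmd = {
            .opcode   = 0x02,                      /* NVMe read */
            .flags    = NVME_HIPRI,                /* use a polled queue if one exists */
            .nsid     = 1,
            .addr     = (__u64)(uintptr_t)buf,     /* 4 KiB user buffer */
            .data_len = 4096,
            .cdw10    = 0,                         /* starting LBA, low 32 bits */
            .cdw12    = (4096 / 512) - 1,          /* 0-based block count, 512B LBAs assumed */
    };

    if (ioctl(fd, NVME_IOCTL_IO64_CMD, &cmd) < 0)
            return -1;   /* kernels without this series reject a nonzero flags field with EINVAL */

If the device has no polled queues configured, the request transparently falls back to interrupt-driven completion, as described above.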
* [RFC 13/13] nvme: Add async passthru polling support [not found] ` <CGME20211220142256epcas5p49e0804ff8b075e8063259f94ccc9ced0@epcas5p4.samsung.com> @ 2021-12-20 14:17 ` Kanchan Joshi 0 siblings, 0 replies; 25+ messages in thread From: Kanchan Joshi @ 2021-12-20 14:17 UTC (permalink / raw) To: io-uring, linux-nvme, linux-block Cc: axboe, hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 From: Pankaj Raghav <[email protected]> IO_URING already has polling support for read and write. This patch extends that support for uring cmd passthu. The unused flag in uring_cmd struct is used to indicate if the completion should be polled. If device side polling is not enabled, then the submission request will fallback to a non-polled request. Signed-off-by: Pankaj Raghav <[email protected]> --- block/blk-mq.c | 3 +- drivers/nvme/host/core.c | 1 + drivers/nvme/host/ioctl.c | 79 ++++++++++++++++++++++++++++++++++- drivers/nvme/host/multipath.c | 1 + drivers/nvme/host/nvme.h | 4 ++ fs/io_uring.c | 45 ++++++++++++++++++-- include/linux/blk-mq.h | 1 + include/linux/io_uring.h | 10 ++++- 8 files changed, 135 insertions(+), 9 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index c77991688bfd..acfa55c96a43 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1193,7 +1193,7 @@ void blk_execute_rq_nowait(struct request *rq, bool at_head, rq_end_io_fn *done) } EXPORT_SYMBOL_GPL(blk_execute_rq_nowait); -static bool blk_rq_is_poll(struct request *rq) +bool blk_rq_is_poll(struct request *rq) { if (!rq->mq_hctx) return false; @@ -1203,6 +1203,7 @@ static bool blk_rq_is_poll(struct request *rq) return false; return true; } +EXPORT_SYMBOL_GPL(blk_rq_is_poll); static void blk_rq_poll_completion(struct request *rq, struct completion *wait) { diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 5199adf7ae92..f0697cbe2bf1 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -3676,6 +3676,7 @@ static const struct file_operations nvme_ns_chr_fops = { .unlocked_ioctl = nvme_ns_chr_ioctl, .compat_ioctl = compat_ptr_ioctl, .async_cmd = nvme_ns_chr_async_cmd, + .iopoll = nvme_iopoll, }; static int nvme_add_ns_cdev(struct nvme_ns *ns) diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c index bdaf8f317aa8..ce2fe94df3ad 100644 --- a/drivers/nvme/host/ioctl.c +++ b/drivers/nvme/host/ioctl.c @@ -31,6 +31,12 @@ struct nvme_uring_cmd { void __user *meta_buffer; }; +static inline bool is_polling_enabled(struct io_uring_cmd *ioucmd, + struct request *req) +{ + return (ioucmd->flags & URING_CMD_POLLED) && blk_rq_is_poll(req); +} + static struct nvme_uring_cmd *nvme_uring_cmd(struct io_uring_cmd *ioucmd) { return (struct nvme_uring_cmd *)&ioucmd->pdu; @@ -76,8 +82,16 @@ static void nvme_end_async_pt(struct request *req, blk_status_t err) cmd->req = req; req->bio = bio; - /* this takes care of setting up task-work */ - io_uring_cmd_complete_in_task(ioucmd, nvme_pt_task_cb); + + /*IO can be completed immediately when the callback + * is in the same task context + */ + if (is_polling_enabled(ioucmd, req)) { + nvme_pt_task_cb(ioucmd); + } else { + /* this takes care of setting up task-work */ + io_uring_cmd_complete_in_task(ioucmd, nvme_pt_task_cb); + } } static void nvme_setup_uring_cmd_data(struct request *rq, @@ -183,6 +197,12 @@ static int nvme_submit_user_cmd(struct request_queue *q, } } if (ioucmd) { /* async dispatch */ + + if (bio && is_polling_enabled(ioucmd, req)) { + ioucmd->bio = bio; + bio->bi_opf |= REQ_POLLED; + } + nvme_setup_uring_cmd_data(req, ioucmd, meta, 
meta_buffer, meta_len, write); blk_execute_rq_nowait(req, 0, nvme_end_async_pt); @@ -496,6 +516,32 @@ int nvme_ns_chr_async_cmd(struct io_uring_cmd *ioucmd, return nvme_ns_async_ioctl(ns, ioucmd); } +int nvme_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob, + unsigned int flags) +{ + struct bio *bio = NULL; + struct nvme_ns *ns = NULL; + struct request_queue *q = NULL; + int ret = 0; + + rcu_read_lock(); + bio = READ_ONCE(kiocb->private); + ns = container_of(file_inode(kiocb->ki_filp)->i_cdev, struct nvme_ns, + cdev); + q = ns->queue; + + /* bio and driver_cb are a part of the same union type in io_uring_cmd + * struct. When there are no poll queues, driver_cb is used for IRQ cb + * but polling is performed from the io_uring side. To avoid unnecessary + * polling, a check is added to see if it is a polled queue and return 0 + * if it is not. + */ + if ((test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) && bio && bio->bi_bdev) + ret = bio_poll(bio, iob, flags); + rcu_read_unlock(); + return ret; +} + #ifdef CONFIG_NVME_MULTIPATH static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd, void __user *argp, struct nvme_ns_head *head, int srcu_idx) @@ -577,6 +623,35 @@ long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd, srcu_read_unlock(&head->srcu, srcu_idx); return ret; } + +int nvme_ns_head_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob, + unsigned int flags) +{ + struct bio *bio = NULL; + struct request_queue *q = NULL; + struct cdev *cdev = file_inode(kiocb->ki_filp)->i_cdev; + struct nvme_ns_head *head = container_of(cdev, struct nvme_ns_head, cdev); + int srcu_idx = srcu_read_lock(&head->srcu); + struct nvme_ns *ns = nvme_find_path(head); + int ret = -EWOULDBLOCK; + + if (ns) { + bio = READ_ONCE(kiocb->private); + q = ns->queue; + /* bio and driver_cb are a part of the same union type in io_uring_cmd + * struct. When there are no poll queues, driver_cb is used for IRQ cb + * but polling is performed from the io_uring side. To avoid unnecessary + * polling, a check is added to see if it is a polled queue and return 0 + * if it is not. 
+ */ + if ((test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) && bio && + bio->bi_bdev) + ret = bio_poll(bio, iob, flags); + } + + srcu_read_unlock(&head->srcu, srcu_idx); + return ret; +} #endif /* CONFIG_NVME_MULTIPATH */ static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp) diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 1e59c8e06622..df91b2953932 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -424,6 +424,7 @@ static const struct file_operations nvme_ns_head_chr_fops = { .unlocked_ioctl = nvme_ns_head_chr_ioctl, .compat_ioctl = compat_ptr_ioctl, .async_cmd = nvme_ns_head_chr_async_cmd, + .iopoll = nvme_ns_head_iopoll, }; static int nvme_add_ns_head_cdev(struct nvme_ns_head *head) diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 56a7cc8421fc..730ada8a3e8e 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -752,8 +752,12 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd, unsigned long arg); int nvme_ns_chr_async_cmd(struct io_uring_cmd *ucmd, enum io_uring_cmd_flags flags); +int nvme_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob, + unsigned int flags); int nvme_ns_head_chr_async_cmd(struct io_uring_cmd *ucmd, enum io_uring_cmd_flags flags); +int nvme_ns_head_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob, + unsigned int flags); int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo); extern const struct attribute_group *nvme_ns_id_attr_groups[]; diff --git a/fs/io_uring.c b/fs/io_uring.c index f77dde1bdc75..ae2e7666622e 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2655,7 +2655,20 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin) if (READ_ONCE(req->iopoll_completed)) break; - ret = kiocb->ki_filp->f_op->iopoll(kiocb, &iob, poll_flags); + if (req->opcode == IORING_OP_URING_CMD || + req->opcode == IORING_OP_URING_CMD_FIXED) { + /* uring_cmd structure does not contain kiocb struct */ + struct kiocb kiocb_uring_cmd; + + kiocb_uring_cmd.private = req->uring_cmd.bio; + kiocb_uring_cmd.ki_filp = req->uring_cmd.file; + ret = req->uring_cmd.file->f_op->iopoll(&kiocb_uring_cmd, + &iob, poll_flags); + } else { + ret = kiocb->ki_filp->f_op->iopoll(kiocb, &iob, + poll_flags); + } + if (unlikely(ret < 0)) return ret; else if (ret) @@ -2768,6 +2781,15 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min) wq_list_empty(&ctx->iopoll_list)) break; } + + /* + * In some scenarios, completion callback has been queued up to be + * completed in-task context but polling happens in the same task + * not giving a chance for the completion callback to complete. + */ + if (current->task_works) + io_run_task_work(); + ret = io_do_iopoll(ctx, !min); if (ret < 0) break; @@ -4122,6 +4144,14 @@ static int io_linkat(struct io_kiocb *req, unsigned int issue_flags) return 0; } +static void io_complete_uring_cmd_iopoll(struct io_kiocb *req, long res) +{ + WRITE_ONCE(req->result, res); + /* order with io_iopoll_complete() checking ->result */ + smp_wmb(); + WRITE_ONCE(req->iopoll_completed, 1); +} + /* * Called by consumers of io_uring_cmd, if they originally returned * -EIOCBQUEUED upon receiving the command. 
@@ -4132,7 +4162,11 @@ void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret) if (ret < 0) req_set_fail(req); - io_req_complete(req, ret); + + if (req->uring_cmd.flags & URING_CMD_POLLED) + io_complete_uring_cmd_iopoll(req, ret); + else + io_req_complete(req, ret); } EXPORT_SYMBOL_GPL(io_uring_cmd_done); @@ -4147,8 +4181,11 @@ static int io_uring_cmd_prep(struct io_kiocb *req, return -EOPNOTSUPP; if (req->ctx->flags & IORING_SETUP_IOPOLL) { - printk_once(KERN_WARNING "io_uring: iopoll not supported!\n"); - return -EOPNOTSUPP; + req->uring_cmd.flags = URING_CMD_POLLED; + req->uring_cmd.bio = NULL; + req->iopoll_completed = 0; + } else { + req->uring_cmd.flags = 0; } cmd->op = READ_ONCE(csqe->op); diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index e35a5d835b1f..2233ccf41c19 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -933,6 +933,7 @@ int blk_rq_map_kern(struct request_queue *, struct request *, void *, int blk_rq_append_bio(struct request *rq, struct bio *bio); void blk_execute_rq_nowait(struct request *rq, bool at_head, rq_end_io_fn *end_io); +bool blk_rq_is_poll(struct request *rq); blk_status_t blk_execute_rq(struct request *rq, bool at_head); struct req_iterator { diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 07732bc850af..bbc9c4ea19c3 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -6,6 +6,7 @@ #include <linux/xarray.h> enum { + URING_CMD_POLLED = (1 << 0), URING_CMD_FIXEDBUFS = (1 << 1), }; /* @@ -17,8 +18,13 @@ struct io_uring_cmd { __u16 op; __u16 flags; __u32 len; - /* used if driver requires update in task context*/ - void (*driver_cb)(struct io_uring_cmd *cmd); + union { + void *bio; // Used for polling based completion + + /* used if driver requires update in task context for IRQ based completion*/ + void (*driver_cb)(struct io_uring_cmd *cmd); + }; + __u64 pdu[5]; /* 40 bytes available inline for free use */ }; -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
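For the polled path to actually engage, the controller needs poll queues; on the PCIe driver this is governed by the poll_queues module parameter (not part of this series), for example:

    nvme.poll_queues=4          # kernel command line
    modprobe nvme poll_queues=4

Without poll queues, QUEUE_FLAG_POLL is not set on the request queue, nvme_iopoll() returns 0, and the command completes through the usual IRQ plus task-work path, matching the fallback described in the commit message.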
* Re: [RFC 00/13] uring-passthru for nvme 2021-12-20 14:17 ` [RFC 00/13] uring-passthru for nvme Kanchan Joshi ` (12 preceding siblings ...) [not found] ` <CGME20211220142256epcas5p49e0804ff8b075e8063259f94ccc9ced0@epcas5p4.samsung.com> @ 2021-12-21 3:45 ` Jens Axboe 2021-12-21 14:36 ` Kanchan Joshi 13 siblings, 1 reply; 25+ messages in thread From: Jens Axboe @ 2021-12-21 3:45 UTC (permalink / raw) To: Kanchan Joshi, io-uring, linux-nvme, linux-block Cc: hch, kbusch, javier, anuj20.g, joshiiitr, pankydev8 On 12/20/21 7:17 AM, Kanchan Joshi wrote: > Here is a revamped series on uring-passthru which is on top of Jens > "nvme-passthru-wip.2" branch. > https://git.kernel.dk/cgit/linux-block/commit/?h=nvme-passthru-wip.2 > > This scales much better than before with the addition of following: > - plugging > - passthru polling (sync and async; sync part comes from a patch that > Keith did earlier) > - bio-cache (this is regardless of irq/polling since we submit/complete in > task-contex anyway. Currently kicks in when fixed-buffer option is > also passed, but that's primarily to keep the plumbing simple) > > Also the feedback from Christoph (previous fixed-buffer series) is in > which has streamlined the plumbing. > > I look forward to further feedback/comments. > > KIOPS(512b) on P5800x looked like this: > > QD uring pt uring-poll pt-poll > 8 538 589 831 902 > 64 967 1131 1351 1378 > 256 1043 1230 1376 1429 These are nice results! Can you share all the job files or fio invocations for each of these? I guess it's just two variants, with QD varied between them? We really (REALLY) should turn the nvme-wip branch into something coherent, but at least with this we have some idea of an end result and something that is testable. This looks so much better from the performance POV than the earlier versions, passthrough _should_ be faster than non-pt. -- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC 00/13] uring-passthru for nvme 2021-12-21 3:45 ` [RFC 00/13] uring-passthru for nvme Jens Axboe @ 2021-12-21 14:36 ` Kanchan Joshi 0 siblings, 0 replies; 25+ messages in thread From: Kanchan Joshi @ 2021-12-21 14:36 UTC (permalink / raw) To: Jens Axboe Cc: Kanchan Joshi, io-uring, linux-nvme, linux-block, Christoph Hellwig, Keith Busch, Javier González, anuj20.g, pankydev8 On Tue, Dec 21, 2021 at 9:15 AM Jens Axboe <[email protected]> wrote: > > On 12/20/21 7:17 AM, Kanchan Joshi wrote: > > Here is a revamped series on uring-passthru which is on top of Jens > > "nvme-passthru-wip.2" branch. > > https://git.kernel.dk/cgit/linux-block/commit/?h=nvme-passthru-wip.2 > > > > This scales much better than before with the addition of following: > > - plugging > > - passthru polling (sync and async; sync part comes from a patch that > > Keith did earlier) > > - bio-cache (this is regardless of irq/polling since we submit/complete in > > task-contex anyway. Currently kicks in when fixed-buffer option is > > also passed, but that's primarily to keep the plumbing simple) > > > > Also the feedback from Christoph (previous fixed-buffer series) is in > > which has streamlined the plumbing. > > > > I look forward to further feedback/comments. > > > > KIOPS(512b) on P5800x looked like this: > > > > QD uring pt uring-poll pt-poll > > 8 538 589 831 902 > > 64 967 1131 1351 1378 > > 256 1043 1230 1376 1429 > > These are nice results! Can you share all the job files or fio > invocations for each of these? I guess it's just two variants, with QD > varied between them? Yes, just two variants with three QD/batch combinations. Here are all the job files for the above data: https://github.com/joshkan/fio/tree/nvme-passthru-wip-polling/pt-perf-jobs > We really (REALLY) should turn the nvme-wip branch into something > coherent, but at least with this we have some idea of an end result and > something that is testable. This looks so much better from the > performance POV than the earlier versions, passthrough _should_ be > faster than non-pt. > It'd be great to know how it performs in your setup. And please let me know how I can help in making things more coherent. -- Joshi ^ permalink raw reply [flat|nested] 25+ messages in thread