public inbox for [email protected]
* [PATCHSET v2 0/4] Enable NO_OFFLOAD support
@ 2023-04-20 18:31 Jens Axboe
  2023-04-20 18:31 ` [PATCH 1/4] io_uring: add support for NO_OFFLOAD Jens Axboe
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Jens Axboe @ 2023-04-20 18:31 UTC (permalink / raw)
  To: io-uring

This series enables support for forcing no-offload for requests that
otherwise would have been punted to io-wq. In essence, it bypasses
the normal non-blocking issue in favor of just letting the issue block.
This is only done for requests that would've otherwise hit io-wq in
the offload path, anything pollable will still be doing non-blocking
issue. See patch 3 for details.

Git tree: https://git.kernel.dk/cgit/linux/log/?h=io_uring-sync-issue

Since v1:
- Get rid of per-io_kiocb state, only apply this logic off the direct
  and initial submission path. Add IO_URING_F_NO_OFFLOAD for that.
- Due to the above, drop the two first prep patches as they are no
  longer needed.
- Make it mutually exclusive with IORING_SETUP_IOPOLL.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/4] io_uring: add support for NO_OFFLOAD
  2023-04-20 18:31 [PATCHSET v2 0/4] Enable NO_OFFLOAD support Jens Axboe
@ 2023-04-20 18:31 ` Jens Axboe
  2023-04-20 18:31 ` [PATCH 2/4] Revert "io_uring: always go async for unsupported fadvise flags" Jens Axboe
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2023-04-20 18:31 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe

Some applications don't necessarily care about io_uring never blocking at
request issue; they simply want to use io_uring for batched submission
of IO. However, io_uring always attempts non-blocking issue, and some
request types simply have no support for non-blocking issue, hence they
get punted to io-wq unconditionally. If the application doesn't mind
issue potentially blocking, this causes a performance slowdown, as
thread offload is not nearly as efficient as inline issue.

Add support for configuring the ring with IORING_SETUP_NO_OFFLOAD, and
add an IORING_ENTER_NO_OFFLOAD flag to io_uring_enter(2). If either one
of these is set, then io_uring will ignore the non-block issue attempt
for any file which we cannot poll for readiness. The simplified io_uring
issue model looks as follows:

1) Non-blocking issue is attempted for IO. If successful, we're done for
   now.

2) Case 1 failed. Now we have two options:
   a) We can poll the file. We arm poll, and we're done for now
      until that triggers.
   b) File cannot be polled, we punt to io-wq which then does a
      blocking attempt.

If either of the NO_OFFLOAD flags is set, we should never hit case
2b. Instead, case 1 issues the IO without the non-blocking flag
being set and performs an inline completion.

Signed-off-by: Jens Axboe <[email protected]>
---
 include/linux/io_uring.h      |  1 +
 include/uapi/linux/io_uring.h |  7 +++++
 io_uring/io_uring.c           | 52 +++++++++++++++++++++++++----------
 io_uring/io_uring.h           |  2 +-
 io_uring/sqpoll.c             |  6 ++--
 5 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 35b9328ca335..386d6b722481 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -13,6 +13,7 @@ enum io_uring_cmd_flags {
 	IO_URING_F_MULTISHOT		= 4,
 	/* executed by io-wq */
 	IO_URING_F_IOWQ			= 8,
+	IO_URING_F_NO_OFFLOAD		= 16,
 	/* int's last bit, sign checks are usually faster than a bit test */
 	IO_URING_F_NONBLOCK		= INT_MIN,
 
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 0716cb17e436..ea903a677ce9 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -173,6 +173,12 @@ enum {
  */
 #define IORING_SETUP_DEFER_TASKRUN	(1U << 13)
 
+/*
+ * Don't attempt non-blocking issue on file types that would otherwise
+ * punt to io-wq if they cannot be completed non-blocking.
+ */
+#define IORING_SETUP_NO_OFFLOAD		(1U << 14)
+
 enum io_uring_op {
 	IORING_OP_NOP,
 	IORING_OP_READV,
@@ -443,6 +449,7 @@ struct io_cqring_offsets {
 #define IORING_ENTER_SQ_WAIT		(1U << 2)
 #define IORING_ENTER_EXT_ARG		(1U << 3)
 #define IORING_ENTER_REGISTERED_RING	(1U << 4)
+#define IORING_ENTER_NO_OFFLOAD		(1U << 5)
 
 /*
  * Passed in for io_uring_setup(2). Copied back with updated info on success
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3d43df8f1e4e..fee3e461e149 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -147,7 +147,7 @@ static bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 
 static void io_dismantle_req(struct io_kiocb *req);
 static void io_clean_op(struct io_kiocb *req);
-static void io_queue_sqe(struct io_kiocb *req);
+static void io_queue_sqe(struct io_kiocb *req, unsigned int issue_flags);
 static void io_move_task_work_from_local(struct io_ring_ctx *ctx);
 static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
 static __cold void io_fallback_tw(struct io_uring_task *tctx);
@@ -1471,7 +1471,7 @@ void io_req_task_submit(struct io_kiocb *req, struct io_tw_state *ts)
 	else if (req->flags & REQ_F_FORCE_ASYNC)
 		io_queue_iowq(req, ts);
 	else
-		io_queue_sqe(req);
+		io_queue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
 }
 
 void io_req_task_queue_fail(struct io_kiocb *req, int ret)
@@ -1947,6 +1947,10 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(!io_assign_file(req, def, issue_flags)))
 		return -EBADF;
 
+	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
+	    (!req->file || !file_can_poll(req->file)))
+		issue_flags &= ~IO_URING_F_NONBLOCK;
+
 	if (unlikely((req->flags & REQ_F_CREDS) && req->creds != current_cred()))
 		creds = override_creds(req->creds);
 
@@ -2120,12 +2124,12 @@ static void io_queue_async(struct io_kiocb *req, int ret)
 		io_queue_linked_timeout(linked_timeout);
 }
 
-static inline void io_queue_sqe(struct io_kiocb *req)
+static inline void io_queue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	__must_hold(&req->ctx->uring_lock)
 {
 	int ret;
 
-	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
+	ret = io_issue_sqe(req, issue_flags);
 
 	/*
 	 * We async punt it if the file wasn't marked NOWAIT, or if the file
@@ -2337,7 +2341,8 @@ static __cold int io_submit_fail_init(const struct io_uring_sqe *sqe,
 }
 
 static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
-			 const struct io_uring_sqe *sqe)
+			 const struct io_uring_sqe *sqe,
+			 unsigned int aux_issue_flags)
 	__must_hold(&ctx->uring_lock)
 {
 	struct io_submit_link *link = &ctx->submit_state.link;
@@ -2385,7 +2390,8 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 		return 0;
 	}
 
-	io_queue_sqe(req);
+	io_queue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER|
+		     aux_issue_flags);
 	return 0;
 }
 
@@ -2466,7 +2472,8 @@ static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
 	return false;
 }
 
-int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
+int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
+		   unsigned int aux_issue_flags)
 	__must_hold(&ctx->uring_lock)
 {
 	unsigned int entries = io_sqring_entries(ctx);
@@ -2495,7 +2502,7 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 		 * Continue submitting even for sqe failure if the
 		 * ring was setup with IORING_SETUP_SUBMIT_ALL
 		 */
-		if (unlikely(io_submit_sqe(ctx, req, sqe)) &&
+		if (unlikely(io_submit_sqe(ctx, req, sqe, aux_issue_flags)) &&
 		    !(ctx->flags & IORING_SETUP_SUBMIT_ALL)) {
 			left--;
 			break;
@@ -3524,7 +3531,8 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 
 	if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
 			       IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG |
-			       IORING_ENTER_REGISTERED_RING)))
+			       IORING_ENTER_REGISTERED_RING |
+			       IORING_ENTER_NO_OFFLOAD)))
 		return -EINVAL;
 
 	/*
@@ -3575,12 +3583,18 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 
 		ret = to_submit;
 	} else if (to_submit) {
+		unsigned int aux_issue_flags = 0;
+
 		ret = io_uring_add_tctx_node(ctx);
 		if (unlikely(ret))
 			goto out;
 
+		if (flags & IORING_ENTER_NO_OFFLOAD ||
+		    ctx->flags & IORING_SETUP_NO_OFFLOAD)
+			aux_issue_flags = IO_URING_F_NO_OFFLOAD;
+
 		mutex_lock(&ctx->uring_lock);
-		ret = io_submit_sqes(ctx, to_submit);
+		ret = io_submit_sqes(ctx, to_submit, aux_issue_flags);
 		if (ret != to_submit) {
 			mutex_unlock(&ctx->uring_lock);
 			goto out;
@@ -3827,9 +3841,17 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	 * polling again, they can rely on io_sq_thread to do polling
 	 * work, which can reduce cpu usage and uring_lock contention.
 	 */
-	if (ctx->flags & IORING_SETUP_IOPOLL &&
-	    !(ctx->flags & IORING_SETUP_SQPOLL))
-		ctx->syscall_iopoll = 1;
+	ret = -EINVAL;
+	if (ctx->flags & IORING_SETUP_IOPOLL) {
+		/*
+		 * Can't sanely block for issue for IOPOLL, nor does this
+		 * combination make any sense. Disallow it.
+		 */
+		if (ctx->flags & IORING_SETUP_NO_OFFLOAD)
+			goto err;
+		if (!(ctx->flags & IORING_SETUP_SQPOLL))
+			ctx->syscall_iopoll = 1;
+	}
 
 	ctx->compat = in_compat_syscall();
 	if (!capable(CAP_IPC_LOCK))
@@ -3839,7 +3861,6 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	 * For SQPOLL, we just need a wakeup, always. For !SQPOLL, if
 	 * COOP_TASKRUN is set, then IPIs are never needed by the app.
 	 */
-	ret = -EINVAL;
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		/* IPI related flags don't make sense with SQPOLL */
 		if (ctx->flags & (IORING_SETUP_COOP_TASKRUN |
@@ -3969,7 +3990,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 			IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
 			IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
 			IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
-			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN))
+			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN |
+			IORING_SETUP_NO_OFFLOAD))
 		return -EINVAL;
 
 	return io_uring_create(entries, &p, params);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 25515d69d205..fb3619ae0fd3 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -76,7 +76,7 @@ int io_uring_alloc_task_context(struct task_struct *task,
 				struct io_ring_ctx *ctx);
 
 int io_poll_issue(struct io_kiocb *req, struct io_tw_state *ts);
-int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr);
+int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr, unsigned int aux_issue_flags);
 int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin);
 void io_free_batch_list(struct io_ring_ctx *ctx, struct io_wq_work_node *node);
 int io_req_prep_async(struct io_kiocb *req);
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index 9db4bc1f521a..9f2968a441ce 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -166,7 +166,7 @@ static inline bool io_sqd_events_pending(struct io_sq_data *sqd)
 
 static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 {
-	unsigned int to_submit;
+	unsigned int to_submit, aux_issue_flags = 0;
 	int ret = 0;
 
 	to_submit = io_sqring_entries(ctx);
@@ -179,6 +179,8 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 
 		if (ctx->sq_creds != current_cred())
 			creds = override_creds(ctx->sq_creds);
+		if (ctx->flags & IORING_SETUP_NO_OFFLOAD)
+			aux_issue_flags = IO_URING_F_NO_OFFLOAD;
 
 		mutex_lock(&ctx->uring_lock);
 		if (!wq_list_empty(&ctx->iopoll_list))
@@ -190,7 +192,7 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 		 */
 		if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)) &&
 		    !(ctx->flags & IORING_SETUP_R_DISABLED))
-			ret = io_submit_sqes(ctx, to_submit);
+			ret = io_submit_sqes(ctx, to_submit, aux_issue_flags);
 		mutex_unlock(&ctx->uring_lock);
 
 		if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
-- 
2.39.2



* [PATCH 2/4] Revert "io_uring: always go async for unsupported fadvise flags"
  2023-04-20 18:31 [PATCHSET v2 0/4] Enable NO_OFFLOAD support Jens Axboe
  2023-04-20 18:31 ` [PATCH 1/4] io_uring: add support for NO_OFFLOAD Jens Axboe
@ 2023-04-20 18:31 ` Jens Axboe
  2023-04-20 18:31 ` [PATCH 3/4] Revert "io_uring: for requests that require async, force it" Jens Axboe
  2023-04-20 18:31 ` [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt Jens Axboe
  3 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2023-04-20 18:31 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe

This reverts commit c31cc60fddd11134031e7f9eb76812353cfaac84.

In preparation for handling this a bit differently, revert this
cleanup.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/advise.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/io_uring/advise.c b/io_uring/advise.c
index 7085804c513c..cf600579bffe 100644
--- a/io_uring/advise.c
+++ b/io_uring/advise.c
@@ -62,18 +62,6 @@ int io_madvise(struct io_kiocb *req, unsigned int issue_flags)
 #endif
 }
 
-static bool io_fadvise_force_async(struct io_fadvise *fa)
-{
-	switch (fa->advice) {
-	case POSIX_FADV_NORMAL:
-	case POSIX_FADV_RANDOM:
-	case POSIX_FADV_SEQUENTIAL:
-		return false;
-	default:
-		return true;
-	}
-}
-
 int io_fadvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_fadvise *fa = io_kiocb_to_cmd(req, struct io_fadvise);
@@ -84,8 +72,6 @@ int io_fadvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	fa->offset = READ_ONCE(sqe->off);
 	fa->len = READ_ONCE(sqe->len);
 	fa->advice = READ_ONCE(sqe->fadvise_advice);
-	if (io_fadvise_force_async(fa))
-		req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 }
 
@@ -94,7 +80,16 @@ int io_fadvise(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_fadvise *fa = io_kiocb_to_cmd(req, struct io_fadvise);
 	int ret;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK && io_fadvise_force_async(fa));
+	if (issue_flags & IO_URING_F_NONBLOCK) {
+		switch (fa->advice) {
+		case POSIX_FADV_NORMAL:
+		case POSIX_FADV_RANDOM:
+		case POSIX_FADV_SEQUENTIAL:
+			break;
+		default:
+			return -EAGAIN;
+		}
+	}
 
 	ret = vfs_fadvise(req->file, fa->offset, fa->len, fa->advice);
 	if (ret < 0)
-- 
2.39.2



* [PATCH 3/4] Revert "io_uring: for requests that require async, force it"
  2023-04-20 18:31 [PATCHSET v2 0/4] Enable NO_OFFLOAD support Jens Axboe
  2023-04-20 18:31 ` [PATCH 1/4] io_uring: add support for NO_OFFLOAD Jens Axboe
  2023-04-20 18:31 ` [PATCH 2/4] Revert "io_uring: always go async for unsupported fadvise flags" Jens Axboe
@ 2023-04-20 18:31 ` Jens Axboe
  2023-04-20 18:31 ` [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt Jens Axboe
  3 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2023-04-20 18:31 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe

This reverts commit aebb224fd4fc7352cd839ad90414c548387142fd.

In preparation for handling this a bit differently, revert this
cleanup.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/advise.c |  4 ++--
 io_uring/fs.c     | 20 ++++++++++----------
 io_uring/net.c    |  4 ++--
 io_uring/splice.c |  7 ++++---
 io_uring/statx.c  |  4 ++--
 io_uring/sync.c   | 14 ++++++--------
 io_uring/xattr.c  | 14 ++++++++------
 7 files changed, 34 insertions(+), 33 deletions(-)

diff --git a/io_uring/advise.c b/io_uring/advise.c
index cf600579bffe..449c6f14649f 100644
--- a/io_uring/advise.c
+++ b/io_uring/advise.c
@@ -39,7 +39,6 @@ int io_madvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	ma->addr = READ_ONCE(sqe->addr);
 	ma->len = READ_ONCE(sqe->len);
 	ma->advice = READ_ONCE(sqe->fadvise_advice);
-	req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 #else
 	return -EOPNOTSUPP;
@@ -52,7 +51,8 @@ int io_madvise(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_madvise *ma = io_kiocb_to_cmd(req, struct io_madvise);
 	int ret;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	ret = do_madvise(current->mm, ma->addr, ma->len, ma->advice);
 	io_req_set_res(req, ret, 0);
diff --git a/io_uring/fs.c b/io_uring/fs.c
index f6a69a549fd4..7100c293c13a 100644
--- a/io_uring/fs.c
+++ b/io_uring/fs.c
@@ -74,7 +74,6 @@ int io_renameat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	}
 
 	req->flags |= REQ_F_NEED_CLEANUP;
-	req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 }
 
@@ -83,7 +82,8 @@ int io_renameat(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_rename *ren = io_kiocb_to_cmd(req, struct io_rename);
 	int ret;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	ret = do_renameat2(ren->old_dfd, ren->oldpath, ren->new_dfd,
 				ren->newpath, ren->flags);
@@ -123,7 +123,6 @@ int io_unlinkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return PTR_ERR(un->filename);
 
 	req->flags |= REQ_F_NEED_CLEANUP;
-	req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 }
 
@@ -132,7 +131,8 @@ int io_unlinkat(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_unlink *un = io_kiocb_to_cmd(req, struct io_unlink);
 	int ret;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	if (un->flags & AT_REMOVEDIR)
 		ret = do_rmdir(un->dfd, un->filename);
@@ -170,7 +170,6 @@ int io_mkdirat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return PTR_ERR(mkd->filename);
 
 	req->flags |= REQ_F_NEED_CLEANUP;
-	req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 }
 
@@ -179,7 +178,8 @@ int io_mkdirat(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_mkdir *mkd = io_kiocb_to_cmd(req, struct io_mkdir);
 	int ret;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	ret = do_mkdirat(mkd->dfd, mkd->filename, mkd->mode);
 
@@ -220,7 +220,6 @@ int io_symlinkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	}
 
 	req->flags |= REQ_F_NEED_CLEANUP;
-	req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 }
 
@@ -229,7 +228,8 @@ int io_symlinkat(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_link *sl = io_kiocb_to_cmd(req, struct io_link);
 	int ret;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	ret = do_symlinkat(sl->oldpath, sl->new_dfd, sl->newpath);
 
@@ -265,7 +265,6 @@ int io_linkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	}
 
 	req->flags |= REQ_F_NEED_CLEANUP;
-	req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 }
 
@@ -274,7 +273,8 @@ int io_linkat(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_link *lnk = io_kiocb_to_cmd(req, struct io_link);
 	int ret;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	ret = do_linkat(lnk->old_dfd, lnk->oldpath, lnk->new_dfd,
 				lnk->newpath, lnk->flags);
diff --git a/io_uring/net.c b/io_uring/net.c
index 4040cf093318..baa30f578dcd 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -91,7 +91,6 @@ int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return -EINVAL;
 
 	shutdown->how = READ_ONCE(sqe->len);
-	req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 }
 
@@ -101,7 +100,8 @@ int io_shutdown(struct io_kiocb *req, unsigned int issue_flags)
 	struct socket *sock;
 	int ret;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	sock = sock_from_file(req->file);
 	if (unlikely(!sock))
diff --git a/io_uring/splice.c b/io_uring/splice.c
index 2a4bbb719531..53e4232d0866 100644
--- a/io_uring/splice.c
+++ b/io_uring/splice.c
@@ -34,7 +34,6 @@ static int __io_splice_prep(struct io_kiocb *req,
 	if (unlikely(sp->flags & ~valid_flags))
 		return -EINVAL;
 	sp->splice_fd_in = READ_ONCE(sqe->splice_fd_in);
-	req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 }
 
@@ -53,7 +52,8 @@ int io_tee(struct io_kiocb *req, unsigned int issue_flags)
 	struct file *in;
 	long ret = 0;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	if (sp->flags & SPLICE_F_FD_IN_FIXED)
 		in = io_file_get_fixed(req, sp->splice_fd_in, issue_flags);
@@ -94,7 +94,8 @@ int io_splice(struct io_kiocb *req, unsigned int issue_flags)
 	struct file *in;
 	long ret = 0;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	if (sp->flags & SPLICE_F_FD_IN_FIXED)
 		in = io_file_get_fixed(req, sp->splice_fd_in, issue_flags);
diff --git a/io_uring/statx.c b/io_uring/statx.c
index abb874209caa..d8fc933d3f59 100644
--- a/io_uring/statx.c
+++ b/io_uring/statx.c
@@ -48,7 +48,6 @@ int io_statx_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	}
 
 	req->flags |= REQ_F_NEED_CLEANUP;
-	req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 }
 
@@ -57,7 +56,8 @@ int io_statx(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_statx *sx = io_kiocb_to_cmd(req, struct io_statx);
 	int ret;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	ret = do_statx(sx->dfd, sx->filename, sx->flags, sx->mask, sx->buffer);
 	io_req_set_res(req, ret, 0);
diff --git a/io_uring/sync.c b/io_uring/sync.c
index 255f68c37e55..64e87ea2b8fb 100644
--- a/io_uring/sync.c
+++ b/io_uring/sync.c
@@ -32,8 +32,6 @@ int io_sfr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	sync->off = READ_ONCE(sqe->off);
 	sync->len = READ_ONCE(sqe->len);
 	sync->flags = READ_ONCE(sqe->sync_range_flags);
-	req->flags |= REQ_F_FORCE_ASYNC;
-
 	return 0;
 }
 
@@ -43,7 +41,8 @@ int io_sync_file_range(struct io_kiocb *req, unsigned int issue_flags)
 	int ret;
 
 	/* sync_file_range always requires a blocking context */
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	ret = sync_file_range(req->file, sync->off, sync->len, sync->flags);
 	io_req_set_res(req, ret, 0);
@@ -63,7 +62,6 @@ int io_fsync_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 	sync->off = READ_ONCE(sqe->off);
 	sync->len = READ_ONCE(sqe->len);
-	req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 }
 
@@ -74,7 +72,8 @@ int io_fsync(struct io_kiocb *req, unsigned int issue_flags)
 	int ret;
 
 	/* fsync always requires a blocking context */
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	ret = vfs_fsync_range(req->file, sync->off, end > 0 ? end : LLONG_MAX,
 				sync->flags & IORING_FSYNC_DATASYNC);
@@ -92,7 +91,6 @@ int io_fallocate_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	sync->off = READ_ONCE(sqe->off);
 	sync->len = READ_ONCE(sqe->addr);
 	sync->mode = READ_ONCE(sqe->len);
-	req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 }
 
@@ -102,8 +100,8 @@ int io_fallocate(struct io_kiocb *req, unsigned int issue_flags)
 	int ret;
 
 	/* fallocate always requiring blocking context */
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
-
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 	ret = vfs_fallocate(req->file, sync->mode, sync->off, sync->len);
 	if (ret >= 0)
 		fsnotify_modify(req->file);
diff --git a/io_uring/xattr.c b/io_uring/xattr.c
index e1c810e0b85a..6201a9f442c6 100644
--- a/io_uring/xattr.c
+++ b/io_uring/xattr.c
@@ -75,7 +75,6 @@ static int __io_getxattr_prep(struct io_kiocb *req,
 	}
 
 	req->flags |= REQ_F_NEED_CLEANUP;
-	req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 }
 
@@ -110,7 +109,8 @@ int io_fgetxattr(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_xattr *ix = io_kiocb_to_cmd(req, struct io_xattr);
 	int ret;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	ret = do_getxattr(mnt_idmap(req->file->f_path.mnt),
 			req->file->f_path.dentry,
@@ -127,7 +127,8 @@ int io_getxattr(struct io_kiocb *req, unsigned int issue_flags)
 	struct path path;
 	int ret;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 retry:
 	ret = filename_lookup(AT_FDCWD, ix->filename, lookup_flags, &path, NULL);
@@ -173,7 +174,6 @@ static int __io_setxattr_prep(struct io_kiocb *req,
 	}
 
 	req->flags |= REQ_F_NEED_CLEANUP;
-	req->flags |= REQ_F_FORCE_ASYNC;
 	return 0;
 }
 
@@ -222,7 +222,8 @@ int io_fsetxattr(struct io_kiocb *req, unsigned int issue_flags)
 {
 	int ret;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 	ret = __io_setxattr(req, issue_flags, &req->file->f_path);
 	io_xattr_finish(req, ret);
@@ -236,7 +237,8 @@ int io_setxattr(struct io_kiocb *req, unsigned int issue_flags)
 	struct path path;
 	int ret;
 
-	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
 
 retry:
 	ret = filename_lookup(AT_FDCWD, ix->filename, lookup_flags, &path, NULL);
-- 
2.39.2



* [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-20 18:31 [PATCHSET v2 0/4] Enable NO_OFFLOAD support Jens Axboe
                   ` (2 preceding siblings ...)
  2023-04-20 18:31 ` [PATCH 3/4] Revert "io_uring: for requests that require async, force it" Jens Axboe
@ 2023-04-20 18:31 ` Jens Axboe
  2023-04-24  7:30   ` Ming Lei
  3 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2023-04-20 18:31 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe

Add an opdef bit for them, and set it for the opcodes where we always
need io-wq punt. With that done, exclude them from the file_can_poll()
check in terms of whether or not we need to punt them if any of the
NO_OFFLOAD flags are set.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/io_uring.c |  2 +-
 io_uring/opdef.c    | 22 ++++++++++++++++++++--
 io_uring/opdef.h    |  2 ++
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index fee3e461e149..420cfd35ebc6 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 		return -EBADF;
 
 	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
-	    (!req->file || !file_can_poll(req->file)))
+	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
 		issue_flags &= ~IO_URING_F_NONBLOCK;
 
 	if (unlikely((req->flags & REQ_F_CREDS) && req->creds != current_cred()))
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index cca7c5b55208..686d46001622 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -82,6 +82,7 @@ const struct io_issue_def io_issue_defs[] = {
 	[IORING_OP_FSYNC] = {
 		.needs_file		= 1,
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_fsync_prep,
 		.issue			= io_fsync,
 	},
@@ -125,6 +126,7 @@ const struct io_issue_def io_issue_defs[] = {
 	[IORING_OP_SYNC_FILE_RANGE] = {
 		.needs_file		= 1,
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_sfr_prep,
 		.issue			= io_sync_file_range,
 	},
@@ -202,6 +204,7 @@ const struct io_issue_def io_issue_defs[] = {
 	},
 	[IORING_OP_FALLOCATE] = {
 		.needs_file		= 1,
+		.always_iowq		= 1,
 		.prep			= io_fallocate_prep,
 		.issue			= io_fallocate,
 	},
@@ -221,6 +224,7 @@ const struct io_issue_def io_issue_defs[] = {
 	},
 	[IORING_OP_STATX] = {
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_statx_prep,
 		.issue			= io_statx,
 	},
@@ -253,11 +257,13 @@ const struct io_issue_def io_issue_defs[] = {
 	[IORING_OP_FADVISE] = {
 		.needs_file		= 1,
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_fadvise_prep,
 		.issue			= io_fadvise,
 	},
 	[IORING_OP_MADVISE] = {
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_madvise_prep,
 		.issue			= io_madvise,
 	},
@@ -308,6 +314,7 @@ const struct io_issue_def io_issue_defs[] = {
 		.hash_reg_file		= 1,
 		.unbound_nonreg_file	= 1,
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_splice_prep,
 		.issue			= io_splice,
 	},
@@ -328,11 +335,13 @@ const struct io_issue_def io_issue_defs[] = {
 		.hash_reg_file		= 1,
 		.unbound_nonreg_file	= 1,
 		.audit_skip		= 1,
+		.always_iowq		= 1,
 		.prep			= io_tee_prep,
 		.issue			= io_tee,
 	},
 	[IORING_OP_SHUTDOWN] = {
 		.needs_file		= 1,
+		.always_iowq		= 1,
 #if defined(CONFIG_NET)
 		.prep			= io_shutdown_prep,
 		.issue			= io_shutdown,
@@ -343,22 +352,27 @@ const struct io_issue_def io_issue_defs[] = {
 	[IORING_OP_RENAMEAT] = {
 		.prep			= io_renameat_prep,
 		.issue			= io_renameat,
+		.always_iowq		= 1,
 	},
 	[IORING_OP_UNLINKAT] = {
 		.prep			= io_unlinkat_prep,
 		.issue			= io_unlinkat,
+		.always_iowq		= 1,
 	},
 	[IORING_OP_MKDIRAT] = {
 		.prep			= io_mkdirat_prep,
 		.issue			= io_mkdirat,
+		.always_iowq		= 1,
 	},
 	[IORING_OP_SYMLINKAT] = {
 		.prep			= io_symlinkat_prep,
 		.issue			= io_symlinkat,
+		.always_iowq		= 1,
 	},
 	[IORING_OP_LINKAT] = {
 		.prep			= io_linkat_prep,
 		.issue			= io_linkat,
+		.always_iowq		= 1,
 	},
 	[IORING_OP_MSG_RING] = {
 		.needs_file		= 1,
@@ -367,20 +381,24 @@ const struct io_issue_def io_issue_defs[] = {
 		.issue			= io_msg_ring,
 	},
 	[IORING_OP_FSETXATTR] = {
-		.needs_file = 1,
+		.needs_file		= 1,
+		.always_iowq		= 1,
 		.prep			= io_fsetxattr_prep,
 		.issue			= io_fsetxattr,
 	},
 	[IORING_OP_SETXATTR] = {
+		.always_iowq		= 1,
 		.prep			= io_setxattr_prep,
 		.issue			= io_setxattr,
 	},
 	[IORING_OP_FGETXATTR] = {
-		.needs_file = 1,
+		.needs_file		= 1,
+		.always_iowq		= 1,
 		.prep			= io_fgetxattr_prep,
 		.issue			= io_fgetxattr,
 	},
 	[IORING_OP_GETXATTR] = {
+		.always_iowq		= 1,
 		.prep			= io_getxattr_prep,
 		.issue			= io_getxattr,
 	},
diff --git a/io_uring/opdef.h b/io_uring/opdef.h
index c22c8696e749..657a831249ff 100644
--- a/io_uring/opdef.h
+++ b/io_uring/opdef.h
@@ -29,6 +29,8 @@ struct io_issue_def {
 	unsigned		iopoll_queue : 1;
 	/* opcode specific path will handle ->async_data allocation if needed */
 	unsigned		manual_alloc : 1;
+	/* op always needs io-wq offload */
+	unsigned		always_iowq : 1;
 
 	int (*issue)(struct io_kiocb *, unsigned int);
 	int (*prep)(struct io_kiocb *, const struct io_uring_sqe *);
-- 
2.39.2



* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-20 18:31 ` [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt Jens Axboe
@ 2023-04-24  7:30   ` Ming Lei
  2023-04-24 15:24     ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2023-04-24  7:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, ming.lei

On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
> Add an opdef bit for them, and set it for the opcodes where we always
> need io-wq punt. With that done, exclude them from the file_can_poll()
> check in terms of whether or not we need to punt them if any of the
> NO_OFFLOAD flags are set.
> 
> Signed-off-by: Jens Axboe <[email protected]>
> ---
>  io_uring/io_uring.c |  2 +-
>  io_uring/opdef.c    | 22 ++++++++++++++++++++--
>  io_uring/opdef.h    |  2 ++
>  3 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index fee3e461e149..420cfd35ebc6 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>  		return -EBADF;
>  
>  	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
> -	    (!req->file || !file_can_poll(req->file)))
> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
>  		issue_flags &= ~IO_URING_F_NONBLOCK;

I guess the check should be !def->always_iowq?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-24  7:30   ` Ming Lei
@ 2023-04-24 15:24     ` Jens Axboe
  2023-04-25  0:57       ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2023-04-24 15:24 UTC (permalink / raw)
  To: Ming Lei; +Cc: io-uring

On 4/24/23 1:30 AM, Ming Lei wrote:
> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
>> Add an opdef bit for them, and set it for the opcodes where we always
>> need io-wq punt. With that done, exclude them from the file_can_poll()
>> check in terms of whether or not we need to punt them if any of the
>> NO_OFFLOAD flags are set.
>>
>> Signed-off-by: Jens Axboe <[email protected]>
>> ---
>>  io_uring/io_uring.c |  2 +-
>>  io_uring/opdef.c    | 22 ++++++++++++++++++++--
>>  io_uring/opdef.h    |  2 ++
>>  3 files changed, 23 insertions(+), 3 deletions(-)
>>
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index fee3e461e149..420cfd35ebc6 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>  		return -EBADF;
>>  
>>  	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
>> -	    (!req->file || !file_can_poll(req->file)))
>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
>>  		issue_flags &= ~IO_URING_F_NONBLOCK;
> 
> I guess the check should be !def->always_iowq?

How so? Nobody that takes pollable files should be (or is) setting
->always_iowq. If we can poll the file, we should not force inline
submission. Basically the ones setting ->always_iowq always return
-EAGAIN if nonblock == true.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-24 15:24     ` Jens Axboe
@ 2023-04-25  0:57       ` Ming Lei
  2023-04-25  2:08         ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2023-04-25  0:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, ming.lei

On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
> On 4/24/23 1:30 AM, Ming Lei wrote:
> > On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
> >> Add an opdef bit for them, and set it for the opcodes where we always
> >> need io-wq punt. With that done, exclude them from the file_can_poll()
> >> check in terms of whether or not we need to punt them if any of the
> >> NO_OFFLOAD flags are set.
> >>
> >> Signed-off-by: Jens Axboe <[email protected]>
> >> ---
> >>  io_uring/io_uring.c |  2 +-
> >>  io_uring/opdef.c    | 22 ++++++++++++++++++++--
> >>  io_uring/opdef.h    |  2 ++
> >>  3 files changed, 23 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> >> index fee3e461e149..420cfd35ebc6 100644
> >> --- a/io_uring/io_uring.c
> >> +++ b/io_uring/io_uring.c
> >> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> >>  		return -EBADF;
> >>  
> >>  	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
> >> -	    (!req->file || !file_can_poll(req->file)))
> >> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
> >>  		issue_flags &= ~IO_URING_F_NONBLOCK;
> > 
> > I guess the check should be !def->always_iowq?
> 
> How so? Nobody that takes pollable files should/is setting
> ->always_iowq. If we can poll the file, we should not force inline
> submission. Basically the ones setting ->always_iowq always do -EAGAIN
> returns if nonblock == true.

I meant that IO_URING_F_NONBLOCK is cleared here for ->always_iowq, so
these OPs won't return -EAGAIN and will run directly in the current task
context.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25  0:57       ` Ming Lei
@ 2023-04-25  2:08         ` Jens Axboe
  2023-04-25  2:13           ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2023-04-25  2:08 UTC (permalink / raw)
  To: Ming Lei; +Cc: io-uring

On 4/24/23 6:57 PM, Ming Lei wrote:
> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
> > On 4/24/23 1:30 AM, Ming Lei wrote:
>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
>>>> Add an opdef bit for them, and set it for the opcodes where we always
>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
>>>> check in terms of whether or not we need to punt them if any of the
>>>> NO_OFFLOAD flags are set.
>>>>
>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>> ---
>>>>  io_uring/io_uring.c |  2 +-
>>>>  io_uring/opdef.c    | 22 ++++++++++++++++++++--
>>>>  io_uring/opdef.h    |  2 ++
>>>>  3 files changed, 23 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>> index fee3e461e149..420cfd35ebc6 100644
>>>> --- a/io_uring/io_uring.c
>>>> +++ b/io_uring/io_uring.c
>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>>>  		return -EBADF;
>>>>  
>>>>  	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
>>>> -	    (!req->file || !file_can_poll(req->file)))
>>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
>>>>  		issue_flags &= ~IO_URING_F_NONBLOCK;
>>>
>>> I guess the check should be !def->always_iowq?
>>
>> How so? Nobody that takes pollable files should/is setting
>> ->always_iowq. If we can poll the file, we should not force inline
>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
>> returns if nonblock == true.
> 
> I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
> these OPs won't return -EAGAIN, then run in the current task context
> directly.

Right, if IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
it :-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25  2:08         ` Jens Axboe
@ 2023-04-25  2:13           ` Ming Lei
  2023-04-25  2:18             ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2023-04-25  2:13 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, ming.lei

On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
> On 4/24/23 6:57 PM, Ming Lei wrote:
> > On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
> >> On 4/24/23 1:30 AM, Ming Lei wrote:
> >>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
> >>>> Add an opdef bit for them, and set it for the opcodes where we always
> >>>> need io-wq punt. With that done, exclude them from the file_can_poll()
> >>>> check in terms of whether or not we need to punt them if any of the
> >>>> NO_OFFLOAD flags are set.
> >>>>
> >>>> Signed-off-by: Jens Axboe <[email protected]>
> >>>> ---
> >>>>  io_uring/io_uring.c |  2 +-
> >>>>  io_uring/opdef.c    | 22 ++++++++++++++++++++--
> >>>>  io_uring/opdef.h    |  2 ++
> >>>>  3 files changed, 23 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> >>>> index fee3e461e149..420cfd35ebc6 100644
> >>>> --- a/io_uring/io_uring.c
> >>>> +++ b/io_uring/io_uring.c
> >>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> >>>>  		return -EBADF;
> >>>>  
> >>>>  	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
> >>>> -	    (!req->file || !file_can_poll(req->file)))
> >>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
> >>>>  		issue_flags &= ~IO_URING_F_NONBLOCK;
> >>>
> >>> I guess the check should be !def->always_iowq?
> >>
> >> How so? Nobody that takes pollable files should/is setting
> >> ->always_iowq. If we can poll the file, we should not force inline
> >> submission. Basically the ones setting ->always_iowq always do -EAGAIN
> >> returns if nonblock == true.
> > 
> > I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
> > these OPs won't return -EAGAIN, then run in the current task context
> > directly.
> 
> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
> it :-)

But ->always_iowq isn't actually _always_, since fallocate/fsync/... are
not punted to io-wq in the IO_URING_F_NO_OFFLOAD case, so the naming of
->always_iowq looks a bit confusing?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25  2:13           ` Ming Lei
@ 2023-04-25  2:18             ` Jens Axboe
  2023-04-25  2:50               ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2023-04-25  2:18 UTC (permalink / raw)
  To: Ming Lei; +Cc: io-uring

On 4/24/23 8:13 PM, Ming Lei wrote:
> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
>> On 4/24/23 6:57 PM, Ming Lei wrote:
>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
>>>> On 4/24/23 1:30 AM, Ming Lei wrote:
>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
>>>>>> Add an opdef bit for them, and set it for the opcodes where we always
>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
>>>>>> check in terms of whether or not we need to punt them if any of the
>>>>>> NO_OFFLOAD flags are set.
>>>>>>
>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>> ---
>>>>>>  io_uring/io_uring.c |  2 +-
>>>>>>  io_uring/opdef.c    | 22 ++++++++++++++++++++--
>>>>>>  io_uring/opdef.h    |  2 ++
>>>>>>  3 files changed, 23 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>>>> index fee3e461e149..420cfd35ebc6 100644
>>>>>> --- a/io_uring/io_uring.c
>>>>>> +++ b/io_uring/io_uring.c
>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>  		return -EBADF;
>>>>>>  
>>>>>>  	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
>>>>>> -	    (!req->file || !file_can_poll(req->file)))
>>>>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
>>>>>>  		issue_flags &= ~IO_URING_F_NONBLOCK;
>>>>>
>>>>> I guess the check should be !def->always_iowq?
>>>>
>>>> How so? Nobody that takes pollable files should/is setting
>>>> ->always_iowq. If we can poll the file, we should not force inline
>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
>>>> returns if nonblock == true.
>>>
>>> I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
>>> these OPs won't return -EAGAIN, then run in the current task context
>>> directly.
>>
>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
>> it :-)
> 
> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
> ->always_iowq is a bit confusing?

Yeah, naming isn't that great, I can see how that's a bit confusing. I'll
be happy to take suggestions on what would make it clearer.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25  2:18             ` Jens Axboe
@ 2023-04-25  2:50               ` Ming Lei
  2023-04-25 13:31                 ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2023-04-25  2:50 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, ming.lei

On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
> On 4/24/23 8:13 PM, Ming Lei wrote:
> > On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
> >> On 4/24/23 6:57 PM, Ming Lei wrote:
> >>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
> >>>> On 4/24/23 1:30 AM, Ming Lei wrote:
> >>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
> >>>>>> Add an opdef bit for them, and set it for the opcodes where we always
> >>>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
> >>>>>> check in terms of whether or not we need to punt them if any of the
> >>>>>> NO_OFFLOAD flags are set.
> >>>>>>
> >>>>>> Signed-off-by: Jens Axboe <[email protected]>
> >>>>>> ---
> >>>>>>  io_uring/io_uring.c |  2 +-
> >>>>>>  io_uring/opdef.c    | 22 ++++++++++++++++++++--
> >>>>>>  io_uring/opdef.h    |  2 ++
> >>>>>>  3 files changed, 23 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> >>>>>> index fee3e461e149..420cfd35ebc6 100644
> >>>>>> --- a/io_uring/io_uring.c
> >>>>>> +++ b/io_uring/io_uring.c
> >>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> >>>>>>  		return -EBADF;
> >>>>>>  
> >>>>>>  	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
> >>>>>> -	    (!req->file || !file_can_poll(req->file)))
> >>>>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
> >>>>>>  		issue_flags &= ~IO_URING_F_NONBLOCK;
> >>>>>
> >>>>> I guess the check should be !def->always_iowq?
> >>>>
> >>>> How so? Nobody that takes pollable files should/is setting
> >>>> ->always_iowq. If we can poll the file, we should not force inline
> >>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
> >>>> returns if nonblock == true.
> >>>
> >>> I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
> >>> these OPs won't return -EAGAIN, then run in the current task context
> >>> directly.
> >>
> >> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
> >> it :-)
> > 
> > But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
> > not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
> > ->always_iowq is a bit confusing?
> 
> Yeah naming isn't that great, I can see how that's bit confusing. I'll
> be happy to take suggestions on what would make it clearer.

Naming aside, I am also wondering why these ->always_iowq OPs aren't
punted to io-wq in the IO_URING_F_NO_OFFLOAD case. Issuing them inline
shouldn't improve performance, because these OPs are supposed to be slow
and always sleep, unlike others (buffered writes, ...). Can you provide
a hint about why these OPs are not offloaded? Or is it just that
NO_OFFLOAD needs to not offload any OP?

Or can we rename IORING_SETUP_NO_OFFLOAD to IORING_SETUP_SUBMIT_MAY_WAIT
and still punt ->always_iowq OPs to io-wq?

Thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25  2:50               ` Ming Lei
@ 2023-04-25 13:31                 ` Jens Axboe
  2023-04-25 14:42                   ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2023-04-25 13:31 UTC (permalink / raw)
  To: Ming Lei; +Cc: io-uring

On 4/24/23 8:50 PM, Ming Lei wrote:
> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
>> On 4/24/23 8:13 PM, Ming Lei wrote:
>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
>>>> On 4/24/23 6:57 PM, Ming Lei wrote:
>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
>>>>>> On 4/24/23 1:30 AM, Ming Lei wrote:
>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always
>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
>>>>>>>> check in terms of whether or not we need to punt them if any of the
>>>>>>>> NO_OFFLOAD flags are set.
>>>>>>>>
>>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>>>> ---
>>>>>>>>  io_uring/io_uring.c |  2 +-
>>>>>>>>  io_uring/opdef.c    | 22 ++++++++++++++++++++--
>>>>>>>>  io_uring/opdef.h    |  2 ++
>>>>>>>>  3 files changed, 23 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>>>>>> index fee3e461e149..420cfd35ebc6 100644
>>>>>>>> --- a/io_uring/io_uring.c
>>>>>>>> +++ b/io_uring/io_uring.c
>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>>>  		return -EBADF;
>>>>>>>>  
>>>>>>>>  	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
>>>>>>>> -	    (!req->file || !file_can_poll(req->file)))
>>>>>>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
>>>>>>>>  		issue_flags &= ~IO_URING_F_NONBLOCK;
>>>>>>>
>>>>>>> I guess the check should be !def->always_iowq?
>>>>>>
>>>>>> How so? Nobody that takes pollable files should/is setting
>>>>>> ->always_iowq. If we can poll the file, we should not force inline
>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
>>>>>> returns if nonblock == true.
>>>>>
>>>>> I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
>>>>> these OPs won't return -EAGAIN, then run in the current task context
>>>>> directly.
>>>>
>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
>>>> it :-)
>>>
>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
>>> ->always_iowq is a bit confusing?
>>
>> Yeah naming isn't that great, I can see how that's bit confusing. I'll
>> be happy to take suggestions on what would make it clearer.
> 
> Except for the naming, I am also wondering why these ->always_iowq OPs
> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
> shouldn't improve performance by doing so because these OPs are supposed
> to be slow and always slept, not like others(buffered writes, ...),
> can you provide one hint about not offloading these OPs? Or is it just that
> NO_OFFLOAD needs to not offload every OPs?

The whole point of NO_OFFLOAD is that items that would normally be
passed to io-wq are just run inline. This provides a way to reap the
benefits of batched submissions and syscall reductions. Some opcodes
will just never be async, and io-wq offloads are not very fast. Some of
them can eventually be migrated to async support, if the underlying
mechanics support it.

You'll note that none of the ->always_iowq opcodes are pollable. If
NO_OFFLOAD is set, it's pointless NOT to issue them with NONBLOCK
cleared, as you'd just get -EAGAIN and then need to call them again with
NONBLOCK cleared from the same context.

For naming, maybe ->always_iowq is better as ->no_nonblock or something
like that. Or perhaps get rid of the double negation and just call it
->blocking, or maybe ->no_async_support to make it clearer?

> Or can we rename IORING_SETUP_NO_OFFLOAD as IORING_SETUP_SUBMIT_MAY_WAIT
> and still punt ->always_iowq OPs to iowq?

I think NO_OFFLOAD better explains that we'll never offload to io-wq. I
would've called it NO_IOWQ, but I don't think that's understandable to
users in the same way. The problem is that the user does need some
knowledge of how IOs are issued and completed in io_uring to fully grok
what it does, which I'll put in the man pages.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25 13:31                 ` Jens Axboe
@ 2023-04-25 14:42                   ` Ming Lei
  2023-04-25 14:50                     ` Jens Axboe
  2023-04-25 15:28                     ` Pavel Begunkov
  0 siblings, 2 replies; 25+ messages in thread
From: Ming Lei @ 2023-04-25 14:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, ming.lei

On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
> On 4/24/23 8:50 PM, Ming Lei wrote:
> > On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
> >> On 4/24/23 8:13 PM, Ming Lei wrote:
> >>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
> >>>> On 4/24/23 6:57 PM, Ming Lei wrote:
> >>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
> >>>>>> On 4/24/23 1:30 AM, Ming Lei wrote:
> >>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
> >>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always
> >>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
> >>>>>>>> check in terms of whether or not we need to punt them if any of the
> >>>>>>>> NO_OFFLOAD flags are set.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
> >>>>>>>> ---
> >>>>>>>>  io_uring/io_uring.c |  2 +-
> >>>>>>>>  io_uring/opdef.c    | 22 ++++++++++++++++++++--
> >>>>>>>>  io_uring/opdef.h    |  2 ++
> >>>>>>>>  3 files changed, 23 insertions(+), 3 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> >>>>>>>> index fee3e461e149..420cfd35ebc6 100644
> >>>>>>>> --- a/io_uring/io_uring.c
> >>>>>>>> +++ b/io_uring/io_uring.c
> >>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> >>>>>>>>  		return -EBADF;
> >>>>>>>>  
> >>>>>>>>  	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
> >>>>>>>> -	    (!req->file || !file_can_poll(req->file)))
> >>>>>>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
> >>>>>>>>  		issue_flags &= ~IO_URING_F_NONBLOCK;
> >>>>>>>
> >>>>>>> I guess the check should be !def->always_iowq?
> >>>>>>
> >>>>>> How so? Nobody that takes pollable files should/is setting
> >>>>>> ->always_iowq. If we can poll the file, we should not force inline
> >>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
> >>>>>> returns if nonblock == true.
> >>>>>
> >>>>> I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
> >>>>> these OPs won't return -EAGAIN, then run in the current task context
> >>>>> directly.
> >>>>
> >>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
> >>>> it :-)
> >>>
> >>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
> >>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
> >>> ->always_iowq is a bit confusing?
> >>
> >> Yeah naming isn't that great, I can see how that's bit confusing. I'll
> >> be happy to take suggestions on what would make it clearer.
> > 
> > Except for the naming, I am also wondering why these ->always_iowq OPs
> > aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
> > shouldn't improve performance by doing so because these OPs are supposed
> > to be slow and always slept, not like others(buffered writes, ...),
> > can you provide one hint about not offloading these OPs? Or is it just that
> > NO_OFFLOAD needs to not offload every OPs?
> 
> The whole point of NO_OFFLOAD is that items that would normally be
> passed to io-wq are just run inline. This provides a way to reap the
> benefits of batched submissions and syscall reductions. Some opcodes
> will just never be async, and io-wq offloads are not very fast. Some of

Yeah, it seems io-wq is much slower than inline issue; maybe that needs
to be looked into, and it is easy to run into io-wq with IOSQE_IO_LINK.

> them can eventually be migrated to async support, if the underlying
> mechanics support it.
> 
> You'll note that none of the ->always_iowq opcodes are pollable. If

True, then it looks like the ->always_iowq flag doesn't make a difference
here, because your patch clears IO_URING_F_NONBLOCK for
!file_can_poll(req->file).

Also, almost all of these ->always_iowq OPs are slow and blocking; if they
are issued inline, the submission pipeline will be blocked.

> NO_OFFLOAD is setup, it's pointless NOT to issue them with NONBLOCK
> cleared, as you'd just get -EAGAIN and then need to call them again with
> NONBLOCK cleared from the same context.

My point is that these OPs are slow and sleep, so inline issue won't
actually improve performance for them, and punting them to io-wq couldn't
be worse either. On the other hand, inline issue may hurt performance
because the submission pipeline is blocked while issuing these OPs.


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25 14:42                   ` Ming Lei
@ 2023-04-25 14:50                     ` Jens Axboe
  2023-04-25 15:07                       ` Ming Lei
  2023-04-25 15:28                     ` Pavel Begunkov
  1 sibling, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2023-04-25 14:50 UTC (permalink / raw)
  To: Ming Lei; +Cc: io-uring

On 4/25/23 8:42 AM, Ming Lei wrote:
> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
> On 4/24/23 8:50 PM, Ming Lei wrote:
>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
>>> On 4/24/23 8:13 PM, Ming Lei wrote:
>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
>>>>> On 4/24/23 6:57 PM, Ming Lei wrote:
>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
>>>>>>> On 4/24/23 1:30 AM, Ming Lei wrote:
>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always
>>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
>>>>>>>>>> check in terms of whether or not we need to punt them if any of the
>>>>>>>>>> NO_OFFLOAD flags are set.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>>  io_uring/io_uring.c |  2 +-
>>>>>>>>>>  io_uring/opdef.c    | 22 ++++++++++++++++++++--
>>>>>>>>>>  io_uring/opdef.h    |  2 ++
>>>>>>>>>>  3 files changed, 23 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644
>>>>>>>>>> --- a/io_uring/io_uring.c
>>>>>>>>>> +++ b/io_uring/io_uring.c
>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>>>>>  		return -EBADF;
>>>>>>>>>>  
>>>>>>>>>>  	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
>>>>>>>>>> -	    (!req->file || !file_can_poll(req->file)))
>>>>>>>>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
>>>>>>>>>>  		issue_flags &= ~IO_URING_F_NONBLOCK;
>>>>>>>>>
>>>>>>>>> I guess the check should be !def->always_iowq?
>>>>>>>>
>>>>>>>> How so? Nobody that takes pollable files should/is setting
>>>>>>>> ->always_iowq. If we can poll the file, we should not force inline
>>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
>>>>>>>> returns if nonblock == true.
>>>>>>>
>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
>>>>>>> these OPs won't return -EAGAIN, then run in the current task context
>>>>>>> directly.
>>>>>>
>>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
>>>>>> it :-)
>>>>>
>>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
>>>>> ->always_iowq is a bit confusing?
>>>>
>>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll
>>>> be happy to take suggestions on what would make it clearer.
>>>
>>> Except for the naming, I am also wondering why these ->always_iowq OPs
>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
>>> shouldn't improve performance by doing so because these OPs are supposed
>>> to be slow and always slept, not like others(buffered writes, ...),
>>> can you provide one hint about not offloading these OPs? Or is it just that
>>> NO_OFFLOAD needs to not offload every OPs?
>>
>> The whole point of NO_OFFLOAD is that items that would normally be
>> passed to io-wq are just run inline. This provides a way to reap the
>> benefits of batched submissions and syscall reductions. Some opcodes
>> will just never be async, and io-wq offloads are not very fast. Some of
> 
> Yeah, seems io-wq is much slower than inline issue, maybe it needs
> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.

Indeed, depending on what is being linked, you may see io-wq activity
which is not ideal.

>> them can eventually be migrated to async support, if the underlying
>> mechanics support it.
>>
>> You'll note that none of the ->always_iowq opcodes are pollable. If
> 
> True, then looks the ->always_iowq flag doesn't make a difference here
> because your patch clears IO_URING_F_NONBLOCK for !file_can_poll(req->file).

Yep, we may be able to just get rid of it. The important bit is really
getting rid of the forced setting of REQ_F_FORCE_ASYNC which the
previous reverts take care of. So we can probably just drop this one,
let me give it a spin.

> Also almost all these ->always_iowq OPs are slow and blocked, if they are
> issued inline, the submission pipeline will be blocked.

That is true, but that's very much the tradeoff you make by using
NO_OFFLOAD. I would expect any users of this to have two rings, one for
just batched submissions, and one for "normal" usage. Or maybe they only
do the batched submissions and one is fine.

>> NO_OFFLOAD is setup, it's pointless NOT to issue them with NONBLOCK
>> cleared, as you'd just get -EAGAIN and then need to call them again with
>> NONBLOCK cleared from the same context.
> 
> My point is that these OPs are slow and slept, so inline issue won't
> improve performance actually for them, and punting to io-wq couldn't
> be worse too. On the other side, inline issue may hurt perf because
> submission pipeline is blocked when issuing these OPs.

That is definitely not true, it really depends on which ops it is. For a
lot of them, they don't generally block, but we have to be prepared for
them blocking. This is why they are offloaded. If they don't block and
execute fast, then the io-wq offload is easily a 2-3x slowdown, while
the batched submission can make it more efficient than simply doing the
normal syscalls as you avoid quite a few syscalls.

But you should not mix and match issue of these slower ops with "normal"
issues, you should separate them.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25 14:50                     ` Jens Axboe
@ 2023-04-25 15:07                       ` Ming Lei
  2023-04-25 15:25                         ` Jens Axboe
  2023-04-25 16:10                         ` Pavel Begunkov
  0 siblings, 2 replies; 25+ messages in thread
From: Ming Lei @ 2023-04-25 15:07 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, ming.lei

On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote:
> On 4/25/23 8:42 AM, Ming Lei wrote:
> > On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
> >> On 4/24/23 8:50 PM, Ming Lei wrote:
> >>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
> >>>> On 4/24/23 8:13 PM, Ming Lei wrote:
> >>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
> >>>>>> On 4/24/23 6:57 PM, Ming Lei wrote:
> >>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
> >>>>>>>> On 4/24/23 1:30 AM, Ming Lei wrote:
> >>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
> >>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always
> >>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
> >>>>>>>>>> check in terms of whether or not we need to punt them if any of the
> >>>>>>>>>> NO_OFFLOAD flags are set.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
> >>>>>>>>>> ---
> >>>>>>>>>>  io_uring/io_uring.c |  2 +-
> >>>>>>>>>>  io_uring/opdef.c    | 22 ++++++++++++++++++++--
> >>>>>>>>>>  io_uring/opdef.h    |  2 ++
> >>>>>>>>>>  3 files changed, 23 insertions(+), 3 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> >>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644
> >>>>>>>>>> --- a/io_uring/io_uring.c
> >>>>>>>>>> +++ b/io_uring/io_uring.c
> >>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> >>>>>>>>>>  		return -EBADF;
> >>>>>>>>>>  
> >>>>>>>>>>  	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
> >>>>>>>>>> -	    (!req->file || !file_can_poll(req->file)))
> >>>>>>>>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
> >>>>>>>>>>  		issue_flags &= ~IO_URING_F_NONBLOCK;
> >>>>>>>>>
> >>>>>>>>> I guess the check should be !def->always_iowq?
> >>>>>>>>
> >>>>>>>> How so? Nobody that takes pollable files should/is setting
> >>>>>>>> ->always_iowq. If we can poll the file, we should not force inline
> >>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
> >>>>>>>> returns if nonblock == true.
> >>>>>>>
> >>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
> >>>>>>> these OPs won't return -EAGAIN, then run in the current task context
> >>>>>>> directly.
> >>>>>>
> >>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
> >>>>>> it :-)
> >>>>>
> >>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
> >>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
> >>>>> ->always_iowq is a bit confusing?
> >>>>
> >>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll
> >>>> be happy to take suggestions on what would make it clearer.
> >>>
> >>> Except for the naming, I am also wondering why these ->always_iowq OPs
> >>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
> >>> shouldn't improve performance by doing so because these OPs are supposed
> >>> to be slow and always slept, not like others(buffered writes, ...),
> >>> can you provide one hint about not offloading these OPs? Or is it just that
> >>> NO_OFFLOAD needs to not offload every OPs?
> >>
> >> The whole point of NO_OFFLOAD is that items that would normally be
> >> passed to io-wq are just run inline. This provides a way to reap the
> >> benefits of batched submissions and syscall reductions. Some opcodes
> >> will just never be async, and io-wq offloads are not very fast. Some of
> > 
> > Yeah, seems io-wq is much slower than inline issue, maybe it needs
> > to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.
> 
> Indeed, depending on what is being linked, you may see io-wq activity
> which is not ideal.

That is why I prefer the fused command for ublk zero copy: the buffer
registration approach suggested by Pavel and Ziyang has to link the
register-buffer OP with the actual IO OP, and it is observed that
IOPS drops to 1/2 in a 4k random io test with the registered buffer approach.

> 
> >> them can eventually be migrated to async support, if the underlying
> >> mechanics support it.
> >>
> >> You'll note that none of the ->always_iowq opcodes are pollable. If
> > 
> > True, then looks the ->always_iowq flag doesn't make a difference here
> > because your patch clears IO_URING_F_NONBLOCK for !file_can_poll(req->file).
> 
> Yep, we may be able to just get rid of it. The important bit is really
> getting rid of the forced setting of REQ_F_FORCE_ASYNC which the
> previous reverts take care of. So we can probably just drop this one,
> let me give it a spin.
> 
> > Also almost all these ->always_iowq OPs are slow and blocked, if they are
> > issued inline, the submission pipeline will be blocked.
> 
> That is true, but that's very much the tradeoff you make by using
> NO_OFFLOAD. I would expect any users of this to have two rings, one for
> just batched submissions, and one for "normal" usage. Or maybe they only
> do the batched submissions and one is fine.

I guess that NO_OFFLOAD should probably be used for most use cases,
since it avoids the slow io-wq path, at least as long as io-wq perf
isn't improved.

Also there are other issues with two rings, such as synchronization and
communication between them; a single ring should be the easiest way.

> 
> >> NO_OFFLOAD is setup, it's pointless NOT to issue them with NONBLOCK
> >> cleared, as you'd just get -EAGAIN and then need to call them again with
> >> NONBLOCK cleared from the same context.
> > 
> > My point is that these OPs are slow and slept, so inline issue won't
> > improve performance actually for them, and punting to io-wq couldn't
> > be worse too. On the other side, inline issue may hurt perf because
> > submission pipeline is blocked when issuing these OPs.
> 
> That is definitely not true, it really depends on which ops it is. For a
> lot of them, they don't generally block, but we have to be prepared for

OK, but fsync/fallocate do block.

> them blocking. This is why they are offloaded. If they don't block and

Got it.

> execute fast, then the io-wq offload is easily a 2-3x slowdown, while
> the batched submission can make it more efficient than simply doing the
> normal syscalls as you avoid quite a few syscalls.
> 
> But you should not mix and match issue of these slower ops with "normal"
> issues, you should separate them.

OK, I will keep it in mind when writing io_uring applications.


thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25 15:07                       ` Ming Lei
@ 2023-04-25 15:25                         ` Jens Axboe
  2023-04-25 15:46                           ` Pavel Begunkov
  2023-04-26  1:43                           ` Ming Lei
  2023-04-25 16:10                         ` Pavel Begunkov
  1 sibling, 2 replies; 25+ messages in thread
From: Jens Axboe @ 2023-04-25 15:25 UTC (permalink / raw)
  To: Ming Lei; +Cc: io-uring

On 4/25/23 9:07?AM, Ming Lei wrote:
> On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote:
>> On 4/25/23 8:42?AM, Ming Lei wrote:
>>> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
>>>> On 4/24/23 8:50?PM, Ming Lei wrote:
>>>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
>>>>>> On 4/24/23 8:13?PM, Ming Lei wrote:
>>>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
>>>>>>>> On 4/24/23 6:57?PM, Ming Lei wrote:
>>>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
>>>>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote:
>>>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
>>>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always
>>>>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
>>>>>>>>>>>> check in terms of whether or not we need to punt them if any of the
>>>>>>>>>>>> NO_OFFLOAD flags are set.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>>>>>>>> ---
>>>>>>>>>>>>  io_uring/io_uring.c |  2 +-
>>>>>>>>>>>>  io_uring/opdef.c    | 22 ++++++++++++++++++++--
>>>>>>>>>>>>  io_uring/opdef.h    |  2 ++
>>>>>>>>>>>>  3 files changed, 23 insertions(+), 3 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644
>>>>>>>>>>>> --- a/io_uring/io_uring.c
>>>>>>>>>>>> +++ b/io_uring/io_uring.c
>>>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>>>>>>>  		return -EBADF;
>>>>>>>>>>>>  
>>>>>>>>>>>>  	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
>>>>>>>>>>>> -	    (!req->file || !file_can_poll(req->file)))
>>>>>>>>>>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
>>>>>>>>>>>>  		issue_flags &= ~IO_URING_F_NONBLOCK;
>>>>>>>>>>>
>>>>>>>>>>> I guess the check should be !def->always_iowq?
>>>>>>>>>>
>>>>>>>>>> How so? Nobody that takes pollable files should/is setting
>>>>>>>>>> ->always_iowq. If we can poll the file, we should not force inline
>>>>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
>>>>>>>>>> returns if nonblock == true.
>>>>>>>>>
>>>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
>>>>>>>>> these OPs won't return -EAGAIN, then run in the current task context
>>>>>>>>> directly.
>>>>>>>>
>>>>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
>>>>>>>> it :-)
>>>>>>>
>>>>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
>>>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
>>>>>>> ->always_iowq is a bit confusing?
>>>>>>
>>>>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll
>>>>>> be happy to take suggestions on what would make it clearer.
>>>>>
>>>>> Except for the naming, I am also wondering why these ->always_iowq OPs
>>>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
>>>>> shouldn't improve performance by doing so because these OPs are supposed
>>>>> to be slow and always slept, not like others(buffered writes, ...),
>>>>> can you provide one hint about not offloading these OPs? Or is it just that
>>>>> NO_OFFLOAD needs to not offload every OPs?
>>>>
>>>> The whole point of NO_OFFLOAD is that items that would normally be
>>>> passed to io-wq are just run inline. This provides a way to reap the
>>>> benefits of batched submissions and syscall reductions. Some opcodes
>>>> will just never be async, and io-wq offloads are not very fast. Some of
>>>
>>> Yeah, seems io-wq is much slower than inline issue, maybe it needs
>>> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.
>>
>> Indeed, depending on what is being linked, you may see io-wq activity
>> which is not ideal.
> 
> That is why I prefer to fused command for ublk zero copy, because the
> registering buffer approach suggested by Pavel and Ziyang has to link
> register buffer OP with the actual IO OP, and it is observed that
> IOPS drops to 1/2 in 4k random io test with registered buffer approach.

It'd be worth looking into if we can avoid io-wq for link execution, as
that'd be a nice win overall too. IIRC, there's no reason why it can't
be done like initial issue rather than just a lazy punt to io-wq.

That's not really related to fused command support or otherwise for
that, it'd just be a generic improvement. But it may indeed make the
linked approach viable for that too.

>>>> them can eventually be migrated to async support, if the underlying
>>>> mechanics support it.
>>>>
>>>> You'll note that none of the ->always_iowq opcodes are pollable. If
>>>
>>> True, then looks the ->always_iowq flag doesn't make a difference here
>>> because your patch clears IO_URING_F_NONBLOCK for !file_can_poll(req->file).

Actually not sure that's the case, as we have plenty of ops that are not
pollable, yet are perfectly fine for a nonblocking issue. Things like
any read/write on a regular file or block device.
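
For reference, the check from patch 4 quoted above can be modeled in
plain user-space C. The flag values and structs below are illustrative
stand-ins, not the kernel's actual definitions:

```c
#include <stdbool.h>

/* Stand-in flag values; the real ones live in the io_uring sources. */
#define IO_URING_F_NONBLOCK   (1u << 0)
#define IO_URING_F_NO_OFFLOAD (1u << 1)

struct opdef { bool always_iowq; };
struct req   { bool has_file; bool file_can_poll; };

/*
 * Mirrors the hunk in io_issue_sqe(): when NO_OFFLOAD is requested,
 * clear NONBLOCK (i.e. force blocking inline issue) unless the request
 * has a pollable file and the opcode is not marked always_iowq.
 */
static unsigned int adjust_issue_flags(unsigned int issue_flags,
				       const struct req *req,
				       const struct opdef *def)
{
	if ((issue_flags & IO_URING_F_NO_OFFLOAD) &&
	    (!req->has_file || !req->file_can_poll || def->always_iowq))
		issue_flags &= ~IO_URING_F_NONBLOCK;
	return issue_flags;
}
```

So a pollable file keeps nonblocking issue even under NO_OFFLOAD, while
anything without a file, with a non-pollable file, or with the
always_iowq opdef bit set gets issued inline and is allowed to block.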

>> Yep, we may be able to just get rid of it. The important bit is really
>> getting rid of the forced setting of REQ_F_FORCE_ASYNC which the
>> previous reverts take care of. So we can probably just drop this one,
>> let me give it a spin.
>>
>>> Also almost all these ->always_iowq OPs are slow and blocked, if they are
>>> issued inline, the submission pipeline will be blocked.
>>
>> That is true, but that's very much the tradeoff you make by using
>> NO_OFFLOAD. I would expect any users of this to have two rings, one for
>> just batched submissions, and one for "normal" usage. Or maybe they only
>> do the batched submissions and one is fine.
> 
> I guess that NO_OFFLOAD probably should be used for most of usecase,
> cause it does avoid slow io-wq if io-wq perf won't be improved.
>
> Also there is other issue for two rings, such as sync/communication
> between two rings, and single ring should be the easiest way.

I think some use cases may indeed just use that and be fine with it,
also because it's probably not uncommon to bundle submissions and hence
not really mix and match at issue time. But this is a vastly different
use case from the fast IO cases, for storage and networking, though
those will bypass this anyway as they can do nonblocking issue just fine.

>>>> NO_OFFLOAD is setup, it's pointless NOT to issue them with NONBLOCK
>>>> cleared, as you'd just get -EAGAIN and then need to call them again with
>>>> NONBLOCK cleared from the same context.
>>>
>>> My point is that these OPs are slow and slept, so inline issue won't
>>> improve performance actually for them, and punting to io-wq couldn't
>>> be worse too. On the other side, inline issue may hurt perf because
>>> submission pipeline is blocked when issuing these OPs.
>>
>> That is definitely not true, it really depends on which ops it is. For a
>> lot of them, they don't generally block, but we have to be prepared for
> 
> OK, but fsync/fallocate does block.

They do, but statx, fadvise, madvise, rename, shutdown, etc (basically
all the rest of them) do not for a lot of cases.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25 14:42                   ` Ming Lei
  2023-04-25 14:50                     ` Jens Axboe
@ 2023-04-25 15:28                     ` Pavel Begunkov
  2023-04-30 13:34                       ` Hao Xu
  1 sibling, 1 reply; 25+ messages in thread
From: Pavel Begunkov @ 2023-04-25 15:28 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe; +Cc: io-uring, Hao Xu

On 4/25/23 15:42, Ming Lei wrote:
> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
>> On 4/24/23 8:50?PM, Ming Lei wrote:
>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
>>>> On 4/24/23 8:13?PM, Ming Lei wrote:
>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
>>>>>> On 4/24/23 6:57?PM, Ming Lei wrote:
>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
>>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote:
>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always
>>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
>>>>>>>>>> check in terms of whether or not we need to punt them if any of the
>>>>>>>>>> NO_OFFLOAD flags are set.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>>   io_uring/io_uring.c |  2 +-
>>>>>>>>>>   io_uring/opdef.c    | 22 ++++++++++++++++++++--
>>>>>>>>>>   io_uring/opdef.h    |  2 ++
>>>>>>>>>>   3 files changed, 23 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644
>>>>>>>>>> --- a/io_uring/io_uring.c
>>>>>>>>>> +++ b/io_uring/io_uring.c
>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>>>>>   		return -EBADF;
>>>>>>>>>>   
>>>>>>>>>>   	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
>>>>>>>>>> -	    (!req->file || !file_can_poll(req->file)))
>>>>>>>>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
>>>>>>>>>>   		issue_flags &= ~IO_URING_F_NONBLOCK;
>>>>>>>>>
>>>>>>>>> I guess the check should be !def->always_iowq?
>>>>>>>>
>>>>>>>> How so? Nobody that takes pollable files should/is setting
>>>>>>>> ->always_iowq. If we can poll the file, we should not force inline
>>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
>>>>>>>> returns if nonblock == true.
>>>>>>>
>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
>>>>>>> these OPs won't return -EAGAIN, then run in the current task context
>>>>>>> directly.
>>>>>>
>>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
>>>>>> it :-)
>>>>>
>>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
>>>>> ->always_iowq is a bit confusing?
>>>>
>>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll
>>>> be happy to take suggestions on what would make it clearer.
>>>
>>> Except for the naming, I am also wondering why these ->always_iowq OPs
>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
>>> shouldn't improve performance by doing so because these OPs are supposed
>>> to be slow and always slept, not like others(buffered writes, ...),
>>> can you provide one hint about not offloading these OPs? Or is it just that
>>> NO_OFFLOAD needs to not offload every OPs?
>>
>> The whole point of NO_OFFLOAD is that items that would normally be
>> passed to io-wq are just run inline. This provides a way to reap the
>> benefits of batched submissions and syscall reductions. Some opcodes
>> will just never be async, and io-wq offloads are not very fast. Some of
> 
> Yeah, seems io-wq is much slower than inline issue, maybe it needs
> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.

There were attempts like this one from Hao (CC'ed)

https://lore.kernel.org/io-uring/[email protected]/t/

Not sure why it got stalled, but maybe Hao would be willing
to pick it up again.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25 15:25                         ` Jens Axboe
@ 2023-04-25 15:46                           ` Pavel Begunkov
  2023-04-26  3:25                             ` Ming Lei
  2023-04-26  1:43                           ` Ming Lei
  1 sibling, 1 reply; 25+ messages in thread
From: Pavel Begunkov @ 2023-04-25 15:46 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei; +Cc: io-uring

On 4/25/23 16:25, Jens Axboe wrote:
> On 4/25/23 9:07?AM, Ming Lei wrote:
>> On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote:
>>> On 4/25/23 8:42?AM, Ming Lei wrote:
>>>> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
>>>>> On 4/24/23 8:50?PM, Ming Lei wrote:
>>>>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
>>>>>>> On 4/24/23 8:13?PM, Ming Lei wrote:
>>>>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
>>>>>>>>> On 4/24/23 6:57?PM, Ming Lei wrote:
>>>>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
>>>>>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote:
>>>>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
>>>>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always
>>>>>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
>>>>>>>>>>>>> check in terms of whether or not we need to punt them if any of the
>>>>>>>>>>>>> NO_OFFLOAD flags are set.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>   io_uring/io_uring.c |  2 +-
>>>>>>>>>>>>>   io_uring/opdef.c    | 22 ++++++++++++++++++++--
>>>>>>>>>>>>>   io_uring/opdef.h    |  2 ++
>>>>>>>>>>>>>   3 files changed, 23 insertions(+), 3 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644
>>>>>>>>>>>>> --- a/io_uring/io_uring.c
>>>>>>>>>>>>> +++ b/io_uring/io_uring.c
>>>>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>>>>>>>>   		return -EBADF;
>>>>>>>>>>>>>   
>>>>>>>>>>>>>   	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
>>>>>>>>>>>>> -	    (!req->file || !file_can_poll(req->file)))
>>>>>>>>>>>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
>>>>>>>>>>>>>   		issue_flags &= ~IO_URING_F_NONBLOCK;
>>>>>>>>>>>>
>>>>>>>>>>>> I guess the check should be !def->always_iowq?
>>>>>>>>>>>
>>>>>>>>>>> How so? Nobody that takes pollable files should/is setting
>>>>>>>>>>> ->always_iowq. If we can poll the file, we should not force inline
>>>>>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
>>>>>>>>>>> returns if nonblock == true.
>>>>>>>>>>
>>>>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
>>>>>>>>>> these OPs won't return -EAGAIN, then run in the current task context
>>>>>>>>>> directly.
>>>>>>>>>
>>>>>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
>>>>>>>>> it :-)
>>>>>>>>
>>>>>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
>>>>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
>>>>>>>> ->always_iowq is a bit confusing?
>>>>>>>
>>>>>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll
>>>>>>> be happy to take suggestions on what would make it clearer.
>>>>>>
>>>>>> Except for the naming, I am also wondering why these ->always_iowq OPs
>>>>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
>>>>>> shouldn't improve performance by doing so because these OPs are supposed
>>>>>> to be slow and always slept, not like others(buffered writes, ...),
>>>>>> can you provide one hint about not offloading these OPs? Or is it just that
>>>>>> NO_OFFLOAD needs to not offload every OPs?
>>>>>
>>>>> The whole point of NO_OFFLOAD is that items that would normally be
>>>>> passed to io-wq are just run inline. This provides a way to reap the
>>>>> benefits of batched submissions and syscall reductions. Some opcodes
>>>>> will just never be async, and io-wq offloads are not very fast. Some of
>>>>
>>>> Yeah, seems io-wq is much slower than inline issue, maybe it needs
>>>> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.
>>>
>>> Indeed, depending on what is being linked, you may see io-wq activity
>>> which is not ideal.
>>
>> That is why I prefer to fused command for ublk zero copy, because the
>> registering buffer approach suggested by Pavel and Ziyang has to link
>> register buffer OP with the actual IO OP, and it is observed that
>> IOPS drops to 1/2 in 4k random io test with registered buffer approach.
> 
> It'd be worth looking into if we can avoid io-wq for link execution, as
> that'd be a nice win overall too. IIRC, there's no reason why it can't
> be done like initial issue rather than just a lazy punt to io-wq.

I might've missed a part of the discussion, but links are _usually_
executed by task_work, e.g.

io_submit_flush_completions() -> io_queue_next() -> io_req_task_queue()

There is one optimisation where if we're already in io-wq, it'll try
to serve the next linked request by the same io-wq worker with no
overhead on requeueing, but otherwise it'll only get there if
the request can't be executed nowait / async as per usual rules.

The tw execution part might be further optimised, it can be executed
almost in place instead of queueing a tw. It saved quite a lot of CPU
when I tried it with BPF requests.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25 15:07                       ` Ming Lei
  2023-04-25 15:25                         ` Jens Axboe
@ 2023-04-25 16:10                         ` Pavel Begunkov
  2023-04-26  3:37                           ` Ming Lei
  1 sibling, 1 reply; 25+ messages in thread
From: Pavel Begunkov @ 2023-04-25 16:10 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe; +Cc: io-uring

On 4/25/23 16:07, Ming Lei wrote:
> On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote:
>> On 4/25/23 8:42?AM, Ming Lei wrote:
>>> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
>>>> On 4/24/23 8:50?PM, Ming Lei wrote:
>>>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
>>>>>> On 4/24/23 8:13?PM, Ming Lei wrote:
>>>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
>>>>>>>> On 4/24/23 6:57?PM, Ming Lei wrote:
>>>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
>>>>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote:
>>>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
>>>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always
>>>>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
>>>>>>>>>>>> check in terms of whether or not we need to punt them if any of the
>>>>>>>>>>>> NO_OFFLOAD flags are set.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>>>>>>>> ---
>>>>>>>>>>>>   io_uring/io_uring.c |  2 +-
>>>>>>>>>>>>   io_uring/opdef.c    | 22 ++++++++++++++++++++--
>>>>>>>>>>>>   io_uring/opdef.h    |  2 ++
>>>>>>>>>>>>   3 files changed, 23 insertions(+), 3 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644
>>>>>>>>>>>> --- a/io_uring/io_uring.c
>>>>>>>>>>>> +++ b/io_uring/io_uring.c
>>>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>>>>>>>   		return -EBADF;
>>>>>>>>>>>>   
>>>>>>>>>>>>   	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
>>>>>>>>>>>> -	    (!req->file || !file_can_poll(req->file)))
>>>>>>>>>>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
>>>>>>>>>>>>   		issue_flags &= ~IO_URING_F_NONBLOCK;
>>>>>>>>>>>
>>>>>>>>>>> I guess the check should be !def->always_iowq?
>>>>>>>>>>
>>>>>>>>>> How so? Nobody that takes pollable files should/is setting
>>>>>>>>>> ->always_iowq. If we can poll the file, we should not force inline
>>>>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
>>>>>>>>>> returns if nonblock == true.
>>>>>>>>>
>>>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
>>>>>>>>> these OPs won't return -EAGAIN, then run in the current task context
>>>>>>>>> directly.
>>>>>>>>
>>>>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
>>>>>>>> it :-)
>>>>>>>
>>>>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
>>>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
>>>>>>> ->always_iowq is a bit confusing?
>>>>>>
>>>>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll
>>>>>> be happy to take suggestions on what would make it clearer.
>>>>>
>>>>> Except for the naming, I am also wondering why these ->always_iowq OPs
>>>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
>>>>> shouldn't improve performance by doing so because these OPs are supposed
>>>>> to be slow and always slept, not like others(buffered writes, ...),
>>>>> can you provide one hint about not offloading these OPs? Or is it just that
>>>>> NO_OFFLOAD needs to not offload every OPs?
>>>>
>>>> The whole point of NO_OFFLOAD is that items that would normally be
>>>> passed to io-wq are just run inline. This provides a way to reap the
>>>> benefits of batched submissions and syscall reductions. Some opcodes
>>>> will just never be async, and io-wq offloads are not very fast. Some of
>>>
>>> Yeah, seems io-wq is much slower than inline issue, maybe it needs
>>> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.
>>
>> Indeed, depending on what is being linked, you may see io-wq activity
>> which is not ideal.
> 
> That is why I prefer to fused command for ublk zero copy, because the
> registering buffer approach suggested by Pavel and Ziyang has to link
> register buffer OP with the actual IO OP, and it is observed that
> IOPS drops to 1/2 in 4k random io test with registered buffer approach.

What's good about it is that you can use linked requests with it
but you _don't have to_.

Curiously, I just recently compared submitting 8 two-request links
(16 reqs in total) vs submit(8)+submit(8), all in a loop. The latter
was faster. It wasn't a clean experiment, but it shows that links are
not super fast and it would be nice to make them better.

For the register buf approach, I tried it out and it looked good to me.
It outperforms splice requests (with a hack that removes forced
iowq execution) by 5-10% in a synthetic benchmark, and works better than
splice(2) for QD>=2. Let me send it out, perhaps today, so we can
figure out how it compares against ublk/fused and see what the margin is.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25 15:25                         ` Jens Axboe
  2023-04-25 15:46                           ` Pavel Begunkov
@ 2023-04-26  1:43                           ` Ming Lei
  1 sibling, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-04-26  1:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, ming.lei

On Tue, Apr 25, 2023 at 09:25:35AM -0600, Jens Axboe wrote:
> On 4/25/23 9:07?AM, Ming Lei wrote:
> > On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote:
> >> On 4/25/23 8:42?AM, Ming Lei wrote:
> >>> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
> >>>> On 4/24/23 8:50?PM, Ming Lei wrote:
> >>>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
> >>>>>> On 4/24/23 8:13?PM, Ming Lei wrote:
> >>>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
> >>>>>>>> On 4/24/23 6:57?PM, Ming Lei wrote:
> >>>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
> >>>>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote:
> >>>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
> >>>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always
> >>>>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
> >>>>>>>>>>>> check in terms of whether or not we need to punt them if any of the
> >>>>>>>>>>>> NO_OFFLOAD flags are set.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>>  io_uring/io_uring.c |  2 +-
> >>>>>>>>>>>>  io_uring/opdef.c    | 22 ++++++++++++++++++++--
> >>>>>>>>>>>>  io_uring/opdef.h    |  2 ++
> >>>>>>>>>>>>  3 files changed, 23 insertions(+), 3 deletions(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> >>>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644
> >>>>>>>>>>>> --- a/io_uring/io_uring.c
> >>>>>>>>>>>> +++ b/io_uring/io_uring.c
> >>>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> >>>>>>>>>>>>  		return -EBADF;
> >>>>>>>>>>>>  
> >>>>>>>>>>>>  	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
> >>>>>>>>>>>> -	    (!req->file || !file_can_poll(req->file)))
> >>>>>>>>>>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
> >>>>>>>>>>>>  		issue_flags &= ~IO_URING_F_NONBLOCK;
> >>>>>>>>>>>
> >>>>>>>>>>> I guess the check should be !def->always_iowq?
> >>>>>>>>>>
> >>>>>>>>>> How so? Nobody that takes pollable files should/is setting
> >>>>>>>>>> ->always_iowq. If we can poll the file, we should not force inline
> >>>>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
> >>>>>>>>>> returns if nonblock == true.
> >>>>>>>>>
> >>>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
> >>>>>>>>> these OPs won't return -EAGAIN, then run in the current task context
> >>>>>>>>> directly.
> >>>>>>>>
> >>>>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
> >>>>>>>> it :-)
> >>>>>>>
> >>>>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
> >>>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
> >>>>>>> ->always_iowq is a bit confusing?
> >>>>>>
> >>>>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll
> >>>>>> be happy to take suggestions on what would make it clearer.
> >>>>>
> >>>>> Except for the naming, I am also wondering why these ->always_iowq OPs
> >>>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
> >>>>> shouldn't improve performance by doing so because these OPs are supposed
> >>>>> to be slow and always slept, not like others(buffered writes, ...),
> >>>>> can you provide one hint about not offloading these OPs? Or is it just that
> >>>>> NO_OFFLOAD needs to not offload every OPs?
> >>>>
> >>>> The whole point of NO_OFFLOAD is that items that would normally be
> >>>> passed to io-wq are just run inline. This provides a way to reap the
> >>>> benefits of batched submissions and syscall reductions. Some opcodes
> >>>> will just never be async, and io-wq offloads are not very fast. Some of
> >>>
> >>> Yeah, seems io-wq is much slower than inline issue, maybe it needs
> >>> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.
> >>
> >> Indeed, depending on what is being linked, you may see io-wq activity
> >> which is not ideal.
> > 
> > That is why I prefer to fused command for ublk zero copy, because the
> > registering buffer approach suggested by Pavel and Ziyang has to link
> > register buffer OP with the actual IO OP, and it is observed that
> > IOPS drops to 1/2 in 4k random io test with registered buffer approach.
> 
> It'd be worth looking into if we can avoid io-wq for link execution, as
> that'd be a nice win overall too. IIRC, there's no reason why it can't
> be done like initial issue rather than just a lazy punt to io-wq.
> 
> That's not really related to fused command support or otherwise for
> that, it'd just be a generic improvement. But it may indeed make the
> linked approach viable for that too.

Performance degradation with io-wq is just one reason; another is that
the current link model doesn't support 1:N dependencies. For example, if
one buffer is registered and the following N OPs all depend on that
registered buffer, all N OPs (requests) still have to run one by one.
That is one limitation of the current io_uring link model.

Fused command actually performs pretty well because:

1) no io-wq is involved

2) it allows the following N OPs to be issued concurrently

3) it avoids registering the buffer in per-context data (we can ignore
this part for now, since the uring_lock should help us avoid the
contention)

> 
> >>>> them can eventually be migrated to async support, if the underlying
> >>>> mechanics support it.
> >>>>
> >>>> You'll note that none of the ->always_iowq opcodes are pollable. If
> >>>
> >>> True, then looks the ->always_iowq flag doesn't make a difference here
> >>> because your patch clears IO_URING_F_NONBLOCK for !file_can_poll(req->file).
> 
> Actually not sure that's the case, as we have plenty of ops that are not
> pollable, yet are perfectly fine for a nonblocking issue. Things like
> any read/write on a regular file or block device.

But you mentioned that "none of the ->always_iowq opcodes are pollable".
If that isn't the case, then it is fine to add ->always_iowq.

> 
> >> Yep, we may be able to just get rid of it. The important bit is really
> >> getting rid of the forced setting of REQ_F_FORCE_ASYNC which the
> >> previous reverts take care of. So we can probably just drop this one,
> >> let me give it a spin.
> >>
> >>> Also almost all these ->always_iowq OPs are slow and blocked, if they are
> >>> issued inline, the submission pipeline will be blocked.
> >>
> >> That is true, but that's very much the tradeoff you make by using
> >> NO_OFFLOAD. I would expect any users of this to have two rings, one for
> >> just batched submissions, and one for "normal" usage. Or maybe they only
> >> do the batched submissions and one is fine.
> > 
> > I guess that NO_OFFLOAD probably should be used for most of usecase,
> > cause it does avoid slow io-wq if io-wq perf won't be improved.
> >
> > Also there is other issue for two rings, such as sync/communication
> > between two rings, and single ring should be the easiest way.
> 
> I think some use cases may indeed just use that and be fine with it,
> also because it is probably not uncommon to bundle the issues and hence
> not really mix and match for issue. But this is a vastly different use
> case than fast IO cases, for storage and networking. Though those will
> bypass that anyway as they can do nonblocking issue just fine.
> 
> >>>> NO_OFFLOAD is setup, it's pointless NOT to issue them with NONBLOCK
> >>>> cleared, as you'd just get -EAGAIN and then need to call them again with
> >>>> NONBLOCK cleared from the same context.
> >>>
> >>> My point is that these OPs are slow and slept, so inline issue won't
> >>> improve performance actually for them, and punting to io-wq couldn't
> >>> be worse too. On the other side, inline issue may hurt perf because
> >>> submission pipeline is blocked when issuing these OPs.
> >>
> >> That is definitely not true, it really depends on which ops it is. For a
> >> lot of them, they don't generally block, but we have to be prepared for
> > 
> > OK, but fsync/fallocate does block.
> 
> They do, but statx, fadvise, madvise, rename, shutdown, etc (basically
> all the rest of them) do not for a lot of cases.

OK, but fsync/fallocate is often mixed with normal IOs.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25 15:46                           ` Pavel Begunkov
@ 2023-04-26  3:25                             ` Ming Lei
  2023-04-26  4:28                               ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2023-04-26  3:25 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, ming.lei

On Tue, Apr 25, 2023 at 04:46:03PM +0100, Pavel Begunkov wrote:
> On 4/25/23 16:25, Jens Axboe wrote:
> > On 4/25/23 9:07?AM, Ming Lei wrote:
> > > On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote:
> > > > On 4/25/23 8:42?AM, Ming Lei wrote:
> > > > > On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
> > > > > > On 4/24/23 8:50?PM, Ming Lei wrote:
> > > > > > > On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
> > > > > > > > On 4/24/23 8:13?PM, Ming Lei wrote:
> > > > > > > > > On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
> > > > > > > > > > On 4/24/23 6:57?PM, Ming Lei wrote:
> > > > > > > > > > > On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
> > > > > > > > > > > > On 4/24/23 1:30?AM, Ming Lei wrote:
> > > > > > > > > > > > > On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
> > > > > > > > > > > > > > Add an opdef bit for them, and set it for the opcodes where we always
> > > > > > > > > > > > > > need io-wq punt. With that done, exclude them from the file_can_poll()
> > > > > > > > > > > > > > check in terms of whether or not we need to punt them if any of the
> > > > > > > > > > > > > > NO_OFFLOAD flags are set.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Signed-off-by: Jens Axboe <[email protected]>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >   io_uring/io_uring.c |  2 +-
> > > > > > > > > > > > > >   io_uring/opdef.c    | 22 ++++++++++++++++++++--
> > > > > > > > > > > > > >   io_uring/opdef.h    |  2 ++
> > > > > > > > > > > > > >   3 files changed, 23 insertions(+), 3 deletions(-)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> > > > > > > > > > > > > > index fee3e461e149..420cfd35ebc6 100644
> > > > > > > > > > > > > > --- a/io_uring/io_uring.c
> > > > > > > > > > > > > > +++ b/io_uring/io_uring.c
> > > > > > > > > > > > > > @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> > > > > > > > > > > > > >   		return -EBADF;
> > > > > > > > > > > > > >   	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
> > > > > > > > > > > > > > -	    (!req->file || !file_can_poll(req->file)))
> > > > > > > > > > > > > > +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
> > > > > > > > > > > > > >   		issue_flags &= ~IO_URING_F_NONBLOCK;
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I guess the check should be !def->always_iowq?
> > > > > > > > > > > > 
> > > > > > > > > > > > How so? Nobody that takes pollable files should/is setting
> > > > > > > > > > > > ->always_iowq. If we can poll the file, we should not force inline
> > > > > > > > > > > > submission. Basically the ones setting ->always_iowq always do -EAGAIN
> > > > > > > > > > > > returns if nonblock == true.
> > > > > > > > > > > 
> > > > > > > > > > > I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
> > > > > > > > > > > these OPs won't return -EAGAIN, then run in the current task context
> > > > > > > > > > > directly.
> > > > > > > > > > 
> > > > > > > > > > Right, if IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
> > > > > > > > > > it :-)
> > > > > > > > > 
> > > > > > > > > But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
> > > > > > > > > not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
> > > > > > > > > ->always_iowq is a bit confusing?
> > > > > > > > 
> > > > > > > > Yeah naming isn't that great, I can see how that's bit confusing. I'll
> > > > > > > > be happy to take suggestions on what would make it clearer.
> > > > > > > 
> > > > > > > Except for the naming, I am also wondering why these ->always_iowq OPs
> > > > > > > aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
> > > > > > > shouldn't improve performance by doing so because these OPs are supposed
> > > > > > > to be slow and always slept, not like others(buffered writes, ...),
> > > > > > > can you provide one hint about not offloading these OPs? Or is it just that
> > > > > > > NO_OFFLOAD needs to not offload every OPs?
> > > > > > 
> > > > > > The whole point of NO_OFFLOAD is that items that would normally be
> > > > > > passed to io-wq are just run inline. This provides a way to reap the
> > > > > > benefits of batched submissions and syscall reductions. Some opcodes
> > > > > > will just never be async, and io-wq offloads are not very fast. Some of
> > > > > 
> > > > > Yeah, seems io-wq is much slower than inline issue, maybe it needs
> > > > > to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.
> > > > 
> > > > Indeed, depending on what is being linked, you may see io-wq activity
> > > > which is not ideal.
> > > 
> > > That is why I prefer to fused command for ublk zero copy, because the
> > > registering buffer approach suggested by Pavel and Ziyang has to link
> > > register buffer OP with the actual IO OP, and it is observed that
> > > IOPS drops to 1/2 in 4k random io test with registered buffer approach.
> > 
> > It'd be worth looking into if we can avoid io-wq for link execution, as
> > that'd be a nice win overall too. IIRC, there's no reason why it can't
> > be done like initial issue rather than just a lazy punt to io-wq.
> 
> I might've missed a part of the discussion, but links are _usually_
> executed by task_work, e.g.
> 
> io_submit_flush_completions() -> io_queue_next() -> io_req_task_queue()

Good catch, I just figured out that /dev/ublkcN & the backing file
aren't opened with O_NONBLOCK.

But -EAGAIN is still returned from io_write() even though the regular
file is opened with O_DIRECT, at least on btrfs & xfs, so io-wq is still
scheduled. I haven't looked into the exact reason yet, and I don't see
such an issue for block devices. Anyway, it isn't related to io-wq.

However, in the case that multiple OPs depend on this registered buffer,
the other OPs (from the 2nd to the last one) still run one by one and
can't be submitted concurrently, while fused command does not have such
a limit.

> 
> There is one optimisation where if we're already in io-wq, it'll try
> to serve the next linked request by the same io-wq worker with no
> overhead on requeueing, but otherwise it'll only get there if
> the request can't be executed nowait / async as per usual rules.
> 
> The tw execution part might be further optimised, it can be executed
> almost in place instead of queueing a tw. It saved quite a lot of CPU
> when I tried it with BPF requests.

OK, once it is done, I am happy to test it.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25 16:10                         ` Pavel Begunkov
@ 2023-04-26  3:37                           ` Ming Lei
  0 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-04-26  3:37 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, ming.lei

On Tue, Apr 25, 2023 at 05:10:41PM +0100, Pavel Begunkov wrote:
> On 4/25/23 16:07, Ming Lei wrote:
> > On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote:
> > > On 4/25/23 8:42?AM, Ming Lei wrote:
> > > > On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
> > > > > On 4/24/23 8:50?PM, Ming Lei wrote:
> > > > > > On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
> > > > > > > On 4/24/23 8:13?PM, Ming Lei wrote:
> > > > > > > > On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
> > > > > > > > > On 4/24/23 6:57?PM, Ming Lei wrote:
> > > > > > > > > > On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
> > > > > > > > > > > On 4/24/23 1:30?AM, Ming Lei wrote:
> > > > > > > > > > > > On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
> > > > > > > > > > > > > Add an opdef bit for them, and set it for the opcodes where we always
> > > > > > > > > > > > > need io-wq punt. With that done, exclude them from the file_can_poll()
> > > > > > > > > > > > > check in terms of whether or not we need to punt them if any of the
> > > > > > > > > > > > > NO_OFFLOAD flags are set.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Signed-off-by: Jens Axboe <[email protected]>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >   io_uring/io_uring.c |  2 +-
> > > > > > > > > > > > >   io_uring/opdef.c    | 22 ++++++++++++++++++++--
> > > > > > > > > > > > >   io_uring/opdef.h    |  2 ++
> > > > > > > > > > > > >   3 files changed, 23 insertions(+), 3 deletions(-)
> > > > > > > > > > > > > 
> > > > > > > > > > > > > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> > > > > > > > > > > > > index fee3e461e149..420cfd35ebc6 100644
> > > > > > > > > > > > > --- a/io_uring/io_uring.c
> > > > > > > > > > > > > +++ b/io_uring/io_uring.c
> > > > > > > > > > > > > @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> > > > > > > > > > > > >   		return -EBADF;
> > > > > > > > > > > > >   	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
> > > > > > > > > > > > > -	    (!req->file || !file_can_poll(req->file)))
> > > > > > > > > > > > > +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
> > > > > > > > > > > > >   		issue_flags &= ~IO_URING_F_NONBLOCK;
> > > > > > > > > > > > 
> > > > > > > > > > > > I guess the check should be !def->always_iowq?
> > > > > > > > > > > 
> > > > > > > > > > > How so? Nobody that takes pollable files should/is setting
> > > > > > > > > > > ->always_iowq. If we can poll the file, we should not force inline
> > > > > > > > > > > submission. Basically the ones setting ->always_iowq always do -EAGAIN
> > > > > > > > > > > returns if nonblock == true.
> > > > > > > > > > 
> > > > > > > > > > I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
> > > > > > > > > > these OPs won't return -EAGAIN, then run in the current task context
> > > > > > > > > > directly.
> > > > > > > > > 
> > > > > > > > > > > Right, if IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
> > > > > > > > > it :-)
> > > > > > > > 
> > > > > > > > But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
> > > > > > > > not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
> > > > > > > > ->always_iowq is a bit confusing?
> > > > > > > 
> > > > > > > Yeah naming isn't that great, I can see how that's bit confusing. I'll
> > > > > > > be happy to take suggestions on what would make it clearer.
> > > > > > 
> > > > > > Except for the naming, I am also wondering why these ->always_iowq OPs
> > > > > > aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
> > > > > > shouldn't improve performance by doing so because these OPs are supposed
> > > > > > to be slow and always slept, not like others(buffered writes, ...),
> > > > > > can you provide one hint about not offloading these OPs? Or is it just that
> > > > > > NO_OFFLOAD needs to not offload every OPs?
> > > > > 
> > > > > The whole point of NO_OFFLOAD is that items that would normally be
> > > > > passed to io-wq are just run inline. This provides a way to reap the
> > > > > benefits of batched submissions and syscall reductions. Some opcodes
> > > > > will just never be async, and io-wq offloads are not very fast. Some of
> > > > 
> > > > Yeah, seems io-wq is much slower than inline issue, maybe it needs
> > > > to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.
> > > 
> > > Indeed, depending on what is being linked, you may see io-wq activity
> > > which is not ideal.
> > 
> > That is why I prefer to fused command for ublk zero copy, because the
> > registering buffer approach suggested by Pavel and Ziyang has to link
> > register buffer OP with the actual IO OP, and it is observed that
> > IOPS drops to 1/2 in 4k random io test with registered buffer approach.
> 
> What's good about it is that you can use linked requests with it
> but you _don't have to_.

Can you explain it a bit?

The register-buffer OP has to be done before the buffer is used, since
the buffer registration is done by calling ->uring_cmd().

> 
> Curiously, I just recently compared submitting 8 two-request links
> (16 reqs in total) vs submit(8)+submit(8), all that in a loop.
> The latter was faster. It wasn't a clean experiment, but shows
> that links are not super fast and would be nice to get them better.
> 
> For the register buf approach, I tried it out, looked good to me.
> It outperforms splice requests (with a hack that removes force
> iowq execution) by 5-10% with synthetic benchmark. Works better than
> splice(2) for QD>=2. Let me send it out, perhaps today, so we can
> figure out how it compares against ublk/fused and see the margin is.

Cool!

I will take a look and see if it can be used for ublk zero copy.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-26  3:25                             ` Ming Lei
@ 2023-04-26  4:28                               ` Ming Lei
  0 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-04-26  4:28 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, ming.lei

On Wed, Apr 26, 2023 at 11:25:15AM +0800, Ming Lei wrote:
> On Tue, Apr 25, 2023 at 04:46:03PM +0100, Pavel Begunkov wrote:
> > On 4/25/23 16:25, Jens Axboe wrote:
> > > On 4/25/23 9:07?AM, Ming Lei wrote:
> > > > On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote:
> > > > > On 4/25/23 8:42?AM, Ming Lei wrote:
> > > > > > On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
> > > > > > > On 4/24/23 8:50?PM, Ming Lei wrote:
> > > > > > > > On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
> > > > > > > > > On 4/24/23 8:13?PM, Ming Lei wrote:
> > > > > > > > > > On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
> > > > > > > > > > > On 4/24/23 6:57?PM, Ming Lei wrote:
> > > > > > > > > > > > On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
> > > > > > > > > > > > > On 4/24/23 1:30?AM, Ming Lei wrote:
> > > > > > > > > > > > > > On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
> > > > > > > > > > > > > > > Add an opdef bit for them, and set it for the opcodes where we always
> > > > > > > > > > > > > > > need io-wq punt. With that done, exclude them from the file_can_poll()
> > > > > > > > > > > > > > > check in terms of whether or not we need to punt them if any of the
> > > > > > > > > > > > > > > NO_OFFLOAD flags are set.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Signed-off-by: Jens Axboe <[email protected]>
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >   io_uring/io_uring.c |  2 +-
> > > > > > > > > > > > > > >   io_uring/opdef.c    | 22 ++++++++++++++++++++--
> > > > > > > > > > > > > > >   io_uring/opdef.h    |  2 ++
> > > > > > > > > > > > > > >   3 files changed, 23 insertions(+), 3 deletions(-)
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> > > > > > > > > > > > > > > index fee3e461e149..420cfd35ebc6 100644
> > > > > > > > > > > > > > > --- a/io_uring/io_uring.c
> > > > > > > > > > > > > > > +++ b/io_uring/io_uring.c
> > > > > > > > > > > > > > > @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> > > > > > > > > > > > > > >   		return -EBADF;
> > > > > > > > > > > > > > >   	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
> > > > > > > > > > > > > > > -	    (!req->file || !file_can_poll(req->file)))
> > > > > > > > > > > > > > > +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
> > > > > > > > > > > > > > >   		issue_flags &= ~IO_URING_F_NONBLOCK;
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I guess the check should be !def->always_iowq?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > How so? Nobody that takes pollable files should/is setting
> > > > > > > > > > > > > ->always_iowq. If we can poll the file, we should not force inline
> > > > > > > > > > > > > submission. Basically the ones setting ->always_iowq always do -EAGAIN
> > > > > > > > > > > > > returns if nonblock == true.
> > > > > > > > > > > > 
> > > > > > > > > > > > I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
> > > > > > > > > > > > these OPs won't return -EAGAIN, then run in the current task context
> > > > > > > > > > > > directly.
> > > > > > > > > > > 
> > > > > > > > > > > > Right, if IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
> > > > > > > > > > > it :-)
> > > > > > > > > > 
> > > > > > > > > > But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
> > > > > > > > > > not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
> > > > > > > > > > ->always_iowq is a bit confusing?
> > > > > > > > > 
> > > > > > > > > Yeah naming isn't that great, I can see how that's bit confusing. I'll
> > > > > > > > > be happy to take suggestions on what would make it clearer.
> > > > > > > > 
> > > > > > > > Except for the naming, I am also wondering why these ->always_iowq OPs
> > > > > > > > aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
> > > > > > > > shouldn't improve performance by doing so because these OPs are supposed
> > > > > > > > to be slow and always slept, not like others(buffered writes, ...),
> > > > > > > > can you provide one hint about not offloading these OPs? Or is it just that
> > > > > > > > NO_OFFLOAD needs to not offload every OPs?
> > > > > > > 
> > > > > > > The whole point of NO_OFFLOAD is that items that would normally be
> > > > > > > passed to io-wq are just run inline. This provides a way to reap the
> > > > > > > benefits of batched submissions and syscall reductions. Some opcodes
> > > > > > > will just never be async, and io-wq offloads are not very fast. Some of
> > > > > > 
> > > > > > Yeah, seems io-wq is much slower than inline issue, maybe it needs
> > > > > > to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.
> > > > > 
> > > > > Indeed, depending on what is being linked, you may see io-wq activity
> > > > > which is not ideal.
> > > > 
> > > > That is why I prefer to fused command for ublk zero copy, because the
> > > > registering buffer approach suggested by Pavel and Ziyang has to link
> > > > register buffer OP with the actual IO OP, and it is observed that
> > > > IOPS drops to 1/2 in 4k random io test with registered buffer approach.
> > > 
> > > It'd be worth looking into if we can avoid io-wq for link execution, as
> > > that'd be a nice win overall too. IIRC, there's no reason why it can't
> > > be done like initial issue rather than just a lazy punt to io-wq.
> > 
> > I might've missed a part of the discussion, but links are _usually_
> > executed by task_work, e.g.
> > 
> > io_submit_flush_completions() -> io_queue_next() -> io_req_task_queue()
> 
> Good catch, I just figured out that /dev/ublkcN & the backing file
> aren't opened with O_NONBLOCK.
> 
> But -EAGAIN is still returned from io_write() even though the regular
> file is opened with O_DIRECT, at least on btrfs & xfs, so io-wq is still
> scheduled. I haven't looked into the exact reason yet, and I don't see
> such an issue for block devices. Anyway, it isn't related to io-wq.

It is because -EAGAIN is returned from call_write_iter() in case of IOCB_NOWAIT,
so it is exactly what Jens's patchset is addressing.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
  2023-04-25 15:28                     ` Pavel Begunkov
@ 2023-04-30 13:34                       ` Hao Xu
  0 siblings, 0 replies; 25+ messages in thread
From: Hao Xu @ 2023-04-30 13:34 UTC (permalink / raw)
  To: Pavel Begunkov, Ming Lei, Jens Axboe; +Cc: io-uring


On 4/25/23 23:28, Pavel Begunkov wrote:
> On 4/25/23 15:42, Ming Lei wrote:
>> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
>>> On 4/24/23 8:50?PM, Ming Lei wrote:
>>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
>>>>> On 4/24/23 8:13?PM, Ming Lei wrote:
>>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
>>>>>>> On 4/24/23 6:57?PM, Ming Lei wrote:
>>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
>>>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote:
>>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
>>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where 
>>>>>>>>>>> we always
>>>>>>>>>>> need io-wq punt. With that done, exclude them from the 
>>>>>>>>>>> file_can_poll()
>>>>>>>>>>> check in terms of whether or not we need to punt them if any 
>>>>>>>>>>> of the
>>>>>>>>>>> NO_OFFLOAD flags are set.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>>>>>>> ---
>>>>>>>>>>>   io_uring/io_uring.c |  2 +-
>>>>>>>>>>>   io_uring/opdef.c    | 22 ++++++++++++++++++++--
>>>>>>>>>>>   io_uring/opdef.h    |  2 ++
>>>>>>>>>>>   3 files changed, 23 insertions(+), 3 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644
>>>>>>>>>>> --- a/io_uring/io_uring.c
>>>>>>>>>>> +++ b/io_uring/io_uring.c
>>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct 
>>>>>>>>>>> io_kiocb *req, unsigned int issue_flags)
>>>>>>>>>>>           return -EBADF;
>>>>>>>>>>>         if (issue_flags & IO_URING_F_NO_OFFLOAD &&
>>>>>>>>>>> -        (!req->file || !file_can_poll(req->file)))
>>>>>>>>>>> +        (!req->file || !file_can_poll(req->file) || 
>>>>>>>>>>> def->always_iowq))
>>>>>>>>>>>           issue_flags &= ~IO_URING_F_NONBLOCK;
>>>>>>>>>>
>>>>>>>>>> I guess the check should be !def->always_iowq?
>>>>>>>>>
>>>>>>>>> How so? Nobody that takes pollable files should/is setting
>>>>>>>>> ->always_iowq. If we can poll the file, we should not force 
>>>>>>>>> inline
>>>>>>>>> submission. Basically the ones setting ->always_iowq always do 
>>>>>>>>> -EAGAIN
>>>>>>>>> returns if nonblock == true.
>>>>>>>>
>>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and
>>>>>>>> these OPs won't return -EAGAIN, then run in the current task 
>>>>>>>> context
>>>>>>>> directly.
>>>>>>>
> >>>>>>> Right, if IO_URING_F_NO_OFFLOAD is set, which is entirely the
>>>>>>> point of
>>>>>>> it :-)
>>>>>>
>>>>>> But ->always_iowq isn't actually _always_ since 
>>>>>> fallocate/fsync/... are
>>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the 
>>>>>> naming of
>>>>>> ->always_iowq is a bit confusing?
>>>>>
>>>>> Yeah naming isn't that great, I can see how that's bit confusing. 
>>>>> I'll
>>>>> be happy to take suggestions on what would make it clearer.
>>>>
>>>> Except for the naming, I am also wondering why these ->always_iowq OPs
>>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
>>>> shouldn't improve performance by doing so because these OPs are 
>>>> supposed
>>>> to be slow and always slept, not like others(buffered writes, ...),
>>>> can you provide one hint about not offloading these OPs? Or is it 
>>>> just that
>>>> NO_OFFLOAD needs to not offload every OPs?
>>>
>>> The whole point of NO_OFFLOAD is that items that would normally be
>>> passed to io-wq are just run inline. This provides a way to reap the
>>> benefits of batched submissions and syscall reductions. Some opcodes
>>> will just never be async, and io-wq offloads are not very fast. Some of
>>
>> Yeah, seems io-wq is much slower than inline issue, maybe it needs
>> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.
>
> There were attempts like this one from Hao (CC'ed)
>
> https://lore.kernel.org/io-uring/[email protected]/t/ 
>
>
> Not sure why it got stalled, but maybe Hao would be willing
> to pick it up again.


Hi folks, I'd like to pick it up again, but I just didn't get any reply
at that time after sending several versions of it... so before I restart
that series, I'd like to ask Jens to comment on the idea of that
patchset (fixed worker).


Thanks,

Hao



^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-04-30 13:43 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-20 18:31 [PATCHSET v2 0/4] Enable NO_OFFLOAD support Jens Axboe
2023-04-20 18:31 ` [PATCH 1/4] io_uring: add support for NO_OFFLOAD Jens Axboe
2023-04-20 18:31 ` [PATCH 2/4] Revert "io_uring: always go async for unsupported fadvise flags" Jens Axboe
2023-04-20 18:31 ` [PATCH 3/4] Revert "io_uring: for requests that require async, force it" Jens Axboe
2023-04-20 18:31 ` [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt Jens Axboe
2023-04-24  7:30   ` Ming Lei
2023-04-24 15:24     ` Jens Axboe
2023-04-25  0:57       ` Ming Lei
2023-04-25  2:08         ` Jens Axboe
2023-04-25  2:13           ` Ming Lei
2023-04-25  2:18             ` Jens Axboe
2023-04-25  2:50               ` Ming Lei
2023-04-25 13:31                 ` Jens Axboe
2023-04-25 14:42                   ` Ming Lei
2023-04-25 14:50                     ` Jens Axboe
2023-04-25 15:07                       ` Ming Lei
2023-04-25 15:25                         ` Jens Axboe
2023-04-25 15:46                           ` Pavel Begunkov
2023-04-26  3:25                             ` Ming Lei
2023-04-26  4:28                               ` Ming Lei
2023-04-26  1:43                           ` Ming Lei
2023-04-25 16:10                         ` Pavel Begunkov
2023-04-26  3:37                           ` Ming Lei
2023-04-25 15:28                     ` Pavel Begunkov
2023-04-30 13:34                       ` Hao Xu
