public inbox for [email protected]
* [RFC 0/9] scrap 24 bytes from io_kiocb
@ 2020-07-12  9:41 Pavel Begunkov
  2020-07-12  9:41 ` [PATCH 1/9] io_uring: share completion list w/ per-op space Pavel Begunkov
                   ` (9 more replies)
  0 siblings, 10 replies; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-12  9:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Make io_kiocb slimmer by 24 bytes, mainly by revising list usage. The
drawback is an extra kmalloc in the draining path, but that's a slow
path, so meh. It also frees some space for the deferred completion path
if it's ever needed in the future, but the main idea here is to
eventually shrink io_kiocb to 3 cachelines.

I'm not happy yet with a few details, so that's not final, but it would
be lovely to hear some feedback.
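
As a sanity check, the io_kiocb layout and size after each patch can be
inspected with pahole on a built fs/io_uring.o (debug info required), e.g.:

  $ pahole -C io_kiocb fs/io_uring.o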

Pavel Begunkov (9):
  io_uring: share completion list w/ per-op space
  io_uring: rename ctx->poll into ctx->iopoll
  io_uring: use inflight_entry list for iopolling
  io_uring: use completion list for CQ overflow
  io_uring: add req->timeout.list
  io_uring: remove init for unused list
  io_uring: kill rq->list and allocate it on demand
  io_uring: remove sequence from io_kiocb
  io_uring: place cflags into completion data

 fs/io_uring.c | 188 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 116 insertions(+), 72 deletions(-)

-- 
2.24.0



* [PATCH 1/9] io_uring: share completion list w/ per-op space
  2020-07-12  9:41 [RFC 0/9] scrap 24 bytes from io_kiocb Pavel Begunkov
@ 2020-07-12  9:41 ` Pavel Begunkov
  2020-07-12  9:41 ` [PATCH 2/9] io_uring: rename ctx->poll into ctx->iopoll Pavel Begunkov
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-12  9:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring

After io_req_complete(), per-op data is not needed anymore, so reuse
that space to keep a list entry for struct io_comp_state, cleaning up
the request beforehand. Though useless by itself, this is a preparation
for compacting io_kiocb.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8482b9aed952..2316e6b840b3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -487,6 +487,12 @@ struct io_statx {
 	struct statx __user		*buffer;
 };
 
+/* Safe to use only after *fill_event() and properly cleaning per-op data. */
+struct io_completion {
+	struct file			*file;
+	struct list_head		list;
+};
+
 struct io_async_connect {
 	struct sockaddr_storage		address;
 };
@@ -622,6 +628,7 @@ struct io_kiocb {
 		struct io_splice	splice;
 		struct io_provide_buf	pbuf;
 		struct io_statx		statx;
+		struct io_completion	compl;
 	};
 
 	struct io_async_ctx		*io;
@@ -1410,8 +1417,8 @@ static void io_submit_flush_completions(struct io_comp_state *cs)
 	while (!list_empty(&cs->list)) {
 		struct io_kiocb *req;
 
-		req = list_first_entry(&cs->list, struct io_kiocb, list);
-		list_del(&req->list);
+		req = list_first_entry(&cs->list, struct io_kiocb, compl.list);
+		list_del(&req->compl.list);
 		__io_cqring_fill_event(req, req->result, req->cflags);
 		if (!(req->flags & REQ_F_LINK_HEAD)) {
 			req->flags |= REQ_F_COMP_LOCKED;
@@ -1436,9 +1443,12 @@ static void __io_req_complete(struct io_kiocb *req, long res, unsigned cflags,
 		io_cqring_add_event(req, res, cflags);
 		io_put_req(req);
 	} else {
+		if (req->flags & REQ_F_NEED_CLEANUP)
+			io_cleanup_req(req);
+
 		req->result = res;
 		req->cflags = cflags;
-		list_add_tail(&req->list, &cs->list);
+		list_add_tail(&req->compl.list, &cs->list);
 		if (++cs->nr >= 32)
 			io_submit_flush_completions(cs);
 	}
-- 
2.24.0



* [PATCH 2/9] io_uring: rename ctx->poll into ctx->iopoll
  2020-07-12  9:41 [RFC 0/9] scrap 24 bytes from io_kiocb Pavel Begunkov
  2020-07-12  9:41 ` [PATCH 1/9] io_uring: share completion list w/ per-op space Pavel Begunkov
@ 2020-07-12  9:41 ` Pavel Begunkov
  2020-07-12  9:41 ` [PATCH 3/9] io_uring: use inflight_entry list for iopolling Pavel Begunkov
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-12  9:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring

io_uring supports both polling and I/O polling. Rename ctx->poll_list
into ctx->iopoll_list to clearly show that it's used only in the I/O
poll case.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 2316e6b840b3..669a131c22ec 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -320,12 +320,12 @@ struct io_ring_ctx {
 		spinlock_t		completion_lock;
 
 		/*
-		 * ->poll_list is protected by the ctx->uring_lock for
+		 * ->iopoll_list is protected by the ctx->uring_lock for
 		 * io_uring instances that don't use IORING_SETUP_SQPOLL.
 		 * For SQPOLL, only the single threaded io_sq_thread() will
 		 * manipulate the list, hence no extra locking is needed there.
 		 */
-		struct list_head	poll_list;
+		struct list_head	iopoll_list;
 		struct hlist_head	*cancel_hash;
 		unsigned		cancel_hash_bits;
 		bool			poll_multi_file;
@@ -1059,7 +1059,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	mutex_init(&ctx->uring_lock);
 	init_waitqueue_head(&ctx->wait);
 	spin_lock_init(&ctx->completion_lock);
-	INIT_LIST_HEAD(&ctx->poll_list);
+	INIT_LIST_HEAD(&ctx->iopoll_list);
 	INIT_LIST_HEAD(&ctx->defer_list);
 	INIT_LIST_HEAD(&ctx->timeout_list);
 	init_waitqueue_head(&ctx->inflight_wait);
@@ -2003,7 +2003,7 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
 	spin = !ctx->poll_multi_file && *nr_events < min;
 
 	ret = 0;
-	list_for_each_entry_safe(req, tmp, &ctx->poll_list, list) {
+	list_for_each_entry_safe(req, tmp, &ctx->iopoll_list, list) {
 		struct kiocb *kiocb = &req->rw.kiocb;
 
 		/*
@@ -2045,7 +2045,7 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
 static int io_iopoll_getevents(struct io_ring_ctx *ctx, unsigned int *nr_events,
 				long min)
 {
-	while (!list_empty(&ctx->poll_list) && !need_resched()) {
+	while (!list_empty(&ctx->iopoll_list) && !need_resched()) {
 		int ret;
 
 		ret = io_do_iopoll(ctx, nr_events, min);
@@ -2068,7 +2068,7 @@ static void io_iopoll_try_reap_events(struct io_ring_ctx *ctx)
 		return;
 
 	mutex_lock(&ctx->uring_lock);
-	while (!list_empty(&ctx->poll_list)) {
+	while (!list_empty(&ctx->iopoll_list)) {
 		unsigned int nr_events = 0;
 
 		io_do_iopoll(ctx, &nr_events, 0);
@@ -2285,12 +2285,12 @@ static void io_iopoll_req_issued(struct io_kiocb *req)
 	 * how we do polling eventually, not spinning if we're on potentially
 	 * different devices.
 	 */
-	if (list_empty(&ctx->poll_list)) {
+	if (list_empty(&ctx->iopoll_list)) {
 		ctx->poll_multi_file = false;
 	} else if (!ctx->poll_multi_file) {
 		struct io_kiocb *list_req;
 
-		list_req = list_first_entry(&ctx->poll_list, struct io_kiocb,
+		list_req = list_first_entry(&ctx->iopoll_list, struct io_kiocb,
 						list);
 		if (list_req->file != req->file)
 			ctx->poll_multi_file = true;
@@ -2301,9 +2301,9 @@ static void io_iopoll_req_issued(struct io_kiocb *req)
 	 * it to the front so we find it first.
 	 */
 	if (READ_ONCE(req->iopoll_completed))
-		list_add(&req->list, &ctx->poll_list);
+		list_add(&req->list, &ctx->iopoll_list);
 	else
-		list_add_tail(&req->list, &ctx->poll_list);
+		list_add_tail(&req->list, &ctx->iopoll_list);
 
 	if ((ctx->flags & IORING_SETUP_SQPOLL) &&
 	    wq_has_sleeper(&ctx->sqo_wait))
@@ -6327,11 +6327,11 @@ static int io_sq_thread(void *data)
 	while (!kthread_should_park()) {
 		unsigned int to_submit;
 
-		if (!list_empty(&ctx->poll_list)) {
+		if (!list_empty(&ctx->iopoll_list)) {
 			unsigned nr_events = 0;
 
 			mutex_lock(&ctx->uring_lock);
-			if (!list_empty(&ctx->poll_list) && !need_resched())
+			if (!list_empty(&ctx->iopoll_list) && !need_resched())
 				io_do_iopoll(ctx, &nr_events, 0);
 			else
 				timeout = jiffies + ctx->sq_thread_idle;
@@ -6360,7 +6360,7 @@ static int io_sq_thread(void *data)
 			 * more IO, we should wait for the application to
 			 * reap events and wake us up.
 			 */
-			if (!list_empty(&ctx->poll_list) || need_resched() ||
+			if (!list_empty(&ctx->iopoll_list) || need_resched() ||
 			    (!time_after(jiffies, timeout) && ret != -EBUSY &&
 			    !percpu_ref_is_dying(&ctx->refs))) {
 				io_run_task_work();
@@ -6373,13 +6373,13 @@ static int io_sq_thread(void *data)
 
 			/*
 			 * While doing polled IO, before going to sleep, we need
-			 * to check if there are new reqs added to poll_list, it
-			 * is because reqs may have been punted to io worker and
-			 * will be added to poll_list later, hence check the
-			 * poll_list again.
+			 * to check if there are new reqs added to iopoll_list,
+			 * it is because reqs may have been punted to io worker
+			 * and will be added to iopoll_list later, hence check
+			 * the iopoll_list again.
 			 */
 			if ((ctx->flags & IORING_SETUP_IOPOLL) &&
-			    !list_empty_careful(&ctx->poll_list)) {
+			    !list_empty_careful(&ctx->iopoll_list)) {
 				finish_wait(&ctx->sqo_wait, &wait);
 				continue;
 			}
-- 
2.24.0



* [PATCH 3/9] io_uring: use inflight_entry list for iopolling
  2020-07-12  9:41 [RFC 0/9] scrap 24 bytes from io_kiocb Pavel Begunkov
  2020-07-12  9:41 ` [PATCH 1/9] io_uring: share completion list w/ per-op space Pavel Begunkov
  2020-07-12  9:41 ` [PATCH 2/9] io_uring: rename ctx->poll into ctx->iopoll Pavel Begunkov
@ 2020-07-12  9:41 ` Pavel Begunkov
  2020-07-12  9:41 ` [PATCH 4/9] io_uring: use completion list for CQ overflow Pavel Begunkov
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-12  9:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring

req->inflight_entry is used to track requests that use ->files, and
read/write requests, the only ones that get iopoll'ed, don't use ->files
at all. Reuse req->inflight_entry for the iopoll path, aliasing it in a
union with a more fitting name for clarity.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 669a131c22ec..bb92cc736afe 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -651,7 +651,14 @@ struct io_kiocb {
 
 	struct list_head	link_list;
 
-	struct list_head	inflight_entry;
+	/*
+	 * @inflight_entry is for reqs with ->files (see io_op_def::file_table)
+	 * @iopoll_list is for read/write requests
+	 */
+	union {
+		struct list_head	inflight_entry;
+		struct list_head	iopoll_list;
+	};
 
 	struct percpu_ref	*fixed_file_refs;
 
@@ -1937,8 +1944,8 @@ static void io_iopoll_queue(struct list_head *again)
 	struct io_kiocb *req;
 
 	do {
-		req = list_first_entry(again, struct io_kiocb, list);
-		list_del(&req->list);
+		req = list_first_entry(again, struct io_kiocb, iopoll_list);
+		list_del(&req->iopoll_list);
 		if (!io_rw_reissue(req, -EAGAIN))
 			io_complete_rw_common(&req->rw.kiocb, -EAGAIN, NULL);
 	} while (!list_empty(again));
@@ -1961,13 +1968,13 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 	while (!list_empty(done)) {
 		int cflags = 0;
 
-		req = list_first_entry(done, struct io_kiocb, list);
+		req = list_first_entry(done, struct io_kiocb, iopoll_list);
 		if (READ_ONCE(req->result) == -EAGAIN) {
 			req->iopoll_completed = 0;
-			list_move_tail(&req->list, &again);
+			list_move_tail(&req->iopoll_list, &again);
 			continue;
 		}
-		list_del(&req->list);
+		list_del(&req->iopoll_list);
 
 		if (req->flags & REQ_F_BUFFER_SELECTED)
 			cflags = io_put_kbuf(req);
@@ -2003,7 +2010,7 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
 	spin = !ctx->poll_multi_file && *nr_events < min;
 
 	ret = 0;
-	list_for_each_entry_safe(req, tmp, &ctx->iopoll_list, list) {
+	list_for_each_entry_safe(req, tmp, &ctx->iopoll_list, iopoll_list) {
 		struct kiocb *kiocb = &req->rw.kiocb;
 
 		/*
@@ -2012,7 +2019,7 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
 		 * and complete those lists first, if we have entries there.
 		 */
 		if (READ_ONCE(req->iopoll_completed)) {
-			list_move_tail(&req->list, &done);
+			list_move_tail(&req->iopoll_list, &done);
 			continue;
 		}
 		if (!list_empty(&done))
@@ -2024,7 +2031,7 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
 
 		/* iopoll may have completed current req */
 		if (READ_ONCE(req->iopoll_completed))
-			list_move_tail(&req->list, &done);
+			list_move_tail(&req->iopoll_list, &done);
 
 		if (ret && spin)
 			spin = false;
@@ -2291,7 +2298,7 @@ static void io_iopoll_req_issued(struct io_kiocb *req)
 		struct io_kiocb *list_req;
 
 		list_req = list_first_entry(&ctx->iopoll_list, struct io_kiocb,
-						list);
+						iopoll_list);
 		if (list_req->file != req->file)
 			ctx->poll_multi_file = true;
 	}
@@ -2301,9 +2308,9 @@ static void io_iopoll_req_issued(struct io_kiocb *req)
 	 * it to the front so we find it first.
 	 */
 	if (READ_ONCE(req->iopoll_completed))
-		list_add(&req->list, &ctx->iopoll_list);
+		list_add(&req->iopoll_list, &ctx->iopoll_list);
 	else
-		list_add_tail(&req->list, &ctx->iopoll_list);
+		list_add_tail(&req->iopoll_list, &ctx->iopoll_list);
 
 	if ((ctx->flags & IORING_SETUP_SQPOLL) &&
 	    wq_has_sleeper(&ctx->sqo_wait))
-- 
2.24.0



* [PATCH 4/9] io_uring: use completion list for CQ overflow
  2020-07-12  9:41 [RFC 0/9] scrap 24 bytes from io_kiocb Pavel Begunkov
                   ` (2 preceding siblings ...)
  2020-07-12  9:41 ` [PATCH 3/9] io_uring: use inflight_entry list for iopolling Pavel Begunkov
@ 2020-07-12  9:41 ` Pavel Begunkov
  2020-07-12  9:41 ` [PATCH 5/9] io_uring: add req->timeout.list Pavel Begunkov
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-12  9:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring

As with the completion path, use compl.list to link overflowed requests.
If the request is cleaned up properly, nobody needs its per-op data
anymore.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index bb92cc736afe..88c3092399e2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1335,8 +1335,8 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 			break;
 
 		req = list_first_entry(&ctx->cq_overflow_list, struct io_kiocb,
-						list);
-		list_move(&req->list, &list);
+						compl.list);
+		list_move(&req->compl.list, &list);
 		req->flags &= ~REQ_F_OVERFLOW;
 		if (cqe) {
 			WRITE_ONCE(cqe->user_data, req->user_data);
@@ -1357,8 +1357,8 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 	io_cqring_ev_posted(ctx);
 
 	while (!list_empty(&list)) {
-		req = list_first_entry(&list, struct io_kiocb, list);
-		list_del(&req->list);
+		req = list_first_entry(&list, struct io_kiocb, compl.list);
+		list_del(&req->compl.list);
 		io_put_req(req);
 	}
 
@@ -1386,6 +1386,9 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
 		WRITE_ONCE(ctx->rings->cq_overflow,
 				atomic_inc_return(&ctx->cached_cq_overflow));
 	} else {
+		if (req->flags & REQ_F_NEED_CLEANUP)
+			io_cleanup_req(req);
+
 		if (list_empty(&ctx->cq_overflow_list)) {
 			set_bit(0, &ctx->sq_check_overflow);
 			set_bit(0, &ctx->cq_check_overflow);
@@ -1394,7 +1397,7 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
 		refcount_inc(&req->refs);
 		req->result = res;
 		req->cflags = cflags;
-		list_add_tail(&req->list, &ctx->cq_overflow_list);
+		list_add_tail(&req->compl.list, &ctx->cq_overflow_list);
 	}
 }
 
@@ -7822,7 +7825,7 @@ static void io_uring_cancel_files(struct io_ring_ctx *ctx,
 
 		if (cancel_req->flags & REQ_F_OVERFLOW) {
 			spin_lock_irq(&ctx->completion_lock);
-			list_del(&cancel_req->list);
+			list_del(&cancel_req->compl.list);
 			cancel_req->flags &= ~REQ_F_OVERFLOW;
 			if (list_empty(&ctx->cq_overflow_list)) {
 				clear_bit(0, &ctx->sq_check_overflow);
-- 
2.24.0



* [PATCH 5/9] io_uring: add req->timeout.list
  2020-07-12  9:41 [RFC 0/9] scrap 24 bytes from io_kiocb Pavel Begunkov
                   ` (3 preceding siblings ...)
  2020-07-12  9:41 ` [PATCH 4/9] io_uring: use completion list for CQ overflow Pavel Begunkov
@ 2020-07-12  9:41 ` Pavel Begunkov
  2020-07-12  9:41 ` [PATCH 6/9] io_uring: remove init for unused list Pavel Begunkov
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-12  9:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Instead of using the shared req->list, hang timeouts off their own list
entry. struct io_timeout has enough extra space for it, but if that ever
becomes a problem, ->inflight_entry can be reused instead.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 88c3092399e2..81e7bb226dbd 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -396,6 +396,7 @@ struct io_timeout {
 	int				flags;
 	u32				off;
 	u32				target_seq;
+	struct list_head		list;
 };
 
 struct io_rw {
@@ -1209,7 +1210,7 @@ static void io_kill_timeout(struct io_kiocb *req)
 	ret = hrtimer_try_to_cancel(&req->io->timeout.timer);
 	if (ret != -1) {
 		atomic_inc(&req->ctx->cq_timeouts);
-		list_del_init(&req->list);
+		list_del_init(&req->timeout.list);
 		req->flags |= REQ_F_COMP_LOCKED;
 		io_cqring_fill_event(req, 0);
 		io_put_req(req);
@@ -1221,7 +1222,7 @@ static void io_kill_timeouts(struct io_ring_ctx *ctx)
 	struct io_kiocb *req, *tmp;
 
 	spin_lock_irq(&ctx->completion_lock);
-	list_for_each_entry_safe(req, tmp, &ctx->timeout_list, list)
+	list_for_each_entry_safe(req, tmp, &ctx->timeout_list, timeout.list)
 		io_kill_timeout(req);
 	spin_unlock_irq(&ctx->completion_lock);
 }
@@ -1244,7 +1245,7 @@ static void io_flush_timeouts(struct io_ring_ctx *ctx)
 {
 	while (!list_empty(&ctx->timeout_list)) {
 		struct io_kiocb *req = list_first_entry(&ctx->timeout_list,
-							struct io_kiocb, list);
+						struct io_kiocb, timeout.list);
 
 		if (io_is_timeout_noseq(req))
 			break;
@@ -1252,7 +1253,7 @@ static void io_flush_timeouts(struct io_ring_ctx *ctx)
 					- atomic_read(&ctx->cq_timeouts))
 			break;
 
-		list_del_init(&req->list);
+		list_del_init(&req->timeout.list);
 		io_kill_timeout(req);
 	}
 }
@@ -5006,8 +5007,8 @@ static enum hrtimer_restart io_timeout_fn(struct hrtimer *timer)
 	 * We could be racing with timeout deletion. If the list is empty,
 	 * then timeout lookup already found it and will be handling it.
 	 */
-	if (!list_empty(&req->list))
-		list_del_init(&req->list);
+	if (!list_empty(&req->timeout.list))
+		list_del_init(&req->timeout.list);
 
 	io_cqring_fill_event(req, -ETIME);
 	io_commit_cqring(ctx);
@@ -5024,9 +5025,9 @@ static int io_timeout_cancel(struct io_ring_ctx *ctx, __u64 user_data)
 	struct io_kiocb *req;
 	int ret = -ENOENT;
 
-	list_for_each_entry(req, &ctx->timeout_list, list) {
+	list_for_each_entry(req, &ctx->timeout_list, timeout.list) {
 		if (user_data == req->user_data) {
-			list_del_init(&req->list);
+			list_del_init(&req->timeout.list);
 			ret = 0;
 			break;
 		}
@@ -5146,7 +5147,8 @@ static int io_timeout(struct io_kiocb *req)
 	 * the one we need first.
 	 */
 	list_for_each_prev(entry, &ctx->timeout_list) {
-		struct io_kiocb *nxt = list_entry(entry, struct io_kiocb, list);
+		struct io_kiocb *nxt = list_entry(entry, struct io_kiocb,
+						  timeout.list);
 
 		if (io_is_timeout_noseq(nxt))
 			continue;
@@ -5155,7 +5157,7 @@ static int io_timeout(struct io_kiocb *req)
 			break;
 	}
 add:
-	list_add(&req->list, entry);
+	list_add(&req->timeout.list, entry);
 	data->timer.function = io_timeout_fn;
 	hrtimer_start(&data->timer, timespec64_to_ktime(data->ts), data->mode);
 	spin_unlock_irq(&ctx->completion_lock);
-- 
2.24.0



* [PATCH 6/9] io_uring: remove init for unused list
  2020-07-12  9:41 [RFC 0/9] scrap 24 bytes from io_kiocb Pavel Begunkov
                   ` (4 preceding siblings ...)
  2020-07-12  9:41 ` [PATCH 5/9] io_uring: add req->timeout.list Pavel Begunkov
@ 2020-07-12  9:41 ` Pavel Begunkov
  2020-07-12  9:41 ` [PATCH 7/9] io_uring: kill rq->list and allocate it on demand Pavel Begunkov
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-12  9:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring

poll*() doesn't use req->list, so don't init it.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 81e7bb226dbd..38ffcfca9b34 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4973,7 +4973,6 @@ static int io_poll_add(struct io_kiocb *req)
 	__poll_t mask;
 
 	INIT_HLIST_NODE(&req->hash_node);
-	INIT_LIST_HEAD(&req->list);
 	ipt.pt._qproc = io_poll_queue_proc;
 
 	mask = __io_arm_poll_handler(req, &req->poll, &ipt, poll->events,
-- 
2.24.0



* [PATCH 7/9] io_uring: kill rq->list and allocate it on demand
  2020-07-12  9:41 [RFC 0/9] scrap 24 bytes from io_kiocb Pavel Begunkov
                   ` (5 preceding siblings ...)
  2020-07-12  9:41 ` [PATCH 6/9] io_uring: remove init for unused list Pavel Begunkov
@ 2020-07-12  9:41 ` Pavel Begunkov
  2020-07-12  9:41 ` [PATCH 8/9] io_uring: remove sequence from io_kiocb Pavel Begunkov
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-12  9:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring

The only user of req->list is DRAIN, hence instead of keeping a separate
per-request list for it, just allocate a separate defer entry when
needed; that's a slow path anyway.
This removes req->list and so sheds 16 bytes from io_kiocb.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 38ffcfca9b34..93e8192983e1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -641,7 +641,6 @@ struct io_kiocb {
 	u16				buf_index;
 
 	struct io_ring_ctx	*ctx;
-	struct list_head	list;
 	unsigned int		flags;
 	refcount_t		refs;
 	struct task_struct	*task;
@@ -679,6 +678,11 @@ struct io_kiocb {
 	struct callback_head	task_work;
 };
 
+struct io_defer_entry {
+	struct list_head	list;
+	struct io_kiocb		*req;
+};
+
 #define IO_IOPOLL_BATCH			8
 
 struct io_comp_state {
@@ -1230,14 +1234,15 @@ static void io_kill_timeouts(struct io_ring_ctx *ctx)
 static void __io_queue_deferred(struct io_ring_ctx *ctx)
 {
 	do {
-		struct io_kiocb *req = list_first_entry(&ctx->defer_list,
-							struct io_kiocb, list);
+		struct io_defer_entry *de = list_first_entry(&ctx->defer_list,
+						struct io_defer_entry, list);
 
-		if (req_need_defer(req))
+		if (req_need_defer(de->req))
 			break;
-		list_del_init(&req->list);
+		list_del_init(&de->list);
 		/* punt-init is done before queueing for defer */
-		__io_queue_async_work(req);
+		__io_queue_async_work(de->req);
+		kfree(de);
 	} while (!list_empty(&ctx->defer_list));
 }
 
@@ -5398,6 +5403,7 @@ static int io_req_defer_prep(struct io_kiocb *req,
 static int io_req_defer(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_ring_ctx *ctx = req->ctx;
+	struct io_defer_entry *de;
 	int ret;
 
 	/* Still need defer if there is pending req in defer list. */
@@ -5412,15 +5418,20 @@ static int io_req_defer(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 			return ret;
 	}
 	io_prep_async_link(req);
+	de = kmalloc(sizeof(*de), GFP_KERNEL);
+	if (!de)
+		return -ENOMEM;
 
 	spin_lock_irq(&ctx->completion_lock);
 	if (!req_need_defer(req) && list_empty(&ctx->defer_list)) {
 		spin_unlock_irq(&ctx->completion_lock);
+		kfree(de);
 		return 0;
 	}
 
 	trace_io_uring_defer(ctx, req, req->user_data);
-	list_add_tail(&req->list, &ctx->defer_list);
+	de->req = req;
+	list_add_tail(&de->list, &ctx->defer_list);
 	spin_unlock_irq(&ctx->completion_lock);
 	return -EIOCBQUEUED;
 }
-- 
2.24.0



* [PATCH 8/9] io_uring: remove sequence from io_kiocb
  2020-07-12  9:41 [RFC 0/9] scrap 24 bytes from io_kiocb Pavel Begunkov
                   ` (6 preceding siblings ...)
  2020-07-12  9:41 ` [PATCH 7/9] io_uring: kill rq->list and allocate it on demand Pavel Begunkov
@ 2020-07-12  9:41 ` Pavel Begunkov
  2020-07-12  9:41 ` [PATCH 9/9] io_uring: place cflags into completion data Pavel Begunkov
  2020-07-12 15:59 ` [RFC 0/9] scrap 24 bytes from io_kiocb Jens Axboe
  9 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-12  9:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring

req->sequence is used only for deferred (i.e. DRAIN) requests, but is
a burden on every request even when it isn't used. Remove req->sequence
from io_kiocb together with its initialisation in io_init_req(). Instead,
keep the field in io_defer_entry and calculate it in io_req_defer()'s
slow path.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 40 ++++++++++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 93e8192983e1..db7f86b6da09 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -639,6 +639,7 @@ struct io_kiocb {
 	u8				iopoll_completed;
 
 	u16				buf_index;
+	u32				result;
 
 	struct io_ring_ctx	*ctx;
 	unsigned int		flags;
@@ -646,8 +647,6 @@ struct io_kiocb {
 	struct task_struct	*task;
 	unsigned long		fsize;
 	u64			user_data;
-	u32			result;
-	u32			sequence;
 
 	struct list_head	link_list;
 
@@ -681,6 +680,7 @@ struct io_kiocb {
 struct io_defer_entry {
 	struct list_head	list;
 	struct io_kiocb		*req;
+	u32			seq;
 };
 
 #define IO_IOPOLL_BATCH			8
@@ -1088,13 +1088,13 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	return NULL;
 }
 
-static inline bool req_need_defer(struct io_kiocb *req)
+static bool req_need_defer(struct io_kiocb *req, u32 seq)
 {
 	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
 		struct io_ring_ctx *ctx = req->ctx;
 
-		return req->sequence != ctx->cached_cq_tail
-					+ atomic_read(&ctx->cached_cq_overflow);
+		return seq != ctx->cached_cq_tail
+				+ atomic_read(&ctx->cached_cq_overflow);
 	}
 
 	return false;
@@ -1237,7 +1237,7 @@ static void __io_queue_deferred(struct io_ring_ctx *ctx)
 		struct io_defer_entry *de = list_first_entry(&ctx->defer_list,
 						struct io_defer_entry, list);
 
-		if (req_need_defer(de->req))
+		if (req_need_defer(de->req, de->seq))
 			break;
 		list_del_init(&de->list);
 		/* punt-init is done before queueing for defer */
@@ -5400,14 +5400,30 @@ static int io_req_defer_prep(struct io_kiocb *req,
 	return ret;
 }
 
+static int io_get_sequence(struct io_kiocb *req)
+{
+	struct io_kiocb *pos;
+	struct io_ring_ctx *ctx = req->ctx;
+	u32 nr_reqs = 1;
+
+	if (req->flags & REQ_F_LINK_HEAD)
+		list_for_each_entry(pos, &req->link_list, link_list)
+			nr_reqs++;
+
+	return ctx->cached_sq_head - ctx->cached_sq_dropped - nr_reqs;
+}
+
+
 static int io_req_defer(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_defer_entry *de;
 	int ret;
+	u32 seq;
 
 	/* Still need defer if there is pending req in defer list. */
-	if (!req_need_defer(req) && list_empty_careful(&ctx->defer_list))
+	if (likely(list_empty_careful(&ctx->defer_list) &&
+		!(req->flags & REQ_F_IO_DRAIN)))
 		return 0;
 
 	if (!req->io) {
@@ -5422,8 +5438,9 @@ static int io_req_defer(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (!de)
 		return -ENOMEM;
 
+	seq = io_get_sequence(req);
 	spin_lock_irq(&ctx->completion_lock);
-	if (!req_need_defer(req) && list_empty(&ctx->defer_list)) {
+	if (!req_need_defer(req, seq) && list_empty(&ctx->defer_list)) {
 		spin_unlock_irq(&ctx->completion_lock);
 		kfree(de);
 		return 0;
@@ -5431,6 +5448,7 @@ static int io_req_defer(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 	trace_io_uring_defer(ctx, req, req->user_data);
 	de->req = req;
+	de->seq = seq;
 	list_add_tail(&de->list, &ctx->defer_list);
 	spin_unlock_irq(&ctx->completion_lock);
 	return -EIOCBQUEUED;
@@ -6207,12 +6225,6 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	unsigned int sqe_flags;
 	int id;
 
-	/*
-	 * All io need record the previous position, if LINK vs DARIN,
-	 * it can be used to mark the position of the first IO in the
-	 * link list.
-	 */
-	req->sequence = ctx->cached_sq_head - ctx->cached_sq_dropped;
 	req->opcode = READ_ONCE(sqe->opcode);
 	req->user_data = READ_ONCE(sqe->user_data);
 	req->io = NULL;
-- 
2.24.0



* [PATCH 9/9] io_uring: place cflags into completion data
  2020-07-12  9:41 [RFC 0/9] scrap 24 bytes from io_kiocb Pavel Begunkov
                   ` (7 preceding siblings ...)
  2020-07-12  9:41 ` [PATCH 8/9] io_uring: remove sequence from io_kiocb Pavel Begunkov
@ 2020-07-12  9:41 ` Pavel Begunkov
  2020-07-12 15:59 ` [RFC 0/9] scrap 24 bytes from io_kiocb Jens Axboe
  9 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-12  9:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring

req->cflags is used only in the deferred-completion path, so store it in
the completion data instead. Together with the 4 bytes freed by the
->sequence patch and compacting io_kiocb, this frees 8 bytes.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index db7f86b6da09..08af9abe69e3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -492,6 +492,7 @@ struct io_statx {
 struct io_completion {
 	struct file			*file;
 	struct list_head		list;
+	int				cflags;
 };
 
 struct io_async_connect {
@@ -633,7 +634,6 @@ struct io_kiocb {
 	};
 
 	struct io_async_ctx		*io;
-	int				cflags;
 	u8				opcode;
 	/* polled IO has completed */
 	u8				iopoll_completed;
@@ -1347,7 +1347,7 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 		if (cqe) {
 			WRITE_ONCE(cqe->user_data, req->user_data);
 			WRITE_ONCE(cqe->res, req->result);
-			WRITE_ONCE(cqe->flags, req->cflags);
+			WRITE_ONCE(cqe->flags, req->compl.cflags);
 		} else {
 			WRITE_ONCE(ctx->rings->cq_overflow,
 				atomic_inc_return(&ctx->cached_cq_overflow));
@@ -1402,7 +1402,7 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
 		req->flags |= REQ_F_OVERFLOW;
 		refcount_inc(&req->refs);
 		req->result = res;
-		req->cflags = cflags;
+		req->compl.cflags = cflags;
 		list_add_tail(&req->compl.list, &ctx->cq_overflow_list);
 	}
 }
@@ -1435,7 +1435,7 @@ static void io_submit_flush_completions(struct io_comp_state *cs)
 
 		req = list_first_entry(&cs->list, struct io_kiocb, compl.list);
 		list_del(&req->compl.list);
-		__io_cqring_fill_event(req, req->result, req->cflags);
+		__io_cqring_fill_event(req, req->result, req->compl.cflags);
 		if (!(req->flags & REQ_F_LINK_HEAD)) {
 			req->flags |= REQ_F_COMP_LOCKED;
 			io_put_req(req);
@@ -1463,7 +1463,7 @@ static void __io_req_complete(struct io_kiocb *req, long res, unsigned cflags,
 			io_cleanup_req(req);
 
 		req->result = res;
-		req->cflags = cflags;
+		req->compl.cflags = cflags;
 		list_add_tail(&req->compl.list, &cs->list);
 		if (++cs->nr >= 32)
 			io_submit_flush_completions(cs);
-- 
2.24.0



* Re: [RFC 0/9] scrap 24 bytes from io_kiocb
  2020-07-12  9:41 [RFC 0/9] scrap 24 bytes from io_kiocb Pavel Begunkov
                   ` (8 preceding siblings ...)
  2020-07-12  9:41 ` [PATCH 9/9] io_uring: place cflags into completion data Pavel Begunkov
@ 2020-07-12 15:59 ` Jens Axboe
  2020-07-12 17:34   ` Pavel Begunkov
  9 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2020-07-12 15:59 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 7/12/20 3:41 AM, Pavel Begunkov wrote:
> Make io_kiocb slimmer by 24 bytes mainly by revising lists usage. The
> drawback is adding extra kmalloc in draining path, but that's a slow
> path, so meh. It also frees some space for the deferred completion path
> if would be needed in the future, but the main idea here is to shrink it
> to 3 cachelines in the end.
> 
> I'm not happy yet with a few details, so that's not final, but it would
> be lovely to hear some feedback.

I think it looks pretty good, most of the changes are straight forward.
Adding a completion entry that shares the submit space is a good idea,
and really helps bring it together.

From a quick look, the only part I'm not super crazy about is patch #3.
I'd probably rather use a generic list name and not unionize the tw
lists.

-- 
Jens Axboe



* Re: [RFC 0/9] scrap 24 bytes from io_kiocb
  2020-07-12 15:59 ` [RFC 0/9] scrap 24 bytes from io_kiocb Jens Axboe
@ 2020-07-12 17:34   ` Pavel Begunkov
  2020-07-12 20:32     ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-12 17:34 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 12/07/2020 18:59, Jens Axboe wrote:
> On 7/12/20 3:41 AM, Pavel Begunkov wrote:
>> Make io_kiocb slimmer by 24 bytes mainly by revising lists usage. The
>> drawback is adding extra kmalloc in draining path, but that's a slow
>> path, so meh. It also frees some space for the deferred completion path
>> if would be needed in the future, but the main idea here is to shrink it
>> to 3 cachelines in the end.
>>
>> I'm not happy yet with a few details, so that's not final, but it would
>> be lovely to hear some feedback.
> 
> I think it looks pretty good, most of the changes are straight forward.
> Adding a completion entry that shares the submit space is a good idea,
> and really helps bring it together.
> 
> From a quick look, the only part I'm not super crazy about is patch #3.

Thanks!

> I'd probably rather use a generic list name and not unionize the tw
> lists.

I don't care much, but without compiler's help always have troubles
finding and distinguishing something as generic as "list".

BTW, I thought out how to bring it down to 3 cache lines, but that would
require taking io_wq_work out of io_kiocb and kmalloc'ing it on demand.
And there should also be a bunch of nice side effects like improving apoll.
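
Roughly, the direction would look something like the sketch below (purely
illustrative, the field and helper names are made up and nothing here is
from this series): io_kiocb keeps only a pointer, and the work struct is
allocated just before a request is punted to io-wq, so requests completing
inline never pay for the embedded struct.

	/* in struct io_kiocb: replace the embedded work with a pointer */
	struct io_wq_work	*work;	/* NULL until punted to io-wq */

	static int io_req_alloc_work(struct io_kiocb *req)
	{
		if (req->work)
			return 0;
		req->work = kzalloc(sizeof(*req->work), GFP_KERNEL);
		return req->work ? 0 : -ENOMEM;
	}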

-- 
Pavel Begunkov


* Re: [RFC 0/9] scrap 24 bytes from io_kiocb
  2020-07-12 17:34   ` Pavel Begunkov
@ 2020-07-12 20:32     ` Jens Axboe
  2020-07-13  8:17       ` Pavel Begunkov
  2020-07-13  8:17       ` Pavel Begunkov
  0 siblings, 2 replies; 18+ messages in thread
From: Jens Axboe @ 2020-07-12 20:32 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 7/12/20 11:34 AM, Pavel Begunkov wrote:
> On 12/07/2020 18:59, Jens Axboe wrote:
>> On 7/12/20 3:41 AM, Pavel Begunkov wrote:
>>> Make io_kiocb slimmer by 24 bytes mainly by revising lists usage. The
>>> drawback is adding extra kmalloc in draining path, but that's a slow
>>> path, so meh. It also frees some space for the deferred completion path
>>> if would be needed in the future, but the main idea here is to shrink it
>>> to 3 cachelines in the end.
>>>
>>> I'm not happy yet with a few details, so that's not final, but it would
>>> be lovely to hear some feedback.
>>
>> I think it looks pretty good, most of the changes are straight forward.
>> Adding a completion entry that shares the submit space is a good idea,
>> and really helps bring it together.
>>
>> From a quick look, the only part I'm not super crazy about is patch #3.
> 
> Thanks!
> 
>> I'd probably rather use a generic list name and not unionize the tw
>> lists.
> 
> I don't care much, but without compiler's help always have troubles
> finding and distinguishing something as generic as "list".

To me, it's easier to verify that we're doing the right thing when they
use the same list member. Otherwise you have to cross reference two
different names, easier to shoot yourself in the foot that way. So I'd
prefer just retaining it as 'list' or something generic.

> BTW, I thought out how to bring it down to 3 cache lines, but that would
> require taking io_wq_work out of io_kiocb and kmalloc'ing it on demand.
> And there should also be a bunch of nice side effects like improving apoll.

How would this work with the current use of io_wq_work as storage for
whatever bits we're hanging on to? I guess it could work with a prep
series first more cleanly separating it, though I do feel like we've
been getting closer to that already.

Definitely always interested in shrinking io_kiocb, just need to keep
an eye out for the fast(er) paths not needing two allocations (and
frees) for a single request.

-- 
Jens Axboe



* Re: [RFC 0/9] scrap 24 bytes from io_kiocb
  2020-07-12 20:32     ` Jens Axboe
@ 2020-07-13  8:17       ` Pavel Begunkov
  2020-07-13  8:17       ` Pavel Begunkov
  1 sibling, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-13  8:17 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 12/07/2020 23:32, Jens Axboe wrote:
> On 7/12/20 11:34 AM, Pavel Begunkov wrote:
>> On 12/07/2020 18:59, Jens Axboe wrote:
>>> On 7/12/20 3:41 AM, Pavel Begunkov wrote:
>>>> Make io_kiocb slimmer by 24 bytes mainly by revising lists usage. The
>>>> drawback is adding extra kmalloc in draining path, but that's a slow
>>>> path, so meh. It also frees some space for the deferred completion path
>>>> if would be needed in the future, but the main idea here is to shrink it
>>>> to 3 cachelines in the end.
>>>>
>>>> I'm not happy yet with a few details, so that's not final, but it would
>>>> be lovely to hear some feedback.
>>>
>>> I think it looks pretty good, most of the changes are straight forward.
>>> Adding a completion entry that shares the submit space is a good idea,
>>> and really helps bring it together.
>>>
>>> From a quick look, the only part I'm not super crazy about is patch #3.
>>
>> Thanks!
>>
>>> I'd probably rather use a generic list name and not unionize the tw
>>> lists.
>>
>> I don't care much, but without compiler's help always have troubles
>> finding and distinguishing something as generic as "list".
> 
> To me, it's easier to verify that we're doing the right thing when they
> use the same list member. Otherwise you have to cross reference two
> different names, easier to shoot yourself in the foot that way. So I'd
> prefer just retaining it as 'list' or something generic.

If you don't have objections, I'll just leave it "inflight_entry". This
one is easy to grep.

>> BTW, I thought out how to bring it down to 3 cache lines, but that would
>> require taking io_wq_work out of io_kiocb and kmalloc'ing it on demand.
>> And there should also be a bunch of nice side effects like improving apoll.
> 
> How would this work with the current use of io_wq_work as storage for
> whatever bits we're hanging on to? I guess it could work with a prep
> series first more cleanly separating it, though I do feel like we've
> been getting closer to that already.

It's definitely not a single patch. I'm going to prepare a series for
discussion later, and then we'll see whether it worths it.


> Definitely always interested in shrinking io_kiocb, just need to keep
> an eye out for the fast(er) paths not needing two allocations (and
> frees) for a single request.

-- 
Pavel Begunkov


* Re: [RFC 0/9] scrap 24 bytes from io_kiocb
  2020-07-12 20:32     ` Jens Axboe
  2020-07-13  8:17       ` Pavel Begunkov
@ 2020-07-13  8:17       ` Pavel Begunkov
  2020-07-13 14:12         ` Jens Axboe
  1 sibling, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-13  8:17 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 12/07/2020 23:32, Jens Axboe wrote:
> On 7/12/20 11:34 AM, Pavel Begunkov wrote:
>> On 12/07/2020 18:59, Jens Axboe wrote:
>>> On 7/12/20 3:41 AM, Pavel Begunkov wrote:
>>>> Make io_kiocb slimmer by 24 bytes mainly by revising lists usage. The
>>>> drawback is adding extra kmalloc in draining path, but that's a slow
>>>> path, so meh. It also frees some space for the deferred completion path
>>>> if would be needed in the future, but the main idea here is to shrink it
>>>> to 3 cachelines in the end.
>>>>
>>>> I'm not happy yet with a few details, so that's not final, but it would
>>>> be lovely to hear some feedback.
>>>
>>> I think it looks pretty good, most of the changes are straight forward.
>>> Adding a completion entry that shares the submit space is a good idea,
>>> and really helps bring it together.
>>>
>>> From a quick look, the only part I'm not super crazy about is patch #3.
>>
>> Thanks!
>>
>>> I'd probably rather use a generic list name and not unionize the tw
>>> lists.
>>
>> I don't care much, but without compiler's help always have troubles
>> finding and distinguishing something as generic as "list".
> 
> To me, it's easier to verify that we're doing the right thing when they
> use the same list member. Otherwise you have to cross reference two
> different names, easier to shoot yourself in the foot that way. So I'd
> prefer just retaining it as 'list' or something generic.

If you don't have objections, I'll just leave it "inflight_entry". This
one is easy to grep.

>> BTW, I thought out how to bring it down to 3 cache lines, but that would
>> require taking io_wq_work out of io_kiocb and kmalloc'ing it on demand.
>> And there should also be a bunch of nice side effects like improving apoll.
> 
> How would this work with the current use of io_wq_work as storage for
> whatever bits we're hanging on to? I guess it could work with a prep
> series first more cleanly separating it, though I do feel like we've
> been getting closer to that already.

It's definitely not a single patch. I'm going to prepare a series for
discussion later, and then we'll see whether it worth it.


> Definitely always interested in shrinking io_kiocb, just need to keep
> an eye out for the fast(er) paths not needing two allocations (and
> frees) for a single request.

-- 
Pavel Begunkov


* Re: [RFC 0/9] scrap 24 bytes from io_kiocb
  2020-07-13  8:17       ` Pavel Begunkov
@ 2020-07-13 14:12         ` Jens Axboe
  2020-07-13 20:45           ` Pavel Begunkov
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2020-07-13 14:12 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 7/13/20 2:17 AM, Pavel Begunkov wrote:
> On 12/07/2020 23:32, Jens Axboe wrote:
>> On 7/12/20 11:34 AM, Pavel Begunkov wrote:
>>> On 12/07/2020 18:59, Jens Axboe wrote:
>>>> On 7/12/20 3:41 AM, Pavel Begunkov wrote:
>>>>> Make io_kiocb slimmer by 24 bytes mainly by revising lists usage. The
>>>>> drawback is adding extra kmalloc in draining path, but that's a slow
>>>>> path, so meh. It also frees some space for the deferred completion path
>>>>> if would be needed in the future, but the main idea here is to shrink it
>>>>> to 3 cachelines in the end.
>>>>>
>>>>> I'm not happy yet with a few details, so that's not final, but it would
>>>>> be lovely to hear some feedback.
>>>>
>>>> I think it looks pretty good, most of the changes are straight forward.
>>>> Adding a completion entry that shares the submit space is a good idea,
>>>> and really helps bring it together.
>>>>
>>>> From a quick look, the only part I'm not super crazy about is patch #3.
>>>
>>> Thanks!
>>>
>>>> I'd probably rather use a generic list name and not unionize the tw
>>>> lists.
>>>
>>> I don't care much, but without compiler's help always have troubles
>>> finding and distinguishing something as generic as "list".
>>
>> To me, it's easier to verify that we're doing the right thing when they
>> use the same list member. Otherwise you have to cross reference two
>> different names, easier to shoot yourself in the foot that way. So I'd
>> prefer just retaining it as 'list' or something generic.
> 
> If you don't have objections, I'll just leave it "inflight_entry". This
> one is easy to grep.

Sure, don't have strong feelings on the actual name.

>>> BTW, I thought out how to bring it down to 3 cache lines, but that would
>>> require taking io_wq_work out of io_kiocb and kmalloc'ing it on demand.
>>> And there should also be a bunch of nice side effects like improving apoll.
>>
>> How would this work with the current use of io_wq_work as storage for
>> whatever bits we're hanging on to? I guess it could work with a prep
>> series first more cleanly separating it, though I do feel like we've
>> been getting closer to that already.
> 
> It's definitely not a single patch. I'm going to prepare a series for
> discussion later, and then we'll see whether it worth it.

Definitely not. Let's flesh this one out first, then we can move on.

-- 
Jens Axboe



* Re: [RFC 0/9] scrap 24 bytes from io_kiocb
  2020-07-13 14:12         ` Jens Axboe
@ 2020-07-13 20:45           ` Pavel Begunkov
  2020-07-13 21:00             ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2020-07-13 20:45 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 13/07/2020 17:12, Jens Axboe wrote:
> On 7/13/20 2:17 AM, Pavel Begunkov wrote:
>> On 12/07/2020 23:32, Jens Axboe wrote:
>>> On 7/12/20 11:34 AM, Pavel Begunkov wrote:
>>>> On 12/07/2020 18:59, Jens Axboe wrote:
>>>>> On 7/12/20 3:41 AM, Pavel Begunkov wrote:
>>>>>> Make io_kiocb slimmer by 24 bytes mainly by revising lists usage. The
>>>>>> drawback is adding extra kmalloc in draining path, but that's a slow
>>>>>> path, so meh. It also frees some space for the deferred completion path
>>>>>> if would be needed in the future, but the main idea here is to shrink it
>>>>>> to 3 cachelines in the end.
>>>>>>
>>>>>> I'm not happy yet with a few details, so that's not final, but it would
>>>>>> be lovely to hear some feedback.
>>>>>
>>>>> I think it looks pretty good, most of the changes are straight forward.
>>>>> Adding a completion entry that shares the submit space is a good idea,
>>>>> and really helps bring it together.
>>>>>
>>>>> From a quick look, the only part I'm not super crazy about is patch #3.
>>>>
>>>> Thanks!
>>>>
>>>>> I'd probably rather use a generic list name and not unionize the tw
>>>>> lists.
>>>>
>>>> I don't care much, but without compiler's help always have troubles
>>>> finding and distinguishing something as generic as "list".
>>>
>>> To me, it's easier to verify that we're doing the right thing when they
>>> use the same list member. Otherwise you have to cross reference two
>>> different names, easier to shoot yourself in the foot that way. So I'd
>>> prefer just retaining it as 'list' or something generic.
>>
>> If you don't have objections, I'll just leave it "inflight_entry". This
>> one is easy to grep.
> 
> Sure, don't have strong feelings on the actual name.
> 
>>>> BTW, I thought out how to bring it down to 3 cache lines, but that would
>>>> require taking io_wq_work out of io_kiocb and kmalloc'ing it on demand.
>>>> And there should also be a bunch of nice side effects like improving apoll.
>>>
>>> How would this work with the current use of io_wq_work as storage for
>>> whatever bits we're hanging on to? I guess it could work with a prep
>>> series first more cleanly separating it, though I do feel like we've
>>> been getting closer to that already.
>>
>> It's definitely not a single patch. I'm going to prepare a series for
>> discussion later, and then we'll see whether it worth it.
> 
> Definitely not. Let's flesh this one out first, then we can move on.

But not a lot of work either.
I've got a bit lost, do you mean to flesh out the idea or this
"loose 24 bytes" series?

-- 
Pavel Begunkov


* Re: [RFC 0/9] scrap 24 bytes from io_kiocb
  2020-07-13 20:45           ` Pavel Begunkov
@ 2020-07-13 21:00             ` Jens Axboe
  0 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2020-07-13 21:00 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 7/13/20 2:45 PM, Pavel Begunkov wrote:
> On 13/07/2020 17:12, Jens Axboe wrote:
>> On 7/13/20 2:17 AM, Pavel Begunkov wrote:
>>> On 12/07/2020 23:32, Jens Axboe wrote:
>>>> On 7/12/20 11:34 AM, Pavel Begunkov wrote:
>>>>> On 12/07/2020 18:59, Jens Axboe wrote:
>>>>>> On 7/12/20 3:41 AM, Pavel Begunkov wrote:
>>>>>>> Make io_kiocb slimmer by 24 bytes mainly by revising lists usage. The
>>>>>>> drawback is adding extra kmalloc in draining path, but that's a slow
>>>>>>> path, so meh. It also frees some space for the deferred completion path
>>>>>>> if would be needed in the future, but the main idea here is to shrink it
>>>>>>> to 3 cachelines in the end.
>>>>>>>
>>>>>>> I'm not happy yet with a few details, so that's not final, but it would
>>>>>>> be lovely to hear some feedback.
>>>>>>
>>>>>> I think it looks pretty good, most of the changes are straight forward.
>>>>>> Adding a completion entry that shares the submit space is a good idea,
>>>>>> and really helps bring it together.
>>>>>>
>>>>>> From a quick look, the only part I'm not super crazy about is patch #3.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>> I'd probably rather use a generic list name and not unionize the tw
>>>>>> lists.
>>>>>
>>>>> I don't care much, but without compiler's help always have troubles
>>>>> finding and distinguishing something as generic as "list".
>>>>
>>>> To me, it's easier to verify that we're doing the right thing when they
>>>> use the same list member. Otherwise you have to cross reference two
>>>> different names, easier to shoot yourself in the foot that way. So I'd
>>>> prefer just retaining it as 'list' or something generic.
>>>
>>> If you don't have objections, I'll just leave it "inflight_entry". This
>>> one is easy to grep.
>>
>> Sure, don't have strong feelings on the actual name.
>>
>>>>> BTW, I thought out how to bring it down to 3 cache lines, but that would
>>>>> require taking io_wq_work out of io_kiocb and kmalloc'ing it on demand.
>>>>> And there should also be a bunch of nice side effects like improving apoll.
>>>>
>>>> How would this work with the current use of io_wq_work as storage for
>>>> whatever bits we're hanging on to? I guess it could work with a prep
>>>> series first more cleanly separating it, though I do feel like we've
>>>> been getting closer to that already.
>>>
>>> It's definitely not a single patch. I'm going to prepare a series for
>>> discussion later, and then we'll see whether it worth it.
>>
>> Definitely not. Let's flesh this one out first, then we can move on.
> 
> But not a lot of work either.

Great

> I've got a bit lost, do you mean to flesh out the idea or this
> "loose 24 bytes" series?

The latter, but I'm already looking over your v2, so I guess that's
taken care of.

-- 
Jens Axboe


