* [PATCH for-next v2 01/25] io_uring: make reg buf init consistent
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 02/25] io_uring: move defer_list to slow data Pavel Begunkov
` (23 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
The default (i.e. empty) state of register buffer is dummy_ubuf, so set
it to dummy on init instead of NULL.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/rsrc.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index fef46972c327..fd1323482030 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -567,7 +567,7 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
io_buffer_unmap(ctx, &imu);
break;
}
- ctx->user_bufs[i] = NULL;
+ ctx->user_bufs[i] = ctx->dummy_ubuf;
needs_switch = true;
}
@@ -1200,14 +1200,11 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
size_t size;
int ret, nr_pages, i;
- if (!iov->iov_base) {
- *pimu = ctx->dummy_ubuf;
+ *pimu = ctx->dummy_ubuf;
+ if (!iov->iov_base)
return 0;
- }
- *pimu = NULL;
ret = -ENOMEM;
-
pages = io_pin_pages((unsigned long) iov->iov_base, iov->iov_len,
&nr_pages);
if (IS_ERR(pages)) {
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 02/25] io_uring: move defer_list to slow data
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 01/25] io_uring: make reg buf init consistent Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 03/25] io_uring: better caching for ctx timeout fields Pavel Begunkov
` (22 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
draining is slow path, move defer_list to the end where slow data lives
inside the context.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring_types.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index 7c22cf35a7e2..52e91c3df8d5 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -160,7 +160,6 @@ struct io_ring_ctx {
struct io_uring_sqe *sq_sqes;
unsigned cached_sq_head;
unsigned sq_entries;
- struct list_head defer_list;
/*
* Fixed resources fast path, should be accessed only under
@@ -272,8 +271,12 @@ struct io_ring_ctx {
struct work_struct exit_work;
struct list_head tctx_list;
struct completion ref_comp;
+
+ /* io-wq management, e.g. thread count */
u32 iowq_limits[2];
bool iowq_limits_set;
+
+ struct list_head defer_list;
};
};
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 03/25] io_uring: better caching for ctx timeout fields
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 01/25] io_uring: make reg buf init consistent Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 02/25] io_uring: move defer_list to slow data Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement Pavel Begunkov
` (21 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Following timeout fields access patterns, move all of them into a
separate cache line inside ctx, so they don't intervene with normal
completion caching, especially since timeout removals and completion
are separated and the later is done via tw.
It also sheds some bytes from io_ring_ctx, 1216B -> 1152B
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring_types.h | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index 52e91c3df8d5..4f52dcbbda56 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -179,8 +179,6 @@ struct io_ring_ctx {
struct xarray io_bl_xa;
struct list_head io_buffers_cache;
- struct list_head timeout_list;
- struct list_head ltimeout_list;
struct list_head cq_overflow_list;
struct list_head apoll_cache;
struct xarray personalities;
@@ -213,15 +211,11 @@ struct io_ring_ctx {
struct io_ev_fd __rcu *io_ev_fd;
struct wait_queue_head cq_wait;
unsigned cq_extra;
- atomic_t cq_timeouts;
- unsigned cq_last_tm_flush;
} ____cacheline_aligned_in_smp;
struct {
spinlock_t completion_lock;
- spinlock_t timeout_lock;
-
/*
* ->iopoll_list is protected by the ctx->uring_lock for
* io_uring instances that don't use IORING_SETUP_SQPOLL.
@@ -253,6 +247,15 @@ struct io_ring_ctx {
struct list_head io_buffers_pages;
};
+ /* timeouts */
+ struct {
+ spinlock_t timeout_lock;
+ atomic_t cq_timeouts;
+ struct list_head timeout_list;
+ struct list_head ltimeout_list;
+ unsigned cq_last_tm_flush;
+ } ____cacheline_aligned_in_smp;
+
/* Keep this last, we don't need it for the fast path */
struct {
#if defined(CONFIG_UNIX)
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (2 preceding siblings ...)
2022-06-14 14:36 ` [PATCH for-next v2 03/25] io_uring: better caching for ctx timeout fields Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
2022-06-15 7:58 ` Hao Xu
2022-06-14 14:36 ` [PATCH for-next v2 05/25] io_uring: move small helpers to headers Pavel Begunkov
` (20 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Shove all slow path data at the end of ctx and get rid of extra
indention.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring_types.h | 81 +++++++++++++++++++--------------------
1 file changed, 39 insertions(+), 42 deletions(-)
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index 4f52dcbbda56..ca8e25992ece 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -183,7 +183,6 @@ struct io_ring_ctx {
struct list_head apoll_cache;
struct xarray personalities;
u32 pers_next;
- unsigned sq_thread_idle;
} ____cacheline_aligned_in_smp;
/* IRQ completion list, under ->completion_lock */
@@ -230,23 +229,6 @@ struct io_ring_ctx {
struct list_head io_buffers_comp;
} ____cacheline_aligned_in_smp;
- struct io_restriction restrictions;
-
- /* slow path rsrc auxilary data, used by update/register */
- struct {
- struct io_rsrc_node *rsrc_backup_node;
- struct io_mapped_ubuf *dummy_ubuf;
- struct io_rsrc_data *file_data;
- struct io_rsrc_data *buf_data;
-
- struct delayed_work rsrc_put_work;
- struct llist_head rsrc_put_llist;
- struct list_head rsrc_ref_list;
- spinlock_t rsrc_ref_lock;
-
- struct list_head io_buffers_pages;
- };
-
/* timeouts */
struct {
spinlock_t timeout_lock;
@@ -257,30 +239,45 @@ struct io_ring_ctx {
} ____cacheline_aligned_in_smp;
/* Keep this last, we don't need it for the fast path */
- struct {
- #if defined(CONFIG_UNIX)
- struct socket *ring_sock;
- #endif
- /* hashed buffered write serialization */
- struct io_wq_hash *hash_map;
-
- /* Only used for accounting purposes */
- struct user_struct *user;
- struct mm_struct *mm_account;
-
- /* ctx exit and cancelation */
- struct llist_head fallback_llist;
- struct delayed_work fallback_work;
- struct work_struct exit_work;
- struct list_head tctx_list;
- struct completion ref_comp;
-
- /* io-wq management, e.g. thread count */
- u32 iowq_limits[2];
- bool iowq_limits_set;
-
- struct list_head defer_list;
- };
+
+ struct io_restriction restrictions;
+
+ /* slow path rsrc auxilary data, used by update/register */
+ struct io_rsrc_node *rsrc_backup_node;
+ struct io_mapped_ubuf *dummy_ubuf;
+ struct io_rsrc_data *file_data;
+ struct io_rsrc_data *buf_data;
+
+ struct delayed_work rsrc_put_work;
+ struct llist_head rsrc_put_llist;
+ struct list_head rsrc_ref_list;
+ spinlock_t rsrc_ref_lock;
+
+ struct list_head io_buffers_pages;
+
+ #if defined(CONFIG_UNIX)
+ struct socket *ring_sock;
+ #endif
+ /* hashed buffered write serialization */
+ struct io_wq_hash *hash_map;
+
+ /* Only used for accounting purposes */
+ struct user_struct *user;
+ struct mm_struct *mm_account;
+
+ /* ctx exit and cancelation */
+ struct llist_head fallback_llist;
+ struct delayed_work fallback_work;
+ struct work_struct exit_work;
+ struct list_head tctx_list;
+ struct completion ref_comp;
+
+ /* io-wq management, e.g. thread count */
+ u32 iowq_limits[2];
+ bool iowq_limits_set;
+
+ struct list_head defer_list;
+ unsigned sq_thread_idle;
};
enum {
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement
2022-06-14 14:36 ` [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement Pavel Begunkov
@ 2022-06-15 7:58 ` Hao Xu
2022-06-15 10:11 ` Pavel Begunkov
0 siblings, 1 reply; 44+ messages in thread
From: Hao Xu @ 2022-06-15 7:58 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe
On 6/14/22 22:36, Pavel Begunkov wrote:
> Shove all slow path data at the end of ctx and get rid of extra
> indention.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> io_uring/io_uring_types.h | 81 +++++++++++++++++++--------------------
> 1 file changed, 39 insertions(+), 42 deletions(-)
>
> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
> index 4f52dcbbda56..ca8e25992ece 100644
> --- a/io_uring/io_uring_types.h
> +++ b/io_uring/io_uring_types.h
> @@ -183,7 +183,6 @@ struct io_ring_ctx {
> struct list_head apoll_cache;
> struct xarray personalities;
> u32 pers_next;
> - unsigned sq_thread_idle;
SQPOLL is seen as a slow path?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement
2022-06-15 7:58 ` Hao Xu
@ 2022-06-15 10:11 ` Pavel Begunkov
2022-06-15 10:59 ` Hao Xu
0 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-15 10:11 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Jens Axboe
On 6/15/22 08:58, Hao Xu wrote:
> On 6/14/22 22:36, Pavel Begunkov wrote:
>> Shove all slow path data at the end of ctx and get rid of extra
>> indention.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> io_uring/io_uring_types.h | 81 +++++++++++++++++++--------------------
>> 1 file changed, 39 insertions(+), 42 deletions(-)
>>
>> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
>> index 4f52dcbbda56..ca8e25992ece 100644
>> --- a/io_uring/io_uring_types.h
>> +++ b/io_uring/io_uring_types.h
>> @@ -183,7 +183,6 @@ struct io_ring_ctx {
>> struct list_head apoll_cache;
>> struct xarray personalities;
>> u32 pers_next;
>> - unsigned sq_thread_idle;
>
> SQPOLL is seen as a slow path?
SQPOLL is not a slow path, but struct io_ring_ctx::sq_thread_idle
definitely is. Perhaps, you mixed it up with
struct io_sq_data::sq_thread_idle
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement
2022-06-15 10:11 ` Pavel Begunkov
@ 2022-06-15 10:59 ` Hao Xu
0 siblings, 0 replies; 44+ messages in thread
From: Hao Xu @ 2022-06-15 10:59 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe
On 6/15/22 18:11, Pavel Begunkov wrote:
> On 6/15/22 08:58, Hao Xu wrote:
>> On 6/14/22 22:36, Pavel Begunkov wrote:
>>> Shove all slow path data at the end of ctx and get rid of extra
>>> indention.
>>>
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> ---
>>> io_uring/io_uring_types.h | 81 +++++++++++++++++++--------------------
>>> 1 file changed, 39 insertions(+), 42 deletions(-)
>>>
>>> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
>>> index 4f52dcbbda56..ca8e25992ece 100644
>>> --- a/io_uring/io_uring_types.h
>>> +++ b/io_uring/io_uring_types.h
>>> @@ -183,7 +183,6 @@ struct io_ring_ctx {
>>> struct list_head apoll_cache;
>>> struct xarray personalities;
>>> u32 pers_next;
>>> - unsigned sq_thread_idle;
>>
>> SQPOLL is seen as a slow path?
>
> SQPOLL is not a slow path, but struct io_ring_ctx::sq_thread_idle
> definitely is. Perhaps, you mixed it up with
> struct io_sq_data::sq_thread_idle
>
Indeed, thanks.
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH for-next v2 05/25] io_uring: move small helpers to headers
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (3 preceding siblings ...)
2022-06-14 14:36 ` [PATCH for-next v2 04/25] io_uring: refactor ctx slow data placement Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 06/25] io_uring: explain io_wq_work::cancel_seq placement Pavel Begunkov
` (19 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
There is a bunch of inline helpers that will be useful not only to the
core of io_uring, move them to headers.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 22 ----------------------
io_uring/io_uring.h | 22 ++++++++++++++++++++++
2 files changed, 22 insertions(+), 22 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 6a94d1682aaf..3fdb368820c9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -616,14 +616,6 @@ struct sock *io_uring_get_socket(struct file *file)
}
EXPORT_SYMBOL(io_uring_get_socket);
-static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
-{
- if (!*locked) {
- mutex_lock(&ctx->uring_lock);
- *locked = true;
- }
-}
-
static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
{
if (!wq_list_empty(&ctx->submit_state.compl_reqs))
@@ -879,15 +871,6 @@ static void io_prep_async_link(struct io_kiocb *req)
}
}
-static inline void io_req_add_compl_list(struct io_kiocb *req)
-{
- struct io_submit_state *state = &req->ctx->submit_state;
-
- if (!(req->flags & REQ_F_CQE_SKIP))
- state->flush_cqes = true;
- wq_list_add_tail(&req->comp_list, &state->compl_reqs);
-}
-
void io_queue_iowq(struct io_kiocb *req, bool *dont_use)
{
struct io_kiocb *link = io_prep_linked_timeout(req);
@@ -1293,11 +1276,6 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
io_cqring_ev_posted(ctx);
}
-static inline void io_req_complete_state(struct io_kiocb *req)
-{
- req->flags |= REQ_F_COMPLETE_INLINE;
-}
-
inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
{
if (issue_flags & IO_URING_F_COMPLETE_DEFER)
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 3660df80e589..26b669746d61 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -193,6 +193,28 @@ static inline bool io_run_task_work(void)
return false;
}
+static inline void io_req_complete_state(struct io_kiocb *req)
+{
+ req->flags |= REQ_F_COMPLETE_INLINE;
+}
+
+static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
+{
+ if (!*locked) {
+ mutex_lock(&ctx->uring_lock);
+ *locked = true;
+ }
+}
+
+static inline void io_req_add_compl_list(struct io_kiocb *req)
+{
+ struct io_submit_state *state = &req->ctx->submit_state;
+
+ if (!(req->flags & REQ_F_CQE_SKIP))
+ state->flush_cqes = true;
+ wq_list_add_tail(&req->comp_list, &state->compl_reqs);
+}
+
int io_run_task_work_sig(void);
void io_req_complete_failed(struct io_kiocb *req, s32 res);
void __io_req_complete32(struct io_kiocb *req, unsigned int issue_flags,
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 06/25] io_uring: explain io_wq_work::cancel_seq placement
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (4 preceding siblings ...)
2022-06-14 14:36 ` [PATCH for-next v2 05/25] io_uring: move small helpers to headers Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 07/25] io_uring: inline ->registered_rings Pavel Begunkov
` (18 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Add a comment on why we keep ->cancel_seq in struct io_wq_work instead
of struct io_kiocb despite it needed only by io_uring but not io-wq.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io-wq.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/io_uring/io-wq.h b/io_uring/io-wq.h
index ba6eee76d028..3f54ee2a8eeb 100644
--- a/io_uring/io-wq.h
+++ b/io_uring/io-wq.h
@@ -155,6 +155,7 @@ struct io_wq_work_node *wq_stack_extract(struct io_wq_work_node *stack)
struct io_wq_work {
struct io_wq_work_node list;
unsigned flags;
+ /* place it here instead of io_kiocb as it fills padding and saves 4B */
int cancel_seq;
};
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 07/25] io_uring: inline ->registered_rings
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (5 preceding siblings ...)
2022-06-14 14:36 ` [PATCH for-next v2 06/25] io_uring: explain io_wq_work::cancel_seq placement Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 08/25] io_uring: don't set REQ_F_COMPLETE_INLINE in tw Pavel Begunkov
` (17 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
There can be only 16 registered rings, no need to allocate an array for
them separately but store it in tctx.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 1 -
io_uring/tctx.c | 9 ---------
io_uring/tctx.h | 3 ++-
3 files changed, 2 insertions(+), 11 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3fdb368820c9..dbf0dbc87758 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2840,7 +2840,6 @@ void __io_uring_free(struct task_struct *tsk)
WARN_ON_ONCE(tctx->io_wq);
WARN_ON_ONCE(tctx->cached_refs);
- kfree(tctx->registered_rings);
percpu_counter_destroy(&tctx->inflight);
kfree(tctx);
tsk->io_uring = NULL;
diff --git a/io_uring/tctx.c b/io_uring/tctx.c
index f3262eef55d4..6adf659687f8 100644
--- a/io_uring/tctx.c
+++ b/io_uring/tctx.c
@@ -55,16 +55,8 @@ __cold int io_uring_alloc_task_context(struct task_struct *task,
if (unlikely(!tctx))
return -ENOMEM;
- tctx->registered_rings = kcalloc(IO_RINGFD_REG_MAX,
- sizeof(struct file *), GFP_KERNEL);
- if (unlikely(!tctx->registered_rings)) {
- kfree(tctx);
- return -ENOMEM;
- }
-
ret = percpu_counter_init(&tctx->inflight, 0, GFP_KERNEL);
if (unlikely(ret)) {
- kfree(tctx->registered_rings);
kfree(tctx);
return ret;
}
@@ -73,7 +65,6 @@ __cold int io_uring_alloc_task_context(struct task_struct *task,
if (IS_ERR(tctx->io_wq)) {
ret = PTR_ERR(tctx->io_wq);
percpu_counter_destroy(&tctx->inflight);
- kfree(tctx->registered_rings);
kfree(tctx);
return ret;
}
diff --git a/io_uring/tctx.h b/io_uring/tctx.h
index f4964e40d07e..7684713e950f 100644
--- a/io_uring/tctx.h
+++ b/io_uring/tctx.h
@@ -20,8 +20,9 @@ struct io_uring_task {
struct io_wq_work_list task_list;
struct io_wq_work_list prio_task_list;
struct callback_head task_work;
- struct file **registered_rings;
bool task_running;
+
+ struct file *registered_rings[IO_RINGFD_REG_MAX];
};
struct io_tctx_node {
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 08/25] io_uring: don't set REQ_F_COMPLETE_INLINE in tw
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (6 preceding siblings ...)
2022-06-14 14:36 ` [PATCH for-next v2 07/25] io_uring: inline ->registered_rings Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
2022-06-14 14:36 ` [PATCH for-next v2 09/25] io_uring: never defer-complete multi-apoll Pavel Begunkov
` (16 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
io_req_task_complete() enqueues requests for state completion itself, no
need for REQ_F_COMPLETE_INLINE, which is only serve the purpose of not
bloating the kernel.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index dbf0dbc87758..d895f70977b0 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1864,7 +1864,6 @@ inline void io_req_task_complete(struct io_kiocb *req, bool *locked)
{
if (*locked) {
req->cqe.flags |= io_put_kbuf(req, 0);
- io_req_complete_state(req);
io_req_add_compl_list(req);
} else {
req->cqe.flags |= io_put_kbuf(req, IO_URING_F_UNLOCKED);
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 09/25] io_uring: never defer-complete multi-apoll
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (7 preceding siblings ...)
2022-06-14 14:36 ` [PATCH for-next v2 08/25] io_uring: don't set REQ_F_COMPLETE_INLINE in tw Pavel Begunkov
@ 2022-06-14 14:36 ` Pavel Begunkov
2022-06-15 8:05 ` Hao Xu
2022-06-14 14:37 ` [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE Pavel Begunkov
` (15 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Luckily, nnobody completes multi-apoll requests outside the polling
functions, but don't set IO_URING_F_COMPLETE_DEFER in any case as
there is nobody who is catching REQ_F_COMPLETE_INLINE, and so will leak
requests if used.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index d895f70977b0..1fb93fdcfbab 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2149,7 +2149,7 @@ int io_poll_issue(struct io_kiocb *req, bool *locked)
io_tw_lock(req->ctx, locked);
if (unlikely(req->task->flags & PF_EXITING))
return -EFAULT;
- return io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
+ return io_issue_sqe(req, IO_URING_F_NONBLOCK);
}
struct io_wq_work *io_wq_free_work(struct io_wq_work *work)
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 09/25] io_uring: never defer-complete multi-apoll
2022-06-14 14:36 ` [PATCH for-next v2 09/25] io_uring: never defer-complete multi-apoll Pavel Begunkov
@ 2022-06-15 8:05 ` Hao Xu
0 siblings, 0 replies; 44+ messages in thread
From: Hao Xu @ 2022-06-15 8:05 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe
On 6/14/22 22:36, Pavel Begunkov wrote:
> Luckily, nnobody completes multi-apoll requests outside the polling
> functions, but don't set IO_URING_F_COMPLETE_DEFER in any case as
> there is nobody who is catching REQ_F_COMPLETE_INLINE, and so will leak
> requests if used.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> io_uring/io_uring.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index d895f70977b0..1fb93fdcfbab 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -2149,7 +2149,7 @@ int io_poll_issue(struct io_kiocb *req, bool *locked)
> io_tw_lock(req->ctx, locked);
> if (unlikely(req->task->flags & PF_EXITING))
> return -EFAULT;
> - return io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
> + return io_issue_sqe(req, IO_URING_F_NONBLOCK);
> }
>
> struct io_wq_work *io_wq_free_work(struct io_wq_work *work)
Good catch! Thanks.
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (8 preceding siblings ...)
2022-06-14 14:36 ` [PATCH for-next v2 09/25] io_uring: never defer-complete multi-apoll Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-15 8:20 ` Hao Xu
2022-06-14 14:37 ` [PATCH for-next v2 11/25] io_uring: refactor io_req_task_complete() Pavel Begunkov
` (14 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
completion list to io_queue_sqe() as __io_req_complete() is inlined and
we don't want to bloat the kernel.
As now we complete in a more centralised fashion in io_issue_sqe() we
can get rid of the flag and queue to the list directly.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 20 ++++++++------------
io_uring/io_uring.h | 5 -----
io_uring/io_uring_types.h | 3 ---
3 files changed, 8 insertions(+), 20 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1fb93fdcfbab..fcee58c6c35e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1278,17 +1278,14 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
{
- if (issue_flags & IO_URING_F_COMPLETE_DEFER)
- io_req_complete_state(req);
- else
- io_req_complete_post(req);
+ io_req_complete_post(req);
}
void __io_req_complete32(struct io_kiocb *req, unsigned int issue_flags,
u64 extra1, u64 extra2)
{
if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
- io_req_complete_state(req);
+ io_req_add_compl_list(req);
req->extra1 = extra1;
req->extra2 = extra2;
} else {
@@ -2132,9 +2129,12 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
if (creds)
revert_creds(creds);
- if (ret == IOU_OK)
- __io_req_complete(req, issue_flags);
- else if (ret != IOU_ISSUE_SKIP_COMPLETE)
+ if (ret == IOU_OK) {
+ if (issue_flags & IO_URING_F_COMPLETE_DEFER)
+ io_req_add_compl_list(req);
+ else
+ io_req_complete_post(req);
+ } else if (ret != IOU_ISSUE_SKIP_COMPLETE)
return ret;
/* If the op doesn't have a file, we're not polling for it */
@@ -2299,10 +2299,6 @@ static inline void io_queue_sqe(struct io_kiocb *req)
ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
- if (req->flags & REQ_F_COMPLETE_INLINE) {
- io_req_add_compl_list(req);
- return;
- }
/*
* We async punt it if the file wasn't marked NOWAIT, or if the file
* doesn't support non-blocking read/write attempts
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 26b669746d61..2141519e995a 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -193,11 +193,6 @@ static inline bool io_run_task_work(void)
return false;
}
-static inline void io_req_complete_state(struct io_kiocb *req)
-{
- req->flags |= REQ_F_COMPLETE_INLINE;
-}
-
static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
{
if (!*locked) {
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index ca8e25992ece..3228872c199b 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -299,7 +299,6 @@ enum {
REQ_F_POLLED_BIT,
REQ_F_BUFFER_SELECTED_BIT,
REQ_F_BUFFER_RING_BIT,
- REQ_F_COMPLETE_INLINE_BIT,
REQ_F_REISSUE_BIT,
REQ_F_CREDS_BIT,
REQ_F_REFCOUNT_BIT,
@@ -353,8 +352,6 @@ enum {
REQ_F_BUFFER_SELECTED = BIT(REQ_F_BUFFER_SELECTED_BIT),
/* buffer selected from ring, needs commit */
REQ_F_BUFFER_RING = BIT(REQ_F_BUFFER_RING_BIT),
- /* completion is deferred through io_comp_state */
- REQ_F_COMPLETE_INLINE = BIT(REQ_F_COMPLETE_INLINE_BIT),
/* caller should reissue async */
REQ_F_REISSUE = BIT(REQ_F_REISSUE_BIT),
/* supports async reads/writes */
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE
2022-06-14 14:37 ` [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE Pavel Begunkov
@ 2022-06-15 8:20 ` Hao Xu
2022-06-15 10:18 ` Pavel Begunkov
0 siblings, 1 reply; 44+ messages in thread
From: Hao Xu @ 2022-06-15 8:20 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe
On 6/14/22 22:37, Pavel Begunkov wrote:
> REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
> completion list to io_queue_sqe() as __io_req_complete() is inlined and
> we don't want to bloat the kernel.
>
> As now we complete in a more centralised fashion in io_issue_sqe() we
> can get rid of the flag and queue to the list directly.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> io_uring/io_uring.c | 20 ++++++++------------
> io_uring/io_uring.h | 5 -----
> io_uring/io_uring_types.h | 3 ---
> 3 files changed, 8 insertions(+), 20 deletions(-)
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 1fb93fdcfbab..fcee58c6c35e 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1278,17 +1278,14 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
>
> inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
> {
> - if (issue_flags & IO_URING_F_COMPLETE_DEFER)
> - io_req_complete_state(req);
> - else
> - io_req_complete_post(req);
> + io_req_complete_post(req);
> }
>
io_read/write and provide_buffers/remove_buffers are still using
io_req_complete() in their own function. By removing the
IO_URING_F_COMPLETE_DEFER branch they will end in complete_post path
100% which we shouldn't.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE
2022-06-15 8:20 ` Hao Xu
@ 2022-06-15 10:18 ` Pavel Begunkov
2022-06-15 10:19 ` Pavel Begunkov
0 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-15 10:18 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Jens Axboe
On 6/15/22 09:20, Hao Xu wrote:
> On 6/14/22 22:37, Pavel Begunkov wrote:
>> REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
>> completion list to io_queue_sqe() as __io_req_complete() is inlined and
>> we don't want to bloat the kernel.
>>
>> As now we complete in a more centralised fashion in io_issue_sqe() we
>> can get rid of the flag and queue to the list directly.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> io_uring/io_uring.c | 20 ++++++++------------
>> io_uring/io_uring.h | 5 -----
>> io_uring/io_uring_types.h | 3 ---
>> 3 files changed, 8 insertions(+), 20 deletions(-)
>>
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 1fb93fdcfbab..fcee58c6c35e 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -1278,17 +1278,14 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
>> inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
>> {
>> - if (issue_flags & IO_URING_F_COMPLETE_DEFER)
>> - io_req_complete_state(req);
>> - else
>> - io_req_complete_post(req);
>> + io_req_complete_post(req);
>> }
>
> io_read/write and provide_buffers/remove_buffers are still using
> io_req_complete() in their own function. By removing the
> IO_URING_F_COMPLETE_DEFER branch they will end in complete_post path
> 100% which we shouldn't.
Old provided buffers are such a useful feature that Jens adds
a new ring-based version of it, so I couldn't care less about
those two.
I any case, let's leave it to follow ups. Those locking is a
weird construct and shouldn't be done this ad-hook way, it's
a potential bug nest
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE
2022-06-15 10:18 ` Pavel Begunkov
@ 2022-06-15 10:19 ` Pavel Begunkov
0 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-15 10:19 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Jens Axboe
On 6/15/22 11:18, Pavel Begunkov wrote:
> On 6/15/22 09:20, Hao Xu wrote:
>> On 6/14/22 22:37, Pavel Begunkov wrote:
>>> REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
>>> completion list to io_queue_sqe() as __io_req_complete() is inlined and
>>> we don't want to bloat the kernel.
>>>
>>> As now we complete in a more centralised fashion in io_issue_sqe() we
>>> can get rid of the flag and queue to the list directly.
>>>
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> ---
>>> io_uring/io_uring.c | 20 ++++++++------------
>>> io_uring/io_uring.h | 5 -----
>>> io_uring/io_uring_types.h | 3 ---
>>> 3 files changed, 8 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index 1fb93fdcfbab..fcee58c6c35e 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -1278,17 +1278,14 @@ static void io_req_complete_post32(struct io_kiocb *req, u64 extra1, u64 extra2)
>>> inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
>>> {
>>> - if (issue_flags & IO_URING_F_COMPLETE_DEFER)
>>> - io_req_complete_state(req);
>>> - else
>>> - io_req_complete_post(req);
>>> + io_req_complete_post(req);
>>> }
>>
>> io_read/write and provide_buffers/remove_buffers are still using
>> io_req_complete() in their own function. By removing the
>> IO_URING_F_COMPLETE_DEFER branch they will end in complete_post path
>> 100% which we shouldn't.
>
> Old provided buffers are such a useful feature that Jens adds
> a new ring-based version of it, so I couldn't care less about
> those two.
>
> I any case, let's leave it to follow ups. Those locking is a
> weird construct and shouldn't be done this ad-hook way, it's
> a potential bug nest
Ok, missed io_read/write part, that's a problem, agree
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH for-next v2 11/25] io_uring: refactor io_req_task_complete()
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (9 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 10/25] io_uring: kill REQ_F_COMPLETE_INLINE Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-14 17:45 ` Hao Xu
2022-06-14 14:37 ` [PATCH for-next v2 12/25] io_uring: don't inline io_put_kbuf Pavel Begunkov
` (13 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Clean up io_req_task_complete() and deduplicate io_put_kbuf() calls.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index fcee58c6c35e..0f6edf82f262 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1857,15 +1857,19 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
return ret;
}
-inline void io_req_task_complete(struct io_kiocb *req, bool *locked)
+
+void io_req_task_complete(struct io_kiocb *req, bool *locked)
{
- if (*locked) {
- req->cqe.flags |= io_put_kbuf(req, 0);
+ if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
+ unsigned issue_flags = *locked ? IO_URING_F_UNLOCKED : 0;
+
+ req->cqe.flags |= io_put_kbuf(req, issue_flags);
+ }
+
+ if (*locked)
io_req_add_compl_list(req);
- } else {
- req->cqe.flags |= io_put_kbuf(req, IO_URING_F_UNLOCKED);
+ else
io_req_complete_post(req);
- }
}
/*
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 11/25] io_uring: refactor io_req_task_complete()
2022-06-14 14:37 ` [PATCH for-next v2 11/25] io_uring: refactor io_req_task_complete() Pavel Begunkov
@ 2022-06-14 17:45 ` Hao Xu
2022-06-14 17:52 ` Pavel Begunkov
0 siblings, 1 reply; 44+ messages in thread
From: Hao Xu @ 2022-06-14 17:45 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe
On 6/14/22 22:37, Pavel Begunkov wrote:
> Clean up io_req_task_complete() and deduplicate io_put_kbuf() calls.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> io_uring/io_uring.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index fcee58c6c35e..0f6edf82f262 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1857,15 +1857,19 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
>
> return ret;
> }
> -inline void io_req_task_complete(struct io_kiocb *req, bool *locked)
> +
> +void io_req_task_complete(struct io_kiocb *req, bool *locked)
> {
> - if (*locked) {
> - req->cqe.flags |= io_put_kbuf(req, 0);
> + if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
> + unsigned issue_flags = *locked ? IO_URING_F_UNLOCKED : 0;
should be *locked ? 0 : IO_URING_F_UNLOCKED; I think?. I haven't look
into the whole series carefully, will do that tomorrow.
> +
> + req->cqe.flags |= io_put_kbuf(req, issue_flags);
> + }
> +
> + if (*locked)
> io_req_add_compl_list(req);
> - } else {
> - req->cqe.flags |= io_put_kbuf(req, IO_URING_F_UNLOCKED);
> + else
> io_req_complete_post(req);
> - }
> }
>
> /*
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 11/25] io_uring: refactor io_req_task_complete()
2022-06-14 17:45 ` Hao Xu
@ 2022-06-14 17:52 ` Pavel Begunkov
0 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 17:52 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Jens Axboe
On 6/14/22 18:45, Hao Xu wrote:
> On 6/14/22 22:37, Pavel Begunkov wrote:
>> Clean up io_req_task_complete() and deduplicate io_put_kbuf() calls.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> io_uring/io_uring.c | 16 ++++++++++------
>> 1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index fcee58c6c35e..0f6edf82f262 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -1857,15 +1857,19 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
>> return ret;
>> }
>> -inline void io_req_task_complete(struct io_kiocb *req, bool *locked)
>> +
>> +void io_req_task_complete(struct io_kiocb *req, bool *locked)
>> {
>> - if (*locked) {
>> - req->cqe.flags |= io_put_kbuf(req, 0);
>> + if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
>> + unsigned issue_flags = *locked ? IO_URING_F_UNLOCKED : 0;
>
> should be *locked ? 0 : IO_URING_F_UNLOCKED; I think?. I haven't look
> into the whole series carefully, will do that tomorrow.
Yeah, it should... Thanks
>> +
>> + req->cqe.flags |= io_put_kbuf(req, issue_flags);
>> + }
>> +
>> + if (*locked)
>> io_req_add_compl_list(req);
>> - } else {
>> - req->cqe.flags |= io_put_kbuf(req, IO_URING_F_UNLOCKED);
>> + else
>> io_req_complete_post(req);
>> - }
>> }
>> /*
>
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH for-next v2 12/25] io_uring: don't inline io_put_kbuf
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (10 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 11/25] io_uring: refactor io_req_task_complete() Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 13/25] io_uring: remove check_cq checking from hot paths Pavel Begunkov
` (12 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
io_put_kbuf() is huge, don't bloat the kernel with inlining.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/kbuf.c | 33 +++++++++++++++++++++++++++++++++
io_uring/kbuf.h | 38 ++++++--------------------------------
2 files changed, 39 insertions(+), 32 deletions(-)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 9cdbc018fd64..6f2adb481a0d 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -81,6 +81,39 @@ static int io_buffer_add_list(struct io_ring_ctx *ctx,
return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL));
}
+unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags)
+{
+ unsigned int cflags;
+
+ /*
+ * We can add this buffer back to two lists:
+ *
+ * 1) The io_buffers_cache list. This one is protected by the
+ * ctx->uring_lock. If we already hold this lock, add back to this
+ * list as we can grab it from issue as well.
+ * 2) The io_buffers_comp list. This one is protected by the
+ * ctx->completion_lock.
+ *
+ * We migrate buffers from the comp_list to the issue cache list
+ * when we need one.
+ */
+ if (req->flags & REQ_F_BUFFER_RING) {
+ /* no buffers to recycle for this case */
+ cflags = __io_put_kbuf_list(req, NULL);
+ } else if (issue_flags & IO_URING_F_UNLOCKED) {
+ struct io_ring_ctx *ctx = req->ctx;
+
+ spin_lock(&ctx->completion_lock);
+ cflags = __io_put_kbuf_list(req, &ctx->io_buffers_comp);
+ spin_unlock(&ctx->completion_lock);
+ } else {
+ lockdep_assert_held(&req->ctx->uring_lock);
+
+ cflags = __io_put_kbuf_list(req, &req->ctx->io_buffers_cache);
+ }
+ return cflags;
+}
+
static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
struct io_buffer_list *bl)
{
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 80b6df2c7535..5da3d4039aed 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -47,6 +47,8 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags);
int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg);
int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg);
+unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags);
+
static inline bool io_do_buffer_select(struct io_kiocb *req)
{
if (!(req->flags & REQ_F_BUFFER_SELECT))
@@ -70,7 +72,8 @@ static inline void io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
__io_kbuf_recycle(req, issue_flags);
}
-static unsigned int __io_put_kbuf(struct io_kiocb *req, struct list_head *list)
+static inline unsigned int __io_put_kbuf_list(struct io_kiocb *req,
+ struct list_head *list)
{
if (req->flags & REQ_F_BUFFER_RING) {
if (req->buf_list)
@@ -90,44 +93,15 @@ static inline unsigned int io_put_kbuf_comp(struct io_kiocb *req)
if (!(req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)))
return 0;
- return __io_put_kbuf(req, &req->ctx->io_buffers_comp);
+ return __io_put_kbuf_list(req, &req->ctx->io_buffers_comp);
}
static inline unsigned int io_put_kbuf(struct io_kiocb *req,
unsigned issue_flags)
{
- unsigned int cflags;
if (!(req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)))
return 0;
-
- /*
- * We can add this buffer back to two lists:
- *
- * 1) The io_buffers_cache list. This one is protected by the
- * ctx->uring_lock. If we already hold this lock, add back to this
- * list as we can grab it from issue as well.
- * 2) The io_buffers_comp list. This one is protected by the
- * ctx->completion_lock.
- *
- * We migrate buffers from the comp_list to the issue cache list
- * when we need one.
- */
- if (req->flags & REQ_F_BUFFER_RING) {
- /* no buffers to recycle for this case */
- cflags = __io_put_kbuf(req, NULL);
- } else if (issue_flags & IO_URING_F_UNLOCKED) {
- struct io_ring_ctx *ctx = req->ctx;
-
- spin_lock(&ctx->completion_lock);
- cflags = __io_put_kbuf(req, &ctx->io_buffers_comp);
- spin_unlock(&ctx->completion_lock);
- } else {
- lockdep_assert_held(&req->ctx->uring_lock);
-
- cflags = __io_put_kbuf(req, &req->ctx->io_buffers_cache);
- }
-
- return cflags;
+ return __io_put_kbuf(req, issue_flags);
}
#endif
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 13/25] io_uring: remove check_cq checking from hot paths
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (11 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 12/25] io_uring: don't inline io_put_kbuf Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 14/25] io_uring: poll: remove unnecessary req->ref set Pavel Begunkov
` (11 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
All ctx->check_cq events are slow path, don't test every single flag one
by one in the hot path, but add a common guarding if.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 0f6edf82f262..e43eccf173ff 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1807,24 +1807,25 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
int ret = 0;
unsigned long check_cq;
+ check_cq = READ_ONCE(ctx->check_cq);
+ if (unlikely(check_cq)) {
+ if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
+ __io_cqring_overflow_flush(ctx, false);
+ /*
+ * Similarly do not spin if we have not informed the user of any
+ * dropped CQE.
+ */
+ if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT))
+ return -EBADR;
+ }
/*
* Don't enter poll loop if we already have events pending.
* If we do, we can potentially be spinning for commands that
* already triggered a CQE (eg in error).
*/
- check_cq = READ_ONCE(ctx->check_cq);
- if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
- __io_cqring_overflow_flush(ctx, false);
if (io_cqring_events(ctx))
return 0;
- /*
- * Similarly do not spin if we have not informed the user of any
- * dropped CQE.
- */
- if (unlikely(check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT)))
- return -EBADR;
-
do {
/*
* If a submit got punted to a workqueue, we can have the
@@ -2752,12 +2753,15 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
ret = io_run_task_work_sig();
if (ret || io_should_wake(iowq))
return ret;
+
check_cq = READ_ONCE(ctx->check_cq);
- /* let the caller flush overflows, retry */
- if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
- return 1;
- if (unlikely(check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT)))
- return -EBADR;
+ if (unlikely(check_cq)) {
+ /* let the caller flush overflows, retry */
+ if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
+ return 1;
+ if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT))
+ return -EBADR;
+ }
if (!schedule_hrtimeout(&timeout, HRTIMER_MODE_ABS))
return -ETIME;
return 1;
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 14/25] io_uring: poll: remove unnecessary req->ref set
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (12 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 13/25] io_uring: remove check_cq checking from hot paths Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 15/25] io_uring: switch cancel_hash to use per entry spinlock Pavel Begunkov
` (10 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence, Hao Xu
From: Hao Xu <[email protected]>
We now don't need to set req->refcount for poll requests since the
reworked poll code ensures no request release race.
Signed-off-by: Hao Xu <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/poll.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 0df5eca93b16..73584c4e3e9b 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -683,7 +683,6 @@ int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if ((flags & IORING_POLL_ADD_MULTI) && (req->flags & REQ_F_CQE_SKIP))
return -EINVAL;
- io_req_set_refcount(req);
req->apoll_events = poll->events = io_poll_parse_events(sqe, flags);
return 0;
}
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 15/25] io_uring: switch cancel_hash to use per entry spinlock
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (13 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 14/25] io_uring: poll: remove unnecessary req->ref set Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 16/25] io_uring: pass poll_find lock back Pavel Begunkov
` (9 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence, Hao Xu
From: Hao Xu <[email protected]>
Add a new io_hash_bucket structure so that each bucket in cancel_hash
has separate spinlock. Use per entry lock for cancel_hash, this removes
some completion lock invocation and remove contension between different
cancel_hash entries.
Signed-off-by: Hao Xu <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/cancel.c | 14 ++++++-
io_uring/cancel.h | 6 +++
io_uring/fdinfo.c | 9 +++--
io_uring/io_uring.c | 8 ++--
io_uring/io_uring_types.h | 2 +-
io_uring/poll.c | 80 ++++++++++++++++++++++++---------------
6 files changed, 79 insertions(+), 40 deletions(-)
diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 83cceb52d82d..6f2888388a40 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -93,14 +93,14 @@ int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
if (!ret)
return 0;
- spin_lock(&ctx->completion_lock);
ret = io_poll_cancel(ctx, cd);
if (ret != -ENOENT)
goto out;
+ spin_lock(&ctx->completion_lock);
if (!(cd->flags & IORING_ASYNC_CANCEL_FD))
ret = io_timeout_cancel(ctx, cd);
-out:
spin_unlock(&ctx->completion_lock);
+out:
return ret;
}
@@ -192,3 +192,13 @@ int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
io_req_set_res(req, ret, 0);
return IOU_OK;
}
+
+void init_hash_table(struct io_hash_bucket *hash_table, unsigned size)
+{
+ unsigned int i;
+
+ for (i = 0; i < size; i++) {
+ spin_lock_init(&hash_table[i].lock);
+ INIT_HLIST_HEAD(&hash_table[i].list);
+ }
+}
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index 4f35d8696325..556a7dcf160e 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -4,3 +4,9 @@ int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags);
int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd);
+void init_hash_table(struct io_hash_bucket *hash_table, unsigned size);
+
+struct io_hash_bucket {
+ spinlock_t lock;
+ struct hlist_head list;
+} ____cacheline_aligned_in_smp;
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index fcedde4b4b1e..f941c73f5502 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -13,6 +13,7 @@
#include "io_uring.h"
#include "sqpoll.h"
#include "fdinfo.h"
+#include "cancel.h"
#ifdef CONFIG_PROC_FS
static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id,
@@ -157,17 +158,19 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
mutex_unlock(&ctx->uring_lock);
seq_puts(m, "PollList:\n");
- spin_lock(&ctx->completion_lock);
for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
- struct hlist_head *list = &ctx->cancel_hash[i];
+ struct io_hash_bucket *hb = &ctx->cancel_hash[i];
struct io_kiocb *req;
- hlist_for_each_entry(req, list, hash_node)
+ spin_lock(&hb->lock);
+ hlist_for_each_entry(req, &hb->list, hash_node)
seq_printf(m, " op=%d, task_works=%d\n", req->opcode,
task_work_pending(req->task));
+ spin_unlock(&hb->lock);
}
seq_puts(m, "CqOverflowList:\n");
+ spin_lock(&ctx->completion_lock);
list_for_each_entry(ocqe, &ctx->cq_overflow_list, list) {
struct io_uring_cqe *cqe = &ocqe->cqe;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index e43eccf173ff..4d94757e6e28 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -717,11 +717,13 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
if (hash_bits <= 0)
hash_bits = 1;
ctx->cancel_hash_bits = hash_bits;
- ctx->cancel_hash = kmalloc((1U << hash_bits) * sizeof(struct hlist_head),
- GFP_KERNEL);
+ ctx->cancel_hash =
+ kmalloc((1U << hash_bits) * sizeof(struct io_hash_bucket),
+ GFP_KERNEL);
if (!ctx->cancel_hash)
goto err;
- __hash_init(ctx->cancel_hash, 1U << hash_bits);
+
+ init_hash_table(ctx->cancel_hash, 1U << hash_bits);
ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
if (!ctx->dummy_ubuf)
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index 3228872c199b..aba0f8cd6f49 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -222,7 +222,7 @@ struct io_ring_ctx {
* manipulate the list, hence no extra locking is needed there.
*/
struct io_wq_work_list iopoll_list;
- struct hlist_head *cancel_hash;
+ struct io_hash_bucket *cancel_hash;
unsigned cancel_hash_bits;
bool poll_multi_queue;
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 73584c4e3e9b..7f6b16f687b0 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -19,6 +19,7 @@
#include "opdef.h"
#include "kbuf.h"
#include "poll.h"
+#include "cancel.h"
struct io_poll_update {
struct file *file;
@@ -73,10 +74,22 @@ static struct io_poll *io_poll_get_single(struct io_kiocb *req)
static void io_poll_req_insert(struct io_kiocb *req)
{
struct io_ring_ctx *ctx = req->ctx;
- struct hlist_head *list;
+ u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
+ struct io_hash_bucket *hb = &ctx->cancel_hash[index];
- list = &ctx->cancel_hash[hash_long(req->cqe.user_data, ctx->cancel_hash_bits)];
- hlist_add_head(&req->hash_node, list);
+ spin_lock(&hb->lock);
+ hlist_add_head(&req->hash_node, &hb->list);
+ spin_unlock(&hb->lock);
+}
+
+static void io_poll_req_delete(struct io_kiocb *req, struct io_ring_ctx *ctx)
+{
+ u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
+ spinlock_t *lock = &ctx->cancel_hash[index].lock;
+
+ spin_lock(lock);
+ hash_del(&req->hash_node);
+ spin_unlock(lock);
}
static void io_init_poll_iocb(struct io_poll *poll, __poll_t events,
@@ -220,8 +233,8 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
}
io_poll_remove_entries(req);
+ io_poll_req_delete(req, ctx);
spin_lock(&ctx->completion_lock);
- hash_del(&req->hash_node);
req->cqe.flags = 0;
__io_req_complete_post(req);
io_commit_cqring(ctx);
@@ -231,7 +244,6 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
{
- struct io_ring_ctx *ctx = req->ctx;
int ret;
ret = io_poll_check_events(req, locked);
@@ -239,9 +251,7 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
return;
io_poll_remove_entries(req);
- spin_lock(&ctx->completion_lock);
- hash_del(&req->hash_node);
- spin_unlock(&ctx->completion_lock);
+ io_poll_req_delete(req, req->ctx);
if (!ret)
io_req_task_submit(req, locked);
@@ -435,9 +445,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
return 0;
}
- spin_lock(&ctx->completion_lock);
io_poll_req_insert(req);
- spin_unlock(&ctx->completion_lock);
if (mask && (poll->events & EPOLLET)) {
/* can't multishot if failed, just queue the event we've got */
@@ -534,32 +542,31 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
bool found = false;
int i;
- spin_lock(&ctx->completion_lock);
for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
- struct hlist_head *list;
+ struct io_hash_bucket *hb = &ctx->cancel_hash[i];
- list = &ctx->cancel_hash[i];
- hlist_for_each_entry_safe(req, tmp, list, hash_node) {
+ spin_lock(&hb->lock);
+ hlist_for_each_entry_safe(req, tmp, &hb->list, hash_node) {
if (io_match_task_safe(req, tsk, cancel_all)) {
hlist_del_init(&req->hash_node);
io_poll_cancel_req(req);
found = true;
}
}
+ spin_unlock(&hb->lock);
}
- spin_unlock(&ctx->completion_lock);
return found;
}
static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
struct io_cancel_data *cd)
- __must_hold(&ctx->completion_lock)
{
- struct hlist_head *list;
struct io_kiocb *req;
+ u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
+ struct io_hash_bucket *hb = &ctx->cancel_hash[index];
- list = &ctx->cancel_hash[hash_long(cd->data, ctx->cancel_hash_bits)];
- hlist_for_each_entry(req, list, hash_node) {
+ spin_lock(&hb->lock);
+ hlist_for_each_entry(req, &hb->list, hash_node) {
if (cd->data != req->cqe.user_data)
continue;
if (poll_only && req->opcode != IORING_OP_POLL_ADD)
@@ -571,21 +578,21 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
}
return req;
}
+ spin_unlock(&hb->lock);
return NULL;
}
static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
struct io_cancel_data *cd)
- __must_hold(&ctx->completion_lock)
{
struct io_kiocb *req;
int i;
for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
- struct hlist_head *list;
+ struct io_hash_bucket *hb = &ctx->cancel_hash[i];
- list = &ctx->cancel_hash[i];
- hlist_for_each_entry(req, list, hash_node) {
+ spin_lock(&hb->lock);
+ hlist_for_each_entry(req, &hb->list, hash_node) {
if (!(cd->flags & IORING_ASYNC_CANCEL_ANY) &&
req->file != cd->file)
continue;
@@ -594,12 +601,12 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
req->work.cancel_seq = cd->seq;
return req;
}
+ spin_unlock(&hb->lock);
}
return NULL;
}
static bool io_poll_disarm(struct io_kiocb *req)
- __must_hold(&ctx->completion_lock)
{
if (!io_poll_get_ownership(req))
return false;
@@ -609,17 +616,23 @@ static bool io_poll_disarm(struct io_kiocb *req)
}
int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
- __must_hold(&ctx->completion_lock)
{
struct io_kiocb *req;
+ u32 index;
+ spinlock_t *lock;
if (cd->flags & (IORING_ASYNC_CANCEL_FD|IORING_ASYNC_CANCEL_ANY))
req = io_poll_file_find(ctx, cd);
else
req = io_poll_find(ctx, false, cd);
- if (!req)
+ if (!req) {
return -ENOENT;
+ } else {
+ index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
+ lock = &ctx->cancel_hash[index].lock;
+ }
io_poll_cancel_req(req);
+ spin_unlock(lock);
return 0;
}
@@ -713,18 +726,23 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
struct io_poll_update *poll_update = io_kiocb_to_cmd(req);
struct io_cancel_data cd = { .data = poll_update->old_user_data, };
struct io_ring_ctx *ctx = req->ctx;
+ u32 index = hash_long(cd.data, ctx->cancel_hash_bits);
+ spinlock_t *lock = &ctx->cancel_hash[index].lock;
struct io_kiocb *preq;
int ret2, ret = 0;
bool locked;
- spin_lock(&ctx->completion_lock);
preq = io_poll_find(ctx, true, &cd);
- if (!preq || !io_poll_disarm(preq)) {
- spin_unlock(&ctx->completion_lock);
- ret = preq ? -EALREADY : -ENOENT;
+ if (!preq) {
+ ret = -ENOENT;
+ goto out;
+ }
+ ret2 = io_poll_disarm(preq);
+ spin_unlock(lock);
+ if (!ret2) {
+ ret = -EALREADY;
goto out;
}
- spin_unlock(&ctx->completion_lock);
if (poll_update->update_events || poll_update->update_user_data) {
/* only mask one event flags, keep behavior flags */
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 16/25] io_uring: pass poll_find lock back
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (14 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 15/25] io_uring: switch cancel_hash to use per entry spinlock Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 17/25] io_uring: clean up io_try_cancel Pavel Begunkov
` (8 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Instead of using implicit knowledge of what is locked or not after
io_poll_find() and co returns, pass back a pointer to the locked
bucket if any. If set the user must to unlock the spinlock.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/poll.c | 46 ++++++++++++++++++++++++++--------------------
1 file changed, 26 insertions(+), 20 deletions(-)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 7f6b16f687b0..7fc4aafcca95 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -559,12 +559,15 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
}
static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
- struct io_cancel_data *cd)
+ struct io_cancel_data *cd,
+ struct io_hash_bucket **out_bucket)
{
struct io_kiocb *req;
u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
struct io_hash_bucket *hb = &ctx->cancel_hash[index];
+ *out_bucket = NULL;
+
spin_lock(&hb->lock);
hlist_for_each_entry(req, &hb->list, hash_node) {
if (cd->data != req->cqe.user_data)
@@ -576,6 +579,7 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
continue;
req->work.cancel_seq = cd->seq;
}
+ *out_bucket = hb;
return req;
}
spin_unlock(&hb->lock);
@@ -583,11 +587,14 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
}
static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
- struct io_cancel_data *cd)
+ struct io_cancel_data *cd,
+ struct io_hash_bucket **out_bucket)
{
struct io_kiocb *req;
int i;
+ *out_bucket = NULL;
+
for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
struct io_hash_bucket *hb = &ctx->cancel_hash[i];
@@ -599,6 +606,7 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
if (cd->seq == req->work.cancel_seq)
continue;
req->work.cancel_seq = cd->seq;
+ *out_bucket = hb;
return req;
}
spin_unlock(&hb->lock);
@@ -617,23 +625,19 @@ static bool io_poll_disarm(struct io_kiocb *req)
int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
{
+ struct io_hash_bucket *bucket;
struct io_kiocb *req;
- u32 index;
- spinlock_t *lock;
if (cd->flags & (IORING_ASYNC_CANCEL_FD|IORING_ASYNC_CANCEL_ANY))
- req = io_poll_file_find(ctx, cd);
+ req = io_poll_file_find(ctx, cd, &bucket);
else
- req = io_poll_find(ctx, false, cd);
- if (!req) {
- return -ENOENT;
- } else {
- index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
- lock = &ctx->cancel_hash[index].lock;
- }
- io_poll_cancel_req(req);
- spin_unlock(lock);
- return 0;
+ req = io_poll_find(ctx, false, cd, &bucket);
+
+ if (req)
+ io_poll_cancel_req(req);
+ if (bucket)
+ spin_unlock(&bucket->lock);
+ return req ? 0 : -ENOENT;
}
static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
@@ -726,19 +730,21 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
struct io_poll_update *poll_update = io_kiocb_to_cmd(req);
struct io_cancel_data cd = { .data = poll_update->old_user_data, };
struct io_ring_ctx *ctx = req->ctx;
- u32 index = hash_long(cd.data, ctx->cancel_hash_bits);
- spinlock_t *lock = &ctx->cancel_hash[index].lock;
+ struct io_hash_bucket *bucket;
struct io_kiocb *preq;
int ret2, ret = 0;
bool locked;
- preq = io_poll_find(ctx, true, &cd);
+ preq = io_poll_find(ctx, true, &cd, &bucket);
+ if (preq)
+ ret2 = io_poll_disarm(preq);
+ if (bucket)
+ spin_unlock(&bucket->lock);
+
if (!preq) {
ret = -ENOENT;
goto out;
}
- ret2 = io_poll_disarm(preq);
- spin_unlock(lock);
if (!ret2) {
ret = -EALREADY;
goto out;
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 17/25] io_uring: clean up io_try_cancel
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (15 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 16/25] io_uring: pass poll_find lock back Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 18/25] io_uring: limit number hash buckets Pavel Begunkov
` (7 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Get rid of an unnecessary extra goto in io_try_cancel() and simplify the
function.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/cancel.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 6f2888388a40..a253e2ad22eb 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -95,12 +95,12 @@ int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
ret = io_poll_cancel(ctx, cd);
if (ret != -ENOENT)
- goto out;
+ return ret;
+
spin_lock(&ctx->completion_lock);
if (!(cd->flags & IORING_ASYNC_CANCEL_FD))
ret = io_timeout_cancel(ctx, cd);
spin_unlock(&ctx->completion_lock);
-out:
return ret;
}
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 18/25] io_uring: limit number hash buckets
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (16 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 17/25] io_uring: clean up io_try_cancel Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 19/25] io_uring: clean up io_ring_ctx_alloc Pavel Begunkov
` (6 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Don't allocate to many hash/cancellation buckets, there might be too
many, clamp it to 8 bits, or 256 * 64B = 16KB. We don't usually have too
many requests, and 256 buckets should be enough, especially since we
do hash search only in the cancellation path.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4d94757e6e28..2a7a5db12a0e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -710,12 +710,12 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
/*
* Use 5 bits less than the max cq entries, that should give us around
- * 32 entries per hash list if totally full and uniformly spread.
+ * 32 entries per hash list if totally full and uniformly spread, but
+ * don't keep too many buckets to not overconsume memory.
*/
- hash_bits = ilog2(p->cq_entries);
- hash_bits -= 5;
- if (hash_bits <= 0)
- hash_bits = 1;
+ hash_bits = ilog2(p->cq_entries) - 5;
+ hash_bits = clamp(hash_bits, 1, 8);
+
ctx->cancel_hash_bits = hash_bits;
ctx->cancel_hash =
kmalloc((1U << hash_bits) * sizeof(struct io_hash_bucket),
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 19/25] io_uring: clean up io_ring_ctx_alloc
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (17 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 18/25] io_uring: limit number hash buckets Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-15 8:46 ` Hao Xu
2022-06-14 14:37 ` [PATCH for-next v2 20/25] io_uring: use state completion infra for poll reqs Pavel Begunkov
` (5 subsequent siblings)
24 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Add a variable for the number of hash buckets in io_ring_ctx_alloc(),
makes it more readable.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 2a7a5db12a0e..15d209f334eb 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -700,6 +700,8 @@ static __cold void io_fallback_req_func(struct work_struct *work)
static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
{
struct io_ring_ctx *ctx;
+ unsigned hash_buckets;
+ size_t hash_size;
int hash_bits;
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -715,15 +717,15 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
*/
hash_bits = ilog2(p->cq_entries) - 5;
hash_bits = clamp(hash_bits, 1, 8);
+ hash_buckets = 1U << hash_bits;
+ hash_size = hash_buckets * sizeof(struct io_hash_bucket);
ctx->cancel_hash_bits = hash_bits;
- ctx->cancel_hash =
- kmalloc((1U << hash_bits) * sizeof(struct io_hash_bucket),
- GFP_KERNEL);
+ ctx->cancel_hash = kmalloc(hash_size, GFP_KERNEL);
if (!ctx->cancel_hash)
goto err;
- init_hash_table(ctx->cancel_hash, 1U << hash_bits);
+ init_hash_table(ctx->cancel_hash, hash_buckets);
ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
if (!ctx->dummy_ubuf)
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 19/25] io_uring: clean up io_ring_ctx_alloc
2022-06-14 14:37 ` [PATCH for-next v2 19/25] io_uring: clean up io_ring_ctx_alloc Pavel Begunkov
@ 2022-06-15 8:46 ` Hao Xu
0 siblings, 0 replies; 44+ messages in thread
From: Hao Xu @ 2022-06-15 8:46 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe
On 6/14/22 22:37, Pavel Begunkov wrote:
> Add a variable for the number of hash buckets in io_ring_ctx_alloc(),
> makes it more readable.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> io_uring/io_uring.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 2a7a5db12a0e..15d209f334eb 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -700,6 +700,8 @@ static __cold void io_fallback_req_func(struct work_struct *work)
> static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
> {
> struct io_ring_ctx *ctx;
> + unsigned hash_buckets;
personally prefer nr_something like nr_buckets or nr_hash_buckets
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH for-next v2 20/25] io_uring: use state completion infra for poll reqs
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (18 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 19/25] io_uring: clean up io_ring_ctx_alloc Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER Pavel Begunkov
` (4 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Use io_req_task_complete() for poll request completions, so it can
utilise state completions and save lots of unnecessary locking.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/poll.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 7fc4aafcca95..c4ce98504986 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -234,12 +234,8 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
io_poll_remove_entries(req);
io_poll_req_delete(req, ctx);
- spin_lock(&ctx->completion_lock);
- req->cqe.flags = 0;
- __io_req_complete_post(req);
- io_commit_cqring(ctx);
- spin_unlock(&ctx->completion_lock);
- io_cqring_ev_posted(ctx);
+ io_req_set_res(req, req->cqe.res, 0);
+ io_req_task_complete(req, locked);
}
static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (19 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 20/25] io_uring: use state completion infra for poll reqs Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-15 9:34 ` Hao Xu
2022-06-15 9:41 ` Hao Xu
2022-06-14 14:37 ` [PATCH for-next v2 22/25] io_uring: pass hash table into poll_find Pavel Begunkov
` (3 subsequent siblings)
24 siblings, 2 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Add a new IORING_SETUP_SINGLE_ISSUER flag and the userspace visible part
of it, i.e. put limitations of submitters. Also, don't allow it together
with IOPOLL as we're not going to put it to good use.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/uapi/linux/io_uring.h | 5 ++++-
io_uring/io_uring.c | 7 +++++--
io_uring/io_uring_types.h | 1 +
io_uring/tctx.c | 27 ++++++++++++++++++++++++---
io_uring/tctx.h | 4 ++--
5 files changed, 36 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index a41ddb8c5e1f..a3a691340d3e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -138,9 +138,12 @@ enum {
* IORING_SQ_TASKRUN in the sq ring flags. Not valid with COOP_TASKRUN.
*/
#define IORING_SETUP_TASKRUN_FLAG (1U << 9)
-
#define IORING_SETUP_SQE128 (1U << 10) /* SQEs are 128 byte */
#define IORING_SETUP_CQE32 (1U << 11) /* CQEs are 32 byte */
+/*
+ * Only one task is allowed to submit requests
+ */
+#define IORING_SETUP_SINGLE_ISSUER (1U << 12)
enum io_uring_op {
IORING_OP_NOP,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 15d209f334eb..4b90439808e3 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3020,6 +3020,8 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
io_destroy_buffers(ctx);
if (ctx->sq_creds)
put_cred(ctx->sq_creds);
+ if (ctx->submitter_task)
+ put_task_struct(ctx->submitter_task);
/* there are no registered resources left, nobody uses it */
if (ctx->rsrc_node)
@@ -3752,7 +3754,7 @@ static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
if (fd < 0)
return fd;
- ret = io_uring_add_tctx_node(ctx);
+ ret = __io_uring_add_tctx_node(ctx, false);
if (ret) {
put_unused_fd(fd);
return ret;
@@ -3972,7 +3974,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
- IORING_SETUP_SQE128 | IORING_SETUP_CQE32))
+ IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
+ IORING_SETUP_SINGLE_ISSUER))
return -EINVAL;
return io_uring_create(entries, &p, params);
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index aba0f8cd6f49..f6d0ad25f377 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -241,6 +241,7 @@ struct io_ring_ctx {
/* Keep this last, we don't need it for the fast path */
struct io_restriction restrictions;
+ struct task_struct *submitter_task;
/* slow path rsrc auxilary data, used by update/register */
struct io_rsrc_node *rsrc_backup_node;
diff --git a/io_uring/tctx.c b/io_uring/tctx.c
index 6adf659687f8..012be261dc50 100644
--- a/io_uring/tctx.c
+++ b/io_uring/tctx.c
@@ -81,12 +81,32 @@ __cold int io_uring_alloc_task_context(struct task_struct *task,
return 0;
}
-int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
+static int io_register_submitter(struct io_ring_ctx *ctx)
+{
+ int ret = 0;
+
+ mutex_lock(&ctx->uring_lock);
+ if (!ctx->submitter_task)
+ ctx->submitter_task = get_task_struct(current);
+ else if (ctx->submitter_task != current)
+ ret = -EEXIST;
+ mutex_unlock(&ctx->uring_lock);
+
+ return ret;
+}
+
+int __io_uring_add_tctx_node(struct io_ring_ctx *ctx, bool submitter)
{
struct io_uring_task *tctx = current->io_uring;
struct io_tctx_node *node;
int ret;
+ if ((ctx->flags & IORING_SETUP_SINGLE_ISSUER) && submitter) {
+ ret = io_register_submitter(ctx);
+ if (ret)
+ return ret;
+ }
+
if (unlikely(!tctx)) {
ret = io_uring_alloc_task_context(current, ctx);
if (unlikely(ret))
@@ -120,7 +140,8 @@ int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
list_add(&node->ctx_node, &ctx->tctx_list);
mutex_unlock(&ctx->uring_lock);
}
- tctx->last = ctx;
+ if (submitter)
+ tctx->last = ctx;
return 0;
}
@@ -228,7 +249,7 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
return -EINVAL;
mutex_unlock(&ctx->uring_lock);
- ret = io_uring_add_tctx_node(ctx);
+ ret = __io_uring_add_tctx_node(ctx, false);
mutex_lock(&ctx->uring_lock);
if (ret)
return ret;
diff --git a/io_uring/tctx.h b/io_uring/tctx.h
index 7684713e950f..dde82ce4d8e2 100644
--- a/io_uring/tctx.h
+++ b/io_uring/tctx.h
@@ -34,7 +34,7 @@ struct io_tctx_node {
int io_uring_alloc_task_context(struct task_struct *task,
struct io_ring_ctx *ctx);
void io_uring_del_tctx_node(unsigned long index);
-int __io_uring_add_tctx_node(struct io_ring_ctx *ctx);
+int __io_uring_add_tctx_node(struct io_ring_ctx *ctx, bool submitter);
void io_uring_clean_tctx(struct io_uring_task *tctx);
void io_uring_unreg_ringfd(void);
@@ -52,5 +52,5 @@ static inline int io_uring_add_tctx_node(struct io_ring_ctx *ctx)
if (likely(tctx && tctx->last == ctx))
return 0;
- return __io_uring_add_tctx_node(ctx);
+ return __io_uring_add_tctx_node(ctx, true);
}
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER
2022-06-14 14:37 ` [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER Pavel Begunkov
@ 2022-06-15 9:34 ` Hao Xu
2022-06-15 10:20 ` Pavel Begunkov
2022-06-15 9:41 ` Hao Xu
1 sibling, 1 reply; 44+ messages in thread
From: Hao Xu @ 2022-06-15 9:34 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe
On 6/14/22 22:37, Pavel Begunkov wrote:
> @@ -228,7 +249,7 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
> return -EINVAL;
>
> mutex_unlock(&ctx->uring_lock);
> - ret = io_uring_add_tctx_node(ctx);
> + ret = __io_uring_add_tctx_node(ctx, false);
An question unrelated with this patch: why do we need this, since anyway
we will do it in later io_uring_enter() this task really submits reqs.
> mutex_lock(&ctx->uring_lock);
> if (ret)
> return ret;
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER
2022-06-15 9:34 ` Hao Xu
@ 2022-06-15 10:20 ` Pavel Begunkov
0 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-15 10:20 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Jens Axboe
On 6/15/22 10:34, Hao Xu wrote:
> On 6/14/22 22:37, Pavel Begunkov wrote:
>> @@ -228,7 +249,7 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
>> return -EINVAL;
>> mutex_unlock(&ctx->uring_lock);
>> - ret = io_uring_add_tctx_node(ctx);
>> + ret = __io_uring_add_tctx_node(ctx, false);
>
> An question unrelated with this patch: why do we need this, since anyway
> we will do it in later io_uring_enter() this task really submits reqs.
At least we need to allocate a tctx as the files are stored in there
>> mutex_lock(&ctx->uring_lock);
>> if (ret)
>> return ret;
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER
2022-06-14 14:37 ` [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER Pavel Begunkov
2022-06-15 9:34 ` Hao Xu
@ 2022-06-15 9:41 ` Hao Xu
2022-06-15 10:26 ` Pavel Begunkov
1 sibling, 1 reply; 44+ messages in thread
From: Hao Xu @ 2022-06-15 9:41 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe
On 6/14/22 22:37, Pavel Begunkov wrote:
> Add a new IORING_SETUP_SINGLE_ISSUER flag and the userspace visible part
> of it, i.e. put limitations of submitters. Also, don't allow it together
> with IOPOLL as we're not going to put it to good use.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> include/uapi/linux/io_uring.h | 5 ++++-
> io_uring/io_uring.c | 7 +++++--
> io_uring/io_uring_types.h | 1 +
> io_uring/tctx.c | 27 ++++++++++++++++++++++++---
> io_uring/tctx.h | 4 ++--
> 5 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index a41ddb8c5e1f..a3a691340d3e 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -138,9 +138,12 @@ enum {
> * IORING_SQ_TASKRUN in the sq ring flags. Not valid with COOP_TASKRUN.
> */
> #define IORING_SETUP_TASKRUN_FLAG (1U << 9)
> -
> #define IORING_SETUP_SQE128 (1U << 10) /* SQEs are 128 byte */
> #define IORING_SETUP_CQE32 (1U << 11) /* CQEs are 32 byte */
> +/*
> + * Only one task is allowed to submit requests
> + */
> +#define IORING_SETUP_SINGLE_ISSUER (1U << 12)
>
> enum io_uring_op {
> IORING_OP_NOP,
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 15d209f334eb..4b90439808e3 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3020,6 +3020,8 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
> io_destroy_buffers(ctx);
> if (ctx->sq_creds)
> put_cred(ctx->sq_creds);
> + if (ctx->submitter_task)
> + put_task_struct(ctx->submitter_task);
>
> /* there are no registered resources left, nobody uses it */
> if (ctx->rsrc_node)
> @@ -3752,7 +3754,7 @@ static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
> if (fd < 0)
> return fd;
>
> - ret = io_uring_add_tctx_node(ctx);
> + ret = __io_uring_add_tctx_node(ctx, false);
> if (ret) {
> put_unused_fd(fd);
> return ret;
> @@ -3972,7 +3974,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
> IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
> IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
> IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
> - IORING_SETUP_SQE128 | IORING_SETUP_CQE32))
> + IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
> + IORING_SETUP_SINGLE_ISSUER))
> return -EINVAL;
>
> return io_uring_create(entries, &p, params);
> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
> index aba0f8cd6f49..f6d0ad25f377 100644
> --- a/io_uring/io_uring_types.h
> +++ b/io_uring/io_uring_types.h
> @@ -241,6 +241,7 @@ struct io_ring_ctx {
> /* Keep this last, we don't need it for the fast path */
>
> struct io_restriction restrictions;
> + struct task_struct *submitter_task;
>
> /* slow path rsrc auxilary data, used by update/register */
> struct io_rsrc_node *rsrc_backup_node;
> diff --git a/io_uring/tctx.c b/io_uring/tctx.c
> index 6adf659687f8..012be261dc50 100644
> --- a/io_uring/tctx.c
> +++ b/io_uring/tctx.c
> @@ -81,12 +81,32 @@ __cold int io_uring_alloc_task_context(struct task_struct *task,
> return 0;
> }
>
> -int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
> +static int io_register_submitter(struct io_ring_ctx *ctx)
> +{
> + int ret = 0;
> +
> + mutex_lock(&ctx->uring_lock);
> + if (!ctx->submitter_task)
> + ctx->submitter_task = get_task_struct(current);
> + else if (ctx->submitter_task != current)
> + ret = -EEXIST;
> + mutex_unlock(&ctx->uring_lock);
> +
> + return ret;
> +}
Seems we don't need this uring_lock:
When we create a ring, we setup ctx->submitter_task before uring fd is
installed so at that time nobody else can enter this code.
when we enter this code later in io_uring_enter, we just read it.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER
2022-06-15 9:41 ` Hao Xu
@ 2022-06-15 10:26 ` Pavel Begunkov
2022-06-15 11:08 ` Hao Xu
0 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-15 10:26 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Jens Axboe
On 6/15/22 10:41, Hao Xu wrote:
> On 6/14/22 22:37, Pavel Begunkov wrote:
>> Add a new IORING_SETUP_SINGLE_ISSUER flag and the userspace visible part
>> of it, i.e. put limitations of submitters. Also, don't allow it together
>> with IOPOLL as we're not going to put it to good use.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> include/uapi/linux/io_uring.h | 5 ++++-
>> io_uring/io_uring.c | 7 +++++--
>> io_uring/io_uring_types.h | 1 +
>> io_uring/tctx.c | 27 ++++++++++++++++++++++++---
>> io_uring/tctx.h | 4 ++--
>> 5 files changed, 36 insertions(+), 8 deletions(-)
>>
[...]
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 15d209f334eb..4b90439808e3 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -3020,6 +3020,8 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
>> io_destroy_buffers(ctx);
>> if (ctx->sq_creds)
>> put_cred(ctx->sq_creds);
>> + if (ctx->submitter_task)
>> + put_task_struct(ctx->submitter_task);
>> /* there are no registered resources left, nobody uses it */
>> if (ctx->rsrc_node)
>> @@ -3752,7 +3754,7 @@ static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
>> if (fd < 0)
>> return fd;
>> - ret = io_uring_add_tctx_node(ctx);
>> + ret = __io_uring_add_tctx_node(ctx, false);
^^^^^^
Note this one
>> if (ret) {
>> put_unused_fd(fd);
>> return ret;
>> @@ -3972,7 +3974,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
>> IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
>> IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
>> IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
>> - IORING_SETUP_SQE128 | IORING_SETUP_CQE32))
>> + IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
>> + IORING_SETUP_SINGLE_ISSUER))
>> return -EINVAL;
>> return io_uring_create(entries, &p, params);
>> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
>> index aba0f8cd6f49..f6d0ad25f377 100644
>> --- a/io_uring/io_uring_types.h
>> +++ b/io_uring/io_uring_types.h
>> @@ -241,6 +241,7 @@ struct io_ring_ctx {
>> /* Keep this last, we don't need it for the fast path */
>> struct io_restriction restrictions;
>> + struct task_struct *submitter_task;
>> /* slow path rsrc auxilary data, used by update/register */
>> struct io_rsrc_node *rsrc_backup_node;
>> diff --git a/io_uring/tctx.c b/io_uring/tctx.c
>> index 6adf659687f8..012be261dc50 100644
>> --- a/io_uring/tctx.c
>> +++ b/io_uring/tctx.c
>> @@ -81,12 +81,32 @@ __cold int io_uring_alloc_task_context(struct task_struct *task,
>> return 0;
>> }
>> -int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
>> +static int io_register_submitter(struct io_ring_ctx *ctx)
>> +{
>> + int ret = 0;
>> +
>> + mutex_lock(&ctx->uring_lock);
>> + if (!ctx->submitter_task)
>> + ctx->submitter_task = get_task_struct(current);
>> + else if (ctx->submitter_task != current)
>> + ret = -EEXIST;
>> + mutex_unlock(&ctx->uring_lock);
>> +
>> + return ret;
>> +}
>
> Seems we don't need this uring_lock:
> When we create a ring, we setup ctx->submitter_task before uring fd is
> installed so at that time nobody else can enter this code.
> when we enter this code later in io_uring_enter, we just read it.
Not really, we specifically don't set it just to the ring's
creator but to the first submitter. That's needed to be able to
create a ring in one task and pass it over to another.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER
2022-06-15 10:26 ` Pavel Begunkov
@ 2022-06-15 11:08 ` Hao Xu
0 siblings, 0 replies; 44+ messages in thread
From: Hao Xu @ 2022-06-15 11:08 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe
On 6/15/22 18:26, Pavel Begunkov wrote:
> On 6/15/22 10:41, Hao Xu wrote:
>> On 6/14/22 22:37, Pavel Begunkov wrote:
>>> Add a new IORING_SETUP_SINGLE_ISSUER flag and the userspace visible part
>>> of it, i.e. put limitations of submitters. Also, don't allow it together
>>> with IOPOLL as we're not going to put it to good use.
>>>
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> ---
>>> include/uapi/linux/io_uring.h | 5 ++++-
>>> io_uring/io_uring.c | 7 +++++--
>>> io_uring/io_uring_types.h | 1 +
>>> io_uring/tctx.c | 27 ++++++++++++++++++++++++---
>>> io_uring/tctx.h | 4 ++--
>>> 5 files changed, 36 insertions(+), 8 deletions(-)
>>>
> [...]
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index 15d209f334eb..4b90439808e3 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -3020,6 +3020,8 @@ static __cold void io_ring_ctx_free(struct
>>> io_ring_ctx *ctx)
>>> io_destroy_buffers(ctx);
>>> if (ctx->sq_creds)
>>> put_cred(ctx->sq_creds);
>>> + if (ctx->submitter_task)
>>> + put_task_struct(ctx->submitter_task);
>>> /* there are no registered resources left, nobody uses it */
>>> if (ctx->rsrc_node)
>>> @@ -3752,7 +3754,7 @@ static int io_uring_install_fd(struct
>>> io_ring_ctx *ctx, struct file *file)
>>> if (fd < 0)
>>> return fd;
>>> - ret = io_uring_add_tctx_node(ctx);
>>> + ret = __io_uring_add_tctx_node(ctx, false);
>
> ^^^^^^
>
> Note this one
My bad, I read it wrong...
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH for-next v2 22/25] io_uring: pass hash table into poll_find
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (20 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 21/25] io_uring: add IORING_SETUP_SINGLE_ISSUER Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 23/25] io_uring: introduce a struct for hash table Pavel Begunkov
` (2 subsequent siblings)
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
In preparation for having multiple cancellation hash tables, pass a
table pointer into io_poll_find() and other poll cancel functions.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/poll.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index c4ce98504986..5cc03be365e3 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -556,11 +556,12 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
struct io_cancel_data *cd,
+ struct io_hash_bucket hash_table[],
struct io_hash_bucket **out_bucket)
{
struct io_kiocb *req;
u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
- struct io_hash_bucket *hb = &ctx->cancel_hash[index];
+ struct io_hash_bucket *hb = &hash_table[index];
*out_bucket = NULL;
@@ -584,6 +585,7 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
struct io_cancel_data *cd,
+ struct io_hash_bucket hash_table[],
struct io_hash_bucket **out_bucket)
{
struct io_kiocb *req;
@@ -592,7 +594,7 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
*out_bucket = NULL;
for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
- struct io_hash_bucket *hb = &ctx->cancel_hash[i];
+ struct io_hash_bucket *hb = &hash_table[i];
spin_lock(&hb->lock);
hlist_for_each_entry(req, &hb->list, hash_node) {
@@ -619,15 +621,16 @@ static bool io_poll_disarm(struct io_kiocb *req)
return true;
}
-int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
+static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
+ struct io_hash_bucket hash_table[])
{
struct io_hash_bucket *bucket;
struct io_kiocb *req;
if (cd->flags & (IORING_ASYNC_CANCEL_FD|IORING_ASYNC_CANCEL_ANY))
- req = io_poll_file_find(ctx, cd, &bucket);
+ req = io_poll_file_find(ctx, cd, ctx->cancel_hash, &bucket);
else
- req = io_poll_find(ctx, false, cd, &bucket);
+ req = io_poll_find(ctx, false, cd, ctx->cancel_hash, &bucket);
if (req)
io_poll_cancel_req(req);
@@ -636,6 +639,11 @@ int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
return req ? 0 : -ENOENT;
}
+int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
+{
+ return __io_poll_cancel(ctx, cd, ctx->cancel_hash);
+}
+
static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
unsigned int flags)
{
@@ -731,7 +739,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
int ret2, ret = 0;
bool locked;
- preq = io_poll_find(ctx, true, &cd, &bucket);
+ preq = io_poll_find(ctx, true, &cd, ctx->cancel_hash, &bucket);
if (preq)
ret2 = io_poll_disarm(preq);
if (bucket)
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 23/25] io_uring: introduce a struct for hash table
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (21 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 22/25] io_uring: pass hash table into poll_find Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 24/25] io_uring: propagate locking state to poll cancel Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing Pavel Begunkov
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Instead of passing around a pointer to hash buckets, add a bit of type
safety and wrap it into a structure.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/cancel.c | 6 +++---
io_uring/cancel.h | 7 +------
io_uring/fdinfo.c | 4 ++--
io_uring/io_uring.c | 29 ++++++++++++++++------------
io_uring/io_uring_types.h | 13 +++++++++++--
io_uring/poll.c | 40 +++++++++++++++++++++------------------
6 files changed, 56 insertions(+), 43 deletions(-)
diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index a253e2ad22eb..f28f0a7d1272 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -193,12 +193,12 @@ int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
return IOU_OK;
}
-void init_hash_table(struct io_hash_bucket *hash_table, unsigned size)
+void init_hash_table(struct io_hash_table *table, unsigned size)
{
unsigned int i;
for (i = 0; i < size; i++) {
- spin_lock_init(&hash_table[i].lock);
- INIT_HLIST_HEAD(&hash_table[i].list);
+ spin_lock_init(&table->hbs[i].lock);
+ INIT_HLIST_HEAD(&table->hbs[i].list);
}
}
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index 556a7dcf160e..fd4cb1a2595d 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -4,9 +4,4 @@ int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags);
int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd);
-void init_hash_table(struct io_hash_bucket *hash_table, unsigned size);
-
-struct io_hash_bucket {
- spinlock_t lock;
- struct hlist_head list;
-} ____cacheline_aligned_in_smp;
+void init_hash_table(struct io_hash_table *table, unsigned size);
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index f941c73f5502..344e7d90d557 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -158,8 +158,8 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
mutex_unlock(&ctx->uring_lock);
seq_puts(m, "PollList:\n");
- for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
- struct io_hash_bucket *hb = &ctx->cancel_hash[i];
+ for (i = 0; i < (1U << ctx->cancel_table.hash_bits); i++) {
+ struct io_hash_bucket *hb = &ctx->cancel_table.hbs[i];
struct io_kiocb *req;
spin_lock(&hb->lock);
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4b90439808e3..4bead16e57f7 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -697,11 +697,23 @@ static __cold void io_fallback_req_func(struct work_struct *work)
percpu_ref_put(&ctx->refs);
}
+static int io_alloc_hash_table(struct io_hash_table *table, unsigned bits)
+{
+ unsigned hash_buckets = 1U << bits;
+ size_t hash_size = hash_buckets * sizeof(table->hbs[0]);
+
+ table->hbs = kmalloc(hash_size, GFP_KERNEL);
+ if (!table->hbs)
+ return -ENOMEM;
+
+ table->hash_bits = bits;
+ init_hash_table(table, hash_buckets);
+ return 0;
+}
+
static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
{
struct io_ring_ctx *ctx;
- unsigned hash_buckets;
- size_t hash_size;
int hash_bits;
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -717,16 +729,9 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
*/
hash_bits = ilog2(p->cq_entries) - 5;
hash_bits = clamp(hash_bits, 1, 8);
- hash_buckets = 1U << hash_bits;
- hash_size = hash_buckets * sizeof(struct io_hash_bucket);
-
- ctx->cancel_hash_bits = hash_bits;
- ctx->cancel_hash = kmalloc(hash_size, GFP_KERNEL);
- if (!ctx->cancel_hash)
+ if (io_alloc_hash_table(&ctx->cancel_table, hash_bits))
goto err;
- init_hash_table(ctx->cancel_hash, hash_buckets);
-
ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
if (!ctx->dummy_ubuf)
goto err;
@@ -767,7 +772,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
return ctx;
err:
kfree(ctx->dummy_ubuf);
- kfree(ctx->cancel_hash);
+ kfree(ctx->cancel_table.hbs);
kfree(ctx->io_bl);
xa_destroy(&ctx->io_bl_xa);
kfree(ctx);
@@ -3050,7 +3055,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
io_req_caches_free(ctx);
if (ctx->hash_map)
io_wq_put_hash(ctx->hash_map);
- kfree(ctx->cancel_hash);
+ kfree(ctx->cancel_table.hbs);
kfree(ctx->dummy_ubuf);
kfree(ctx->io_bl);
xa_destroy(&ctx->io_bl_xa);
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index f6d0ad25f377..ce2fbe6749bb 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -7,6 +7,16 @@
#include "io-wq.h"
#include "filetable.h"
+struct io_hash_bucket {
+ spinlock_t lock;
+ struct hlist_head list;
+} ____cacheline_aligned_in_smp;
+
+struct io_hash_table {
+ struct io_hash_bucket *hbs;
+ unsigned hash_bits;
+};
+
struct io_uring {
u32 head ____cacheline_aligned_in_smp;
u32 tail ____cacheline_aligned_in_smp;
@@ -222,8 +232,7 @@ struct io_ring_ctx {
* manipulate the list, hence no extra locking is needed there.
*/
struct io_wq_work_list iopoll_list;
- struct io_hash_bucket *cancel_hash;
- unsigned cancel_hash_bits;
+ struct io_hash_table cancel_table;
bool poll_multi_queue;
struct list_head io_buffers_comp;
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 5cc03be365e3..9c7793f5e93b 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -73,9 +73,9 @@ static struct io_poll *io_poll_get_single(struct io_kiocb *req)
static void io_poll_req_insert(struct io_kiocb *req)
{
- struct io_ring_ctx *ctx = req->ctx;
- u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
- struct io_hash_bucket *hb = &ctx->cancel_hash[index];
+ struct io_hash_table *table = &req->ctx->cancel_table;
+ u32 index = hash_long(req->cqe.user_data, table->hash_bits);
+ struct io_hash_bucket *hb = &table->hbs[index];
spin_lock(&hb->lock);
hlist_add_head(&req->hash_node, &hb->list);
@@ -84,8 +84,9 @@ static void io_poll_req_insert(struct io_kiocb *req)
static void io_poll_req_delete(struct io_kiocb *req, struct io_ring_ctx *ctx)
{
- u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
- spinlock_t *lock = &ctx->cancel_hash[index].lock;
+ struct io_hash_table *table = &req->ctx->cancel_table;
+ u32 index = hash_long(req->cqe.user_data, table->hash_bits);
+ spinlock_t *lock = &table->hbs[index].lock;
spin_lock(lock);
hash_del(&req->hash_node);
@@ -533,13 +534,15 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
__cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
bool cancel_all)
{
+ struct io_hash_table *table = &ctx->cancel_table;
+ unsigned nr_buckets = 1U << table->hash_bits;
struct hlist_node *tmp;
struct io_kiocb *req;
bool found = false;
int i;
- for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
- struct io_hash_bucket *hb = &ctx->cancel_hash[i];
+ for (i = 0; i < nr_buckets; i++) {
+ struct io_hash_bucket *hb = &table->hbs[i];
spin_lock(&hb->lock);
hlist_for_each_entry_safe(req, tmp, &hb->list, hash_node) {
@@ -556,12 +559,12 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
struct io_cancel_data *cd,
- struct io_hash_bucket hash_table[],
+ struct io_hash_table *table,
struct io_hash_bucket **out_bucket)
{
struct io_kiocb *req;
- u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
- struct io_hash_bucket *hb = &hash_table[index];
+ u32 index = hash_long(cd->data, table->hash_bits);
+ struct io_hash_bucket *hb = &table->hbs[index];
*out_bucket = NULL;
@@ -585,16 +588,17 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
struct io_cancel_data *cd,
- struct io_hash_bucket hash_table[],
+ struct io_hash_table *table,
struct io_hash_bucket **out_bucket)
{
+ unsigned nr_buckets = 1U << table->hash_bits;
struct io_kiocb *req;
int i;
*out_bucket = NULL;
- for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
- struct io_hash_bucket *hb = &hash_table[i];
+ for (i = 0; i < nr_buckets; i++) {
+ struct io_hash_bucket *hb = &table->hbs[i];
spin_lock(&hb->lock);
hlist_for_each_entry(req, &hb->list, hash_node) {
@@ -622,15 +626,15 @@ static bool io_poll_disarm(struct io_kiocb *req)
}
static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
- struct io_hash_bucket hash_table[])
+ struct io_hash_table *table)
{
struct io_hash_bucket *bucket;
struct io_kiocb *req;
if (cd->flags & (IORING_ASYNC_CANCEL_FD|IORING_ASYNC_CANCEL_ANY))
- req = io_poll_file_find(ctx, cd, ctx->cancel_hash, &bucket);
+ req = io_poll_file_find(ctx, cd, table, &bucket);
else
- req = io_poll_find(ctx, false, cd, ctx->cancel_hash, &bucket);
+ req = io_poll_find(ctx, false, cd, table, &bucket);
if (req)
io_poll_cancel_req(req);
@@ -641,7 +645,7 @@ static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
{
- return __io_poll_cancel(ctx, cd, ctx->cancel_hash);
+ return __io_poll_cancel(ctx, cd, &ctx->cancel_table);
}
static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
@@ -739,7 +743,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
int ret2, ret = 0;
bool locked;
- preq = io_poll_find(ctx, true, &cd, ctx->cancel_hash, &bucket);
+ preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table, &bucket);
if (preq)
ret2 = io_poll_disarm(preq);
if (bucket)
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 24/25] io_uring: propagate locking state to poll cancel
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (22 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 23/25] io_uring: introduce a struct for hash table Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-14 14:37 ` [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing Pavel Begunkov
24 siblings, 0 replies; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Poll cancellation will be soon need to grab ->uring_lock inside, pass
the locking state, i.e. issue_flags, inside the cancellation functions.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/cancel.c | 7 ++++---
io_uring/cancel.h | 3 ++-
io_uring/poll.c | 3 ++-
io_uring/poll.h | 3 ++-
io_uring/timeout.c | 3 ++-
5 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index f28f0a7d1272..f07bfd27c98a 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -78,7 +78,8 @@ static int io_async_cancel_one(struct io_uring_task *tctx,
return ret;
}
-int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
+int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd,
+ unsigned issue_flags)
{
struct io_ring_ctx *ctx = req->ctx;
int ret;
@@ -93,7 +94,7 @@ int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
if (!ret)
return 0;
- ret = io_poll_cancel(ctx, cd);
+ ret = io_poll_cancel(ctx, cd, issue_flags);
if (ret != -ENOENT)
return ret;
@@ -136,7 +137,7 @@ static int __io_async_cancel(struct io_cancel_data *cd, struct io_kiocb *req,
int ret, nr = 0;
do {
- ret = io_try_cancel(req, cd);
+ ret = io_try_cancel(req, cd, issue_flags);
if (ret == -ENOENT)
break;
if (!all)
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index fd4cb1a2595d..8dd259dc383e 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -3,5 +3,6 @@
int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags);
-int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd);
+int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd,
+ unsigned int issue_flags);
void init_hash_table(struct io_hash_table *table, unsigned size);
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 9c7793f5e93b..07157da1c2cb 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -643,7 +643,8 @@ static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
return req ? 0 : -ENOENT;
}
-int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
+int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
+ unsigned issue_flags)
{
return __io_poll_cancel(ctx, cd, &ctx->cancel_table);
}
diff --git a/io_uring/poll.h b/io_uring/poll.h
index cc75c1567a84..fa3e19790281 100644
--- a/io_uring/poll.h
+++ b/io_uring/poll.h
@@ -24,7 +24,8 @@ int io_poll_add(struct io_kiocb *req, unsigned int issue_flags);
int io_poll_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags);
-int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd);
+int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
+ unsigned issue_flags);
int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags);
bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
bool cancel_all);
diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 69cca42d6835..526fc8b2e3b6 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -262,6 +262,7 @@ int io_timeout_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked)
{
+ unsigned issue_flags = *locked ? 0 : IO_URING_F_UNLOCKED;
struct io_timeout *timeout = io_kiocb_to_cmd(req);
struct io_kiocb *prev = timeout->prev;
int ret = -ENOENT;
@@ -273,7 +274,7 @@ static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked)
.data = prev->cqe.user_data,
};
- ret = io_try_cancel(req, &cd);
+ ret = io_try_cancel(req, &cd, issue_flags);
}
io_req_set_res(req, ret ?: -ETIME, 0);
io_req_complete_post(req);
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing
2022-06-14 14:36 [PATCH for-next v2 00/25] 5.20 cleanups and poll optimisations Pavel Begunkov
` (23 preceding siblings ...)
2022-06-14 14:37 ` [PATCH for-next v2 24/25] io_uring: propagate locking state to poll cancel Pavel Begunkov
@ 2022-06-14 14:37 ` Pavel Begunkov
2022-06-15 12:53 ` Hao Xu
24 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-14 14:37 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Currently we do two extra spin lock/unlock pairs to add a poll/apoll
request to the cancellation hash table and remove it from there.
On the submission side we often already hold ->uring_lock and tw
completion is likely to hold it as well. Add a second cancellation hash
table protected by ->uring_lock. In concerns for latency because of a
need to have the mutex locked on the completion side, use the new table
only in following cases:
1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
is no contention and so the main tw hander will always end up
grabbing it before calling into callbacks.
2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
using ->uring_lock.
3) apoll: we normally grab the lock on the completion side anyway to
execute the request, so it's free.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 9 +++-
io_uring/io_uring_types.h | 4 ++
io_uring/poll.c | 111 ++++++++++++++++++++++++++++++--------
3 files changed, 102 insertions(+), 22 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4bead16e57f7..1395176bc2ea 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -731,6 +731,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
hash_bits = clamp(hash_bits, 1, 8);
if (io_alloc_hash_table(&ctx->cancel_table, hash_bits))
goto err;
+ if (io_alloc_hash_table(&ctx->cancel_table_locked, hash_bits))
+ goto err;
ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
if (!ctx->dummy_ubuf)
@@ -773,6 +775,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
err:
kfree(ctx->dummy_ubuf);
kfree(ctx->cancel_table.hbs);
+ kfree(ctx->cancel_table_locked.hbs);
kfree(ctx->io_bl);
xa_destroy(&ctx->io_bl_xa);
kfree(ctx);
@@ -3056,6 +3059,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
if (ctx->hash_map)
io_wq_put_hash(ctx->hash_map);
kfree(ctx->cancel_table.hbs);
+ kfree(ctx->cancel_table_locked.hbs);
kfree(ctx->dummy_ubuf);
kfree(ctx->io_bl);
xa_destroy(&ctx->io_bl_xa);
@@ -3217,12 +3221,13 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
__io_cqring_overflow_flush(ctx, true);
xa_for_each(&ctx->personalities, index, creds)
io_unregister_personality(ctx, index);
+ if (ctx->rings)
+ io_poll_remove_all(ctx, NULL, true);
mutex_unlock(&ctx->uring_lock);
/* failed during ring init, it couldn't have issued any requests */
if (ctx->rings) {
io_kill_timeouts(ctx, NULL, true);
- io_poll_remove_all(ctx, NULL, true);
/* if we failed setting up the ctx, we might not have any rings */
io_iopoll_try_reap_events(ctx);
}
@@ -3347,7 +3352,9 @@ static __cold void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
}
ret |= io_cancel_defer_files(ctx, task, cancel_all);
+ mutex_lock(&ctx->uring_lock);
ret |= io_poll_remove_all(ctx, task, cancel_all);
+ mutex_unlock(&ctx->uring_lock);
ret |= io_kill_timeouts(ctx, task, cancel_all);
if (task)
ret |= io_run_task_work();
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index ce2fbe6749bb..557b8e7719c9 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -189,6 +189,7 @@ struct io_ring_ctx {
struct xarray io_bl_xa;
struct list_head io_buffers_cache;
+ struct io_hash_table cancel_table_locked;
struct list_head cq_overflow_list;
struct list_head apoll_cache;
struct xarray personalities;
@@ -323,6 +324,7 @@ enum {
/* keep async read/write and isreg together and in order */
REQ_F_SUPPORT_NOWAIT_BIT,
REQ_F_ISREG_BIT,
+ REQ_F_HASH_LOCKED_BIT,
/* not a real bit, just to check we're not overflowing the space */
__REQ_F_LAST_BIT,
@@ -388,6 +390,8 @@ enum {
REQ_F_APOLL_MULTISHOT = BIT(REQ_F_APOLL_MULTISHOT_BIT),
/* recvmsg special flag, clear EPOLLIN */
REQ_F_CLEAR_POLLIN = BIT(REQ_F_CLEAR_POLLIN_BIT),
+ /* hashed into ->cancel_hash_locked */
+ REQ_F_HASH_LOCKED = BIT(REQ_F_HASH_LOCKED_BIT),
};
typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 07157da1c2cb..d20484c1cbb7 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -93,6 +93,26 @@ static void io_poll_req_delete(struct io_kiocb *req, struct io_ring_ctx *ctx)
spin_unlock(lock);
}
+static void io_poll_req_insert_locked(struct io_kiocb *req)
+{
+ struct io_hash_table *table = &req->ctx->cancel_table_locked;
+ u32 index = hash_long(req->cqe.user_data, table->hash_bits);
+
+ hlist_add_head(&req->hash_node, &table->hbs[index].list);
+}
+
+static void io_poll_tw_hash_eject(struct io_kiocb *req, bool *locked)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ if (req->flags & REQ_F_HASH_LOCKED) {
+ io_tw_lock(ctx, locked);
+ hash_del(&req->hash_node);
+ } else {
+ io_poll_req_delete(req, ctx);
+ }
+}
+
static void io_init_poll_iocb(struct io_poll *poll, __poll_t events,
wait_queue_func_t wake_func)
{
@@ -217,7 +237,6 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
static void io_poll_task_func(struct io_kiocb *req, bool *locked)
{
- struct io_ring_ctx *ctx = req->ctx;
int ret;
ret = io_poll_check_events(req, locked);
@@ -234,7 +253,8 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
}
io_poll_remove_entries(req);
- io_poll_req_delete(req, ctx);
+ io_poll_tw_hash_eject(req, locked);
+
io_req_set_res(req, req->cqe.res, 0);
io_req_task_complete(req, locked);
}
@@ -248,7 +268,7 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
return;
io_poll_remove_entries(req);
- io_poll_req_delete(req, req->ctx);
+ io_poll_tw_hash_eject(req, locked);
if (!ret)
io_req_task_submit(req, locked);
@@ -442,7 +462,10 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
return 0;
}
- io_poll_req_insert(req);
+ if (req->flags & REQ_F_HASH_LOCKED)
+ io_poll_req_insert_locked(req);
+ else
+ io_poll_req_insert(req);
if (mask && (poll->events & EPOLLET)) {
/* can't multishot if failed, just queue the event we've got */
@@ -480,6 +503,15 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
__poll_t mask = POLLPRI | POLLERR | EPOLLET;
int ret;
+ /*
+ * apoll requests already grab the mutex to complete in the tw handler,
+ * so removal from the mutex-backed hash is free, use it by default.
+ */
+ if (issue_flags & IO_URING_F_UNLOCKED)
+ req->flags &= ~REQ_F_HASH_LOCKED;
+ else
+ req->flags |= REQ_F_HASH_LOCKED;
+
if (!def->pollin && !def->pollout)
return IO_APOLL_ABORTED;
if (!file_can_poll(req->file))
@@ -528,13 +560,10 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
return IO_APOLL_OK;
}
-/*
- * Returns true if we found and killed one or more poll requests
- */
-__cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
- bool cancel_all)
+static __cold bool io_poll_remove_all_table(struct task_struct *tsk,
+ struct io_hash_table *table,
+ bool cancel_all)
{
- struct io_hash_table *table = &ctx->cancel_table;
unsigned nr_buckets = 1U << table->hash_bits;
struct hlist_node *tmp;
struct io_kiocb *req;
@@ -557,6 +586,17 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
return found;
}
+/*
+ * Returns true if we found and killed one or more poll requests
+ */
+__cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
+ bool cancel_all)
+ __must_hold(&ctx->uring_lock)
+{
+ return io_poll_remove_all_table(tsk, &ctx->cancel_table, cancel_all) |
+ io_poll_remove_all_table(tsk, &ctx->cancel_table_locked, cancel_all);
+}
+
static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
struct io_cancel_data *cd,
struct io_hash_table *table,
@@ -616,13 +656,15 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
return NULL;
}
-static bool io_poll_disarm(struct io_kiocb *req)
+static int io_poll_disarm(struct io_kiocb *req)
{
+ if (!req)
+ return -ENOENT;
if (!io_poll_get_ownership(req))
- return false;
+ return -EALREADY;
io_poll_remove_entries(req);
hash_del(&req->hash_node);
- return true;
+ return 0;
}
static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
@@ -646,7 +688,16 @@ static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
unsigned issue_flags)
{
- return __io_poll_cancel(ctx, cd, &ctx->cancel_table);
+ int ret;
+
+ ret = __io_poll_cancel(ctx, cd, &ctx->cancel_table);
+ if (ret != -ENOENT)
+ return ret;
+
+ io_ring_submit_lock(ctx, issue_flags);
+ ret = __io_poll_cancel(ctx, cd, &ctx->cancel_table_locked);
+ io_ring_submit_unlock(ctx, issue_flags);
+ return ret;
}
static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
@@ -721,6 +772,16 @@ int io_poll_add(struct io_kiocb *req, unsigned int issue_flags)
ipt.pt._qproc = io_poll_queue_proc;
+ /*
+ * If sqpoll or single issuer, there is no contention for ->uring_lock
+ * and we'll end up holding it in tw handlers anyway.
+ */
+ if (!(issue_flags & IO_URING_F_UNLOCKED) &&
+ (req->ctx->flags & (IORING_SETUP_SQPOLL | IORING_SETUP_SINGLE_ISSUER)))
+ req->flags |= REQ_F_HASH_LOCKED;
+ else
+ req->flags &= ~REQ_F_HASH_LOCKED;
+
ret = __io_arm_poll_handler(req, poll, &ipt, poll->events);
if (ipt.error) {
return ipt.error;
@@ -745,20 +806,28 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
bool locked;
preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table, &bucket);
- if (preq)
- ret2 = io_poll_disarm(preq);
+ ret2 = io_poll_disarm(preq);
if (bucket)
spin_unlock(&bucket->lock);
-
- if (!preq) {
- ret = -ENOENT;
+ if (!ret2)
+ goto found;
+ if (ret2 != -ENOENT) {
+ ret = ret2;
goto out;
}
- if (!ret2) {
- ret = -EALREADY;
+
+ io_ring_submit_lock(ctx, issue_flags);
+ preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table_locked, &bucket);
+ ret2 = io_poll_disarm(preq);
+ if (bucket)
+ spin_unlock(&bucket->lock);
+ io_ring_submit_unlock(ctx, issue_flags);
+ if (ret2) {
+ ret = ret2;
goto out;
}
+found:
if (poll_update->update_events || poll_update->update_user_data) {
/* only mask one event flags, keep behavior flags */
if (poll_update->update_events) {
--
2.36.1
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing
2022-06-14 14:37 ` [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing Pavel Begunkov
@ 2022-06-15 12:53 ` Hao Xu
2022-06-15 13:55 ` Pavel Begunkov
0 siblings, 1 reply; 44+ messages in thread
From: Hao Xu @ 2022-06-15 12:53 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe
On 6/14/22 22:37, Pavel Begunkov wrote:
> Currently we do two extra spin lock/unlock pairs to add a poll/apoll
> request to the cancellation hash table and remove it from there.
>
> On the submission side we often already hold ->uring_lock and tw
> completion is likely to hold it as well. Add a second cancellation hash
> table protected by ->uring_lock. In concerns for latency because of a
> need to have the mutex locked on the completion side, use the new table
> only in following cases:
>
> 1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
> is no contention and so the main tw hander will always end up
> grabbing it before calling into callbacks.
This statement seems not true, the io-worker may grab the uring lock,
and that's why the [1] place I marked below is needed, right? Or do I
miss something?
>
> 2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
> using ->uring_lock.
same as above.
>
> 3) apoll: we normally grab the lock on the completion side anyway to
> execute the request, so it's free.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> io_uring/io_uring.c | 9 +++-
> io_uring/io_uring_types.h | 4 ++
> io_uring/poll.c | 111 ++++++++++++++++++++++++++++++--------
> 3 files changed, 102 insertions(+), 22 deletions(-)
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 4bead16e57f7..1395176bc2ea 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -731,6 +731,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
> hash_bits = clamp(hash_bits, 1, 8);
> if (io_alloc_hash_table(&ctx->cancel_table, hash_bits))
> goto err;
> + if (io_alloc_hash_table(&ctx->cancel_table_locked, hash_bits))
> + goto err;
>
> ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
> if (!ctx->dummy_ubuf)
> @@ -773,6 +775,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
> err:
> kfree(ctx->dummy_ubuf);
> kfree(ctx->cancel_table.hbs);
> + kfree(ctx->cancel_table_locked.hbs);
> kfree(ctx->io_bl);
> xa_destroy(&ctx->io_bl_xa);
> kfree(ctx);
> @@ -3056,6 +3059,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
> if (ctx->hash_map)
> io_wq_put_hash(ctx->hash_map);
> kfree(ctx->cancel_table.hbs);
> + kfree(ctx->cancel_table_locked.hbs);
> kfree(ctx->dummy_ubuf);
> kfree(ctx->io_bl);
> xa_destroy(&ctx->io_bl_xa);
> @@ -3217,12 +3221,13 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
> __io_cqring_overflow_flush(ctx, true);
> xa_for_each(&ctx->personalities, index, creds)
> io_unregister_personality(ctx, index);
> + if (ctx->rings)
> + io_poll_remove_all(ctx, NULL, true);
> mutex_unlock(&ctx->uring_lock);
>
> /* failed during ring init, it couldn't have issued any requests */
> if (ctx->rings) {
> io_kill_timeouts(ctx, NULL, true);
> - io_poll_remove_all(ctx, NULL, true);
> /* if we failed setting up the ctx, we might not have any rings */
> io_iopoll_try_reap_events(ctx);
> }
> @@ -3347,7 +3352,9 @@ static __cold void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
> }
>
> ret |= io_cancel_defer_files(ctx, task, cancel_all);
> + mutex_lock(&ctx->uring_lock);
> ret |= io_poll_remove_all(ctx, task, cancel_all);
> + mutex_unlock(&ctx->uring_lock);
> ret |= io_kill_timeouts(ctx, task, cancel_all);
> if (task)
> ret |= io_run_task_work();
> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
> index ce2fbe6749bb..557b8e7719c9 100644
> --- a/io_uring/io_uring_types.h
> +++ b/io_uring/io_uring_types.h
> @@ -189,6 +189,7 @@ struct io_ring_ctx {
> struct xarray io_bl_xa;
> struct list_head io_buffers_cache;
>
> + struct io_hash_table cancel_table_locked;
> struct list_head cq_overflow_list;
> struct list_head apoll_cache;
> struct xarray personalities;
> @@ -323,6 +324,7 @@ enum {
> /* keep async read/write and isreg together and in order */
> REQ_F_SUPPORT_NOWAIT_BIT,
> REQ_F_ISREG_BIT,
> + REQ_F_HASH_LOCKED_BIT,
>
> /* not a real bit, just to check we're not overflowing the space */
> __REQ_F_LAST_BIT,
> @@ -388,6 +390,8 @@ enum {
> REQ_F_APOLL_MULTISHOT = BIT(REQ_F_APOLL_MULTISHOT_BIT),
> /* recvmsg special flag, clear EPOLLIN */
> REQ_F_CLEAR_POLLIN = BIT(REQ_F_CLEAR_POLLIN_BIT),
> + /* hashed into ->cancel_hash_locked */
> + REQ_F_HASH_LOCKED = BIT(REQ_F_HASH_LOCKED_BIT),
> };
>
> typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
> diff --git a/io_uring/poll.c b/io_uring/poll.c
> index 07157da1c2cb..d20484c1cbb7 100644
> --- a/io_uring/poll.c
> +++ b/io_uring/poll.c
> @@ -93,6 +93,26 @@ static void io_poll_req_delete(struct io_kiocb *req, struct io_ring_ctx *ctx)
> spin_unlock(lock);
> }
>
> +static void io_poll_req_insert_locked(struct io_kiocb *req)
> +{
> + struct io_hash_table *table = &req->ctx->cancel_table_locked;
> + u32 index = hash_long(req->cqe.user_data, table->hash_bits);
> +
> + hlist_add_head(&req->hash_node, &table->hbs[index].list);
> +}
> +
> +static void io_poll_tw_hash_eject(struct io_kiocb *req, bool *locked)
> +{
> + struct io_ring_ctx *ctx = req->ctx;
> +
> + if (req->flags & REQ_F_HASH_LOCKED) {
> + io_tw_lock(ctx, locked);
[1]
> + hash_del(&req->hash_node);
> + } else {
> + io_poll_req_delete(req, ctx);
> + }
> +}
> +
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing
2022-06-15 12:53 ` Hao Xu
@ 2022-06-15 13:55 ` Pavel Begunkov
2022-06-15 15:19 ` Hao Xu
0 siblings, 1 reply; 44+ messages in thread
From: Pavel Begunkov @ 2022-06-15 13:55 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Jens Axboe
On 6/15/22 13:53, Hao Xu wrote:
> On 6/14/22 22:37, Pavel Begunkov wrote:
>> Currently we do two extra spin lock/unlock pairs to add a poll/apoll
>> request to the cancellation hash table and remove it from there.
>>
>> On the submission side we often already hold ->uring_lock and tw
>> completion is likely to hold it as well. Add a second cancellation hash
>> table protected by ->uring_lock. In concerns for latency because of a
>> need to have the mutex locked on the completion side, use the new table
>> only in following cases:
>>
>> 1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
>> is no contention and so the main tw hander will always end up
>> grabbing it before calling into callbacks.
>
> This statement seems not true, the io-worker may grab the uring lock,
> and that's why the [1] place I marked below is needed, right? Or do I
> miss something?
Ok, "almost always ends up ...". The thing is io-wq is discouraged
taking the lock and if it does can do only briefly and without any
blocking/waiting. So yeah, it might be not taken at [1] but it's
rather unlikely.
>> 2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
>> using ->uring_lock.
>
> same as above.
>
>>
>> 3) apoll: we normally grab the lock on the completion side anyway to
>> execute the request, so it's free.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> io_uring/io_uring.c | 9 +++-
>> io_uring/io_uring_types.h | 4 ++
>> io_uring/poll.c | 111 ++++++++++++++++++++++++++++++--------
>> 3 files changed, 102 insertions(+), 22 deletions(-)
>>
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 4bead16e57f7..1395176bc2ea 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -731,6 +731,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
>> hash_bits = clamp(hash_bits, 1, 8);
>> if (io_alloc_hash_table(&ctx->cancel_table, hash_bits))
>> goto err;
>> + if (io_alloc_hash_table(&ctx->cancel_table_locked, hash_bits))
>> + goto err;
>> ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
>> if (!ctx->dummy_ubuf)
>> @@ -773,6 +775,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
>> err:
>> kfree(ctx->dummy_ubuf);
>> kfree(ctx->cancel_table.hbs);
>> + kfree(ctx->cancel_table_locked.hbs);
>> kfree(ctx->io_bl);
>> xa_destroy(&ctx->io_bl_xa);
>> kfree(ctx);
>> @@ -3056,6 +3059,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
>> if (ctx->hash_map)
>> io_wq_put_hash(ctx->hash_map);
>> kfree(ctx->cancel_table.hbs);
>> + kfree(ctx->cancel_table_locked.hbs);
>> kfree(ctx->dummy_ubuf);
>> kfree(ctx->io_bl);
>> xa_destroy(&ctx->io_bl_xa);
>> @@ -3217,12 +3221,13 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
>> __io_cqring_overflow_flush(ctx, true);
>> xa_for_each(&ctx->personalities, index, creds)
>> io_unregister_personality(ctx, index);
>> + if (ctx->rings)
>> + io_poll_remove_all(ctx, NULL, true);
>> mutex_unlock(&ctx->uring_lock);
>> /* failed during ring init, it couldn't have issued any requests */
>> if (ctx->rings) {
>> io_kill_timeouts(ctx, NULL, true);
>> - io_poll_remove_all(ctx, NULL, true);
>> /* if we failed setting up the ctx, we might not have any rings */
>> io_iopoll_try_reap_events(ctx);
>> }
>> @@ -3347,7 +3352,9 @@ static __cold void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
>> }
>> ret |= io_cancel_defer_files(ctx, task, cancel_all);
>> + mutex_lock(&ctx->uring_lock);
>> ret |= io_poll_remove_all(ctx, task, cancel_all);
>> + mutex_unlock(&ctx->uring_lock);
>> ret |= io_kill_timeouts(ctx, task, cancel_all);
>> if (task)
>> ret |= io_run_task_work();
>> diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
>> index ce2fbe6749bb..557b8e7719c9 100644
>> --- a/io_uring/io_uring_types.h
>> +++ b/io_uring/io_uring_types.h
>> @@ -189,6 +189,7 @@ struct io_ring_ctx {
>> struct xarray io_bl_xa;
>> struct list_head io_buffers_cache;
>> + struct io_hash_table cancel_table_locked;
>> struct list_head cq_overflow_list;
>> struct list_head apoll_cache;
>> struct xarray personalities;
>> @@ -323,6 +324,7 @@ enum {
>> /* keep async read/write and isreg together and in order */
>> REQ_F_SUPPORT_NOWAIT_BIT,
>> REQ_F_ISREG_BIT,
>> + REQ_F_HASH_LOCKED_BIT,
>> /* not a real bit, just to check we're not overflowing the space */
>> __REQ_F_LAST_BIT,
>> @@ -388,6 +390,8 @@ enum {
>> REQ_F_APOLL_MULTISHOT = BIT(REQ_F_APOLL_MULTISHOT_BIT),
>> /* recvmsg special flag, clear EPOLLIN */
>> REQ_F_CLEAR_POLLIN = BIT(REQ_F_CLEAR_POLLIN_BIT),
>> + /* hashed into ->cancel_hash_locked */
>> + REQ_F_HASH_LOCKED = BIT(REQ_F_HASH_LOCKED_BIT),
>> };
>> typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
>> diff --git a/io_uring/poll.c b/io_uring/poll.c
>> index 07157da1c2cb..d20484c1cbb7 100644
>> --- a/io_uring/poll.c
>> +++ b/io_uring/poll.c
>> @@ -93,6 +93,26 @@ static void io_poll_req_delete(struct io_kiocb *req, struct io_ring_ctx *ctx)
>> spin_unlock(lock);
>> }
>> +static void io_poll_req_insert_locked(struct io_kiocb *req)
>> +{
>> + struct io_hash_table *table = &req->ctx->cancel_table_locked;
>> + u32 index = hash_long(req->cqe.user_data, table->hash_bits);
>> +
>> + hlist_add_head(&req->hash_node, &table->hbs[index].list);
>> +}
>> +
>> +static void io_poll_tw_hash_eject(struct io_kiocb *req, bool *locked)
>> +{
>> + struct io_ring_ctx *ctx = req->ctx;
>> +
>> + if (req->flags & REQ_F_HASH_LOCKED) {
>> + io_tw_lock(ctx, locked);
>
> [1]
>
>> + hash_del(&req->hash_node);
>> + } else {
>> + io_poll_req_delete(req, ctx);
>> + }
>> +}
>> +
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH for-next v2 25/25] io_uring: mutex locked poll hashing
2022-06-15 13:55 ` Pavel Begunkov
@ 2022-06-15 15:19 ` Hao Xu
0 siblings, 0 replies; 44+ messages in thread
From: Hao Xu @ 2022-06-15 15:19 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe
On 6/15/22 21:55, Pavel Begunkov wrote:
> On 6/15/22 13:53, Hao Xu wrote:
>> On 6/14/22 22:37, Pavel Begunkov wrote:
>>> Currently we do two extra spin lock/unlock pairs to add a poll/apoll
>>> request to the cancellation hash table and remove it from there.
>>>
>>> On the submission side we often already hold ->uring_lock and tw
>>> completion is likely to hold it as well. Add a second cancellation hash
>>> table protected by ->uring_lock. In concerns for latency because of a
>>> need to have the mutex locked on the completion side, use the new table
>>> only in following cases:
>>>
>>> 1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
>>> is no contention and so the main tw hander will always end up
>>> grabbing it before calling into callbacks.
>>
>> This statement seems not true, the io-worker may grab the uring lock,
>> and that's why the [1] place I marked below is needed, right? Or do I
>> miss something?
>
> Ok, "almost always ends up ...". The thing is io-wq is discouraged
> taking the lock and if it does can do only briefly and without any
> blocking/waiting. So yeah, it might be not taken at [1] but it's
> rather unlikely.
>
Agree, What I meant to say is the code at [1] deserve a comment for it
since I have a feeling that new developers into io_uring may struggle to
figure it out with commit message saying like this.
^ permalink raw reply [flat|nested] 44+ messages in thread