* [PATCH 0/2] cancel_hash per entry lock
@ 2022-05-29 16:19 Hao Xu
2022-05-29 16:19 ` [PATCH 1/2] io_uring: add an argument for io_poll_disarm() Hao Xu
2022-05-29 16:20 ` [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock Hao Xu
0 siblings, 2 replies; 15+ messages in thread
From: Hao Xu @ 2022-05-29 16:19 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, Pavel Begunkov
From: Hao Xu <[email protected]>
Add a per-entry lock for the cancel_hash array. This reduces use of the
completion_lock and contention between cancel_hash entries.
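A minimal user-space sketch of the per-bucket locking idea (illustration only,
not the kernel code in this series): threads that hash to different buckets no
longer contend on one global lock.

#include <pthread.h>
#include <stdio.h>

#define NR_BUCKETS 32
#define NR_THREADS 4

struct bucket {
        pthread_mutex_t lock;
        unsigned long nr_items;         /* stand-in for the hlist in the kernel */
};

static struct bucket table[NR_BUCKETS];

static unsigned int hash(unsigned long key)
{
        return key % NR_BUCKETS;
}

static void *worker(void *arg)
{
        unsigned long key = (unsigned long)arg;
        struct bucket *b = &table[hash(key)];
        int i;

        for (i = 0; i < 1000000; i++) {
                /* only work hashing to the same bucket contends here */
                pthread_mutex_lock(&b->lock);
                b->nr_items++;
                pthread_mutex_unlock(&b->lock);
        }
        return NULL;
}

int main(void)
{
        pthread_t threads[NR_THREADS];
        unsigned long i;

        for (i = 0; i < NR_BUCKETS; i++)
                pthread_mutex_init(&table[i].lock, NULL);
        for (i = 0; i < NR_THREADS; i++)
                pthread_create(&threads[i], NULL, worker, (void *)i);
        for (i = 0; i < NR_THREADS; i++)
                pthread_join(threads[i], NULL);
        for (i = 0; i < NR_BUCKETS; i++)
                if (table[i].nr_items)
                        printf("bucket %lu: %lu items\n", i, table[i].nr_items);
        return 0;
}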
Hao Xu (2):
io_uring: add an argument for io_poll_disarm()
io_uring: switch cancel_hash to use per list spinlock
io_uring/cancel.c | 12 ++++++++++--
io_uring/cancel.h | 1 +
io_uring/io_uring.c | 9 +++++++++
io_uring/io_uring_types.h | 1 +
io_uring/poll.c | 38 +++++++++++++++++++++-----------------
5 files changed, 42 insertions(+), 19 deletions(-)
base-commit: cbbf9a4865526c55a6a984ef69578def9944f319
--
2.25.1
* [PATCH 1/2] io_uring: add an argument for io_poll_disarm()
2022-05-29 16:19 [PATCH 0/2] cancel_hash per entry lock Hao Xu
@ 2022-05-29 16:19 ` Hao Xu
2022-05-29 16:20 ` [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock Hao Xu
1 sibling, 0 replies; 15+ messages in thread
From: Hao Xu @ 2022-05-29 16:19 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, Pavel Begunkov
From: Hao Xu <[email protected]>
Add a hash index argument to io_poll_disarm(); the next patch will use it
to pick the right per-bucket lock.
Signed-off-by: Hao Xu <[email protected]>
---
io_uring/poll.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 728f6e7b47c5..c8982c5ef0fa 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -561,8 +561,9 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
{
struct hlist_head *list;
struct io_kiocb *req;
+ u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
- list = &ctx->cancel_hash[hash_long(cd->data, ctx->cancel_hash_bits)];
+ list = &ctx->cancel_hash[index];
hlist_for_each_entry(req, list, hash_node) {
if (cd->data != req->cqe.user_data)
continue;
@@ -573,6 +574,7 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
continue;
req->work.cancel_seq = cd->seq;
}
+ cd->flags = index;
return req;
}
return NULL;
@@ -602,7 +604,7 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
return NULL;
}
-static bool io_poll_disarm(struct io_kiocb *req)
+static bool io_poll_disarm(struct io_kiocb *req, u32 index)
__must_hold(&ctx->completion_lock)
{
if (!io_poll_get_ownership(req))
@@ -724,7 +726,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
spin_lock(&ctx->completion_lock);
preq = io_poll_find(ctx, true, &cd);
- if (!preq || !io_poll_disarm(preq)) {
+ if (!preq || !io_poll_disarm(preq, cd.flags)) {
spin_unlock(&ctx->completion_lock);
ret = preq ? -EALREADY : -ENOENT;
goto out;
--
2.25.1
* [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
2022-05-29 16:19 [PATCH 0/2] cancel_hash per entry lock Hao Xu
2022-05-29 16:19 ` [PATCH 1/2] io_uring: add an argument for io_poll_disarm() Hao Xu
@ 2022-05-29 16:20 ` Hao Xu
2022-05-29 16:25 ` Jens Axboe
1 sibling, 1 reply; 15+ messages in thread
From: Hao Xu @ 2022-05-29 16:20 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, Pavel Begunkov
From: Hao Xu <[email protected]>
Use a per-list lock for cancel_hash. This removes some completion_lock
invocations and removes contention between different cancel_hash entries.
Signed-off-by: Hao Xu <[email protected]>
---
io_uring/cancel.c | 12 ++++++++++--
io_uring/cancel.h | 1 +
io_uring/io_uring.c | 9 +++++++++
io_uring/io_uring_types.h | 1 +
io_uring/poll.c | 30 ++++++++++++++++--------------
5 files changed, 37 insertions(+), 16 deletions(-)
diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 83cceb52d82d..0b1aa3ab7664 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -93,14 +93,14 @@ int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
if (!ret)
return 0;
- spin_lock(&ctx->completion_lock);
ret = io_poll_cancel(ctx, cd);
if (ret != -ENOENT)
goto out;
+ spin_lock(&ctx->completion_lock);
if (!(cd->flags & IORING_ASYNC_CANCEL_FD))
ret = io_timeout_cancel(ctx, cd);
-out:
spin_unlock(&ctx->completion_lock);
+out:
return ret;
}
@@ -192,3 +192,11 @@ int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
io_req_set_res(req, ret, 0);
return IOU_OK;
}
+
+inline void init_cancel_hash_locks(spinlock_t *cancel_hash_locks, unsigned size)
+{
+ int i;
+
+ for (i = 0; i < size; i++)
+ spin_lock_init(&cancel_hash_locks[i]);
+}
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index 4f35d8696325..fdec2595797e 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -4,3 +4,4 @@ int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags);
int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd);
+inline void init_cancel_hash_locks(spinlock_t *cancel_hash_locks, unsigned size);
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index f31d3446dcbf..6eaa27aea197 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -706,7 +706,14 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
GFP_KERNEL);
if (!ctx->cancel_hash)
goto err;
+ ctx->cancel_hash_locks =
+ kmalloc((1U << hash_bits) * sizeof(spinlock_t),
+ GFP_KERNEL);
+ if (!ctx->cancel_hash_locks)
+ goto err;
+
__hash_init(ctx->cancel_hash, 1U << hash_bits);
+ init_cancel_hash_locks(ctx->cancel_hash_locks, 1U << hash_bits);
ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
if (!ctx->dummy_ubuf)
@@ -749,6 +756,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
err:
kfree(ctx->dummy_ubuf);
kfree(ctx->cancel_hash);
+ kfree(ctx->cancel_hash_locks);
kfree(ctx->io_bl);
xa_destroy(&ctx->io_bl_xa);
kfree(ctx);
@@ -3045,6 +3053,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
if (ctx->hash_map)
io_wq_put_hash(ctx->hash_map);
kfree(ctx->cancel_hash);
+ kfree(ctx->cancel_hash_locks);
kfree(ctx->dummy_ubuf);
kfree(ctx->io_bl);
xa_destroy(&ctx->io_bl_xa);
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index 7c22cf35a7e2..4619a46f7ecd 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -231,6 +231,7 @@ struct io_ring_ctx {
*/
struct io_wq_work_list iopoll_list;
struct hlist_head *cancel_hash;
+ spinlock_t *cancel_hash_locks;
unsigned cancel_hash_bits;
bool poll_multi_queue;
diff --git a/io_uring/poll.c b/io_uring/poll.c
index c8982c5ef0fa..e1b6dd282860 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -73,10 +73,11 @@ static struct io_poll *io_poll_get_single(struct io_kiocb *req)
static void io_poll_req_insert(struct io_kiocb *req)
{
struct io_ring_ctx *ctx = req->ctx;
- struct hlist_head *list;
+ u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
- list = &ctx->cancel_hash[hash_long(req->cqe.user_data, ctx->cancel_hash_bits)];
- hlist_add_head(&req->hash_node, list);
+ spin_lock(&ctx->cancel_hash_locks[index]);
+ hlist_add_head(&req->hash_node, &ctx->cancel_hash[index]);
+ spin_unlock(&ctx->cancel_hash_locks[index]);
}
static void io_init_poll_iocb(struct io_poll *poll, __poll_t events,
@@ -439,9 +440,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
return 0;
}
- spin_lock(&ctx->completion_lock);
io_poll_req_insert(req);
- spin_unlock(&ctx->completion_lock);
if (mask && (poll->events & EPOLLET)) {
/* can't multishot if failed, just queue the event we've got */
@@ -538,10 +537,10 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
bool found = false;
int i;
- spin_lock(&ctx->completion_lock);
for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
struct hlist_head *list;
+ spin_lock(&ctx->cancel_hash_locks[i]);
list = &ctx->cancel_hash[i];
hlist_for_each_entry_safe(req, tmp, list, hash_node) {
if (io_match_task_safe(req, tsk, cancel_all)) {
@@ -550,19 +549,19 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
found = true;
}
}
+ spin_unlock(&ctx->cancel_hash_locks[i]);
}
- spin_unlock(&ctx->completion_lock);
return found;
}
static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
struct io_cancel_data *cd)
- __must_hold(&ctx->completion_lock)
{
struct hlist_head *list;
struct io_kiocb *req;
u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
+ spin_lock(&ctx->cancel_hash_locks[index]);
list = &ctx->cancel_hash[index];
hlist_for_each_entry(req, list, hash_node) {
if (cd->data != req->cqe.user_data)
@@ -574,15 +573,16 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
continue;
req->work.cancel_seq = cd->seq;
}
+ spin_unlock(&ctx->cancel_hash_locks[index]);
cd->flags = index;
return req;
}
+ spin_unlock(&ctx->cancel_hash_locks[index]);
return NULL;
}
static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
struct io_cancel_data *cd)
- __must_hold(&ctx->completion_lock)
{
struct io_kiocb *req;
int i;
@@ -590,6 +590,7 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
struct hlist_head *list;
+ spin_lock(&ctx->cancel_hash_locks[i]);
list = &ctx->cancel_hash[i];
hlist_for_each_entry(req, list, hash_node) {
if (!(cd->flags & IORING_ASYNC_CANCEL_ANY) &&
@@ -598,24 +599,28 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
if (cd->seq == req->work.cancel_seq)
continue;
req->work.cancel_seq = cd->seq;
+ spin_unlock(&ctx->cancel_hash_locks[i]);
return req;
}
+ spin_unlock(&ctx->cancel_hash_locks[i]);
}
return NULL;
}
static bool io_poll_disarm(struct io_kiocb *req, u32 index)
- __must_hold(&ctx->completion_lock)
{
+ struct io_ring_ctx *ctx = req->ctx;
+
if (!io_poll_get_ownership(req))
return false;
io_poll_remove_entries(req);
+ spin_lock(&ctx->cancel_hash_locks[index]);
hash_del(&req->hash_node);
+ spin_unlock(&ctx->cancel_hash_locks[index]);
return true;
}
int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
- __must_hold(&ctx->completion_lock)
{
struct io_kiocb *req;
@@ -724,14 +729,11 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
int ret2, ret = 0;
bool locked;
- spin_lock(&ctx->completion_lock);
preq = io_poll_find(ctx, true, &cd);
if (!preq || !io_poll_disarm(preq, cd.flags)) {
- spin_unlock(&ctx->completion_lock);
ret = preq ? -EALREADY : -ENOENT;
goto out;
}
- spin_unlock(&ctx->completion_lock);
if (poll_update->update_events || poll_update->update_user_data) {
/* only mask one event flags, keep behavior flags */
--
2.25.1
* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
2022-05-29 16:20 ` [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock Hao Xu
@ 2022-05-29 16:25 ` Jens Axboe
2022-05-29 18:07 ` Hao Xu
2022-05-30 13:33 ` Hao Xu
0 siblings, 2 replies; 15+ messages in thread
From: Jens Axboe @ 2022-05-29 16:25 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Pavel Begunkov
On 5/29/22 10:20 AM, Hao Xu wrote:
> From: Hao Xu <[email protected]>
>
> From: Hao Xu <[email protected]>
>
> Use per list lock for cancel_hash, this removes some completion lock
> invocation and remove contension between different cancel_hash entries
Interesting, do you have any numbers on this?
Also, I'd make a hash bucket struct:
struct io_hash_bucket {
spinlock_t lock;
struct hlist_head list;
};
rather than two separate structs, that'll have nicer memory locality too
and should further improve it. Could be done as a prep patch with the
old locking in place, making the end patch doing the per-bucket lock
simpler as well.
Hmm?
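For concreteness, a rough sketch of how the insert helper could look once
cancel_hash is converted into an array of such buckets (illustrative only,
assuming the existing io_uring definitions; not the eventual patch):

struct io_hash_bucket {
        spinlock_t              lock;
        struct hlist_head       list;
};

/* assumes ctx->cancel_hash has become an array of struct io_hash_bucket */
static void io_poll_req_insert(struct io_kiocb *req)
{
        struct io_ring_ctx *ctx = req->ctx;
        u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
        struct io_hash_bucket *hb = &ctx->cancel_hash[index];

        spin_lock(&hb->lock);
        hlist_add_head(&req->hash_node, &hb->list);
        spin_unlock(&hb->lock);
}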
--
Jens Axboe
* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
2022-05-29 16:25 ` Jens Axboe
@ 2022-05-29 18:07 ` Hao Xu
2022-05-29 18:40 ` Jens Axboe
2022-05-30 13:33 ` Hao Xu
1 sibling, 1 reply; 15+ messages in thread
From: Hao Xu @ 2022-05-29 18:07 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Pavel Begunkov
On 5/30/22 00:25, Jens Axboe wrote:
> On 5/29/22 10:20 AM, Hao Xu wrote:
>> From: Hao Xu <[email protected]>
>>
>> From: Hao Xu <[email protected]>
>>
>> Use per list lock for cancel_hash, this removes some completion lock
>> invocation and remove contension between different cancel_hash entries
>
> Interesting, do you have any numbers on this?
Just theoretical for now, I'll do some tests tomorrow. This is
actually an RFC, I forgot to change the subject.
>
> Also, I'd make a hash bucket struct:
>
> struct io_hash_bucket {
> spinlock_t lock;
> struct hlist_head list;
> };
>
> rather than two separate structs, that'll have nicer memory locality too
> and should further improve it. Could be done as a prep patch with the
> old locking in place, making the end patch doing the per-bucket lock
> simpler as well.
Sure, if the test numbers make sense, I'll send v2. I'll test the
hlist_bl list as well (its comment says it is much slower than a
normal spin_lock, but maybe we don't care that much about the
efficiency of poll cancellation?).
>
> Hmm?
>
* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
2022-05-29 18:07 ` Hao Xu
@ 2022-05-29 18:40 ` Jens Axboe
2022-05-29 22:50 ` Pavel Begunkov
0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2022-05-29 18:40 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Pavel Begunkov
On 5/29/22 12:07 PM, Hao Xu wrote:
> On 5/30/22 00:25, Jens Axboe wrote:
>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>> From: Hao Xu <[email protected]>
>>>
>>> From: Hao Xu <[email protected]>
>>>
>>> Use per list lock for cancel_hash, this removes some completion lock
>>> invocation and remove contension between different cancel_hash entries
>>
>> Interesting, do you have any numbers on this?
>
> Just Theoretically for now, I'll do some tests tomorrow. This is
> actually RFC, forgot to change the subject.
>
>>
>> Also, I'd make a hash bucket struct:
>>
>> struct io_hash_bucket {
>> spinlock_t lock;
>> struct hlist_head list;
>> };
>>
>> rather than two separate structs, that'll have nicer memory locality too
>> and should further improve it. Could be done as a prep patch with the
>> old locking in place, making the end patch doing the per-bucket lock
>> simpler as well.
>
> Sure, if the test number make sense, I'll send v2. I'll test the
> hlist_bl list as well(the comment of it says it is much slower than
> normal spin_lock, but we may not care the efficiency of poll
> cancellation very much?).
I don't think the bit spinlocks are going to be useful, we should
stick with a spinlock for this. They are indeed slower and generally not
used for that reason. For a use case where you need a ton of locks and
saving the 4 bytes for a spinlock would make sense (or maybe not
changing some struct?), maybe they have a purpose. But not for this.
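For reference, the hlist_bl variant being ruled out here would look roughly
like this, with the lock folded into bit 0 of the bucket pointer (a sketch of
the bit-spinlock list API, not code from the series):

#include <linux/list_bl.h>

static void cancel_hash_insert_bl(struct hlist_bl_head *head,
                                  struct hlist_bl_node *node)
{
        hlist_bl_lock(head);            /* bit_spin_lock() on the head pointer */
        hlist_bl_add_head(node, head);
        hlist_bl_unlock(head);
}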
--
Jens Axboe
* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
2022-05-29 18:40 ` Jens Axboe
@ 2022-05-29 22:50 ` Pavel Begunkov
2022-05-29 23:34 ` Jens Axboe
2022-05-30 6:38 ` Hao Xu
0 siblings, 2 replies; 15+ messages in thread
From: Pavel Begunkov @ 2022-05-29 22:50 UTC (permalink / raw)
To: Jens Axboe, Hao Xu, io-uring
On 5/29/22 19:40, Jens Axboe wrote:
> On 5/29/22 12:07 PM, Hao Xu wrote:
>> On 5/30/22 00:25, Jens Axboe wrote:
>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>> From: Hao Xu <[email protected]>
>>>>
>>>> From: Hao Xu <[email protected]>
>>>>
>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>> invocation and remove contension between different cancel_hash entries
>>>
>>> Interesting, do you have any numbers on this?
>>
>> Just Theoretically for now, I'll do some tests tomorrow. This is
>> actually RFC, forgot to change the subject.
>>
>>>
>>> Also, I'd make a hash bucket struct:
>>>
>>> struct io_hash_bucket {
>>> spinlock_t lock;
>>> struct hlist_head list;
>>> };
>>>
>>> rather than two separate structs, that'll have nicer memory locality too
>>> and should further improve it. Could be done as a prep patch with the
>>> old locking in place, making the end patch doing the per-bucket lock
>>> simpler as well.
>>
>> Sure, if the test number make sense, I'll send v2. I'll test the
>> hlist_bl list as well(the comment of it says it is much slower than
>> normal spin_lock, but we may not care the efficiency of poll
>> cancellation very much?).
>
> I don't think the bit spinlocks are going to be useful, we should
> stick with a spinlock for this. They are indeed slower and generally not
> used for that reason. For a use case where you need a ton of locks and
> saving the 4 bytes for a spinlock would make sense (or maybe not
> changing some struct?), maybe they have a purpose. But not for this.
We can put the cancel hashes under uring_lock and completely kill
the hash spinlocking (2 lock/unlock pairs per single-shot). The code
below won't even compile and is missing the cancellation bits; I'll pick it
up in a week.
Even better would be to have two hash tables, and auto-magically apply
the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
and apoll (need uring_lock after anyway).
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6be21967959d..191fa7f31610 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7120,12 +7120,20 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
}
io_poll_remove_entries(req);
- spin_lock(&ctx->completion_lock);
- hash_del(&req->hash_node);
- __io_req_complete_post(req, req->cqe.res, 0);
- io_commit_cqring(ctx);
- spin_unlock(&ctx->completion_lock);
- io_cqring_ev_posted(ctx);
+
+ if (ctx->flags & IORING_MUTEX_HASH) {
+ io_tw_lock(ctx, locked);
+ hash_del(&req->hash_node);
+ io_req_complete_state(req, req->cqe.res, 0);
+ io_req_add_compl_list(req);
+ } else {
+ spin_lock(&ctx->completion_lock);
+ hash_del(&req->hash_node);
+ __io_req_complete_post(req, req->cqe.res, 0);
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ io_cqring_ev_posted(ctx);
+ }
}
static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
@@ -7138,9 +7146,14 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
return;
io_poll_remove_entries(req);
- spin_lock(&ctx->completion_lock);
- hash_del(&req->hash_node);
- spin_unlock(&ctx->completion_lock);
+ if (ctx->flags & IORING_MUTEX_HASH) {
+ io_tw_lock(ctx, locked);
+ hash_del(&req->hash_node);
+ } else {
+ spin_lock(&ctx->completion_lock);
+ hash_del(&req->hash_node);
+ spin_unlock(&ctx->completion_lock);
+ }
if (!ret)
io_req_task_submit(req, locked);
@@ -7332,9 +7345,13 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
return 0;
}
- spin_lock(&ctx->completion_lock);
- io_poll_req_insert(req);
- spin_unlock(&ctx->completion_lock);
+ if (ctx->flags & IORING_MUTEX_HASH) {
+ io_poll_req_insert(req);
+ } else {
+ spin_lock(&ctx->completion_lock);
+ io_poll_req_insert(req);
+ spin_unlock(&ctx->completion_lock);
+ }
if (mask) {
/* can't multishot if failed, just queue the event we've got */
* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
2022-05-29 22:50 ` Pavel Begunkov
@ 2022-05-29 23:34 ` Jens Axboe
2022-05-30 0:18 ` Pavel Begunkov
2022-05-30 6:38 ` Hao Xu
1 sibling, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2022-05-29 23:34 UTC (permalink / raw)
To: Pavel Begunkov, Hao Xu, io-uring
On 5/29/22 4:50 PM, Pavel Begunkov wrote:
> On 5/29/22 19:40, Jens Axboe wrote:
>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>> From: Hao Xu <[email protected]>
>>>>>
>>>>> From: Hao Xu <[email protected]>
>>>>>
>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>> invocation and remove contension between different cancel_hash entries
>>>>
>>>> Interesting, do you have any numbers on this?
>>>
>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>> actually RFC, forgot to change the subject.
>>>
>>>>
>>>> Also, I'd make a hash bucket struct:
>>>>
>>>> struct io_hash_bucket {
>>>> spinlock_t lock;
>>>> struct hlist_head list;
>>>> };
>>>>
>>>> rather than two separate structs, that'll have nicer memory locality too
>>>> and should further improve it. Could be done as a prep patch with the
>>>> old locking in place, making the end patch doing the per-bucket lock
>>>> simpler as well.
>>>
>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>> hlist_bl list as well(the comment of it says it is much slower than
>>> normal spin_lock, but we may not care the efficiency of poll
>>> cancellation very much?).
>>
>> I don't think the bit spinlocks are going to be useful, we should
>> stick with a spinlock for this. They are indeed slower and generally not
>> used for that reason. For a use case where you need a ton of locks and
>> saving the 4 bytes for a spinlock would make sense (or maybe not
>> changing some struct?), maybe they have a purpose. But not for this.
>
> We can put the cancel hashes under uring_lock and completely kill
> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
> below won't even compile and missing cancellation bits, I'll pick it
> up in a week.
>
> Even better would be to have two hash tables, and auto-magically apply
> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
> and apoll (need uring_lock after anyway).
My hope was that it'd take us closer to being able to use more granular
locking for hashing in general. I don't care too much about the
cancelation, but the normal hash locking would be useful to do.
However, for cancelations, under uring_lock would indeed be preferable
to doing per-bucket locks there. Guess I'll wait and see what your final
patch looks like, not sure why it'd be a ctx conditional?
What about io_poll_remove_all()?
--
Jens Axboe
* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
2022-05-29 23:34 ` Jens Axboe
@ 2022-05-30 0:18 ` Pavel Begunkov
2022-05-30 6:52 ` Hao Xu
0 siblings, 1 reply; 15+ messages in thread
From: Pavel Begunkov @ 2022-05-30 0:18 UTC (permalink / raw)
To: Jens Axboe, Hao Xu, io-uring
On 5/30/22 00:34, Jens Axboe wrote:
> On 5/29/22 4:50 PM, Pavel Begunkov wrote:
>> On 5/29/22 19:40, Jens Axboe wrote:
>>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>>> From: Hao Xu <[email protected]>
>>>>>>
>>>>>> From: Hao Xu <[email protected]>
>>>>>>
>>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>>> invocation and remove contension between different cancel_hash entries
>>>>>
>>>>> Interesting, do you have any numbers on this?
>>>>
>>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>>> actually RFC, forgot to change the subject.
>>>>
>>>>>
>>>>> Also, I'd make a hash bucket struct:
>>>>>
>>>>> struct io_hash_bucket {
>>>>> spinlock_t lock;
>>>>> struct hlist_head list;
>>>>> };
>>>>>
>>>>> rather than two separate structs, that'll have nicer memory locality too
>>>>> and should further improve it. Could be done as a prep patch with the
>>>>> old locking in place, making the end patch doing the per-bucket lock
>>>>> simpler as well.
>>>>
>>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>>> hlist_bl list as well(the comment of it says it is much slower than
>>>> normal spin_lock, but we may not care the efficiency of poll
>>>> cancellation very much?).
>>>
>>> I don't think the bit spinlocks are going to be useful, we should
>>> stick with a spinlock for this. They are indeed slower and generally not
>>> used for that reason. For a use case where you need a ton of locks and
>>> saving the 4 bytes for a spinlock would make sense (or maybe not
>>> changing some struct?), maybe they have a purpose. But not for this.
>>
>> We can put the cancel hashes under uring_lock and completely kill
>> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
>> below won't even compile and missing cancellation bits, I'll pick it
>> up in a week.
>>
>> Even better would be to have two hash tables, and auto-magically apply
>> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
>> and apoll (need uring_lock after anyway).
>
> My hope was that it'd take us closer to being able to use more granular
> locking for hashing in general. I don't care too much about the
> cancelation, but the normal hash locking would be useful to do.
>
> However, for cancelations, under uring_lock would indeed be preferable
> to doing per-bucket locks there. Guess I'll wait and see what your final
> patch looks like, not sure why it'd be a ctx conditional?
It replaces 2 spin lock/unlock with one io_tw_lock() in the completion
path, which is done once per tw batch and grabbed anyway if
there is no contention (see handle_tw_list()).
It could be unconditional, but I'd say for those 3 cases there is
essentially no chance of regressing perf/latency, while I can think of
some cases where it might hurt latencies, namely those where an
io_uring is shared between threads.
Should benefit the cancellation path as well, but I don't care much
about that either.
> What about io_poll_remove_all()?
As mentioned, it's not handled in the diff, but easily doable,
it should just traverse both hash tables.
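A possible shape for that, where cancel_hash_locked and
io_poll_remove_all_table are hypothetical names used only for illustration:

__cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
                               bool cancel_all)
        __must_hold(&ctx->uring_lock)
{
        bool found;

        /* table used only by uring_lock-protected paths: no extra locking */
        found = io_poll_remove_all_table(ctx, ctx->cancel_hash_locked,
                                         tsk, cancel_all);
        /* table shared with io-wq etc.: still takes its own locks inside */
        found |= io_poll_remove_all_table(ctx, ctx->cancel_hash, tsk, cancel_all);
        return found;
}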
--
Pavel Begunkov
* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
2022-05-29 22:50 ` Pavel Begunkov
2022-05-29 23:34 ` Jens Axboe
@ 2022-05-30 6:38 ` Hao Xu
2022-05-30 6:59 ` Hao Xu
2022-05-30 9:39 ` Pavel Begunkov
1 sibling, 2 replies; 15+ messages in thread
From: Hao Xu @ 2022-05-30 6:38 UTC (permalink / raw)
To: Pavel Begunkov, Jens Axboe, io-uring
On 5/30/22 06:50, Pavel Begunkov wrote:
> On 5/29/22 19:40, Jens Axboe wrote:
>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>> From: Hao Xu <[email protected]>
>>>>>
>>>>> From: Hao Xu <[email protected]>
>>>>>
>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>> invocation and remove contension between different cancel_hash entries
>>>>
>>>> Interesting, do you have any numbers on this?
>>>
>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>> actually RFC, forgot to change the subject.
>>>
>>>>
>>>> Also, I'd make a hash bucket struct:
>>>>
>>>> struct io_hash_bucket {
>>>> spinlock_t lock;
>>>> struct hlist_head list;
>>>> };
>>>>
>>>> rather than two separate structs, that'll have nicer memory locality
>>>> too
>>>> and should further improve it. Could be done as a prep patch with the
>>>> old locking in place, making the end patch doing the per-bucket lock
>>>> simpler as well.
>>>
>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>> hlist_bl list as well(the comment of it says it is much slower than
>>> normal spin_lock, but we may not care the efficiency of poll
>>> cancellation very much?).
>>
>> I don't think the bit spinlocks are going to be useful, we should
>> stick with a spinlock for this. They are indeed slower and generally not
>> used for that reason. For a use case where you need a ton of locks and
>> saving the 4 bytes for a spinlock would make sense (or maybe not
>> changing some struct?), maybe they have a purpose. But not for this.
>
> We can put the cancel hashes under uring_lock and completely kill
> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
> below won't even compile and missing cancellation bits, I'll pick it
> up in a week.
>
> Even better would be to have two hash tables, and auto-magically apply
> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
> and apoll (need uring_lock after anyway).
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 6be21967959d..191fa7f31610 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -7120,12 +7120,20 @@ static void io_poll_task_func(struct io_kiocb
> *req, bool *locked)
> }
>
> io_poll_remove_entries(req);
> - spin_lock(&ctx->completion_lock);
> - hash_del(&req->hash_node);
> - __io_req_complete_post(req, req->cqe.res, 0);
> - io_commit_cqring(ctx);
> - spin_unlock(&ctx->completion_lock);
> - io_cqring_ev_posted(ctx);
> +
> + if (ctx->flags & IORING_MUTEX_HASH) {
> + io_tw_lock(ctx, locked);
> + hash_del(&req->hash_node);
> + io_req_complete_state(req, req->cqe.res, 0);
> + io_req_add_compl_list(req);
> + } else {
> + spin_lock(&ctx->completion_lock);
> + hash_del(&req->hash_node);
> + __io_req_complete_post(req, req->cqe.res, 0);
> + io_commit_cqring(ctx);
> + spin_unlock(&ctx->completion_lock);
> + io_cqring_ev_posted(ctx);
> + }
> }
>
> static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
> @@ -7138,9 +7146,14 @@ static void io_apoll_task_func(struct io_kiocb
> *req, bool *locked)
> return;
>
> io_poll_remove_entries(req);
> - spin_lock(&ctx->completion_lock);
> - hash_del(&req->hash_node);
> - spin_unlock(&ctx->completion_lock);
> + if (ctx->flags & IORING_MUTEX_HASH) {
> + io_tw_lock(ctx, locked);
> + hash_del(&req->hash_node);
> + } else {
> + spin_lock(&ctx->completion_lock);
> + hash_del(&req->hash_node);
> + spin_unlock(&ctx->completion_lock);
> + }
>
> if (!ret)
> io_req_task_submit(req, locked);
> @@ -7332,9 +7345,13 @@ static int __io_arm_poll_handler(struct io_kiocb
> *req,
> return 0;
> }
>
> - spin_lock(&ctx->completion_lock);
> - io_poll_req_insert(req);
> - spin_unlock(&ctx->completion_lock);
> + if (ctx->flags & IORING_MUTEX_HASH) {
Does IORING_MUTEX_HASH exclude IOSQE_ASYNC as well? IOSQE_ASYNC
is uncommon, but users may still do that.
> + io_poll_req_insert(req);
> + } else {
> + spin_lock(&ctx->completion_lock);
> + io_poll_req_insert(req);
> + spin_unlock(&ctx->completion_lock);
> + }
>
> if (mask) {
> /* can't multishot if failed, just queue the event we've got */
* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
2022-05-30 0:18 ` Pavel Begunkov
@ 2022-05-30 6:52 ` Hao Xu
2022-05-30 9:35 ` Pavel Begunkov
0 siblings, 1 reply; 15+ messages in thread
From: Hao Xu @ 2022-05-30 6:52 UTC (permalink / raw)
To: Pavel Begunkov, Jens Axboe, io-uring
On 5/30/22 08:18, Pavel Begunkov wrote:
> On 5/30/22 00:34, Jens Axboe wrote:
>> On 5/29/22 4:50 PM, Pavel Begunkov wrote:
>>> On 5/29/22 19:40, Jens Axboe wrote:
>>>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>>>> From: Hao Xu <[email protected]>
>>>>>>>
>>>>>>> From: Hao Xu <[email protected]>
>>>>>>>
>>>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>>>> invocation and remove contension between different cancel_hash
>>>>>>> entries
>>>>>>
>>>>>> Interesting, do you have any numbers on this?
>>>>>
>>>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>>>> actually RFC, forgot to change the subject.
>>>>>
>>>>>>
>>>>>> Also, I'd make a hash bucket struct:
>>>>>>
>>>>>> struct io_hash_bucket {
>>>>>> spinlock_t lock;
>>>>>> struct hlist_head list;
>>>>>> };
>>>>>>
>>>>>> rather than two separate structs, that'll have nicer memory
>>>>>> locality too
>>>>>> and should further improve it. Could be done as a prep patch with the
>>>>>> old locking in place, making the end patch doing the per-bucket lock
>>>>>> simpler as well.
>>>>>
>>>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>>>> hlist_bl list as well(the comment of it says it is much slower than
>>>>> normal spin_lock, but we may not care the efficiency of poll
>>>>> cancellation very much?).
>>>>
>>>> I don't think the bit spinlocks are going to be useful, we should
>>>> stick with a spinlock for this. They are indeed slower and generally
>>>> not
>>>> used for that reason. For a use case where you need a ton of locks and
>>>> saving the 4 bytes for a spinlock would make sense (or maybe not
>>>> changing some struct?), maybe they have a purpose. But not for this.
>>>
>>> We can put the cancel hashes under uring_lock and completely kill
>>> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
>>> below won't even compile and missing cancellation bits, I'll pick it
>>> up in a week.
>>>
>>> Even better would be to have two hash tables, and auto-magically apply
>>> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
>>> and apoll (need uring_lock after anyway).
>>
>> My hope was that it'd take us closer to being able to use more granular
>> locking for hashing in general. I don't care too much about the
>> cancelation, but the normal hash locking would be useful to do.
>>
>> However, for cancelations, under uring_lock would indeed be preferable
>> to doing per-bucket locks there. Guess I'll wait and see what your final
>> patch looks like, not sure why it'd be a ctx conditional?
>
> It replaces 2 spin lock/unlock with one io_tw_lock() in the completion
> path, which is done once per tw batch and grabbed anyway if
> there is no contention (see handle_tw_list()).
>
> It could be unconditional, but I'd say for those 3 cases we have
> non-existing chance to regress perf/latency, but I can think of
> some cases where it might screw latencies, all share io_uring
> b/w threads.
>
> Should benefit the cancellation path as well, but I don't care
> about it as well.
>
>> What about io_poll_remove_all()?
>
> As mentioned, it's not handled in the diff, but easily doable,
> it should just traverse both hash tables.
>
Two hash tables look good to me. If I understand you correctly, one table
is under uring_lock and the other one uses normal handling (like per-bucket
locking)?
* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
2022-05-30 6:38 ` Hao Xu
@ 2022-05-30 6:59 ` Hao Xu
2022-05-30 9:39 ` Pavel Begunkov
1 sibling, 0 replies; 15+ messages in thread
From: Hao Xu @ 2022-05-30 6:59 UTC (permalink / raw)
To: Pavel Begunkov, Jens Axboe, io-uring
On 5/30/22 14:38, Hao Xu wrote:
> On 5/30/22 06:50, Pavel Begunkov wrote:
>> On 5/29/22 19:40, Jens Axboe wrote:
>>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>>> From: Hao Xu <[email protected]>
>>>>>>
>>>>>> From: Hao Xu <[email protected]>
>>>>>>
>>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>>> invocation and remove contension between different cancel_hash
>>>>>> entries
>>>>>
>>>>> Interesting, do you have any numbers on this?
>>>>
>>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>>> actually RFC, forgot to change the subject.
>>>>
>>>>>
>>>>> Also, I'd make a hash bucket struct:
>>>>>
>>>>> struct io_hash_bucket {
>>>>> spinlock_t lock;
>>>>> struct hlist_head list;
>>>>> };
>>>>>
>>>>> rather than two separate structs, that'll have nicer memory
>>>>> locality too
>>>>> and should further improve it. Could be done as a prep patch with the
>>>>> old locking in place, making the end patch doing the per-bucket lock
>>>>> simpler as well.
>>>>
>>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>>> hlist_bl list as well(the comment of it says it is much slower than
>>>> normal spin_lock, but we may not care the efficiency of poll
>>>> cancellation very much?).
>>>
>>> I don't think the bit spinlocks are going to be useful, we should
>>> stick with a spinlock for this. They are indeed slower and generally not
>>> used for that reason. For a use case where you need a ton of locks and
>>> saving the 4 bytes for a spinlock would make sense (or maybe not
>>> changing some struct?), maybe they have a purpose. But not for this.
>>
>> We can put the cancel hashes under uring_lock and completely kill
>> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
>> below won't even compile and missing cancellation bits, I'll pick it
>> up in a week.
>>
>> Even better would be to have two hash tables, and auto-magically apply
>> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
>> and apoll (need uring_lock after anyway).
>>
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 6be21967959d..191fa7f31610 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -7120,12 +7120,20 @@ static void io_poll_task_func(struct io_kiocb
>> *req, bool *locked)
>> }
>>
>> io_poll_remove_entries(req);
>> - spin_lock(&ctx->completion_lock);
>> - hash_del(&req->hash_node);
>> - __io_req_complete_post(req, req->cqe.res, 0);
>> - io_commit_cqring(ctx);
>> - spin_unlock(&ctx->completion_lock);
>> - io_cqring_ev_posted(ctx);
>> +
>> + if (ctx->flags & IORING_MUTEX_HASH) {
>> + io_tw_lock(ctx, locked);
>> + hash_del(&req->hash_node);
>> + io_req_complete_state(req, req->cqe.res, 0);
>> + io_req_add_compl_list(req);
>> + } else {
>> + spin_lock(&ctx->completion_lock);
>> + hash_del(&req->hash_node);
>> + __io_req_complete_post(req, req->cqe.res, 0);
>> + io_commit_cqring(ctx);
>> + spin_unlock(&ctx->completion_lock);
>> + io_cqring_ev_posted(ctx);
>> + }
>> }
>>
>> static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
>> @@ -7138,9 +7146,14 @@ static void io_apoll_task_func(struct io_kiocb
>> *req, bool *locked)
>> return;
>>
>> io_poll_remove_entries(req);
>> - spin_lock(&ctx->completion_lock);
>> - hash_del(&req->hash_node);
>> - spin_unlock(&ctx->completion_lock);
>> + if (ctx->flags & IORING_MUTEX_HASH) {
>> + io_tw_lock(ctx, locked);
>> + hash_del(&req->hash_node);
>> + } else {
>> + spin_lock(&ctx->completion_lock);
>> + hash_del(&req->hash_node);
>> + spin_unlock(&ctx->completion_lock);
>> + }
>>
>> if (!ret)
>> io_req_task_submit(req, locked);
>> @@ -7332,9 +7345,13 @@ static int __io_arm_poll_handler(struct
>> io_kiocb *req,
>> return 0;
>> }
>>
>> - spin_lock(&ctx->completion_lock);
>> - io_poll_req_insert(req);
>> - spin_unlock(&ctx->completion_lock);
>> + if (ctx->flags & IORING_MUTEX_HASH) {
>
> Does IORING_MUTEX_HASH exclude IOSQE_ASYNC as well? though IOSQE_ASYNC
> is uncommon but users may do that.
I see, cases like IOSQE_ASYNC definitely go into the else branch since
they run in io-wq context.
>
>> + io_poll_req_insert(req);
>> + } else {
>> + spin_lock(&ctx->completion_lock);
>> + io_poll_req_insert(req);
>> + spin_unlock(&ctx->completion_lock);
>> + }
>>
>> if (mask) {
>> /* can't multishot if failed, just queue the event we've got */
>
* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
2022-05-30 6:52 ` Hao Xu
@ 2022-05-30 9:35 ` Pavel Begunkov
0 siblings, 0 replies; 15+ messages in thread
From: Pavel Begunkov @ 2022-05-30 9:35 UTC (permalink / raw)
To: Hao Xu, Jens Axboe, io-uring
On 5/30/22 07:52, Hao Xu wrote:
> On 5/30/22 08:18, Pavel Begunkov wrote:
>> On 5/30/22 00:34, Jens Axboe wrote:
>>> On 5/29/22 4:50 PM, Pavel Begunkov wrote:
>>>> On 5/29/22 19:40, Jens Axboe wrote:
>>>>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>>>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>>>>> From: Hao Xu <[email protected]>
>>>>>>>>
>>>>>>>> From: Hao Xu <[email protected]>
>>>>>>>>
>>>>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>>>>> invocation and remove contension between different cancel_hash entries
>>>>>>>
>>>>>>> Interesting, do you have any numbers on this?
>>>>>>
>>>>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>>>>> actually RFC, forgot to change the subject.
>>>>>>
>>>>>>>
>>>>>>> Also, I'd make a hash bucket struct:
>>>>>>>
>>>>>>> struct io_hash_bucket {
>>>>>>> spinlock_t lock;
>>>>>>> struct hlist_head list;
>>>>>>> };
>>>>>>>
>>>>>>> rather than two separate structs, that'll have nicer memory locality too
>>>>>>> and should further improve it. Could be done as a prep patch with the
>>>>>>> old locking in place, making the end patch doing the per-bucket lock
>>>>>>> simpler as well.
>>>>>>
>>>>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>>>>> hlist_bl list as well(the comment of it says it is much slower than
>>>>>> normal spin_lock, but we may not care the efficiency of poll
>>>>>> cancellation very much?).
>>>>>
>>>>> I don't think the bit spinlocks are going to be useful, we should
>>>>> stick with a spinlock for this. They are indeed slower and generally not
>>>>> used for that reason. For a use case where you need a ton of locks and
>>>>> saving the 4 bytes for a spinlock would make sense (or maybe not
>>>>> changing some struct?), maybe they have a purpose. But not for this.
>>>>
>>>> We can put the cancel hashes under uring_lock and completely kill
>>>> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
>>>> below won't even compile and missing cancellation bits, I'll pick it
>>>> up in a week.
>>>>
>>>> Even better would be to have two hash tables, and auto-magically apply
>>>> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
>>>> and apoll (need uring_lock after anyway).
>>>
>>> My hope was that it'd take us closer to being able to use more granular
>>> locking for hashing in general. I don't care too much about the
>>> cancelation, but the normal hash locking would be useful to do.
>>>
>>> However, for cancelations, under uring_lock would indeed be preferable
>>> to doing per-bucket locks there. Guess I'll wait and see what your final
>>> patch looks like, not sure why it'd be a ctx conditional?
>>
>> It replaces 2 spin lock/unlock with one io_tw_lock() in the completion
>> path, which is done once per tw batch and grabbed anyway if
>> there is no contention (see handle_tw_list()).
>>
>> It could be unconditional, but I'd say for those 3 cases we have
>> non-existing chance to regress perf/latency, but I can think of
>> some cases where it might screw latencies, all share io_uring
>> b/w threads.
>>
>> Should benefit the cancellation path as well, but I don't care
>> about it as well.
>>
>>> What about io_poll_remove_all()?
>>
>> As mentioned, it's not handled in the diff, but easily doable,
>> it should just traverse both hash tables.
>>
>
> Two hash tables looks good to me. If I don't get you wrong, one table
> under uring_lock, the other one for normal handling(like per bucket
> locking)?
Right, that's the plan
--
Pavel Begunkov
* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
2022-05-30 6:38 ` Hao Xu
2022-05-30 6:59 ` Hao Xu
@ 2022-05-30 9:39 ` Pavel Begunkov
1 sibling, 0 replies; 15+ messages in thread
From: Pavel Begunkov @ 2022-05-30 9:39 UTC (permalink / raw)
To: Hao Xu, Jens Axboe, io-uring
On 5/30/22 07:38, Hao Xu wrote:
> On 5/30/22 06:50, Pavel Begunkov wrote:
>> On 5/29/22 19:40, Jens Axboe wrote:
>>> On 5/29/22 12:07 PM, Hao Xu wrote:
>>>> On 5/30/22 00:25, Jens Axboe wrote:
>>>>> On 5/29/22 10:20 AM, Hao Xu wrote:
>>>>>> From: Hao Xu <[email protected]>
>>>>>>
>>>>>> From: Hao Xu <[email protected]>
>>>>>>
>>>>>> Use per list lock for cancel_hash, this removes some completion lock
>>>>>> invocation and remove contension between different cancel_hash entries
>>>>>
>>>>> Interesting, do you have any numbers on this?
>>>>
>>>> Just Theoretically for now, I'll do some tests tomorrow. This is
>>>> actually RFC, forgot to change the subject.
>>>>
>>>>>
>>>>> Also, I'd make a hash bucket struct:
>>>>>
>>>>> struct io_hash_bucket {
>>>>> spinlock_t lock;
>>>>> struct hlist_head list;
>>>>> };
>>>>>
>>>>> rather than two separate structs, that'll have nicer memory locality too
>>>>> and should further improve it. Could be done as a prep patch with the
>>>>> old locking in place, making the end patch doing the per-bucket lock
>>>>> simpler as well.
>>>>
>>>> Sure, if the test number make sense, I'll send v2. I'll test the
>>>> hlist_bl list as well(the comment of it says it is much slower than
>>>> normal spin_lock, but we may not care the efficiency of poll
>>>> cancellation very much?).
>>>
>>> I don't think the bit spinlocks are going to be useful, we should
>>> stick with a spinlock for this. They are indeed slower and generally not
>>> used for that reason. For a use case where you need a ton of locks and
>>> saving the 4 bytes for a spinlock would make sense (or maybe not
>>> changing some struct?), maybe they have a purpose. But not for this.
>>
>> We can put the cancel hashes under uring_lock and completely kill
>> the hash spinlocking (2 lock/unlock pairs per single-shot). The code
>> below won't even compile and missing cancellation bits, I'll pick it
>> up in a week.
>>
>> Even better would be to have two hash tables, and auto-magically apply
>> the feature to SINGLE_SUBMITTER, SQPOLL (both will have uring_lock held)
>> and apoll (need uring_lock after anyway).
>>
>> @@ -7332,9 +7345,13 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
>> return 0;
>> }
>>
>> - spin_lock(&ctx->completion_lock);
>> - io_poll_req_insert(req);
>> - spin_unlock(&ctx->completion_lock);
>> + if (ctx->flags & IORING_MUTEX_HASH) {
>
> Does IORING_MUTEX_HASH exclude IOSQE_ASYNC as well? though IOSQE_ASYNC
> is uncommon but users may do that.
It should, io-wq -> arm_poll() doesn't hold uring_lock, good point
--
Pavel Begunkov
* Re: [PATCH 2/2] io_uring: switch cancel_hash to use per list spinlock
2022-05-29 16:25 ` Jens Axboe
2022-05-29 18:07 ` Hao Xu
@ 2022-05-30 13:33 ` Hao Xu
1 sibling, 0 replies; 15+ messages in thread
From: Hao Xu @ 2022-05-30 13:33 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Pavel Begunkov
On 5/30/22 00:25, Jens Axboe wrote:
> On 5/29/22 10:20 AM, Hao Xu wrote:
>> From: Hao Xu <[email protected]>
>>
>> From: Hao Xu <[email protected]>
>>
>> Use per list lock for cancel_hash, this removes some completion lock
>> invocation and remove contension between different cancel_hash entries
>
> Interesting, do you have any numbers on this?
>
> Also, I'd make a hash bucket struct:
>
> struct io_hash_bucket {
> spinlock_t lock;
> struct hlist_head list;
> };
>
> rather than two separate structs, that'll have nicer memory locality too
> and should further improve it. Could be done as a prep patch with the
> old locking in place, making the end patch doing the per-bucket lock
> simpler as well.
>
> Hmm?
>
I've done a v2 here, plus a test which issues async polls densely to
generate high-frequency cancel_hash[] accesses. But I won't have a real
box with a large number of CPUs suitable for testing until tomorrow, so
I'll test it then.
https://github.com/HowHsu/linux/commits/for-5.20/io_uring_hash_lock
https://github.com/HowHsu/liburing/commit/b9fb4d20a5dfe7c7bd62fe36c37aea3b261d4499
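The linked test is not reproduced here; a rough liburing sketch in the same
spirit (sizes and structure are made up) keeps batches of single-shot polls
cycling so the cancel_hash buckets are hit constantly:

#include <liburing.h>
#include <poll.h>
#include <unistd.h>

#define NR_POLLS 128
#define LOOPS    10000

int main(void)
{
        struct io_uring ring;
        static int pipes[NR_POLLS][2];
        char c;
        int i, j;

        if (io_uring_queue_init(2 * NR_POLLS, &ring, 0))
                return 1;
        for (i = 0; i < NR_POLLS; i++)
                if (pipe(pipes[i]))
                        return 1;

        for (j = 0; j < LOOPS; j++) {
                /* arm a batch of single-shot polls: each insert hits a bucket */
                for (i = 0; i < NR_POLLS; i++) {
                        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

                        io_uring_prep_poll_add(sqe, pipes[i][0], POLLIN);
                }
                io_uring_submit(&ring);

                /* trigger them: each completion deletes from its bucket */
                for (i = 0; i < NR_POLLS; i++)
                        if (write(pipes[i][1], "x", 1) != 1)
                                return 1;
                for (i = 0; i < NR_POLLS; i++) {
                        struct io_uring_cqe *cqe;

                        if (io_uring_wait_cqe(&ring, &cqe))
                                return 1;
                        io_uring_cqe_seen(&ring, cqe);
                }
                /* drain the pipes so the next round arms on empty pipes */
                for (i = 0; i < NR_POLLS; i++)
                        if (read(pipes[i][0], &c, 1) != 1)
                                return 1;
        }
        io_uring_queue_exit(&ring);
        return 0;
}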