Waiting for requests completions from multiple threads

public inbox for the io-uring mailing list

* Waiting for requests completions from multiple threads
@ 2020-01-22  2:45 Dmitry Sychov
  2020-01-22  2:51 ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Sychov @ 2020-01-22  2:45 UTC
  To: io-uring

Really nice work, I have a question though.

Is it possible to efficiently wait for request completions
from multiple threads?

Say two threads both enter io_uring_enter() with min_complete=1 while
the completion ring holds 2 events: will the first one go to thread 1
and the second one to thread 2?

I just do not understand exactly the best way to scale this API across
multiple threads... with IOCP, for example, it is perfectly clear.


* Re: Waiting for requests completions from multiple threads
  2020-01-22  2:45 Waiting for requests completions from multiple threads Dmitry Sychov
@ 2020-01-22  2:51 ` Jens Axboe
  2020-01-22  3:09   ` Dmitry Sychov
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2020-01-22  2:51 UTC
  To: Dmitry Sychov, io-uring

On 1/21/20 7:45 PM, Dmitry Sychov wrote:
> Really nice work, I have a question though.
> 
> Is it possible to efficiently wait for request completions
> from multiple threads?
> 
> Say two threads both enter io_uring_enter() with min_complete=1 while
> the completion ring holds 2 events: will the first one go to thread 1
> and the second one to thread 2?
> 
> I just do not understand exactly the best way to scale this API across
> multiple threads... with IOCP, for example, it is perfectly clear.

You can have two threads waiting on events, and yes, if they each ask to
wait for 1 event and 2 events complete, then they will both get woken up. But
the wait side doesn't give you any events, it merely tells you of the
availability of them. When each thread is woken up and goes back to
userspace, it'll have to reap an event from the ring. If each thread
reaps one event from the CQ ring, then you're done.

You need synchronization on the CQ ring side in userspace if you want
two threads to access the CQ ring. That is not needed for entering the
kernel, only when the application reads a CQE (or modifies the ring), if
you can have more than one thread modifying the CQ ring. The exact same
is true on the SQ ring side.
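
For illustration, here is a minimal sketch of the simplest form of that
userspace synchronization: a mutex around the CQ head. It does not use a
real io_uring instance. struct demo_cq, struct demo_cqe and the demo_*
functions are hypothetical names, and demo_cq_post() merely stands in
for the kernel posting a completion. Each thread that has been woken
calls demo_cq_reap_one() to take at most one entry.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CQ_ENTRIES 8u                     /* power of two */
#define DEMO_CQ_MASK    (DEMO_CQ_ENTRIES - 1u)

/* Hypothetical stand-in for a completion entry, not struct io_uring_cqe. */
struct demo_cqe {
    uint64_t user_data;
    int32_t  res;
};

/* Simulated CQ ring shared by several consumer threads. */
struct demo_cq {
    pthread_mutex_t lock;   /* userspace-only lock around head/tail */
    uint32_t head;          /* next entry to reap */
    uint32_t tail;          /* one past the last posted entry */
    struct demo_cqe cqes[DEMO_CQ_ENTRIES];
};

void demo_cq_init(struct demo_cq *cq)
{
    pthread_mutex_init(&cq->lock, NULL);
    cq->head = 0;
    cq->tail = 0;
}

/* Plays the kernel's role for the demo: post one completion.
 * (A real kernel does not take this lock; it publishes the tail itself.) */
bool demo_cq_post(struct demo_cq *cq, uint64_t user_data, int32_t res)
{
    bool posted = false;

    pthread_mutex_lock(&cq->lock);
    if (cq->tail - cq->head < DEMO_CQ_ENTRIES) {
        struct demo_cqe *cqe = &cq->cqes[cq->tail & DEMO_CQ_MASK];
        cqe->user_data = user_data;
        cqe->res = res;
        cq->tail++;
        posted = true;
    }
    pthread_mutex_unlock(&cq->lock);
    return posted;
}

/* Called by a thread after it has been woken: try to take exactly one
 * entry. Returns false if other threads already reaped everything. */
bool demo_cq_reap_one(struct demo_cq *cq, struct demo_cqe *out)
{
    bool got = false;

    assert(out != NULL);
    pthread_mutex_lock(&cq->lock);
    if (cq->head != cq->tail) {
        *out = cq->cqes[cq->head & DEMO_CQ_MASK];  /* read the CQE */
        cq->head++;                                /* then free the slot */
        got = true;
    }
    pthread_mutex_unlock(&cq->lock);
    return got;
}
```

The lock is held only while copying one entry and bumping the head, so
being woken and finding the ring already empty is a normal outcome that
the caller has to handle.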

-- 
Jens Axboe


* Re: Waiting for requests completions from multiple threads
  2020-01-22  2:51 ` Jens Axboe
@ 2020-01-22  3:09   ` Dmitry Sychov
  2020-01-22  3:16     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Sychov @ 2020-01-22  3:09 UTC
  To: io-uring

Thank you for the quick reply! Yes, I understand that I need a sort of
serializable-level isolation when accessing the rings. I hope this
could be done with a simple atomic cmp-add after an optimistic write
ring update.

Correct me if I'm wrong, but from my understanding the kernel can start
to pick up newly written uring jobs without waiting for the
io_uring_enter() user-level call, and that's why we need a write
barrier (so that the ring state is always valid for the kernel);
otherwise io_uring_enter() could serve as a write barrier itself...




* Re: Waiting for requests completions from multiple threads
  2020-01-22  3:09   ` Dmitry Sychov
@ 2020-01-22  3:16     ` Jens Axboe
  2020-01-22  3:28       ` Pavel Begunkov
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2020-01-22  3:16 UTC
  To: Dmitry Sychov, io-uring

On 1/21/20 8:09 PM, Dmitry Sychov wrote:
> Thank you for the quick reply! Yes, I understand that I need a sort of
> serializable-level isolation when accessing the rings. I hope this
> could be done with a simple atomic cmp-add after an optimistic write
> ring update.

That's not a bad idea, that could definitely work, and would be more
efficient than just grabbing a lock.

Could also be made to work quite nicely with restartable sequences. I'd
love to see liburing grow support for smarter sharing of a ring, that's
really where that belongs.
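
As a sketch of one possible reading of the lock-free idea, applied to
the CQ side: each consumer optimistically copies the entry at the
current head and then tries to claim it with a compare-and-swap on the
head. This is a simulation with C11 atomics, not liburing code. All
demo_* names are hypothetical, demo_cq_post() stands in for the kernel,
and the layout is simplified compared to the real ring.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CQ_ENTRIES 8u                     /* power of two */
#define DEMO_CQ_MASK    (DEMO_CQ_ENTRIES - 1u)

/* Hypothetical stand-in for a completion entry. */
struct demo_cqe {
    uint64_t user_data;
    int32_t  res;
};

struct demo_cq {
    _Atomic uint32_t head;  /* advanced by consumers via CAS */
    _Atomic uint32_t tail;  /* advanced by the single producer */
    struct demo_cqe cqes[DEMO_CQ_ENTRIES];
};

void demo_cq_init(struct demo_cq *cq)
{
    atomic_init(&cq->head, 0);
    atomic_init(&cq->tail, 0);
}

/* Single producer (the kernel, in a real ring): fill the slot first,
 * then publish it with a release store of the tail. */
bool demo_cq_post(struct demo_cq *cq, uint64_t user_data, int32_t res)
{
    uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&cq->head, memory_order_acquire);

    if (tail - head == DEMO_CQ_ENTRIES)
        return false;                          /* ring is full */
    cq->cqes[tail & DEMO_CQ_MASK].user_data = user_data;
    cq->cqes[tail & DEMO_CQ_MASK].res = res;
    atomic_store_explicit(&cq->tail, tail + 1, memory_order_release);
    return true;
}

/* Any number of consumers: copy the entry at head, then claim it by
 * moving head forward. Only the thread whose CAS succeeds keeps the copy. */
bool demo_cq_try_reap(struct demo_cq *cq, struct demo_cqe *out)
{
    uint32_t head = atomic_load_explicit(&cq->head, memory_order_relaxed);

    assert(out != NULL);
    for (;;) {
        uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_acquire);
        struct demo_cqe copy;

        if (head == tail)
            return false;                      /* nothing to reap */
        /* Optimistic read. If another consumer wins the race and the
         * producer reuses the slot, this copy may be stale or torn; the
         * failed CAS below discards it. Strictly speaking that is a data
         * race in C11, which a production version would have to address. */
        copy = cq->cqes[head & DEMO_CQ_MASK];
        if (atomic_compare_exchange_weak_explicit(&cq->head, &head, head + 1,
                                                  memory_order_release,
                                                  memory_order_relaxed)) {
            *out = copy;
            return true;
        }
        /* CAS failed: head now holds the current value, so retry. */
    }
}
```

Under contention every failed CAS is a retry, so this still generates
atomic traffic on the cache line holding the head. Whether it beats a
plain lock would need measuring.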

> Correct me if I'm wrong, but from my understanding the kernel can start
> to pick up newly written uring jobs without waiting for the
> io_uring_enter() user-level call, and that's why we need a write
> barrier (so that the ring state is always valid for the kernel);
> otherwise io_uring_enter() could serve as a write barrier itself...

By uring jobs, you mean SQEs, or submission queue entries? The kernel
only picks up what you ask it to, it won't randomly just grab entries
from the SQ ring unless you do an io_uring_enter() and tell it to
consume N entries. The exception is if you set up the ring with
IORING_SETUP_SQPOLL, in which case the kernel will maintain a submission
thread. For that case, yes, the kernel can pick up an entry as soon as
the SQ tail is updated by the application.
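
To make the ordering requirement for the polled case concrete, here is
a small simulation using C11 atomics. The application fills in the
entry first and only then publishes the new tail with a release store.
The polling side reads the tail with an acquire load before touching
the entries. This is a sketch under simplifying assumptions: the demo_*
names are hypothetical, demo_sq_consume() stands in for the kernel's
submission thread, and the real SQ ring's index array is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SQ_ENTRIES 8u                     /* power of two */
#define DEMO_SQ_MASK    (DEMO_SQ_ENTRIES - 1u)

/* Hypothetical, heavily reduced submission entry. */
struct demo_sqe {
    uint8_t  opcode;
    uint64_t user_data;
};

struct demo_sq {
    _Atomic uint32_t head;  /* advanced by the consumer (poller) */
    _Atomic uint32_t tail;  /* advanced by the application */
    struct demo_sqe sqes[DEMO_SQ_ENTRIES];
};

void demo_sq_init(struct demo_sq *sq)
{
    atomic_init(&sq->head, 0);
    atomic_init(&sq->tail, 0);
}

/* Application side (one submitting thread assumed). */
bool demo_sq_push(struct demo_sq *sq, uint8_t opcode, uint64_t user_data)
{
    uint32_t tail = atomic_load_explicit(&sq->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&sq->head, memory_order_acquire);

    if (tail - head == DEMO_SQ_ENTRIES)
        return false;                          /* ring is full */

    /* 1. Fill in the entry completely. */
    sq->sqes[tail & DEMO_SQ_MASK].opcode = opcode;
    sq->sqes[tail & DEMO_SQ_MASK].user_data = user_data;

    /* 2. Publish it. The release store keeps the writes above from being
     *    reordered after the tail update, so a poller that observes the
     *    new tail also observes a fully written entry. */
    atomic_store_explicit(&sq->tail, tail + 1, memory_order_release);
    return true;
}

/* Poller side: may run at any time, with no syscall in between. */
unsigned demo_sq_consume(struct demo_sq *sq, struct demo_sqe *out, unsigned max)
{
    uint32_t head = atomic_load_explicit(&sq->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&sq->tail, memory_order_acquire);
    unsigned n = 0;

    assert(out != NULL || max == 0);
    while (head != tail && n < max) {
        out[n++] = sq->sqes[head & DEMO_SQ_MASK];
        head++;
    }
    /* Release so the application can safely reuse the consumed slots. */
    atomic_store_explicit(&sq->head, head, memory_order_release);
    return n;
}
```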

-- 
Jens Axboe


* Re: Waiting for requests completions from multiple threads
  2020-01-22  3:16     ` Jens Axboe
@ 2020-01-22  3:28       ` Pavel Begunkov
  2020-01-22 17:54         ` Dmitry Sychov
  0 siblings, 1 reply; 6+ messages in thread
From: Pavel Begunkov @ 2020-01-22  3:28 UTC
  To: Jens Axboe, Dmitry Sychov, io-uring



On 22/01/2020 06:16, Jens Axboe wrote:
> On 1/21/20 8:09 PM, Dmitry Sychov wrote:
>> Thank you for the quick reply! Yes, I understand that I need a sort of
>> serializable-level isolation when accessing the rings. I hope this
>> could be done with a simple atomic cmp-add after an optimistic write
>> ring update.
> 
> That's not a bad idea, that could definitely work, and would be more
> efficient than just grabbing a lock.
> 

If I got it right, it will still spam the system with atomics.
There is another pattern to consider (seen a lot in the networking
world): just one thread gets completions (i.e. calls io_uring_enter())
and then distributes jobs to a thread pool. There are a lot of ways to
do this distribution efficiently; see, e.g., the internal techniques of
Java's fork/join framework.

That's for the completion part.
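
A minimal sketch of the hand-off part of that pattern, assuming a plain
bounded queue protected by a mutex and a condition variable. The
fork/join-style techniques mentioned above would be more elaborate.
The single reaper thread calls demo_dispatch() with the completions it
collected, and each worker blocks in demo_queue_take(). All demo_*
names are hypothetical and no real ring is involved.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_CAP 16u                     /* power of two */

/* Hypothetical job record built from a reaped completion. */
struct demo_job {
    uint64_t user_data;
    int32_t  res;
};

struct demo_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    uint32_t head;          /* next job to hand to a worker */
    uint32_t tail;          /* one past the last queued job */
    struct demo_job jobs[DEMO_QUEUE_CAP];
};

void demo_queue_init(struct demo_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = 0;
    q->tail = 0;
}

/* Reaper side: queue up to n completed jobs in one critical section.
 * Returns how many were accepted; the rest must be retried later. */
unsigned demo_dispatch(struct demo_queue *q, const struct demo_job *done,
                       unsigned n)
{
    unsigned queued = 0;

    assert(done != NULL || n == 0);
    pthread_mutex_lock(&q->lock);
    while (queued < n && q->tail - q->head < DEMO_QUEUE_CAP) {
        q->jobs[q->tail % DEMO_QUEUE_CAP] = done[queued++];
        q->tail++;
    }
    pthread_mutex_unlock(&q->lock);

    /* Wake one worker for a single job, all of them for a batch. */
    if (queued == 1)
        pthread_cond_signal(&q->nonempty);
    else if (queued > 1)
        pthread_cond_broadcast(&q->nonempty);
    return queued;
}

/* Worker side: block until a job is available, then take it. */
struct demo_job demo_queue_take(struct demo_queue *q)
{
    struct demo_job job;

    pthread_mutex_lock(&q->lock);
    while (q->head == q->tail)
        pthread_cond_wait(&q->nonempty, &q->lock);
    job = q->jobs[q->head % DEMO_QUEUE_CAP];
    q->head++;
    pthread_mutex_unlock(&q->lock);
    return job;
}
```

With this split only the reaper ever touches the CQ ring, so the ring
itself needs no extra synchronization. The cost moves to the queue
between the reaper and the workers.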


-- 
Pavel Begunkov




* Re: Waiting for requests completions from multiple threads
  2020-01-22  3:28       ` Pavel Begunkov
@ 2020-01-22 17:54         ` Dmitry Sychov
  0 siblings, 0 replies; 6+ messages in thread
From: Dmitry Sychov @ 2020-01-22 17:54 UTC
  To: io-uring

> Just one thread gets completions (i.e. calls io_uring_enter()), and then
> distributes jobs to a thread pool.

Yep, I'm thinking along these lines: one thread gets the state update
for multiple events and spreads it to waiting worker threads through a
semaphore or the like.

I assume there is no protection against multiple threads waking up from
a single event (one of epoll's original problems, which was fixed with a
new flag).

The worker threads are free to submit new job requests directly into
the SQ ring, though (without a proxy job list).

