From: Jens Axboe <[email protected]>
To: Stefan Metzmacher <[email protected]>
Cc: io-uring <[email protected]>,
Linus Torvalds <[email protected]>,
Samba Technical <[email protected]>
Subject: Re: Problems replacing epoll with io_uring in tevent
Date: Wed, 26 Oct 2022 11:08:54 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 10/26/22 10:00 AM, Stefan Metzmacher wrote:
> Hi Jens,
>
>> 9. The above works mostly, but manual testing and our massive automated regression tests
>> found the following problems:
>>
>> a) Related to https://github.com/axboe/liburing/issues/684 I was also wondering
>> about the return value of io_uring_submit_and_wait_timeout(),
>> but in addition I noticed that the timeout parameter doesn't work
>> as expected: the function waits for twice the timeout value.
>> I hacked a fix here:
>> https://git.samba.org/?p=metze/samba/wip.git;a=commitdiff;h=06fec644dd9f5748952c8b875878e0e1b0000d33
>
> Thanks for doing an upstream fix for the problem.

No problem - have you been able to test the current repo in general? I want to
cut a 2.3 release shortly, but since that particular change impacts any kind of
cqe waiting, would be nice to have a bit more confidence in it.

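One way to gain that confidence is to time the wait against the requested timeout. The following is only a minimal sketch under stated assumptions: measure_wait_ms(), timed_wait_fn and poll_wait() are hypothetical names, and poll() with no fds stands in for the real call so the harness builds without liburing. In a real test the callback would wrap io_uring_submit_and_wait_timeout() on an idle ring, and an elapsed time near twice the timeout would reproduce the problem reported in a).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: block for up to timeout_ms, return a status. */
typedef int (*timed_wait_fn)(void *ctx, unsigned timeout_ms);

/* Monotonic clock in nanoseconds, unaffected by wall-clock adjustments. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Run one timed wait and return how long it really took, in milliseconds. */
static uint64_t measure_wait_ms(timed_wait_fn wait, void *ctx,
                                unsigned timeout_ms)
{
    uint64_t start = now_ns();

    (void)wait(ctx, timeout_ms);
    return (now_ns() - start) / 1000000ULL;
}

/*
 * Stand-in for the call under test (an assumption of this sketch):
 * poll() with no file descriptors just sleeps for the timeout.
 */
static int poll_wait(void *ctx, unsigned timeout_ms)
{
    (void)ctx;
    return poll(NULL, 0, (int)timeout_ms);
}
```

Calling measure_wait_ms(poll_wait, NULL, 50) should report roughly 50 ms; a test against the ring would flag anything close to 100 ms.
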
>> b) The major show stopper is that IORING_OP_POLL_ADD calls fget() and
>> holds that reference while it's pending, which means that a close() on
>> the related file descriptor is not able to remove the last reference!
>> This is a problem for points 3.d, 4.a and 4.b from above.
>>
>> I doubt IORING_ASYNC_CANCEL_FD could be used, as there's not always
>> code being triggered around a raw close() syscall, which could do a sync cancel.
>>
>> For now I plan to use epoll_ctl() (or IORING_OP_EPOLL_CTL) and only
>> register the fd from epoll_create() with IORING_OP_POLL_ADD,
>> or to keep epoll_wait() as the blocking call and register the io_uring fd
>> with epoll.
>>
>> I looked at the related epoll code and found that it uses
>> a list in struct file->f_ep to keep the reference, which also gets
>> detached via eventpoll_release_file(), called from __fput().
>>
>> Would it be possible to move IORING_OP_POLL_ADD to a similar model,
>> so that close() will cause a cqe with -ECANCELED?
>
> I'm currently trying to prototype an IORING_POLL_CANCEL_ON_CLOSE
> flag that can be passed to POLL_ADD. With that we'll register
> the request in &req->file->f_uring_poll (similar to the file->f_ep list for epoll).
> Then we only get a real reference to the file during the call to
> vfs_poll(); otherwise we drop the fget/fput reference and rely on
> an io_uring_poll_release_file() (similar to eventpoll_release_file())
> to cancel our registered poll request.

Yes, this is a bit tricky as we hold the file ref across the operation. I'd
be interested in seeing your approach to this, and also how it would
interact with registered files...

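For reference, the epoll behaviour being used as the model here can be observed from userspace. This is only a sketch of the existing epoll semantics, not of the proposed io_uring change, and epoll_events_after_close() is a hypothetical helper name. It registers the read end of a pipe, makes it readable, then closes it and checks that epoll no longer reports it.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Returns the number of events epoll reports after the watched fd was
 * closed, or -1 if the setup failed. The fd is never dup()ed, so close()
 * drops the last reference to the file and epoll detaches it.
 */
static int epoll_events_after_close(void)
{
    int pfd[2];
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out;
    int ep, before, after;

    if (pipe(pfd) < 0)
        return -1;

    ep = epoll_create1(0);
    if (ep < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    /* Watch the read end and make it readable by writing one byte. */
    ev.data.fd = pfd[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, pfd[0], &ev) < 0 ||
        write(pfd[1], "x", 1) != 1) {
        close(pfd[0]);
        close(pfd[1]);
        close(ep);
        return -1;
    }

    /* While the fd is open, the pending byte shows up as one event. */
    before = epoll_wait(ep, &out, 1, 0);

    /* Closing the only fd removes the registration from the epoll set. */
    close(pfd[0]);
    after = epoll_wait(ep, &out, 1, 0);

    close(pfd[1]);
    close(ep);
    return before == 1 ? after : -1;
}
```

The second epoll_wait() returns no events because close() was able to drop the last reference. A pending IORING_OP_POLL_ADD holds its own reference, which is the difference described in b).
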
>> c) A simple pipe based performance test shows the following numbers:
>> - 'poll': Got 232387.31 pipe events/sec
>> - 'epoll': Got 251125.25 pipe events/sec
>> - 'samba_io_uring_ev': Got 210998.77 pipe events/sec
>> So the io_uring backend is even slower than the 'poll' backend.
>> I guess the reason is the constant re-submission of IORING_OP_POLL_ADD.
>
> Added some feature autodetection today and I'm now using
> IORING_SETUP_COOP_TASKRUN, IORING_SETUP_TASKRUN_FLAG,
> IORING_SETUP_SINGLE_ISSUER and IORING_SETUP_DEFER_TASKRUN if supported
> by the kernel.
>
> On a 6.1 kernel this improved the performance a lot, it's now faster
> than the epoll backend.
>
> The key flag is IORING_SETUP_DEFER_TASKRUN. On a different system than above
> I'm getting the following numbers:
> - epoll: Got 114450.16 pipe events/sec
> - poll: Got 105872.52 pipe events/sec
> - samba_io_uring_ev-without-defer_taskrun: Got 95564.22 pipe events/sec
> - samba_io_uring_ev-with-defer_taskrun: Got 122853.85 pipe events/sec

Any chance you can do a run with just IORING_SETUP_COOP_TASKRUN set? I'm
curious how big of an impact the IPI elimination is, where it slots in
compared to the defer taskrun and the default settings.

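The feature autodetection mentioned above could be sketched as a fallback ladder: try the richest flag combination first and step down whenever the kernel answers -EINVAL. This is a minimal sketch and not the tevent code. ring_setup_best(), kernel_ring_setup() and fake_older_kernel_setup() are hypothetical names, the raw io_uring_setup syscall is used so the example builds without liburing, and the fake probe only imitates a kernel that predates IORING_SETUP_SINGLE_ISSUER so the fallback can be exercised anywhere.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

/* Fallback definitions for builds against older kernel headers. */
#ifndef IORING_SETUP_COOP_TASKRUN
#define IORING_SETUP_COOP_TASKRUN (1U << 8)
#endif
#ifndef IORING_SETUP_TASKRUN_FLAG
#define IORING_SETUP_TASKRUN_FLAG (1U << 9)
#endif
#ifndef IORING_SETUP_SINGLE_ISSUER
#define IORING_SETUP_SINGLE_ISSUER (1U << 12)
#endif
#ifndef IORING_SETUP_DEFER_TASKRUN
#define IORING_SETUP_DEFER_TASKRUN (1U << 13)
#endif
#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425
#endif

/* Hypothetical probe type: returns a ring fd, or a negative errno. */
typedef int (*ring_setup_fn)(unsigned entries, unsigned flags);

/* Real probe: create a ring with the given setup flags. */
static int kernel_ring_setup(unsigned entries, unsigned flags)
{
    struct io_uring_params p;
    long fd;

    memset(&p, 0, sizeof(p));
    p.flags = flags;
    fd = syscall(__NR_io_uring_setup, entries, &p);
    return fd < 0 ? -errno : (int)fd;
}

/* Fake probe for illustration only: rejects the two newest flags. */
static int fake_older_kernel_setup(unsigned entries, unsigned flags)
{
    (void)entries;
    if (flags & (IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN))
        return -EINVAL;
    return 3; /* pretend fd, never used as a real descriptor */
}

/* Try flag sets from most to least capable; stop at the first non-EINVAL. */
static int ring_setup_best(ring_setup_fn setup, unsigned entries,
                           unsigned *flags_out)
{
    static const unsigned candidates[] = {
        IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
            IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN,
        IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
            IORING_SETUP_SINGLE_ISSUER,
        IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG,
        0,
    };
    int ret = -EINVAL;
    size_t i;

    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        ret = setup(entries, candidates[i]);
        if (ret == -EINVAL)
            continue; /* flag set not supported, try the next one */
        if (ret >= 0)
            *flags_out = candidates[i];
        break;
    }
    return ret;
}
```

With the fake probe, ring_setup_best() settles on IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG. For the benchmark run requested here, kernel_ring_setup(entries, IORING_SETUP_COOP_TASKRUN) can be called directly to pin the ring to that single flag.
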
>> My hope would be that IORING_POLL_ADD_MULTI + IORING_POLL_ADD_LEVEL
>> would be able to avoid the performance problem with samba_io_uring_ev
>> compared to epoll.
>
> I've started with an IORING_POLL_ADD_MULTI + IORING_POLL_ADD_LEVEL prototype,
> but it's not very far yet and due to the IORING_SETUP_DEFER_TASKRUN
> speedup, I'll postpone working on IORING_POLL_ADD_LEVEL.

OK

--
Jens Axboe