From: "J. Hanne" <[email protected]>
To: Jens Axboe <[email protected]>
Cc: [email protected]
Subject: Re: Memory ordering description in io_uring.pdf
Date: Sun, 25 Sep 2022 14:03:11 +0200 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
Hi,
what needs to be brought into an consistent state:
- https://kernel.dk/io_uring.pdf (where is the source??)
- https://git.kernel.dk/cgit/liburing/tree/man/io_uring.7 (https://manpages.debian.org/testing/liburing-dev/io_uring.7.en.html)
- https://git.kernel.dk/cgit/liburing/tree/src/queue.c (using macros from https://git.kernel.dk/cgit/liburing/tree/src/include/liburing/barrier.h)
- https://github.com/torvalds/linux/blob/master/io_uring/io_uring.c
I’ll start with submission queue handling.
Quoting myself, my (possibly naive) approach for submitting entries would be, using gcc atomic builtins in absence of standardized C functions:
- (1) first read the SQ ring tail (without any ordering enforcement)
- (2) then use __atomic_load(__ATOMIC_ACQUIRE) to read the SQ ring head
- (3) then use __atomic_store(__ATOMIC_RELEASE) to update the SQ ring tail
Comparing with the existing documentation, (3) matches everywhere:
- io_uring.pdf confirms (3), as it inserts a write_barrier() before updating the ring tail
- io_uring.7 confirms (3), as it uses atomic_store_release to update the ring tail
- __io_uring_flush_sq in queue.c confirms (3), as it uses io_uring_smp_store_release to update the ring tail
(BUT, __io_uring_flush_sq in queue.c also special cases IORING_SETUP_SQPOLL, which I do not fully understand)
- io_uring.c says: "the application must use an appropriate smp_wmb() before writing the SQ tail (ordering SQ entry stores with the tail store)"
However, (2) is not so clear:
- io_uring.pdf never reads the ring head (but at least it mentions that the example is simplified as it is missing a queue full check)
- io_uring.7 never reads the ring head (as it does not check if the ring is full, which it does not even mention)
- __io_uring_flush_sq in queue.c confirms that, usually, acquire semantics are needed for reading the ring head, but seems to handle it elsewhere due to how it works internally (?)
- io_uring.c says: “[the application] needs a barrier ordering the SQ head load before writing new SQ entries (smp_load_acquire to read head will do)."
(BUT, it does not mention WHY the application needs to load the ring head)
Lastly, I absolutely do not understand the second write_barrier in io_uring.pdf after updating the ring tail. https://git.kernel.dk/cgit/liburing/commit/?id=ecefd7958eb32602df07f12e9808598b2c2de84b more or less just removed it. Before removal, it had this comment: “The kernel has the matching read barrier for reading the SQ tail.“. Yes, the kernel does need such a read barrier, but the write barrier *before* the ring tail update should be enough?!
So, my recommendation for documentation updates is:
- In io_uring.pdf, remove the second write_barrier after the ring tail update.
- In io_uring.pdf, augment the submission example with reading the ring head (to check for a queue-full condition), including a read_barrier after
- In io_uring.7, also add a queue-full check
- In io_uring.c extend the comment to say WHY the application needs to read the ring head
Comments?
Regards,
Johann
> Am 25.09.2022 um 12:34 schrieb J. Hanne <[email protected]>:
>
> Hi,
>
>> Am 22.09.2022 um 03:54 schrieb Jens Axboe <[email protected]>:
>>
>> On 9/18/22 10:56 AM, J. Hanne wrote:
>>> Hi,
>>>
>>> I have a couple of questions regarding the necessity of including memory
>>> barriers when using io_uring, as outlined in
>>> https://kernel.dk/io_uring.pdf. I'm fine with using liburing, but still I
>>> do want to understand what is going on behind the scenes, so any comment
>>> would be appreciated.
>>
>> In terms of the barriers, that doc is somewhat outdated...
> Ok, that pretty much explains why I got an inconsistent view after studying multiple sources…
>
>>
>>> Firstly, I wonder why memory barriers are required at all, when NOT using
>>> polled mode. Because requiring them in non-polled mode somehow implies that:
>>> - Memory re-ordering occurs across system-call boundaries (i.e. when
>>> submitting, the tail write could happen after the io_uring_enter
>>> syscall?!)
>>> - CPU data dependency checks do not work
>>> So, are memory barriers really required when just using a simple
>>> loop around io_uring_enter with completely synchronous processing?
>>
>> No, I don't beleive that they are. The exception is SQPOLL, as you mention,
>> as there's not necessarily a syscall involved with that.
>>
>>> Secondly, the examples in io_uring.pdf suggest that checking completion
>>> entries requires a read_barrier and a write_barrier and submitting entries
>>> requires *two* write_barriers. Really?
>>>
>>> My expectation would be, just as with "normal" inter-thread userspace ipc,
>>> that plain store-release and load-acquire semantics are sufficient, e.g.:
>>> - For reading completion entries:
>>> -- first read the CQ ring head (without any ordering enforcement)
>>> -- then use __atomic_load(__ATOMIC_ACQUIRE) to read the CQ ring tail
>>> -- then use __atomic_store(__ATOMIC_RELEASE) to update the CQ ring head
>>> - For submitting entries:
>>> -- first read the SQ ring tail (without any ordering enforcement)
>>> -- then use __atomic_load(__ATOMIC_ACQUIRE) to read the SQ ring head
>>> -- then use __atomic_store(__ATOMIC_RELEASE) to update the SQ ring tail
>>> Wouldn't these be sufficient?!
>>
>> Please check liburing to see what that does. Would be interested in
>> your feedback (and patches!). Largely x86 not caring too much about
>> these have meant that I think we've erred on the side of caution
>> on that front.
> Ok, I will check. My practical experience with memory barriers is limited however, so I’m not in the position to give a final judgement
>
>>
>>> Thirdly, io_uring.pdf and
>>> https://github.com/torvalds/linux/blob/master/io_uring/io_uring.c seem a
>>> little contradicting, at least from my reading:
>>>
>>> io_uring.pdf, in the completion entry example:
>>> - Includes a read_barrier() **BEFORE** it reads the CQ ring tail
>>> - Include a write_barrier() **AFTER** updating CQ head
>>>
>>> io_uring.c says on completion entries:
>>> - **AFTER** the application reads the CQ ring tail, it must use an appropriate
>>> smp_rmb() [...].
>>> - It also needs a smp_mb() **BEFORE** updating CQ head [...].
>>>
>>> io_uring.pdf, in the submission entry example:
>>> - Includes a write_barrier() **BEFORE** updating the SQ tail
>>> - Includes a write_barrier() **AFTER** updating the SQ tail
>>>
>>> io_uring.c says on submission entries:
>>> - [...] the application must use an appropriate smp_wmb() **BEFORE**
>>> writing the SQ tail
>>> (this matches io_uring.pdf)
>>> - And it needs a barrier ordering the SQ head load before writing new
>>> SQ entries
>>>
>>> I know, io_uring.pdf does mention that the memory ordering description
>>> is simplified. So maybe this is the whole explanation for my confusion?
>>
>> The canonical resource at this point is the kernel code, as some of
>> the revamping of the memory ordering happened way later than when
>> that doc was written. Would be nice to get it updated at some point.
> Ok, I will try. Where is the io_uring.pdf source (tex? markdown??)?
>
> Regards,
> Johann
prev parent reply other threads:[~2022-09-25 12:03 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-18 16:56 Memory ordering description in io_uring.pdf J. Hanne
2022-09-22 1:54 ` Jens Axboe
2022-09-25 10:34 ` J. Hanne
2022-09-25 12:03 ` J. Hanne [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox