From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>,
Jiufei Xue <[email protected]>,
[email protected]
Cc: [email protected]
Subject: Re: [PATCH v3 1/2] io_uring: change the poll events to be 32-bits
Date: Tue, 16 Jun 2020 22:20:46 +0300
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 16/06/2020 21:45, Jens Axboe wrote:
> On 6/16/20 7:58 AM, Jens Axboe wrote:
>> On 6/15/20 9:04 PM, Jiufei Xue wrote:
>>>
>>>
>>> On 2020/6/15 11:09 PM, Jens Axboe wrote:
>>>> On 6/14/20 8:49 PM, Jiufei Xue wrote:
>>>>> Hi Jens,
>>>>>
>>>>> On 2020/6/13 12:48 AM, Jens Axboe wrote:
>>>>>> On 6/12/20 8:58 AM, Jens Axboe wrote:
>>>>>>> On 6/11/20 8:30 PM, Jiufei Xue wrote:
>>>>>>>> poll events should be 32-bits to cover EPOLLEXCLUSIVE.
>>>>>>>>
>>>>>>>> Signed-off-by: Jiufei Xue <[email protected]>
>>>>>>>> ---
>>>>>>>> fs/io_uring.c | 4 ++--
>>>>>>>> include/uapi/linux/io_uring.h | 2 +-
>>>>>>>> tools/io_uring/liburing.h | 2 +-
>>>>>>>> 3 files changed, 4 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>>> index 47790a2..6250227 100644
>>>>>>>> --- a/fs/io_uring.c
>>>>>>>> +++ b/fs/io_uring.c
>>>>>>>> @@ -4602,7 +4602,7 @@ static void io_poll_queue_proc(struct file *file, struct wait_queue_head *head,
>>>>>>>> static int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>>> {
>>>>>>>> struct io_poll_iocb *poll = &req->poll;
>>>>>>>> - u16 events;
>>>>>>>> + u32 events;
>>>>>>>>
>>>>>>>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>>>>> return -EINVAL;
>>>>>>>> @@ -8196,7 +8196,7 @@ static int __init io_uring_init(void)
>>>>>>>> BUILD_BUG_SQE_ELEM(28, /* compat */ int, rw_flags);
>>>>>>>> BUILD_BUG_SQE_ELEM(28, /* compat */ __u32, rw_flags);
>>>>>>>> BUILD_BUG_SQE_ELEM(28, __u32, fsync_flags);
>>>>>>>> - BUILD_BUG_SQE_ELEM(28, __u16, poll_events);
>>>>>>>> + BUILD_BUG_SQE_ELEM(28, __u32, poll_events);
>>>>>>>> BUILD_BUG_SQE_ELEM(28, __u32, sync_range_flags);
>>>>>>>> BUILD_BUG_SQE_ELEM(28, __u32, msg_flags);
>>>>>>>> BUILD_BUG_SQE_ELEM(28, __u32, timeout_flags);
>>>>>>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>>>>>>> index 92c2269..afc7edd 100644
>>>>>>>> --- a/include/uapi/linux/io_uring.h
>>>>>>>> +++ b/include/uapi/linux/io_uring.h
>>>>>>>> @@ -31,7 +31,7 @@ struct io_uring_sqe {
>>>>>>>> union {
>>>>>>>> __kernel_rwf_t rw_flags;
>>>>>>>> __u32 fsync_flags;
>>>>>>>> - __u16 poll_events;
>>>>>>>> + __u32 poll_events;
>>>>>>>> __u32 sync_range_flags;
>>>>>>>> __u32 msg_flags;
>>>>>>>> __u32 timeout_flags;
>>>>>>>
>>>>>>> We obviously have the space in there as most other flag members are 32-bits, but
>>>>>>> I'd want to double check that we're not changing the ABI here. Is this always
>>>>>>> going to be safe, on any platform, regardless of endianness etc?
>>>>>>
>>>>>> Double checked, and as I feared, we can't safely do this. We'll have to
>>>>>> do something like the below, grabbing an unused bit of the poll mask
>>>>>> space and if that's set, then store the fact that EPOLLEXCLUSIVE is set.
>>>>>> So probably best to turn this just into one patch, since it doesn't make
>>>>>> a lot of sense to do it as a prep patch at that point.
>>>>>>
>>>>> Yes, agreed. But I also fear that if the unused bit gets used
>>>>> in the future, it will bring unexpected behavior.
>>>>
>>>> Yeah, it's certainly not the prettiest and could potentially be fragile.
>>>> I'm open to suggestions, we need some way of signaling that the 32-bit
>>>> variant of the poll_events should be used. We could potentially make
>>>> this work by doing explicit layout for big endian vs little endian, that
>>>> might be prettier and wouldn't suffer from the "grab some random bit"
>>>> issue.
>>>>
>>> Thank you for your suggestion, I will think about it.
>>>
>>>>>> This does have the benefit of not growing io_poll_iocb. With your patch,
>>>>>> it'd go beyond a cacheline, and hence bump the size of the entire
>>>>>> io_iocb as well, which would be very unfortunate.
>>>>>>
>>>>> events in io_poll_iocb is 32-bits already, so why would my patch bump
>>>>> the size of the io_iocb structure?
>>>>
>>>> It's not 32-bits already, it's a __poll_t type which is 16-bits only.
>>>>
>>> Yes, it is a __poll_t type, but I found that __poll_t type is 32-bits
>>> with the definition below:
>>>
>>> typedef unsigned __bitwise __poll_t;
>>>
>>> I also checked it with crash:
>>> crash> io_poll_iocb -ox
>>> struct io_poll_iocb {
>>> [0x0] struct file *file;
>>> union {
>>> [0x8] struct wait_queue_head *head;
>>> [0x8] u64 addr;
>>> };
>>> [0x10] __poll_t events;
>>> [0x14] bool done;
>>> [0x15] bool canceled;
>>> [0x18] struct wait_queue_entry wait;
>>> }
>>
>> Yeah you're right, not sure why I figured it was 16-bits. But just
>> checking on my default build:
>>
>> axboe@x1 ~/gi/linux-block (block-5.8)> pahole -C io_poll_iocb fs/io_uring.o
>> struct io_poll_iocb {
>> struct file * file; /* 0 8 */
>> union {
>> struct wait_queue_head * head; /* 8 8 */
>> u64 addr; /* 8 8 */
>> }; /* 8 8 */
>> __poll_t events; /* 16 4 */
>> bool done; /* 20 1 */
>> bool canceled; /* 21 1 */
>>
>> /* XXX 2 bytes hole, try to pack */
>>
>> struct wait_queue_entry wait; /* 24 40 */
>>
>> /* size: 64, cachelines: 1, members: 6 */
>> /* sum members: 62, holes: 1, sum holes: 2 */
>> };
>>
>> and it's definitely 64 bytes in total size (as it should be), and
>> 'events' is indeed 32-bits as well.
>>
>> So at least that's good, we don't need anything extra in there. If we
>> can solve the endian issue, then it should be trivial to use the full
>> 32-bits for the flags in the sqe.
>
> To get back to something that can move us forward, something like the
> below I _think_ will work. Then applications just use poll32_events and
> we don't have to handle this in any special way. Care to base on that
> and re-send the change?
>
>
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 92c22699a5a7..5b3f6bd59437 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -31,7 +31,16 @@ struct io_uring_sqe {
>  	union {
>  		__kernel_rwf_t rw_flags;
>  		__u32 fsync_flags;
> -		__u16 poll_events;
> +		struct {
> +#ifdef __BIG_ENDIAN_BITFIELD
> +			__u16 __unused_poll;
> +			__u16 poll_events;
> +#else
> +			__u16 poll_events;
> +			__u16 __unused_poll;
> +#endif
> +		};
> +		__u32 poll32_events;
>  		__u32 sync_range_flags;
>  		__u32 msg_flags;
>  		__u32 timeout_flags;
>
That changes the layout for big endian, doesn't it? That's not ABI compatible.
--
Pavel Begunkov
Thread overview: 16+ messages
2020-06-12 2:30 [PATCH v3] io_uring: add EPOLLEXCLUSIVE flag for POLL_ADD operation Jiufei Xue
2020-06-12 2:30 ` [PATCH v3 1/2] io_uring: change the poll events to be 32-bits Jiufei Xue
2020-06-12 14:58 ` Jens Axboe
2020-06-12 16:48 ` Jens Axboe
2020-06-15 2:49 ` Jiufei Xue
2020-06-15 15:09 ` Jens Axboe
2020-06-16 3:04 ` Jiufei Xue
2020-06-16 13:58 ` Jens Axboe
2020-06-16 18:45 ` Jens Axboe
2020-06-16 19:20 ` Pavel Begunkov [this message]
2020-06-16 19:27 ` Jens Axboe
2020-06-16 19:21 ` Jens Axboe
2020-06-16 21:46 ` Pavel Begunkov
2020-06-17 0:06 ` Jens Axboe
2020-06-17 1:39 ` Jiufei Xue
2020-06-12 2:30 ` [PATCH v3 2/2] io_uring: use EPOLLEXCLUSIVE flag to aoid thundering herd type behavior Jiufei Xue