public inbox for [email protected]
 help / color / mirror / Atom feed
From: Jens Axboe <[email protected]>
To: Jiufei Xue <[email protected]>, [email protected]
Cc: [email protected]
Subject: Re: [PATCH v3 1/2] io_uring: change the poll events to be 32-bits
Date: Tue, 16 Jun 2020 12:45:20 -0600	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 6/16/20 7:58 AM, Jens Axboe wrote:
> On 6/15/20 9:04 PM, Jiufei Xue wrote:
>>
>>
>> On 2020/6/15 下午11:09, Jens Axboe wrote:
>>> On 6/14/20 8:49 PM, Jiufei Xue wrote:
>>>> Hi Jens,
>>>>
>>>> On 2020/6/13 上午12:48, Jens Axboe wrote:
>>>>> On 6/12/20 8:58 AM, Jens Axboe wrote:
>>>>>> On 6/11/20 8:30 PM, Jiufei Xue wrote:
>>>>>>> poll events should be 32-bits to cover EPOLLEXCLUSIVE.
>>>>>>>
>>>>>>> Signed-off-by: Jiufei Xue <[email protected]>
>>>>>>> ---
>>>>>>>  fs/io_uring.c                 | 4 ++--
>>>>>>>  include/uapi/linux/io_uring.h | 2 +-
>>>>>>>  tools/io_uring/liburing.h     | 2 +-
>>>>>>>  3 files changed, 4 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>> index 47790a2..6250227 100644
>>>>>>> --- a/fs/io_uring.c
>>>>>>> +++ b/fs/io_uring.c
>>>>>>> @@ -4602,7 +4602,7 @@ static void io_poll_queue_proc(struct file *file, struct wait_queue_head *head,
>>>>>>>  static int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>>  {
>>>>>>>  	struct io_poll_iocb *poll = &req->poll;
>>>>>>> -	u16 events;
>>>>>>> +	u32 events;
>>>>>>>  
>>>>>>>  	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>>>>  		return -EINVAL;
>>>>>>> @@ -8196,7 +8196,7 @@ static int __init io_uring_init(void)
>>>>>>>  	BUILD_BUG_SQE_ELEM(28, /* compat */   int, rw_flags);
>>>>>>>  	BUILD_BUG_SQE_ELEM(28, /* compat */ __u32, rw_flags);
>>>>>>>  	BUILD_BUG_SQE_ELEM(28, __u32,  fsync_flags);
>>>>>>> -	BUILD_BUG_SQE_ELEM(28, __u16,  poll_events);
>>>>>>> +	BUILD_BUG_SQE_ELEM(28, __u32,  poll_events);
>>>>>>>  	BUILD_BUG_SQE_ELEM(28, __u32,  sync_range_flags);
>>>>>>>  	BUILD_BUG_SQE_ELEM(28, __u32,  msg_flags);
>>>>>>>  	BUILD_BUG_SQE_ELEM(28, __u32,  timeout_flags);
>>>>>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>>>>>> index 92c2269..afc7edd 100644
>>>>>>> --- a/include/uapi/linux/io_uring.h
>>>>>>> +++ b/include/uapi/linux/io_uring.h
>>>>>>> @@ -31,7 +31,7 @@ struct io_uring_sqe {
>>>>>>>  	union {
>>>>>>>  		__kernel_rwf_t	rw_flags;
>>>>>>>  		__u32		fsync_flags;
>>>>>>> -		__u16		poll_events;
>>>>>>> +		__u32		poll_events;
>>>>>>>  		__u32		sync_range_flags;
>>>>>>>  		__u32		msg_flags;
>>>>>>>  		__u32		timeout_flags;
>>>>>>
>>>>>> We obviously have the space in there as most other flag members are 32-bits, but
>>>>>> I'd want to double check if we're not changing the ABI here. Is this always
>>>>>> going to be safe, on any platform, regardless of endianess etc?
>>>>>
>>>>> Double checked, and as I feared, we can't safely do this. We'll have to
>>>>> do something like the below, grabbing an unused bit of the poll mask
>>>>> space and if that's set, then store the fact that EPOLLEXCLUSIVE is set.
>>>>> So probably best to turn this just into one patch, since it doesn't make
>>>>> a lot of sense to do it as a prep patch at that point.
>>>>>
>>>> Yes, Agree about that. But I also fear that if the unused bit is used
>>>> in the feature, it will bring unexpected behavior.
>>>
>>> Yeah, it's certainly not the prettiest and could potentially be fragile.
>>> I'm open to suggestions, we need some way of signaling that the 32-bit
>>> variant of the poll_events should be used. We could potentially make
>>> this work by doing explicit layout for big endian vs little endian, that
>>> might be prettier and wouldn't suffer from the "grab some random bit"
>>> issue.
>>>
>> Thank you for your suggestion, I will think about it.
>>
>>>>> This does have the benefit of not growing io_poll_iocb. With your patch,
>>>>> it'd go beyond a cacheline, and hence bump the size of the entire
>>>>> io_iocb as well, which would be very unfortunate.
>>>>>
>>>> events in io_poll_iocb is 32-bits already, so why it will bump the
>>>> size of the io_iocb structure with my patch? 
>>>
>>> It's not 32-bits already, it's a __poll_t type which is 16-bits only.
>>>
>> Yes, it is a __poll_t type, but I found that __poll_t type is 32-bits
>> with the definition below:
>>
>> typedef unsigned __bitwise __poll_t;
>>
>> And I also investigate it with crash:
>> crash> io_poll_iocb -ox
>> struct io_poll_iocb {
>>    [0x0] struct file *file;
>>          union {
>>    [0x8]     struct wait_queue_head *head;
>>    [0x8]     u64 addr;
>>          };
>>   [0x10] __poll_t events;
>>   [0x14] bool done;
>>   [0x15] bool canceled;
>>   [0x18] struct wait_queue_entry wait;
>> }
> 
> Yeah you're right, not sure why I figured it was 16-bits. But just
> checking on my default build:
> 
> axboe@x1 ~/gi/linux-block (block-5.8)> pahole -C io_poll_iocb fs/io_uring.o
> struct io_poll_iocb {
> 	struct file *              file;                 /*     0     8 */
> 	union {
> 		struct wait_queue_head * head;           /*     8     8 */
> 		u64                addr;                 /*     8     8 */
> 	};                                               /*     8     8 */
> 	__poll_t                   events;               /*    16     4 */
> 	bool                       done;                 /*    20     1 */
> 	bool                       canceled;             /*    21     1 */
> 
> 	/* XXX 2 bytes hole, try to pack */
> 
> 	struct wait_queue_entry wait;                    /*    24    40 */
> 
> 	/* size: 64, cachelines: 1, members: 6 */
> 	/* sum members: 62, holes: 1, sum holes: 2 */
> };
> 
> and it's definitely 64-bytes in total size (as it should be), and the
> 'events' is indeed 32-bits as well.
> 
> So at least that's good, we don't need anything extra in there. If we
> can solve the endian issue, then it should be trivial to use the full
> 32-bits for the flags in the sqe.

To get back to something that can move us forward, something like the
below I _think_ will work. Then applications just use poll32_events and
we don't have to handle this in any special way. Care to base on that
and re-send the change?


diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 92c22699a5a7..5b3f6bd59437 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -31,7 +31,16 @@ struct io_uring_sqe {
 	union {
 		__kernel_rwf_t	rw_flags;
 		__u32		fsync_flags;
-		__u16		poll_events;
+		struct {
+#ifdef __BIG_ENDIAN_BITFIELD
+			__u16	__unused_poll;
+			__u16	poll_events;
+#else
+			__u16	poll_events;
+			__u16	__unused_poll;
+#endif
+		};
+		__u32		poll32_events;
 		__u32		sync_range_flags;
 		__u32		msg_flags;
 		__u32		timeout_flags;

-- 
Jens Axboe


  reply	other threads:[~2020-06-16 18:45 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-12  2:30 [PATCH v3] io_uring: add EPOLLEXCLUSIVE flag for POLL_ADD operation Jiufei Xue
2020-06-12  2:30 ` [PATCH v3 1/2] io_uring: change the poll events to be 32-bits Jiufei Xue
2020-06-12 14:58   ` Jens Axboe
2020-06-12 16:48     ` Jens Axboe
2020-06-15  2:49       ` Jiufei Xue
2020-06-15 15:09         ` Jens Axboe
2020-06-16  3:04           ` Jiufei Xue
2020-06-16 13:58             ` Jens Axboe
2020-06-16 18:45               ` Jens Axboe [this message]
2020-06-16 19:20                 ` Pavel Begunkov
2020-06-16 19:27                   ` Jens Axboe
2020-06-16 19:21                 ` Jens Axboe
2020-06-16 21:46                   ` Pavel Begunkov
2020-06-17  0:06                     ` Jens Axboe
2020-06-17  1:39                     ` Jiufei Xue
2020-06-12  2:30 ` [PATCH v3 2/2] io_uring: use EPOLLEXCLUSIVE flag to aoid thundering herd type behavior Jiufei Xue

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox