* [PATCH RFC] io_uring: extend io_uring_sqe flags bits
@ 2024-10-31 21:22 Jens Axboe
2024-11-01 2:12 ` Ming Lei
2024-11-01 16:59 ` Pavel Begunkov
0 siblings, 2 replies; 13+ messages in thread
From: Jens Axboe @ 2024-10-31 21:22 UTC (permalink / raw)
To: io-uring; +Cc: Ming Lei, Pavel Begunkov
In hindsight everything is clearer, but it probably should've been known
that 8 bits of ->flags would run out sooner than later. Rather than
gobble up the last bit for a random use case, add a bit that controls
whether or not ->personality is used as a flags2 argument. If that is
the case, then there's a new IOSQE2_PERSONALITY flag that tells io_uring
which personality field to read.
While this isn't the prettiest, it does allow extending with 15 extra
flags, and retains being able to use personality with any kind of
command. The exception is uring cmd, where personality2 will overlap
with the space set aside for SQE128. If they really need that, then that
would have to be done via a uring cmd flag.
Signed-off-by: Jens Axboe <[email protected]>
---
Was toying with this idea to allow for some more flags, I just don't
like grabbing the last flag and punting the problem both to the future
and to "somebody else's problem". Here's one way we could do it, without
rewriting the entire sqe into a v2. Which does need to happen at some
point, but preferably without pressing issues around.
I don't _hate_ it, there's really not a great way to do this. And I
do think personality is the least used of all the things, and probably
will never get used with uring_cmd. But if it had to work for that,
then there are certainly ways to pass in that info. Not that we
ever would...
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 77fd508d043a..8a45bf6a68ca 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -433,6 +433,7 @@ struct io_tw_state {
};
enum {
+ /* 8 bits of sqe->flags */
REQ_F_FIXED_FILE_BIT = IOSQE_FIXED_FILE_BIT,
REQ_F_IO_DRAIN_BIT = IOSQE_IO_DRAIN_BIT,
REQ_F_LINK_BIT = IOSQE_IO_LINK_BIT,
@@ -440,9 +441,13 @@ enum {
REQ_F_FORCE_ASYNC_BIT = IOSQE_ASYNC_BIT,
REQ_F_BUFFER_SELECT_BIT = IOSQE_BUFFER_SELECT_BIT,
REQ_F_CQE_SKIP_BIT = IOSQE_CQE_SKIP_SUCCESS_BIT,
+ REQ_F_FLAGS2_BIT = IOSQE_FLAGS2_BIT,
- /* first byte is taken by user flags, shift it to not overlap */
- REQ_F_FAIL_BIT = 8,
+ /* 16 bits of sqe->flags2 */
+ REQ_F_PERSONALITY_BIT = IOSQE2_PERSONALITY_BIT + 8,
+
+ /* first byte taken by sqe->flags, next 2 by sqe->flags2 */
+ REQ_F_FAIL_BIT = 24,
REQ_F_INFLIGHT_BIT,
REQ_F_CUR_POS_BIT,
REQ_F_NOWAIT_BIT,
@@ -492,6 +497,10 @@ enum {
REQ_F_BUFFER_SELECT = IO_REQ_FLAG(REQ_F_BUFFER_SELECT_BIT),
/* IOSQE_CQE_SKIP_SUCCESS */
REQ_F_CQE_SKIP = IO_REQ_FLAG(REQ_F_CQE_SKIP_BIT),
+ /* ->flags2 is valid */
+ REQ_F_FLAGS2 = IO_REQ_FLAG(REQ_F_FLAGS2_BIT),
+
+ REQ_F_PERSONALITY = IO_REQ_FLAG(REQ_F_PERSONALITY_BIT),
/* fail rest of links */
REQ_F_FAIL = IO_REQ_FLAG(REQ_F_FAIL_BIT),
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index ce58c4590de6..c7c3ba69ffdd 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -82,8 +82,12 @@ struct io_uring_sqe {
/* for grouped buffer selection */
__u16 buf_group;
} __attribute__((packed));
- /* personality to use, if used */
- __u16 personality;
+ union {
+ /* personality to use, if used */
+ __u16 personality;
+ /* 2nd set of flags, can't be used with personality */
+ __u16 flags2;
+ };
union {
__s32 splice_fd_in;
__u32 file_index;
@@ -99,11 +103,17 @@ struct io_uring_sqe {
__u64 __pad2[1];
};
__u64 optval;
- /*
- * If the ring is initialized with IORING_SETUP_SQE128, then
- * this field is used for 80 bytes of arbitrary command data
- */
- __u8 cmd[0];
+ struct {
+ /*
+ * If the ring is initialized with IORING_SETUP_SQE128,
+ * then this field is used for 80 bytes of arbitrary
+ * command data
+ */
+ __u8 cmd[0];
+
+ /* personality to use, if IOSQE2_PERSONALITY set */
+ __u16 personality2;
+ };
};
};
@@ -124,6 +134,11 @@ enum io_uring_sqe_flags_bit {
IOSQE_ASYNC_BIT,
IOSQE_BUFFER_SELECT_BIT,
IOSQE_CQE_SKIP_SUCCESS_BIT,
+ IOSQE_FLAGS2_BIT,
+};
+
+enum io_uring_sqe_flags2_bit {
+ IOSQE2_PERSONALITY_BIT,
};
/*
@@ -143,6 +158,14 @@ enum io_uring_sqe_flags_bit {
#define IOSQE_BUFFER_SELECT (1U << IOSQE_BUFFER_SELECT_BIT)
/* don't post CQE if request succeeded */
#define IOSQE_CQE_SKIP_SUCCESS (1U << IOSQE_CQE_SKIP_SUCCESS_BIT)
+/* ->flags2 is valid */
+#define IOSQE_FLAGS2 (1U << IOSQE_FLAGS2_BIT)
+
+/*
+ * sqe->flags2
+ */
+ /* if set, sqe->personality2 contains personality */
+#define IOSQE2_PERSONALITY (1U << IOSQE2_PERSONALITY_BIT)
/*
* io_uring_setup() flags
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1149fba20503..c2bbadd5640d 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -109,7 +109,8 @@
IOSQE_IO_HARDLINK | IOSQE_ASYNC)
#define SQE_VALID_FLAGS (SQE_COMMON_FLAGS | IOSQE_BUFFER_SELECT | \
- IOSQE_IO_DRAIN | IOSQE_CQE_SKIP_SUCCESS)
+ IOSQE_IO_DRAIN | IOSQE_CQE_SKIP_SUCCESS | \
+ IOSQE_FLAGS2 | IOSQE2_PERSONALITY)
#define IO_REQ_CLEAN_FLAGS (REQ_F_BUFFER_SELECTED | REQ_F_NEED_CLEANUP | \
REQ_F_POLLED | REQ_F_INFLIGHT | REQ_F_CREDS | \
@@ -2032,6 +2033,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
req->opcode = opcode = READ_ONCE(sqe->opcode);
/* same numerical values with corresponding REQ_F_*, safe to copy */
sqe_flags = READ_ONCE(sqe->flags);
+ if (sqe_flags & REQ_F_FLAGS2)
+ sqe_flags |= (__u32) READ_ONCE(sqe->flags2) << 8;
req->flags = (__force io_req_flags_t) sqe_flags;
req->cqe.user_data = READ_ONCE(sqe->user_data);
req->file = NULL;
@@ -2095,8 +2098,12 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
}
}
- personality = READ_ONCE(sqe->personality);
- if (personality) {
+ personality = 0;
+ if (req->flags & REQ_F_PERSONALITY)
+ personality = READ_ONCE(sqe->personality2);
+ else if (!(req->flags & REQ_F_FLAGS2))
+ personality = READ_ONCE(sqe->personality);
+ if (unlikely(personality)) {
int ret;
req->creds = xa_load(&ctx->personalities, personality);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 535909a38e76..ee04e0c48672 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -200,7 +200,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
- if (sqe->__pad1)
+ if (sqe->__pad1 || req->flags & REQ_F_PERSONALITY)
return -EINVAL;
ioucmd->flags = READ_ONCE(sqe->uring_cmd_flags);
--
Jens Axboe
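
For readers following the diff, here is a minimal user-space sketch of the
flag merge and the personality selection that the patch describes. It is
not kernel code: struct demo_sqe and the demo_* helpers are hypothetical
stand-ins, and only the bit positions (IOSQE_FLAGS2 in bit 7 of ->flags,
IOSQE2_PERSONALITY in bit 0 of ->flags2, flags2 shifted up by 8 in the
request flags) are taken from the patch above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as laid out in the RFC diff. */
#define IOSQE_FLAGS2_BIT        7   /* last free bit of the 8-bit sqe->flags */
#define IOSQE2_PERSONALITY_BIT  0   /* first bit of the 16-bit sqe->flags2 */

#define IOSQE_FLAGS2            (1U << IOSQE_FLAGS2_BIT)
#define IOSQE2_PERSONALITY      (1U << IOSQE2_PERSONALITY_BIT)

/* Request-side view: flags2 sits 8 bits above flags. */
#define REQ_F_FLAGS2            (1U << IOSQE_FLAGS2_BIT)
#define REQ_F_PERSONALITY       (1U << (IOSQE2_PERSONALITY_BIT + 8))

/* Hypothetical, trimmed-down stand-in for struct io_uring_sqe. */
struct demo_sqe {
	uint8_t flags;
	union {
		uint16_t personality;   /* legacy meaning */
		uint16_t flags2;        /* meaning once IOSQE_FLAGS2 is set */
	};
	uint16_t personality2;          /* read only if IOSQE2_PERSONALITY is set */
};

/* Merge the 8-bit flags and, if enabled, the 16-bit flags2 into one word. */
static uint32_t demo_req_flags(const struct demo_sqe *sqe)
{
	uint32_t req_flags;

	assert(sqe != NULL);
	req_flags = sqe->flags;
	if (req_flags & IOSQE_FLAGS2)
		req_flags |= (uint32_t)sqe->flags2 << 8;
	return req_flags;
}

/* Pick the personality field the same way the io_init_req() hunk does. */
static uint16_t demo_personality(const struct demo_sqe *sqe)
{
	uint32_t req_flags = demo_req_flags(sqe);

	if (req_flags & REQ_F_PERSONALITY)
		return sqe->personality2;
	if (!(req_flags & REQ_F_FLAGS2))
		return sqe->personality;
	return 0;       /* flags2 in use, but no personality requested */
}
```

An SQE that leaves bit 7 clear keeps the legacy behaviour, so existing
applications would not need to change under this scheme.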
* Re: [PATCH RFC] io_uring: extend io_uring_sqe flags bits
2024-10-31 21:22 [PATCH RFC] io_uring: extend io_uring_sqe flags bits Jens Axboe
@ 2024-11-01 2:12 ` Ming Lei
2024-11-01 2:42 ` Ming Lei
2024-11-01 13:58 ` Jens Axboe
2024-11-01 16:59 ` Pavel Begunkov
1 sibling, 2 replies; 13+ messages in thread
From: Ming Lei @ 2024-11-01 2:12 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, ming.lei
On Thu, Oct 31, 2024 at 03:22:18PM -0600, Jens Axboe wrote:
> In hindsight everything is clearer, but it probably should've been known
> that 8 bits of ->flags would run out sooner than later. Rather than
> gobble up the last bit for a random use case, add a bit that controls
> whether or not ->personality is used as a flags2 argument. If that is
> the case, then there's a new IOSQE2_PERSONALITY flag that tells io_uring
> which personality field to read.
>
> While this isn't the prettiest, it does allow extending with 15 extra
> flags, and retains being able to use personality with any kind of
> command. The exception is uring cmd, where personality2 will overlap
> with the space set aside for SQE128. If they really need that, then that
The space is the 1st `short` for uring_cmd, instead of SQE128 only.
Also it is overlapped with ->optval and ->addr3, so just wondering why not
use ->__pad2?
Another way is to use __pad2 for sqe2_flags for non-uring_cmd, and for
uring_cmd, use its top 16 bits as sqe2_flags. This does work, but it is
just a bit ugly to use.
Thanks,
Ming
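
The overlap raised here can be checked with offsetof(). The sketch below is
a hypothetical stand-in for only the last 16 bytes of the SQE as modified
by the RFC (the real struct also carries cmd[0] at the same offset as
personality2); the type and helper names are made up for illustration, and
pad2 stands in for __pad2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the trailing union of struct io_uring_sqe. */
struct demo_sqe_tail {
	union {
		struct {
			uint64_t addr3;
			uint64_t pad2[1];       /* __pad2 in the real header */
		};
		uint64_t optval;
		struct {
			/* the RFC places personality2 at the start of the union */
			uint16_t personality2;
		};
	};
};

/* Return nonzero if byte ranges [off_a, off_a+len_a) and [off_b, off_b+len_b) intersect. */
static int demo_overlaps(size_t off_a, size_t len_a, size_t off_b, size_t len_b)
{
	return off_a < off_b + len_b && off_b < off_a + len_a;
}

/* Sanity check: the modelled tail is two 64-bit words. */
static void demo_check_tail_size(void)
{
	assert(sizeof(struct demo_sqe_tail) == 16);
}
```

With this layout, personality2 shares its bytes with addr3 and optval,
while a 16-bit field placed in pad2 would not.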
* Re: [PATCH RFC] io_uring: extend io_uring_sqe flags bits
2024-11-01 2:12 ` Ming Lei
@ 2024-11-01 2:42 ` Ming Lei
2024-11-01 13:59 ` Jens Axboe
2024-11-01 13:58 ` Jens Axboe
1 sibling, 1 reply; 13+ messages in thread
From: Ming Lei @ 2024-11-01 2:42 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, ming.lei
On Fri, Nov 01, 2024 at 10:12:25AM +0800, Ming Lei wrote:
> On Thu, Oct 31, 2024 at 03:22:18PM -0600, Jens Axboe wrote:
> > In hindsight everything is clearer, but it probably should've been known
> > that 8 bits of ->flags would run out sooner than later. Rather than
> > gobble up the last bit for a random use case, add a bit that controls
> > whether or not ->personality is used as a flags2 argument. If that is
> > the case, then there's a new IOSQE2_PERSONALITY flag that tells io_uring
> > which personality field to read.
> >
> > While this isn't the prettiest, it does allow extending with 15 extra
> > flags, and retains being able to use personality with any kind of
> > command. The exception is uring cmd, where personality2 will overlap
> > with the space set aside for SQE128. If they really need that, then that
>
> The space is the 1st `short` for uring_cmd, instead of SQE128 only.
>
> Also it is overlapped with ->optval and ->addr3, so just wondering why not
> use ->__pad2?
>
> Another ways is to use __pad2 for sqe2_flags for non-uring_cmd, and for
> uring_cmd, use its top 16 as sqe2_flags, this way does work, but it is
> just a bit ugly to use.
Also IOSQE2_PERSONALITY doesn't have to be per-SQE, and it can be one
feature of IORING_FEAT_IOSQE2_PERSONALITY, that is why I thought it is
fine to take the 7th bit as SQE_GROUP now.
Thanks,
Ming
* Re: [PATCH RFC] io_uring: extend io_uring_sqe flags bits
2024-11-01 2:42 ` Ming Lei
@ 2024-11-01 13:59 ` Jens Axboe
2024-11-01 14:34 ` Ming Lei
0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2024-11-01 13:59 UTC (permalink / raw)
To: Ming Lei; +Cc: io-uring, Pavel Begunkov
On 10/31/24 8:42 PM, Ming Lei wrote:
> On Fri, Nov 01, 2024 at 10:12:25AM +0800, Ming Lei wrote:
>> On Thu, Oct 31, 2024 at 03:22:18PM -0600, Jens Axboe wrote:
>>> In hindsight everything is clearer, but it probably should've been known
>>> that 8 bits of ->flags would run out sooner than later. Rather than
>>> gobble up the last bit for a random use case, add a bit that controls
>>> whether or not ->personality is used as a flags2 argument. If that is
>>> the case, then there's a new IOSQE2_PERSONALITY flag that tells io_uring
>>> which personality field to read.
>>>
>>> While this isn't the prettiest, it does allow extending with 15 extra
>>> flags, and retains being able to use personality with any kind of
>>> command. The exception is uring cmd, where personality2 will overlap
>>> with the space set aside for SQE128. If they really need that, then that
>>
>> The space is the 1st `short` for uring_cmd, instead of SQE128 only.
>>
>> Also it is overlapped with ->optval and ->addr3, so just wondering why not
>> use ->__pad2?
>>
>> Another ways is to use __pad2 for sqe2_flags for non-uring_cmd, and for
>> uring_cmd, use its top 16 as sqe2_flags, this way does work, but it is
>> just a bit ugly to use.
>
> Also IOSQE2_PERSONALITY doesn't have to be per-SQE, and it can be one
> feature of IORING_FEAT_IOSQE2_PERSONALITY, that is why I thought it is
> fine to take the 7th bit as SQE_GROUP now.
Not sure I follow your thinking there, can you expand?
--
Jens Axboe
* Re: [PATCH RFC] io_uring: extend io_uring_sqe flags bits
2024-11-01 13:59 ` Jens Axboe
@ 2024-11-01 14:34 ` Ming Lei
2024-11-01 14:42 ` Jens Axboe
0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2024-11-01 14:34 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring, Pavel Begunkov
On Fri, Nov 01, 2024 at 07:59:38AM -0600, Jens Axboe wrote:
> On 10/31/24 8:42 PM, Ming Lei wrote:
> > On Fri, Nov 01, 2024 at 10:12:25AM +0800, Ming Lei wrote:
> >> On Thu, Oct 31, 2024 at 03:22:18PM -0600, Jens Axboe wrote:
> >>> In hindsight everything is clearer, but it probably should've been known
> >>> that 8 bits of ->flags would run out sooner than later. Rather than
> >>> gobble up the last bit for a random use case, add a bit that controls
> >>> whether or not ->personality is used as a flags2 argument. If that is
> >>> the case, then there's a new IOSQE2_PERSONALITY flag that tells io_uring
> >>> which personality field to read.
> >>>
> >>> While this isn't the prettiest, it does allow extending with 15 extra
> >>> flags, and retains being able to use personality with any kind of
> >>> command. The exception is uring cmd, where personality2 will overlap
> >>> with the space set aside for SQE128. If they really need that, then that
> >>
> >> The space is the 1st `short` for uring_cmd, instead of SQE128 only.
> >>
> >> Also it is overlapped with ->optval and ->addr3, so just wondering why not
> >> use ->__pad2?
> >>
> >> Another ways is to use __pad2 for sqe2_flags for non-uring_cmd, and for
> >> uring_cmd, use its top 16 as sqe2_flags, this way does work, but it is
> >> just a bit ugly to use.
> >
> > Also IOSQE2_PERSONALITY doesn't have to be per-SQE, and it can be one
> > feature of IORING_FEAT_IOSQE2_PERSONALITY, that is why I thought it is
> > fine to take the 7th bit as SQE_GROUP now.
>
> Not sure I follow your thinking there, can you expand?
It could be one io_uring setup flag, such as IORING_SETUP_IOSQE2_PERSONALITY.
If this flag is set, take __pad2 as sqe2_flags, otherwise use current way, so
it doesn't have to take bit7 of sqe_flags for this purpose.
Also in future, if uring_cmd needs personality, it still may reuse top
16bit of uring_cmd_flags for that.
Thanks,
Ming
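
A rough user-space sketch of the ring-level variant described above,
assuming the extra flags live in the low 16 bits of __pad2 (the message
does not say which bits). DEMO_SETUP_SQE_FLAGS2, struct demo_ring and the
helpers are hypothetical; the constant is not a real uapi value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical setup flag; the value is arbitrary, NOT a real uapi constant. */
#define DEMO_SETUP_SQE_FLAGS2   (1U << 0)

/* Hypothetical per-ring state decided once at setup time. */
struct demo_ring {
	int has_flags2;
};

/* Hypothetical, trimmed-down SQE: personality keeps its own field. */
struct demo_ring_sqe {
	uint8_t  flags;         /* all 8 bits stay available to real flags */
	uint16_t personality;
	uint64_t pad2;          /* __pad2 in the real header */
};

/* Slow path: runs once when the ring is created. */
static void demo_ring_setup(struct demo_ring *ring, uint32_t setup_flags)
{
	assert(ring != NULL);
	ring->has_flags2 = !!(setup_flags & DEMO_SETUP_SQE_FLAGS2);
}

/* Hot path: no per-SQE marker bit, only the cached ring-level decision. */
static uint32_t demo_ring_req_flags(const struct demo_ring *ring,
				    const struct demo_ring_sqe *sqe)
{
	uint32_t req_flags;

	assert(ring != NULL && sqe != NULL);
	req_flags = sqe->flags;
	if (ring->has_flags2)
		req_flags |= (uint32_t)(sqe->pad2 & 0xffff) << 8;
	return req_flags;
}
```

The point of the design is that bit 7 of sqe->flags is not spent on
signalling, and personality does not have to move.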
* Re: [PATCH RFC] io_uring: extend io_uring_sqe flags bits
2024-11-01 14:34 ` Ming Lei
@ 2024-11-01 14:42 ` Jens Axboe
2024-11-01 15:01 ` Ming Lei
2024-11-01 16:55 ` Pavel Begunkov
0 siblings, 2 replies; 13+ messages in thread
From: Jens Axboe @ 2024-11-01 14:42 UTC (permalink / raw)
To: Ming Lei; +Cc: io-uring, Pavel Begunkov
On 11/1/24 8:34 AM, Ming Lei wrote:
> On Fri, Nov 01, 2024 at 07:59:38AM -0600, Jens Axboe wrote:
>> On 10/31/24 8:42 PM, Ming Lei wrote:
>>> On Fri, Nov 01, 2024 at 10:12:25AM +0800, Ming Lei wrote:
>>>> On Thu, Oct 31, 2024 at 03:22:18PM -0600, Jens Axboe wrote:
>>>>> In hindsight everything is clearer, but it probably should've been known
>>>>> that 8 bits of ->flags would run out sooner than later. Rather than
>>>>> gobble up the last bit for a random use case, add a bit that controls
>>>>> whether or not ->personality is used as a flags2 argument. If that is
>>>>> the case, then there's a new IOSQE2_PERSONALITY flag that tells io_uring
>>>>> which personality field to read.
>>>>>
>>>>> While this isn't the prettiest, it does allow extending with 15 extra
>>>>> flags, and retains being able to use personality with any kind of
>>>>> command. The exception is uring cmd, where personality2 will overlap
>>>>> with the space set aside for SQE128. If they really need that, then that
>>>>
>>>> The space is the 1st `short` for uring_cmd, instead of SQE128 only.
>>>>
>>>> Also it is overlapped with ->optval and ->addr3, so just wondering why not
>>>> use ->__pad2?
>>>>
>>>> Another ways is to use __pad2 for sqe2_flags for non-uring_cmd, and for
>>>> uring_cmd, use its top 16 as sqe2_flags, this way does work, but it is
>>>> just a bit ugly to use.
>>>
>>> Also IOSQE2_PERSONALITY doesn't have to be per-SQE, and it can be one
>>> feature of IORING_FEAT_IOSQE2_PERSONALITY, that is why I thought it is
>>> fine to take the 7th bit as SQE_GROUP now.
>>
>> Not sure I follow your thinking there, can you expand?
>
> It could be one io_uring setup flag, such as
> IORING_SETUP_IOSQE2_PERSONALITY.
>
> If this flag is set, take __pad2 as sqe2_flags, otherwise use current
> way, so it doesn't have to take bit7 of sqe_flags for this purpose.
Would probably have to be a IORING_SETUP_IOSQE2_FLAGS or something in
general. And while that could work, not a huge fan of that. I think we
should retain that for when a v2 of the sqe is done, to coordinate which
version to use.
> Also in future, if uring_cmd needs personality, it still may reuse top
> 16bit of uring_cmd_flags for that.
Right, that's what I referred to in terms of uring_cmd just having its
own way to set personality.
--
Jens Axboe
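
To illustrate the uring_cmd idea of carrying personality in the top 16
bits of the 32-bit uring_cmd_flags, here is a small packing sketch. Nothing
like this exists in the uapi as far as this thread shows; the shift, mask
and helper names are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split of a 32-bit uring_cmd_flags word. */
#define DEMO_CMD_PERSONALITY_SHIFT      16
#define DEMO_CMD_FLAGS_MASK             0xffffU

/* Pack command flags (low 16 bits) and a personality id (top 16 bits). */
static uint32_t demo_cmd_flags_pack(uint32_t cmd_flags, uint16_t personality)
{
	/* command flags must fit in the low half in this scheme */
	assert((cmd_flags & ~DEMO_CMD_FLAGS_MASK) == 0);
	return ((uint32_t)personality << DEMO_CMD_PERSONALITY_SHIFT) | cmd_flags;
}

/* Extract the personality id from the top 16 bits. */
static uint16_t demo_cmd_flags_personality(uint32_t packed)
{
	return (uint16_t)(packed >> DEMO_CMD_PERSONALITY_SHIFT);
}

/* Extract the plain command flags from the low 16 bits. */
static uint32_t demo_cmd_flags_low(uint32_t packed)
{
	return packed & DEMO_CMD_FLAGS_MASK;
}
```

A personality of 0 leaves the word identical to today's flags-only value,
which is what would keep such a scheme backward compatible.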
* Re: [PATCH RFC] io_uring: extend io_uring_sqe flags bits
2024-11-01 14:42 ` Jens Axboe
@ 2024-11-01 15:01 ` Ming Lei
2024-11-01 15:04 ` Jens Axboe
2024-11-01 16:55 ` Pavel Begunkov
1 sibling, 1 reply; 13+ messages in thread
From: Ming Lei @ 2024-11-01 15:01 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring, Pavel Begunkov
On Fri, Nov 01, 2024 at 08:42:42AM -0600, Jens Axboe wrote:
> On 11/1/24 8:34 AM, Ming Lei wrote:
> > On Fri, Nov 01, 2024 at 07:59:38AM -0600, Jens Axboe wrote:
> >> On 10/31/24 8:42 PM, Ming Lei wrote:
> >>> On Fri, Nov 01, 2024 at 10:12:25AM +0800, Ming Lei wrote:
> >>>> On Thu, Oct 31, 2024 at 03:22:18PM -0600, Jens Axboe wrote:
> >>>>> In hindsight everything is clearer, but it probably should've been known
> >>>>> that 8 bits of ->flags would run out sooner than later. Rather than
> >>>>> gobble up the last bit for a random use case, add a bit that controls
> >>>>> whether or not ->personality is used as a flags2 argument. If that is
> >>>>> the case, then there's a new IOSQE2_PERSONALITY flag that tells io_uring
> >>>>> which personality field to read.
> >>>>>
> >>>>> While this isn't the prettiest, it does allow extending with 15 extra
> >>>>> flags, and retains being able to use personality with any kind of
> >>>>> command. The exception is uring cmd, where personality2 will overlap
> >>>>> with the space set aside for SQE128. If they really need that, then that
> >>>>
> >>>> The space is the 1st `short` for uring_cmd, instead of SQE128 only.
> >>>>
> >>>> Also it is overlapped with ->optval and ->addr3, so just wondering why not
> >>>> use ->__pad2?
> >>>>
> >>>> Another ways is to use __pad2 for sqe2_flags for non-uring_cmd, and for
> >>>> uring_cmd, use its top 16 as sqe2_flags, this way does work, but it is
> >>>> just a bit ugly to use.
> >>>
> >>> Also IOSQE2_PERSONALITY doesn't have to be per-SQE, and it can be one
> >>> feature of IORING_FEAT_IOSQE2_PERSONALITY, that is why I thought it is
> >>> fine to take the 7th bit as SQE_GROUP now.
> >>
> >> Not sure I follow your thinking there, can you expand?
> >
> > It could be one io_uring setup flag, such as
> > IORING_SETUP_IOSQE2_PERSONALITY.
> >
> > If this flag is set, take __pad2 as sqe2_flags, otherwise use current
> > way, so it doesn't have to take bit7 of sqe_flags for this purpose.
>
> Would probably have to be a IORING_SETUP_IOSQE2_FLAGS or something in
> general. And while that could work, not a huge fan of that. I think we
> should retain that for when a v2 of the sqe is done, to coordinate which
> version to use.
Fair enough.
Now there are 16bits for new features, which may put v2 off long enough.
>
> > Also in future, if uring_cmd needs personality, it still may reuse top
> > 16bit of uring_cmd_flags for that.
>
> Right, that's what I referred to in terms of uring_cmd just having its
> own way to set personality.
Then this approach is safe to go, imo.
Thanks,
Ming
* Re: [PATCH RFC] io_uring: extend io_uring_sqe flags bits
2024-11-01 15:01 ` Ming Lei
@ 2024-11-01 15:04 ` Jens Axboe
0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2024-11-01 15:04 UTC (permalink / raw)
To: Ming Lei; +Cc: io-uring, Pavel Begunkov
On 11/1/24 9:01 AM, Ming Lei wrote:
> On Fri, Nov 01, 2024 at 08:42:42AM -0600, Jens Axboe wrote:
>> On 11/1/24 8:34 AM, Ming Lei wrote:
>>> On Fri, Nov 01, 2024 at 07:59:38AM -0600, Jens Axboe wrote:
>>>> On 10/31/24 8:42 PM, Ming Lei wrote:
>>>>> On Fri, Nov 01, 2024 at 10:12:25AM +0800, Ming Lei wrote:
>>>>>> On Thu, Oct 31, 2024 at 03:22:18PM -0600, Jens Axboe wrote:
>>>>>>> In hindsight everything is clearer, but it probably should've been known
>>>>>>> that 8 bits of ->flags would run out sooner than later. Rather than
>>>>>>> gobble up the last bit for a random use case, add a bit that controls
>>>>>>> whether or not ->personality is used as a flags2 argument. If that is
>>>>>>> the case, then there's a new IOSQE2_PERSONALITY flag that tells io_uring
>>>>>>> which personality field to read.
>>>>>>>
>>>>>>> While this isn't the prettiest, it does allow extending with 15 extra
>>>>>>> flags, and retains being able to use personality with any kind of
>>>>>>> command. The exception is uring cmd, where personality2 will overlap
>>>>>>> with the space set aside for SQE128. If they really need that, then that
>>>>>>
>>>>>> The space is the 1st `short` for uring_cmd, instead of SQE128 only.
>>>>>>
>>>>>> Also it is overlapped with ->optval and ->addr3, so just wondering why not
>>>>>> use ->__pad2?
>>>>>>
>>>>>> Another ways is to use __pad2 for sqe2_flags for non-uring_cmd, and for
>>>>>> uring_cmd, use its top 16 as sqe2_flags, this way does work, but it is
>>>>>> just a bit ugly to use.
>>>>>
>>>>> Also IOSQE2_PERSONALITY doesn't have to be per-SQE, and it can be one
>>>>> feature of IORING_FEAT_IOSQE2_PERSONALITY, that is why I thought it is
>>>>> fine to take the 7th bit as SQE_GROUP now.
>>>>
>>>> Not sure I follow your thinking there, can you expand?
>>>
>>> It could be one io_uring setup flag, such as
>>> IORING_SETUP_IOSQE2_PERSONALITY.
>>>
>>> If this flag is set, take __pad2 as sqe2_flags, otherwise use current
>>> way, so it doesn't have to take bit7 of sqe_flags for this purpose.
>>
>> Would probably have to be a IORING_SETUP_IOSQE2_FLAGS or something in
>> general. And while that could work, not a huge fan of that. I think we
>> should retain that for when a v2 of the sqe is done, to coordinate which
>> version to use.
>
> Fair enough.
>
> Now there are 16bits for new features, which may put v2 off long enough.
Exactly, hopefully that'll push the need out quite a bit, so we have
time to do something nice for v2.
>>> Also in future, if uring_cmd needs personality, it still may reuse top
>>> 16bit of uring_cmd_flags for that.
>>
>> Right, that's what I referred to in terms of uring_cmd just having its
>> own way to set personality.
>
> Then this approach is safe to go, imo.
Thanks I think so too, and it'll unblock the sqe grouping. So at least
that paves the way for the first part of your patchset. I'll post a v2
of it shortly.
--
Jens Axboe
* Re: [PATCH RFC] io_uring: extend io_uring_sqe flags bits
2024-11-01 14:42 ` Jens Axboe
2024-11-01 15:01 ` Ming Lei
@ 2024-11-01 16:55 ` Pavel Begunkov
2024-11-01 16:58 ` Jens Axboe
1 sibling, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2024-11-01 16:55 UTC (permalink / raw)
To: Jens Axboe, Ming Lei; +Cc: io-uring
On 11/1/24 14:42, Jens Axboe wrote:
> On 11/1/24 8:34 AM, Ming Lei wrote:
>> On Fri, Nov 01, 2024 at 07:59:38AM -0600, Jens Axboe wrote:
>>> On 10/31/24 8:42 PM, Ming Lei wrote:
>>>> On Fri, Nov 01, 2024 at 10:12:25AM +0800, Ming Lei wrote:
>>>>> On Thu, Oct 31, 2024 at 03:22:18PM -0600, Jens Axboe wrote:
>>>>>> In hindsight everything is clearer, but it probably should've been known
>>>>>> that 8 bits of ->flags would run out sooner than later. Rather than
>>>>>> gobble up the last bit for a random use case, add a bit that controls
>>>>>> whether or not ->personality is used as a flags2 argument. If that is
>>>>>> the case, then there's a new IOSQE2_PERSONALITY flag that tells io_uring
>>>>>> which personality field to read.
>>>>>>
>>>>>> While this isn't the prettiest, it does allow extending with 15 extra
>>>>>> flags, and retains being able to use personality with any kind of
>>>>>> command. The exception is uring cmd, where personality2 will overlap
>>>>>> with the space set aside for SQE128. If they really need that, then that
>>>>>
>>>>> The space is the 1st `short` for uring_cmd, instead of SQE128 only.
>>>>>
>>>>> Also it is overlapped with ->optval and ->addr3, so just wondering why not
>>>>> use ->__pad2?
>>>>>
>>>>> Another ways is to use __pad2 for sqe2_flags for non-uring_cmd, and for
>>>>> uring_cmd, use its top 16 as sqe2_flags, this way does work, but it is
>>>>> just a bit ugly to use.
>>>>
>>>> Also IOSQE2_PERSONALITY doesn't have to be per-SQE, and it can be one
>>>> feature of IORING_FEAT_IOSQE2_PERSONALITY, that is why I thought it is
>>>> fine to take the 7th bit as SQE_GROUP now.
>>>
>>> Not sure I follow your thinking there, can you expand?
>>
>> It could be one io_uring setup flag, such as
>> IORING_SETUP_IOSQE2_PERSONALITY.
>>
>> If this flag is set, take __pad2 as sqe2_flags, otherwise use current
>> way, so it doesn't have to take bit7 of sqe_flags for this purpose.
>
> Would probably have to be a IORING_SETUP_IOSQE2_FLAGS or something in
> general. And while that could work, not a huge fan of that. I think we
> should retain that for when a v2 of the sqe is done, to coordinate which
> version to use.
A setup flag over an sqe flag for marking IMHO would be a _much_
better approach. It doesn't take an SQE bit for nothing, you can
parse and process it in the slow setup path, enable static keys
for the hot path and so on.
--
Pavel Begunkov
* Re: [PATCH RFC] io_uring: extend io_uring_sqe flags bits
2024-11-01 16:55 ` Pavel Begunkov
@ 2024-11-01 16:58 ` Jens Axboe
0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2024-11-01 16:58 UTC (permalink / raw)
To: Pavel Begunkov, Ming Lei; +Cc: io-uring
On 11/1/24 10:55 AM, Pavel Begunkov wrote:
> On 11/1/24 14:42, Jens Axboe wrote:
>> On 11/1/24 8:34 AM, Ming Lei wrote:
>>> On Fri, Nov 01, 2024 at 07:59:38AM -0600, Jens Axboe wrote:
>>>> On 10/31/24 8:42 PM, Ming Lei wrote:
>>>>> On Fri, Nov 01, 2024 at 10:12:25AM +0800, Ming Lei wrote:
>>>>>> On Thu, Oct 31, 2024 at 03:22:18PM -0600, Jens Axboe wrote:
>>>>>>> In hindsight everything is clearer, but it probably should've been known
>>>>>>> that 8 bits of ->flags would run out sooner than later. Rather than
>>>>>>> gobble up the last bit for a random use case, add a bit that controls
>>>>>>> whether or not ->personality is used as a flags2 argument. If that is
>>>>>>> the case, then there's a new IOSQE2_PERSONALITY flag that tells io_uring
>>>>>>> which personality field to read.
>>>>>>>
>>>>>>> While this isn't the prettiest, it does allow extending with 15 extra
>>>>>>> flags, and retains being able to use personality with any kind of
>>>>>>> command. The exception is uring cmd, where personality2 will overlap
>>>>>>> with the space set aside for SQE128. If they really need that, then that
>>>>>>
>>>>>> The space is the 1st `short` for uring_cmd, instead of SQE128 only.
>>>>>>
>>>>>> Also it is overlapped with ->optval and ->addr3, so just wondering why not
>>>>>> use ->__pad2?
>>>>>>
>>>>>> Another ways is to use __pad2 for sqe2_flags for non-uring_cmd, and for
>>>>>> uring_cmd, use its top 16 as sqe2_flags, this way does work, but it is
>>>>>> just a bit ugly to use.
>>>>>
>>>>> Also IOSQE2_PERSONALITY doesn't have to be per-SQE, and it can be one
>>>>> feature of IORING_FEAT_IOSQE2_PERSONALITY, that is why I thought it is
>>>>> fine to take the 7th bit as SQE_GROUP now.
>>>>
>>>> Not sure I follow your thinking there, can you expand?
>>>
>>> It could be one io_uring setup flag, such as
>>> IORING_SETUP_IOSQE2_PERSONALITY.
>>>
>>> If this flag is set, take __pad2 as sqe2_flags, otherwise use current
>>> way, so it doesn't have to take bit7 of sqe_flags for this purpose.
>>
>> Would probably have to be a IORING_SETUP_IOSQE2_FLAGS or something in
>> general. And while that could work, not a huge fan of that. I think we
>> should retain that for when a v2 of the sqe is done, to coordinate which
>> version to use.
>
> A setup flag over an sqe flag for marking IMHO would be a _much_
> better approach. It doesn't take an SQE bit for nothing, you can
> parse and process it in the slow setup path, enable static keys
> for the hot path and so on.
Alright, if both of you like that, then let me rework the flags patch to
do that instead. If we go that route, then we can just defer doing the
actual setup flag until we need a new bit beyond filling the last
sqe->flags bit with GROUP.
--
Jens Axboe
* Re: [PATCH RFC] io_uring: extend io_uring_sqe flags bits
2024-11-01 2:12 ` Ming Lei
2024-11-01 2:42 ` Ming Lei
@ 2024-11-01 13:58 ` Jens Axboe
1 sibling, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2024-11-01 13:58 UTC (permalink / raw)
To: Ming Lei; +Cc: io-uring, Pavel Begunkov
On 10/31/24 8:12 PM, Ming Lei wrote:
> On Thu, Oct 31, 2024 at 03:22:18PM -0600, Jens Axboe wrote:
>> In hindsight everything is clearer, but it probably should've been known
>> that 8 bits of ->flags would run out sooner than later. Rather than
>> gobble up the last bit for a random use case, add a bit that controls
>> whether or not ->personality is used as a flags2 argument. If that is
>> the case, then there's a new IOSQE2_PERSONALITY flag that tells io_uring
>> which personality field to read.
>>
>> While this isn't the prettiest, it does allow extending with 15 extra
>> flags, and retains being able to use personality with any kind of
>> command. The exception is uring cmd, where personality2 will overlap
>> with the space set aside for SQE128. If they really need that, then that
>
> The space is the 1st `short` for uring_cmd, instead of SQE128 only.
>
> Also it is overlapped with ->optval and ->addr3, so just wondering why not
> use ->__pad2?
>
> Another ways is to use __pad2 for sqe2_flags for non-uring_cmd, and for
> uring_cmd, use its top 16 as sqe2_flags, this way does work, but it is
> just a bit ugly to use.
Agree, __pad2 is the better place, we don't want it overlapping with
addr3/optval.
--
Jens Axboe
* Re: [PATCH RFC] io_uring: extend io_uring_sqe flags bits
2024-10-31 21:22 [PATCH RFC] io_uring: extend io_uring_sqe flags bits Jens Axboe
2024-11-01 2:12 ` Ming Lei
@ 2024-11-01 16:59 ` Pavel Begunkov
2024-11-01 17:05 ` Jens Axboe
1 sibling, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2024-11-01 16:59 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Ming Lei
On 10/31/24 21:22, Jens Axboe wrote:
> In hindsight everything is clearer, but it probably should've been known
> that 8 bits of ->flags would run out sooner than later. Rather than
> gobble up the last bit for a random use case, add a bit that controls
> whether or not ->personality is used as a flags2 argument. If that is
> the case, then there's a new IOSQE2_PERSONALITY flag that tells io_uring
> which personality field to read.
>
> While this isn't the prettiest, it does allow extending with 15 extra
> flags, and retains being able to use personality with any kind of
> command. The exception is uring cmd, where personality2 will overlap
> with the space set aside for SQE128. If they really need that, then that
> would have to be done via a uring cmd flag.
Interesting, I was just experimenting using the personality bits for
similar purposes I mentioned but in a different way, and I even thought
if anything it could be used to extend sqe flags though I'm not a huge
fan of that.
--
Pavel Begunkov
* Re: [PATCH RFC] io_uring: extend io_uring_sqe flags bits
2024-11-01 16:59 ` Pavel Begunkov
@ 2024-11-01 17:05 ` Jens Axboe
0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2024-11-01 17:05 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Ming Lei
On 11/1/24 10:59 AM, Pavel Begunkov wrote:
> On 10/31/24 21:22, Jens Axboe wrote:
>> In hindsight everything is clearer, but it probably should've been known
>> that 8 bits of ->flags would run out sooner than later. Rather than
>> gobble up the last bit for a random use case, add a bit that controls
>> whether or not ->personality is used as a flags2 argument. If that is
>> the case, then there's a new IOSQE2_PERSONALITY flag that tells io_uring
>> which personality field to read.
>>
>> While this isn't the prettiest, it does allow extending with 15 extra
>> flags, and retains being able to use personality with any kind of
>> command. The exception is uring cmd, where personality2 will overlap
>> with the space set aside for SQE128. If they really need that, then that
>> would have to be done via a uring cmd flag.
>
> Interesting, I was just experimenting using the personality bits for
> similar purposes I mentioned but in a different way, and I even thought
> if anything it could be used to extend sqe flags though I'm not a huge
> fan of that.
We're going to need more SQE flags at some point. At least with the
potential to extend it with a setup flag in the future, we can grab the
last one and have that other option down the line.
I don't mind grabbing personality. Obviously it'd be better to get free
space somewhere, but there's no free real estate...
--
Jens Axboe