public inbox for [email protected]
From: Jens Axboe <[email protected]>
To: Christoph Hellwig <[email protected]>
Cc: [email protected], [email protected], [email protected],
	[email protected], [email protected]
Subject: Re: [PATCH 1/8] io_uring: split up io_uring_sqe into hdr + main
Date: Thu, 18 Mar 2021 12:40:25 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 3/17/21 11:34 PM, Christoph Hellwig wrote:
>> @@ -14,11 +14,22 @@
>>  /*
>>   * IO submission data structure (Submission Queue Entry)
>>   */
>> +struct io_uring_sqe_hdr {
>> +	__u8	opcode;		/* type of operation for this sqe */
>> +	__u8	flags;		/* IOSQE_ flags */
>> +	__u16	ioprio;		/* ioprio for the request */
>> +	__s32	fd;		/* file descriptor to do IO on */
>> +};
>> +
>>  struct io_uring_sqe {
>> +#ifdef __KERNEL__
>> +	struct io_uring_sqe_hdr	hdr;
>> +#else
>>  	__u8	opcode;		/* type of operation for this sqe */
>>  	__u8	flags;		/* IOSQE_ flags */
>>  	__u16	ioprio;		/* ioprio for the request */
>>  	__s32	fd;		/* file descriptor to do IO on */
>> +#endif
>>  	union {
>>  		__u64	off;	/* offset into file */
>>  		__u64	addr2;
> 
> Please don't do that ifdef __KERNEL__ mess.  We never guaranteed
> userspace API compatibility, just ABI compatibility.

Right, but I'm the one that has to deal with the fallout. For the
in-kernel one I can skip the __KERNEL__ part, and the layout is the
same anyway.
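
To illustrate the "layout is the same anyway" point, here is a minimal
sketch that checks it at compile time. The struct names (sqe_hdr,
sqe_flat, sqe_split) are hypothetical stand-ins that cover only the
leading fields of the sqe, and userspace fixed-width types replace the
kernel's __u8/__u16/__s32. It is not the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed header: 1 + 1 + 2 + 4 bytes. */
struct sqe_hdr {
	uint8_t  opcode;
	uint8_t  flags;
	uint16_t ioprio;
	int32_t  fd;
};

/* Leading fields spelled out inline, as userspace sees them today. */
struct sqe_flat {
	uint8_t  opcode;
	uint8_t  flags;
	uint16_t ioprio;
	int32_t  fd;
	uint64_t off;
};

/* Same leading fields, wrapped in the header struct. */
struct sqe_split {
	struct sqe_hdr hdr;
	uint64_t off;
};

/* Each field must sit at the same offset in both variants. */
static_assert(sizeof(struct sqe_hdr) == 8, "header has no padding");
static_assert(offsetof(struct sqe_split, hdr.opcode) ==
	      offsetof(struct sqe_flat, opcode), "opcode offset");
static_assert(offsetof(struct sqe_split, hdr.flags) ==
	      offsetof(struct sqe_flat, flags), "flags offset");
static_assert(offsetof(struct sqe_split, hdr.ioprio) ==
	      offsetof(struct sqe_flat, ioprio), "ioprio offset");
static_assert(offsetof(struct sqe_split, hdr.fd) ==
	      offsetof(struct sqe_flat, fd), "fd offset");
static_assert(offsetof(struct sqe_split, off) ==
	      offsetof(struct sqe_flat, off), "off offset");

/* Runtime view of the same check: returns 1 if the sizes agree. */
static int layouts_match(void)
{
	return sizeof(struct sqe_split) == sizeof(struct sqe_flat);
}
```

If any assertion failed, the file would not compile, which is the kind
of guard that keeps the ABI stable even when the API spelling differs.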

> But we really do have a bigger problem here, and that is ioprio is
> a field that is specific to the read and write commands and thus
> should not be in the generic header.  On the other hand the
> personality is.
> 
> So I'm not sure trying to retrofit this even makes all that much sense.
> 
> Maybe we should just define io_uring_sqe_hdr the way it makes
> sense:
> 
> struct io_uring_sqe_hdr {
> 	__u8	opcode;	
> 	__u8	flags;
> 	__u16	personality;
> 	__s32	fd;
> 	__u64	user_data;
> };
> 
> and use that for all new commands going forward while marking the
> old ones as legacy.
> 
> io_uring_cmd_sqe would then be:
> 
> struct io_uring_cmd_sqe {
> 	struct io_uring_sqe_hdr	hdr;
> 	__u32			ioc;
> 	__u32			len;
> 	__u8			data[40];
> };
> 
> for example.  Note the 32-bit opcode just like ioctl to avoid
> getting into too much trouble due to collisions.
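
As a quick sanity check on the sizes in the proposal above, the sketch
below mirrors the two quoted structs and asserts that the command sqe
still fits the 64-byte sqe slot. The names sqe_hdr_v2 and cmd_sqe are
hypothetical, the types are userspace equivalents of the kernel ones,
and ioc is taken to be 32 bits as the quoted text describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the proposed generic header: 16 bytes. */
struct sqe_hdr_v2 {
	uint8_t  opcode;
	uint8_t  flags;
	uint16_t personality;
	int32_t  fd;
	uint64_t user_data;
};

/* Hypothetical mirror of the proposed command sqe: 16 + 4 + 4 + 40. */
struct cmd_sqe {
	struct sqe_hdr_v2 hdr;
	uint32_t ioc;		/* 32-bit command code, ioctl style */
	uint32_t len;
	uint8_t  data[40];	/* command-specific payload */
};

static_assert(sizeof(struct sqe_hdr_v2) == 16, "header is 16 bytes");
static_assert(offsetof(struct cmd_sqe, data) == 24, "payload starts at 24");
static_assert(sizeof(struct cmd_sqe) == 64, "must fit one 64-byte sqe");

/* Number of payload bytes left for the command itself. */
static size_t cmd_sqe_payload_bytes(void)
{
	return sizeof(((struct cmd_sqe *)0)->data);
}
```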

I was debating that with myself too. It's essentially turning the
existing io_uring_sqe into io_uring_sqe_v1 and then making a new v2
one. That would impact _all_ commands, and we'd need some trickery to
have newly compiled stuff use v2 and have existing applications
continue to work with the v1 format. That's very different from having
a single (or new) opcode use a v2 format.

Looking into the feasibility of this. But if that is done, there are
other things that need to be factored in, as I'm not at all interested
in having a v3 down the line as well. And I'd need to be able to do this
seamlessly, both from an application point of view, and a performance
point of view (no stupid conversions inline).

Things that come up when something like this is on the table:

- Should flags be extended? We're almost out... It hasn't been an
  issue so far, but it seems a bit silly to go v2 and not at least
  leave a bit of room there. That obviously comes at the cost of
  losing e.g. 8 bits somewhere else.

- Is u8 enough for the opcode? Again, we're nowhere near the limits
  here, but eventually multiplexing might be necessary.

That's just off the top of my head, probably other things to consider
too.
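
For a rough sense of the "almost out" remark on flags, the sketch
below counts the free bits in a u8 flags field. It assumes the six
IOSQE_ flag bits that the uapi header defined around the time of this
thread; IOSQE_EXAMPLE_NR_BITS and free_flag_bits() are hypothetical
helpers added only for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed set of sqe flag bits at the time of this thread. */
enum {
	IOSQE_FIXED_FILE_BIT,
	IOSQE_IO_DRAIN_BIT,
	IOSQE_IO_LINK_BIT,
	IOSQE_IO_HARDLINK_BIT,
	IOSQE_ASYNC_BIT,
	IOSQE_BUFFER_SELECT_BIT,
	IOSQE_EXAMPLE_NR_BITS,	/* hypothetical: number of bits in use */
};

/* The flags field is a single byte, so at most 8 flag bits exist. */
static_assert(IOSQE_EXAMPLE_NR_BITS <= 8 * sizeof(uint8_t),
	      "flags no longer fit in a u8");

/* How many flag bits are still unassigned in a u8 flags field. */
static unsigned int free_flag_bits(void)
{
	return 8 * sizeof(uint8_t) - IOSQE_EXAMPLE_NR_BITS;
}
```

Widening flags to 16 bits would raise the ceiling to 16 flags, at the
price of 8 bits taken from some other field in the fixed-size sqe.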

-- 
Jens Axboe

