* [PATCH 1/3] io_uring: Fix probe of disabled operations
2024-06-19 2:06 [PATCH 0/3] io_uring op probing fixes Gabriel Krisman Bertazi
@ 2024-06-19 2:06 ` Gabriel Krisman Bertazi
2024-06-19 2:06 ` [PATCH 2/3] io_uring: Allocate only necessary memory in io_probe Gabriel Krisman Bertazi
` (3 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-06-19 2:06 UTC (permalink / raw)
To: axboe; +Cc: io-uring, Gabriel Krisman Bertazi
io_probe checks io_issue_def->not_supported, but we never really set
that field, as we mark non-supported functions through a specific ->prep
handler. This means we end up returning IO_URING_OP_SUPPORTED, even for
disabled operations. Fix it by just checking the prep handler itself.
Fixes: 66f4af93da57 ("io_uring: add support for probing opcodes")
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
io_uring/opdef.c | 8 ++++++++
io_uring/opdef.h | 4 ++--
io_uring/register.c | 2 +-
3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 7d5c51fb8e6e..f3af2126fc7d 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -746,6 +746,14 @@ const char *io_uring_get_opcode(u8 opcode)
return "INVALID";
}
+bool io_uring_op_supported(u8 opcode)
+{
+ if (opcode < IORING_OP_LAST &&
+ io_issue_defs[opcode].prep != io_eopnotsupp_prep)
+ return true;
+ return false;
+}
+
void __init io_uring_optable_init(void)
{
int i;
diff --git a/io_uring/opdef.h b/io_uring/opdef.h
index 7ee6f5aa90aa..14456436ff74 100644
--- a/io_uring/opdef.h
+++ b/io_uring/opdef.h
@@ -17,8 +17,6 @@ struct io_issue_def {
unsigned poll_exclusive : 1;
/* op supports buffer selection */
unsigned buffer_select : 1;
- /* opcode is not supported by this kernel */
- unsigned not_supported : 1;
/* skip auditing */
unsigned audit_skip : 1;
/* supports ioprio */
@@ -47,5 +45,7 @@ struct io_cold_def {
extern const struct io_issue_def io_issue_defs[];
extern const struct io_cold_def io_cold_defs[];
+bool io_uring_op_supported(u8 opcode);
+
void io_uring_optable_init(void);
#endif
diff --git a/io_uring/register.c b/io_uring/register.c
index 50e9cbf85f7d..75f8e85cf0b0 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -59,7 +59,7 @@ static __cold int io_probe(struct io_ring_ctx *ctx, void __user *arg,
for (i = 0; i < nr_args; i++) {
p->ops[i].op = i;
- if (!io_issue_defs[i].not_supported)
+ if (io_uring_op_supported(i))
p->ops[i].flags = IO_URING_OP_SUPPORTED;
}
p->ops_len = i;
--
2.45.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/3] io_uring: Allocate only necessary memory in io_probe
2024-06-19 2:06 [PATCH 0/3] io_uring op probing fixes Gabriel Krisman Bertazi
2024-06-19 2:06 ` [PATCH 1/3] io_uring: Fix probe of disabled operations Gabriel Krisman Bertazi
@ 2024-06-19 2:06 ` Gabriel Krisman Bertazi
2024-06-19 2:06 ` [PATCH 3/3] io_uring: Don't read userspace data " Gabriel Krisman Bertazi
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-06-19 2:06 UTC (permalink / raw)
To: axboe; +Cc: io-uring, Gabriel Krisman Bertazi
We write at most IORING_OP_LAST entries in the probe buffer, so we don't
need to allocate temporary space for more than that. As a side effect,
we no longer can overflow "size".
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
io_uring/register.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/io_uring/register.c b/io_uring/register.c
index 75f8e85cf0b0..8409fc80c1cb 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -39,9 +39,10 @@ static __cold int io_probe(struct io_ring_ctx *ctx, void __user *arg,
size_t size;
int i, ret;
+ if (nr_args > IORING_OP_LAST)
+ nr_args = IORING_OP_LAST;
+
size = struct_size(p, ops, nr_args);
- if (size == SIZE_MAX)
- return -EOVERFLOW;
p = kzalloc(size, GFP_KERNEL);
if (!p)
return -ENOMEM;
@@ -54,8 +55,6 @@ static __cold int io_probe(struct io_ring_ctx *ctx, void __user *arg,
goto out;
p->last_op = IORING_OP_LAST - 1;
- if (nr_args > IORING_OP_LAST)
- nr_args = IORING_OP_LAST;
for (i = 0; i < nr_args; i++) {
p->ops[i].op = i;
--
2.45.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 3/3] io_uring: Don't read userspace data in io_probe
2024-06-19 2:06 [PATCH 0/3] io_uring op probing fixes Gabriel Krisman Bertazi
2024-06-19 2:06 ` [PATCH 1/3] io_uring: Fix probe of disabled operations Gabriel Krisman Bertazi
2024-06-19 2:06 ` [PATCH 2/3] io_uring: Allocate only necessary memory in io_probe Gabriel Krisman Bertazi
@ 2024-06-19 2:06 ` Gabriel Krisman Bertazi
2024-06-19 13:44 ` Jens Axboe
2024-06-19 13:42 ` [PATCH 0/3] io_uring op probing fixes Jens Axboe
2024-06-19 14:58 ` (subset) " Jens Axboe
4 siblings, 1 reply; 9+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-06-19 2:06 UTC (permalink / raw)
To: axboe; +Cc: io-uring, Gabriel Krisman Bertazi
We don't need to read the userspace buffer, and the kernel side is
expected to write over it anyway. Perhaps this was meant to allow
expansion of the interface for future parameters? If we ever need to do
it, perhaps it should be done as a new io_uring opcode.
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
io_uring/register.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/io_uring/register.c b/io_uring/register.c
index 8409fc80c1cb..a60eba22141a 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -37,7 +37,7 @@ static __cold int io_probe(struct io_ring_ctx *ctx, void __user *arg,
{
struct io_uring_probe *p;
size_t size;
- int i, ret;
+ int i, ret = 0;
if (nr_args > IORING_OP_LAST)
nr_args = IORING_OP_LAST;
@@ -47,13 +47,6 @@ static __cold int io_probe(struct io_ring_ctx *ctx, void __user *arg,
if (!p)
return -ENOMEM;
- ret = -EFAULT;
- if (copy_from_user(p, arg, size))
- goto out;
- ret = -EINVAL;
- if (memchr_inv(p, 0, size))
- goto out;
-
p->last_op = IORING_OP_LAST - 1;
for (i = 0; i < nr_args; i++) {
@@ -63,10 +56,8 @@ static __cold int io_probe(struct io_ring_ctx *ctx, void __user *arg,
}
p->ops_len = i;
- ret = 0;
if (copy_to_user(arg, p, size))
ret = -EFAULT;
-out:
kfree(p);
return ret;
}
--
2.45.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 3/3] io_uring: Don't read userspace data in io_probe
2024-06-19 2:06 ` [PATCH 3/3] io_uring: Don't read userspace data " Gabriel Krisman Bertazi
@ 2024-06-19 13:44 ` Jens Axboe
2024-06-19 14:55 ` Gabriel Krisman Bertazi
0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2024-06-19 13:44 UTC (permalink / raw)
To: Gabriel Krisman Bertazi; +Cc: io-uring
On 6/18/24 8:06 PM, Gabriel Krisman Bertazi wrote:
> We don't need to read the userspace buffer, and the kernel side is
> expected to write over it anyway. Perhaps this was meant to allow
> expansion of the interface for future parameters? If we ever need to do
> it, perhaps it should be done as a new io_uring opcode.
Right, it's checked so that we could use it for input values in the
future. By ensuring that userspace must zero it, then we could add input
values and flags in the future.
Is there a good reason to make this separate change? If not, I'd say
drop it and we can always discuss when there's an actual need to do so.
At least we have the option of passing in some information with the
current code, in a backwards compatible fashion.
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 3/3] io_uring: Don't read userspace data in io_probe
2024-06-19 13:44 ` Jens Axboe
@ 2024-06-19 14:55 ` Gabriel Krisman Bertazi
2024-06-19 14:57 ` Jens Axboe
0 siblings, 1 reply; 9+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-06-19 14:55 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring
Jens Axboe <[email protected]> writes:
> On 6/18/24 8:06 PM, Gabriel Krisman Bertazi wrote:
>> We don't need to read the userspace buffer, and the kernel side is
>> expected to write over it anyway. Perhaps this was meant to allow
>> expansion of the interface for future parameters? If we ever need to do
>> it, perhaps it should be done as a new io_uring opcode.
>
> Right, it's checked so that we could use it for input values in the
> future. By ensuring that userspace must zero it, then we could add input
> values and flags in the future.
>
> Is there a good reason to make this separate change? If not, I'd say
> drop it and we can always discuss when there's an actual need to do so.
> At least we have the option of passing in some information with the
> current code, in a backwards compatible fashion.
There is no reason other than it is unused. I'm fine with dropping it.
I'll wait for feedback on the other patches and, if we need a new
iteration, I'll skip this one.
Thanks!
--
Gabriel Krisman Bertazi
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 3/3] io_uring: Don't read userspace data in io_probe
2024-06-19 14:55 ` Gabriel Krisman Bertazi
@ 2024-06-19 14:57 ` Jens Axboe
0 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2024-06-19 14:57 UTC (permalink / raw)
To: Gabriel Krisman Bertazi; +Cc: io-uring
On 6/19/24 8:55 AM, Gabriel Krisman Bertazi wrote:
> Jens Axboe <[email protected]> writes:
>
>> On 6/18/24 8:06 PM, Gabriel Krisman Bertazi wrote:
>>> We don't need to read the userspace buffer, and the kernel side is
>>> expected to write over it anyway. Perhaps this was meant to allow
>>> expansion of the interface for future parameters? If we ever need to do
>>> it, perhaps it should be done as a new io_uring opcode.
>>
>> Right, it's checked so that we could use it for input values in the
>> future. By ensuring that userspace must zero it, then we could add input
>> values and flags in the future.
>>
>> Is there a good reason to make this separate change? If not, I'd say
>> drop it and we can always discuss when there's an actual need to do so.
>> At least we have the option of passing in some information with the
>> current code, in a backwards compatible fashion.
>
> There is no reason other than it is unused. I'm fine with dropping it.
>
> I'll wait for feedback on the other patches and, if we need a new
> iteration, I'll skip this one.
I think the rest look fine. The unsupported part was mostly a thing for
backports, did use it myself internally once for the meta kernel. But I
do think we should just kill it, so fine with doing that.
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] io_uring op probing fixes
2024-06-19 2:06 [PATCH 0/3] io_uring op probing fixes Gabriel Krisman Bertazi
` (2 preceding siblings ...)
2024-06-19 2:06 ` [PATCH 3/3] io_uring: Don't read userspace data " Gabriel Krisman Bertazi
@ 2024-06-19 13:42 ` Jens Axboe
2024-06-19 14:58 ` (subset) " Jens Axboe
4 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2024-06-19 13:42 UTC (permalink / raw)
To: Gabriel Krisman Bertazi; +Cc: io-uring
On 6/18/24 8:06 PM, Gabriel Krisman Bertazi wrote:
> Hi Jens,
>
> I didn't know this interface existed until today, when I started looking
> at creating exactly this feature. My goal is to know which operation
> supports which modes (registered buffers, bundled writes, iopoll). Now
> that I know it exists, I might just expose the extra information through
> io_uring_probe_op->flags, instead of adding a new operation. What do you
> think?
Yes! I've always wanted to add more to the probe side, for example
which flags a given op may set in cqes as well, and what op specific
flags it takes as well. Ideally something that is then also used when
doing the prep in the op itself, as not to maintain this information
in multiple spots.
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: (subset) [PATCH 0/3] io_uring op probing fixes
2024-06-19 2:06 [PATCH 0/3] io_uring op probing fixes Gabriel Krisman Bertazi
` (3 preceding siblings ...)
2024-06-19 13:42 ` [PATCH 0/3] io_uring op probing fixes Jens Axboe
@ 2024-06-19 14:58 ` Jens Axboe
4 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2024-06-19 14:58 UTC (permalink / raw)
To: Gabriel Krisman Bertazi; +Cc: io-uring
On Tue, 18 Jun 2024 22:06:17 -0400, Gabriel Krisman Bertazi wrote:
> I didn't know this interface existed until today, when I started looking
> at creating exactly this feature. My goal is to know which operation
> supports which modes (registered buffers, bundled writes, iopoll). Now
> that I know it exists, I might just expose the extra information through
> io_uring_probe_op->flags, instead of adding a new operation. What do you
> think?
>
> [...]
Applied, thanks!
[1/3] io_uring: Fix probe of disabled operations
commit: 3e05b222382ec67dce7358d50b6006e91d028d8b
[2/3] io_uring: Allocate only necessary memory in io_probe
commit: 6bc9199d0c84f5cd72922223231c7708698059a2
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread