From: Jens Axboe <[email protected]>
To: Pavel Begunkov <[email protected]>,
Hao Xu <[email protected]>
Cc: [email protected], [email protected],
Jiufei Xue <[email protected]>,
Joseph Qi <[email protected]>
Subject: Re: [PATCH v3 RESEND] io_uring: add timeout support for io_uring_enter()
Date: Wed, 4 Nov 2020 13:50:06 -0700
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 11/4/20 1:28 PM, Jens Axboe wrote:
> On 11/4/20 1:16 PM, Pavel Begunkov wrote:
>> On 04/11/2020 19:34, Jens Axboe wrote:
>>> On 11/4/20 12:27 PM, Pavel Begunkov wrote:
>>>> On 04/11/2020 18:32, Jens Axboe wrote:
>>>>> On 11/4/20 10:50 AM, Jens Axboe wrote:
>>>>>> +struct io_uring_getevents_arg {
>>>>>> + sigset_t *sigmask;
>>>>>> + struct __kernel_timespec *ts;
>>>>>> +};
>>>>>> +
>>>>>
>>>>> I missed that this is still not right; I did bring it up in your last
>>>>> posting, though. You can't have pointers in a user API, since the size
>>>>> of a pointer varies depending on whether this is a 32-bit or 64-bit
>>>>> arch (or a 32-bit app running on a 64-bit kernel).
>>>>
>>>> Maybe it would be better
>>>>
>>>> 1) to kill this extra indirection?
>>>>
>>>> struct io_uring_getevents_arg {
>>>> - sigset_t *sigmask;
>>>> - struct __kernel_timespec *ts;
>>>> + sigset_t sigmask;
>>>> + struct __kernel_timespec ts;
>>>> };
>>>>
>>>> then,
>>>>
>>>> sigset_t *sig = (...)arg;
>>>> __kernel_timespec* ts = (...)(arg + offset);
>>>
>>> But then it's kind of hard to know which, if any, of them are set... I
>>> did think about this, and any solution seemed worse than just having the
>>> extra indirection.
>>
>> struct io_uring_getevents_arg {
>> sigset_t sigmask;
>> u32 mask;
>> struct __kernel_timespec ts;
>> };
>>
>> If size > sizeof(sigmask), then use mask to determine which are set.
>> Though I'm not sure how horrid the rest of the code would be.
>
> I'm not saying it's not possible, just that I think the end result would
> be worse in terms of both kernel code and how user applications (or
> liburing) would need to use it. I'd rather sacrifice an extra copy for
> something that's straightforward (and logical) to use than need weird
> setups or hoops to jump through. And this mask vs sizeof(mask) thing
> seems pretty horrendous to me :-)
>
>>> Yeah, not doing the extra indirection would save a copy, but I don't
>>> think it's worth it for this path.
>>
>> I dislike branching like IORING_ENTER_GETEVENTS_TIMEOUT much more,
>> from a conceptual point of view. I may try it out to see how it looks
>> while it's still in for-next.
>
> One thing I think we should change is the name:
> IORING_ENTER_GETEVENTS_TIMEOUT will quickly be a bad name if we end up
> adding just one more thing to the struct. It would be better to call it
> IORING_ENTER_EXTRA_DATA or something, meaning that the sigmask pointer
> is a pointer to the aux data instead of a sigmask. Better name
> suggestions welcome...
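The pointer-size objection quoted above is easiest to see side by side. Below is a minimal userspace sketch, assuming a layout with fixed-width 64-bit address fields like the one the patch below reads through `u64_to_user_ptr()` (`arg.sigmask`, `arg.sigmask_sz`, `arg.ts`). The struct names, the explicit `pad` field and the helpers `ptr_to_u64()`/`u64_to_ptr()` are hypothetical illustrations, not the uapi definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Pointer members: 8 bytes on ILP32 (e.g. i386), 16 bytes on LP64 (x86-64). */
struct arg_with_pointers {
	void *sigmask;
	void *ts;
};

/* Fixed-width members: same size and offsets for native and compat callers. */
struct arg_fixed {
	uint64_t sigmask;	/* user address of the sigset, or 0 */
	uint32_t sigmask_sz;	/* size of that sigset */
	uint32_t pad;		/* explicit padding, keeps ts at offset 16 */
	uint64_t ts;		/* user address of the timespec, or 0 */
};

_Static_assert(sizeof(struct arg_fixed) == 24, "same size on every ABI");
_Static_assert(offsetof(struct arg_fixed, ts) == 16, "same offset on every ABI");

/* Userspace side: store an address in a u64 field. */
static inline uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/* Stand-in for the kernel's u64_to_user_ptr(): turn the u64 back into an address. */
static inline void *u64_to_ptr(uint64_t v)
{
	return (void *)(uintptr_t)v;
}
```

The explicit padding matters too: without it, i386 aligns a 64-bit member to 4 bytes, so `ts` would sit at offset 12 and the struct would be 20 bytes there versus 24 on x86-64, the same compat mismatch the pointers caused.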
I'd be inclined to do something like the below:

- Rename it to IORING_ENTER_SIG_IS_DATA, which I think is more
  future-proof and explains it too. Ditto for the feature flag.
- Move the checking and getting under GETEVENTS. This removes a weird
  case where you'd get EINVAL if IORING_ENTER_SIG_IS_DATA is set but
  IORING_ENTER_GETEVENTS isn't. We didn't previously fail a
  non-getevents call if, e.g., sigmask was set, so I don't think we
  should add this case. The only downside is that if we fail the
  validation, we'll only submit and return the submit count. That
  should be fine, as the application would end up calling enter again
  and we'd return the error there.
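The behavioral change in the second point can be modeled in a few lines. This is a toy userspace model, not kernel code: `model_enter()` and `model_decode()` are hypothetical names, the flag values are taken from the uapi hunk in the patch, the 24-byte argument size is an assumption, and SQ_WAKEUP/SQ_WAIT as well as the real submission and waiting are left out. It assumes the syscall ends with `return submitted ? submitted : ret;`, which is why a failed validation still reports the submit count.

```c
#include <errno.h>
#include <stddef.h>

/* Flag values as in the uapi hunk of the patch; SQ_WAKEUP/SQ_WAIT omitted. */
#define ENTER_GETEVENTS		(1U << 0)
#define ENTER_SIG_IS_DATA	(1U << 3)

/* Assumed size of the getevents argument struct, in bytes. */
#define GETEVENTS_ARG_SIZE	24

/* Models only the size check of io_get_sig_is_data(); nothing is copied. */
static long model_decode(unsigned int flags, size_t sigsz)
{
	if ((flags & ENTER_SIG_IS_DATA) && sigsz != GETEVENTS_ARG_SIZE)
		return -EINVAL;
	return 0;
}

/* Toy io_uring_enter(): pretends every requested SQE was submitted. */
static long model_enter(unsigned int flags, int to_submit, size_t sigsz)
{
	int submitted = 0;
	long ret = 0;

	if (flags & ~(ENTER_GETEVENTS | ENTER_SIG_IS_DATA))
		return -EINVAL;

	submitted = to_submit;

	/* The argument is only decoded (and validated) when waiting. */
	if (flags & ENTER_GETEVENTS) {
		ret = model_decode(flags, sigsz);
		if (ret)
			goto out;
		/* ... waiting for min_complete events would happen here ... */
	}
out:
	/* A failed validation still reports the submit count, if any. */
	return submitted ? submitted : ret;
}
```

With this ordering, `model_enter(ENTER_SIG_IS_DATA, 0, 1)` returns 0 instead of -EINVAL, while a bad size combined with GETEVENTS only surfaces as -EINVAL when nothing was submitted.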
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8439cda54e21..694a87807ea1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -9146,6 +9146,29 @@ static void io_sqpoll_wait_sq(struct io_ring_ctx *ctx)
 	finish_wait(&ctx->sqo_sq_wait, &wait);
 }
 
+static int io_get_sig_is_data(unsigned flags, const void __user *argp,
+			      struct __kernel_timespec __user **ts,
+			      const sigset_t __user **sig, size_t *sigsz)
+{
+	struct io_uring_getevents_arg arg;
+
+	/* deal with IORING_ENTER_SIG_IS_DATA */
+	if (flags & IORING_ENTER_SIG_IS_DATA) {
+		if (*sigsz != sizeof(arg))
+			return -EINVAL;
+		if (copy_from_user(&arg, argp, sizeof(arg)))
+			return -EFAULT;
+		*sig = u64_to_user_ptr(arg.sigmask);
+		*sigsz = arg.sigmask_sz;
+		*ts = u64_to_user_ptr(arg.ts);
+	} else {
+		*sig = (const sigset_t __user *) argp;
+		*ts = NULL;
+	}
+
+	return 0;
+}
+
 SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		u32, min_complete, u32, flags, const void __user *, argp,
 		size_t, sigsz)
@@ -9154,32 +9177,13 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	long ret = -EBADF;
 	int submitted = 0;
 	struct fd f;
-	const sigset_t __user *sig;
-	struct __kernel_timespec __user *ts;
-	struct io_uring_getevents_arg arg;
 
 	io_run_task_work();
 
 	if (flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
-			IORING_ENTER_SQ_WAIT | IORING_ENTER_GETEVENTS_TIMEOUT))
+			IORING_ENTER_SQ_WAIT | IORING_ENTER_SIG_IS_DATA))
 		return -EINVAL;
 
-	/* deal with IORING_ENTER_GETEVENTS_TIMEOUT */
-	if (flags & IORING_ENTER_GETEVENTS_TIMEOUT) {
-		if (!(flags & IORING_ENTER_GETEVENTS))
-			return -EINVAL;
-		if (sigsz != sizeof(arg))
-			return -EINVAL;
-		if (copy_from_user(&arg, argp, sizeof(arg)))
-			return -EFAULT;
-		sig = u64_to_user_ptr(arg.sigmask);
-		sigsz = arg.sigmask_sz;
-		ts = u64_to_user_ptr(arg.ts);
-	} else {
-		sig = (const sigset_t __user *)argp;
-		ts = NULL;
-	}
-
 	f = fdget(fd);
 	if (!f.file)
 		return -EBADF;
@@ -9223,6 +9227,13 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 			goto out;
 	}
 	if (flags & IORING_ENTER_GETEVENTS) {
+		const sigset_t __user *sig;
+		struct __kernel_timespec __user *ts;
+
+		ret = io_get_sig_is_data(flags, argp, &ts, &sig, &sigsz);
+		if (unlikely(ret))
+			goto out;
+
 		min_complete = min(min_complete, ctx->cq_entries);
 
 		/*
@@ -9598,7 +9609,7 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
 			IORING_FEAT_SUBMIT_STABLE | IORING_FEAT_RW_CUR_POS |
 			IORING_FEAT_CUR_PERSONALITY | IORING_FEAT_FAST_POLL |
 			IORING_FEAT_POLL_32BITS | IORING_FEAT_SQPOLL_NONFIXED |
-			IORING_FEAT_GETEVENTS_TIMEOUT;
+			IORING_FEAT_SIG_IS_DATA;
 
 	if (copy_to_user(params, p, sizeof(*p))) {
 		ret = -EFAULT;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 37bea07c12f2..0fa095347fb6 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -317,7 +317,7 @@ asmlinkage long sys_io_uring_setup(u32 entries,
struct io_uring_params __user *p);
asmlinkage long sys_io_uring_enter(unsigned int fd, u32 to_submit,
u32 min_complete, u32 flags,
- const sigset_t __user *sig, size_t sigsz);
+ const void __user *argp, size_t sigsz);
asmlinkage long sys_io_uring_register(unsigned int fd, unsigned int op,
void __user *arg, unsigned int nr_args);
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 1a92985a9ee8..4832addccfa6 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -231,7 +231,7 @@ struct io_cqring_offsets {
#define IORING_ENTER_GETEVENTS (1U << 0)
#define IORING_ENTER_SQ_WAKEUP (1U << 1)
#define IORING_ENTER_SQ_WAIT (1U << 2)
-#define IORING_ENTER_GETEVENTS_TIMEOUT (1U << 3)
+#define IORING_ENTER_SIG_IS_DATA (1U << 3)
/*
* Passed in for io_uring_setup(2). Copied back with updated info on success
@@ -260,7 +260,7 @@ struct io_uring_params {
#define IORING_FEAT_FAST_POLL (1U << 5)
#define IORING_FEAT_POLL_32BITS (1U << 6)
#define IORING_FEAT_SQPOLL_NONFIXED (1U << 7)
-#define IORING_FEAT_GETEVENTS_TIMEOUT (1U << 8)
+#define IORING_FEAT_SIG_IS_DATA (1U << 8)
/*
* io_uring_register(2) opcodes and arguments
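On the application side (the liburing concern raised earlier in the thread), a caller would pick between the two conventions per call. A minimal sketch, assuming the u64-based layout the kernel side reads (`sigmask`, `sigmask_sz`, `ts`) plus an explicit pad; `prepare_wait()` and the `*_sketch` types are hypothetical, the flag values come from the uapi hunk above, and `struct kernel_timespec_sketch` stands in for `struct __kernel_timespec` (64-bit seconds and nanoseconds regardless of the userspace `time_t`). Nothing is submitted; it only computes the `flags`, `argp` and size arguments that would go to `io_uring_enter()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as proposed in the uapi hunk above. */
#define IORING_ENTER_GETEVENTS		(1U << 0)
#define IORING_ENTER_SIG_IS_DATA	(1U << 3)

/* Stand-in for struct __kernel_timespec: 64-bit fields on every ABI. */
struct kernel_timespec_sketch {
	int64_t tv_sec;
	long long tv_nsec;
};

/* Assumed argument layout: user addresses carried as u64, never as pointers. */
struct getevents_arg_sketch {
	uint64_t sigmask;
	uint32_t sigmask_sz;
	uint32_t pad;
	uint64_t ts;
};

/* The last three io_uring_enter() arguments, as a caller would pass them. */
struct enter_args_sketch {
	unsigned int flags;
	const void *argp;
	size_t argsz;
};

/*
 * Hypothetical helper: without a timeout, keep the original convention
 * (argp is the sigset itself); with one, fill *arg and set SIG_IS_DATA.
 */
static struct enter_args_sketch
prepare_wait(struct getevents_arg_sketch *arg, const sigset_t *sigmask,
	     size_t sigmask_sz, const struct kernel_timespec_sketch *ts)
{
	struct enter_args_sketch e = { .flags = IORING_ENTER_GETEVENTS };

	if (!ts) {
		e.argp = sigmask;
		e.argsz = sigmask_sz;
		return e;
	}

	arg->sigmask = (uint64_t)(uintptr_t)sigmask;	/* 0 if no sigmask */
	arg->sigmask_sz = (uint32_t)sigmask_sz;
	arg->pad = 0;
	arg->ts = (uint64_t)(uintptr_t)ts;

	e.flags |= IORING_ENTER_SIG_IS_DATA;
	e.argp = arg;
	e.argsz = sizeof(*arg);
	return e;
}
```

In the original convention the size passed must match the kernel's own sigset size; liburing passes `_NSIG / 8` there rather than `sizeof(sigset_t)`, so callers of a helper like this would do the same.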
--
Jens Axboe