Re: [RFC PATCH] io_uring: add support for IORING_OP_IOCTL


From: Jens Axboe <[email protected]>
To: Pavel Begunkov <[email protected]>, Jann Horn <[email protected]>
Cc: io-uring <[email protected]>,
	kernel list <[email protected]>
Subject: Re: [RFC PATCH] io_uring: add support for IORING_OP_IOCTL
Date: Sat, 14 Dec 2019 11:52:15 -0700
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 12/14/19 10:56 AM, Pavel Begunkov wrote:
> 
> On 14/12/2019 20:12, Jann Horn wrote:
>> On Sat, Dec 14, 2019 at 4:30 PM Pavel Begunkov <[email protected]> wrote:
>>> This works almost like ioctl(2), except it doesn't support a bunch of
>>> common opcodes, (e.g. FIOCLEX and FIBMAP, see ioctl.c), and goes
>>> straight to a device specific implementation.
>>>
>>> The case in mind is dma-buf, drm and other ioctl-centric interfaces.
>>>
>>> Not-yet Signed-off-by: Pavel Begunkov <[email protected]>
>>> ---
>>>
>>> It clearly needs some testing first, though works fine with dma-buf,
>>> but I'd like to discuss whether the use cases are convincing enough,
>>> and is it ok to desert some ioctl opcodes. For the last point it's
>>> fairly easy to add, maybe except three requiring fd (e.g. FIOCLEX)
>>>
>>> P.S. Probably, it won't benefit enough to consider using io_uring
>>> in drm/mesa, but anyway.
>> [...]
>>> +static int io_ioctl(struct io_kiocb *req,
>>> +                   struct io_kiocb **nxt, bool force_nonblock)
>>> +{
>>> +       const struct io_uring_sqe *sqe = req->sqe;
>>> +       unsigned int cmd = READ_ONCE(sqe->ioctl_cmd);
>>> +       unsigned long arg = READ_ONCE(sqe->ioctl_arg);
>>> +       int ret;
>>> +
>>> +       if (!req->file)
>>> +               return -EBADF;
>>> +       if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>> +               return -EINVAL;
>>> +       if (unlikely(sqe->ioprio || sqe->addr || sqe->buf_index
>>> +               || sqe->rw_flags))
>>> +               return -EINVAL;
>>> +       if (force_nonblock)
>>> +               return -EAGAIN;
>>> +
>>> +       ret = security_file_ioctl(req->file, cmd, arg);
>>> +       if (!ret)
>>> +               ret = (int)vfs_ioctl(req->file, cmd, arg);
>>
>> This isn't going to work. For several of the syscalls that were added,
>> special care had to be taken to avoid bugs - like for RECVMSG, for the
>> upcoming OPEN/CLOSE stuff, and so on.
>>
>> And in principle, ioctl handlers can do pretty much all of the things
>> syscalls can do, and more. They can look at the caller's PID, they can
>> open and close (well, technically that's slightly unsafe, but IIRC
>> autofs does it anyway) things in the file descriptor table, they can
>> give another process access to the calling process in some way, and so
>> on. If you just allow calling arbitrary ioctls through io_uring, you
>> will certainly get bugs, and probably security bugs, too.
>>
>> Therefore, I would prefer to see this not happen at all; and if you do
>> have a usecase where you think the complexity is worth it, then I
>> think you'll have to add new infrastructure that allows each
>> file_operations instance to opt in to having specific ioctls called
>> via this mechanism, or something like that, and ensure that each of
>> the exposed ioctls only performs operations that are safe from uring
>> worker context.
> 
> Sounds like a hell of a problem. Thanks for sorting this out!

While the ioctl approach is tempting, for the use cases where it makes
sense, I think we should just add an ioctl-type opcode and have the
sub-opcode be somewhere else in the sqe. Because I do think there's
a large opportunity to expose a fast API that works with ioctl-like
mechanisms. If we have

IORING_OP_IOCTL

and set aside an sqe field for the per-driver (or per-user) sub-opcode,
and add a file_operations method for sending these to the fd, then we'll
have a much better (and faster + async) API than ioctls. We could
add fops->uring_issue() or something, and that gets passed the io_kiocb.
When the request completes, ->uring_issue() posts a completion by
calling io_uring_complete_req() or something.
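
To make the shape of that concrete, here is a rough userspace model of the
opt-in dispatch, not kernel code. Every name in it (model_fops, model_req,
model_issue, model_complete_req, the sub_opcode field, DEMO_CMD_DOUBLE) is
made up for illustration; only the idea of an optional ->uring_issue()
hook that posts its own completion comes from the text above. Files that
do not provide the hook are simply refused, which is one way to get the
per-file_operations opt-in Jann asked for.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types; none of these layouts are real. */
struct model_req;

struct model_fops {
	/* Optional hook: NULL means this file type has not opted in. */
	int (*uring_issue)(struct model_req *req, bool force_nonblock);
};

struct model_file {
	const struct model_fops *f_op;
};

struct model_req {
	struct model_file *file;
	uint32_t sub_opcode;	/* assumed sqe field holding the per-driver command */
	uint64_t arg;		/* assumed sqe field holding the command argument */
	bool completed;
	int result;
};

/* Stand-in for "io_uring_complete_req() or something". */
static void model_complete_req(struct model_req *req, int res)
{
	req->completed = true;
	req->result = res;
}

/* Core side: only dispatch to files that provide the hook. */
static int model_issue(struct model_req *req, bool force_nonblock)
{
	assert(req != NULL);
	if (!req->file)
		return -EBADF;
	if (!req->file->f_op || !req->file->f_op->uring_issue)
		return -EOPNOTSUPP;	/* no opt-in, no arbitrary ioctl fallback */
	return req->file->f_op->uring_issue(req, force_nonblock);
}

/* Driver side: a toy handler with a single, never-blocking command. */
enum { DEMO_CMD_DOUBLE = 1 };

static int demo_uring_issue(struct model_req *req, bool force_nonblock)
{
	(void)force_nonblock;	/* this toy command never needs to sleep */
	switch (req->sub_opcode) {
	case DEMO_CMD_DOUBLE:
		model_complete_req(req, (int)(req->arg * 2));
		return 0;
	default:
		return -EINVAL;	/* unknown sub-opcode is rejected, not guessed at */
	}
}

static const struct model_fops demo_fops = { .uring_issue = demo_uring_issue };
static const struct model_fops legacy_fops = { .uring_issue = NULL };
```

In this model a request against a file using demo_fops completes inline,
while the same request against legacy_fops fails with -EOPNOTSUPP instead
of reaching a handler that was never audited for this context.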

Outside of the issues that Jann outlined, ioctls are also such a
decades-old mess that we have to do the -EAGAIN punt for all of them,
like you did in your patch. If it's opt-in like ->uring_issue(), then
care could be taken to do this right and just have it return -EAGAIN
if it does need async context.

ret = fops->uring_issue(req, force_nonblock);
if (ret == -EAGAIN) {
	... usual punt ...
}
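
Spelled out as a small userspace model (again, not kernel code), the
difference from the blanket punt is that the handler itself decides
whether it needs async context. The names punt_req, punt_demo_issue and
punt_submit are hypothetical, and the "worker" is modelled as a second
direct call with force_nonblock set to false rather than a real thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request; the counters exist only to observe the flow. */
struct punt_req {
	bool needs_blocking;	/* would servicing this request have to sleep? */
	int inline_attempts;	/* calls made with force_nonblock == true */
	int worker_attempts;	/* calls made with force_nonblock == false */
};

/* Handler in the spirit of ->uring_issue(): punts only when it must. */
static int punt_demo_issue(struct punt_req *req, bool force_nonblock)
{
	if (force_nonblock) {
		req->inline_attempts++;
		if (req->needs_blocking)
			return -EAGAIN;	/* ask the core for async context */
	} else {
		req->worker_attempts++;	/* allowed to block here */
	}
	return 0;
}

/* Core side: try inline first, retry from the "worker" on -EAGAIN. */
static int punt_submit(struct punt_req *req)
{
	int ret;

	assert(req != NULL);
	ret = punt_demo_issue(req, true);
	if (ret == -EAGAIN)
		ret = punt_demo_issue(req, false);	/* models the usual punt */
	return ret;
}
```

A request that can be served without sleeping never touches the worker
path here, whereas with plain ioctls every request would take it.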

I think working on this would be great, and some of the more performance
sensitive ioctl cases should flock to it.

-- 
Jens Axboe

Thread overview: 6+ messages
2019-12-14 15:29 [RFC PATCH] io_uring: add support for IORING_OP_IOCTL Pavel Begunkov
2019-12-14 17:12 ` Jann Horn
2019-12-14 17:56   ` Pavel Begunkov
2019-12-14 18:52     ` Jens Axboe [this message]
2019-12-15 15:40       ` Pavel Begunkov
2020-01-08 13:26       ` Stefan Metzmacher
