public inbox for [email protected]
From: Jens Axboe <[email protected]>
To: Stefan Metzmacher <[email protected]>,
	Christian Brauner <[email protected]>,
	Jann Horn <[email protected]>
Cc: io-uring <[email protected]>,
	Linux API Mailing List <[email protected]>,
	Pavel Begunkov <[email protected]>
Subject: Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
Date: Thu, 30 Jan 2020 08:34:26 -0700
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 1/30/20 7:47 AM, Stefan Metzmacher wrote:
> On 30.01.20 at 15:11, Jens Axboe wrote:
>> On 1/30/20 3:26 AM, Christian Brauner wrote:
>>> On Thu, Jan 30, 2020 at 11:11:58AM +0100, Jann Horn wrote:
>>>> On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <[email protected]> wrote:
>>>>> On 1/29/20 10:34 AM, Jens Axboe wrote:
>>>>>> On 1/29/20 7:59 AM, Jann Horn wrote:
>>>>>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <[email protected]> wrote:
>>>>>>>> On 1/28/20 11:04 AM, Jens Axboe wrote:
>>>>>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
>>>>>>> [...]
>>>>>>>>>> #1 adds support for registering the personality of the invoking task,
>>>>>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>>>>>>>>>> just having one link, it doesn't support a chain of them.
>>>>>>> [...]
>>>>>>>> I didn't like it becoming a bit too complicated, both in terms of
>>>>>>>> implementation and use. And the fact that we'd have to jump through
>>>>>>>> hoops to make this work for a full chain.
>>>>>>>>
>>>>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>>>>>>> This makes it way easier to use. Same branch:
>>>>>>>>
>>>>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>>>>>
>>>>>>>> I'd feel much better with this variant for 5.6.
>>>>>>>
>>>>>>> Some general feedback from an inspectability/debuggability perspective:
>>>>>>>
>>>>>>> At some point, it might be nice if you could add a .show_fdinfo
>>>>>>> handler to the io_uring_fops that makes it possible to get a rough
>>>>>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd,
>>>>>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be
>>>>>>> helpful for debugging to be able to see information about the fixed
>>>>>>> files and buffers that have been registered. Same for the
>>>>>>> personalities; that information might also be useful when someone is
>>>>>>> trying to figure out what privileges a running process actually has.
>>>>>>
>>>>>> Agree, that would be a very useful addition. I'll take a look at it.
>>>>>
>>>>> Jann, how much info are you looking for? Here's a rough start, just
>>>>> shows the number of registered files and buffers, and lists the
>>>>> personalities registered. We could also dump the buffer info for
>>>>> each of them, and ditto for the files. Not sure how much verbosity
>>>>> is acceptable in fdinfo?
>>>>
>>>> At the moment, I personally am just interested in this from the
>>>> perspective of being able to audit the state of personalities, to make
>>>> important information about the security state of processes visible.
>>>>
>>>> Good point about verbosity in fdinfo - I'm not sure about that myself either.
>>>>
>>>>> Here's the test app for personality:
>>>>
>>>> Oh, that was quick...
>>>>
>>>>> # cat 3
>>>>> pos:    0
>>>>> flags:  02000002
>>>>> mnt_id: 14
>>>>> user-files: 0
>>>>> user-bufs: 0
>>>>> personalities:
>>>>>             1: uid=0/gid=0
>>>>>
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index c5ca84a305d3..0b2c7d800297 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>>>>>         return submitted ? submitted : ret;
>>>>>  }
>>>>>
>>>>> +struct ring_show_idr {
>>>>> +       struct io_ring_ctx *ctx;
>>>>> +       struct seq_file *m;
>>>>> +};
>>>>> +
>>>>> +static int io_uring_show_cred(int id, void *p, void *data)
>>>>> +{
>>>>> +       struct ring_show_idr *r = data;
>>>>> +       const struct cred *cred = p;
>>>>> +
>>>>> +       seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val,
>>>>> +                                               cred->gid.val);
>>>>
>>>> As Stefan said, the ->uid and ->gid aren't very useful, since when a
>>>> process switches UIDs for accessing things in the filesystem, it
>>>> probably only changes its EUID and FSUID, not its RUID.
>>>> I think what's particularly relevant for uring would be the ->fsuid
>>>> and the ->fsgid along with ->cap_effective; and perhaps for some
>>>> operations also the ->euid and ->egid. The real UID/GID aren't really
>>>> relevant when performing normal filesystem operations and such.
>>>
>>> This should probably just use the same format that is found in
>>> /proc/<pid>/status to make it easy for tools to use the same parsing
>>> logic and for the sake of consistency. We've adapted the same format for
>>> pidfds. So that would mean:
>>>
>>> Uid:	1000	1000	1000	1000
>>> Gid:	1000	1000	1000	1000
>>>
>>> Which would be: Real, effective, saved set, and filesystem {G,U}IDs
>>>
>>> And CapEff in /proc/<pid>/status has the format:
>>> CapEff:	0000000000000000
>>
>> I agree, consistency is good. I've added this, and also changed the
>> naming to be CamelCase, which it seems like most of them are. Now it
>> looks like this:
>>
>> pos:	0
>> flags:	02000002
>> mnt_id:	14
>> UserFiles:     0
>> UserBufs:     0
>> Personalities:
>>     1
>> 	Uid:	0		0		0		0
>> 	Gid:	0		0		0		0
>> 	Groups:	0
>> 	CapEff:	0000003fffffffff
>>
>> for a single personality registered (root). I have to indent it an extra
>> tab to display each personality.
> 
> That looks good.
> 
> Maybe also print some details of struct io_ring_ctx,
> flags and the ring sizes, ctx->cred.
> 
> Maybe details for io_wq and sqo_thread.

Yeah, I agree that we should probably just add a ton more; there's
plenty of information that would be useful. But let's start simple. I
forgot to CC you on the patch I just sent out, but it's basically the
above cleaned up. We dump information that's registered with the ring;
that's the theme right now. I'd be happy to add some of the state
information as well, but we should do that as a separate patch.
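
As a rough illustration of how this registered-state dump can be read
from userspace, here is a minimal sketch that builds the
/proc/<pid>/fdinfo/<fd> path and copies the file to a stream. The helper
names fdinfo_path() and fdinfo_dump() are hypothetical and only serve
the example; they are not part of the kernel or of liburing.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: build "/proc/<pid>/fdinfo/<fd>" into buf.
 * Returns the path length, or -1 if buf is too small. */
static int fdinfo_path(char *buf, size_t len, pid_t pid, int fd)
{
	int n = snprintf(buf, len, "/proc/%ld/fdinfo/%d", (long)pid, fd);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper: copy the fdinfo file for (pid, fd) to out.
 * Returns 0 on success, -1 if the path or the file is unusable. */
static int fdinfo_dump(pid_t pid, int fd, FILE *out)
{
	char path[64], line[256];
	FILE *f;

	if (fdinfo_path(path, sizeof(path), pid, fd) < 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	/* fdinfo is plain text, so it can be copied through as-is. */
	while (fgets(line, sizeof(line), f))
		fputs(line, out);
	fclose(f);
	return 0;
}
```

Calling fdinfo_dump(getpid(), ring_fd, stdout) from the process that
owns the ring should print text in the same format as the output quoted
above, assuming a kernel that carries the fdinfo patch.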

> Maybe pending requests?
> I'm not sure about how io_wq threads work in detail.
> Is it possible for a large number of blocking requests
> (against an external hard disk with a disconnected cable)
> to block other blocking requests to a working SSD?
> It would be good to diagnose such situations from
> the output.

io_uring doesn't necessarily track pending requests, only if it has to.
For bounded request time IO, like the above, it'll depend on the
concurrency level. If you set up the ring with e.g. N entries, that'll
be at most N pending bounded requests. If all of those are blocked
because the disk isn't responding, then yes, that could happen, at least
until the timeout happens.

> How is this supposed to be ABI-wise? Is it possible to change
> the output in later kernel versions?

We should always be able to append to the file, I'd just prefer if we
don't change the format of lines that have already been added.
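
To make the append-only rule concrete, here is a minimal sketch of a
userspace parser that tolerates it: it picks out the keys it knows and
skips every other line, so lines appended by later kernels don't break
it. The key names UserFiles and UserBufs are taken from the output
quoted above and may differ in what finally gets merged; struct
ring_fdinfo and ring_fdinfo_parse_line() are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical container for the counters this sketch cares about. */
struct ring_fdinfo {
	unsigned long user_files;	/* value of "UserFiles:" */
	unsigned long user_bufs;	/* value of "UserBufs:" */
	unsigned int unknown_lines;	/* lines this parser did not recognize */
};

/* Parse one fdinfo line. Unknown keys are counted and otherwise
 * ignored; they are never treated as an error. */
static void ring_fdinfo_parse_line(struct ring_fdinfo *info, const char *line)
{
	unsigned long val;

	if (sscanf(line, "UserFiles: %lu", &val) == 1)
		info->user_files = val;
	else if (sscanf(line, "UserBufs: %lu", &val) == 1)
		info->user_bufs = val;
	else
		info->unknown_lines++;
}
```

A parser written this way keeps working as long as existing lines keep
their format, which is the only guarantee described here.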

-- 
Jens Axboe

