* IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? @ 2020-01-28 10:18 Stefan Metzmacher 2020-01-28 16:10 ` Jens Axboe 0 siblings, 1 reply; 48+ messages in thread From: Stefan Metzmacher @ 2020-01-28 10:18 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, Linux API Mailing List [-- Attachment #1.1: Type: text/plain, Size: 1080 bytes --] Hi Jens, now that we have IORING_FEAT_CUR_PERSONALITY... How can we optimize the fileserver case now, in order to avoid the overhead of always calling 5 syscalls before io_uring_enter()?: /* gain root again */ setresuid(-1,0,-1); setresgid(-1,0,-1) /* impersonate the user with groups */ setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1); /* trigger the operation */ io_uring_enter(); I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be good, together with a IOSQE_FIXED_CREDS in order to specify credentials per operation. Or we make it much more generic and introduce a credsfd_create() syscall in order to get an fd for a credential handle, maybe together with another syscall to activate the credentials of the current thread (or let a write to the fd trigger the activation in order to avoid an additional syscall number). Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE] to be just an array of int values instead of a more complex structure to define the credentials. What do you think? metze [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 10:18 IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? Stefan Metzmacher @ 2020-01-28 16:10 ` Jens Axboe 2020-01-28 16:17 ` Stefan Metzmacher 0 siblings, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-28 16:10 UTC (permalink / raw) To: Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/28/20 3:18 AM, Stefan Metzmacher wrote: > Hi Jens, > > now that we have IORING_FEAT_CUR_PERSONALITY... > > How can we optimize the fileserver case now, in order to avoid the > overhead of always calling 5 syscalls before io_uring_enter()?: > > /* gain root again */ > setresuid(-1,0,-1); setresgid(-1,0,-1) > /* impersonate the user with groups */ > setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1); > /* trigger the operation */ > io_uring_enter(); > > I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be > good, together with a IOSQE_FIXED_CREDS in order to specify > credentials per operation. > > Or we make it much more generic and introduce a credsfd_create() > syscall in order to get an fd for a credential handle, maybe > together with another syscall to activate the credentials of > the current thread (or let a write to the fd trigger the activation > in order to avoid an additional syscall number). > > Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE] > to be just an array of int values instead of a more complex > structure to define the credentials. I'd rather avoid having to add more infrastructure for this, even if credsfd_create() would be nifty. With that in mind, something like: - Application does IORING_REGISTER_CREDS, which returns some index - Add a IORING_OP_USE_CREDS opcode, which sets the creds associated with dependent commands - Actual request is linked to the IORING_OP_USE_CREDS command, any link off IORING_OP_USE_CREDS will use those credentials - IORING_UNREGISTER_CREDS removes the registered creds Just throwing that out there, definitely willing to entertain other methods that make sense for this. Trying to avoid needing to put this information in the SQE itself, hence the idea to use a chain of links for it. The downside is that we'll need to maintain an array of key -> creds, but that's probably not a big deal. What do you think? Ideally I'd like to get this done for 5.6 even if we are a bit late, so you'll have everything you need with that release. -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 16:10 ` Jens Axboe @ 2020-01-28 16:17 ` Stefan Metzmacher 2020-01-28 16:19 ` Jens Axboe 0 siblings, 1 reply; 48+ messages in thread From: Stefan Metzmacher @ 2020-01-28 16:17 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, Linux API Mailing List [-- Attachment #1.1: Type: text/plain, Size: 2515 bytes --] Am 28.01.20 um 17:10 schrieb Jens Axboe: > On 1/28/20 3:18 AM, Stefan Metzmacher wrote: >> Hi Jens, >> >> now that we have IORING_FEAT_CUR_PERSONALITY... >> >> How can we optimize the fileserver case now, in order to avoid the >> overhead of always calling 5 syscalls before io_uring_enter()?: >> >> /* gain root again */ >> setresuid(-1,0,-1); setresgid(-1,0,-1) >> /* impersonate the user with groups */ >> setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1); >> /* trigger the operation */ >> io_uring_enter(); >> >> I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be >> good, together with a IOSQE_FIXED_CREDS in order to specify >> credentials per operation. >> >> Or we make it much more generic and introduce a credsfd_create() >> syscall in order to get an fd for a credential handle, maybe >> together with another syscall to activate the credentials of >> the current thread (or let a write to the fd trigger the activation >> in order to avoid an additional syscall number). >> >> Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE] >> to be just an array of int values instead of a more complex >> structure to define the credentials. > > I'd rather avoid having to add more infrastructure for this, even if > credsfd_create() would be nifty. > > With that in mind, something like: > > - Application does IORING_REGISTER_CREDS, which returns some index > > - Add a IORING_OP_USE_CREDS opcode, which sets the creds associated > with dependent commands > - Actual request is linked to the IORING_OP_USE_CREDS command, any > link off IORING_OP_USE_CREDS will use those credentials Using links for this sounds ok. > - IORING_UNREGISTER_CREDS removes the registered creds > > Just throwing that out there, definitely willing to entertain other > methods that make sense for this. Trying to avoid needing to put this > information in the SQE itself, hence the idea to use a chain of links > for it. > > The downside is that we'll need to maintain an array of key -> creds, > but that's probably not a big deal. > > What do you think? So IORING_REGISTER_CREDS would be a simple operation that just takes a snapshot of the current_cred() and returns an id that can be passed to IORING_OP_USE_CREDS or IORING_UNREGISTER_CREDS? > Ideally I'd like to get this done for 5.6 even if we > are a bit late, so you'll have everything you need with that release. That would be great! Thanks! metze [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 16:17 ` Stefan Metzmacher @ 2020-01-28 16:19 ` Jens Axboe 2020-01-28 17:19 ` Jens Axboe 0 siblings, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-28 16:19 UTC (permalink / raw) To: Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/28/20 9:17 AM, Stefan Metzmacher wrote: > Am 28.01.20 um 17:10 schrieb Jens Axboe: >> On 1/28/20 3:18 AM, Stefan Metzmacher wrote: >>> Hi Jens, >>> >>> now that we have IORING_FEAT_CUR_PERSONALITY... >>> >>> How can we optimize the fileserver case now, in order to avoid the >>> overhead of always calling 5 syscalls before io_uring_enter()?: >>> >>> /* gain root again */ >>> setresuid(-1,0,-1); setresgid(-1,0,-1) >>> /* impersonate the user with groups */ >>> setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1); >>> /* trigger the operation */ >>> io_uring_enter(); >>> >>> I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be >>> good, together with a IOSQE_FIXED_CREDS in order to specify >>> credentials per operation. >>> >>> Or we make it much more generic and introduce a credsfd_create() >>> syscall in order to get an fd for a credential handle, maybe >>> together with another syscall to activate the credentials of >>> the current thread (or let a write to the fd trigger the activation >>> in order to avoid an additional syscall number). >>> >>> Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE] >>> to be just an array of int values instead of a more complex >>> structure to define the credentials. >> >> I'd rather avoid having to add more infrastructure for this, even if >> credsfd_create() would be nifty. >> >> With that in mind, something like: >> >> - Application does IORING_REGISTER_CREDS, which returns some index >> >> - Add a IORING_OP_USE_CREDS opcode, which sets the creds associated >> with dependent commands >> - Actual request is linked to the IORING_OP_USE_CREDS command, any >> link off IORING_OP_USE_CREDS will use those credentials > > Using links for this sounds ok. Great! I'll try and hack this up and see how it turns out. >> - IORING_UNREGISTER_CREDS removes the registered creds >> >> Just throwing that out there, definitely willing to entertain other >> methods that make sense for this. Trying to avoid needing to put this >> information in the SQE itself, hence the idea to use a chain of links >> for it. >> >> The downside is that we'll need to maintain an array of key -> creds, >> but that's probably not a big deal. >> >> What do you think? > > So IORING_REGISTER_CREDS would be a simple operation that just takes a > snapshot of the current_cred() and returns an id that can be passed to > IORING_OP_USE_CREDS or IORING_UNREGISTER_CREDS? Right, you would not pass in any arguments, it'd have to be run from the personality you wish to register. It simply returns an integer, which is a key to use for IORING_OP_USE_CREDS, or at the end for IORING_UNREGISTER_CREDS when you no longer wish to use this personality. >> Ideally I'd like to get this done for 5.6 even if we >> are a bit late, so you'll have everything you need with that release. > > That would be great! Crossing fingers... -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 16:19 ` Jens Axboe @ 2020-01-28 17:19 ` Jens Axboe 2020-01-28 18:04 ` Jens Axboe 0 siblings, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-28 17:19 UTC (permalink / raw) To: Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List, Pavel Begunkov On 1/28/20 9:19 AM, Jens Axboe wrote: > On 1/28/20 9:17 AM, Stefan Metzmacher wrote: >> Am 28.01.20 um 17:10 schrieb Jens Axboe: >>> On 1/28/20 3:18 AM, Stefan Metzmacher wrote: >>>> Hi Jens, >>>> >>>> now that we have IORING_FEAT_CUR_PERSONALITY... >>>> >>>> How can we optimize the fileserver case now, in order to avoid the >>>> overhead of always calling 5 syscalls before io_uring_enter()?: >>>> >>>> /* gain root again */ >>>> setresuid(-1,0,-1); setresgid(-1,0,-1) >>>> /* impersonate the user with groups */ >>>> setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1); >>>> /* trigger the operation */ >>>> io_uring_enter(); >>>> >>>> I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be >>>> good, together with a IOSQE_FIXED_CREDS in order to specify >>>> credentials per operation. >>>> >>>> Or we make it much more generic and introduce a credsfd_create() >>>> syscall in order to get an fd for a credential handle, maybe >>>> together with another syscall to activate the credentials of >>>> the current thread (or let a write to the fd trigger the activation >>>> in order to avoid an additional syscall number). >>>> >>>> Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE] >>>> to be just an array of int values instead of a more complex >>>> structure to define the credentials. >>> >>> I'd rather avoid having to add more infrastructure for this, even if >>> credsfd_create() would be nifty. >>> >>> With that in mind, something like: >>> >>> - Application does IORING_REGISTER_CREDS, which returns some index >>> >>> - Add a IORING_OP_USE_CREDS opcode, which sets the creds associated >>> with dependent commands >>> - Actual request is linked to the IORING_OP_USE_CREDS command, any >>> link off IORING_OP_USE_CREDS will use those credentials >> >> Using links for this sounds ok. > > Great! I'll try and hack this up and see how it turns out. > >>> - IORING_UNREGISTER_CREDS removes the registered creds >>> >>> Just throwing that out there, definitely willing to entertain other >>> methods that make sense for this. Trying to avoid needing to put this >>> information in the SQE itself, hence the idea to use a chain of links >>> for it. >>> >>> The downside is that we'll need to maintain an array of key -> creds, >>> but that's probably not a big deal. >>> >>> What do you think? >> >> So IORING_REGISTER_CREDS would be a simple operation that just takes a >> snapshot of the current_cred() and returns an id that can be passed to >> IORING_OP_USE_CREDS or IORING_UNREGISTER_CREDS? > > Right, you would not pass in any arguments, it'd have to be run from the > personality you wish to register. It simply returns an integer, which is > a key to use for IORING_OP_USE_CREDS, or at the end for > IORING_UNREGISTER_CREDS when you no longer wish to use this personality. > >>> Ideally I'd like to get this done for 5.6 even if we >>> are a bit late, so you'll have everything you need with that release. >> >> That would be great! > > Crossing fingers... OK, so here are two patches for testing: https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds #1 adds support for registering the personality of the invoking task, and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to just having one link, it doesn't support a chain of them. I'll try and write a test case for this just to see if it actually works, so far it's totally untested. Adding Pavel to the CC. -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 17:19 ` Jens Axboe @ 2020-01-28 18:04 ` Jens Axboe 2020-01-28 19:42 ` Jens Axboe 0 siblings, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-28 18:04 UTC (permalink / raw) To: Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List, Pavel Begunkov On 1/28/20 10:19 AM, Jens Axboe wrote: > On 1/28/20 9:19 AM, Jens Axboe wrote: >> On 1/28/20 9:17 AM, Stefan Metzmacher wrote: >>> Am 28.01.20 um 17:10 schrieb Jens Axboe: >>>> On 1/28/20 3:18 AM, Stefan Metzmacher wrote: >>>>> Hi Jens, >>>>> >>>>> now that we have IORING_FEAT_CUR_PERSONALITY... >>>>> >>>>> How can we optimize the fileserver case now, in order to avoid the >>>>> overhead of always calling 5 syscalls before io_uring_enter()?: >>>>> >>>>> /* gain root again */ >>>>> setresuid(-1,0,-1); setresgid(-1,0,-1) >>>>> /* impersonate the user with groups */ >>>>> setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1); >>>>> /* trigger the operation */ >>>>> io_uring_enter(); >>>>> >>>>> I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be >>>>> good, together with a IOSQE_FIXED_CREDS in order to specify >>>>> credentials per operation. >>>>> >>>>> Or we make it much more generic and introduce a credsfd_create() >>>>> syscall in order to get an fd for a credential handle, maybe >>>>> together with another syscall to activate the credentials of >>>>> the current thread (or let a write to the fd trigger the activation >>>>> in order to avoid an additional syscall number). >>>>> >>>>> Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE] >>>>> to be just an array of int values instead of a more complex >>>>> structure to define the credentials. >>>> >>>> I'd rather avoid having to add more infrastructure for this, even if >>>> credsfd_create() would be nifty. >>>> >>>> With that in mind, something like: >>>> >>>> - Application does IORING_REGISTER_CREDS, which returns some index >>>> >>>> - Add a IORING_OP_USE_CREDS opcode, which sets the creds associated >>>> with dependent commands >>>> - Actual request is linked to the IORING_OP_USE_CREDS command, any >>>> link off IORING_OP_USE_CREDS will use those credentials >>> >>> Using links for this sounds ok. >> >> Great! I'll try and hack this up and see how it turns out. >> >>>> - IORING_UNREGISTER_CREDS removes the registered creds >>>> >>>> Just throwing that out there, definitely willing to entertain other >>>> methods that make sense for this. Trying to avoid needing to put this >>>> information in the SQE itself, hence the idea to use a chain of links >>>> for it. >>>> >>>> The downside is that we'll need to maintain an array of key -> creds, >>>> but that's probably not a big deal. >>>> >>>> What do you think? >>> >>> So IORING_REGISTER_CREDS would be a simple operation that just takes a >>> snapshot of the current_cred() and returns an id that can be passed to >>> IORING_OP_USE_CREDS or IORING_UNREGISTER_CREDS? >> >> Right, you would not pass in any arguments, it'd have to be run from the >> personality you wish to register. It simply returns an integer, which is >> a key to use for IORING_OP_USE_CREDS, or at the end for >> IORING_UNREGISTER_CREDS when you no longer wish to use this personality. >> >>>> Ideally I'd like to get this done for 5.6 even if we >>>> are a bit late, so you'll have everything you need with that release. >>> >>> That would be great! >> >> Crossing fingers... > > OK, so here are two patches for testing: > > https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds > > #1 adds support for registering the personality of the invoking task, > and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to > just having one link, it doesn't support a chain of them. > > I'll try and write a test case for this just to see if it actually works, > so far it's totally untested. > > Adding Pavel to the CC. Minor tweak to ensuring we do the right thing for async offload as well, and it tests fine for me. Test case is: - Run as root - Register personality for root - create root only file - check we can IORING_OP_OPENAT the file - switch to user id test - check we cannot IORING_OP_OPENAT the file - check that we can open the file with IORING_OP_USE_CREDS linked -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 18:04 ` Jens Axboe @ 2020-01-28 19:42 ` Jens Axboe 2020-01-28 20:16 ` Pavel Begunkov ` (2 more replies) 0 siblings, 3 replies; 48+ messages in thread From: Jens Axboe @ 2020-01-28 19:42 UTC (permalink / raw) To: Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List, Pavel Begunkov On 1/28/20 11:04 AM, Jens Axboe wrote: > On 1/28/20 10:19 AM, Jens Axboe wrote: >> On 1/28/20 9:19 AM, Jens Axboe wrote: >>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote: >>>> Am 28.01.20 um 17:10 schrieb Jens Axboe: >>>>> On 1/28/20 3:18 AM, Stefan Metzmacher wrote: >>>>>> Hi Jens, >>>>>> >>>>>> now that we have IORING_FEAT_CUR_PERSONALITY... >>>>>> >>>>>> How can we optimize the fileserver case now, in order to avoid the >>>>>> overhead of always calling 5 syscalls before io_uring_enter()?: >>>>>> >>>>>> /* gain root again */ >>>>>> setresuid(-1,0,-1); setresgid(-1,0,-1) >>>>>> /* impersonate the user with groups */ >>>>>> setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1); >>>>>> /* trigger the operation */ >>>>>> io_uring_enter(); >>>>>> >>>>>> I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be >>>>>> good, together with a IOSQE_FIXED_CREDS in order to specify >>>>>> credentials per operation. >>>>>> >>>>>> Or we make it much more generic and introduce a credsfd_create() >>>>>> syscall in order to get an fd for a credential handle, maybe >>>>>> together with another syscall to activate the credentials of >>>>>> the current thread (or let a write to the fd trigger the activation >>>>>> in order to avoid an additional syscall number). >>>>>> >>>>>> Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE] >>>>>> to be just an array of int values instead of a more complex >>>>>> structure to define the credentials. >>>>> >>>>> I'd rather avoid having to add more infrastructure for this, even if >>>>> credsfd_create() would be nifty. >>>>> >>>>> With that in mind, something like: >>>>> >>>>> - Application does IORING_REGISTER_CREDS, which returns some index >>>>> >>>>> - Add a IORING_OP_USE_CREDS opcode, which sets the creds associated >>>>> with dependent commands >>>>> - Actual request is linked to the IORING_OP_USE_CREDS command, any >>>>> link off IORING_OP_USE_CREDS will use those credentials >>>> >>>> Using links for this sounds ok. >>> >>> Great! I'll try and hack this up and see how it turns out. >>> >>>>> - IORING_UNREGISTER_CREDS removes the registered creds >>>>> >>>>> Just throwing that out there, definitely willing to entertain other >>>>> methods that make sense for this. Trying to avoid needing to put this >>>>> information in the SQE itself, hence the idea to use a chain of links >>>>> for it. >>>>> >>>>> The downside is that we'll need to maintain an array of key -> creds, >>>>> but that's probably not a big deal. >>>>> >>>>> What do you think? >>>> >>>> So IORING_REGISTER_CREDS would be a simple operation that just takes a >>>> snapshot of the current_cred() and returns an id that can be passed to >>>> IORING_OP_USE_CREDS or IORING_UNREGISTER_CREDS? >>> >>> Right, you would not pass in any arguments, it'd have to be run from the >>> personality you wish to register. It simply returns an integer, which is >>> a key to use for IORING_OP_USE_CREDS, or at the end for >>> IORING_UNREGISTER_CREDS when you no longer wish to use this personality. >>> >>>>> Ideally I'd like to get this done for 5.6 even if we >>>>> are a bit late, so you'll have everything you need with that release. >>>> >>>> That would be great! >>> >>> Crossing fingers... >> >> OK, so here are two patches for testing: >> >> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >> >> #1 adds support for registering the personality of the invoking task, >> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >> just having one link, it doesn't support a chain of them. >> >> I'll try and write a test case for this just to see if it actually works, >> so far it's totally untested. >> >> Adding Pavel to the CC. > > Minor tweak to ensuring we do the right thing for async offload as well, > and it tests fine for me. Test case is: > > - Run as root > - Register personality for root > - create root only file > - check we can IORING_OP_OPENAT the file > - switch to user id test > - check we cannot IORING_OP_OPENAT the file > - check that we can open the file with IORING_OP_USE_CREDS linked I didn't like it becoming a bit too complicated, both in terms of implementation and use. And the fact that we'd have to jump through hoops to make this work for a full chain. So I punted and just added sqe->personality and IOSQE_PERSONALITY. This makes it way easier to use. Same branch: https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds I'd feel much better with this variant for 5.6. -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 19:42 ` Jens Axboe @ 2020-01-28 20:16 ` Pavel Begunkov 2020-01-28 20:19 ` Jens Axboe 2020-01-28 23:36 ` Pavel Begunkov 2020-01-29 14:59 ` Jann Horn 2 siblings, 1 reply; 48+ messages in thread From: Pavel Begunkov @ 2020-01-28 20:16 UTC (permalink / raw) To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List [-- Attachment #1.1: Type: text/plain, Size: 2158 bytes --] On 28/01/2020 22:42, Jens Axboe wrote: > On 1/28/20 11:04 AM, Jens Axboe wrote: >> On 1/28/20 10:19 AM, Jens Axboe wrote: >>> On 1/28/20 9:19 AM, Jens Axboe wrote: >>>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote: >>> OK, so here are two patches for testing: >>> >>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>> >>> #1 adds support for registering the personality of the invoking task, >>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >>> just having one link, it doesn't support a chain of them. >>> >>> I'll try and write a test case for this just to see if it actually works, >>> so far it's totally untested. >>> >>> Adding Pavel to the CC. >> >> Minor tweak to ensuring we do the right thing for async offload as well, >> and it tests fine for me. Test case is: >> >> - Run as root >> - Register personality for root >> - create root only file >> - check we can IORING_OP_OPENAT the file >> - switch to user id test >> - check we cannot IORING_OP_OPENAT the file >> - check that we can open the file with IORING_OP_USE_CREDS linked > > I didn't like it becoming a bit too complicated, both in terms of > implementation and use. And the fact that we'd have to jump through > hoops to make this work for a full chain. > > So I punted and just added sqe->personality and IOSQE_PERSONALITY. > This makes it way easier to use. Same branch: > > https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds > > I'd feel much better with this variant for 5.6. > To be honest, sounds pretty dangerous. Especially since somebody started talking about stealing fds from a process, it could lead to a nasty loophole somehow. E.g. root registers its credentials, passes io_uring it to non-privileged children, and then some process steals the uring fd (though, it would need priviledged mode for code-injection or else). Could we Cc here someone really keen on security? Stefan, could you please explain, how this 5 syscalls pattern from the first email came in the first place? Just want to understand the case. -- Pavel Begunkov [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 20:16 ` Pavel Begunkov @ 2020-01-28 20:19 ` Jens Axboe 2020-01-28 20:50 ` Pavel Begunkov 0 siblings, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-28 20:19 UTC (permalink / raw) To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/28/20 1:16 PM, Pavel Begunkov wrote: > On 28/01/2020 22:42, Jens Axboe wrote: >> On 1/28/20 11:04 AM, Jens Axboe wrote: >>> On 1/28/20 10:19 AM, Jens Axboe wrote: >>>> On 1/28/20 9:19 AM, Jens Axboe wrote: >>>>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote: >>>> OK, so here are two patches for testing: >>>> >>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>>> >>>> #1 adds support for registering the personality of the invoking task, >>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >>>> just having one link, it doesn't support a chain of them. >>>> >>>> I'll try and write a test case for this just to see if it actually works, >>>> so far it's totally untested. >>>> >>>> Adding Pavel to the CC. >>> >>> Minor tweak to ensuring we do the right thing for async offload as well, >>> and it tests fine for me. Test case is: >>> >>> - Run as root >>> - Register personality for root >>> - create root only file >>> - check we can IORING_OP_OPENAT the file >>> - switch to user id test >>> - check we cannot IORING_OP_OPENAT the file >>> - check that we can open the file with IORING_OP_USE_CREDS linked >> >> I didn't like it becoming a bit too complicated, both in terms of >> implementation and use. And the fact that we'd have to jump through >> hoops to make this work for a full chain. >> >> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >> This makes it way easier to use. Same branch: >> >> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >> >> I'd feel much better with this variant for 5.6. >> > > To be honest, sounds pretty dangerous. Especially since somebody started talking > about stealing fds from a process, it could lead to a nasty loophole somehow. > E.g. root registers its credentials, passes io_uring it to non-privileged > children, and then some process steals the uring fd (though, it would need > priviledged mode for code-injection or else). Could we Cc here someone really > keen on security? Link? If you can steal fds, then surely you've already lost any sense of security in the first place? Besides, if root registered the ring, the root credentials are already IN the ring. I don't see how this adds any extra holes. > Stefan, could you please explain, how this 5 syscalls pattern from the first > email came in the first place? Just want to understand the case. I think if you go back a bit in the archive, Stefan has a fuller explanation of how samba does the credentials dance. -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 20:19 ` Jens Axboe @ 2020-01-28 20:50 ` Pavel Begunkov 2020-01-28 20:56 ` Jens Axboe 0 siblings, 1 reply; 48+ messages in thread From: Pavel Begunkov @ 2020-01-28 20:50 UTC (permalink / raw) To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List [-- Attachment #1.1: Type: text/plain, Size: 3099 bytes --] On 28/01/2020 23:19, Jens Axboe wrote: > On 1/28/20 1:16 PM, Pavel Begunkov wrote: >> On 28/01/2020 22:42, Jens Axboe wrote: >>> On 1/28/20 11:04 AM, Jens Axboe wrote: >>>> On 1/28/20 10:19 AM, Jens Axboe wrote: >>>>> On 1/28/20 9:19 AM, Jens Axboe wrote: >>>>>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote: >>>>> OK, so here are two patches for testing: >>>>> >>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>>>> >>>>> #1 adds support for registering the personality of the invoking task, >>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >>>>> just having one link, it doesn't support a chain of them. >>>>> >>>>> I'll try and write a test case for this just to see if it actually works, >>>>> so far it's totally untested. >>>>> >>>>> Adding Pavel to the CC. >>>> >>>> Minor tweak to ensuring we do the right thing for async offload as well, >>>> and it tests fine for me. Test case is: >>>> >>>> - Run as root >>>> - Register personality for root >>>> - create root only file >>>> - check we can IORING_OP_OPENAT the file >>>> - switch to user id test >>>> - check we cannot IORING_OP_OPENAT the file >>>> - check that we can open the file with IORING_OP_USE_CREDS linked >>> >>> I didn't like it becoming a bit too complicated, both in terms of >>> implementation and use. And the fact that we'd have to jump through >>> hoops to make this work for a full chain. >>> >>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >>> This makes it way easier to use. Same branch: >>> >>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>> >>> I'd feel much better with this variant for 5.6. >>> >> >> To be honest, sounds pretty dangerous. Especially since somebody started talking >> about stealing fds from a process, it could lead to a nasty loophole somehow. >> E.g. root registers its credentials, passes io_uring it to non-privileged >> children, and then some process steals the uring fd (though, it would need >> priviledged mode for code-injection or else). Could we Cc here someone really >> keen on security? > > Link? If you can steal fds, then surely you've already lost any sense of https://lwn.net/Articles/808997/ But I didn't looked up it yet. > security in the first place? Besides, if root registered the ring, the root > credentials are already IN the ring. I don't see how this adds any extra > holes. Isn't it what you fixed in ("don't use static creds/mm assignments") ? I'm not sure what capability (and whether any) it would need, but better to think such cases through. Just saying, I would prefer to ask a person with extensive security experience, unlike me. >> Stefan, could you please explain, how this 5 syscalls pattern from the first >> email came in the first place? Just want to understand the case. > > I think if you go back a bit in the archive, Stefan has a fuller explanation > of how samba does the credentials dance. Missed it, I'll take a look, thanks -- Pavel Begunkov [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 20:50 ` Pavel Begunkov @ 2020-01-28 20:56 ` Jens Axboe 2020-01-28 21:25 ` Christian Brauner 0 siblings, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-28 20:56 UTC (permalink / raw) To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/28/20 1:50 PM, Pavel Begunkov wrote: > On 28/01/2020 23:19, Jens Axboe wrote: >> On 1/28/20 1:16 PM, Pavel Begunkov wrote: >>> On 28/01/2020 22:42, Jens Axboe wrote: >>>> On 1/28/20 11:04 AM, Jens Axboe wrote: >>>>> On 1/28/20 10:19 AM, Jens Axboe wrote: >>>>>> On 1/28/20 9:19 AM, Jens Axboe wrote: >>>>>>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote: >>>>>> OK, so here are two patches for testing: >>>>>> >>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>>>>> >>>>>> #1 adds support for registering the personality of the invoking task, >>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >>>>>> just having one link, it doesn't support a chain of them. >>>>>> >>>>>> I'll try and write a test case for this just to see if it actually works, >>>>>> so far it's totally untested. >>>>>> >>>>>> Adding Pavel to the CC. >>>>> >>>>> Minor tweak to ensuring we do the right thing for async offload as well, >>>>> and it tests fine for me. Test case is: >>>>> >>>>> - Run as root >>>>> - Register personality for root >>>>> - create root only file >>>>> - check we can IORING_OP_OPENAT the file >>>>> - switch to user id test >>>>> - check we cannot IORING_OP_OPENAT the file >>>>> - check that we can open the file with IORING_OP_USE_CREDS linked >>>> >>>> I didn't like it becoming a bit too complicated, both in terms of >>>> implementation and use. And the fact that we'd have to jump through >>>> hoops to make this work for a full chain. >>>> >>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >>>> This makes it way easier to use. Same branch: >>>> >>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>>> >>>> I'd feel much better with this variant for 5.6. >>>> >>> >>> To be honest, sounds pretty dangerous. Especially since somebody started talking >>> about stealing fds from a process, it could lead to a nasty loophole somehow. >>> E.g. root registers its credentials, passes io_uring it to non-privileged >>> children, and then some process steals the uring fd (though, it would need >>> priviledged mode for code-injection or else). Could we Cc here someone really >>> keen on security? >> >> Link? If you can steal fds, then surely you've already lost any sense of > > https://lwn.net/Articles/808997/ > But I didn't looked up it yet. This isn't new by any stretch, it's always been possible to pass file descriptors through SCM_RIGHTS. This just gives you a new way to do it. That's not stealing or leaking, it's deliberately passing it to someone else. >> security in the first place? Besides, if root registered the ring, the root >> credentials are already IN the ring. I don't see how this adds any extra >> holes. > > Isn't it what you fixed in ("don't use static creds/mm assignments") ? Sure, but SQPOLL still uses it. > I'm not sure what capability (and whether any) it would need, but > better to think such cases through. Just saying, I would prefer to ask > a person with extensive security experience, unlike me. I don't disagree, but I really don't think this is any different than what we already allow. -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 20:56 ` Jens Axboe @ 2020-01-28 21:25 ` Christian Brauner 2020-01-28 22:38 ` Pavel Begunkov 0 siblings, 1 reply; 48+ messages in thread From: Christian Brauner @ 2020-01-28 21:25 UTC (permalink / raw) To: Jens Axboe Cc: Pavel Begunkov, Stefan Metzmacher, io-uring, Linux API Mailing List On Tue, Jan 28, 2020 at 01:56:00PM -0700, Jens Axboe wrote: > On 1/28/20 1:50 PM, Pavel Begunkov wrote: > > On 28/01/2020 23:19, Jens Axboe wrote: > >> On 1/28/20 1:16 PM, Pavel Begunkov wrote: > >>> On 28/01/2020 22:42, Jens Axboe wrote: > >>>> On 1/28/20 11:04 AM, Jens Axboe wrote: > >>>>> On 1/28/20 10:19 AM, Jens Axboe wrote: > >>>>>> On 1/28/20 9:19 AM, Jens Axboe wrote: > >>>>>>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote: > >>>>>> OK, so here are two patches for testing: > >>>>>> > >>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds > >>>>>> > >>>>>> #1 adds support for registering the personality of the invoking task, > >>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to > >>>>>> just having one link, it doesn't support a chain of them. > >>>>>> > >>>>>> I'll try and write a test case for this just to see if it actually works, > >>>>>> so far it's totally untested. > >>>>>> > >>>>>> Adding Pavel to the CC. > >>>>> > >>>>> Minor tweak to ensuring we do the right thing for async offload as well, > >>>>> and it tests fine for me. Test case is: > >>>>> > >>>>> - Run as root > >>>>> - Register personality for root > >>>>> - create root only file > >>>>> - check we can IORING_OP_OPENAT the file > >>>>> - switch to user id test > >>>>> - check we cannot IORING_OP_OPENAT the file > >>>>> - check that we can open the file with IORING_OP_USE_CREDS linked > >>>> > >>>> I didn't like it becoming a bit too complicated, both in terms of > >>>> implementation and use. And the fact that we'd have to jump through > >>>> hoops to make this work for a full chain. > >>>> > >>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. > >>>> This makes it way easier to use. Same branch: > >>>> > >>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds > >>>> > >>>> I'd feel much better with this variant for 5.6. > >>>> > >>> > >>> To be honest, sounds pretty dangerous. Especially since somebody started talking > >>> about stealing fds from a process, it could lead to a nasty loophole somehow. > >>> E.g. root registers its credentials, passes io_uring it to non-privileged > >>> children, and then some process steals the uring fd (though, it would need > >>> priviledged mode for code-injection or else). Could we Cc here someone really > >>> keen on security? > >> > >> Link? If you can steal fds, then surely you've already lost any sense of > > > > https://lwn.net/Articles/808997/ > > But I didn't looked up it yet. > > This isn't new by any stretch, it's always been possible to pass file > descriptors through SCM_RIGHTS. This just gives you a new way to do it. > That's not stealing or leaking, it's deliberately passing it to someone > else. I've been reading along quietly. In addition to what Jens said, to ease everyone's mind: pidfd_getfd() doesn't allow to unconditionally grab file descriptors for any task. That would be crazy. The calling task needs ptrace_may_access() permissions on the target task, i.e. the task from which you want to grab the io_uring file descriptor. And any calling task that has ptrace_may_access() permissions on the target can do much worse than just grabbing an fd. Christian ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 21:25 ` Christian Brauner @ 2020-01-28 22:38 ` Pavel Begunkov 0 siblings, 0 replies; 48+ messages in thread From: Pavel Begunkov @ 2020-01-28 22:38 UTC (permalink / raw) To: Christian Brauner, Jens Axboe Cc: Stefan Metzmacher, io-uring, Linux API Mailing List [-- Attachment #1.1: Type: text/plain, Size: 578 bytes --] On 29/01/2020 00:25, Christian Brauner wrote: > I've been reading along quietly. In addition to what Jens said, to ease > everyone's mind: pidfd_getfd() doesn't allow to unconditionally grab > file descriptors for any task. That would be crazy. The calling task > needs ptrace_may_access() permissions on the target task, i.e. the task > from which you want to grab the io_uring file descriptor. And any > calling task that has ptrace_may_access() permissions on the target can > do much worse than just grabbing an fd. Good to know, thanks! -- Pavel Begunkov [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 19:42 ` Jens Axboe 2020-01-28 20:16 ` Pavel Begunkov @ 2020-01-28 23:36 ` Pavel Begunkov 2020-01-28 23:40 ` Jens Axboe 2020-01-29 14:59 ` Jann Horn 2 siblings, 1 reply; 48+ messages in thread From: Pavel Begunkov @ 2020-01-28 23:36 UTC (permalink / raw) To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List [-- Attachment #1.1: Type: text/plain, Size: 989 bytes --] On 28/01/2020 22:42, Jens Axboe wrote: > I didn't like it becoming a bit too complicated, both in terms of > implementation and use. And the fact that we'd have to jump through > hoops to make this work for a full chain. > > So I punted and just added sqe->personality and IOSQE_PERSONALITY. > This makes it way easier to use. Same branch: > > https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds > > I'd feel much better with this variant for 5.6. > Checked out ("don't use static creds/mm assignments") 1. do we miscount cred refs? We grab one in get_current_cred() for each async request, but if (worker->creds != work->creds) it will never be put. 2. shouldn't worker->creds be named {old,saved,etc}_creds? It's set as worker->creds = override_creds(work->creds); Where override_creds() returns previous creds. And if so, then the following fast check looks strange: worker->creds != work->creds -- Pavel Begunkov [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 23:36 ` Pavel Begunkov @ 2020-01-28 23:40 ` Jens Axboe 2020-01-28 23:51 ` Jens Axboe 0 siblings, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-28 23:40 UTC (permalink / raw) To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/28/20 4:36 PM, Pavel Begunkov wrote: > On 28/01/2020 22:42, Jens Axboe wrote: >> I didn't like it becoming a bit too complicated, both in terms of >> implementation and use. And the fact that we'd have to jump through >> hoops to make this work for a full chain. >> >> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >> This makes it way easier to use. Same branch: >> >> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >> >> I'd feel much better with this variant for 5.6. >> > > Checked out ("don't use static creds/mm assignments") > > 1. do we miscount cred refs? We grab one in get_current_cred() for each async > request, but if (worker->creds != work->creds) it will never be put. Yeah I think you're right, that needs a bit of fixing up. > 2. shouldn't worker->creds be named {old,saved,etc}_creds? It's set as > > worker->creds = override_creds(work->creds); > > Where override_creds() returns previous creds. And if so, then the following > fast check looks strange: > > worker->creds != work->creds Don't care too much about the naming, but the logic does appear off. I'll take a look at both of these tonight, unless you beat me to it. -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 23:40 ` Jens Axboe @ 2020-01-28 23:51 ` Jens Axboe 2020-01-29 0:10 ` Pavel Begunkov 0 siblings, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-28 23:51 UTC (permalink / raw) To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/28/20 4:40 PM, Jens Axboe wrote: > On 1/28/20 4:36 PM, Pavel Begunkov wrote: >> On 28/01/2020 22:42, Jens Axboe wrote: >>> I didn't like it becoming a bit too complicated, both in terms of >>> implementation and use. And the fact that we'd have to jump through >>> hoops to make this work for a full chain. >>> >>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >>> This makes it way easier to use. Same branch: >>> >>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>> >>> I'd feel much better with this variant for 5.6. >>> >> >> Checked out ("don't use static creds/mm assignments") >> >> 1. do we miscount cred refs? We grab one in get_current_cred() for each async >> request, but if (worker->creds != work->creds) it will never be put. > > Yeah I think you're right, that needs a bit of fixing up. I think this may have gotten fixed with the later addition posted today? I'll double check. But for the newer stuff, we put it for both cases when the request is freed. >> 2. shouldn't worker->creds be named {old,saved,etc}_creds? It's set as >> >> worker->creds = override_creds(work->creds); >> >> Where override_creds() returns previous creds. And if so, then the following >> fast check looks strange: >> >> worker->creds != work->creds > > Don't care too much about the naming, but the logic does appear off. > I'll take a look at both of these tonight, unless you beat me to it. Testing this now, what a braino. diff --git a/fs/io-wq.c b/fs/io-wq.c index ee49e8852d39..8fbbadf04cc3 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -56,7 +56,8 @@ struct io_worker { struct rcu_head rcu; struct mm_struct *mm; - const struct cred *creds; + const struct cred *cur_creds; + const struct cred *saved_creds; struct files_struct *restore_files; }; @@ -135,9 +136,9 @@ static bool __io_worker_unuse(struct io_wqe *wqe, struct io_worker *worker) { bool dropped_lock = false; - if (worker->creds) { - revert_creds(worker->creds); - worker->creds = NULL; + if (worker->saved_creds) { + revert_creds(worker->saved_creds); + worker->cur_creds = worker->saved_creds = NULL; } if (current->files != worker->restore_files) { @@ -424,10 +425,11 @@ static void io_wq_switch_mm(struct io_worker *worker, struct io_wq_work *work) static void io_wq_switch_creds(struct io_worker *worker, struct io_wq_work *work) { - if (worker->creds) - revert_creds(worker->creds); + if (worker->saved_creds) + revert_creds(worker->saved_creds); - worker->creds = override_creds(work->creds); + worker->saved_creds = override_creds(work->creds); + worker->cur_creds = work->creds; } static void io_worker_handle_work(struct io_worker *worker) @@ -480,7 +482,7 @@ static void io_worker_handle_work(struct io_worker *worker) } if (work->mm != worker->mm) io_wq_switch_mm(worker, work); - if (worker->creds != work->creds) + if (worker->cur_creds != work->creds) io_wq_switch_creds(worker, work); /* * OK to set IO_WQ_WORK_CANCEL even for uncancellable work, -- Jens Axboe ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 23:51 ` Jens Axboe @ 2020-01-29 0:10 ` Pavel Begunkov 2020-01-29 0:15 ` Jens Axboe 2020-01-29 0:20 ` Jens Axboe 0 siblings, 2 replies; 48+ messages in thread From: Pavel Begunkov @ 2020-01-29 0:10 UTC (permalink / raw) To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List [-- Attachment #1.1: Type: text/plain, Size: 3959 bytes --] On 29/01/2020 02:51, Jens Axboe wrote: > On 1/28/20 4:40 PM, Jens Axboe wrote: >> On 1/28/20 4:36 PM, Pavel Begunkov wrote: >>> On 28/01/2020 22:42, Jens Axboe wrote: >>>> I didn't like it becoming a bit too complicated, both in terms of >>>> implementation and use. And the fact that we'd have to jump through >>>> hoops to make this work for a full chain. >>>> >>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >>>> This makes it way easier to use. Same branch: >>>> >>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>>> >>>> I'd feel much better with this variant for 5.6. >>>> >>> >>> Checked out ("don't use static creds/mm assignments") >>> >>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async >>> request, but if (worker->creds != work->creds) it will never be put. >> >> Yeah I think you're right, that needs a bit of fixing up. > Hmm, it seems it leaks it unconditionally, as it grabs in a ref in override_creds(). > I think this may have gotten fixed with the later addition posted today? > I'll double check. But for the newer stuff, we put it for both cases > when the request is freed. Yeah, maybe. I got tangled trying to verify both at once and decided to start with the old one. >>> 2. shouldn't worker->creds be named {old,saved,etc}_creds? It's set as >>> >>> worker->creds = override_creds(work->creds); >>> >>> Where override_creds() returns previous creds. And if so, then the following >>> fast check looks strange: >>> >>> worker->creds != work->creds >> >> Don't care too much about the naming, but the logic does appear off. >> I'll take a look at both of these tonight, unless you beat me to it. Apparently, you're faster :) > > Testing this now, what a braino. > > diff --git a/fs/io-wq.c b/fs/io-wq.c > index ee49e8852d39..8fbbadf04cc3 100644 > --- a/fs/io-wq.c > +++ b/fs/io-wq.c > @@ -56,7 +56,8 @@ struct io_worker { > > struct rcu_head rcu; > struct mm_struct *mm; > - const struct cred *creds; > + const struct cred *cur_creds; > + const struct cred *saved_creds; > struct files_struct *restore_files; > }; > > @@ -135,9 +136,9 @@ static bool __io_worker_unuse(struct io_wqe *wqe, struct io_worker *worker) > { > bool dropped_lock = false; > > - if (worker->creds) { > - revert_creds(worker->creds); > - worker->creds = NULL; > + if (worker->saved_creds) { > + revert_creds(worker->saved_creds); > + worker->cur_creds = worker->saved_creds = NULL; > } > > if (current->files != worker->restore_files) { > @@ -424,10 +425,11 @@ static void io_wq_switch_mm(struct io_worker *worker, struct io_wq_work *work) > static void io_wq_switch_creds(struct io_worker *worker, > struct io_wq_work *work) > { > - if (worker->creds) > - revert_creds(worker->creds); > + if (worker->saved_creds) > + revert_creds(worker->saved_creds); > > - worker->creds = override_creds(work->creds); > + worker->saved_creds = override_creds(work->creds); > + worker->cur_creds = work->creds; > } How about as follows? rever_creds() is a bit heavier than put_creds(). static void io_wq_switch_creds(struct io_worker *worker, struct io_wq_work *work) { const struct cred *old_creds = override_creds(work->creds); if (worker->saved_creds) put_cred(old_creds); else worker->saved_creds = old; worker->cur_creds = work->creds; } > > static void io_worker_handle_work(struct io_worker *worker) > @@ -480,7 +482,7 @@ static void io_worker_handle_work(struct io_worker *worker) > } > if (work->mm != worker->mm) > io_wq_switch_mm(worker, work); > - if (worker->creds != work->creds) > + if (worker->cur_creds != work->creds) > io_wq_switch_creds(worker, work); > /* > * OK to set IO_WQ_WORK_CANCEL even for uncancellable work, > -- Pavel Begunkov [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 0:10 ` Pavel Begunkov @ 2020-01-29 0:15 ` Jens Axboe 2020-01-29 0:18 ` Jens Axboe 2020-01-29 0:20 ` Jens Axboe 1 sibling, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-29 0:15 UTC (permalink / raw) To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/28/20 5:10 PM, Pavel Begunkov wrote: > On 29/01/2020 02:51, Jens Axboe wrote: >> On 1/28/20 4:40 PM, Jens Axboe wrote: >>> On 1/28/20 4:36 PM, Pavel Begunkov wrote: >>>> On 28/01/2020 22:42, Jens Axboe wrote: >>>>> I didn't like it becoming a bit too complicated, both in terms of >>>>> implementation and use. And the fact that we'd have to jump through >>>>> hoops to make this work for a full chain. >>>>> >>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >>>>> This makes it way easier to use. Same branch: >>>>> >>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>>>> >>>>> I'd feel much better with this variant for 5.6. >>>>> >>>> >>>> Checked out ("don't use static creds/mm assignments") >>>> >>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async >>>> request, but if (worker->creds != work->creds) it will never be put. >>> >>> Yeah I think you're right, that needs a bit of fixing up. >> > > Hmm, it seems it leaks it unconditionally, as it grabs in a ref in > override_creds(). > >> I think this may have gotten fixed with the later addition posted today? >> I'll double check. But for the newer stuff, we put it for both cases >> when the request is freed. > > Yeah, maybe. I got tangled trying to verify both at once and decided to start > with the old one. > > >>>> 2. shouldn't worker->creds be named {old,saved,etc}_creds? It's set as >>>> >>>> worker->creds = override_creds(work->creds); >>>> >>>> Where override_creds() returns previous creds. And if so, then the following >>>> fast check looks strange: >>>> >>>> worker->creds != work->creds >>> >>> Don't care too much about the naming, but the logic does appear off. >>> I'll take a look at both of these tonight, unless you beat me to it. > > Apparently, you're faster :) > >> >> Testing this now, what a braino. >> >> diff --git a/fs/io-wq.c b/fs/io-wq.c >> index ee49e8852d39..8fbbadf04cc3 100644 >> --- a/fs/io-wq.c >> +++ b/fs/io-wq.c >> @@ -56,7 +56,8 @@ struct io_worker { >> >> struct rcu_head rcu; >> struct mm_struct *mm; >> - const struct cred *creds; >> + const struct cred *cur_creds; >> + const struct cred *saved_creds; >> struct files_struct *restore_files; >> }; >> >> @@ -135,9 +136,9 @@ static bool __io_worker_unuse(struct io_wqe *wqe, struct io_worker *worker) >> { >> bool dropped_lock = false; >> >> - if (worker->creds) { >> - revert_creds(worker->creds); >> - worker->creds = NULL; >> + if (worker->saved_creds) { >> + revert_creds(worker->saved_creds); >> + worker->cur_creds = worker->saved_creds = NULL; >> } >> >> if (current->files != worker->restore_files) { >> @@ -424,10 +425,11 @@ static void io_wq_switch_mm(struct io_worker *worker, struct io_wq_work *work) >> static void io_wq_switch_creds(struct io_worker *worker, >> struct io_wq_work *work) >> { >> - if (worker->creds) >> - revert_creds(worker->creds); >> + if (worker->saved_creds) >> + revert_creds(worker->saved_creds); >> >> - worker->creds = override_creds(work->creds); >> + worker->saved_creds = override_creds(work->creds); >> + worker->cur_creds = work->creds; >> } > > How about as follows? rever_creds() is a bit heavier than put_creds(). > > static void io_wq_switch_creds(struct io_worker *worker, > struct io_wq_work *work) > { > const struct cred *old_creds = override_creds(work->creds); > > if (worker->saved_creds) > put_cred(old_creds); > else > worker->saved_creds = old; > worker->cur_creds = work->creds; > } Looks good to me, I'll fold. -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 0:15 ` Jens Axboe @ 2020-01-29 0:18 ` Jens Axboe 0 siblings, 0 replies; 48+ messages in thread From: Jens Axboe @ 2020-01-29 0:18 UTC (permalink / raw) To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/28/20 5:15 PM, Jens Axboe wrote: > On 1/28/20 5:10 PM, Pavel Begunkov wrote: >> On 29/01/2020 02:51, Jens Axboe wrote: >>> On 1/28/20 4:40 PM, Jens Axboe wrote: >>>> On 1/28/20 4:36 PM, Pavel Begunkov wrote: >>>>> On 28/01/2020 22:42, Jens Axboe wrote: >>>>>> I didn't like it becoming a bit too complicated, both in terms of >>>>>> implementation and use. And the fact that we'd have to jump through >>>>>> hoops to make this work for a full chain. >>>>>> >>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >>>>>> This makes it way easier to use. Same branch: >>>>>> >>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>>>>> >>>>>> I'd feel much better with this variant for 5.6. >>>>>> >>>>> >>>>> Checked out ("don't use static creds/mm assignments") >>>>> >>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async >>>>> request, but if (worker->creds != work->creds) it will never be put. >>>> >>>> Yeah I think you're right, that needs a bit of fixing up. >>> >> >> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in >> override_creds(). >> >>> I think this may have gotten fixed with the later addition posted today? >>> I'll double check. But for the newer stuff, we put it for both cases >>> when the request is freed. >> >> Yeah, maybe. I got tangled trying to verify both at once and decided to start >> with the old one. >> >> >>>>> 2. shouldn't worker->creds be named {old,saved,etc}_creds? It's set as >>>>> >>>>> worker->creds = override_creds(work->creds); >>>>> >>>>> Where override_creds() returns previous creds. And if so, then the following >>>>> fast check looks strange: >>>>> >>>>> worker->creds != work->creds >>>> >>>> Don't care too much about the naming, but the logic does appear off. >>>> I'll take a look at both of these tonight, unless you beat me to it. >> >> Apparently, you're faster :) >> >>> >>> Testing this now, what a braino. >>> >>> diff --git a/fs/io-wq.c b/fs/io-wq.c >>> index ee49e8852d39..8fbbadf04cc3 100644 >>> --- a/fs/io-wq.c >>> +++ b/fs/io-wq.c >>> @@ -56,7 +56,8 @@ struct io_worker { >>> >>> struct rcu_head rcu; >>> struct mm_struct *mm; >>> - const struct cred *creds; >>> + const struct cred *cur_creds; >>> + const struct cred *saved_creds; >>> struct files_struct *restore_files; >>> }; >>> >>> @@ -135,9 +136,9 @@ static bool __io_worker_unuse(struct io_wqe *wqe, struct io_worker *worker) >>> { >>> bool dropped_lock = false; >>> >>> - if (worker->creds) { >>> - revert_creds(worker->creds); >>> - worker->creds = NULL; >>> + if (worker->saved_creds) { >>> + revert_creds(worker->saved_creds); >>> + worker->cur_creds = worker->saved_creds = NULL; >>> } >>> >>> if (current->files != worker->restore_files) { >>> @@ -424,10 +425,11 @@ static void io_wq_switch_mm(struct io_worker *worker, struct io_wq_work *work) >>> static void io_wq_switch_creds(struct io_worker *worker, >>> struct io_wq_work *work) >>> { >>> - if (worker->creds) >>> - revert_creds(worker->creds); >>> + if (worker->saved_creds) >>> + revert_creds(worker->saved_creds); >>> >>> - worker->creds = override_creds(work->creds); >>> + worker->saved_creds = override_creds(work->creds); >>> + worker->cur_creds = work->creds; >>> } >> >> How about as follows? rever_creds() is a bit heavier than put_creds(). >> >> static void io_wq_switch_creds(struct io_worker *worker, >> struct io_wq_work *work) >> { >> const struct cred *old_creds = override_creds(work->creds); >> >> if (worker->saved_creds) >> put_cred(old_creds); >> else >> worker->saved_creds = old; >> worker->cur_creds = work->creds; >> } > > Looks good to me, I'll fold. Actually, it doesn't clear current->cred then, which seems a bit.. unfortunate. Could be a source of issues. -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 0:10 ` Pavel Begunkov 2020-01-29 0:15 ` Jens Axboe @ 2020-01-29 0:20 ` Jens Axboe 2020-01-29 0:21 ` Pavel Begunkov 1 sibling, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-29 0:20 UTC (permalink / raw) To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/28/20 5:10 PM, Pavel Begunkov wrote: >>>> Checked out ("don't use static creds/mm assignments") >>>> >>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async >>>> request, but if (worker->creds != work->creds) it will never be put. >>> >>> Yeah I think you're right, that needs a bit of fixing up. >> > > Hmm, it seems it leaks it unconditionally, as it grabs in a ref in > override_creds(). > We grab one there, and an extra one. Then we drop one of them inline, and the other in __io_req_aux_free(). -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 0:20 ` Jens Axboe @ 2020-01-29 0:21 ` Pavel Begunkov 2020-01-29 0:24 ` Jens Axboe 0 siblings, 1 reply; 48+ messages in thread From: Pavel Begunkov @ 2020-01-29 0:21 UTC (permalink / raw) To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List [-- Attachment #1.1: Type: text/plain, Size: 687 bytes --] On 29/01/2020 03:20, Jens Axboe wrote: > On 1/28/20 5:10 PM, Pavel Begunkov wrote: >>>>> Checked out ("don't use static creds/mm assignments") >>>>> >>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async >>>>> request, but if (worker->creds != work->creds) it will never be put. >>>> >>>> Yeah I think you're right, that needs a bit of fixing up. >>> >> >> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in >> override_creds(). >> > > We grab one there, and an extra one. Then we drop one of them inline, > and the other in __io_req_aux_free(). > Yeah, with the last patch it should make it even -- Pavel Begunkov [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 0:21 ` Pavel Begunkov @ 2020-01-29 0:24 ` Jens Axboe 2020-01-29 0:54 ` Jens Axboe 0 siblings, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-29 0:24 UTC (permalink / raw) To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/28/20 5:21 PM, Pavel Begunkov wrote: > On 29/01/2020 03:20, Jens Axboe wrote: >> On 1/28/20 5:10 PM, Pavel Begunkov wrote: >>>>>> Checked out ("don't use static creds/mm assignments") >>>>>> >>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async >>>>>> request, but if (worker->creds != work->creds) it will never be put. >>>>> >>>>> Yeah I think you're right, that needs a bit of fixing up. >>>> >>> >>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in >>> override_creds(). >>> >> >> We grab one there, and an extra one. Then we drop one of them inline, >> and the other in __io_req_aux_free(). >> > Yeah, with the last patch it should make it even OK good we agree on that. I should probably pull back that bit to the original patch to avoid having a hole in there... -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 0:24 ` Jens Axboe @ 2020-01-29 0:54 ` Jens Axboe 2020-01-29 10:17 ` Pavel Begunkov 0 siblings, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-29 0:54 UTC (permalink / raw) To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/28/20 5:24 PM, Jens Axboe wrote: > On 1/28/20 5:21 PM, Pavel Begunkov wrote: >> On 29/01/2020 03:20, Jens Axboe wrote: >>> On 1/28/20 5:10 PM, Pavel Begunkov wrote: >>>>>>> Checked out ("don't use static creds/mm assignments") >>>>>>> >>>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async >>>>>>> request, but if (worker->creds != work->creds) it will never be put. >>>>>> >>>>>> Yeah I think you're right, that needs a bit of fixing up. >>>>> >>>> >>>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in >>>> override_creds(). >>>> >>> >>> We grab one there, and an extra one. Then we drop one of them inline, >>> and the other in __io_req_aux_free(). >>> >> Yeah, with the last patch it should make it even > > OK good we agree on that. I should probably pull back that bit to the > original patch to avoid having a hole in there... Done -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 0:54 ` Jens Axboe @ 2020-01-29 10:17 ` Pavel Begunkov 2020-01-29 13:11 ` Stefan Metzmacher 0 siblings, 1 reply; 48+ messages in thread From: Pavel Begunkov @ 2020-01-29 10:17 UTC (permalink / raw) To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List [-- Attachment #1.1: Type: text/plain, Size: 1416 bytes --] On 29/01/2020 03:54, Jens Axboe wrote: > On 1/28/20 5:24 PM, Jens Axboe wrote: >> On 1/28/20 5:21 PM, Pavel Begunkov wrote: >>> On 29/01/2020 03:20, Jens Axboe wrote: >>>> On 1/28/20 5:10 PM, Pavel Begunkov wrote: >>>>>>>> Checked out ("don't use static creds/mm assignments") >>>>>>>> >>>>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async >>>>>>>> request, but if (worker->creds != work->creds) it will never be put. >>>>>>> >>>>>>> Yeah I think you're right, that needs a bit of fixing up. >>>>>> >>>>> >>>>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in >>>>> override_creds(). >>>>> >>>> >>>> We grab one there, and an extra one. Then we drop one of them inline, >>>> and the other in __io_req_aux_free(). >>>> >>> Yeah, with the last patch it should make it even >> >> OK good we agree on that. I should probably pull back that bit to the >> original patch to avoid having a hole in there... > > Done > ("io_uring/io-wq: don't use static creds/mm assignments") and ("io_uring: support using a registered personality for commands") looks good now. Reviewed-by: Pavel Begunkov <[email protected]> BTW, why do use decided to use idr, but not linear buffer scheme as for the rest registered ones? We can rework it later for performance with linear search in the buffer upon registration. -- Pavel Begunkov [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 10:17 ` Pavel Begunkov @ 2020-01-29 13:11 ` Stefan Metzmacher 2020-01-29 13:41 ` Pavel Begunkov 0 siblings, 1 reply; 48+ messages in thread From: Stefan Metzmacher @ 2020-01-29 13:11 UTC (permalink / raw) To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Linux API Mailing List [-- Attachment #1.1: Type: text/plain, Size: 4113 bytes --] Am 29.01.20 um 11:17 schrieb Pavel Begunkov: > On 29/01/2020 03:54, Jens Axboe wrote: >> On 1/28/20 5:24 PM, Jens Axboe wrote: >>> On 1/28/20 5:21 PM, Pavel Begunkov wrote: >>>> On 29/01/2020 03:20, Jens Axboe wrote: >>>>> On 1/28/20 5:10 PM, Pavel Begunkov wrote: >>>>>>>>> Checked out ("don't use static creds/mm assignments") >>>>>>>>> >>>>>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async >>>>>>>>> request, but if (worker->creds != work->creds) it will never be put. >>>>>>>> >>>>>>>> Yeah I think you're right, that needs a bit of fixing up. >>>>>>> >>>>>> >>>>>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in >>>>>> override_creds(). >>>>>> >>>>> >>>>> We grab one there, and an extra one. Then we drop one of them inline, >>>>> and the other in __io_req_aux_free(). >>>>> >>>> Yeah, with the last patch it should make it even >>> >>> OK good we agree on that. I should probably pull back that bit to the >>> original patch to avoid having a hole in there... >> >> Done >> > > ("io_uring/io-wq: don't use static creds/mm assignments") and ("io_uring: > support using a registered personality for commands") looks good now. > > Reviewed-by: Pavel Begunkov <[email protected]> I'm very happy with the design, thanks! That exactly what I had in mind:-) It would also work with IORING_SETUP_SQPOLL, correct? However I think there're a few things to improve/simplify. > https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=a26d26412e1e1783473f9dc8f030c3af3d54b1a6 In fs/io_uring.c mmgrab() and get_current_cred() are used together in two places, why is put_cred() called in __io_req_aux_free while mmdrop() is called from io_put_work(). I think both should be called in io_put_work(), that makes the code much easier to understand. My guess is that you choose __io_req_aux_free() for put_cred() because of the following patches, but I'll explain on the other commit why it's not needed. > https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=d9db233adf034bd7855ba06190525e10a05868be A minor one would be starting with 1 instead of 0 and using idr_alloc_cyclic() in order to avoid immediate reuse of ids. That way we could include the id in the tracing message and 0 would mean the current creds were used. > +static int io_remove_personalities(int id, void *p, void *data) > +{ > + struct io_ring_ctx *ctx = data; > + > + idr_remove(&ctx->personality_idr, id); Here we need something like: put_creds((const struct cred *)p); > + return 0; > +} The io_uring_register() calles would look like this, correct? id = io_uring_register(ring_fd, IORING_REGISTER_PERSONALITY, NULL, 0); io_uring_register(ring_fd, IORING_UNREGISTER_PERSONALITY, NULL, id); > https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=eec9e69e0ad9ad364e1b6a5dfc52ad576afee235 > + > + if (sqe_flags & IOSQE_PERSONALITY) { > + int id = READ_ONCE(sqe->personality); > + > + req->work.creds = idr_find(&ctx->personality_idr, id); > + if (unlikely(!req->work.creds)) { > + ret = -EINVAL; > + goto err_req; > + } > + get_cred(req->work.creds);> + old_creds = override_creds(req->work.creds); > + } > + Here we could use a helper variable const struct cred *personality_creds; and leave req->work.creds as NULL. It means we can avoid the explicit get_cred() call and can skip the following hunk too: > @@ -3977,7 +3977,8 @@ static int io_req_defer_prep(struct io_kiocb *req, > mmgrab(current->mm); > req->work.mm = current->mm; > } > - req->work.creds = get_current_cred(); > + if (!req->work.creds) > + req->work.creds = get_current_cred(); > > switch (req->opcode) { > case IORING_OP_NOP: The override_creds(personality_creds) has changed current->cred and get_current_cred() will just pick it up as in the default case. This would make the patch much simpler and allows put_cred() to be in io_put_work() instead of __io_req_aux_free() as explained above. metze [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 13:11 ` Stefan Metzmacher @ 2020-01-29 13:41 ` Pavel Begunkov 2020-01-29 13:56 ` Stefan Metzmacher 0 siblings, 1 reply; 48+ messages in thread From: Pavel Begunkov @ 2020-01-29 13:41 UTC (permalink / raw) To: Stefan Metzmacher, Jens Axboe; +Cc: io-uring, Linux API Mailing List On 1/29/2020 4:11 PM, Stefan Metzmacher wrote: > Am 29.01.20 um 11:17 schrieb Pavel Begunkov: >> On 29/01/2020 03:54, Jens Axboe wrote: >>> On 1/28/20 5:24 PM, Jens Axboe wrote: >>>> On 1/28/20 5:21 PM, Pavel Begunkov wrote: >>>>> On 29/01/2020 03:20, Jens Axboe wrote: >>>>>> On 1/28/20 5:10 PM, Pavel Begunkov wrote: >>>>>>>>>> Checked out ("don't use static creds/mm assignments") >>>>>>>>>> >>>>>>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async >>>>>>>>>> request, but if (worker->creds != work->creds) it will never be put. >>>>>>>>> >>>>>>>>> Yeah I think you're right, that needs a bit of fixing up. >>>>>>>> >>>>>>> >>>>>>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in >>>>>>> override_creds(). >>>>>>> >>>>>> >>>>>> We grab one there, and an extra one. Then we drop one of them inline, >>>>>> and the other in __io_req_aux_free(). >>>>>> >>>>> Yeah, with the last patch it should make it even >>>> >>>> OK good we agree on that. I should probably pull back that bit to the >>>> original patch to avoid having a hole in there... >>> >>> Done >>> >> >> ("io_uring/io-wq: don't use static creds/mm assignments") and ("io_uring: >> support using a registered personality for commands") looks good now. >> >> Reviewed-by: Pavel Begunkov <[email protected]> > > > I'm very happy with the design, thanks! > That exactly what I had in mind:-) > > It would also work with IORING_SETUP_SQPOLL, correct? > Yep > However I think there're a few things to improve/simplify. > Since 5.6 is already semi-open, it'd be great to have an incremental patch for that. I'll retoss things as usual, if nobody do it before. >> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=a26d26412e1e1783473f9dc8f030c3af3d54b1a6 > > In fs/io_uring.c mmgrab() and get_current_cred() are used together in > two places, why is put_cred() called in __io_req_aux_free while > mmdrop() is called from io_put_work(). I think both should be called > in io_put_work(), that makes the code much easier to understand. > > My guess is that you choose __io_req_aux_free() for put_cred() because > of the following patches, but I'll explain on the other commit > why it's not needed. > >> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=d9db233adf034bd7855ba06190525e10a05868be > > A minor one would be starting with 1 instead of 0 and using > idr_alloc_cyclic() in order to avoid immediate reuse of ids. > That way we could include the id in the tracing message and > 0 would mean the current creds were used. > >> +static int io_remove_personalities(int id, void *p, void *data) >> +{ >> + struct io_ring_ctx *ctx = data; >> + >> + idr_remove(&ctx->personality_idr, id); > > Here we need something like: > put_creds((const struct cred *)p); Good catch > >> + return 0; >> +} > > > The io_uring_register() calles would look like this, correct? > > id = io_uring_register(ring_fd, IORING_REGISTER_PERSONALITY, NULL, 0); > io_uring_register(ring_fd, IORING_UNREGISTER_PERSONALITY, NULL, id); > >> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=eec9e69e0ad9ad364e1b6a5dfc52ad576afee235 >> + >> + if (sqe_flags & IOSQE_PERSONALITY) { >> + int id = READ_ONCE(sqe->personality); >> + >> + req->work.creds = idr_find(&ctx->personality_idr, id); >> + if (unlikely(!req->work.creds)) { >> + ret = -EINVAL; >> + goto err_req; >> + } >> + get_cred(req->work.creds);> + old_creds = override_creds(req->work.creds); >> + } >> + > > Here we could use a helper variable > const struct cred *personality_creds; > and leave req->work.creds as NULL. > It means we can avoid the explicit get_cred() call > and can skip the following hunk too: > >> @@ -3977,7 +3977,8 @@ static int io_req_defer_prep(struct io_kiocb *req, >> mmgrab(current->mm); >> req->work.mm = current->mm; >> } >> - req->work.creds = get_current_cred(); >> + if (!req->work.creds) >> + req->work.creds = get_current_cred(); >> >> switch (req->opcode) { >> case IORING_OP_NOP: > > The override_creds(personality_creds) has changed current->cred > and get_current_cred() will just pick it up as in the default case. > > This would make the patch much simpler and allows put_cred() to be > in io_put_work() instead of __io_req_aux_free() as explained above. > It's one extra get_current_cred(). I'd prefer to find another way to clean this up. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 13:41 ` Pavel Begunkov @ 2020-01-29 13:56 ` Stefan Metzmacher 2020-01-29 14:23 ` Pavel Begunkov 0 siblings, 1 reply; 48+ messages in thread From: Stefan Metzmacher @ 2020-01-29 13:56 UTC (permalink / raw) To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Linux API Mailing List [-- Attachment #1.1: Type: text/plain, Size: 5356 bytes --] Am 29.01.20 um 14:41 schrieb Pavel Begunkov: > On 1/29/2020 4:11 PM, Stefan Metzmacher wrote: >> Am 29.01.20 um 11:17 schrieb Pavel Begunkov: >>> On 29/01/2020 03:54, Jens Axboe wrote: >>>> On 1/28/20 5:24 PM, Jens Axboe wrote: >>>>> On 1/28/20 5:21 PM, Pavel Begunkov wrote: >>>>>> On 29/01/2020 03:20, Jens Axboe wrote: >>>>>>> On 1/28/20 5:10 PM, Pavel Begunkov wrote: >>>>>>>>>>> Checked out ("don't use static creds/mm assignments") >>>>>>>>>>> >>>>>>>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async >>>>>>>>>>> request, but if (worker->creds != work->creds) it will never be put. >>>>>>>>>> >>>>>>>>>> Yeah I think you're right, that needs a bit of fixing up. >>>>>>>>> >>>>>>>> >>>>>>>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in >>>>>>>> override_creds(). >>>>>>>> >>>>>>> >>>>>>> We grab one there, and an extra one. Then we drop one of them inline, >>>>>>> and the other in __io_req_aux_free(). >>>>>>> >>>>>> Yeah, with the last patch it should make it even >>>>> >>>>> OK good we agree on that. I should probably pull back that bit to the >>>>> original patch to avoid having a hole in there... >>>> >>>> Done >>>> >>> >>> ("io_uring/io-wq: don't use static creds/mm assignments") and ("io_uring: >>> support using a registered personality for commands") looks good now. >>> >>> Reviewed-by: Pavel Begunkov <[email protected]> >> >> >> I'm very happy with the design, thanks! >> That exactly what I had in mind:-) >> >> It would also work with IORING_SETUP_SQPOLL, correct? >> > > Yep > >> However I think there're a few things to improve/simplify. >> > > Since 5.6 is already semi-open, it'd be great to have an incremental > patch for that. I'll retoss things as usual, if nobody do it before. I'll wait for comments from Jens first:-) I guess we'll have things changed in his branch, when I wake up tomorrow. Otherwise I can also create patches and submit them. But I currently don't have an environment where I can do runtime tests with it. >>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=a26d26412e1e1783473f9dc8f030c3af3d54b1a6 >> >> In fs/io_uring.c mmgrab() and get_current_cred() are used together in >> two places, why is put_cred() called in __io_req_aux_free while >> mmdrop() is called from io_put_work(). I think both should be called >> in io_put_work(), that makes the code much easier to understand. >> >> My guess is that you choose __io_req_aux_free() for put_cred() because >> of the following patches, but I'll explain on the other commit >> why it's not needed. >> >>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=d9db233adf034bd7855ba06190525e10a05868be >> >> A minor one would be starting with 1 instead of 0 and using >> idr_alloc_cyclic() in order to avoid immediate reuse of ids. >> That way we could include the id in the tracing message and >> 0 would mean the current creds were used. >> >>> +static int io_remove_personalities(int id, void *p, void *data) >>> +{ >>> + struct io_ring_ctx *ctx = data; >>> + >>> + idr_remove(&ctx->personality_idr, id); >> >> Here we need something like: >> put_creds((const struct cred *)p); > > Good catch > >> >>> + return 0; >>> +} >> >> >> The io_uring_register() calles would look like this, correct? >> >> id = io_uring_register(ring_fd, IORING_REGISTER_PERSONALITY, NULL, 0); >> io_uring_register(ring_fd, IORING_UNREGISTER_PERSONALITY, NULL, id); >> >>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=eec9e69e0ad9ad364e1b6a5dfc52ad576afee235 >>> + >>> + if (sqe_flags & IOSQE_PERSONALITY) { >>> + int id = READ_ONCE(sqe->personality); >>> + >>> + req->work.creds = idr_find(&ctx->personality_idr, id); >>> + if (unlikely(!req->work.creds)) { >>> + ret = -EINVAL; >>> + goto err_req; >>> + } >>> + get_cred(req->work.creds);> + old_creds = override_creds(req->work.creds); >>> + } >>> + >> >> Here we could use a helper variable >> const struct cred *personality_creds; >> and leave req->work.creds as NULL. >> It means we can avoid the explicit get_cred() call >> and can skip the following hunk too: >> >>> @@ -3977,7 +3977,8 @@ static int io_req_defer_prep(struct io_kiocb *req, >>> mmgrab(current->mm); >>> req->work.mm = current->mm; >>> } >>> - req->work.creds = get_current_cred(); >>> + if (!req->work.creds) >>> + req->work.creds = get_current_cred(); >>> >>> switch (req->opcode) { >>> case IORING_OP_NOP: >> >> The override_creds(personality_creds) has changed current->cred >> and get_current_cred() will just pick it up as in the default case. >> >> This would make the patch much simpler and allows put_cred() to be >> in io_put_work() instead of __io_req_aux_free() as explained above. >> > > It's one extra get_current_cred(). I'd prefer to find another way to > clean this up. As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case and the if (!req->work.creds) for both cases. What do you mean exactly with one extra get_current_cred()? Is that any worse than calling get_cred() and having an if check? It also seems to avoid req->work.creds from being filled at all for the non-blocking case. metze [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 13:56 ` Stefan Metzmacher @ 2020-01-29 14:23 ` Pavel Begunkov 2020-01-29 14:27 ` Stefan Metzmacher 2020-01-29 17:34 ` Jens Axboe 0 siblings, 2 replies; 48+ messages in thread From: Pavel Begunkov @ 2020-01-29 14:23 UTC (permalink / raw) To: Stefan Metzmacher, Jens Axboe; +Cc: io-uring, Linux API Mailing List On 1/29/2020 4:56 PM, Stefan Metzmacher wrote: >>> However I think there're a few things to improve/simplify. >> Since 5.6 is already semi-open, it'd be great to have an incremental >> patch for that. I'll retoss things as usual, if nobody do it before. > > I'll wait for comments from Jens first:-) > I guess we'll have things changed in his branch, when I wake up > tomorrow. Otherwise I can also create patches and submit them. Sure, I won't get there any time soon. > > But I currently don't have an environment where I can do runtime tests > with it. > >>>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=a26d26412e1e1783473f9dc8f030c3af3d54b1a6 >>> >>> In fs/io_uring.c mmgrab() and get_current_cred() are used together in >>> two places, why is put_cred() called in __io_req_aux_free while >>> mmdrop() is called from io_put_work(). I think both should be called >>> in io_put_work(), that makes the code much easier to understand. >>> >>> My guess is that you choose __io_req_aux_free() for put_cred() because >>> of the following patches, but I'll explain on the other commit >>> why it's not needed. >>> >>>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=d9db233adf034bd7855ba06190525e10a05868be >>> >>> A minor one would be starting with 1 instead of 0 and using >>> idr_alloc_cyclic() in order to avoid immediate reuse of ids. >>> That way we could include the id in the tracing message and >>> 0 would mean the current creds were used. >>> >>>> +static int io_remove_personalities(int id, void *p, void *data) >>>> +{ >>>> + struct io_ring_ctx *ctx = data; >>>> + >>>> + idr_remove(&ctx->personality_idr, id); >>> >>> Here we need something like: >>> put_creds((const struct cred *)p); >> >> Good catch >> >>> >>>> + return 0; >>>> +} >>> >>> >>> The io_uring_register() calles would look like this, correct? >>> >>> id = io_uring_register(ring_fd, IORING_REGISTER_PERSONALITY, NULL, 0); >>> io_uring_register(ring_fd, IORING_UNREGISTER_PERSONALITY, NULL, id); >>> >>>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=eec9e69e0ad9ad364e1b6a5dfc52ad576afee235 >>>> + >>>> + if (sqe_flags & IOSQE_PERSONALITY) { >>>> + int id = READ_ONCE(sqe->personality); >>>> + >>>> + req->work.creds = idr_find(&ctx->personality_idr, id); >>>> + if (unlikely(!req->work.creds)) { >>>> + ret = -EINVAL; >>>> + goto err_req; >>>> + } >>>> + get_cred(req->work.creds);> + old_creds = override_creds(req->work.creds); >>>> + } >>>> + >>> >>> Here we could use a helper variable >>> const struct cred *personality_creds; >>> and leave req->work.creds as NULL. >>> It means we can avoid the explicit get_cred() call >>> and can skip the following hunk too: >>> >>>> @@ -3977,7 +3977,8 @@ static int io_req_defer_prep(struct io_kiocb *req, >>>> mmgrab(current->mm); >>>> req->work.mm = current->mm; >>>> } >>>> - req->work.creds = get_current_cred(); >>>> + if (!req->work.creds) >>>> + req->work.creds = get_current_cred(); >>>> >>>> switch (req->opcode) { >>>> case IORING_OP_NOP: >>> >>> The override_creds(personality_creds) has changed current->cred >>> and get_current_cred() will just pick it up as in the default case. >>> >>> This would make the patch much simpler and allows put_cred() to be >>> in io_put_work() instead of __io_req_aux_free() as explained above. >>> >> >> It's one extra get_current_cred(). I'd prefer to find another way to >> clean this up. > > As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case > and the if (!req->work.creds) for both cases. Great, that you turned attention to that! override_creds() is already grabbing a ref, so it shouldn't call get_cred() there. So, that's a bug. It could be I'm wrong with the statement above, need to recheck all this code to be sure. BTW, io_req_defer_prep() may be called twice for a req, so you will reassign it without putting a ref. It's safer to leave NULL checks. At least, until I've done reworking and fixing preparation paths. > > What do you mean exactly with one extra get_current_cred()? > Is that any worse than calling get_cred() and having an if check? > > It also seems to avoid req->work.creds from being filled at all > for the non-blocking case. > > metze > -- Pavel Begunkov ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 14:23 ` Pavel Begunkov @ 2020-01-29 14:27 ` Stefan Metzmacher 2020-01-29 14:34 ` Pavel Begunkov 2020-01-29 17:34 ` Jens Axboe 1 sibling, 1 reply; 48+ messages in thread From: Stefan Metzmacher @ 2020-01-29 14:27 UTC (permalink / raw) To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Linux API Mailing List [-- Attachment #1.1: Type: text/plain, Size: 1590 bytes --] Hi Pavel, >>>>> @@ -3977,7 +3977,8 @@ static int io_req_defer_prep(struct io_kiocb *req, >>>>> mmgrab(current->mm); >>>>> req->work.mm = current->mm; >>>>> } >>>>> - req->work.creds = get_current_cred(); >>>>> + if (!req->work.creds) >>>>> + req->work.creds = get_current_cred(); >>>>> >>>>> switch (req->opcode) { >>>>> case IORING_OP_NOP: >>>> >>>> The override_creds(personality_creds) has changed current->cred >>>> and get_current_cred() will just pick it up as in the default case. >>>> >>>> This would make the patch much simpler and allows put_cred() to be >>>> in io_put_work() instead of __io_req_aux_free() as explained above. >>>> >>> >>> It's one extra get_current_cred(). I'd prefer to find another way to >>> clean this up. >> >> As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case >> and the if (!req->work.creds) for both cases. > > Great, that you turned attention to that! override_creds() is already > grabbing a ref, so it shouldn't call get_cred() there. > So, that's a bug. > > It could be I'm wrong with the statement above, need to recheck all this > code to be sure. > > BTW, io_req_defer_prep() may be called twice for a req, so you will > reassign it without putting a ref. It's safer to leave NULL checks. At > least, until I've done reworking and fixing preparation paths. Ok, but that would be already a bug in "io_uring/io-wq: don't use static creds/mm assignments" instead of logically being part of "io_uring: support using a registered personality for commands" metze [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 14:27 ` Stefan Metzmacher @ 2020-01-29 14:34 ` Pavel Begunkov 0 siblings, 0 replies; 48+ messages in thread From: Pavel Begunkov @ 2020-01-29 14:34 UTC (permalink / raw) To: Stefan Metzmacher, Jens Axboe; +Cc: io-uring, Linux API Mailing List On 1/29/2020 5:27 PM, Stefan Metzmacher wrote: >> Great, that you turned attention to that! override_creds() is already >> grabbing a ref, so it shouldn't call get_cred() there. >> So, that's a bug. >> >> It could be I'm wrong with the statement above, need to recheck all this >> code to be sure. >> >> BTW, io_req_defer_prep() may be called twice for a req, so you will >> reassign it without putting a ref. It's safer to leave NULL checks. At >> least, until I've done reworking and fixing preparation paths. > > Ok, but that would be already a bug in > "io_uring/io-wq: don't use static creds/mm assignments" > instead of logically being part of > "io_uring: support using a registered personality for commands" Right. Probably should be backported there, since it's just 2 commits before. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 14:23 ` Pavel Begunkov 2020-01-29 14:27 ` Stefan Metzmacher @ 2020-01-29 17:34 ` Jens Axboe 2020-01-29 17:42 ` Jens Axboe 2020-01-29 17:46 ` Pavel Begunkov 1 sibling, 2 replies; 48+ messages in thread From: Jens Axboe @ 2020-01-29 17:34 UTC (permalink / raw) To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/29/20 7:23 AM, Pavel Begunkov wrote: >>>> The override_creds(personality_creds) has changed current->cred >>>> and get_current_cred() will just pick it up as in the default case. >>>> >>>> This would make the patch much simpler and allows put_cred() to be >>>> in io_put_work() instead of __io_req_aux_free() as explained above. >>>> >>> >>> It's one extra get_current_cred(). I'd prefer to find another way to >>> clean this up. >> >> As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case >> and the if (!req->work.creds) for both cases. > > Great, that you turned attention to that! override_creds() is already > grabbing a ref, so it shouldn't call get_cred() there. > So, that's a bug. It's not though - one is dropped in that function, the other when the request is freed. So we do need two references to it. With the proposed change to keep the override_creds() variable local for that spot we don't, and the get_cred() can then go. > It could be I'm wrong with the statement above, need to recheck all this > code to be sure. I think you are :-) > BTW, io_req_defer_prep() may be called twice for a req, so you will > reassign it without putting a ref. It's safer to leave NULL checks. At > least, until I've done reworking and fixing preparation paths. Agree, the NULL checks are safer and we should keep them. Going through the rest of this thread, I'm making the following changes: - ID must be > 0. I like that change, as we don't need an sqe flag to select personality then, and it also makes it obvious that id == 0 is just using current creds. - Fixed the missing put_cred() in the teardown - Use a local variable in io_submit_sqe() instead of assigning the creds to req->work.creds there - Use cyclic idr allocation I'm going to fold in as appropriate. If there are fixes needed on top of that, let's do them separately. -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 17:34 ` Jens Axboe @ 2020-01-29 17:42 ` Jens Axboe 2020-01-29 20:09 ` Stefan Metzmacher 2020-01-29 17:46 ` Pavel Begunkov 1 sibling, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-29 17:42 UTC (permalink / raw) To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/29/20 10:34 AM, Jens Axboe wrote: > On 1/29/20 7:23 AM, Pavel Begunkov wrote: >>>>> The override_creds(personality_creds) has changed current->cred >>>>> and get_current_cred() will just pick it up as in the default case. >>>>> >>>>> This would make the patch much simpler and allows put_cred() to be >>>>> in io_put_work() instead of __io_req_aux_free() as explained above. >>>>> >>>> >>>> It's one extra get_current_cred(). I'd prefer to find another way to >>>> clean this up. >>> >>> As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case >>> and the if (!req->work.creds) for both cases. >> >> Great, that you turned attention to that! override_creds() is already >> grabbing a ref, so it shouldn't call get_cred() there. >> So, that's a bug. > > It's not though - one is dropped in that function, the other when the > request is freed. So we do need two references to it. With the proposed > change to keep the override_creds() variable local for that spot we > don't, and the get_cred() can then go. > >> It could be I'm wrong with the statement above, need to recheck all this >> code to be sure. > > I think you are :-) > >> BTW, io_req_defer_prep() may be called twice for a req, so you will >> reassign it without putting a ref. It's safer to leave NULL checks. At >> least, until I've done reworking and fixing preparation paths. > > Agree, the NULL checks are safer and we should keep them. > > Going through the rest of this thread, I'm making the following changes: > > - ID must be > 0. I like that change, as we don't need an sqe flag to > select personality then, and it also makes it obvious that id == 0 is > just using current creds. > > - Fixed the missing put_cred() in the teardown > > - Use a local variable in io_submit_sqe() instead of assigning the > creds to req->work.creds there > > - Use cyclic idr allocation > > I'm going to fold in as appropriate. If there are fixes needed on top of > that, let's do them separately. In particular, would love a patch that only assigns req->work.creds if we do go async, so we can leave the put_cred() in io_put_work() instead of needing it in __io_req_aux_free(). -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 17:42 ` Jens Axboe @ 2020-01-29 20:09 ` Stefan Metzmacher 2020-01-29 20:48 ` Jens Axboe 0 siblings, 1 reply; 48+ messages in thread From: Stefan Metzmacher @ 2020-01-29 20:09 UTC (permalink / raw) To: Jens Axboe, Pavel Begunkov; +Cc: io-uring, Linux API Mailing List Am 29.01.20 um 18:42 schrieb Jens Axboe: > On 1/29/20 10:34 AM, Jens Axboe wrote: >> On 1/29/20 7:23 AM, Pavel Begunkov wrote: >>>>>> The override_creds(personality_creds) has changed current->cred >>>>>> and get_current_cred() will just pick it up as in the default case. >>>>>> >>>>>> This would make the patch much simpler and allows put_cred() to be >>>>>> in io_put_work() instead of __io_req_aux_free() as explained above. >>>>>> >>>>> >>>>> It's one extra get_current_cred(). I'd prefer to find another way to >>>>> clean this up. >>>> >>>> As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case >>>> and the if (!req->work.creds) for both cases. >>> >>> Great, that you turned attention to that! override_creds() is already >>> grabbing a ref, so it shouldn't call get_cred() there. >>> So, that's a bug. >> >> It's not though - one is dropped in that function, the other when the >> request is freed. So we do need two references to it. With the proposed >> change to keep the override_creds() variable local for that spot we >> don't, and the get_cred() can then go. >> >>> It could be I'm wrong with the statement above, need to recheck all this >>> code to be sure. >> >> I think you are :-) >> >>> BTW, io_req_defer_prep() may be called twice for a req, so you will >>> reassign it without putting a ref. It's safer to leave NULL checks. At >>> least, until I've done reworking and fixing preparation paths. >> >> Agree, the NULL checks are safer and we should keep them. >> >> Going through the rest of this thread, I'm making the following changes: >> >> - ID must be > 0. I like that change, as we don't need an sqe flag to >> select personality then, and it also makes it obvious that id == 0 is >> just using current creds. >> >> - Fixed the missing put_cred() in the teardown >> >> - Use a local variable in io_submit_sqe() instead of assigning the >> creds to req->work.creds there >> >> - Use cyclic idr allocation >> >> I'm going to fold in as appropriate. If there are fixes needed on top of >> that, let's do them separately. > > In particular, would love a patch that only assigns req->work.creds if > we do go async, so we can leave the put_cred() in io_put_work() > instead of needing it in __io_req_aux_free(). I made some improvements here: https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/for-5.6/io_uring-vfs Feel free to squash https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=ce8812c9b935467bb08ed4d528dd92b9f67e221c into https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=22021e95e73d4658a6c834a3276886e127ab8425 and add my review to it. If you're confident that it's safe you can call io_req_work_drop_env() also in io_put_work(), metze ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 20:09 ` Stefan Metzmacher @ 2020-01-29 20:48 ` Jens Axboe 0 siblings, 0 replies; 48+ messages in thread From: Jens Axboe @ 2020-01-29 20:48 UTC (permalink / raw) To: Stefan Metzmacher, Pavel Begunkov; +Cc: io-uring, Linux API Mailing List On 1/29/20 1:09 PM, Stefan Metzmacher wrote: > Am 29.01.20 um 18:42 schrieb Jens Axboe: >> On 1/29/20 10:34 AM, Jens Axboe wrote: >>> On 1/29/20 7:23 AM, Pavel Begunkov wrote: >>>>>>> The override_creds(personality_creds) has changed current->cred >>>>>>> and get_current_cred() will just pick it up as in the default case. >>>>>>> >>>>>>> This would make the patch much simpler and allows put_cred() to be >>>>>>> in io_put_work() instead of __io_req_aux_free() as explained above. >>>>>>> >>>>>> >>>>>> It's one extra get_current_cred(). I'd prefer to find another way to >>>>>> clean this up. >>>>> >>>>> As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case >>>>> and the if (!req->work.creds) for both cases. >>>> >>>> Great, that you turned attention to that! override_creds() is already >>>> grabbing a ref, so it shouldn't call get_cred() there. >>>> So, that's a bug. >>> >>> It's not though - one is dropped in that function, the other when the >>> request is freed. So we do need two references to it. With the proposed >>> change to keep the override_creds() variable local for that spot we >>> don't, and the get_cred() can then go. >>> >>>> It could be I'm wrong with the statement above, need to recheck all this >>>> code to be sure. >>> >>> I think you are :-) >>> >>>> BTW, io_req_defer_prep() may be called twice for a req, so you will >>>> reassign it without putting a ref. It's safer to leave NULL checks. At >>>> least, until I've done reworking and fixing preparation paths. >>> >>> Agree, the NULL checks are safer and we should keep them. >>> >>> Going through the rest of this thread, I'm making the following changes: >>> >>> - ID must be > 0. I like that change, as we don't need an sqe flag to >>> select personality then, and it also makes it obvious that id == 0 is >>> just using current creds. >>> >>> - Fixed the missing put_cred() in the teardown >>> >>> - Use a local variable in io_submit_sqe() instead of assigning the >>> creds to req->work.creds there >>> >>> - Use cyclic idr allocation >>> >>> I'm going to fold in as appropriate. If there are fixes needed on top of >>> that, let's do them separately. >> >> In particular, would love a patch that only assigns req->work.creds if >> we do go async, so we can leave the put_cred() in io_put_work() >> instead of needing it in __io_req_aux_free(). > > I made some improvements here: > > https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/for-5.6/io_uring-vfs > > Feel free to squash > https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=ce8812c9b935467bb08ed4d528dd92b9f67e221c > into > https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=22021e95e73d4658a6c834a3276886e127ab8425 > and add my review to it. Looks good to me, folded. Thanks! > If you're confident that it's safe you can > call io_req_work_drop_env() also in io_put_work(), Let's look into that later, want to flush this out sooner rather than later... -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 17:34 ` Jens Axboe 2020-01-29 17:42 ` Jens Axboe @ 2020-01-29 17:46 ` Pavel Begunkov 1 sibling, 0 replies; 48+ messages in thread From: Pavel Begunkov @ 2020-01-29 17:46 UTC (permalink / raw) To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List On 1/29/2020 8:34 PM, Jens Axboe wrote: > On 1/29/20 7:23 AM, Pavel Begunkov wrote: >>>>> The override_creds(personality_creds) has changed current->cred >>>>> and get_current_cred() will just pick it up as in the default case. >>>>> >>>>> This would make the patch much simpler and allows put_cred() to be >>>>> in io_put_work() instead of __io_req_aux_free() as explained above. >>>>> >>>> >>>> It's one extra get_current_cred(). I'd prefer to find another way to >>>> clean this up. >>> >>> As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case >>> and the if (!req->work.creds) for both cases. >> >> Great, that you turned attention to that! override_creds() is already >> grabbing a ref, so it shouldn't call get_cred() there. >> So, that's a bug. > > It's not though - one is dropped in that function, the other when the > request is freed. So we do need two references to it. With the proposed > change to keep the override_creds() variable local for that spot we > don't, and the get_cred() can then go. You're right here. It seems, it was too much looking at the same code :) > >> It could be I'm wrong with the statement above, need to recheck all this >> code to be sure. > > I think you are :-) Considering above, there shouldn't be much difference indeed. One extra rcu_dereference() in get_current_creds() instead of get_cred(), but that's nothing. Later we can hide one get by using submission state from the long patchset, I sent a while ago. > >> BTW, io_req_defer_prep() may be called twice for a req, so you will >> reassign it without putting a ref. It's safer to leave NULL checks. At >> least, until I've done reworking and fixing preparation paths. > > Agree, the NULL checks are safer and we should keep them. > > Going through the rest of this thread, I'm making the following changes: > > - ID must be > 0. I like that change, as we don't need an sqe flag to > select personality then, and it also makes it obvious that id == 0 is > just using current creds. > > - Fixed the missing put_cred() in the teardown > > - Use a local variable in io_submit_sqe() instead of assigning the > creds to req->work.creds there > > - Use cyclic idr allocation > > I'm going to fold in as appropriate. If there are fixes needed on top of > that, let's do them separately. > -- Pavel Begunkov ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-28 19:42 ` Jens Axboe 2020-01-28 20:16 ` Pavel Begunkov 2020-01-28 23:36 ` Pavel Begunkov @ 2020-01-29 14:59 ` Jann Horn 2020-01-29 17:34 ` Jens Axboe 2 siblings, 1 reply; 48+ messages in thread From: Jann Horn @ 2020-01-29 14:59 UTC (permalink / raw) To: Jens Axboe Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <[email protected]> wrote: > On 1/28/20 11:04 AM, Jens Axboe wrote: > > On 1/28/20 10:19 AM, Jens Axboe wrote: [...] > >> #1 adds support for registering the personality of the invoking task, > >> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to > >> just having one link, it doesn't support a chain of them. [...] > I didn't like it becoming a bit too complicated, both in terms of > implementation and use. And the fact that we'd have to jump through > hoops to make this work for a full chain. > > So I punted and just added sqe->personality and IOSQE_PERSONALITY. > This makes it way easier to use. Same branch: > > https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds > > I'd feel much better with this variant for 5.6. Some general feedback from an inspectability/debuggability perspective: At some point, it might be nice if you could add a .show_fdinfo handler to the io_uring_fops that makes it possible to get a rough overview over the state of the uring by reading /proc/$pid/fdinfo/$fd, just like e.g. eventfd (see eventfd_show_fdinfo()). It might be helpful for debugging to be able to see information about the fixed files and buffers that have been registered. Same for the personalities; that information might also be useful when someone is trying to figure out what privileges a running process actually has. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 14:59 ` Jann Horn @ 2020-01-29 17:34 ` Jens Axboe 2020-01-30 1:08 ` Jens Axboe 0 siblings, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-29 17:34 UTC (permalink / raw) To: Jann Horn Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov On 1/29/20 7:59 AM, Jann Horn wrote: > On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <[email protected]> wrote: >> On 1/28/20 11:04 AM, Jens Axboe wrote: >>> On 1/28/20 10:19 AM, Jens Axboe wrote: > [...] >>>> #1 adds support for registering the personality of the invoking task, >>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >>>> just having one link, it doesn't support a chain of them. > [...] >> I didn't like it becoming a bit too complicated, both in terms of >> implementation and use. And the fact that we'd have to jump through >> hoops to make this work for a full chain. >> >> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >> This makes it way easier to use. Same branch: >> >> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >> >> I'd feel much better with this variant for 5.6. > > Some general feedback from an inspectability/debuggability perspective: > > At some point, it might be nice if you could add a .show_fdinfo > handler to the io_uring_fops that makes it possible to get a rough > overview over the state of the uring by reading /proc/$pid/fdinfo/$fd, > just like e.g. eventfd (see eventfd_show_fdinfo()). It might be > helpful for debugging to be able to see information about the fixed > files and buffers that have been registered. Same for the > personalities; that information might also be useful when someone is > trying to figure out what privileges a running process actually has. Agree, that would be a very useful addition. I'll take a look at it. -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-29 17:34 ` Jens Axboe @ 2020-01-30 1:08 ` Jens Axboe 2020-01-30 2:20 ` Jens Axboe ` (2 more replies) 0 siblings, 3 replies; 48+ messages in thread From: Jens Axboe @ 2020-01-30 1:08 UTC (permalink / raw) To: Jann Horn Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov On 1/29/20 10:34 AM, Jens Axboe wrote: > On 1/29/20 7:59 AM, Jann Horn wrote: >> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <[email protected]> wrote: >>> On 1/28/20 11:04 AM, Jens Axboe wrote: >>>> On 1/28/20 10:19 AM, Jens Axboe wrote: >> [...] >>>>> #1 adds support for registering the personality of the invoking task, >>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >>>>> just having one link, it doesn't support a chain of them. >> [...] >>> I didn't like it becoming a bit too complicated, both in terms of >>> implementation and use. And the fact that we'd have to jump through >>> hoops to make this work for a full chain. >>> >>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >>> This makes it way easier to use. Same branch: >>> >>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>> >>> I'd feel much better with this variant for 5.6. >> >> Some general feedback from an inspectability/debuggability perspective: >> >> At some point, it might be nice if you could add a .show_fdinfo >> handler to the io_uring_fops that makes it possible to get a rough >> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd, >> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be >> helpful for debugging to be able to see information about the fixed >> files and buffers that have been registered. Same for the >> personalities; that information might also be useful when someone is >> trying to figure out what privileges a running process actually has. > > Agree, that would be a very useful addition. I'll take a look at it. Jann, how much info are you looking for? Here's a rough start, just shows the number of registered files and buffers, and lists the personalities registered. We could also dump the buffer info for each of them, and ditto for the files. Not sure how much verbosity is acceptable in fdinfo? Here's the test app for personality: # cat 3 pos: 0 flags: 02000002 mnt_id: 14 user-files: 0 user-bufs: 0 personalities: 1: uid=0/gid=0 diff --git a/fs/io_uring.c b/fs/io_uring.c index c5ca84a305d3..0b2c7d800297 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, return submitted ? submitted : ret; } +struct ring_show_idr { + struct io_ring_ctx *ctx; + struct seq_file *m; +}; + +static int io_uring_show_cred(int id, void *p, void *data) +{ + struct ring_show_idr *r = data; + const struct cred *cred = p; + + seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val, + cred->gid.val); + return 0; +} + +static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m) +{ + struct ring_show_idr r = { .ctx = ctx, .m = m }; + + mutex_lock(&ctx->uring_lock); + seq_printf(m, "user-files: %d\n", ctx->nr_user_files); + seq_printf(m, "user-bufs: %d\n", ctx->nr_user_bufs); + if (!idr_is_empty(&ctx->personality_idr)) { + seq_printf(m, "personalities:\n"); + idr_for_each(&ctx->personality_idr, io_uring_show_cred, &r); + } + mutex_unlock(&ctx->uring_lock); +} + +static void io_uring_show_fdinfo(struct seq_file *m, struct file *f) +{ + struct io_ring_ctx *ctx = f->private_data; + + if (percpu_ref_tryget(&ctx->refs)) { + __io_uring_show_fdinfo(ctx, m); + percpu_ref_put(&ctx->refs); + } +} + static const struct file_operations io_uring_fops = { .release = io_uring_release, .flush = io_uring_flush, @@ -6521,6 +6554,7 @@ static const struct file_operations io_uring_fops = { #endif .poll = io_uring_poll, .fasync = io_uring_fasync, + .show_fdinfo = io_uring_show_fdinfo, }; static int io_allocate_scq_urings(struct io_ring_ctx *ctx, -- Jens Axboe ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-30 1:08 ` Jens Axboe @ 2020-01-30 2:20 ` Jens Axboe 2020-01-30 3:18 ` Jens Axboe 2020-01-30 6:53 ` Stefan Metzmacher 2020-01-30 10:11 ` Jann Horn 2 siblings, 1 reply; 48+ messages in thread From: Jens Axboe @ 2020-01-30 2:20 UTC (permalink / raw) To: Jann Horn Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov On 1/29/20 6:08 PM, Jens Axboe wrote: > On 1/29/20 10:34 AM, Jens Axboe wrote: >> On 1/29/20 7:59 AM, Jann Horn wrote: >>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <[email protected]> wrote: >>>> On 1/28/20 11:04 AM, Jens Axboe wrote: >>>>> On 1/28/20 10:19 AM, Jens Axboe wrote: >>> [...] >>>>>> #1 adds support for registering the personality of the invoking task, >>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >>>>>> just having one link, it doesn't support a chain of them. >>> [...] >>>> I didn't like it becoming a bit too complicated, both in terms of >>>> implementation and use. And the fact that we'd have to jump through >>>> hoops to make this work for a full chain. >>>> >>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >>>> This makes it way easier to use. Same branch: >>>> >>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>>> >>>> I'd feel much better with this variant for 5.6. >>> >>> Some general feedback from an inspectability/debuggability perspective: >>> >>> At some point, it might be nice if you could add a .show_fdinfo >>> handler to the io_uring_fops that makes it possible to get a rough >>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd, >>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be >>> helpful for debugging to be able to see information about the fixed >>> files and buffers that have been registered. Same for the >>> personalities; that information might also be useful when someone is >>> trying to figure out what privileges a running process actually has. >> >> Agree, that would be a very useful addition. I'll take a look at it. > > Jann, how much info are you looking for? Here's a rough start, just > shows the number of registered files and buffers, and lists the > personalities registered. We could also dump the buffer info for > each of them, and ditto for the files. Not sure how much verbosity > is acceptable in fdinfo? > > Here's the test app for personality: > > # cat 3 > pos: 0 > flags: 02000002 > mnt_id: 14 > user-files: 0 > user-bufs: 0 > personalities: > 1: uid=0/gid=0 Here's one that adds the registered buffers and files as well. So essentially this shows any info on the registered parts. diff --git a/fs/io_uring.c b/fs/io_uring.c index c5ca84a305d3..e306691bc7a4 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -6511,6 +6505,55 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, return submitted ? submitted : ret; } +static int io_uring_show_cred(int id, void *p, void *data) +{ + const struct cred *cred = p; + struct seq_file *m = data; + + seq_printf(m, "%5d: uid=%u/gid=%u\n", id, cred->uid.val, cred->gid.val); + return 0; +} + +static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m) +{ + int i; + + mutex_lock(&ctx->uring_lock); + seq_printf(m, "user-files: %d\n", ctx->nr_user_files); + for (i = 0; i < ctx->nr_user_files; i++) { + struct fixed_file_table *table; + struct file *f; + + table = &ctx->file_data->table[i >> IORING_FILE_TABLE_SHIFT]; + f = table->files[i & IORING_FILE_TABLE_MASK]; + if (f) + seq_printf(m, "%5d: %s\n", i, file_dentry(f)->d_iname); + else + seq_printf(m, "%5d: <none>\n", i); + } + seq_printf(m, "user-bufs: %d\n", ctx->nr_user_bufs); + for (i = 0; i < ctx->nr_user_bufs; i++) { + struct io_mapped_ubuf *buf = &ctx->user_bufs[i]; + + seq_printf(m, "%5d: 0x%llx/%lu\n", i, buf->ubuf, buf->len); + } + if (!idr_is_empty(&ctx->personality_idr)) { + seq_printf(m, "personalities:\n"); + idr_for_each(&ctx->personality_idr, io_uring_show_cred, m); + } + mutex_unlock(&ctx->uring_lock); +} + +static void io_uring_show_fdinfo(struct seq_file *m, struct file *f) +{ + struct io_ring_ctx *ctx = f->private_data; + + if (percpu_ref_tryget(&ctx->refs)) { + __io_uring_show_fdinfo(ctx, m); + percpu_ref_put(&ctx->refs); + } +} + static const struct file_operations io_uring_fops = { .release = io_uring_release, .flush = io_uring_flush, @@ -6521,6 +6564,7 @@ static const struct file_operations io_uring_fops = { #endif .poll = io_uring_poll, .fasync = io_uring_fasync, + .show_fdinfo = io_uring_show_fdinfo, }; static int io_allocate_scq_urings(struct io_ring_ctx *ctx, -- Jens Axboe ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-30 2:20 ` Jens Axboe @ 2020-01-30 3:18 ` Jens Axboe 0 siblings, 0 replies; 48+ messages in thread From: Jens Axboe @ 2020-01-30 3:18 UTC (permalink / raw) To: Jann Horn Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov On 1/29/20 7:20 PM, Jens Axboe wrote: > On 1/29/20 6:08 PM, Jens Axboe wrote: >> On 1/29/20 10:34 AM, Jens Axboe wrote: >>> On 1/29/20 7:59 AM, Jann Horn wrote: >>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <[email protected]> wrote: >>>>> On 1/28/20 11:04 AM, Jens Axboe wrote: >>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote: >>>> [...] >>>>>>> #1 adds support for registering the personality of the invoking task, >>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >>>>>>> just having one link, it doesn't support a chain of them. >>>> [...] >>>>> I didn't like it becoming a bit too complicated, both in terms of >>>>> implementation and use. And the fact that we'd have to jump through >>>>> hoops to make this work for a full chain. >>>>> >>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >>>>> This makes it way easier to use. Same branch: >>>>> >>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>>>> >>>>> I'd feel much better with this variant for 5.6. >>>> >>>> Some general feedback from an inspectability/debuggability perspective: >>>> >>>> At some point, it might be nice if you could add a .show_fdinfo >>>> handler to the io_uring_fops that makes it possible to get a rough >>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd, >>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be >>>> helpful for debugging to be able to see information about the fixed >>>> files and buffers that have been registered. Same for the >>>> personalities; that information might also be useful when someone is >>>> trying to figure out what privileges a running process actually has. >>> >>> Agree, that would be a very useful addition. I'll take a look at it. >> >> Jann, how much info are you looking for? Here's a rough start, just >> shows the number of registered files and buffers, and lists the >> personalities registered. We could also dump the buffer info for >> each of them, and ditto for the files. Not sure how much verbosity >> is acceptable in fdinfo? >> >> Here's the test app for personality: >> >> # cat 3 >> pos: 0 >> flags: 02000002 >> mnt_id: 14 >> user-files: 0 >> user-bufs: 0 >> personalities: >> 1: uid=0/gid=0 > > Here's one that adds the registered buffers and files as well. So > essentially this shows any info on the registered parts. Output of this version, using t/io_uring from fio: # cat /proc/338/fdinfo/4 pos: 0 flags: 02000002 mnt_id: 14 user-files: 1 0: nvme0n1p2 user-bufs: 128 0: 55ea44b52000/4096 1: 55ea44b54000/4096 2: 55ea44b56000/4096 3: 55ea44b58000/4096 4: 55ea44b5a000/4096 5: 55ea44b5c000/4096 6: 55ea44b5e000/4096 7: 55ea44b60000/4096 8: 55ea44b62000/4096 9: 55ea44b64000/4096 10: 55ea44b66000/4096 11: 55ea44b68000/4096 12: 55ea44b6a000/4096 13: 55ea44b6c000/4096 14: 55ea44b6e000/4096 15: 55ea44b70000/4096 16: 55ea44b72000/4096 17: 55ea44b74000/4096 18: 55ea44b76000/4096 19: 55ea44b78000/4096 20: 55ea44b7a000/4096 21: 55ea44b7c000/4096 22: 55ea44b7e000/4096 23: 55ea44b80000/4096 24: 55ea44b82000/4096 25: 55ea44b84000/4096 26: 55ea44b86000/4096 27: 55ea44b88000/4096 28: 55ea44b8a000/4096 29: 55ea44b8c000/4096 30: 55ea44b8e000/4096 31: 55ea44b90000/4096 32: 55ea44b92000/4096 33: 55ea44b94000/4096 34: 55ea44b96000/4096 35: 55ea44b98000/4096 36: 55ea44b9a000/4096 37: 55ea44b9c000/4096 38: 55ea44b9e000/4096 39: 55ea44ba0000/4096 40: 55ea44ba2000/4096 41: 55ea44ba4000/4096 42: 55ea44ba6000/4096 43: 55ea44ba8000/4096 44: 55ea44baa000/4096 45: 55ea44bac000/4096 46: 55ea44bae000/4096 47: 55ea44bb0000/4096 48: 55ea44bb2000/4096 49: 55ea44bb4000/4096 50: 55ea44bb6000/4096 51: 55ea44bb8000/4096 52: 55ea44bba000/4096 53: 55ea44bbc000/4096 54: 55ea44bbe000/4096 55: 55ea44bc0000/4096 56: 55ea44bc2000/4096 57: 55ea44bc4000/4096 58: 55ea44bc6000/4096 59: 55ea44bc8000/4096 60: 55ea44bca000/4096 61: 55ea44bcc000/4096 62: 55ea44bce000/4096 63: 55ea44bd0000/4096 64: 55ea44bd2000/4096 65: 55ea44bd4000/4096 66: 55ea44bd6000/4096 67: 55ea44bd8000/4096 68: 55ea44bda000/4096 69: 55ea44bdc000/4096 70: 55ea44bde000/4096 71: 55ea44be0000/4096 72: 55ea44be2000/4096 73: 55ea44be4000/4096 74: 55ea44be6000/4096 75: 55ea44be8000/4096 76: 55ea44bea000/4096 77: 55ea44bec000/4096 78: 55ea44bee000/4096 79: 55ea44bf0000/4096 80: 55ea44bf2000/4096 81: 55ea44bf4000/4096 82: 55ea44bf6000/4096 83: 55ea44bf8000/4096 84: 55ea44bfa000/4096 85: 55ea44bfc000/4096 86: 55ea44bfe000/4096 87: 55ea44c00000/4096 88: 55ea44c02000/4096 89: 55ea44c04000/4096 90: 55ea44c06000/4096 91: 55ea44c08000/4096 92: 55ea44c0a000/4096 93: 55ea44c0c000/4096 94: 55ea44c0e000/4096 95: 55ea44c10000/4096 96: 55ea44c12000/4096 97: 55ea44c14000/4096 98: 55ea44c16000/4096 99: 55ea44c18000/4096 100: 55ea44c1a000/4096 101: 55ea44c1c000/4096 102: 55ea44c1e000/4096 103: 55ea44c20000/4096 104: 55ea44c22000/4096 105: 55ea44c24000/4096 106: 55ea44c26000/4096 107: 55ea44c28000/4096 108: 55ea44c2a000/4096 109: 55ea44c2c000/4096 110: 55ea44c2e000/4096 111: 55ea44c30000/4096 112: 55ea44c32000/4096 113: 55ea44c34000/4096 114: 55ea44c36000/4096 115: 55ea44c38000/4096 116: 55ea44c3a000/4096 117: 55ea44c3c000/4096 118: 55ea44c3e000/4096 119: 55ea44c40000/4096 120: 55ea44c42000/4096 121: 55ea44c44000/4096 122: 55ea44c46000/4096 123: 55ea44c48000/4096 124: 55ea44c4a000/4096 125: 55ea44c4c000/4096 126: 55ea44c4e000/4096 127: 55ea44c50000/4096 -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-30 1:08 ` Jens Axboe 2020-01-30 2:20 ` Jens Axboe @ 2020-01-30 6:53 ` Stefan Metzmacher 2020-01-30 10:11 ` Jann Horn 2 siblings, 0 replies; 48+ messages in thread From: Stefan Metzmacher @ 2020-01-30 6:53 UTC (permalink / raw) To: Jens Axboe, Jann Horn; +Cc: io-uring, Linux API Mailing List, Pavel Begunkov [-- Attachment #1.1: Type: text/plain, Size: 822 bytes --] Hi Jens, > diff --git a/fs/io_uring.c b/fs/io_uring.c > index c5ca84a305d3..0b2c7d800297 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, > return submitted ? submitted : ret; > } > > +struct ring_show_idr { > + struct io_ring_ctx *ctx; > + struct seq_file *m; > +}; > + > +static int io_uring_show_cred(int id, void *p, void *data) > +{ > + struct ring_show_idr *r = data; > + const struct cred *cred = p; > + > + seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val, > + cred->gid.val); > + return 0; > +} I think we should print similar information as task_state(), there we have: Uid: 1000 1000 1000 1000 Gid: 1000 1000 1000 1000 Groups: 1 2 3 4 5 1000 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-30 1:08 ` Jens Axboe 2020-01-30 2:20 ` Jens Axboe 2020-01-30 6:53 ` Stefan Metzmacher @ 2020-01-30 10:11 ` Jann Horn 2020-01-30 10:26 ` Christian Brauner 2 siblings, 1 reply; 48+ messages in thread From: Jann Horn @ 2020-01-30 10:11 UTC (permalink / raw) To: Jens Axboe Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <[email protected]> wrote: > On 1/29/20 10:34 AM, Jens Axboe wrote: > > On 1/29/20 7:59 AM, Jann Horn wrote: > >> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <[email protected]> wrote: > >>> On 1/28/20 11:04 AM, Jens Axboe wrote: > >>>> On 1/28/20 10:19 AM, Jens Axboe wrote: > >> [...] > >>>>> #1 adds support for registering the personality of the invoking task, > >>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to > >>>>> just having one link, it doesn't support a chain of them. > >> [...] > >>> I didn't like it becoming a bit too complicated, both in terms of > >>> implementation and use. And the fact that we'd have to jump through > >>> hoops to make this work for a full chain. > >>> > >>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. > >>> This makes it way easier to use. Same branch: > >>> > >>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds > >>> > >>> I'd feel much better with this variant for 5.6. > >> > >> Some general feedback from an inspectability/debuggability perspective: > >> > >> At some point, it might be nice if you could add a .show_fdinfo > >> handler to the io_uring_fops that makes it possible to get a rough > >> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd, > >> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be > >> helpful for debugging to be able to see information about the fixed > >> files and buffers that have been registered. Same for the > >> personalities; that information might also be useful when someone is > >> trying to figure out what privileges a running process actually has. > > > > Agree, that would be a very useful addition. I'll take a look at it. > > Jann, how much info are you looking for? Here's a rough start, just > shows the number of registered files and buffers, and lists the > personalities registered. We could also dump the buffer info for > each of them, and ditto for the files. Not sure how much verbosity > is acceptable in fdinfo? At the moment, I personally am just interested in this from the perspective of being able to audit the state of personalities, to make important information about the security state of processes visible. Good point about verbosity in fdinfo - I'm not sure about that myself either. > Here's the test app for personality: Oh, that was quick... > # cat 3 > pos: 0 > flags: 02000002 > mnt_id: 14 > user-files: 0 > user-bufs: 0 > personalities: > 1: uid=0/gid=0 > > > diff --git a/fs/io_uring.c b/fs/io_uring.c > index c5ca84a305d3..0b2c7d800297 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, > return submitted ? submitted : ret; > } > > +struct ring_show_idr { > + struct io_ring_ctx *ctx; > + struct seq_file *m; > +}; > + > +static int io_uring_show_cred(int id, void *p, void *data) > +{ > + struct ring_show_idr *r = data; > + const struct cred *cred = p; > + > + seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val, > + cred->gid.val); As Stefan said, the ->uid and ->gid aren't very useful, since when a process switches UIDs for accessing things in the filesystem, it probably only changes its EUID and FSUID, not its RUID. I think what's particularly relevant for uring would be the ->fsuid and the ->fsgid along with ->cap_effective; and perhaps for some operations also the ->euid and ->egid. The real UID/GID aren't really relevant when performing normal filesystem operations and such. > + return 0; > +} ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-30 10:11 ` Jann Horn @ 2020-01-30 10:26 ` Christian Brauner 2020-01-30 14:11 ` Jens Axboe 0 siblings, 1 reply; 48+ messages in thread From: Christian Brauner @ 2020-01-30 10:26 UTC (permalink / raw) To: Jann Horn Cc: Jens Axboe, Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov On Thu, Jan 30, 2020 at 11:11:58AM +0100, Jann Horn wrote: > On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <[email protected]> wrote: > > On 1/29/20 10:34 AM, Jens Axboe wrote: > > > On 1/29/20 7:59 AM, Jann Horn wrote: > > >> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <[email protected]> wrote: > > >>> On 1/28/20 11:04 AM, Jens Axboe wrote: > > >>>> On 1/28/20 10:19 AM, Jens Axboe wrote: > > >> [...] > > >>>>> #1 adds support for registering the personality of the invoking task, > > >>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to > > >>>>> just having one link, it doesn't support a chain of them. > > >> [...] > > >>> I didn't like it becoming a bit too complicated, both in terms of > > >>> implementation and use. And the fact that we'd have to jump through > > >>> hoops to make this work for a full chain. > > >>> > > >>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. > > >>> This makes it way easier to use. Same branch: > > >>> > > >>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds > > >>> > > >>> I'd feel much better with this variant for 5.6. > > >> > > >> Some general feedback from an inspectability/debuggability perspective: > > >> > > >> At some point, it might be nice if you could add a .show_fdinfo > > >> handler to the io_uring_fops that makes it possible to get a rough > > >> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd, > > >> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be > > >> helpful for debugging to be able to see information about the fixed > > >> files and buffers that have been registered. Same for the > > >> personalities; that information might also be useful when someone is > > >> trying to figure out what privileges a running process actually has. > > > > > > Agree, that would be a very useful addition. I'll take a look at it. > > > > Jann, how much info are you looking for? Here's a rough start, just > > shows the number of registered files and buffers, and lists the > > personalities registered. We could also dump the buffer info for > > each of them, and ditto for the files. Not sure how much verbosity > > is acceptable in fdinfo? > > At the moment, I personally am just interested in this from the > perspective of being able to audit the state of personalities, to make > important information about the security state of processes visible. > > Good point about verbosity in fdinfo - I'm not sure about that myself either. > > > Here's the test app for personality: > > Oh, that was quick... > > > # cat 3 > > pos: 0 > > flags: 02000002 > > mnt_id: 14 > > user-files: 0 > > user-bufs: 0 > > personalities: > > 1: uid=0/gid=0 > > > > > > diff --git a/fs/io_uring.c b/fs/io_uring.c > > index c5ca84a305d3..0b2c7d800297 100644 > > --- a/fs/io_uring.c > > +++ b/fs/io_uring.c > > @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, > > return submitted ? submitted : ret; > > } > > > > +struct ring_show_idr { > > + struct io_ring_ctx *ctx; > > + struct seq_file *m; > > +}; > > + > > +static int io_uring_show_cred(int id, void *p, void *data) > > +{ > > + struct ring_show_idr *r = data; > > + const struct cred *cred = p; > > + > > + seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val, > > + cred->gid.val); > > As Stefan said, the ->uid and ->gid aren't very useful, since when a > process switches UIDs for accessing things in the filesystem, it > probably only changes its EUID and FSUID, not its RUID. > I think what's particularly relevant for uring would be the ->fsuid > and the ->fsgid along with ->cap_effective; and perhaps for some > operations also the ->euid and ->egid. The real UID/GID aren't really > relevant when performing normal filesystem operations and such. This should probably just use the same format that is found in /proc/<pid>/status to make it easy for tools to use the same parsing logic and for the sake of consistency. We've adapted the same format for pidfds. So that would mean: Uid: 1000 1000 1000 1000 Gid: 1000 1000 1000 1000 Which would be: Real, effective, saved set, and filesystem {G,U}IDs And CapEff in /proc/<pid>/status has the format: CapEff: 0000000000000000 Christian ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-30 10:26 ` Christian Brauner @ 2020-01-30 14:11 ` Jens Axboe 2020-01-30 14:47 ` Stefan Metzmacher 2020-01-30 15:13 ` Christian Brauner 0 siblings, 2 replies; 48+ messages in thread From: Jens Axboe @ 2020-01-30 14:11 UTC (permalink / raw) To: Christian Brauner, Jann Horn Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov On 1/30/20 3:26 AM, Christian Brauner wrote: > On Thu, Jan 30, 2020 at 11:11:58AM +0100, Jann Horn wrote: >> On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <[email protected]> wrote: >>> On 1/29/20 10:34 AM, Jens Axboe wrote: >>>> On 1/29/20 7:59 AM, Jann Horn wrote: >>>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <[email protected]> wrote: >>>>>> On 1/28/20 11:04 AM, Jens Axboe wrote: >>>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote: >>>>> [...] >>>>>>>> #1 adds support for registering the personality of the invoking task, >>>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >>>>>>>> just having one link, it doesn't support a chain of them. >>>>> [...] >>>>>> I didn't like it becoming a bit too complicated, both in terms of >>>>>> implementation and use. And the fact that we'd have to jump through >>>>>> hoops to make this work for a full chain. >>>>>> >>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >>>>>> This makes it way easier to use. Same branch: >>>>>> >>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>>>>> >>>>>> I'd feel much better with this variant for 5.6. >>>>> >>>>> Some general feedback from an inspectability/debuggability perspective: >>>>> >>>>> At some point, it might be nice if you could add a .show_fdinfo >>>>> handler to the io_uring_fops that makes it possible to get a rough >>>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd, >>>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be >>>>> helpful for debugging to be able to see information about the fixed >>>>> files and buffers that have been registered. Same for the >>>>> personalities; that information might also be useful when someone is >>>>> trying to figure out what privileges a running process actually has. >>>> >>>> Agree, that would be a very useful addition. I'll take a look at it. >>> >>> Jann, how much info are you looking for? Here's a rough start, just >>> shows the number of registered files and buffers, and lists the >>> personalities registered. We could also dump the buffer info for >>> each of them, and ditto for the files. Not sure how much verbosity >>> is acceptable in fdinfo? >> >> At the moment, I personally am just interested in this from the >> perspective of being able to audit the state of personalities, to make >> important information about the security state of processes visible. >> >> Good point about verbosity in fdinfo - I'm not sure about that myself either. >> >>> Here's the test app for personality: >> >> Oh, that was quick... >> >>> # cat 3 >>> pos: 0 >>> flags: 02000002 >>> mnt_id: 14 >>> user-files: 0 >>> user-bufs: 0 >>> personalities: >>> 1: uid=0/gid=0 >>> >>> >>> diff --git a/fs/io_uring.c b/fs/io_uring.c >>> index c5ca84a305d3..0b2c7d800297 100644 >>> --- a/fs/io_uring.c >>> +++ b/fs/io_uring.c >>> @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, >>> return submitted ? submitted : ret; >>> } >>> >>> +struct ring_show_idr { >>> + struct io_ring_ctx *ctx; >>> + struct seq_file *m; >>> +}; >>> + >>> +static int io_uring_show_cred(int id, void *p, void *data) >>> +{ >>> + struct ring_show_idr *r = data; >>> + const struct cred *cred = p; >>> + >>> + seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val, >>> + cred->gid.val); >> >> As Stefan said, the ->uid and ->gid aren't very useful, since when a >> process switches UIDs for accessing things in the filesystem, it >> probably only changes its EUID and FSUID, not its RUID. >> I think what's particularly relevant for uring would be the ->fsuid >> and the ->fsgid along with ->cap_effective; and perhaps for some >> operations also the ->euid and ->egid. The real UID/GID aren't really >> relevant when performing normal filesystem operations and such. > > This should probably just use the same format that is found in > /proc/<pid>/status to make it easy for tools to use the same parsing > logic and for the sake of consistency. We've adapted the same format for > pidfds. So that would mean: > > Uid: 1000 1000 1000 1000 > Gid: 1000 1000 1000 1000 > > Which would be: Real, effective, saved set, and filesystem {G,U}IDs > > And CapEff in /proc/<pid>/status has the format: > CapEff: 0000000000000000 I agree, consistency is good. I've added this, and also changed the naming to be CamelCase, which is seems like most of them are. Now it looks like this: pos: 0 flags: 02000002 mnt_id: 14 UserFiles: 0 UserBufs: 0 Personalities: 1 Uid: 0 0 0 0 Gid: 0 0 0 0 Groups: 0 CapEff: 0000003fffffffff for a single personality registered (root). I have to indent it an extra tab to display each personality. diff --git a/fs/io_uring.c b/fs/io_uring.c index c5ca84a305d3..8b2411542f3e 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -6511,6 +6505,79 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, return submitted ? submitted : ret; } +static int io_uring_show_cred(int id, void *p, void *data) +{ + const struct cred *cred = p; + struct seq_file *m = data; + struct user_namespace *uns = seq_user_ns(m); + struct group_info *gi; + kernel_cap_t cap; + unsigned __capi; + int g; + + seq_printf(m, "%5d\n", id); + seq_put_decimal_ull(m, "\tUid:\t", from_kuid_munged(uns, cred->uid)); + seq_put_decimal_ull(m, "\t\t", from_kuid_munged(uns, cred->euid)); + seq_put_decimal_ull(m, "\t\t", from_kuid_munged(uns, cred->suid)); + seq_put_decimal_ull(m, "\t\t", from_kuid_munged(uns, cred->fsuid)); + seq_put_decimal_ull(m, "\n\tGid:\t", from_kgid_munged(uns, cred->gid)); + seq_put_decimal_ull(m, "\t\t", from_kgid_munged(uns, cred->egid)); + seq_put_decimal_ull(m, "\t\t", from_kgid_munged(uns, cred->sgid)); + seq_put_decimal_ull(m, "\t\t", from_kgid_munged(uns, cred->fsgid)); + seq_puts(m, "\n\tGroups:\t"); + gi = cred->group_info; + for (g = 0; g < gi->ngroups; g++) { + seq_put_decimal_ull(m, g ? " " : "", + from_kgid_munged(uns, gi->gid[g])); + } + seq_puts(m, "\n\tCapEff:\t"); + cap = cred->cap_effective; + CAP_FOR_EACH_U32(__capi) + seq_put_hex_ll(m, NULL, cap.cap[CAP_LAST_U32 - __capi], 8); + seq_putc(m, '\n'); + return 0; +} + +static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m) +{ + int i; + + mutex_lock(&ctx->uring_lock); + seq_printf(m, "UserFiles: %5u\n", ctx->nr_user_files); + for (i = 0; i < ctx->nr_user_files; i++) { + struct fixed_file_table *table; + struct file *f; + + table = &ctx->file_data->table[i >> IORING_FILE_TABLE_SHIFT]; + f = table->files[i & IORING_FILE_TABLE_MASK]; + if (f) + seq_printf(m, "%5u: %s\n", i, file_dentry(f)->d_iname); + else + seq_printf(m, "%5u: <none>\n", i); + } + seq_printf(m, "UserBufs: %5u\n", ctx->nr_user_bufs); + for (i = 0; i < ctx->nr_user_bufs; i++) { + struct io_mapped_ubuf *buf = &ctx->user_bufs[i]; + + seq_printf(m, "%5u: 0x%llx/%lu\n", i, buf->ubuf, buf->len); + } + if (!idr_is_empty(&ctx->personality_idr)) { + seq_printf(m, "Personalities:\n"); + idr_for_each(&ctx->personality_idr, io_uring_show_cred, m); + } + mutex_unlock(&ctx->uring_lock); +} + +static void io_uring_show_fdinfo(struct seq_file *m, struct file *f) +{ + struct io_ring_ctx *ctx = f->private_data; + + if (percpu_ref_tryget(&ctx->refs)) { + __io_uring_show_fdinfo(ctx, m); + percpu_ref_put(&ctx->refs); + } +} + static const struct file_operations io_uring_fops = { .release = io_uring_release, .flush = io_uring_flush, @@ -6521,6 +6588,7 @@ static const struct file_operations io_uring_fops = { #endif .poll = io_uring_poll, .fasync = io_uring_fasync, + .show_fdinfo = io_uring_show_fdinfo, }; static int io_allocate_scq_urings(struct io_ring_ctx *ctx, -- Jens Axboe ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-30 14:11 ` Jens Axboe @ 2020-01-30 14:47 ` Stefan Metzmacher 2020-01-30 15:34 ` Jens Axboe 2020-01-30 15:13 ` Christian Brauner 1 sibling, 1 reply; 48+ messages in thread From: Stefan Metzmacher @ 2020-01-30 14:47 UTC (permalink / raw) To: Jens Axboe, Christian Brauner, Jann Horn Cc: io-uring, Linux API Mailing List, Pavel Begunkov [-- Attachment #1.1: Type: text/plain, Size: 5697 bytes --] Am 30.01.20 um 15:11 schrieb Jens Axboe: > On 1/30/20 3:26 AM, Christian Brauner wrote: >> On Thu, Jan 30, 2020 at 11:11:58AM +0100, Jann Horn wrote: >>> On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <[email protected]> wrote: >>>> On 1/29/20 10:34 AM, Jens Axboe wrote: >>>>> On 1/29/20 7:59 AM, Jann Horn wrote: >>>>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <[email protected]> wrote: >>>>>>> On 1/28/20 11:04 AM, Jens Axboe wrote: >>>>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote: >>>>>> [...] >>>>>>>>> #1 adds support for registering the personality of the invoking task, >>>>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >>>>>>>>> just having one link, it doesn't support a chain of them. >>>>>> [...] >>>>>>> I didn't like it becoming a bit too complicated, both in terms of >>>>>>> implementation and use. And the fact that we'd have to jump through >>>>>>> hoops to make this work for a full chain. >>>>>>> >>>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >>>>>>> This makes it way easier to use. Same branch: >>>>>>> >>>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>>>>>> >>>>>>> I'd feel much better with this variant for 5.6. >>>>>> >>>>>> Some general feedback from an inspectability/debuggability perspective: >>>>>> >>>>>> At some point, it might be nice if you could add a .show_fdinfo >>>>>> handler to the io_uring_fops that makes it possible to get a rough >>>>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd, >>>>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be >>>>>> helpful for debugging to be able to see information about the fixed >>>>>> files and buffers that have been registered. Same for the >>>>>> personalities; that information might also be useful when someone is >>>>>> trying to figure out what privileges a running process actually has. >>>>> >>>>> Agree, that would be a very useful addition. I'll take a look at it. >>>> >>>> Jann, how much info are you looking for? Here's a rough start, just >>>> shows the number of registered files and buffers, and lists the >>>> personalities registered. We could also dump the buffer info for >>>> each of them, and ditto for the files. Not sure how much verbosity >>>> is acceptable in fdinfo? >>> >>> At the moment, I personally am just interested in this from the >>> perspective of being able to audit the state of personalities, to make >>> important information about the security state of processes visible. >>> >>> Good point about verbosity in fdinfo - I'm not sure about that myself either. >>> >>>> Here's the test app for personality: >>> >>> Oh, that was quick... >>> >>>> # cat 3 >>>> pos: 0 >>>> flags: 02000002 >>>> mnt_id: 14 >>>> user-files: 0 >>>> user-bufs: 0 >>>> personalities: >>>> 1: uid=0/gid=0 >>>> >>>> >>>> diff --git a/fs/io_uring.c b/fs/io_uring.c >>>> index c5ca84a305d3..0b2c7d800297 100644 >>>> --- a/fs/io_uring.c >>>> +++ b/fs/io_uring.c >>>> @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, >>>> return submitted ? submitted : ret; >>>> } >>>> >>>> +struct ring_show_idr { >>>> + struct io_ring_ctx *ctx; >>>> + struct seq_file *m; >>>> +}; >>>> + >>>> +static int io_uring_show_cred(int id, void *p, void *data) >>>> +{ >>>> + struct ring_show_idr *r = data; >>>> + const struct cred *cred = p; >>>> + >>>> + seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val, >>>> + cred->gid.val); >>> >>> As Stefan said, the ->uid and ->gid aren't very useful, since when a >>> process switches UIDs for accessing things in the filesystem, it >>> probably only changes its EUID and FSUID, not its RUID. >>> I think what's particularly relevant for uring would be the ->fsuid >>> and the ->fsgid along with ->cap_effective; and perhaps for some >>> operations also the ->euid and ->egid. The real UID/GID aren't really >>> relevant when performing normal filesystem operations and such. >> >> This should probably just use the same format that is found in >> /proc/<pid>/status to make it easy for tools to use the same parsing >> logic and for the sake of consistency. We've adapted the same format for >> pidfds. So that would mean: >> >> Uid: 1000 1000 1000 1000 >> Gid: 1000 1000 1000 1000 >> >> Which would be: Real, effective, saved set, and filesystem {G,U}IDs >> >> And CapEff in /proc/<pid>/status has the format: >> CapEff: 0000000000000000 > > I agree, consistency is good. I've added this, and also changed the > naming to be CamelCase, which is seems like most of them are. Now it > looks like this: > > pos: 0 > flags: 02000002 > mnt_id: 14 > UserFiles: 0 > UserBufs: 0 > Personalities: > 1 > Uid: 0 0 0 0 > Gid: 0 0 0 0 > Groups: 0 > CapEff: 0000003fffffffff > > for a single personality registered (root). I have to indent it an extra > tab to display each personality. That looks good. Maybe also print some details of struct io_ring_ctx, flags and the ring sizes, ctx->cred. Maybe details for io_wq and sqo_thread. Maybe pending requests? I'm not sure about how io_wq threads work in detail. Is it possible that a large number of blocking request (against an external harddisk with disconnected cable) to block other blocking requests to a working ssd? It would be good to diagnose such situations from the output. How is this supposed to be ABI-wise? Is it possible to change the output in later kernel versions? metze [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-30 14:47 ` Stefan Metzmacher @ 2020-01-30 15:34 ` Jens Axboe 0 siblings, 0 replies; 48+ messages in thread From: Jens Axboe @ 2020-01-30 15:34 UTC (permalink / raw) To: Stefan Metzmacher, Christian Brauner, Jann Horn Cc: io-uring, Linux API Mailing List, Pavel Begunkov On 1/30/20 7:47 AM, Stefan Metzmacher wrote: > Am 30.01.20 um 15:11 schrieb Jens Axboe: >> On 1/30/20 3:26 AM, Christian Brauner wrote: >>> On Thu, Jan 30, 2020 at 11:11:58AM +0100, Jann Horn wrote: >>>> On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <[email protected]> wrote: >>>>> On 1/29/20 10:34 AM, Jens Axboe wrote: >>>>>> On 1/29/20 7:59 AM, Jann Horn wrote: >>>>>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <[email protected]> wrote: >>>>>>>> On 1/28/20 11:04 AM, Jens Axboe wrote: >>>>>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote: >>>>>>> [...] >>>>>>>>>> #1 adds support for registering the personality of the invoking task, >>>>>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >>>>>>>>>> just having one link, it doesn't support a chain of them. >>>>>>> [...] >>>>>>>> I didn't like it becoming a bit too complicated, both in terms of >>>>>>>> implementation and use. And the fact that we'd have to jump through >>>>>>>> hoops to make this work for a full chain. >>>>>>>> >>>>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >>>>>>>> This makes it way easier to use. Same branch: >>>>>>>> >>>>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>>>>>>> >>>>>>>> I'd feel much better with this variant for 5.6. >>>>>>> >>>>>>> Some general feedback from an inspectability/debuggability perspective: >>>>>>> >>>>>>> At some point, it might be nice if you could add a .show_fdinfo >>>>>>> handler to the io_uring_fops that makes it possible to get a rough >>>>>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd, >>>>>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be >>>>>>> helpful for debugging to be able to see information about the fixed >>>>>>> files and buffers that have been registered. Same for the >>>>>>> personalities; that information might also be useful when someone is >>>>>>> trying to figure out what privileges a running process actually has. >>>>>> >>>>>> Agree, that would be a very useful addition. I'll take a look at it. >>>>> >>>>> Jann, how much info are you looking for? Here's a rough start, just >>>>> shows the number of registered files and buffers, and lists the >>>>> personalities registered. We could also dump the buffer info for >>>>> each of them, and ditto for the files. Not sure how much verbosity >>>>> is acceptable in fdinfo? >>>> >>>> At the moment, I personally am just interested in this from the >>>> perspective of being able to audit the state of personalities, to make >>>> important information about the security state of processes visible. >>>> >>>> Good point about verbosity in fdinfo - I'm not sure about that myself either. >>>> >>>>> Here's the test app for personality: >>>> >>>> Oh, that was quick... >>>> >>>>> # cat 3 >>>>> pos: 0 >>>>> flags: 02000002 >>>>> mnt_id: 14 >>>>> user-files: 0 >>>>> user-bufs: 0 >>>>> personalities: >>>>> 1: uid=0/gid=0 >>>>> >>>>> >>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c >>>>> index c5ca84a305d3..0b2c7d800297 100644 >>>>> --- a/fs/io_uring.c >>>>> +++ b/fs/io_uring.c >>>>> @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, >>>>> return submitted ? submitted : ret; >>>>> } >>>>> >>>>> +struct ring_show_idr { >>>>> + struct io_ring_ctx *ctx; >>>>> + struct seq_file *m; >>>>> +}; >>>>> + >>>>> +static int io_uring_show_cred(int id, void *p, void *data) >>>>> +{ >>>>> + struct ring_show_idr *r = data; >>>>> + const struct cred *cred = p; >>>>> + >>>>> + seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val, >>>>> + cred->gid.val); >>>> >>>> As Stefan said, the ->uid and ->gid aren't very useful, since when a >>>> process switches UIDs for accessing things in the filesystem, it >>>> probably only changes its EUID and FSUID, not its RUID. >>>> I think what's particularly relevant for uring would be the ->fsuid >>>> and the ->fsgid along with ->cap_effective; and perhaps for some >>>> operations also the ->euid and ->egid. The real UID/GID aren't really >>>> relevant when performing normal filesystem operations and such. >>> >>> This should probably just use the same format that is found in >>> /proc/<pid>/status to make it easy for tools to use the same parsing >>> logic and for the sake of consistency. We've adapted the same format for >>> pidfds. So that would mean: >>> >>> Uid: 1000 1000 1000 1000 >>> Gid: 1000 1000 1000 1000 >>> >>> Which would be: Real, effective, saved set, and filesystem {G,U}IDs >>> >>> And CapEff in /proc/<pid>/status has the format: >>> CapEff: 0000000000000000 >> >> I agree, consistency is good. I've added this, and also changed the >> naming to be CamelCase, which is seems like most of them are. Now it >> looks like this: >> >> pos: 0 >> flags: 02000002 >> mnt_id: 14 >> UserFiles: 0 >> UserBufs: 0 >> Personalities: >> 1 >> Uid: 0 0 0 0 >> Gid: 0 0 0 0 >> Groups: 0 >> CapEff: 0000003fffffffff >> >> for a single personality registered (root). I have to indent it an extra >> tab to display each personality. > > That looks good. > > Maybe also print some details of struct io_ring_ctx, > flags and the ring sizes, ctx->cred. > > Maybe details for io_wq and sqo_thread. Yeah, I agree that we should probably just add a ton more, there's plenty of information that would be useful. But let's start simple - I forgot to CC you on the patch I just sent out, but it's basically the above cleaned up. We dump information that's registered with the ring, that's the theme right now. I'd be happy to add some of the state information as well, we should do that as a separate patch. > Maybe pending requests? > I'm not sure about how io_wq threads work in detail. > Is it possible that a large number of blocking request > (against an external harddisk with disconnected cable) > to block other blocking requests to a working ssd? > It would be good to diagnose such situations from > the output. io_uring doesn't necessarily track pending requests, only if it has to. For bounded request time IO, like the above, it'll depend on the concurrency level. If you setup the ring with eg N entries, that'll be at most N pending bounded requests. If all of those are blocked because the disk isn't responding, yes, that could happen. At least until the timeout happens. > How is this supposed to be ABI-wise? Is it possible to change > the output in later kernel versions? We should always be able to append to the file, I'd just prefer if we don't change the format of lines that have already been added. -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-30 14:11 ` Jens Axboe 2020-01-30 14:47 ` Stefan Metzmacher @ 2020-01-30 15:13 ` Christian Brauner 2020-01-30 15:29 ` Jens Axboe 1 sibling, 1 reply; 48+ messages in thread From: Christian Brauner @ 2020-01-30 15:13 UTC (permalink / raw) To: Jens Axboe Cc: Jann Horn, Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov On Thu, Jan 30, 2020 at 07:11:08AM -0700, Jens Axboe wrote: > On 1/30/20 3:26 AM, Christian Brauner wrote: > > On Thu, Jan 30, 2020 at 11:11:58AM +0100, Jann Horn wrote: > >> On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <[email protected]> wrote: > >>> On 1/29/20 10:34 AM, Jens Axboe wrote: > >>>> On 1/29/20 7:59 AM, Jann Horn wrote: > >>>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <[email protected]> wrote: > >>>>>> On 1/28/20 11:04 AM, Jens Axboe wrote: > >>>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote: > >>>>> [...] > >>>>>>>> #1 adds support for registering the personality of the invoking task, > >>>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to > >>>>>>>> just having one link, it doesn't support a chain of them. > >>>>> [...] > >>>>>> I didn't like it becoming a bit too complicated, both in terms of > >>>>>> implementation and use. And the fact that we'd have to jump through > >>>>>> hoops to make this work for a full chain. > >>>>>> > >>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. > >>>>>> This makes it way easier to use. Same branch: > >>>>>> > >>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds > >>>>>> > >>>>>> I'd feel much better with this variant for 5.6. > >>>>> > >>>>> Some general feedback from an inspectability/debuggability perspective: > >>>>> > >>>>> At some point, it might be nice if you could add a .show_fdinfo > >>>>> handler to the io_uring_fops that makes it possible to get a rough > >>>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd, > >>>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be > >>>>> helpful for debugging to be able to see information about the fixed > >>>>> files and buffers that have been registered. Same for the > >>>>> personalities; that information might also be useful when someone is > >>>>> trying to figure out what privileges a running process actually has. > >>>> > >>>> Agree, that would be a very useful addition. I'll take a look at it. > >>> > >>> Jann, how much info are you looking for? Here's a rough start, just > >>> shows the number of registered files and buffers, and lists the > >>> personalities registered. We could also dump the buffer info for > >>> each of them, and ditto for the files. Not sure how much verbosity > >>> is acceptable in fdinfo? > >> > >> At the moment, I personally am just interested in this from the > >> perspective of being able to audit the state of personalities, to make > >> important information about the security state of processes visible. > >> > >> Good point about verbosity in fdinfo - I'm not sure about that myself either. Afaik, there's no rule here. I would expect that it shouldn't exceed 4096kb just because that is the limit that seems to be enforced for writes to proc files atm; other than that it should be the wild west. The fdinfo files are mostly interesting for anon_inode fds imho and the ones that come to mind right now simply don't have a lot of information to provide: eventfd timerfd seccomp_notify_fd Potentially, the mount fds from David could be extended in the future. (Side note: One thing that comes to mind is that we should probably enforce^Wdocument that all fdinfo files use CamelCase?) Christian ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? 2020-01-30 15:13 ` Christian Brauner @ 2020-01-30 15:29 ` Jens Axboe 0 siblings, 0 replies; 48+ messages in thread From: Jens Axboe @ 2020-01-30 15:29 UTC (permalink / raw) To: Christian Brauner Cc: Jann Horn, Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov On 1/30/20 8:13 AM, Christian Brauner wrote: > On Thu, Jan 30, 2020 at 07:11:08AM -0700, Jens Axboe wrote: >> On 1/30/20 3:26 AM, Christian Brauner wrote: >>> On Thu, Jan 30, 2020 at 11:11:58AM +0100, Jann Horn wrote: >>>> On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <[email protected]> wrote: >>>>> On 1/29/20 10:34 AM, Jens Axboe wrote: >>>>>> On 1/29/20 7:59 AM, Jann Horn wrote: >>>>>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <[email protected]> wrote: >>>>>>>> On 1/28/20 11:04 AM, Jens Axboe wrote: >>>>>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote: >>>>>>> [...] >>>>>>>>>> #1 adds support for registering the personality of the invoking task, >>>>>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to >>>>>>>>>> just having one link, it doesn't support a chain of them. >>>>>>> [...] >>>>>>>> I didn't like it becoming a bit too complicated, both in terms of >>>>>>>> implementation and use. And the fact that we'd have to jump through >>>>>>>> hoops to make this work for a full chain. >>>>>>>> >>>>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY. >>>>>>>> This makes it way easier to use. Same branch: >>>>>>>> >>>>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds >>>>>>>> >>>>>>>> I'd feel much better with this variant for 5.6. >>>>>>> >>>>>>> Some general feedback from an inspectability/debuggability perspective: >>>>>>> >>>>>>> At some point, it might be nice if you could add a .show_fdinfo >>>>>>> handler to the io_uring_fops that makes it possible to get a rough >>>>>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd, >>>>>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be >>>>>>> helpful for debugging to be able to see information about the fixed >>>>>>> files and buffers that have been registered. Same for the >>>>>>> personalities; that information might also be useful when someone is >>>>>>> trying to figure out what privileges a running process actually has. >>>>>> >>>>>> Agree, that would be a very useful addition. I'll take a look at it. >>>>> >>>>> Jann, how much info are you looking for? Here's a rough start, just >>>>> shows the number of registered files and buffers, and lists the >>>>> personalities registered. We could also dump the buffer info for >>>>> each of them, and ditto for the files. Not sure how much verbosity >>>>> is acceptable in fdinfo? >>>> >>>> At the moment, I personally am just interested in this from the >>>> perspective of being able to audit the state of personalities, to make >>>> important information about the security state of processes visible. >>>> >>>> Good point about verbosity in fdinfo - I'm not sure about that myself either. > > Afaik, there's no rule here. I would expect that it shouldn't exceed > 4096kb just because that is the limit that seems to be enforced for > writes to proc files atm; other than that it should be the wild west. > The fdinfo files are mostly interesting for anon_inode fds imho and the > ones that come to mind right now simply don't have a lot of information > to provide: > > eventfd > timerfd > seccomp_notify_fd > > Potentially, the mount fds from David could be extended in the future. 4MB is huge, I'd not get anywhere near that. So I'd say the current format is probably fine. I honed it a little bit to be prettier, looks good to me. I'll send it out for review. > (Side note: One thing that comes to mind is that we should probably > enforce^Wdocument that all fdinfo files use CamelCase?) Fine with me, seems to already be the norm, would be nice to have it documented. -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
end of thread, other threads:[~2020-01-30 15:34 UTC | newest] Thread overview: 48+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-01-28 10:18 IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? Stefan Metzmacher 2020-01-28 16:10 ` Jens Axboe 2020-01-28 16:17 ` Stefan Metzmacher 2020-01-28 16:19 ` Jens Axboe 2020-01-28 17:19 ` Jens Axboe 2020-01-28 18:04 ` Jens Axboe 2020-01-28 19:42 ` Jens Axboe 2020-01-28 20:16 ` Pavel Begunkov 2020-01-28 20:19 ` Jens Axboe 2020-01-28 20:50 ` Pavel Begunkov 2020-01-28 20:56 ` Jens Axboe 2020-01-28 21:25 ` Christian Brauner 2020-01-28 22:38 ` Pavel Begunkov 2020-01-28 23:36 ` Pavel Begunkov 2020-01-28 23:40 ` Jens Axboe 2020-01-28 23:51 ` Jens Axboe 2020-01-29 0:10 ` Pavel Begunkov 2020-01-29 0:15 ` Jens Axboe 2020-01-29 0:18 ` Jens Axboe 2020-01-29 0:20 ` Jens Axboe 2020-01-29 0:21 ` Pavel Begunkov 2020-01-29 0:24 ` Jens Axboe 2020-01-29 0:54 ` Jens Axboe 2020-01-29 10:17 ` Pavel Begunkov 2020-01-29 13:11 ` Stefan Metzmacher 2020-01-29 13:41 ` Pavel Begunkov 2020-01-29 13:56 ` Stefan Metzmacher 2020-01-29 14:23 ` Pavel Begunkov 2020-01-29 14:27 ` Stefan Metzmacher 2020-01-29 14:34 ` Pavel Begunkov 2020-01-29 17:34 ` Jens Axboe 2020-01-29 17:42 ` Jens Axboe 2020-01-29 20:09 ` Stefan Metzmacher 2020-01-29 20:48 ` Jens Axboe 2020-01-29 17:46 ` Pavel Begunkov 2020-01-29 14:59 ` Jann Horn 2020-01-29 17:34 ` Jens Axboe 2020-01-30 1:08 ` Jens Axboe 2020-01-30 2:20 ` Jens Axboe 2020-01-30 3:18 ` Jens Axboe 2020-01-30 6:53 ` Stefan Metzmacher 2020-01-30 10:11 ` Jann Horn 2020-01-30 10:26 ` Christian Brauner 2020-01-30 14:11 ` Jens Axboe 2020-01-30 14:47 ` Stefan Metzmacher 2020-01-30 15:34 ` Jens Axboe 2020-01-30 15:13 ` Christian Brauner 2020-01-30 15:29 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox