Am 09.01.20 um 22:31 schrieb Jens Axboe: > On 1/9/20 3:40 AM, Stefan Metzmacher wrote: >>>> I'm sorry, but I'm still unsure we're talking about the same thing >>>> (or maybe I'm missing some basics here). >>>> >>>> My understanding of the io_uring_enter() is that it will execute as much >>>> non-blocking calls as it can without switching to any other kernel thread. >>> >>> Correct, any SQE that we can do without switching, we will. >>> >>>> And my fear is that openat will use get_current_cred() instead of >>>> ctx->creds. >>> >>> OK, I think I follow your concern. So you'd like to setup the rings from >>> a _different_ user, and then later on use it for submission for SQEs that >>> a specific user. So sort of the same as our initial discussion, except >>> the mapping would be static. The difference being that you might setup >>> the ring from a different user than the user that would be submitting IO >>> on it? >> >> Our current (much simplified here) flow is this: >> >> # we start as root >> seteuid(0);setegid(0);setgroups()... >> ... >> # we become the user555 and >> # create our desired credential token >> seteuid(555); seteguid(555); setgroups()... >> # Start an openat2 on behalf of user555 >> openat2() >> # we unbecome the user again and run as root >> seteuid(0);setegid(0); setgroups()... >> ... >> # we become the user444 and >> # create our desired credential token >> seteuid(444); seteguid(444); setgroups()... >> # Start an openat2 on behalf of user444 >> openat2() >> # we unbecome the user again and run as root >> seteuid(0);setegid(0); setgroups()... >> ... >> # we become the user555 and >> # create our desired credential token >> seteuid(555); seteguid(555); setgroups()... >> # Start an openat2 on behalf of user555 >> openat2() >> # we unbecome the user again and run as root >> seteuid(0);setegid(0); setgroups()... >> >> It means we have to do about 7 syscalls in order >> to open a file on behalf of a user. >> (In reality we cache things and avoid set*id() >> calls most of the time, but I want to demonstrate the >> simplified design here) >> >> With io_uring I'd like to use a flow like this: >> >> # we start as root >> seteuid(0);setegid(0);setgroups()... >> ... >> # we become the user444 and >> # create our desired credential token >> seteuid(444); seteguid(444); setgroups()... >> # we snapshot the credentials to the new ring for user444 >> ring444 = io_uring_setup() >> # we unbecome the user again and run as root >> seteuid(0);setegid(0);setgroups()... >> ... >> # we become the user555 and >> # create our desired credential token >> seteuid(555); seteguid(555); setgroups()... >> # we snapshot the credentials to the new ring for user555 >> ring555 = io_uring_setup() >> # we unbecome the user again and run as root >> seteuid(0);setegid(0);setgroups()... >> ... >> # Start an openat2 on behalf of user555 >> io_uring_enter(ring555, OP_OPENAT2...) >> ... >> # Start an openat2 on behalf of user444 >> io_uring_enter(ring444, OP_OPENAT2...) >> ... >> # Start an openat2 on behalf of user555 >> io_uring_enter(ring555, OP_OPENAT2...) >> >> So instead of constantly doing 7 syscalls per open, >> we would be down to just at most one. And I would assume >> that io_uring_enter() would do the temporary credential switch >> for me also in the non-blocking case. > > OK, thanks for spelling the use case out, makes it easier to understand > what you need in terms of what we currently can't do. > >>> If so, then we do need something to support that, probably an >>> IORING_REGISTER_CREDS or similar. This would allow you to replace the >>> creds you currently have in ctx->creds with whatever new one. >> >> I don't want to change ctx->creds, but I want it to be used consistently. >> >> What I think is missing is something like this: >> >> diff --git a/fs/io_uring.c b/fs/io_uring.c >> index 32aee149f652..55dbb154915a 100644 >> --- a/fs/io_uring.c >> +++ b/fs/io_uring.c >> @@ -6359,10 +6359,27 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, >> fd, u32, to_submit, >> struct mm_struct *cur_mm; >> >> mutex_lock(&ctx->uring_lock); >> + if (current->mm != ctx->sqo_mm) { >> + // TODO: somthing like this... >> + restore_mm = current->mm; >> + use_mm(ctx->sqo_mm); >> + } >> /* already have mm, so io_submit_sqes() won't try to >> grab it */ >> cur_mm = ctx->sqo_mm; >> + if (current_cred() != ctx->creds) { >> + // TODO: somthing like this... >> + restore_cred = override_creds(ctx->creds); >> + } >> submitted = io_submit_sqes(ctx, to_submit, f.file, fd, >> &cur_mm, false); >> + if (restore_cred != NULL) { >> + revert_creds(restore_cred); >> + } >> + if (restore_mm != NULL) { >> + // TODO: something like this... >> + unuse_mm(ctx->sqo_mm); >> + use_mm(restore_mm); >> + } >> mutex_unlock(&ctx->uring_lock); >> >> if (submitted != to_submit) >> >> I'm not sure if current->mm is needed, I just added it for completeness >> and as hint that io_op_defs[req->opcode].needs_mm is there and a >> needs_creds could also be added (if it helps with performance) >> >> Is it possible to trigger a change of current->mm from userspace? >> >> An IORING_REGISTER_CREDS would only be useful if it's possible to >> register a set of credentials and then use per io_uring_sqe credentials. >> That would also be fine for me, but I'm not sure it's needed for now. > > I think it'd be a cleaner way of doing the same thing as your patch > does. It seems a little odd to do this by default (having the ring > change personalities depending on who's using it), but from an opt-in > point of view, I think it makes more sense. > > That would make the IORING_REGISTER_ call something like > IORING_REGISTER_ADOPT_OWNER or something like that, meaning that the > ring would just assume the identify of the task that's calling > io_uring_enter(). > > Note that this also has to be passed through to the io-wq handler, as > the mappings there are currently static as well. What's the next step here? I think the current state is a security problem! The inline execution either needs to change the creds temporary or io_uring_enter() needs a general check that the current creds match the creds of the ring and return -EPERM or something similar. Thanks! metze