* [RFC 0/2] io_uring: disallow overlapping ranges for buffer registration @ 2020-06-12 2:23 Bijan Mottahedeh 2020-06-12 2:23 ` [RFC 1/2] " Bijan Mottahedeh 2020-06-12 2:23 ` [RFC 2/2] io_uring: report pinned memory usage Bijan Mottahedeh 0 siblings, 2 replies; 11+ messages in thread From: Bijan Mottahedeh @ 2020-06-12 2:23 UTC (permalink / raw) To: axboe; +Cc: io-uring This patch set has a couple of RFCs. The first patch aims to disallow overlapping of user buffers for registration. For example, right now it is possible to register the same identical buffer many times with the same system call which doesn't really make sense. I'm not sure if there is a valid use case for overlapping buffers. I think this check by itself might not be sufficient and more restrictions may be needed but I figured to send this out for feedback and check whether overlapping buffers should be allowed in the first place. The second patch aims to separate reporting and enforcing of pinned memory usage. Reporting the usage seems like a good idea even if no limit is enforced and ctx->account_mem is zero. However, I'm clear on the proper setting and usage of user->locked_vm mm->locked_vm mm->pinned_vm Looking at some other pin_user_page() callers, it seem mm->pinned_vm should be used by I'm not sure. Bijan Mottahedeh (2): io_uring: disallow overlapping ranges for buffer registration io_uring: report pinned memory usage fs/io_uring.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC 1/2] io_uring: disallow overlapping ranges for buffer registration 2020-06-12 2:23 [RFC 0/2] io_uring: disallow overlapping ranges for buffer registration Bijan Mottahedeh @ 2020-06-12 2:23 ` Bijan Mottahedeh 2020-06-12 15:16 ` Jens Axboe 2020-06-12 2:23 ` [RFC 2/2] io_uring: report pinned memory usage Bijan Mottahedeh 1 sibling, 1 reply; 11+ messages in thread From: Bijan Mottahedeh @ 2020-06-12 2:23 UTC (permalink / raw) To: axboe; +Cc: io-uring Buffer registration is expensive in terms of cpu/mem overhead and there seems no good reason to allow overlapping ranges. Signed-off-by: Bijan Mottahedeh <[email protected]> --- fs/io_uring.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 9158130..4248726 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -7211,6 +7211,12 @@ static int io_copy_iov(struct io_ring_ctx *ctx, struct iovec *dst, return 0; } +static inline int iov_overlap(struct iovec *v1, struct iovec *v2) +{ + return (v1->iov_base <= (v2->iov_base + v2->iov_len - 1) && + v2->iov_base <= (v1->iov_base + v1->iov_len - 1)); +} + static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg, unsigned nr_args) { @@ -7233,7 +7239,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg, struct io_mapped_ubuf *imu = &ctx->user_bufs[i]; unsigned long off, start, end, ubuf; int pret, nr_pages; - struct iovec iov; + struct iovec iov, prv_iov; size_t size; ret = io_copy_iov(ctx, &iov, arg, i); @@ -7258,6 +7264,12 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg, start = ubuf >> PAGE_SHIFT; nr_pages = end - start; + /* disallow overlapping buffers */ + if (i > 0 && iov_overlap(&prv_iov, &iov)) + goto err; + + prv_iov = iov; + if (ctx->account_mem) { ret = io_account_mem(ctx->user, nr_pages); if (ret) -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC 1/2] io_uring: disallow overlapping ranges for buffer registration 2020-06-12 2:23 ` [RFC 1/2] " Bijan Mottahedeh @ 2020-06-12 15:16 ` Jens Axboe 2020-06-12 18:22 ` Bijan Mottahedeh 0 siblings, 1 reply; 11+ messages in thread From: Jens Axboe @ 2020-06-12 15:16 UTC (permalink / raw) To: Bijan Mottahedeh; +Cc: io-uring On 6/11/20 8:23 PM, Bijan Mottahedeh wrote: > Buffer registration is expensive in terms of cpu/mem overhead and there > seems no good reason to allow overlapping ranges. There's also no good reason to disallow it imho, so not sure we should be doing that. -- Jens Axboe ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC 1/2] io_uring: disallow overlapping ranges for buffer registration 2020-06-12 15:16 ` Jens Axboe @ 2020-06-12 18:22 ` Bijan Mottahedeh 2020-06-14 15:56 ` Jens Axboe 0 siblings, 1 reply; 11+ messages in thread From: Bijan Mottahedeh @ 2020-06-12 18:22 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring On 6/12/2020 8:16 AM, Jens Axboe wrote: > On 6/11/20 8:23 PM, Bijan Mottahedeh wrote: >> Buffer registration is expensive in terms of cpu/mem overhead and there >> seems no good reason to allow overlapping ranges. > There's also no good reason to disallow it imho, so not sure we should > be doing that. > My concern was about a malicious user without CAP_IPC_LOCK abusing its memlock limit and repeatedly register the same buffer as that would tie up cpu/mem resources to pin up to 1TB of memory, but maybe the argument is that the user should have not been granted that large of memlock limit? Also, without any restrictions, there are a huge number of ways overlapping ranges could be specified, creating a very large validation matrix. What the use cases for that flexibility are though, I don't know. --bijan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC 1/2] io_uring: disallow overlapping ranges for buffer registration 2020-06-12 18:22 ` Bijan Mottahedeh @ 2020-06-14 15:56 ` Jens Axboe 0 siblings, 0 replies; 11+ messages in thread From: Jens Axboe @ 2020-06-14 15:56 UTC (permalink / raw) To: Bijan Mottahedeh; +Cc: io-uring On 6/12/20 12:22 PM, Bijan Mottahedeh wrote: > On 6/12/2020 8:16 AM, Jens Axboe wrote: >> On 6/11/20 8:23 PM, Bijan Mottahedeh wrote: >>> Buffer registration is expensive in terms of cpu/mem overhead and there >>> seems no good reason to allow overlapping ranges. >> There's also no good reason to disallow it imho, so not sure we should >> be doing that. >> > > My concern was about a malicious user without CAP_IPC_LOCK abusing its > memlock limit and repeatedly register the same buffer as that would tie > up cpu/mem resources to pin up to 1TB of memory, but maybe the argument > is that the user should have not been granted that large of memlock > limit? Also, without any restrictions, there are a huge number of ways > overlapping ranges could be specified, creating a very large validation > matrix. What the use cases for that flexibility are though, I don't know. Not sure I follow, so I must be missing something. We're accounting each buffer as it is, so how are we abusing the limit? My concern on the overlaps is mainly that someone could be (inadvertently, perhaps) doing it already, and now we're breaking that. Or maybe that are doing it purposely, for some reason I just don't know about. -- Jens Axboe ^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC 2/2] io_uring: report pinned memory usage 2020-06-12 2:23 [RFC 0/2] io_uring: disallow overlapping ranges for buffer registration Bijan Mottahedeh 2020-06-12 2:23 ` [RFC 1/2] " Bijan Mottahedeh @ 2020-06-12 2:23 ` Bijan Mottahedeh 2020-06-12 15:16 ` Jens Axboe 1 sibling, 1 reply; 11+ messages in thread From: Bijan Mottahedeh @ 2020-06-12 2:23 UTC (permalink / raw) To: axboe; +Cc: io-uring Long term, it makes sense to separate reporting and enforcing of pinned memory usage. Signed-off-by: Bijan Mottahedeh <[email protected]> It is useful to view --- fs/io_uring.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index 4248726..cf3acaa 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -7080,6 +7080,8 @@ static int io_sq_offload_start(struct io_ring_ctx *ctx, static void io_unaccount_mem(struct user_struct *user, unsigned long nr_pages) { atomic_long_sub(nr_pages, &user->locked_vm); + if (current->mm) + atomic_long_sub(nr_pages, ¤t->mm->pinned_vm); } static int io_account_mem(struct user_struct *user, unsigned long nr_pages) @@ -7096,6 +7098,8 @@ static int io_account_mem(struct user_struct *user, unsigned long nr_pages) return -ENOMEM; } while (atomic_long_cmpxchg(&user->locked_vm, cur_pages, new_pages) != cur_pages); + if (current->mm) + atomic_long_add(nr_pages, ¤t->mm->pinned_vm); return 0; } -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC 2/2] io_uring: report pinned memory usage 2020-06-12 2:23 ` [RFC 2/2] io_uring: report pinned memory usage Bijan Mottahedeh @ 2020-06-12 15:16 ` Jens Axboe 2020-06-12 15:19 ` Jens Axboe 0 siblings, 1 reply; 11+ messages in thread From: Jens Axboe @ 2020-06-12 15:16 UTC (permalink / raw) To: Bijan Mottahedeh; +Cc: io-uring On 6/11/20 8:23 PM, Bijan Mottahedeh wrote: > Long term, it makes sense to separate reporting and enforcing of pinned > memory usage. > > Signed-off-by: Bijan Mottahedeh <[email protected]> > > It is useful to view > --- > fs/io_uring.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/fs/io_uring.c b/fs/io_uring.c > index 4248726..cf3acaa 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -7080,6 +7080,8 @@ static int io_sq_offload_start(struct io_ring_ctx *ctx, > static void io_unaccount_mem(struct user_struct *user, unsigned long nr_pages) > { > atomic_long_sub(nr_pages, &user->locked_vm); > + if (current->mm) > + atomic_long_sub(nr_pages, ¤t->mm->pinned_vm); > } > > static int io_account_mem(struct user_struct *user, unsigned long nr_pages) > @@ -7096,6 +7098,8 @@ static int io_account_mem(struct user_struct *user, unsigned long nr_pages) > return -ENOMEM; > } while (atomic_long_cmpxchg(&user->locked_vm, cur_pages, > new_pages) != cur_pages); > + if (current->mm) > + atomic_long_add(nr_pages, ¤t->mm->pinned_vm); > > return 0; > } current->mm should always be valid for these, so I think you can skip the checking of that and just make it unconditional. -- Jens Axboe ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC 2/2] io_uring: report pinned memory usage 2020-06-12 15:16 ` Jens Axboe @ 2020-06-12 15:19 ` Jens Axboe 2020-06-12 15:50 ` Jens Axboe 2020-06-13 4:43 ` Bijan Mottahedeh 0 siblings, 2 replies; 11+ messages in thread From: Jens Axboe @ 2020-06-12 15:19 UTC (permalink / raw) To: Bijan Mottahedeh; +Cc: io-uring On 6/12/20 9:16 AM, Jens Axboe wrote: > On 6/11/20 8:23 PM, Bijan Mottahedeh wrote: >> Long term, it makes sense to separate reporting and enforcing of pinned >> memory usage. >> >> Signed-off-by: Bijan Mottahedeh <[email protected]> >> >> It is useful to view >> --- >> fs/io_uring.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/fs/io_uring.c b/fs/io_uring.c >> index 4248726..cf3acaa 100644 >> --- a/fs/io_uring.c >> +++ b/fs/io_uring.c >> @@ -7080,6 +7080,8 @@ static int io_sq_offload_start(struct io_ring_ctx *ctx, >> static void io_unaccount_mem(struct user_struct *user, unsigned long nr_pages) >> { >> atomic_long_sub(nr_pages, &user->locked_vm); >> + if (current->mm) >> + atomic_long_sub(nr_pages, ¤t->mm->pinned_vm); >> } >> >> static int io_account_mem(struct user_struct *user, unsigned long nr_pages) >> @@ -7096,6 +7098,8 @@ static int io_account_mem(struct user_struct *user, unsigned long nr_pages) >> return -ENOMEM; >> } while (atomic_long_cmpxchg(&user->locked_vm, cur_pages, >> new_pages) != cur_pages); >> + if (current->mm) >> + atomic_long_add(nr_pages, ¤t->mm->pinned_vm); >> >> return 0; >> } > > current->mm should always be valid for these, so I think you can skip the > checking of that and just make it unconditional. Two other issues with this: - It's an atomic64, so seems more appropriate to use the atomic64 helpers for this one. - The unaccount could potentially be a different mm, if the ring is shared and one task sets it up while another tears it down. So we'd need something to ensure consistency here. -- Jens Axboe ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC 2/2] io_uring: report pinned memory usage 2020-06-12 15:19 ` Jens Axboe @ 2020-06-12 15:50 ` Jens Axboe 2020-06-13 4:43 ` Bijan Mottahedeh 1 sibling, 0 replies; 11+ messages in thread From: Jens Axboe @ 2020-06-12 15:50 UTC (permalink / raw) To: Bijan Mottahedeh; +Cc: io-uring On 6/12/20 9:19 AM, Jens Axboe wrote: > On 6/12/20 9:16 AM, Jens Axboe wrote: >> On 6/11/20 8:23 PM, Bijan Mottahedeh wrote: >>> Long term, it makes sense to separate reporting and enforcing of pinned >>> memory usage. >>> >>> Signed-off-by: Bijan Mottahedeh <[email protected]> >>> >>> It is useful to view >>> --- >>> fs/io_uring.c | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/fs/io_uring.c b/fs/io_uring.c >>> index 4248726..cf3acaa 100644 >>> --- a/fs/io_uring.c >>> +++ b/fs/io_uring.c >>> @@ -7080,6 +7080,8 @@ static int io_sq_offload_start(struct io_ring_ctx *ctx, >>> static void io_unaccount_mem(struct user_struct *user, unsigned long nr_pages) >>> { >>> atomic_long_sub(nr_pages, &user->locked_vm); >>> + if (current->mm) >>> + atomic_long_sub(nr_pages, ¤t->mm->pinned_vm); >>> } >>> >>> static int io_account_mem(struct user_struct *user, unsigned long nr_pages) >>> @@ -7096,6 +7098,8 @@ static int io_account_mem(struct user_struct *user, unsigned long nr_pages) >>> return -ENOMEM; >>> } while (atomic_long_cmpxchg(&user->locked_vm, cur_pages, >>> new_pages) != cur_pages); >>> + if (current->mm) >>> + atomic_long_add(nr_pages, ¤t->mm->pinned_vm); >>> >>> return 0; >>> } >> >> current->mm should always be valid for these, so I think you can skip the >> checking of that and just make it unconditional. > > Two other issues with this: > > - It's an atomic64, so seems more appropriate to use the atomic64 helpers > for this one. > - The unaccount could potentially be a different mm, if the ring is shared > and one task sets it up while another tears it down. So we'd need something > to ensure consistency here. IOW, this should just use ctx->sqo_mm, as that's grabbed and valid. Just need to ensure it's done correct in terms of ordering for setup. -- Jens Axboe ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC 2/2] io_uring: report pinned memory usage 2020-06-12 15:19 ` Jens Axboe 2020-06-12 15:50 ` Jens Axboe @ 2020-06-13 4:43 ` Bijan Mottahedeh 2020-06-14 15:57 ` Jens Axboe 1 sibling, 1 reply; 11+ messages in thread From: Bijan Mottahedeh @ 2020-06-13 4:43 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring On 6/12/2020 8:19 AM, Jens Axboe wrote: > On 6/12/20 9:16 AM, Jens Axboe wrote: >> On 6/11/20 8:23 PM, Bijan Mottahedeh wrote: >>> Long term, it makes sense to separate reporting and enforcing of pinned >>> memory usage. >>> >>> Signed-off-by: Bijan Mottahedeh <[email protected]> >>> >>> It is useful to view >>> --- >>> fs/io_uring.c | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/fs/io_uring.c b/fs/io_uring.c >>> index 4248726..cf3acaa 100644 >>> --- a/fs/io_uring.c >>> +++ b/fs/io_uring.c >>> @@ -7080,6 +7080,8 @@ static int io_sq_offload_start(struct io_ring_ctx *ctx, >>> static void io_unaccount_mem(struct user_struct *user, unsigned long nr_pages) >>> { >>> atomic_long_sub(nr_pages, &user->locked_vm); >>> + if (current->mm) >>> + atomic_long_sub(nr_pages, ¤t->mm->pinned_vm); >>> } >>> >>> static int io_account_mem(struct user_struct *user, unsigned long nr_pages) >>> @@ -7096,6 +7098,8 @@ static int io_account_mem(struct user_struct *user, unsigned long nr_pages) >>> return -ENOMEM; >>> } while (atomic_long_cmpxchg(&user->locked_vm, cur_pages, >>> new_pages) != cur_pages); >>> + if (current->mm) >>> + atomic_long_add(nr_pages, ¤t->mm->pinned_vm); >>> >>> return 0; >>> } >> current->mm should always be valid for these, so I think you can skip the >> checking of that and just make it unconditional. > Two other issues with this: > > - It's an atomic64, so seems more appropriate to use the atomic64 helpers > for this one. > - The unaccount could potentially be a different mm, if the ring is shared > and one task sets it up while another tears it down. So we'd need something > to ensure consistency here. > Are you referring to a case where one process creates a ring and sends the ring fd to another process? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC 2/2] io_uring: report pinned memory usage 2020-06-13 4:43 ` Bijan Mottahedeh @ 2020-06-14 15:57 ` Jens Axboe 0 siblings, 0 replies; 11+ messages in thread From: Jens Axboe @ 2020-06-14 15:57 UTC (permalink / raw) To: Bijan Mottahedeh; +Cc: io-uring On 6/12/20 10:43 PM, Bijan Mottahedeh wrote: > On 6/12/2020 8:19 AM, Jens Axboe wrote: >> On 6/12/20 9:16 AM, Jens Axboe wrote: >>> On 6/11/20 8:23 PM, Bijan Mottahedeh wrote: >>>> Long term, it makes sense to separate reporting and enforcing of pinned >>>> memory usage. >>>> >>>> Signed-off-by: Bijan Mottahedeh <[email protected]> >>>> >>>> It is useful to view >>>> --- >>>> fs/io_uring.c | 4 ++++ >>>> 1 file changed, 4 insertions(+) >>>> >>>> diff --git a/fs/io_uring.c b/fs/io_uring.c >>>> index 4248726..cf3acaa 100644 >>>> --- a/fs/io_uring.c >>>> +++ b/fs/io_uring.c >>>> @@ -7080,6 +7080,8 @@ static int io_sq_offload_start(struct io_ring_ctx *ctx, >>>> static void io_unaccount_mem(struct user_struct *user, unsigned long nr_pages) >>>> { >>>> atomic_long_sub(nr_pages, &user->locked_vm); >>>> + if (current->mm) >>>> + atomic_long_sub(nr_pages, ¤t->mm->pinned_vm); >>>> } >>>> >>>> static int io_account_mem(struct user_struct *user, unsigned long nr_pages) >>>> @@ -7096,6 +7098,8 @@ static int io_account_mem(struct user_struct *user, unsigned long nr_pages) >>>> return -ENOMEM; >>>> } while (atomic_long_cmpxchg(&user->locked_vm, cur_pages, >>>> new_pages) != cur_pages); >>>> + if (current->mm) >>>> + atomic_long_add(nr_pages, ¤t->mm->pinned_vm); >>>> >>>> return 0; >>>> } >>> current->mm should always be valid for these, so I think you can skip the >>> checking of that and just make it unconditional. >> Two other issues with this: >> >> - It's an atomic64, so seems more appropriate to use the atomic64 helpers >> for this one. >> - The unaccount could potentially be a different mm, if the ring is shared >> and one task sets it up while another tears it down. So we'd need something >> to ensure consistency here. >> > Are you referring to a case where one process creates a ring and sends > the ring fd to another process? Or a simpler case, where someone has submissions and completions running on separate threads, and it just so happens that the completion side is the one to exit the ring. -- Jens Axboe ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2020-06-14 15:57 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-06-12 2:23 [RFC 0/2] io_uring: disallow overlapping ranges for buffer registration Bijan Mottahedeh 2020-06-12 2:23 ` [RFC 1/2] " Bijan Mottahedeh 2020-06-12 15:16 ` Jens Axboe 2020-06-12 18:22 ` Bijan Mottahedeh 2020-06-14 15:56 ` Jens Axboe 2020-06-12 2:23 ` [RFC 2/2] io_uring: report pinned memory usage Bijan Mottahedeh 2020-06-12 15:16 ` Jens Axboe 2020-06-12 15:19 ` Jens Axboe 2020-06-12 15:50 ` Jens Axboe 2020-06-13 4:43 ` Bijan Mottahedeh 2020-06-14 15:57 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox