* [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() [not found] <[email protected]> @ 2023-04-18 15:49 ` Lorenzo Stoakes 2023-04-18 15:55 ` David Hildenbrand 2023-04-19 16:35 ` Jens Axboe 2023-04-18 15:49 ` [PATCH v4 5/6] mm/gup: remove vmas parameter from pin_user_pages() Lorenzo Stoakes 1 sibling, 2 replies; 27+ messages in thread From: Lorenzo Stoakes @ 2023-04-18 15:49 UTC (permalink / raw) To: linux-mm, linux-kernel, Andrew Morton Cc: Matthew Wilcox, David Hildenbrand, Jens Axboe, Pavel Begunkov, io-uring, Lorenzo Stoakes We are shortly to remove pin_user_pages(), and instead perform the required VMA checks ourselves. In most cases there will be a single VMA so this should caues no undue impact on an already slow path. Doing this eliminates the one instance of vmas being used by pin_user_pages(). Suggested-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Lorenzo Stoakes <[email protected]> --- io_uring/rsrc.c | 55 ++++++++++++++++++++++++++++--------------------- 1 file changed, 31 insertions(+), 24 deletions(-) diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 7a43aed8e395..3a927df9d913 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -1138,12 +1138,37 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages, return ret; } +static int check_vmas_locked(unsigned long addr, unsigned long len) +{ + struct file *file; + VMA_ITERATOR(vmi, current->mm, addr); + struct vm_area_struct *vma = vma_next(&vmi); + unsigned long end = addr + len; + + if (WARN_ON_ONCE(!vma)) + return -EINVAL; + + file = vma->vm_file; + if (file && !is_file_hugepages(file)) + return -EOPNOTSUPP; + + /* don't support file backed memory */ + for_each_vma_range(vmi, vma, end) { + if (vma->vm_file != file) + return -EINVAL; + + if (file && !vma_is_shmem(vma)) + return -EOPNOTSUPP; + } + + return 0; +} + struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages) { unsigned long start, end, nr_pages; - struct vm_area_struct **vmas = NULL; struct page **pages = NULL; - int i, pret, ret = -ENOMEM; + int pret, ret = -ENOMEM; end = (ubuf + len + PAGE_SIZE - 1) >> PAGE_SHIFT; start = ubuf >> PAGE_SHIFT; @@ -1153,31 +1178,14 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages) if (!pages) goto done; - vmas = kvmalloc_array(nr_pages, sizeof(struct vm_area_struct *), - GFP_KERNEL); - if (!vmas) - goto done; - ret = 0; mmap_read_lock(current->mm); + pret = pin_user_pages(ubuf, nr_pages, FOLL_WRITE | FOLL_LONGTERM, - pages, vmas); - if (pret == nr_pages) { - struct file *file = vmas[0]->vm_file; + pages, NULL); - /* don't support file backed memory */ - for (i = 0; i < nr_pages; i++) { - if (vmas[i]->vm_file != file) { - ret = -EINVAL; - break; - } - if (!file) - continue; - if (!vma_is_shmem(vmas[i]) && !is_file_hugepages(file)) { - ret = -EOPNOTSUPP; - break; - } - } + if (pret == nr_pages) { + ret = check_vmas_locked(ubuf, len); *npages = nr_pages; } else { ret = pret < 0 ? pret : -EFAULT; @@ -1194,7 +1202,6 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages) } ret = 0; done: - kvfree(vmas); if (ret < 0) { kvfree(pages); pages = ERR_PTR(ret); -- 2.40.0 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-18 15:49 ` [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() Lorenzo Stoakes @ 2023-04-18 15:55 ` David Hildenbrand 2023-04-18 15:56 ` David Hildenbrand 2023-04-18 16:25 ` Lorenzo Stoakes 2023-04-19 16:35 ` Jens Axboe 1 sibling, 2 replies; 27+ messages in thread From: David Hildenbrand @ 2023-04-18 15:55 UTC (permalink / raw) To: Lorenzo Stoakes, linux-mm, linux-kernel, Andrew Morton Cc: Matthew Wilcox, Jens Axboe, Pavel Begunkov, io-uring On 18.04.23 17:49, Lorenzo Stoakes wrote: > We are shortly to remove pin_user_pages(), and instead perform the required > VMA checks ourselves. In most cases there will be a single VMA so this > should caues no undue impact on an already slow path. > > Doing this eliminates the one instance of vmas being used by > pin_user_pages(). > > Suggested-by: Matthew Wilcox (Oracle) <[email protected]> > Signed-off-by: Lorenzo Stoakes <[email protected]> > --- > io_uring/rsrc.c | 55 ++++++++++++++++++++++++++++--------------------- > 1 file changed, 31 insertions(+), 24 deletions(-) > > diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c > index 7a43aed8e395..3a927df9d913 100644 > --- a/io_uring/rsrc.c > +++ b/io_uring/rsrc.c > @@ -1138,12 +1138,37 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages, > return ret; > } > > +static int check_vmas_locked(unsigned long addr, unsigned long len) TBH, the whole "_locked" suffix is a bit confusing. I was wondering why you'd want to check whether the VMAs are locked ... > +{ > + struct file *file; > + VMA_ITERATOR(vmi, current->mm, addr); > + struct vm_area_struct *vma = vma_next(&vmi); > + unsigned long end = addr + len; > + > + if (WARN_ON_ONCE(!vma)) > + return -EINVAL; > + > + file = vma->vm_file; > + if (file && !is_file_hugepages(file)) > + return -EOPNOTSUPP; You'd now be rejecting vma_is_shmem() here, no? -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-18 15:55 ` David Hildenbrand @ 2023-04-18 15:56 ` David Hildenbrand 2023-04-18 16:25 ` Lorenzo Stoakes 1 sibling, 0 replies; 27+ messages in thread From: David Hildenbrand @ 2023-04-18 15:56 UTC (permalink / raw) To: Lorenzo Stoakes, linux-mm, linux-kernel, Andrew Morton Cc: Matthew Wilcox, Jens Axboe, Pavel Begunkov, io-uring On 18.04.23 17:55, David Hildenbrand wrote: > On 18.04.23 17:49, Lorenzo Stoakes wrote: >> We are shortly to remove pin_user_pages(), and instead perform the required >> VMA checks ourselves. In most cases there will be a single VMA so this >> should caues no undue impact on an already slow path. >> >> Doing this eliminates the one instance of vmas being used by >> pin_user_pages(). >> >> Suggested-by: Matthew Wilcox (Oracle) <[email protected]> >> Signed-off-by: Lorenzo Stoakes <[email protected]> >> --- >> io_uring/rsrc.c | 55 ++++++++++++++++++++++++++++--------------------- >> 1 file changed, 31 insertions(+), 24 deletions(-) >> >> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c >> index 7a43aed8e395..3a927df9d913 100644 >> --- a/io_uring/rsrc.c >> +++ b/io_uring/rsrc.c >> @@ -1138,12 +1138,37 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages, >> return ret; >> } >> >> +static int check_vmas_locked(unsigned long addr, unsigned long len) > > TBH, the whole "_locked" suffix is a bit confusing. > > I was wondering why you'd want to check whether the VMAs are locked ... FWIW, "check_vmas_compatible_locked" might be better. But the "_locked" is still annoying. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-18 15:55 ` David Hildenbrand 2023-04-18 15:56 ` David Hildenbrand @ 2023-04-18 16:25 ` Lorenzo Stoakes 1 sibling, 0 replies; 27+ messages in thread From: Lorenzo Stoakes @ 2023-04-18 16:25 UTC (permalink / raw) To: David Hildenbrand Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox, Jens Axboe, Pavel Begunkov, io-uring On Tue, Apr 18, 2023 at 05:55:48PM +0200, David Hildenbrand wrote: > On 18.04.23 17:49, Lorenzo Stoakes wrote: > > We are shortly to remove pin_user_pages(), and instead perform the required > > VMA checks ourselves. In most cases there will be a single VMA so this > > should caues no undue impact on an already slow path. > > > > Doing this eliminates the one instance of vmas being used by > > pin_user_pages(). > > > > Suggested-by: Matthew Wilcox (Oracle) <[email protected]> > > Signed-off-by: Lorenzo Stoakes <[email protected]> > > --- > > io_uring/rsrc.c | 55 ++++++++++++++++++++++++++++--------------------- > > 1 file changed, 31 insertions(+), 24 deletions(-) > > > > diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c > > index 7a43aed8e395..3a927df9d913 100644 > > --- a/io_uring/rsrc.c > > +++ b/io_uring/rsrc.c > > @@ -1138,12 +1138,37 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages, > > return ret; > > } > > +static int check_vmas_locked(unsigned long addr, unsigned long len) > > TBH, the whole "_locked" suffix is a bit confusing. > > I was wondering why you'd want to check whether the VMAs are locked ... > Yeah it's annoying partly because GUP itself is super inconsistent about it. Idea is to try to indicate that you need to hold mmap_lock obviously... let's drop _locked then since we're inconsistent with it anyway. > > +{ > > + struct file *file; > > + VMA_ITERATOR(vmi, current->mm, addr); > > + struct vm_area_struct *vma = vma_next(&vmi); > > + unsigned long end = addr + len; > > + > > + if (WARN_ON_ONCE(!vma)) > > + return -EINVAL; > > + > > + file = vma->vm_file; > > + if (file && !is_file_hugepages(file)) > > + return -EOPNOTSUPP; > > You'd now be rejecting vma_is_shmem() here, no? > Good spot, I guess I was confused that io_uring would actually want to do that... not sure who'd want to actually mapping some shmem for this purpose! I'll update to make it match the existing code. > > -- > Thanks, > > David / dhildenb > To avoid spam, here's a -fix patch for both:- ----8<---- From 62838d66ee01e631c7c8aa3848b6892d1478c5b6 Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes <[email protected]> Date: Tue, 18 Apr 2023 17:11:01 +0100 Subject: [PATCH] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() Rename function to avoid confusion and correct shmem check as suggested by David. --- io_uring/rsrc.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 3a927df9d913..483b975e31b3 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -1138,7 +1138,7 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages, return ret; } -static int check_vmas_locked(unsigned long addr, unsigned long len) +static int check_vmas_compatible(unsigned long addr, unsigned long len) { struct file *file; VMA_ITERATOR(vmi, current->mm, addr); @@ -1149,15 +1149,16 @@ static int check_vmas_locked(unsigned long addr, unsigned long len) return -EINVAL; file = vma->vm_file; - if (file && !is_file_hugepages(file)) - return -EOPNOTSUPP; /* don't support file backed memory */ for_each_vma_range(vmi, vma, end) { if (vma->vm_file != file) return -EINVAL; - if (file && !vma_is_shmem(vma)) + if (!file) + continue; + + if (!vma_is_shmem(vma) && !is_file_hugepages(file)) return -EOPNOTSUPP; } @@ -1185,7 +1186,7 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages) pages, NULL); if (pret == nr_pages) { - ret = check_vmas_locked(ubuf, len); + ret = check_vmas_compatible(ubuf, len); *npages = nr_pages; } else { ret = pret < 0 ? pret : -EFAULT; -- 2.40.0 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-18 15:49 ` [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() Lorenzo Stoakes 2023-04-18 15:55 ` David Hildenbrand @ 2023-04-19 16:35 ` Jens Axboe 2023-04-19 16:59 ` Jens Axboe 2023-04-19 17:07 ` Lorenzo Stoakes 1 sibling, 2 replies; 27+ messages in thread From: Jens Axboe @ 2023-04-19 16:35 UTC (permalink / raw) To: Lorenzo Stoakes, linux-mm, linux-kernel, Andrew Morton Cc: Matthew Wilcox, David Hildenbrand, Pavel Begunkov, io-uring On 4/18/23 9:49?AM, Lorenzo Stoakes wrote: > We are shortly to remove pin_user_pages(), and instead perform the required > VMA checks ourselves. In most cases there will be a single VMA so this > should caues no undue impact on an already slow path. > > Doing this eliminates the one instance of vmas being used by > pin_user_pages(). First up, please don't just send single patches from a series. It's really annoying when you are trying to get the full picture. Just CC the whole series, so reviews don't have to look it up separately. So when you're doing a respin for what I'll mention below and the issue that David found, please don't just show us patch 4+5 of the series. > diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c > index 7a43aed8e395..3a927df9d913 100644 > --- a/io_uring/rsrc.c > +++ b/io_uring/rsrc.c > @@ -1138,12 +1138,37 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages, > return ret; > } > > +static int check_vmas_locked(unsigned long addr, unsigned long len) > +{ > + struct file *file; > + VMA_ITERATOR(vmi, current->mm, addr); > + struct vm_area_struct *vma = vma_next(&vmi); > + unsigned long end = addr + len; > + > + if (WARN_ON_ONCE(!vma)) > + return -EINVAL; > + > + file = vma->vm_file; > + if (file && !is_file_hugepages(file)) > + return -EOPNOTSUPP; > + > + /* don't support file backed memory */ > + for_each_vma_range(vmi, vma, end) { > + if (vma->vm_file != file) > + return -EINVAL; > + > + if (file && !vma_is_shmem(vma)) > + return -EOPNOTSUPP; > + } > + > + return 0; > +} I really dislike this naming. There's no point to doing locked in the naming here, it just makes people think it's checking whether the vmas are locked. Which is not at all what it does. Because what else would we think, there's nothing else in the name that suggests what it is actually checking. Don't put implied locking in the naming, the way to do that is to do something ala: lockdep_assert_held_read(¤t->mm->mmap_lock); though I don't think it's needed here at all, as there's just one caller and it's clearly inside. You could even just make a comment instead. So please rename this to indicate what it's ACTUALLY checking. -- Jens Axboe ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 16:35 ` Jens Axboe @ 2023-04-19 16:59 ` Jens Axboe 2023-04-19 17:23 ` Lorenzo Stoakes 2023-04-19 17:07 ` Lorenzo Stoakes 1 sibling, 1 reply; 27+ messages in thread From: Jens Axboe @ 2023-04-19 16:59 UTC (permalink / raw) To: Lorenzo Stoakes, linux-mm, linux-kernel, Andrew Morton Cc: Matthew Wilcox, David Hildenbrand, Pavel Begunkov, io-uring On 4/19/23 10:35?AM, Jens Axboe wrote: > On 4/18/23 9:49?AM, Lorenzo Stoakes wrote: >> We are shortly to remove pin_user_pages(), and instead perform the required >> VMA checks ourselves. In most cases there will be a single VMA so this >> should caues no undue impact on an already slow path. >> >> Doing this eliminates the one instance of vmas being used by >> pin_user_pages(). > > First up, please don't just send single patches from a series. It's > really annoying when you are trying to get the full picture. Just CC the > whole series, so reviews don't have to look it up separately. > > So when you're doing a respin for what I'll mention below and the issue > that David found, please don't just show us patch 4+5 of the series. I'll reply here too rather than keep some of this conversaion out-of-band. I don't necessarily think that making io buffer registration dumber and less efficient by needing a separate vma lookup after the fact is a huge deal, as I would imagine most workloads register buffers at setup time and then don't change them. But if people do switch sets at runtime, it's not necessarily a slow path. That said, I suspect the other bits that we do in here, like the GUP, is going to dominate the overhead anyway. My main question is, why don't we just have a __pin_user_pages or something helper that still takes the vmas argument, and drop it from pin_user_pages() only? That'd still allow the cleanup of the other users that don't care about the vma at all, while retaining the bundled functionality for the case/cases that do? That would avoid needing explicit vma iteration in io_uring. -- Jens Axboe ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 16:59 ` Jens Axboe @ 2023-04-19 17:23 ` Lorenzo Stoakes 2023-04-19 17:35 ` Jens Axboe 0 siblings, 1 reply; 27+ messages in thread From: Lorenzo Stoakes @ 2023-04-19 17:23 UTC (permalink / raw) To: Jens Axboe Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox, David Hildenbrand, Pavel Begunkov, io-uring, Jason Gunthorpe On Wed, Apr 19, 2023 at 10:59:27AM -0600, Jens Axboe wrote: > On 4/19/23 10:35?AM, Jens Axboe wrote: > > On 4/18/23 9:49?AM, Lorenzo Stoakes wrote: > >> We are shortly to remove pin_user_pages(), and instead perform the required > >> VMA checks ourselves. In most cases there will be a single VMA so this > >> should caues no undue impact on an already slow path. > >> > >> Doing this eliminates the one instance of vmas being used by > >> pin_user_pages(). > > > > First up, please don't just send single patches from a series. It's > > really annoying when you are trying to get the full picture. Just CC the > > whole series, so reviews don't have to look it up separately. > > > > So when you're doing a respin for what I'll mention below and the issue > > that David found, please don't just show us patch 4+5 of the series. > > I'll reply here too rather than keep some of this conversaion > out-of-band. > > I don't necessarily think that making io buffer registration dumber and > less efficient by needing a separate vma lookup after the fact is a huge > deal, as I would imagine most workloads register buffers at setup time > and then don't change them. But if people do switch sets at runtime, > it's not necessarily a slow path. That said, I suspect the other bits > that we do in here, like the GUP, is going to dominate the overhead > anyway. Thanks, and indeed I expect the GUP will dominate. > > My main question is, why don't we just have a __pin_user_pages or > something helper that still takes the vmas argument, and drop it from > pin_user_pages() only? That'd still allow the cleanup of the other users > that don't care about the vma at all, while retaining the bundled > functionality for the case/cases that do? That would avoid needing > explicit vma iteration in io_uring. > The desire here is to completely eliminate vmas as an externally available parameter from GUP. While we do have a newly introduced helper that returns a VMA, doing the lookup manually for all other vma cases (which look up a single page and vma), that is more so a helper that sits outside of GUP. Having a separate function that still bundled the vmas would essentially undermine the purpose of the series altogether which is not just to clean up some NULL's but rather to eliminate vmas as part of the GUP interface altogether. The reason for this is that by doing so we simplify the GUP interface, eliminate a whole class of possible future bugs with people holding onto pointers to vmas which may dangle and lead the way to future changes in GUP which might be more impactful, such as trying to find means to use the fast paths in more areas with an eye to gradual eradication of the use of mmap_lock. While we return VMAs, none of this is possible and it also makes the interface more confusing - without vmas GUP takes flags which define its behaviour and in most cases returns page objects. The odd rules about what can and cannot return vmas under what circumstances are not helpful for new users. Another point here is that Jason suggested adding a new FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag which would, by default, not be set. This could assert that only shmem/hugetlb file mappings are permitted which would eliminate the need for you to perform this check at all. This leads into the larger point that GUP-writing file mappings is fundamentally broken due to e.g. GUP not honouring write notify so this check should at least in theory not be necessary. So it may be the case that should such a flag be added this code will simply be deleted at a future point :) > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 17:23 ` Lorenzo Stoakes @ 2023-04-19 17:35 ` Jens Axboe 2023-04-19 17:47 ` Lorenzo Stoakes 0 siblings, 1 reply; 27+ messages in thread From: Jens Axboe @ 2023-04-19 17:35 UTC (permalink / raw) To: Lorenzo Stoakes Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox, David Hildenbrand, Pavel Begunkov, io-uring, Jason Gunthorpe On 4/19/23 11:23?AM, Lorenzo Stoakes wrote: > On Wed, Apr 19, 2023 at 10:59:27AM -0600, Jens Axboe wrote: >> On 4/19/23 10:35?AM, Jens Axboe wrote: >>> On 4/18/23 9:49?AM, Lorenzo Stoakes wrote: >>>> We are shortly to remove pin_user_pages(), and instead perform the required >>>> VMA checks ourselves. In most cases there will be a single VMA so this >>>> should caues no undue impact on an already slow path. >>>> >>>> Doing this eliminates the one instance of vmas being used by >>>> pin_user_pages(). >>> >>> First up, please don't just send single patches from a series. It's >>> really annoying when you are trying to get the full picture. Just CC the >>> whole series, so reviews don't have to look it up separately. >>> >>> So when you're doing a respin for what I'll mention below and the issue >>> that David found, please don't just show us patch 4+5 of the series. >> >> I'll reply here too rather than keep some of this conversaion >> out-of-band. >> >> I don't necessarily think that making io buffer registration dumber and >> less efficient by needing a separate vma lookup after the fact is a huge >> deal, as I would imagine most workloads register buffers at setup time >> and then don't change them. But if people do switch sets at runtime, >> it's not necessarily a slow path. That said, I suspect the other bits >> that we do in here, like the GUP, is going to dominate the overhead >> anyway. > > Thanks, and indeed I expect the GUP will dominate. Unless you have a lot of vmas... Point is, it's _probably_ not a problem, but it might and it's making things worse for no real gain as far as I can tell outside of some notion of "cleaning up the code". >> My main question is, why don't we just have a __pin_user_pages or >> something helper that still takes the vmas argument, and drop it from >> pin_user_pages() only? That'd still allow the cleanup of the other users >> that don't care about the vma at all, while retaining the bundled >> functionality for the case/cases that do? That would avoid needing >> explicit vma iteration in io_uring. >> > > The desire here is to completely eliminate vmas as an externally available > parameter from GUP. While we do have a newly introduced helper that returns > a VMA, doing the lookup manually for all other vma cases (which look up a > single page and vma), that is more so a helper that sits outside of GUP. > > Having a separate function that still bundled the vmas would essentially > undermine the purpose of the series altogether which is not just to clean > up some NULL's but rather to eliminate vmas as part of the GUP interface > altogether. > > The reason for this is that by doing so we simplify the GUP interface, > eliminate a whole class of possible future bugs with people holding onto > pointers to vmas which may dangle and lead the way to future changes in GUP > which might be more impactful, such as trying to find means to use the fast > paths in more areas with an eye to gradual eradication of the use of > mmap_lock. > > While we return VMAs, none of this is possible and it also makes the > interface more confusing - without vmas GUP takes flags which define its > behaviour and in most cases returns page objects. The odd rules about what > can and cannot return vmas under what circumstances are not helpful for new > users. > > Another point here is that Jason suggested adding a new > FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag which would, by default, not be > set. This could assert that only shmem/hugetlb file mappings are permitted > which would eliminate the need for you to perform this check at all. > > This leads into the larger point that GUP-writing file mappings is > fundamentally broken due to e.g. GUP not honouring write notify so this > check should at least in theory not be necessary. > > So it may be the case that should such a flag be added this code will > simply be deleted at a future point :) Why don't we do that first then? There's nothing more permanent than a temporary workaround/fix. Once it's in there, motivation to get rid of it for most people is zero because they just never see it. Seems like that'd be a much saner approach rather than the other way around, and make this patchset simpler/cleaner too as it'd only be removing code in all of the callers. -- Jens Axboe ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 17:35 ` Jens Axboe @ 2023-04-19 17:47 ` Lorenzo Stoakes 2023-04-19 17:51 ` Jens Axboe 0 siblings, 1 reply; 27+ messages in thread From: Lorenzo Stoakes @ 2023-04-19 17:47 UTC (permalink / raw) To: Jens Axboe Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox, David Hildenbrand, Pavel Begunkov, io-uring, Jason Gunthorpe On Wed, Apr 19, 2023 at 11:35:58AM -0600, Jens Axboe wrote: > On 4/19/23 11:23?AM, Lorenzo Stoakes wrote: > > On Wed, Apr 19, 2023 at 10:59:27AM -0600, Jens Axboe wrote: > >> On 4/19/23 10:35?AM, Jens Axboe wrote: > >>> On 4/18/23 9:49?AM, Lorenzo Stoakes wrote: > >>>> We are shortly to remove pin_user_pages(), and instead perform the required > >>>> VMA checks ourselves. In most cases there will be a single VMA so this > >>>> should caues no undue impact on an already slow path. > >>>> > >>>> Doing this eliminates the one instance of vmas being used by > >>>> pin_user_pages(). > >>> > >>> First up, please don't just send single patches from a series. It's > >>> really annoying when you are trying to get the full picture. Just CC the > >>> whole series, so reviews don't have to look it up separately. > >>> > >>> So when you're doing a respin for what I'll mention below and the issue > >>> that David found, please don't just show us patch 4+5 of the series. > >> > >> I'll reply here too rather than keep some of this conversaion > >> out-of-band. > >> > >> I don't necessarily think that making io buffer registration dumber and > >> less efficient by needing a separate vma lookup after the fact is a huge > >> deal, as I would imagine most workloads register buffers at setup time > >> and then don't change them. But if people do switch sets at runtime, > >> it's not necessarily a slow path. That said, I suspect the other bits > >> that we do in here, like the GUP, is going to dominate the overhead > >> anyway. > > > > Thanks, and indeed I expect the GUP will dominate. > > Unless you have a lot of vmas... Point is, it's _probably_ not a > problem, but it might and it's making things worse for no real gain as > far as I can tell outside of some notion of "cleaning up the code". > > >> My main question is, why don't we just have a __pin_user_pages or > >> something helper that still takes the vmas argument, and drop it from > >> pin_user_pages() only? That'd still allow the cleanup of the other users > >> that don't care about the vma at all, while retaining the bundled > >> functionality for the case/cases that do? That would avoid needing > >> explicit vma iteration in io_uring. > >> > > > > The desire here is to completely eliminate vmas as an externally available > > parameter from GUP. While we do have a newly introduced helper that returns > > a VMA, doing the lookup manually for all other vma cases (which look up a > > single page and vma), that is more so a helper that sits outside of GUP. > > > > Having a separate function that still bundled the vmas would essentially > > undermine the purpose of the series altogether which is not just to clean > > up some NULL's but rather to eliminate vmas as part of the GUP interface > > altogether. > > > > The reason for this is that by doing so we simplify the GUP interface, > > eliminate a whole class of possible future bugs with people holding onto > > pointers to vmas which may dangle and lead the way to future changes in GUP > > which might be more impactful, such as trying to find means to use the fast > > paths in more areas with an eye to gradual eradication of the use of > > mmap_lock. > > > > While we return VMAs, none of this is possible and it also makes the > > interface more confusing - without vmas GUP takes flags which define its > > behaviour and in most cases returns page objects. The odd rules about what > > can and cannot return vmas under what circumstances are not helpful for new > > users. > > > > Another point here is that Jason suggested adding a new > > FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag which would, by default, not be > > set. This could assert that only shmem/hugetlb file mappings are permitted > > which would eliminate the need for you to perform this check at all. > > > > This leads into the larger point that GUP-writing file mappings is > > fundamentally broken due to e.g. GUP not honouring write notify so this > > check should at least in theory not be necessary. > > > > So it may be the case that should such a flag be added this code will > > simply be deleted at a future point :) > > Why don't we do that first then? There's nothing more permanent than a > temporary workaround/fix. Once it's in there, motivation to get rid of > it for most people is zero because they just never see it. Seems like > that'd be a much saner approach rather than the other way around, and > make this patchset simpler/cleaner too as it'd only be removing code in > all of the callers. > Because I'd then need to audit all GUP callers to see whether they in some way brokenly access files in order to know which should and should not use this new flag. It'd change this series from 'remove the vmas parameter' to something a lot more involved. I think it's much safer to do the two separately, as I feel that change would need quite a bit of scrutiny too. As for temporary, I can assure you I will be looking at introducing this flag, for what it's worth :) and Jason is certainly minded to do work in this area also. > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 17:47 ` Lorenzo Stoakes @ 2023-04-19 17:51 ` Jens Axboe 2023-04-19 18:18 ` Lorenzo Stoakes 0 siblings, 1 reply; 27+ messages in thread From: Jens Axboe @ 2023-04-19 17:51 UTC (permalink / raw) To: Lorenzo Stoakes Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox, David Hildenbrand, Pavel Begunkov, io-uring, Jason Gunthorpe On 4/19/23 11:47?AM, Lorenzo Stoakes wrote: > On Wed, Apr 19, 2023 at 11:35:58AM -0600, Jens Axboe wrote: >> On 4/19/23 11:23?AM, Lorenzo Stoakes wrote: >>> On Wed, Apr 19, 2023 at 10:59:27AM -0600, Jens Axboe wrote: >>>> On 4/19/23 10:35?AM, Jens Axboe wrote: >>>>> On 4/18/23 9:49?AM, Lorenzo Stoakes wrote: >>>>>> We are shortly to remove pin_user_pages(), and instead perform the required >>>>>> VMA checks ourselves. In most cases there will be a single VMA so this >>>>>> should caues no undue impact on an already slow path. >>>>>> >>>>>> Doing this eliminates the one instance of vmas being used by >>>>>> pin_user_pages(). >>>>> >>>>> First up, please don't just send single patches from a series. It's >>>>> really annoying when you are trying to get the full picture. Just CC the >>>>> whole series, so reviews don't have to look it up separately. >>>>> >>>>> So when you're doing a respin for what I'll mention below and the issue >>>>> that David found, please don't just show us patch 4+5 of the series. >>>> >>>> I'll reply here too rather than keep some of this conversaion >>>> out-of-band. >>>> >>>> I don't necessarily think that making io buffer registration dumber and >>>> less efficient by needing a separate vma lookup after the fact is a huge >>>> deal, as I would imagine most workloads register buffers at setup time >>>> and then don't change them. But if people do switch sets at runtime, >>>> it's not necessarily a slow path. That said, I suspect the other bits >>>> that we do in here, like the GUP, is going to dominate the overhead >>>> anyway. >>> >>> Thanks, and indeed I expect the GUP will dominate. >> >> Unless you have a lot of vmas... Point is, it's _probably_ not a >> problem, but it might and it's making things worse for no real gain as >> far as I can tell outside of some notion of "cleaning up the code". >> >>>> My main question is, why don't we just have a __pin_user_pages or >>>> something helper that still takes the vmas argument, and drop it from >>>> pin_user_pages() only? That'd still allow the cleanup of the other users >>>> that don't care about the vma at all, while retaining the bundled >>>> functionality for the case/cases that do? That would avoid needing >>>> explicit vma iteration in io_uring. >>>> >>> >>> The desire here is to completely eliminate vmas as an externally available >>> parameter from GUP. While we do have a newly introduced helper that returns >>> a VMA, doing the lookup manually for all other vma cases (which look up a >>> single page and vma), that is more so a helper that sits outside of GUP. >>> >>> Having a separate function that still bundled the vmas would essentially >>> undermine the purpose of the series altogether which is not just to clean >>> up some NULL's but rather to eliminate vmas as part of the GUP interface >>> altogether. >>> >>> The reason for this is that by doing so we simplify the GUP interface, >>> eliminate a whole class of possible future bugs with people holding onto >>> pointers to vmas which may dangle and lead the way to future changes in GUP >>> which might be more impactful, such as trying to find means to use the fast >>> paths in more areas with an eye to gradual eradication of the use of >>> mmap_lock. >>> >>> While we return VMAs, none of this is possible and it also makes the >>> interface more confusing - without vmas GUP takes flags which define its >>> behaviour and in most cases returns page objects. The odd rules about what >>> can and cannot return vmas under what circumstances are not helpful for new >>> users. >>> >>> Another point here is that Jason suggested adding a new >>> FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag which would, by default, not be >>> set. This could assert that only shmem/hugetlb file mappings are permitted >>> which would eliminate the need for you to perform this check at all. >>> >>> This leads into the larger point that GUP-writing file mappings is >>> fundamentally broken due to e.g. GUP not honouring write notify so this >>> check should at least in theory not be necessary. >>> >>> So it may be the case that should such a flag be added this code will >>> simply be deleted at a future point :) >> >> Why don't we do that first then? There's nothing more permanent than a >> temporary workaround/fix. Once it's in there, motivation to get rid of >> it for most people is zero because they just never see it. Seems like >> that'd be a much saner approach rather than the other way around, and >> make this patchset simpler/cleaner too as it'd only be removing code in >> all of the callers. >> > > Because I'd then need to audit all GUP callers to see whether they in some > way brokenly access files in order to know which should and should not use > this new flag. It'd change this series from 'remove the vmas parameter' to > something a lot more involved. > > I think it's much safer to do the two separately, as I feel that change > would need quite a bit of scrutiny too. > > As for temporary, I can assure you I will be looking at introducing this > flag, for what it's worth :) and Jason is certainly minded to do work in > this area also. It's either feasible or it's not, and it didn't sound too bad in terms of getting it done to remove the temporary addition. Since we're now days away from the merge window and any of this would need to soak in for-next anyway for a bit, why not just do that other series first? It really is backward. And this happens sometimes when developing patchsets, at some point you realize that things would be easier/cleaner with another prep series first. Nothing wrong with that, but let's not be hesitant to shift direction a bit when it makes sense to do so. I keep getting this sense of urgency for a cleanup series. Why not just do it right from the get-go and make this series simpler? At that point there would be no discussion on it at all, as it would be a straight forward cleanup without adding an intermediate step that'd get deleted later anyway. -- Jens Axboe ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 17:51 ` Jens Axboe @ 2023-04-19 18:18 ` Lorenzo Stoakes 2023-04-19 18:22 ` Jason Gunthorpe 2023-04-19 18:23 ` Matthew Wilcox 0 siblings, 2 replies; 27+ messages in thread From: Lorenzo Stoakes @ 2023-04-19 18:18 UTC (permalink / raw) To: Jens Axboe Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox, David Hildenbrand, Pavel Begunkov, io-uring, Jason Gunthorpe On Wed, Apr 19, 2023 at 11:51:29AM -0600, Jens Axboe wrote: > On 4/19/23 11:47?AM, Lorenzo Stoakes wrote: > > On Wed, Apr 19, 2023 at 11:35:58AM -0600, Jens Axboe wrote: > >> On 4/19/23 11:23?AM, Lorenzo Stoakes wrote: > >>> On Wed, Apr 19, 2023 at 10:59:27AM -0600, Jens Axboe wrote: > >>>> On 4/19/23 10:35?AM, Jens Axboe wrote: > >>>>> On 4/18/23 9:49?AM, Lorenzo Stoakes wrote: > >>>>>> We are shortly to remove pin_user_pages(), and instead perform the required > >>>>>> VMA checks ourselves. In most cases there will be a single VMA so this > >>>>>> should caues no undue impact on an already slow path. > >>>>>> > >>>>>> Doing this eliminates the one instance of vmas being used by > >>>>>> pin_user_pages(). > >>>>> > >>>>> First up, please don't just send single patches from a series. It's > >>>>> really annoying when you are trying to get the full picture. Just CC the > >>>>> whole series, so reviews don't have to look it up separately. > >>>>> > >>>>> So when you're doing a respin for what I'll mention below and the issue > >>>>> that David found, please don't just show us patch 4+5 of the series. > >>>> > >>>> I'll reply here too rather than keep some of this conversaion > >>>> out-of-band. > >>>> > >>>> I don't necessarily think that making io buffer registration dumber and > >>>> less efficient by needing a separate vma lookup after the fact is a huge > >>>> deal, as I would imagine most workloads register buffers at setup time > >>>> and then don't change them. But if people do switch sets at runtime, > >>>> it's not necessarily a slow path. That said, I suspect the other bits > >>>> that we do in here, like the GUP, is going to dominate the overhead > >>>> anyway. > >>> > >>> Thanks, and indeed I expect the GUP will dominate. > >> > >> Unless you have a lot of vmas... Point is, it's _probably_ not a > >> problem, but it might and it's making things worse for no real gain as > >> far as I can tell outside of some notion of "cleaning up the code". > >> > >>>> My main question is, why don't we just have a __pin_user_pages or > >>>> something helper that still takes the vmas argument, and drop it from > >>>> pin_user_pages() only? That'd still allow the cleanup of the other users > >>>> that don't care about the vma at all, while retaining the bundled > >>>> functionality for the case/cases that do? That would avoid needing > >>>> explicit vma iteration in io_uring. > >>>> > >>> > >>> The desire here is to completely eliminate vmas as an externally available > >>> parameter from GUP. While we do have a newly introduced helper that returns > >>> a VMA, doing the lookup manually for all other vma cases (which look up a > >>> single page and vma), that is more so a helper that sits outside of GUP. > >>> > >>> Having a separate function that still bundled the vmas would essentially > >>> undermine the purpose of the series altogether which is not just to clean > >>> up some NULL's but rather to eliminate vmas as part of the GUP interface > >>> altogether. > >>> > >>> The reason for this is that by doing so we simplify the GUP interface, > >>> eliminate a whole class of possible future bugs with people holding onto > >>> pointers to vmas which may dangle and lead the way to future changes in GUP > >>> which might be more impactful, such as trying to find means to use the fast > >>> paths in more areas with an eye to gradual eradication of the use of > >>> mmap_lock. > >>> > >>> While we return VMAs, none of this is possible and it also makes the > >>> interface more confusing - without vmas GUP takes flags which define its > >>> behaviour and in most cases returns page objects. The odd rules about what > >>> can and cannot return vmas under what circumstances are not helpful for new > >>> users. > >>> > >>> Another point here is that Jason suggested adding a new > >>> FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag which would, by default, not be > >>> set. This could assert that only shmem/hugetlb file mappings are permitted > >>> which would eliminate the need for you to perform this check at all. > >>> > >>> This leads into the larger point that GUP-writing file mappings is > >>> fundamentally broken due to e.g. GUP not honouring write notify so this > >>> check should at least in theory not be necessary. > >>> > >>> So it may be the case that should such a flag be added this code will > >>> simply be deleted at a future point :) > >> > >> Why don't we do that first then? There's nothing more permanent than a > >> temporary workaround/fix. Once it's in there, motivation to get rid of > >> it for most people is zero because they just never see it. Seems like > >> that'd be a much saner approach rather than the other way around, and > >> make this patchset simpler/cleaner too as it'd only be removing code in > >> all of the callers. > >> > > > > Because I'd then need to audit all GUP callers to see whether they in some > > way brokenly access files in order to know which should and should not use > > this new flag. It'd change this series from 'remove the vmas parameter' to > > something a lot more involved. > > > > I think it's much safer to do the two separately, as I feel that change > > would need quite a bit of scrutiny too. > > > > As for temporary, I can assure you I will be looking at introducing this > > flag, for what it's worth :) and Jason is certainly minded to do work in > > this area also. > > It's either feasible or it's not, and it didn't sound too bad in terms > of getting it done to remove the temporary addition. Since we're now > days away from the merge window and any of this would need to soak in > for-next anyway for a bit, why not just do that other series first? It > really is backward. And this happens sometimes when developing > patchsets, at some point you realize that things would be easier/cleaner > with another prep series first. Nothing wrong with that, but let's not > be hesitant to shift direction a bit when it makes sense to do so. > > I keep getting this sense of urgency for a cleanup series. Why not just > do it right from the get-go and make this series simpler? At that point > there would be no discussion on it at all, as it would be a straight > forward cleanup without adding an intermediate step that'd get deleted > later anyway. As I said, I think it is a little more than a cleanup, or at the very least a cleanup that leads the way to more functional changes (and eliminates a class of bugs). I'd also argue that we are doing things right with this patch series as-is, io_uring is the only sticking point because, believe it or not, it is the only place in the kernel that uses multiple vmas (it has been interesting to get a view on GUP use as a whole here). But obviously if you have concerns about performance I understand (note that the actual first iteration of this patch set added a flag specifically to avoid the need for this in order to cause you less trouble :) I would argue that you're _already_ manipulating VMAs (I do understand your desire not to do so obviously) the only thing that would change is how you get them and duplicated work (though likely less impactful). So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I would still need to come along and delete a bunch of your code afterwards. And unfortunately Pavel's recent change which insists on not having different vm_file's across VMAs for the buffer would have to be reverted so I expect it might not be entirely without discussion. However, if you really do feel that you can't accept this change as-is, I can put this series on hold and look at FOLL_ALLOW_BROKEN_FILE_MAPPING and we can return to this afterwards. > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 18:18 ` Lorenzo Stoakes @ 2023-04-19 18:22 ` Jason Gunthorpe 2023-04-19 18:50 ` Lorenzo Stoakes 2023-04-19 18:23 ` Matthew Wilcox 1 sibling, 1 reply; 27+ messages in thread From: Jason Gunthorpe @ 2023-04-19 18:22 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Jens Axboe, linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox, David Hildenbrand, Pavel Begunkov, io-uring On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote: > I'd also argue that we are doing things right with this patch series as-is, > io_uring is the only sticking point because, believe it or not, it is the > only place in the kernel that uses multiple vmas (it has been interesting > to get a view on GUP use as a whole here). I would say io_uring is the only place trying to open-code bug fixes for MM problems :\ As Jens says, these sorts of temporary work arounds become lingering problems that nobody wants to fix properly. > So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I > would still need to come along and delete a bunch of your code > afterwards. And unfortunately Pavel's recent change which insists on not > having different vm_file's across VMAs for the buffer would have to be > reverted so I expect it might not be entirely without discussion. Yeah, that should just be reverted. > However, if you really do feel that you can't accept this change as-is, I > can put this series on hold and look at FOLL_ALLOW_BROKEN_FILE_MAPPING and > we can return to this afterwards. It is probably not as bad as you think, I suspect only RDMA really wants to set the flag. Maybe something in media too, maybe. Then again that stuff doesn't work so incredibly badly maybe there really is no user of it and we should just block it completely. Jason ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 18:22 ` Jason Gunthorpe @ 2023-04-19 18:50 ` Lorenzo Stoakes 0 siblings, 0 replies; 27+ messages in thread From: Lorenzo Stoakes @ 2023-04-19 18:50 UTC (permalink / raw) To: Jason Gunthorpe Cc: Jens Axboe, linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox, David Hildenbrand, Pavel Begunkov, io-uring On Wed, Apr 19, 2023 at 03:22:19PM -0300, Jason Gunthorpe wrote: > On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote: > > > I'd also argue that we are doing things right with this patch series as-is, > > io_uring is the only sticking point because, believe it or not, it is the > > only place in the kernel that uses multiple vmas (it has been interesting > > to get a view on GUP use as a whole here). > > I would say io_uring is the only place trying to open-code bug fixes > for MM problems :\ As Jens says, these sorts of temporary work arounds become > lingering problems that nobody wants to fix properly. > > > So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I > > would still need to come along and delete a bunch of your code > > afterwards. And unfortunately Pavel's recent change which insists on not > > having different vm_file's across VMAs for the buffer would have to be > > reverted so I expect it might not be entirely without discussion. > > Yeah, that should just be reverted. > > > However, if you really do feel that you can't accept this change as-is, I > > can put this series on hold and look at FOLL_ALLOW_BROKEN_FILE_MAPPING and > > we can return to this afterwards. > > It is probably not as bad as you think, I suspect only RDMA really > wants to set the flag. Maybe something in media too, maybe. > > Then again that stuff doesn't work so incredibly badly maybe there > really is no user of it and we should just block it completely. > > Jason OK in this case I think we're all agreed that it's best to do this first then revisit this series afterwards. I will switch to working on this! And I will make sure to cc- everyone in :) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 18:18 ` Lorenzo Stoakes 2023-04-19 18:22 ` Jason Gunthorpe @ 2023-04-19 18:23 ` Matthew Wilcox 2023-04-19 18:24 ` Jason Gunthorpe 1 sibling, 1 reply; 27+ messages in thread From: Matthew Wilcox @ 2023-04-19 18:23 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Jens Axboe, linux-mm, linux-kernel, Andrew Morton, David Hildenbrand, Pavel Begunkov, io-uring, Jason Gunthorpe On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote: > So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I > would still need to come along and delete a bunch of your code > afterwards. And unfortunately Pavel's recent change which insists on not > having different vm_file's across VMAs for the buffer would have to be > reverted so I expect it might not be entirely without discussion. I don't even understand why Pavel wanted to make this change. The commit log really doesn't say. commit edd478269640 Author: Pavel Begunkov <[email protected]> Date: Wed Feb 22 14:36:48 2023 +0000 io_uring/rsrc: disallow multi-source reg buffers If two or more mappings go back to back to each other they can be passed into io_uring to be registered as a single registered buffer. That would even work if mappings came from different sources, e.g. it's possible to mix in this way anon pages and pages from shmem or hugetlb. That is not a problem but it'd rather be less prone if we forbid such mixing. Cc: <[email protected]> Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]> It even says "That is not a problem"! So why was this patch merged if it's not fixing a problem? It's now standing in the way of an actual cleanup. So why don't we revert it? There must be more to it than this ... ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 18:23 ` Matthew Wilcox @ 2023-04-19 18:24 ` Jason Gunthorpe 2023-04-19 18:35 ` Matthew Wilcox 2023-04-19 20:15 ` Jens Axboe 0 siblings, 2 replies; 27+ messages in thread From: Jason Gunthorpe @ 2023-04-19 18:24 UTC (permalink / raw) To: Matthew Wilcox Cc: Lorenzo Stoakes, Jens Axboe, linux-mm, linux-kernel, Andrew Morton, David Hildenbrand, Pavel Begunkov, io-uring On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote: > On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote: > > So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I > > would still need to come along and delete a bunch of your code > > afterwards. And unfortunately Pavel's recent change which insists on not > > having different vm_file's across VMAs for the buffer would have to be > > reverted so I expect it might not be entirely without discussion. > > I don't even understand why Pavel wanted to make this change. The > commit log really doesn't say. > > commit edd478269640 > Author: Pavel Begunkov <[email protected]> > Date: Wed Feb 22 14:36:48 2023 +0000 > > io_uring/rsrc: disallow multi-source reg buffers > > If two or more mappings go back to back to each other they can be passed > into io_uring to be registered as a single registered buffer. That would > even work if mappings came from different sources, e.g. it's possible to > mix in this way anon pages and pages from shmem or hugetlb. That is not > a problem but it'd rather be less prone if we forbid such mixing. > > Cc: <[email protected]> > Signed-off-by: Pavel Begunkov <[email protected]> > Signed-off-by: Jens Axboe <[email protected]> > > It even says "That is not a problem"! So why was this patch merged > if it's not fixing a problem? > > It's now standing in the way of an actual cleanup. So why don't we > revert it? There must be more to it than this ... https://lore.kernel.org/all/[email protected]/ Jason ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 18:24 ` Jason Gunthorpe @ 2023-04-19 18:35 ` Matthew Wilcox 2023-04-19 18:45 ` Lorenzo Stoakes 2023-04-20 13:36 ` Pavel Begunkov 2023-04-19 20:15 ` Jens Axboe 1 sibling, 2 replies; 27+ messages in thread From: Matthew Wilcox @ 2023-04-19 18:35 UTC (permalink / raw) To: Jason Gunthorpe Cc: Lorenzo Stoakes, Jens Axboe, linux-mm, linux-kernel, Andrew Morton, David Hildenbrand, Pavel Begunkov, io-uring On Wed, Apr 19, 2023 at 03:24:55PM -0300, Jason Gunthorpe wrote: > On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote: > > On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote: > > > So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I > > > would still need to come along and delete a bunch of your code > > > afterwards. And unfortunately Pavel's recent change which insists on not > > > having different vm_file's across VMAs for the buffer would have to be > > > reverted so I expect it might not be entirely without discussion. > > > > I don't even understand why Pavel wanted to make this change. The > > commit log really doesn't say. > > > > commit edd478269640 > > Author: Pavel Begunkov <[email protected]> > > Date: Wed Feb 22 14:36:48 2023 +0000 > > > > io_uring/rsrc: disallow multi-source reg buffers > > > > If two or more mappings go back to back to each other they can be passed > > into io_uring to be registered as a single registered buffer. That would > > even work if mappings came from different sources, e.g. it's possible to > > mix in this way anon pages and pages from shmem or hugetlb. That is not > > a problem but it'd rather be less prone if we forbid such mixing. > > > > Cc: <[email protected]> > > Signed-off-by: Pavel Begunkov <[email protected]> > > Signed-off-by: Jens Axboe <[email protected]> > > > > It even says "That is not a problem"! So why was this patch merged > > if it's not fixing a problem? > > > > It's now standing in the way of an actual cleanup. So why don't we > > revert it? There must be more to it than this ... > > https://lore.kernel.org/all/[email protected]/ So um, it's disallowed because Pavel couldn't understand why it should be allowed? This gets less and less convincing. FWIW, what I was suggesting was that we should have a FOLL_SINGLE_VMA flag, which would use our shiny new VMA lock infrastructure to look up and lock _one_ VMA instead of having the caller take the mmap_lock. Passing that flag would be a tighter restriction that Pavel implemented, but would certainly relieve some of his mental load. By the way, even if all pages are from the same VMA, they may still be a mixture of anon and file pages; think a MAP_PRIVATE of a file when only some pages have been written to. Or an anon MAP_SHARED which is accessible by a child process. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 18:35 ` Matthew Wilcox @ 2023-04-19 18:45 ` Lorenzo Stoakes 2023-04-19 23:22 ` Jason Gunthorpe 2023-04-20 13:36 ` Pavel Begunkov 1 sibling, 1 reply; 27+ messages in thread From: Lorenzo Stoakes @ 2023-04-19 18:45 UTC (permalink / raw) To: Matthew Wilcox Cc: Jason Gunthorpe, Jens Axboe, linux-mm, linux-kernel, Andrew Morton, David Hildenbrand, Pavel Begunkov, io-uring On Wed, Apr 19, 2023 at 07:35:33PM +0100, Matthew Wilcox wrote: > On Wed, Apr 19, 2023 at 03:24:55PM -0300, Jason Gunthorpe wrote: > > On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote: > > > On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote: > > > > So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I > > > > would still need to come along and delete a bunch of your code > > > > afterwards. And unfortunately Pavel's recent change which insists on not > > > > having different vm_file's across VMAs for the buffer would have to be > > > > reverted so I expect it might not be entirely without discussion. > > > > > > I don't even understand why Pavel wanted to make this change. The > > > commit log really doesn't say. > > > > > > commit edd478269640 > > > Author: Pavel Begunkov <[email protected]> > > > Date: Wed Feb 22 14:36:48 2023 +0000 > > > > > > io_uring/rsrc: disallow multi-source reg buffers > > > > > > If two or more mappings go back to back to each other they can be passed > > > into io_uring to be registered as a single registered buffer. That would > > > even work if mappings came from different sources, e.g. it's possible to > > > mix in this way anon pages and pages from shmem or hugetlb. That is not > > > a problem but it'd rather be less prone if we forbid such mixing. > > > > > > Cc: <[email protected]> > > > Signed-off-by: Pavel Begunkov <[email protected]> > > > Signed-off-by: Jens Axboe <[email protected]> > > > > > > It even says "That is not a problem"! So why was this patch merged > > > if it's not fixing a problem? > > > > > > It's now standing in the way of an actual cleanup. So why don't we > > > revert it? There must be more to it than this ... > > > > https://lore.kernel.org/all/[email protected]/ > > So um, it's disallowed because Pavel couldn't understand why it > should be allowed? This gets less and less convincing. > > FWIW, what I was suggesting was that we should have a FOLL_SINGLE_VMA > flag, which would use our shiny new VMA lock infrastructure to look > up and lock _one_ VMA instead of having the caller take the mmap_lock. > Passing that flag would be a tighter restriction that Pavel implemented, > but would certainly relieve some of his mental load. > > By the way, even if all pages are from the same VMA, they may still be a > mixture of anon and file pages; think a MAP_PRIVATE of a file when > only some pages have been written to. Or an anon MAP_SHARED which is > accessible by a child process. Indeed, my motive for the series came out of a conversation with you about vmas being odd (thanks! :), however I did end up feeling FOLL_SINGLE_VMA would be too restricted and would break the uAPI. For example, imagine if a user (yes it'd be weird) mlock'd some pages in a buffer and not others, then we'd break their use case. Also (perhaps?) more feasibly, a user might mix hugetlb and anon pages. So I think that'd be too restrictive here. However the idea of just essentially taking what Jens has had to do open-coded and putting it into GUP as a whole really feels like the right thing to do. I do like the idea of a FOLL_SINGLE_VMA for other use cases though, the majority of which want one and one page only. Perhaps worth taking the helper added in this series (get_user_page_vma_remote() from [1]) and replacing it with an a full GUP function which has an interface explicitly for this common single page/vma case. [1]:https://lore.kernel.org/linux-mm/7c6f1ae88320bf11d2f583178a3d9e653e06ac63.1681831798.git.lstoakes@gmail.com/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 18:45 ` Lorenzo Stoakes @ 2023-04-19 23:22 ` Jason Gunthorpe 2023-04-20 13:57 ` Lorenzo Stoakes 0 siblings, 1 reply; 27+ messages in thread From: Jason Gunthorpe @ 2023-04-19 23:22 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Andrew Morton, David Hildenbrand, Pavel Begunkov, io-uring On Wed, Apr 19, 2023 at 07:45:06PM +0100, Lorenzo Stoakes wrote: > For example, imagine if a user (yes it'd be weird) mlock'd some pages in a > buffer and not others, then we'd break their use case. Also (perhaps?) more > feasibly, a user might mix hugetlb and anon pages. So I think that'd be too > restrictive here. Yeah, I agree we should not add a broad single-vma restriction to GUP. It turns any split of a VMA into a potentially uABI breaking change and we just don't need that headache in the mm.. > I do like the idea of a FOLL_SINGLE_VMA for other use cases though, the > majority of which want one and one page only. Perhaps worth taking the > helper added in this series (get_user_page_vma_remote() from [1]) and > replacing it with an a full GUP function which has an interface explicitly > for this common single page/vma case. Like I showed in another thread a function signature that can only do one page and also returns the VMA would force it to be used properly and we don't need a FOLL flag. Jason ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 23:22 ` Jason Gunthorpe @ 2023-04-20 13:57 ` Lorenzo Stoakes 0 siblings, 0 replies; 27+ messages in thread From: Lorenzo Stoakes @ 2023-04-20 13:57 UTC (permalink / raw) To: Jason Gunthorpe Cc: Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Andrew Morton, David Hildenbrand, Pavel Begunkov, io-uring On Wed, Apr 19, 2023 at 08:22:51PM -0300, Jason Gunthorpe wrote: > On Wed, Apr 19, 2023 at 07:45:06PM +0100, Lorenzo Stoakes wrote: > > > For example, imagine if a user (yes it'd be weird) mlock'd some pages in a > > buffer and not others, then we'd break their use case. Also (perhaps?) more > > feasibly, a user might mix hugetlb and anon pages. So I think that'd be too > > restrictive here. > > Yeah, I agree we should not add a broad single-vma restriction to > GUP. It turns any split of a VMA into a potentially uABI breaking > change and we just don't need that headache in the mm.. > > > I do like the idea of a FOLL_SINGLE_VMA for other use cases though, the > > majority of which want one and one page only. Perhaps worth taking the > > helper added in this series (get_user_page_vma_remote() from [1]) and > > replacing it with an a full GUP function which has an interface explicitly > > for this common single page/vma case. > > Like I showed in another thread a function signature that can only do > one page and also returns the VMA would force it to be used properly > and we don't need a FOLL flag. > Indeed the latest spin of the series uses this. The point is by doing so we can use per-VMA locks for a common case, I was thinking perhaps as a separate function call (or perhaps just reusing the wrapper). This would be entirely separate to all the other work. > Jason ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 18:35 ` Matthew Wilcox 2023-04-19 18:45 ` Lorenzo Stoakes @ 2023-04-20 13:36 ` Pavel Begunkov 2023-04-20 14:19 ` Lorenzo Stoakes 1 sibling, 1 reply; 27+ messages in thread From: Pavel Begunkov @ 2023-04-20 13:36 UTC (permalink / raw) To: Matthew Wilcox, Jason Gunthorpe Cc: Lorenzo Stoakes, Jens Axboe, linux-mm, linux-kernel, Andrew Morton, David Hildenbrand, io-uring On 4/19/23 19:35, Matthew Wilcox wrote: > On Wed, Apr 19, 2023 at 03:24:55PM -0300, Jason Gunthorpe wrote: >> On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote: >>> On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote: >>>> So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I >>>> would still need to come along and delete a bunch of your code >>>> afterwards. And unfortunately Pavel's recent change which insists on not >>>> having different vm_file's across VMAs for the buffer would have to be >>>> reverted so I expect it might not be entirely without discussion. >>> >>> I don't even understand why Pavel wanted to make this change. The >>> commit log really doesn't say. >>> >>> commit edd478269640 >>> Author: Pavel Begunkov <[email protected]> >>> Date: Wed Feb 22 14:36:48 2023 +0000 >>> >>> io_uring/rsrc: disallow multi-source reg buffers >>> >>> If two or more mappings go back to back to each other they can be passed >>> into io_uring to be registered as a single registered buffer. That would >>> even work if mappings came from different sources, e.g. it's possible to >>> mix in this way anon pages and pages from shmem or hugetlb. That is not >>> a problem but it'd rather be less prone if we forbid such mixing. >>> >>> Cc: <[email protected]> >>> Signed-off-by: Pavel Begunkov <[email protected]> >>> Signed-off-by: Jens Axboe <[email protected]> >>> >>> It even says "That is not a problem"! So why was this patch merged >>> if it's not fixing a problem? >>> >>> It's now standing in the way of an actual cleanup. So why don't we >>> revert it? There must be more to it than this ... >> >> https://lore.kernel.org/all/[email protected]/ > > So um, it's disallowed because Pavel couldn't understand why it > should be allowed? This gets less and less convincing. Excuse me? I'm really sorry you "couldn't understand" the explanation as it has probably been too much of a "mental load", but let me try to elaborate. Because it's currently limited what can be registered, it's indeed not a big deal, but that will most certainly change, and I usually and apparently nonsensically prefer to tighten things up _before_ it becomes a problem. And again, taking a random set of buffers created for different purposes and registering it as a single entity is IMHO not a sane approach. Take p2pdma for instance, if would have been passed without intermixing there might not have been is_pci_p2pdma_page()/etc. for every single page in a bvec. That's why in general, it won't change for p2pdma but there might be other cases in the future. > FWIW, what I was suggesting was that we should have a FOLL_SINGLE_VMA > flag, which would use our shiny new VMA lock infrastructure to look > up and lock _one_ VMA instead of having the caller take the mmap_lock. > Passing that flag would be a tighter restriction that Pavel implemented, > but would certainly relieve some of his mental load. > > By the way, even if all pages are from the same VMA, they may still be a > mixture of anon and file pages; think a MAP_PRIVATE of a file when > only some pages have been written to. Or an anon MAP_SHARED which is > accessible by a child process. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-20 13:36 ` Pavel Begunkov @ 2023-04-20 14:19 ` Lorenzo Stoakes 2023-04-20 15:31 ` Jason Gunthorpe 0 siblings, 1 reply; 27+ messages in thread From: Lorenzo Stoakes @ 2023-04-20 14:19 UTC (permalink / raw) To: Pavel Begunkov Cc: Matthew Wilcox, Jason Gunthorpe, Jens Axboe, linux-mm, linux-kernel, Andrew Morton, David Hildenbrand, io-uring On Thu, Apr 20, 2023 at 02:36:47PM +0100, Pavel Begunkov wrote: > On 4/19/23 19:35, Matthew Wilcox wrote: > > On Wed, Apr 19, 2023 at 03:24:55PM -0300, Jason Gunthorpe wrote: > > > On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote: > > > > On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote: > > > > > So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I > > > > > would still need to come along and delete a bunch of your code > > > > > afterwards. And unfortunately Pavel's recent change which insists on not > > > > > having different vm_file's across VMAs for the buffer would have to be > > > > > reverted so I expect it might not be entirely without discussion. > > > > > > > > I don't even understand why Pavel wanted to make this change. The > > > > commit log really doesn't say. > > > > > > > > commit edd478269640 > > > > Author: Pavel Begunkov <[email protected]> > > > > Date: Wed Feb 22 14:36:48 2023 +0000 > > > > > > > > io_uring/rsrc: disallow multi-source reg buffers > > > > > > > > If two or more mappings go back to back to each other they can be passed > > > > into io_uring to be registered as a single registered buffer. That would > > > > even work if mappings came from different sources, e.g. it's possible to > > > > mix in this way anon pages and pages from shmem or hugetlb. That is not > > > > a problem but it'd rather be less prone if we forbid such mixing. > > > > > > > > Cc: <[email protected]> > > > > Signed-off-by: Pavel Begunkov <[email protected]> > > > > Signed-off-by: Jens Axboe <[email protected]> > > > > > > > > It even says "That is not a problem"! So why was this patch merged > > > > if it's not fixing a problem? > > > > > > > > It's now standing in the way of an actual cleanup. So why don't we > > > > revert it? There must be more to it than this ... > > > > > > https://lore.kernel.org/all/[email protected]/ > > > > So um, it's disallowed because Pavel couldn't understand why it > > should be allowed? This gets less and less convincing. > > Excuse me? I'm really sorry you "couldn't understand" the explanation > as it has probably been too much of a "mental load", but let me try to > elaborate. > > Because it's currently limited what can be registered, it's indeed not > a big deal, but that will most certainly change, and I usually and > apparently nonsensically prefer to tighten things up _before_ it becomes > a problem. And again, taking a random set of buffers created for > different purposes and registering it as a single entity is IMHO not a > sane approach. > > Take p2pdma for instance, if would have been passed without intermixing > there might not have been is_pci_p2pdma_page()/etc. for every single page > in a bvec. That's why in general, it won't change for p2pdma but there > might be other cases in the future. > The proposed changes for GUP are equally intended to tighten things up both in advance of issues (e.g. misuse of vmas parameter), for the purposes of future improvements to GUP (optimising how we perform these operations) and most pertinently here - ensuring broken uses of GUP do not occur. So both are 'cleanups' in some sense :) I think it's important to point out that this change is not simply a desire to improve an interface but has practical implications going forward (which maybe aren't obvious at this point yet). The new approach to this changes goes further in that we essentially perform the existing check here (anon/shmem/hugetlb) but for _all_ FOLL_WRITE operations in GUP to avoid broken writes to file mappings, at which point we can just remove the check here altogether (I will post a series for this soon). I think that you guys should not have to be performing sanity checks here but rather we should handle it in GUP so you don't need to think about it at all. It feels like an mm failing that you have had to do so at all. So I guess the question is, do you feel the requirement for vm_file being the same should apply across _any_ GUP operation over a range or is it io_uring-specific? If the former then we should do it in GUP alongside the other checks (and you can comment accordingly on that patch series when it comes), if the latter then I would restate my opinion that we shouldn't be prevented from making improvements to GUP in for one caller that wants to iterate over these VMAs for custom checks + that should be done separately. > > > FWIW, what I was suggesting was that we should have a FOLL_SINGLE_VMA > > flag, which would use our shiny new VMA lock infrastructure to look > > up and lock _one_ VMA instead of having the caller take the mmap_lock. > > Passing that flag would be a tighter restriction that Pavel implemented, > > but would certainly relieve some of his mental load. > > > > By the way, even if all pages are from the same VMA, they may still be a > > mixture of anon and file pages; think a MAP_PRIVATE of a file when > > only some pages have been written to. Or an anon MAP_SHARED which is > > accessible by a child process. > > -- > Pavel Begunkov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-20 14:19 ` Lorenzo Stoakes @ 2023-04-20 15:31 ` Jason Gunthorpe 0 siblings, 0 replies; 27+ messages in thread From: Jason Gunthorpe @ 2023-04-20 15:31 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Pavel Begunkov, Matthew Wilcox, Jens Axboe, linux-mm, linux-kernel, Andrew Morton, David Hildenbrand, io-uring On Thu, Apr 20, 2023 at 03:19:01PM +0100, Lorenzo Stoakes wrote: > So I guess the question is, do you feel the requirement for vm_file being > the same should apply across _any_ GUP operation over a range or is it > io_uring-specific? No need at all anywhere. GUP's API contract is you get a jumble of pages with certain properties. Ie normal GUP gives you a jumble of CPU cache coherent struct pages. The job of the GUP caller is to process this jumble. There is no architectural reason to restrict it beyond that, and io_uring has no functional need either. The caller of GUP is, by design, not supposed to know about files, or VMAs, or other MM details. GUP's purpose is to abstract that. I would not object if io_uring demanded that all struct pages be in a certain way, like all are P2P or something. But that is a *struct page* based test, not a VMA test. Checking VMAs is basically io_uring saying it knows better than GUP and better than the MM on how this abstraction should work. Jason ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 18:24 ` Jason Gunthorpe 2023-04-19 18:35 ` Matthew Wilcox @ 2023-04-19 20:15 ` Jens Axboe 2023-04-19 20:18 ` Jens Axboe 2023-04-20 13:37 ` Pavel Begunkov 1 sibling, 2 replies; 27+ messages in thread From: Jens Axboe @ 2023-04-19 20:15 UTC (permalink / raw) To: Jason Gunthorpe, Matthew Wilcox Cc: Lorenzo Stoakes, linux-mm, linux-kernel, Andrew Morton, David Hildenbrand, Pavel Begunkov, io-uring On 4/19/23 12:24?PM, Jason Gunthorpe wrote: > On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote: >> On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote: >>> So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I >>> would still need to come along and delete a bunch of your code >>> afterwards. And unfortunately Pavel's recent change which insists on not >>> having different vm_file's across VMAs for the buffer would have to be >>> reverted so I expect it might not be entirely without discussion. >> >> I don't even understand why Pavel wanted to make this change. The >> commit log really doesn't say. >> >> commit edd478269640 >> Author: Pavel Begunkov <[email protected]> >> Date: Wed Feb 22 14:36:48 2023 +0000 >> >> io_uring/rsrc: disallow multi-source reg buffers >> >> If two or more mappings go back to back to each other they can be passed >> into io_uring to be registered as a single registered buffer. That would >> even work if mappings came from different sources, e.g. it's possible to >> mix in this way anon pages and pages from shmem or hugetlb. That is not >> a problem but it'd rather be less prone if we forbid such mixing. >> >> Cc: <[email protected]> >> Signed-off-by: Pavel Begunkov <[email protected]> >> Signed-off-by: Jens Axboe <[email protected]> >> >> It even says "That is not a problem"! So why was this patch merged >> if it's not fixing a problem? >> >> It's now standing in the way of an actual cleanup. So why don't we >> revert it? There must be more to it than this ... > > https://lore.kernel.org/all/[email protected]/ Let's just kill that patch that, I can add a revert for 6.4. I had forgotten about that patch and guess I didn't realize that most of the issue do in fact just stem from that. -- Jens Axboe ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 20:15 ` Jens Axboe @ 2023-04-19 20:18 ` Jens Axboe 2023-04-20 13:37 ` Pavel Begunkov 1 sibling, 0 replies; 27+ messages in thread From: Jens Axboe @ 2023-04-19 20:18 UTC (permalink / raw) To: Jason Gunthorpe, Matthew Wilcox Cc: Lorenzo Stoakes, linux-mm, linux-kernel, Andrew Morton, David Hildenbrand, Pavel Begunkov, io-uring On 4/19/23 2:15 PM, Jens Axboe wrote: > On 4/19/23 12:24?PM, Jason Gunthorpe wrote: >> On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote: >>> On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote: >>>> So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I >>>> would still need to come along and delete a bunch of your code >>>> afterwards. And unfortunately Pavel's recent change which insists on not >>>> having different vm_file's across VMAs for the buffer would have to be >>>> reverted so I expect it might not be entirely without discussion. >>> >>> I don't even understand why Pavel wanted to make this change. The >>> commit log really doesn't say. >>> >>> commit edd478269640 >>> Author: Pavel Begunkov <[email protected]> >>> Date: Wed Feb 22 14:36:48 2023 +0000 >>> >>> io_uring/rsrc: disallow multi-source reg buffers >>> >>> If two or more mappings go back to back to each other they can be passed >>> into io_uring to be registered as a single registered buffer. That would >>> even work if mappings came from different sources, e.g. it's possible to >>> mix in this way anon pages and pages from shmem or hugetlb. That is not >>> a problem but it'd rather be less prone if we forbid such mixing. >>> >>> Cc: <[email protected]> >>> Signed-off-by: Pavel Begunkov <[email protected]> >>> Signed-off-by: Jens Axboe <[email protected]> >>> >>> It even says "That is not a problem"! So why was this patch merged >>> if it's not fixing a problem? >>> >>> It's now standing in the way of an actual cleanup. So why don't we >>> revert it? There must be more to it than this ... >> >> https://lore.kernel.org/all/[email protected]/ > > Let's just kill that patch that, I can add a revert for 6.4. I had > forgotten about that patch and guess I didn't realize that most of the > issue do in fact just stem from that. https://git.kernel.dk/cgit/linux-block/commit/?h=for-6.4/io_uring&id=fbd3aaf37886d3645b1bd6920f6298f5884049f8 -- Jens Axboe ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 20:15 ` Jens Axboe 2023-04-19 20:18 ` Jens Axboe @ 2023-04-20 13:37 ` Pavel Begunkov 1 sibling, 0 replies; 27+ messages in thread From: Pavel Begunkov @ 2023-04-20 13:37 UTC (permalink / raw) To: Jens Axboe, Jason Gunthorpe, Matthew Wilcox Cc: Lorenzo Stoakes, linux-mm, linux-kernel, Andrew Morton, David Hildenbrand, io-uring On 4/19/23 21:15, Jens Axboe wrote: > On 4/19/23 12:24?PM, Jason Gunthorpe wrote: >> On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote: >>> On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote: >>>> So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I >>>> would still need to come along and delete a bunch of your code >>>> afterwards. And unfortunately Pavel's recent change which insists on not >>>> having different vm_file's across VMAs for the buffer would have to be >>>> reverted so I expect it might not be entirely without discussion. >>> >>> I don't even understand why Pavel wanted to make this change. The >>> commit log really doesn't say. >>> >>> commit edd478269640 >>> Author: Pavel Begunkov <[email protected]> >>> Date: Wed Feb 22 14:36:48 2023 +0000 >>> >>> io_uring/rsrc: disallow multi-source reg buffers >>> >>> If two or more mappings go back to back to each other they can be passed >>> into io_uring to be registered as a single registered buffer. That would >>> even work if mappings came from different sources, e.g. it's possible to >>> mix in this way anon pages and pages from shmem or hugetlb. That is not >>> a problem but it'd rather be less prone if we forbid such mixing. >>> >>> Cc: <[email protected]> >>> Signed-off-by: Pavel Begunkov <[email protected]> >>> Signed-off-by: Jens Axboe <[email protected]> >>> >>> It even says "That is not a problem"! So why was this patch merged >>> if it's not fixing a problem? >>> >>> It's now standing in the way of an actual cleanup. So why don't we >>> revert it? There must be more to it than this ... >> >> https://lore.kernel.org/all/[email protected]/ > > Let's just kill that patch that, I can add a revert for 6.4. I had > forgotten about that patch and guess I didn't realize that most of the > issue do in fact just stem from that. Well, we're now trading uapi sanity for cleanups, I don't know what I should take out of it. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() 2023-04-19 16:35 ` Jens Axboe 2023-04-19 16:59 ` Jens Axboe @ 2023-04-19 17:07 ` Lorenzo Stoakes 1 sibling, 0 replies; 27+ messages in thread From: Lorenzo Stoakes @ 2023-04-19 17:07 UTC (permalink / raw) To: Jens Axboe Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox, David Hildenbrand, Pavel Begunkov, io-uring On Wed, Apr 19, 2023 at 10:35:12AM -0600, Jens Axboe wrote: > On 4/18/23 9:49?AM, Lorenzo Stoakes wrote: > > We are shortly to remove pin_user_pages(), and instead perform the required > > VMA checks ourselves. In most cases there will be a single VMA so this > > should caues no undue impact on an already slow path. > > > > Doing this eliminates the one instance of vmas being used by > > pin_user_pages(). > > First up, please don't just send single patches from a series. It's > really annoying when you are trying to get the full picture. Just CC the > whole series, so reviews don't have to look it up separately. > Sorry about that, it's hard to strike the right balance between not spamming people and giving appropriate context, different people have different opinions about how best to do this, in retrospect would certainly have been a good idea to include you on all. > So when you're doing a respin for what I'll mention below and the issue > that David found, please don't just show us patch 4+5 of the series. ack > > > diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c > > index 7a43aed8e395..3a927df9d913 100644 > > --- a/io_uring/rsrc.c > > +++ b/io_uring/rsrc.c > > @@ -1138,12 +1138,37 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages, > > return ret; > > } > > > > +static int check_vmas_locked(unsigned long addr, unsigned long len) > > +{ > > + struct file *file; > > + VMA_ITERATOR(vmi, current->mm, addr); > > + struct vm_area_struct *vma = vma_next(&vmi); > > + unsigned long end = addr + len; > > + > > + if (WARN_ON_ONCE(!vma)) > > + return -EINVAL; > > + > > + file = vma->vm_file; > > + if (file && !is_file_hugepages(file)) > > + return -EOPNOTSUPP; > > + > > + /* don't support file backed memory */ > > + for_each_vma_range(vmi, vma, end) { > > + if (vma->vm_file != file) > > + return -EINVAL; > > + > > + if (file && !vma_is_shmem(vma)) > > + return -EOPNOTSUPP; > > + } > > + > > + return 0; > > +} > > I really dislike this naming. There's no point to doing locked in the > naming here, it just makes people think it's checking whether the vmas > are locked. Which is not at all what it does. Because what else would we > think, there's nothing else in the name that suggests what it is > actually checking. > > Don't put implied locking in the naming, the way to do that is to do > something ala: > > lockdep_assert_held_read(¤t->mm->mmap_lock); > > though I don't think it's needed here at all, as there's just one caller > and it's clearly inside. You could even just make a comment instead. > > So please rename this to indicate what it's ACTUALLY checking. ack will do! > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v4 5/6] mm/gup: remove vmas parameter from pin_user_pages() [not found] <[email protected]> 2023-04-18 15:49 ` [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() Lorenzo Stoakes @ 2023-04-18 15:49 ` Lorenzo Stoakes 1 sibling, 0 replies; 27+ messages in thread From: Lorenzo Stoakes @ 2023-04-18 15:49 UTC (permalink / raw) To: linux-mm, linux-kernel, Andrew Morton Cc: Matthew Wilcox, David Hildenbrand, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Dennis Dalessandro, Jason Gunthorpe, Leon Romanovsky, Christian Benvenuti, Nelson Escobar, Bernard Metzler, Mauro Carvalho Chehab, Michael S . Tsirkin, Jason Wang, Jens Axboe, Pavel Begunkov, Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, linuxppc-dev, linux-rdma, linux-media, virtualization, kvm, netdev, io-uring, bpf, Lorenzo Stoakes We are now in a position where no caller of pin_user_pages() requires the vmas parameter at all, so eliminate this parameter from the function and all callers. This clears the way to removing the vmas parameter from GUP altogether. Acked-by: David Hildenbrand <[email protected]> Acked-by: Dennis Dalessandro <[email protected]> (for qib) Signed-off-by: Lorenzo Stoakes <[email protected]> --- arch/powerpc/mm/book3s64/iommu_api.c | 2 +- drivers/infiniband/hw/qib/qib_user_pages.c | 2 +- drivers/infiniband/hw/usnic/usnic_uiom.c | 2 +- drivers/infiniband/sw/siw/siw_mem.c | 2 +- drivers/media/v4l2-core/videobuf-dma-sg.c | 2 +- drivers/vdpa/vdpa_user/vduse_dev.c | 2 +- drivers/vhost/vdpa.c | 2 +- include/linux/mm.h | 3 +-- io_uring/rsrc.c | 4 +--- mm/gup.c | 9 +++------ mm/gup_test.c | 9 ++++----- net/xdp/xdp_umem.c | 2 +- 12 files changed, 17 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c index 81d7185e2ae8..d19fb1f3007d 100644 --- a/arch/powerpc/mm/book3s64/iommu_api.c +++ b/arch/powerpc/mm/book3s64/iommu_api.c @@ -105,7 +105,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua, ret = pin_user_pages(ua + (entry << PAGE_SHIFT), n, FOLL_WRITE | FOLL_LONGTERM, - mem->hpages + entry, NULL); + mem->hpages + entry); if (ret == n) { pinned += n; continue; diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c index f693bc753b6b..1bb7507325bc 100644 --- a/drivers/infiniband/hw/qib/qib_user_pages.c +++ b/drivers/infiniband/hw/qib/qib_user_pages.c @@ -111,7 +111,7 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages, ret = pin_user_pages(start_page + got * PAGE_SIZE, num_pages - got, FOLL_LONGTERM | FOLL_WRITE, - p + got, NULL); + p + got); if (ret < 0) { mmap_read_unlock(current->mm); goto bail_release; diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c index 2a5cac2658ec..84e0f41e7dfa 100644 --- a/drivers/infiniband/hw/usnic/usnic_uiom.c +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c @@ -140,7 +140,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable, ret = pin_user_pages(cur_base, min_t(unsigned long, npages, PAGE_SIZE / sizeof(struct page *)), - gup_flags, page_list, NULL); + gup_flags, page_list); if (ret < 0) goto out; diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c index f51ab2ccf151..e6e25f15567d 100644 --- a/drivers/infiniband/sw/siw/siw_mem.c +++ b/drivers/infiniband/sw/siw/siw_mem.c @@ -422,7 +422,7 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable) umem->page_chunk[i].plist = plist; while (nents) { rv = pin_user_pages(first_page_va, nents, foll_flags, - plist, NULL); + plist); if (rv < 0) goto out_sem_up; diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c index 53001532e8e3..405b89ea1054 100644 --- a/drivers/media/v4l2-core/videobuf-dma-sg.c +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c @@ -180,7 +180,7 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma, data, size, dma->nr_pages); err = pin_user_pages(data & PAGE_MASK, dma->nr_pages, gup_flags, - dma->pages, NULL); + dma->pages); if (err != dma->nr_pages) { dma->nr_pages = (err >= 0) ? err : 0; diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 0c3b48616a9f..1f80254604f0 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -995,7 +995,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev, goto out; pinned = pin_user_pages(uaddr, npages, FOLL_LONGTERM | FOLL_WRITE, - page_list, NULL); + page_list); if (pinned != npages) { ret = pinned < 0 ? pinned : -ENOMEM; goto out; diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 7be9d9d8f01c..4317128c1c62 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -952,7 +952,7 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v, while (npages) { sz2pin = min_t(unsigned long, npages, list_size); pinned = pin_user_pages(cur_base, sz2pin, - gup_flags, page_list, NULL); + gup_flags, page_list); if (sz2pin != pinned) { if (pinned < 0) { ret = pinned; diff --git a/include/linux/mm.h b/include/linux/mm.h index 0c236e2f25e2..d02c42d37bfc 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2410,8 +2410,7 @@ static inline struct page *get_user_page_vma_remote(struct mm_struct *mm, long get_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages); long pin_user_pages(unsigned long start, unsigned long nr_pages, - unsigned int gup_flags, struct page **pages, - struct vm_area_struct **vmas); + unsigned int gup_flags, struct page **pages); long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, struct page **pages, unsigned int gup_flags); long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 3a927df9d913..82d90b97c17b 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -1180,10 +1180,8 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages) ret = 0; mmap_read_lock(current->mm); - pret = pin_user_pages(ubuf, nr_pages, FOLL_WRITE | FOLL_LONGTERM, - pages, NULL); - + pages); if (pret == nr_pages) { ret = check_vmas_locked(ubuf, len); *npages = nr_pages; diff --git a/mm/gup.c b/mm/gup.c index 9440aa54c741..dffbfa623aae 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -3124,8 +3124,6 @@ EXPORT_SYMBOL(pin_user_pages_remote); * @gup_flags: flags modifying lookup behaviour * @pages: array that receives pointers to the pages pinned. * Should be at least nr_pages long. - * @vmas: array of pointers to vmas corresponding to each page. - * Or NULL if the caller does not require them. * * Nearly the same as get_user_pages(), except that FOLL_TOUCH is not set, and * FOLL_PIN is set. @@ -3134,15 +3132,14 @@ EXPORT_SYMBOL(pin_user_pages_remote); * see Documentation/core-api/pin_user_pages.rst for details. */ long pin_user_pages(unsigned long start, unsigned long nr_pages, - unsigned int gup_flags, struct page **pages, - struct vm_area_struct **vmas) + unsigned int gup_flags, struct page **pages) { int locked = 1; - if (!is_valid_gup_args(pages, vmas, NULL, &gup_flags, FOLL_PIN)) + if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_PIN)) return 0; return __gup_longterm_locked(current->mm, start, nr_pages, - pages, vmas, &locked, gup_flags); + pages, NULL, &locked, gup_flags); } EXPORT_SYMBOL(pin_user_pages); diff --git a/mm/gup_test.c b/mm/gup_test.c index 9ba8ea23f84e..1668ce0e0783 100644 --- a/mm/gup_test.c +++ b/mm/gup_test.c @@ -146,18 +146,17 @@ static int __gup_test_ioctl(unsigned int cmd, pages + i); break; case PIN_BASIC_TEST: - nr = pin_user_pages(addr, nr, gup->gup_flags, pages + i, - NULL); + nr = pin_user_pages(addr, nr, gup->gup_flags, pages + i); break; case PIN_LONGTERM_BENCHMARK: nr = pin_user_pages(addr, nr, gup->gup_flags | FOLL_LONGTERM, - pages + i, NULL); + pages + i); break; case DUMP_USER_PAGES_TEST: if (gup->test_flags & GUP_TEST_FLAG_DUMP_PAGES_USE_PIN) nr = pin_user_pages(addr, nr, gup->gup_flags, - pages + i, NULL); + pages + i); else nr = get_user_pages(addr, nr, gup->gup_flags, pages + i); @@ -270,7 +269,7 @@ static inline int pin_longterm_test_start(unsigned long arg) gup_flags, pages); else cur_pages = pin_user_pages(addr, remaining_pages, - gup_flags, pages, NULL); + gup_flags, pages); if (cur_pages < 0) { pin_longterm_test_stop(); ret = cur_pages; diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 02207e852d79..06cead2b8e34 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -103,7 +103,7 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address) mmap_read_lock(current->mm); npgs = pin_user_pages(address, umem->npgs, - gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL); + gup_flags | FOLL_LONGTERM, &umem->pgs[0]); mmap_read_unlock(current->mm); if (npgs != umem->npgs) { -- 2.40.0 ^ permalink raw reply related [flat|nested] 27+ messages in thread
end of thread, other threads:[~2023-04-20 15:31 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <[email protected]>
2023-04-18 15:49 ` [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages() Lorenzo Stoakes
2023-04-18 15:55 ` David Hildenbrand
2023-04-18 15:56 ` David Hildenbrand
2023-04-18 16:25 ` Lorenzo Stoakes
2023-04-19 16:35 ` Jens Axboe
2023-04-19 16:59 ` Jens Axboe
2023-04-19 17:23 ` Lorenzo Stoakes
2023-04-19 17:35 ` Jens Axboe
2023-04-19 17:47 ` Lorenzo Stoakes
2023-04-19 17:51 ` Jens Axboe
2023-04-19 18:18 ` Lorenzo Stoakes
2023-04-19 18:22 ` Jason Gunthorpe
2023-04-19 18:50 ` Lorenzo Stoakes
2023-04-19 18:23 ` Matthew Wilcox
2023-04-19 18:24 ` Jason Gunthorpe
2023-04-19 18:35 ` Matthew Wilcox
2023-04-19 18:45 ` Lorenzo Stoakes
2023-04-19 23:22 ` Jason Gunthorpe
2023-04-20 13:57 ` Lorenzo Stoakes
2023-04-20 13:36 ` Pavel Begunkov
2023-04-20 14:19 ` Lorenzo Stoakes
2023-04-20 15:31 ` Jason Gunthorpe
2023-04-19 20:15 ` Jens Axboe
2023-04-19 20:18 ` Jens Axboe
2023-04-20 13:37 ` Pavel Begunkov
2023-04-19 17:07 ` Lorenzo Stoakes
2023-04-18 15:49 ` [PATCH v4 5/6] mm/gup: remove vmas parameter from pin_user_pages() Lorenzo Stoakes
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox