public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
       [not found] <[email protected]>
@ 2023-04-14 23:27 ` Lorenzo Stoakes
  2023-04-17 12:56   ` Jason Gunthorpe
  2023-04-14 23:27 ` [PATCH 6/7] mm/gup: remove vmas parameter from pin_user_pages() Lorenzo Stoakes
  1 sibling, 1 reply; 16+ messages in thread
From: Lorenzo Stoakes @ 2023-04-14 23:27 UTC (permalink / raw)
  To: linux-mm, linux-kernel, Andrew Morton
  Cc: Matthew Wilcox, David Hildenbrand, Jens Axboe, Pavel Begunkov,
	io-uring, Lorenzo Stoakes

Commit edd478269640 ("io_uring/rsrc: disallow multi-source reg buffers")
prevents io_pin_pages() from pinning pages spanning multiple VMAs with
permitted characteristics (anon/huge), requiring that all VMAs share the
same vm_file.

The newly introduced FOLL_SAME_FILE flag permits this to be expressed as a
GUP flag rather than having to retrieve VMAs to perform the check.

We then only need to perform a VMA lookup for the first VMA to assert the
anon/hugepage requirement as we know the rest of the VMAs will possess the
same characteristics.

Doing this eliminates the one instance of vmas being used by
pin_user_pages().

Signed-off-by: Lorenzo Stoakes <[email protected]>
Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
---
 io_uring/rsrc.c | 39 ++++++++++++++++-----------------------
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 7a43aed8e395..adc860bcbd4f 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1141,9 +1141,8 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
 struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
 {
 	unsigned long start, end, nr_pages;
-	struct vm_area_struct **vmas = NULL;
 	struct page **pages = NULL;
-	int i, pret, ret = -ENOMEM;
+	int pret, ret = -ENOMEM;
 
 	end = (ubuf + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	start = ubuf >> PAGE_SHIFT;
@@ -1153,31 +1152,26 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
 	if (!pages)
 		goto done;
 
-	vmas = kvmalloc_array(nr_pages, sizeof(struct vm_area_struct *),
-			      GFP_KERNEL);
-	if (!vmas)
-		goto done;
-
 	ret = 0;
 	mmap_read_lock(current->mm);
-	pret = pin_user_pages(ubuf, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
-			      pages, vmas);
+
+	pret = pin_user_pages(ubuf, nr_pages,
+			      FOLL_WRITE | FOLL_LONGTERM | FOLL_SAME_FILE,
+			      pages, NULL);
 	if (pret == nr_pages) {
-		struct file *file = vmas[0]->vm_file;
+		/*
+		 * lookup the first VMA, we require that all VMAs in range
+		 * maintain the same file characteristics, as enforced by
+		 * FOLL_SAME_FILE
+		 */
+		struct vm_area_struct *vma = vma_lookup(current->mm, ubuf);
+		struct file *file;
 
 		/* don't support file backed memory */
-		for (i = 0; i < nr_pages; i++) {
-			if (vmas[i]->vm_file != file) {
-				ret = -EINVAL;
-				break;
-			}
-			if (!file)
-				continue;
-			if (!vma_is_shmem(vmas[i]) && !is_file_hugepages(file)) {
-				ret = -EOPNOTSUPP;
-				break;
-			}
-		}
+		file = vma->vm_file;
+		if (file && !vma_is_shmem(vma) && !is_file_hugepages(file))
+			ret = -EOPNOTSUPP;
+
 		*npages = nr_pages;
 	} else {
 		ret = pret < 0 ? pret : -EFAULT;
@@ -1194,7 +1188,6 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
 	}
 	ret = 0;
 done:
-	kvfree(vmas);
 	if (ret < 0) {
 		kvfree(pages);
 		pages = ERR_PTR(ret);
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/7] mm/gup: remove vmas parameter from pin_user_pages()
       [not found] <[email protected]>
  2023-04-14 23:27 ` [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages() Lorenzo Stoakes
@ 2023-04-14 23:27 ` Lorenzo Stoakes
  1 sibling, 0 replies; 16+ messages in thread
From: Lorenzo Stoakes @ 2023-04-14 23:27 UTC (permalink / raw)
  To: linux-mm, linux-kernel, Andrew Morton
  Cc: Matthew Wilcox, David Hildenbrand, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Dennis Dalessandro,
	Jason Gunthorpe, Leon Romanovsky, Christian Benvenuti,
	Nelson Escobar, Bernard Metzler, Mauro Carvalho Chehab,
	Michael S . Tsirkin, Jason Wang, Jens Axboe, Pavel Begunkov,
	Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, linuxppc-dev, linux-rdma, linux-media,
	virtualization, kvm, netdev, io-uring, bpf, Lorenzo Stoakes

After the introduction of FOLL_SAME_FILE we no longer require vmas for any
invocation of pin_user_pages(), so eliminate this parameter from the
function and all callers.

This clears the way to removing the vmas parameter from GUP altogether.

Signed-off-by: Lorenzo Stoakes <[email protected]>
---
 arch/powerpc/mm/book3s64/iommu_api.c       | 2 +-
 drivers/infiniband/hw/qib/qib_user_pages.c | 2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c   | 2 +-
 drivers/infiniband/sw/siw/siw_mem.c        | 2 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c  | 2 +-
 drivers/vdpa/vdpa_user/vduse_dev.c         | 2 +-
 drivers/vhost/vdpa.c                       | 2 +-
 include/linux/mm.h                         | 3 +--
 io_uring/rsrc.c                            | 2 +-
 mm/gup.c                                   | 9 +++------
 mm/gup_test.c                              | 9 ++++-----
 net/xdp/xdp_umem.c                         | 2 +-
 12 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c
index 81d7185e2ae8..d19fb1f3007d 100644
--- a/arch/powerpc/mm/book3s64/iommu_api.c
+++ b/arch/powerpc/mm/book3s64/iommu_api.c
@@ -105,7 +105,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
 
 		ret = pin_user_pages(ua + (entry << PAGE_SHIFT), n,
 				FOLL_WRITE | FOLL_LONGTERM,
-				mem->hpages + entry, NULL);
+				mem->hpages + entry);
 		if (ret == n) {
 			pinned += n;
 			continue;
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index f693bc753b6b..1bb7507325bc 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -111,7 +111,7 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
 		ret = pin_user_pages(start_page + got * PAGE_SIZE,
 				     num_pages - got,
 				     FOLL_LONGTERM | FOLL_WRITE,
-				     p + got, NULL);
+				     p + got);
 		if (ret < 0) {
 			mmap_read_unlock(current->mm);
 			goto bail_release;
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 2a5cac2658ec..84e0f41e7dfa 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -140,7 +140,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
 		ret = pin_user_pages(cur_base,
 				     min_t(unsigned long, npages,
 				     PAGE_SIZE / sizeof(struct page *)),
-				     gup_flags, page_list, NULL);
+				     gup_flags, page_list);
 
 		if (ret < 0)
 			goto out;
diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c
index f51ab2ccf151..e6e25f15567d 100644
--- a/drivers/infiniband/sw/siw/siw_mem.c
+++ b/drivers/infiniband/sw/siw/siw_mem.c
@@ -422,7 +422,7 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable)
 		umem->page_chunk[i].plist = plist;
 		while (nents) {
 			rv = pin_user_pages(first_page_va, nents, foll_flags,
-					    plist, NULL);
+					    plist);
 			if (rv < 0)
 				goto out_sem_up;
 
diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 53001532e8e3..405b89ea1054 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -180,7 +180,7 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
 		data, size, dma->nr_pages);
 
 	err = pin_user_pages(data & PAGE_MASK, dma->nr_pages, gup_flags,
-			     dma->pages, NULL);
+			     dma->pages);
 
 	if (err != dma->nr_pages) {
 		dma->nr_pages = (err >= 0) ? err : 0;
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 0c3b48616a9f..1f80254604f0 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -995,7 +995,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
 		goto out;
 
 	pinned = pin_user_pages(uaddr, npages, FOLL_LONGTERM | FOLL_WRITE,
-				page_list, NULL);
+				page_list);
 	if (pinned != npages) {
 		ret = pinned < 0 ? pinned : -ENOMEM;
 		goto out;
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 7be9d9d8f01c..4317128c1c62 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -952,7 +952,7 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
 	while (npages) {
 		sz2pin = min_t(unsigned long, npages, list_size);
 		pinned = pin_user_pages(cur_base, sz2pin,
-					gup_flags, page_list, NULL);
+					gup_flags, page_list);
 		if (sz2pin != pinned) {
 			if (pinned < 0) {
 				ret = pinned;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8dfa236cfb58..3f7d36ad7de7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2382,8 +2382,7 @@ long pin_user_pages_remote(struct mm_struct *mm,
 long get_user_pages(unsigned long start, unsigned long nr_pages,
 		    unsigned int gup_flags, struct page **pages);
 long pin_user_pages(unsigned long start, unsigned long nr_pages,
-		    unsigned int gup_flags, struct page **pages,
-		    struct vm_area_struct **vmas);
+		    unsigned int gup_flags, struct page **pages);
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 		    struct page **pages, unsigned int gup_flags);
 long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index adc860bcbd4f..92d0d47e322c 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1157,7 +1157,7 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
 
 	pret = pin_user_pages(ubuf, nr_pages,
 			      FOLL_WRITE | FOLL_LONGTERM | FOLL_SAME_FILE,
-			      pages, NULL);
+			      pages);
 	if (pret == nr_pages) {
 		/*
 		 * lookup the first VMA, we require that all VMAs in range
diff --git a/mm/gup.c b/mm/gup.c
index 3954ce499a4a..714970ef3b30 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3132,8 +3132,6 @@ EXPORT_SYMBOL(pin_user_pages_remote);
  * @gup_flags:	flags modifying lookup behaviour
  * @pages:	array that receives pointers to the pages pinned.
  *		Should be at least nr_pages long.
- * @vmas:	array of pointers to vmas corresponding to each page.
- *		Or NULL if the caller does not require them.
  *
  * Nearly the same as get_user_pages(), except that FOLL_TOUCH is not set, and
  * FOLL_PIN is set.
@@ -3142,15 +3140,14 @@ EXPORT_SYMBOL(pin_user_pages_remote);
  * see Documentation/core-api/pin_user_pages.rst for details.
  */
 long pin_user_pages(unsigned long start, unsigned long nr_pages,
-		    unsigned int gup_flags, struct page **pages,
-		    struct vm_area_struct **vmas)
+		    unsigned int gup_flags, struct page **pages)
 {
 	int locked = 1;
 
-	if (!is_valid_gup_args(pages, vmas, NULL, &gup_flags, FOLL_PIN))
+	if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_PIN))
 		return 0;
 	return __gup_longterm_locked(current->mm, start, nr_pages,
-				     pages, vmas, &locked, gup_flags);
+				     pages, NULL, &locked, gup_flags);
 }
 EXPORT_SYMBOL(pin_user_pages);
 
diff --git a/mm/gup_test.c b/mm/gup_test.c
index 9ba8ea23f84e..1668ce0e0783 100644
--- a/mm/gup_test.c
+++ b/mm/gup_test.c
@@ -146,18 +146,17 @@ static int __gup_test_ioctl(unsigned int cmd,
 						 pages + i);
 			break;
 		case PIN_BASIC_TEST:
-			nr = pin_user_pages(addr, nr, gup->gup_flags, pages + i,
-					    NULL);
+			nr = pin_user_pages(addr, nr, gup->gup_flags, pages + i);
 			break;
 		case PIN_LONGTERM_BENCHMARK:
 			nr = pin_user_pages(addr, nr,
 					    gup->gup_flags | FOLL_LONGTERM,
-					    pages + i, NULL);
+					    pages + i);
 			break;
 		case DUMP_USER_PAGES_TEST:
 			if (gup->test_flags & GUP_TEST_FLAG_DUMP_PAGES_USE_PIN)
 				nr = pin_user_pages(addr, nr, gup->gup_flags,
-						    pages + i, NULL);
+						    pages + i);
 			else
 				nr = get_user_pages(addr, nr, gup->gup_flags,
 						    pages + i);
@@ -270,7 +269,7 @@ static inline int pin_longterm_test_start(unsigned long arg)
 							gup_flags, pages);
 		else
 			cur_pages = pin_user_pages(addr, remaining_pages,
-						   gup_flags, pages, NULL);
+						   gup_flags, pages);
 		if (cur_pages < 0) {
 			pin_longterm_test_stop();
 			ret = cur_pages;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 02207e852d79..06cead2b8e34 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -103,7 +103,7 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
 
 	mmap_read_lock(current->mm);
 	npgs = pin_user_pages(address, umem->npgs,
-			      gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
+			      gup_flags | FOLL_LONGTERM, &umem->pgs[0]);
 	mmap_read_unlock(current->mm);
 
 	if (npgs != umem->npgs) {
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-14 23:27 ` [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages() Lorenzo Stoakes
@ 2023-04-17 12:56   ` Jason Gunthorpe
  2023-04-17 13:19     ` Lorenzo Stoakes
  2023-04-18 16:25     ` Pavel Begunkov
  0 siblings, 2 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2023-04-17 12:56 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox,
	David Hildenbrand, Jens Axboe, Pavel Begunkov, io-uring

On Sat, Apr 15, 2023 at 12:27:45AM +0100, Lorenzo Stoakes wrote:
> Commit edd478269640 ("io_uring/rsrc: disallow multi-source reg buffers")
> prevents io_pin_pages() from pinning pages spanning multiple VMAs with
> permitted characteristics (anon/huge), requiring that all VMAs share the
> same vm_file.

That commmit doesn't really explain why io_uring is doing such a weird
thing.

What exactly is the problem with mixing struct pages from different
files and why of all the GUP users does only io_uring need to care
about this?

If there is no justification then lets revert that commit instead.

>  		/* don't support file backed memory */
> -		for (i = 0; i < nr_pages; i++) {
> -			if (vmas[i]->vm_file != file) {
> -				ret = -EINVAL;
> -				break;
> -			}
> -			if (!file)
> -				continue;
> -			if (!vma_is_shmem(vmas[i]) && !is_file_hugepages(file)) {
> -				ret = -EOPNOTSUPP;
> -				break;
> -			}
> -		}
> +		file = vma->vm_file;
> +		if (file && !vma_is_shmem(vma) && !is_file_hugepages(file))
> +			ret = -EOPNOTSUPP;
> +

Also, why is it doing this?

All GUP users don't work entirely right for any fops implementation
that assumes write protect is unconditionally possible. eg most
filesystems.

We've been ignoring blocking it because it is an ABI break and it does
sort of work in some cases.

I'd rather see something like FOLL_ALLOW_BROKEN_FILE_MAPPINGS than
io_uring open coding this kind of stuff.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-17 12:56   ` Jason Gunthorpe
@ 2023-04-17 13:19     ` Lorenzo Stoakes
  2023-04-17 13:26       ` Jason Gunthorpe
  2023-04-18 16:25     ` Pavel Begunkov
  1 sibling, 1 reply; 16+ messages in thread
From: Lorenzo Stoakes @ 2023-04-17 13:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox,
	David Hildenbrand, Jens Axboe, Pavel Begunkov, io-uring

On Mon, Apr 17, 2023 at 09:56:34AM -0300, Jason Gunthorpe wrote:
> On Sat, Apr 15, 2023 at 12:27:45AM +0100, Lorenzo Stoakes wrote:
> > Commit edd478269640 ("io_uring/rsrc: disallow multi-source reg buffers")
> > prevents io_pin_pages() from pinning pages spanning multiple VMAs with
> > permitted characteristics (anon/huge), requiring that all VMAs share the
> > same vm_file.
>
> That commmit doesn't really explain why io_uring is doing such a weird
> thing.
>
> What exactly is the problem with mixing struct pages from different
> files and why of all the GUP users does only io_uring need to care
> about this?
>
> If there is no justification then lets revert that commit instead.
>
> >  		/* don't support file backed memory */
> > -		for (i = 0; i < nr_pages; i++) {
> > -			if (vmas[i]->vm_file != file) {
> > -				ret = -EINVAL;
> > -				break;
> > -			}
> > -			if (!file)
> > -				continue;
> > -			if (!vma_is_shmem(vmas[i]) && !is_file_hugepages(file)) {
> > -				ret = -EOPNOTSUPP;
> > -				break;
> > -			}
> > -		}
> > +		file = vma->vm_file;
> > +		if (file && !vma_is_shmem(vma) && !is_file_hugepages(file))
> > +			ret = -EOPNOTSUPP;
> > +
>
> Also, why is it doing this?
>
> All GUP users don't work entirely right for any fops implementation
> that assumes write protect is unconditionally possible. eg most
> filesystems.
>
> We've been ignoring blocking it because it is an ABI break and it does
> sort of work in some cases.
>

I will leave this to Jens and Pavel to revert on!

> I'd rather see something like FOLL_ALLOW_BROKEN_FILE_MAPPINGS than
> io_uring open coding this kind of stuff.
>

How would the semantics of this work? What is broken? It is a little
frustrating that we have FOLL_ANON but hugetlb as an outlying case, adding
FOLL_ANON_OR_HUGETLB was another consideration...

> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-17 13:19     ` Lorenzo Stoakes
@ 2023-04-17 13:26       ` Jason Gunthorpe
  2023-04-17 14:00         ` Lorenzo Stoakes
  2023-04-17 19:00         ` Lorenzo Stoakes
  0 siblings, 2 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2023-04-17 13:26 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox,
	David Hildenbrand, Jens Axboe, Pavel Begunkov, io-uring

On Mon, Apr 17, 2023 at 02:19:16PM +0100, Lorenzo Stoakes wrote:

> > I'd rather see something like FOLL_ALLOW_BROKEN_FILE_MAPPINGS than
> > io_uring open coding this kind of stuff.
> >
> 
> How would the semantics of this work? What is broken? It is a little
> frustrating that we have FOLL_ANON but hugetlb as an outlying case, adding
> FOLL_ANON_OR_HUGETLB was another consideration...

It says "historically this user has accepted file backed pages and we
we think there may actually be users doing that, so don't break the
uABI"

Without the flag GUP would refuse to return file backed pages that can
trigger kernel crashes or data corruption.

Eg we'd want most places to not specify the flag and the few that do
to have some justification.

We should consdier removing FOLL_ANON, I'm not sure it really makes
sense these days for what proc is doing with it. All that proc stuff
could likely be turned into a kthread_use_mm() and a simple
copy_to/from user?

I suspect that eliminates the need to check for FOLL_ANON?

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-17 13:26       ` Jason Gunthorpe
@ 2023-04-17 14:00         ` Lorenzo Stoakes
  2023-04-17 14:15           ` Jason Gunthorpe
  2023-04-17 19:00         ` Lorenzo Stoakes
  1 sibling, 1 reply; 16+ messages in thread
From: Lorenzo Stoakes @ 2023-04-17 14:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox,
	David Hildenbrand, Jens Axboe, Pavel Begunkov, io-uring

On Mon, Apr 17, 2023 at 10:26:09AM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 17, 2023 at 02:19:16PM +0100, Lorenzo Stoakes wrote:
>
> > > I'd rather see something like FOLL_ALLOW_BROKEN_FILE_MAPPINGS than
> > > io_uring open coding this kind of stuff.
> > >
> >
> > How would the semantics of this work? What is broken? It is a little
> > frustrating that we have FOLL_ANON but hugetlb as an outlying case, adding
> > FOLL_ANON_OR_HUGETLB was another consideration...
>
> It says "historically this user has accepted file backed pages and we
> we think there may actually be users doing that, so don't break the
> uABI"

Having written a bunch here I suddenly realised that you probably mean for
this flag to NOT be applied to the io_uring code and thus have it enforce
the 'anonymous or hugetlb' check by default?

>
> Without the flag GUP would refuse to return file backed pages that can
> trigger kernel crashes or data corruption.
>
> Eg we'd want most places to not specify the flag and the few that do
> to have some justification.
>

So you mean to disallow file-backed page pinning as a whole unless this
flag is specified? For FOLL_GET I can see that access to the underlying
data is dangerous as the memory may get reclaimed or migrated, but surely
DMA-pinned memory (as is the case here) is safe?

Or is this a product more so of some kernel process accessing file-backed
pages for a file system which expects write-notify semantics and doesn't
get them in this case, which could indeed be horribly broken.

In which case yes this seems sensible.

> We should consdier removing FOLL_ANON, I'm not sure it really makes
> sense these days for what proc is doing with it. All that proc stuff
> could likely be turned into a kthread_use_mm() and a simple
> copy_to/from user?
>
> I suspect that eliminates the need to check for FOLL_ANON?
>
> Jason

I am definitely in favour of cutting things down if possible, and very much
prefer the use of uaccess if we are able to do so rather than GUP.

I do feel that GUP should be focused purely on pinning memory rather than
manipulating it (whether read or write) so I agree with this sentiment.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-17 14:00         ` Lorenzo Stoakes
@ 2023-04-17 14:15           ` Jason Gunthorpe
  2023-04-17 15:20             ` Lorenzo Stoakes
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2023-04-17 14:15 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox,
	David Hildenbrand, Jens Axboe, Pavel Begunkov, io-uring

On Mon, Apr 17, 2023 at 03:00:16PM +0100, Lorenzo Stoakes wrote:
> On Mon, Apr 17, 2023 at 10:26:09AM -0300, Jason Gunthorpe wrote:
> > On Mon, Apr 17, 2023 at 02:19:16PM +0100, Lorenzo Stoakes wrote:
> >
> > > > I'd rather see something like FOLL_ALLOW_BROKEN_FILE_MAPPINGS than
> > > > io_uring open coding this kind of stuff.
> > > >
> > >
> > > How would the semantics of this work? What is broken? It is a little
> > > frustrating that we have FOLL_ANON but hugetlb as an outlying case, adding
> > > FOLL_ANON_OR_HUGETLB was another consideration...
> >
> > It says "historically this user has accepted file backed pages and we
> > we think there may actually be users doing that, so don't break the
> > uABI"
> 
> Having written a bunch here I suddenly realised that you probably mean for
> this flag to NOT be applied to the io_uring code and thus have it enforce
> the 'anonymous or hugetlb' check by default?

Yes

> So you mean to disallow file-backed page pinning as a whole unless this
> flag is specified? 

Yes

> For FOLL_GET I can see that access to the underlying
> data is dangerous as the memory may get reclaimed or migrated, but surely
> DMA-pinned memory (as is the case here) is safe?

No, it is all broken, read-only access is safe.

We are trying to get a point where pin access will interact properly
with the filesystem, but it isn't done yet.

> Or is this a product more so of some kernel process accessing file-backed
> pages for a file system which expects write-notify semantics and doesn't
> get them in this case, which could indeed be horribly broken.

Yes, broadly

> I am definitely in favour of cutting things down if possible, and very much
> prefer the use of uaccess if we are able to do so rather than GUP.
> 
> I do feel that GUP should be focused purely on pinning memory rather than
> manipulating it (whether read or write) so I agree with this sentiment.

Yes, someone needs to be brave enough to go and try to adjust these
old places :)

I see in the git history this was added to solve CVE-2018-1120 - eg
FUSE can hold off fault-in indefinitely. So the flag is really badly
misnamed - it is "FOLL_DONT_BLOCK_ON_USERSPACE" and anon memory is a
simple, but overly narrow, way to get that property.

If it is changed to use kthread_use_mm() it needs a VMA based check
for the same idea.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-17 14:15           ` Jason Gunthorpe
@ 2023-04-17 15:20             ` Lorenzo Stoakes
  0 siblings, 0 replies; 16+ messages in thread
From: Lorenzo Stoakes @ 2023-04-17 15:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox,
	David Hildenbrand, Jens Axboe, Pavel Begunkov, io-uring

On Mon, Apr 17, 2023 at 11:15:10AM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 17, 2023 at 03:00:16PM +0100, Lorenzo Stoakes wrote:
> > On Mon, Apr 17, 2023 at 10:26:09AM -0300, Jason Gunthorpe wrote:
> > > On Mon, Apr 17, 2023 at 02:19:16PM +0100, Lorenzo Stoakes wrote:
> > >
> > > > > I'd rather see something like FOLL_ALLOW_BROKEN_FILE_MAPPINGS than
> > > > > io_uring open coding this kind of stuff.
> > > > >
> > > >
> > > > How would the semantics of this work? What is broken? It is a little
> > > > frustrating that we have FOLL_ANON but hugetlb as an outlying case, adding
> > > > FOLL_ANON_OR_HUGETLB was another consideration...
> > >
> > > It says "historically this user has accepted file backed pages and we
> > > we think there may actually be users doing that, so don't break the
> > > uABI"
> >
> > Having written a bunch here I suddenly realised that you probably mean for
> > this flag to NOT be applied to the io_uring code and thus have it enforce
> > the 'anonymous or hugetlb' check by default?
>
> Yes
>
> > So you mean to disallow file-backed page pinning as a whole unless this
> > flag is specified?
>
> Yes
>
> > For FOLL_GET I can see that access to the underlying
> > data is dangerous as the memory may get reclaimed or migrated, but surely
> > DMA-pinned memory (as is the case here) is safe?
>
> No, it is all broken, read-only access is safe.
>
> We are trying to get a point where pin access will interact properly
> with the filesystem, but it isn't done yet.
>
> > Or is this a product more so of some kernel process accessing file-backed
> > pages for a file system which expects write-notify semantics and doesn't
> > get them in this case, which could indeed be horribly broken.
>
> Yes, broadly
>
> > I am definitely in favour of cutting things down if possible, and very much
> > prefer the use of uaccess if we are able to do so rather than GUP.
> >
> > I do feel that GUP should be focused purely on pinning memory rather than
> > manipulating it (whether read or write) so I agree with this sentiment.
>
> Yes, someone needs to be brave enough to go and try to adjust these
> old places :)

Well, I liek to think of myself as stupid^W brave enough to do such things
so may try a separate patch series on that :)

>
> I see in the git history this was added to solve CVE-2018-1120 - eg
> FUSE can hold off fault-in indefinitely. So the flag is really badly
> misnamed - it is "FOLL_DONT_BLOCK_ON_USERSPACE" and anon memory is a
> simple, but overly narrow, way to get that property.
>
> If it is changed to use kthread_use_mm() it needs a VMA based check
> for the same idea.
>
> Jason

I'll try my hand at patching this also!

As for FOLL_ALLOW_BROKEN_FILE_MAPPINGS, I do really like this idea, and
think it is actually probably quite important we do it, however this feels
a bit out of scope for this patch series.

I think perhaps the way forward is, if Jens and Pavel don't have any issue
with it, we open code the check and drop FOLL_SAME_FILE for this series,
then introduce it in a separate one + replace the open coding there?

I am eager to try to keep this focused on the specific task of dropping the
vmas parameter as I think FOLL_ALLOW_BROKEN_FILE_MAPPINGS is likely to
garner some discussion which should be kept separate.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-17 13:26       ` Jason Gunthorpe
  2023-04-17 14:00         ` Lorenzo Stoakes
@ 2023-04-17 19:00         ` Lorenzo Stoakes
  2023-04-17 19:24           ` Jason Gunthorpe
  1 sibling, 1 reply; 16+ messages in thread
From: Lorenzo Stoakes @ 2023-04-17 19:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox,
	David Hildenbrand, Jens Axboe, Pavel Begunkov, io-uring

On Mon, Apr 17, 2023 at 10:26:09AM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 17, 2023 at 02:19:16PM +0100, Lorenzo Stoakes wrote:
>
> > > I'd rather see something like FOLL_ALLOW_BROKEN_FILE_MAPPINGS than
> > > io_uring open coding this kind of stuff.
> > >
> >
> > How would the semantics of this work? What is broken? It is a little
> > frustrating that we have FOLL_ANON but hugetlb as an outlying case, adding
> > FOLL_ANON_OR_HUGETLB was another consideration...
>
> It says "historically this user has accepted file backed pages and we
> we think there may actually be users doing that, so don't break the
> uABI"
>
> Without the flag GUP would refuse to return file backed pages that can
> trigger kernel crashes or data corruption.
>
> Eg we'd want most places to not specify the flag and the few that do
> to have some justification.
>
> We should consdier removing FOLL_ANON, I'm not sure it really makes
> sense these days for what proc is doing with it. All that proc stuff
> could likely be turned into a kthread_use_mm() and a simple
> copy_to/from user?
>
> I suspect that eliminates the need to check for FOLL_ANON?
>
> Jason

The proc invocations utilising FOLL_ANON are get_mm_proctitle(),
get_mm_cmdline() and environ_read() which each pass it to
access_remote_vm() and which will be being called from a process context,
i.e. with tsk->mm != NULL, but kthread_use_mm() explicitly disallows the
(slightly mind boggling) idea of swapping out an established mm.

So I don't think this route is plausible unless you were thinking of
somehow offloading to a thread?

In any case, if we institute the FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag we
can just drop FOLL_ANON altogether right, as this will be implied and
hugetlb should work here too?

Separately, I find the semantics of access_remote_vm() kind of weird, and
with a possible mmap_lock-free future it does make me wonder whether
something better could be done there.

(Section where I sound like I might be going mad) Perhaps having some means
of context switching into the kernel portion of the remote process as if
were a system call or soft interrupt handler and having that actually do
the uaccess operation could be useful here?

I'm guesing nothing like that exists yet?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-17 19:00         ` Lorenzo Stoakes
@ 2023-04-17 19:24           ` Jason Gunthorpe
  2023-04-17 19:45             ` Lorenzo Stoakes
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2023-04-17 19:24 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox,
	David Hildenbrand, Jens Axboe, Pavel Begunkov, io-uring

On Mon, Apr 17, 2023 at 08:00:48PM +0100, Lorenzo Stoakes wrote:

> So I don't think this route is plausible unless you were thinking of
> somehow offloading to a thread?

ah, fair enough
 
> In any case, if we institute the FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag we
> can just drop FOLL_ANON altogether right, as this will be implied and
> hugetlb should work here too?

Well.. no, as I said read-only access to the pages works fine, so GUP
should not block that. It is only write that has issues

> Separately, I find the semantics of access_remote_vm() kind of weird, and
> with a possible mmap_lock-free future it does make me wonder whether
> something better could be done there.

Yes, it is very weird, kthread_use_mm is much nicer.
 
> (Section where I sound like I might be going mad) Perhaps having some means
> of context switching into the kernel portion of the remote process as if
> were a system call or soft interrupt handler and having that actually do
> the uaccess operation could be useful here?

This is the kthread_use_mm() approach, that is basically what it
does. You are suggesting to extend it to kthreads that already have a
process attached...

access_remote_vm is basically copy_to/from_user built using kmap and
GUP.

even a simple step of localizing FOLL_ANON to __access_remote_vm,
since it must have the VMA nyhow, would be an improvement.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-17 19:24           ` Jason Gunthorpe
@ 2023-04-17 19:45             ` Lorenzo Stoakes
  0 siblings, 0 replies; 16+ messages in thread
From: Lorenzo Stoakes @ 2023-04-17 19:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox,
	David Hildenbrand, Jens Axboe, Pavel Begunkov, io-uring

On Mon, Apr 17, 2023 at 04:24:04PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 17, 2023 at 08:00:48PM +0100, Lorenzo Stoakes wrote:
>
> > So I don't think this route is plausible unless you were thinking of
> > somehow offloading to a thread?
>
> ah, fair enough
>
> > In any case, if we institute the FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag we
> > can just drop FOLL_ANON altogether right, as this will be implied and
> > hugetlb should work here too?
>
> Well.. no, as I said read-only access to the pages works fine, so GUP
> should not block that. It is only write that has issues
>
> > Separately, I find the semantics of access_remote_vm() kind of weird, and
> > with a possible mmap_lock-free future it does make me wonder whether
> > something better could be done there.
>
> Yes, it is very weird, kthread_use_mm is much nicer.
>
> > (Section where I sound like I might be going mad) Perhaps having some means
> > of context switching into the kernel portion of the remote process as if
> > were a system call or soft interrupt handler and having that actually do
> > the uaccess operation could be useful here?
>
> This is the kthread_use_mm() approach, that is basically what it
> does. You are suggesting to extend it to kthreads that already have a
> process attached...

Yeah, I wonder how plausible this is as we could in theory simply eliminate
these remote cases altogether which could be relatively efficient if we
could find a way to batch up operations.

>
> access_remote_vm is basically copy_to/from_user built using kmap and
> GUP.
>
> even a simple step of localizing FOLL_ANON to __access_remote_vm,
> since it must have the VMA nyhow, would be an improvement.

This is used from places where this flag might not be set though,
e.g. acess_process_vm() and ptrace.

However, access_remote_vm() is only used by the proc stuff, so I will spin
up a patch to move this function and treat it as a helper which sets
FOLL_ANON.

>
> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-17 12:56   ` Jason Gunthorpe
  2023-04-17 13:19     ` Lorenzo Stoakes
@ 2023-04-18 16:25     ` Pavel Begunkov
  2023-04-18 16:35       ` Pavel Begunkov
  2023-04-18 16:36       ` Jason Gunthorpe
  1 sibling, 2 replies; 16+ messages in thread
From: Pavel Begunkov @ 2023-04-18 16:25 UTC (permalink / raw)
  To: Jason Gunthorpe, Lorenzo Stoakes
  Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox,
	David Hildenbrand, Jens Axboe, io-uring

On 4/17/23 13:56, Jason Gunthorpe wrote:
> On Sat, Apr 15, 2023 at 12:27:45AM +0100, Lorenzo Stoakes wrote:
>> Commit edd478269640 ("io_uring/rsrc: disallow multi-source reg buffers")
>> prevents io_pin_pages() from pinning pages spanning multiple VMAs with
>> permitted characteristics (anon/huge), requiring that all VMAs share the
>> same vm_file.
> 
> That commmit doesn't really explain why io_uring is doing such a weird
> thing.
> 
> What exactly is the problem with mixing struct pages from different
> files and why of all the GUP users does only io_uring need to care
> about this?

Simply because it doesn't seem sane to mix and register buffers of
different "nature" as one. It's not a huge deal for currently allowed
types, e.g. mixing normal and huge anon pages, but it's rather a matter
of time before it gets extended, and then I'll certainly become a
problem. We've been asked just recently to allow registering bufs
provided mapped by some specific driver, or there might be DMA mapped
memory in the future.

Rejecting based on vmas might be too conservative, I agree and am all
for if someone can help to make it right.


> If there is no justification then lets revert that commit instead.
> 
>>   		/* don't support file backed memory */
>> -		for (i = 0; i < nr_pages; i++) {
>> -			if (vmas[i]->vm_file != file) {
>> -				ret = -EINVAL;
>> -				break;
>> -			}
>> -			if (!file)
>> -				continue;
>> -			if (!vma_is_shmem(vmas[i]) && !is_file_hugepages(file)) {
>> -				ret = -EOPNOTSUPP;
>> -				break;
>> -			}
>> -		}
>> +		file = vma->vm_file;
>> +		if (file && !vma_is_shmem(vma) && !is_file_hugepages(file))
>> +			ret = -EOPNOTSUPP;
>> +
> 
> Also, why is it doing this?

There were problems with filesystem mappings, I believe.
Jens may remember what it was.


> All GUP users don't work entirely right for any fops implementation
> that assumes write protect is unconditionally possible. eg most
> filesystems.
> 
> We've been ignoring blocking it because it is an ABI break and it does
> sort of work in some cases.
> 
> I'd rather see something like FOLL_ALLOW_BROKEN_FILE_MAPPINGS than
> io_uring open coding this kind of stuff.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-18 16:25     ` Pavel Begunkov
@ 2023-04-18 16:35       ` Pavel Begunkov
  2023-04-18 16:36       ` Jason Gunthorpe
  1 sibling, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2023-04-18 16:35 UTC (permalink / raw)
  To: Jason Gunthorpe, Lorenzo Stoakes
  Cc: linux-mm, linux-kernel, Andrew Morton, Matthew Wilcox,
	David Hildenbrand, Jens Axboe, io-uring

On 4/18/23 17:25, Pavel Begunkov wrote:
> On 4/17/23 13:56, Jason Gunthorpe wrote:
>> On Sat, Apr 15, 2023 at 12:27:45AM +0100, Lorenzo Stoakes wrote:
>>> Commit edd478269640 ("io_uring/rsrc: disallow multi-source reg buffers")
>>> prevents io_pin_pages() from pinning pages spanning multiple VMAs with
>>> permitted characteristics (anon/huge), requiring that all VMAs share the
>>> same vm_file.
>>
>> That commmit doesn't really explain why io_uring is doing such a weird
>> thing.
>>
>> What exactly is the problem with mixing struct pages from different
>> files and why of all the GUP users does only io_uring need to care
>> about this?
> 
> Simply because it doesn't seem sane to mix and register buffers of
> different "nature" as one. It's not a huge deal for currently allowed
> types, e.g. mixing normal and huge anon pages, but it's rather a matter
> of time before it gets extended, and then I'll certainly become a
> problem. We've been asked just recently to allow registering bufs
> provided mapped by some specific driver, or there might be DMA mapped
> memory in the future.
> 
> Rejecting based on vmas might be too conservative, I agree and am all
> for if someone can help to make it right.

For some reason I thought it was rejecting if involves more than
one different vma. ->vm_file checks still sound fair to me, but in
any case, open to changing it.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-18 16:25     ` Pavel Begunkov
  2023-04-18 16:35       ` Pavel Begunkov
@ 2023-04-18 16:36       ` Jason Gunthorpe
  2023-04-18 17:25         ` Pavel Begunkov
  1 sibling, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2023-04-18 16:36 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Lorenzo Stoakes, linux-mm, linux-kernel, Andrew Morton,
	Matthew Wilcox, David Hildenbrand, Jens Axboe, io-uring

On Tue, Apr 18, 2023 at 05:25:08PM +0100, Pavel Begunkov wrote:
> On 4/17/23 13:56, Jason Gunthorpe wrote:
> > On Sat, Apr 15, 2023 at 12:27:45AM +0100, Lorenzo Stoakes wrote:
> > > Commit edd478269640 ("io_uring/rsrc: disallow multi-source reg buffers")
> > > prevents io_pin_pages() from pinning pages spanning multiple VMAs with
> > > permitted characteristics (anon/huge), requiring that all VMAs share the
> > > same vm_file.
> > 
> > That commmit doesn't really explain why io_uring is doing such a weird
> > thing.
> > 
> > What exactly is the problem with mixing struct pages from different
> > files and why of all the GUP users does only io_uring need to care
> > about this?
> 
> Simply because it doesn't seem sane to mix and register buffers of
> different "nature" as one. 

That is not a good reason. Once things are converted to struct pages
they don't need to care about their "nature"

> problem. We've been asked just recently to allow registering bufs
> provided mapped by some specific driver, or there might be DMA mapped
> memory in the future.

We already have GUP flags to deal with it, eg FOLL_PCI_P2PDMA

> Rejecting based on vmas might be too conservative, I agree and am all
> for if someone can help to make it right.

It is GUP's problem to deal with this, not the callers.

GUP is defined to return a list of normal CPU DRAM in struct page
format. The caller doesn't care where or what this memory is, it is
all interchangable - by API contract of GUP itself.

If you use FOLL_PCI_P2PDMA then the definition expands to allow struct
pages that are MMIO.

In future, if someone invents new memory or new struct pages with
special needs it is their job to ensure it is blocked from GUP - for
*everyone*. eg how the PCI_P2PDMA was blocked from normal GUP.

io_uring is not special, there are many users of GUP, they all need to
work consistently.

> > Also, why is it doing this?
> 
> There were problems with filesystem mappings, I believe.
> Jens may remember what it was.

Yes, I know about this, but as above, io_uring is not special, if we
want to block this GUP blocks it to protect all users, not io_uring
just protects itself..

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-18 16:36       ` Jason Gunthorpe
@ 2023-04-18 17:25         ` Pavel Begunkov
  2023-04-18 18:19           ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2023-04-18 17:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lorenzo Stoakes, linux-mm, linux-kernel, Andrew Morton,
	Matthew Wilcox, David Hildenbrand, Jens Axboe, io-uring

On 4/18/23 17:36, Jason Gunthorpe wrote:
> On Tue, Apr 18, 2023 at 05:25:08PM +0100, Pavel Begunkov wrote:
>> On 4/17/23 13:56, Jason Gunthorpe wrote:
>>> On Sat, Apr 15, 2023 at 12:27:45AM +0100, Lorenzo Stoakes wrote:
>>>> Commit edd478269640 ("io_uring/rsrc: disallow multi-source reg buffers")
>>>> prevents io_pin_pages() from pinning pages spanning multiple VMAs with
>>>> permitted characteristics (anon/huge), requiring that all VMAs share the
>>>> same vm_file.
>>>
>>> That commmit doesn't really explain why io_uring is doing such a weird
>>> thing.
>>>
>>> What exactly is the problem with mixing struct pages from different
>>> files and why of all the GUP users does only io_uring need to care
>>> about this?
>>
>> Simply because it doesn't seem sane to mix and register buffers of
>> different "nature" as one.
> 
> That is not a good reason. Once things are converted to struct pages
> they don't need to care about their "nature"

Arguing purely about uapi, I do think it is. Even though it can be
passed down and a page is a page, Frankenstein's Monster mixing anon
pages, pages for io_uring queues, device shared memory, and what not
else doesn't seem right for uapi. I see keeping buffers as a single
entity in opposite to a set of random pages beneficial for the future.

And again, as for how it's internally done, I don't have any preference
whatsoever.

>> problem. We've been asked just recently to allow registering bufs
>> provided mapped by some specific driver, or there might be DMA mapped
>> memory in the future.
> 
> We already have GUP flags to deal with it, eg FOLL_PCI_P2PDMA
> 
>> Rejecting based on vmas might be too conservative, I agree and am all
>> for if someone can help to make it right.
> 
> It is GUP's problem to deal with this, not the callers.

Ok, that's even better for io_uring if the same can be achieved
just by passing flags.


> GUP is defined to return a list of normal CPU DRAM in struct page
> format. The caller doesn't care where or what this memory is, it is
> all interchangable - by API contract of GUP itself.
> 
> If you use FOLL_PCI_P2PDMA then the definition expands to allow struct
> pages that are MMIO.
> 
> In future, if someone invents new memory or new struct pages with
> special needs it is their job to ensure it is blocked from GUP - for
> *everyone*. eg how the PCI_P2PDMA was blocked from normal GUP.
> 
> io_uring is not special, there are many users of GUP, they all need to
> work consistently.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
  2023-04-18 17:25         ` Pavel Begunkov
@ 2023-04-18 18:19           ` Jason Gunthorpe
  0 siblings, 0 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2023-04-18 18:19 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Lorenzo Stoakes, linux-mm, linux-kernel, Andrew Morton,
	Matthew Wilcox, David Hildenbrand, Jens Axboe, io-uring

On Tue, Apr 18, 2023 at 06:25:24PM +0100, Pavel Begunkov wrote:
> On 4/18/23 17:36, Jason Gunthorpe wrote:
> > On Tue, Apr 18, 2023 at 05:25:08PM +0100, Pavel Begunkov wrote:
> > > On 4/17/23 13:56, Jason Gunthorpe wrote:
> > > > On Sat, Apr 15, 2023 at 12:27:45AM +0100, Lorenzo Stoakes wrote:
> > > > > Commit edd478269640 ("io_uring/rsrc: disallow multi-source reg buffers")
> > > > > prevents io_pin_pages() from pinning pages spanning multiple VMAs with
> > > > > permitted characteristics (anon/huge), requiring that all VMAs share the
> > > > > same vm_file.
> > > > 
> > > > That commmit doesn't really explain why io_uring is doing such a weird
> > > > thing.
> > > > 
> > > > What exactly is the problem with mixing struct pages from different
> > > > files and why of all the GUP users does only io_uring need to care
> > > > about this?
> > > 
> > > Simply because it doesn't seem sane to mix and register buffers of
> > > different "nature" as one.
> > 
> > That is not a good reason. Once things are converted to struct pages
> > they don't need to care about their "nature"
> 
> Arguing purely about uapi, I do think it is. Even though it can be
> passed down and a page is a page, Frankenstein's Monster mixing anon
> pages, pages for io_uring queues, device shared memory, and what not
> else doesn't seem right for uapi. I see keeping buffers as a single
> entity in opposite to a set of random pages beneficial for the future.

Again, it is not up to io_uring to make this choice. We have GUP as
part of our uAPI all over the place, GUP decides how it works, not
random different ideas all over the place.

We don't have these kinds of restrictions for O_DIRECT, for instance.

There should be consistency in the uAPI across everything.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-04-18 18:19 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <[email protected]>
2023-04-14 23:27 ` [PATCH 5/7] io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages() Lorenzo Stoakes
2023-04-17 12:56   ` Jason Gunthorpe
2023-04-17 13:19     ` Lorenzo Stoakes
2023-04-17 13:26       ` Jason Gunthorpe
2023-04-17 14:00         ` Lorenzo Stoakes
2023-04-17 14:15           ` Jason Gunthorpe
2023-04-17 15:20             ` Lorenzo Stoakes
2023-04-17 19:00         ` Lorenzo Stoakes
2023-04-17 19:24           ` Jason Gunthorpe
2023-04-17 19:45             ` Lorenzo Stoakes
2023-04-18 16:25     ` Pavel Begunkov
2023-04-18 16:35       ` Pavel Begunkov
2023-04-18 16:36       ` Jason Gunthorpe
2023-04-18 17:25         ` Pavel Begunkov
2023-04-18 18:19           ` Jason Gunthorpe
2023-04-14 23:27 ` [PATCH 6/7] mm/gup: remove vmas parameter from pin_user_pages() Lorenzo Stoakes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox