* [PATCH v1 0/3] io_uring mm related abuses
@ 2025-06-24 10:35 Pavel Begunkov
2025-06-24 10:35 ` [PATCH v1 1/3] io_uring/rsrc: fix folio unpinning Pavel Begunkov
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Pavel Begunkov @ 2025-06-24 10:35 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, David Hildenbrand
Patch 1 uses unpin_user_folio instead of the page variant.
Patches 2-3 make sure io_uring doesn't make any assumptions
about user pointer alignments.
Pavel Begunkov (3):
io_uring/rsrc: fix folio unpinning
io_uring/rsrc: don't rely on user vaddr alignment
io_uring: don't assume uaddr alignment in io_vec_fill_bvec
io_uring/rsrc.c | 28 +++++++++++++++++++++-------
io_uring/rsrc.h | 1 +
2 files changed, 22 insertions(+), 7 deletions(-)
--
2.49.0
* [PATCH v1 1/3] io_uring/rsrc: fix folio unpinning
2025-06-24 10:35 [PATCH v1 0/3] io_uring mm related abuses Pavel Begunkov
@ 2025-06-24 10:35 ` Pavel Begunkov
2025-06-24 11:57 ` David Hildenbrand
2025-06-24 10:35 ` [PATCH v1 2/3] io_uring/rsrc: don't rely on user vaddr alignment Pavel Begunkov
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2025-06-24 10:35 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, David Hildenbrand
[ 108.070381][ T14] kernel BUG at mm/gup.c:71!
[ 108.070502][ T14] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 108.123672][ T14] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20250221-8.fc42 02/21/2025
[ 108.127458][ T14] Workqueue: iou_exit io_ring_exit_work
[ 108.174205][ T14] Call trace:
[ 108.175649][ T14] sanity_check_pinned_pages+0x7cc/0x7d0 (P)
[ 108.178138][ T14] unpin_user_page+0x80/0x10c
[ 108.180189][ T14] io_release_ubuf+0x84/0xf8
[ 108.182196][ T14] io_free_rsrc_node+0x250/0x57c
[ 108.184345][ T14] io_rsrc_data_free+0x148/0x298
[ 108.186493][ T14] io_sqe_buffers_unregister+0x84/0xa0
[ 108.188991][ T14] io_ring_ctx_free+0x48/0x480
[ 108.191057][ T14] io_ring_exit_work+0x764/0x7d8
[ 108.193207][ T14] process_one_work+0x7e8/0x155c
[ 108.195431][ T14] worker_thread+0x958/0xed8
[ 108.197561][ T14] kthread+0x5fc/0x75c
[ 108.199362][ T14] ret_from_fork+0x10/0x20
We can pin a tail page of a folio, but io_uring will then try to unpin
the head page of the folio. While that should be fine in terms of
keeping the page alive, the mm folks say it's wrong and it triggers a
debug warning. Use unpin_user_folio() instead of unpin_user_page*();
it drops pin references on the folio itself, so it doesn't matter which
constituent page was originally pinned.
Cc: stable@vger.kernel.org
Reported-by: David Hildenbrand <david@redhat.com>
Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/rsrc.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index c592ceace97d..e83a294c718b 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -112,8 +112,11 @@ static void io_release_ubuf(void *priv)
struct io_mapped_ubuf *imu = priv;
unsigned int i;
- for (i = 0; i < imu->nr_bvecs; i++)
- unpin_user_page(imu->bvec[i].bv_page);
+ for (i = 0; i < imu->nr_bvecs; i++) {
+ struct folio *folio = page_folio(imu->bvec[i].bv_page);
+
+ unpin_user_folio(folio, 1);
+ }
}
static struct io_mapped_ubuf *io_alloc_imu(struct io_ring_ctx *ctx,
@@ -810,7 +813,8 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
imu->nr_bvecs = nr_pages;
ret = io_buffer_account_pin(ctx, pages, nr_pages, imu, last_hpage);
if (ret) {
- unpin_user_pages(pages, nr_pages);
+ for (i = 0; i < nr_pages; i++)
+ unpin_user_folio(page_folio(pages[i]), 1);
goto done;
}
--
2.49.0
* [PATCH v1 2/3] io_uring/rsrc: don't rely on user vaddr alignment
2025-06-24 10:35 [PATCH v1 0/3] io_uring mm related abuses Pavel Begunkov
2025-06-24 10:35 ` [PATCH v1 1/3] io_uring/rsrc: fix folio unpinning Pavel Begunkov
@ 2025-06-24 10:35 ` Pavel Begunkov
2025-06-24 11:53 ` David Hildenbrand
2025-06-24 12:42 ` David Hildenbrand
2025-06-24 10:35 ` [PATCH v1 3/3] io_uring: don't assume uaddr alignment in io_vec_fill_bvec Pavel Begunkov
2025-06-24 10:38 ` [PATCH v1 0/3] io_uring mm related abuses Pavel Begunkov
3 siblings, 2 replies; 14+ messages in thread
From: Pavel Begunkov @ 2025-06-24 10:35 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, David Hildenbrand
There is no guaranteed alignment for user pointers. However, the
calculation of the offset of the first page into a folio after
coalescing uses bit mask logic that implicitly assumes the buffer's
virtual address is folio aligned. Get rid of it and derive the offset
from the first page's position within its folio instead.
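As a concrete illustration (hypothetical numbers, assuming a partially
remapped 2M THP; this is not from a reproducer): say folio_shift is 21,
the registered buffer starts 0x200 bytes into page 3 of the first
folio, and that page happens to be mapped at iov_base = 0x7f1234501200.
The correct offset into the folio is 3 * 4096 + 0x200 = 0x3200, but the
old calculation

    off = (unsigned long) iov->iov_base & ((1UL << imu->folio_shift) - 1);

gives 0x101200, because the low bits of the virtual address say nothing
about where the pinned page sits inside its folio. With this patch the
offset is derived from the page's index in the folio instead:

    off = (unsigned long)iov->iov_base & ~PAGE_MASK;   /* 0x200 */
    off += data.first_page_offset;                     /* + 0x3000 = 0x3200 */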
Cc: stable@vger.kernel.org
Reported-by: David Hildenbrand <david@redhat.com>
Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/rsrc.c | 8 +++++++-
io_uring/rsrc.h | 1 +
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index e83a294c718b..5132f8df600f 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -734,6 +734,8 @@ bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
data->nr_pages_mid = folio_nr_pages(folio);
data->folio_shift = folio_shift(folio);
+ data->first_page_offset = page_array[0] - compound_head(page_array[0]);
+ data->first_page_offset <<= PAGE_SHIFT;
/*
* Check if pages are contiguous inside a folio, and all folios have
@@ -830,7 +832,11 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
if (coalesced)
imu->folio_shift = data.folio_shift;
refcount_set(&imu->refs, 1);
- off = (unsigned long) iov->iov_base & ((1UL << imu->folio_shift) - 1);
+
+ off = (unsigned long)iov->iov_base & ~PAGE_MASK;
+ if (coalesced)
+ off += data.first_page_offset;
+
node->buf = imu;
ret = 0;
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 0d2138f16322..d823554a8817 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -49,6 +49,7 @@ struct io_imu_folio_data {
unsigned int nr_pages_mid;
unsigned int folio_shift;
unsigned int nr_folios;
+ unsigned long first_page_offset;
};
bool io_rsrc_cache_init(struct io_ring_ctx *ctx);
--
2.49.0
* [PATCH v1 3/3] io_uring: don't assume uaddr alignment in io_vec_fill_bvec
2025-06-24 10:35 [PATCH v1 0/3] io_uring mm related abuses Pavel Begunkov
2025-06-24 10:35 ` [PATCH v1 1/3] io_uring/rsrc: fix folio unpinning Pavel Begunkov
2025-06-24 10:35 ` [PATCH v1 2/3] io_uring/rsrc: don't rely on user vaddr alignment Pavel Begunkov
@ 2025-06-24 10:35 ` Pavel Begunkov
2025-06-24 10:38 ` [PATCH v1 0/3] io_uring mm related abuses Pavel Begunkov
3 siblings, 0 replies; 14+ messages in thread
From: Pavel Begunkov @ 2025-06-24 10:35 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, David Hildenbrand
There is no guaranteed alignment for user pointers. Don't use mask
trickery to recover the offset into the first folio; compute the offset
relative to the start of the buffer and adjust it by the first bvec's
bv_offset instead.
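For illustration, reusing the hypothetical numbers from patch 2's
example (2M folios, imu->ubuf = 0x7f1234501200, and after coalescing
imu->bvec[0].bv_offset = 0x3200), a request starting 0x5000 bytes into
the registered buffer now resolves as:

    offset = buf_addr - imu->ubuf;                        /* 0x5000 */
    offset += imu->bvec[0].bv_offset;                     /* 0x8200 */
    src_bvec = imu->bvec + (offset >> imu->folio_shift);  /* first bvec */
    offset &= folio_mask;                                 /* 0x8200 into it */

The old "imu->ubuf & ~folio_mask" base would have produced 0x106200
here, again because the low bits of the user address don't reflect the
buffer's position inside the folio.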
Cc: stable@vger.kernel.org
Reported-by: David Hildenbrand <david@redhat.com>
Fixes: 9ef4cbbcb4ac3 ("io_uring: add infra for importing vectored reg buffers")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/rsrc.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 5132f8df600f..4c972cac6cdf 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1337,7 +1337,6 @@ static int io_vec_fill_bvec(int ddir, struct iov_iter *iter,
{
unsigned long folio_size = 1 << imu->folio_shift;
unsigned long folio_mask = folio_size - 1;
- u64 folio_addr = imu->ubuf & ~folio_mask;
struct bio_vec *res_bvec = vec->bvec;
size_t total_len = 0;
unsigned bvec_idx = 0;
@@ -1359,8 +1358,13 @@ static int io_vec_fill_bvec(int ddir, struct iov_iter *iter,
if (unlikely(check_add_overflow(total_len, iov_len, &total_len)))
return -EOVERFLOW;
- /* by using folio address it also accounts for bvec offset */
- offset = buf_addr - folio_addr;
+ offset = buf_addr - imu->ubuf;
+ /*
+ * Only first bvec can have non-0 bv_offset, account it here
+ * and work with full folios below.
+ */
+ offset += imu->bvec[0].bv_offset;
+
src_bvec = imu->bvec + (offset >> imu->folio_shift);
offset &= folio_mask;
--
2.49.0
* Re: [PATCH v1 0/3] io_uring mm related abuses
2025-06-24 10:35 [PATCH v1 0/3] io_uring mm related abuses Pavel Begunkov
` (2 preceding siblings ...)
2025-06-24 10:35 ` [PATCH v1 3/3] io_uring: don't assume uaddr alignment in io_vec_fill_bvec Pavel Begunkov
@ 2025-06-24 10:38 ` Pavel Begunkov
3 siblings, 0 replies; 14+ messages in thread
From: Pavel Begunkov @ 2025-06-24 10:38 UTC (permalink / raw)
To: io-uring; +Cc: David Hildenbrand
On 6/24/25 11:35, Pavel Begunkov wrote:
> Patch 1 uses unpin_user_folio instead of the page variant.
> Patches 2-3 make sure io_uring doesn't make any assumptions
> about user pointer alignments.
I didn't manage to reproduce the alignment issue: something is wrong
with my KASAN'ed machine, which I need to fix, and it didn't trigger
without KASAN.
>
> Pavel Begunkov (3):
> io_uring/rsrc: fix folio unpinning
> io_uring/rsrc: don't rely on user vaddr alignment
> io_uring: don't assume uaddr alignment in io_vec_fill_bvec
>
> io_uring/rsrc.c | 28 +++++++++++++++++++++-------
> io_uring/rsrc.h | 1 +
> 2 files changed, 22 insertions(+), 7 deletions(-)
>
--
Pavel Begunkov
* Re: [PATCH v1 2/3] io_uring/rsrc: don't rely on user vaddr alignment
2025-06-24 10:35 ` [PATCH v1 2/3] io_uring/rsrc: don't rely on user vaddr alignment Pavel Begunkov
@ 2025-06-24 11:53 ` David Hildenbrand
2025-06-24 12:20 ` Pavel Begunkov
2025-06-24 12:42 ` David Hildenbrand
1 sibling, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2025-06-24 11:53 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 24.06.25 12:35, Pavel Begunkov wrote:
> There is no guaranteed alignment for user pointers, however the
> calculation of an offset of the first page into a folio after
> coalescing uses some weird bit mask logic, get rid of it.
>
> Cc: stable@vger.kernel.org
> Reported-by: David Hildenbrand <david@redhat.com>
> Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing")
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
> io_uring/rsrc.c | 8 +++++++-
> io_uring/rsrc.h | 1 +
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index e83a294c718b..5132f8df600f 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -734,6 +734,8 @@ bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
>
> data->nr_pages_mid = folio_nr_pages(folio);
> data->folio_shift = folio_shift(folio);
> + data->first_page_offset = page_array[0] - compound_head(page_array[0]);
> + data->first_page_offset <<= PAGE_SHIFT;
Would that also cover when we have something like
nr_pages = 4
pages[0] = folio_page(folio, 1);
pages[1] = folio_page(folio, 2);
pages[2] = folio_page(folio2, 1);
pages[3] = folio_page(folio2, 2);
Note that we can create all kinds of crazy partially-mapped THP layouts
using VMAs.
--
Cheers,
David / dhildenb
* Re: [PATCH v1 1/3] io_uring/rsrc: fix folio unpinning
2025-06-24 10:35 ` [PATCH v1 1/3] io_uring/rsrc: fix folio unpinning Pavel Begunkov
@ 2025-06-24 11:57 ` David Hildenbrand
2025-06-24 12:08 ` Pavel Begunkov
0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2025-06-24 11:57 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 24.06.25 12:35, Pavel Begunkov wrote:
> [ 108.070381][ T14] kernel BUG at mm/gup.c:71!
> [ 108.070502][ T14] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> [ 108.123672][ T14] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20250221-8.fc42 02/21/2025
> [ 108.127458][ T14] Workqueue: iou_exit io_ring_exit_work
> [ 108.174205][ T14] Call trace:
> [ 108.175649][ T14] sanity_check_pinned_pages+0x7cc/0x7d0 (P)
> [ 108.178138][ T14] unpin_user_page+0x80/0x10c
> [ 108.180189][ T14] io_release_ubuf+0x84/0xf8
> [ 108.182196][ T14] io_free_rsrc_node+0x250/0x57c
> [ 108.184345][ T14] io_rsrc_data_free+0x148/0x298
> [ 108.186493][ T14] io_sqe_buffers_unregister+0x84/0xa0
> [ 108.188991][ T14] io_ring_ctx_free+0x48/0x480
> [ 108.191057][ T14] io_ring_exit_work+0x764/0x7d8
> [ 108.193207][ T14] process_one_work+0x7e8/0x155c
> [ 108.195431][ T14] worker_thread+0x958/0xed8
> [ 108.197561][ T14] kthread+0x5fc/0x75c
> [ 108.199362][ T14] ret_from_fork+0x10/0x20
>
> We can pin a tail page of a folio, but io_uring will then try to unpin
> the head page of the folio. While that should be fine in terms of
> keeping the page alive, the mm folks say it's wrong and it triggers a
> debug warning. Use unpin_user_folio() instead of
> unpin_user_page*.
Right, unpin_user_pages() expects that you unpin the exact pages you pinned,
not some other pages of the same folio.
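A rough illustration of that rule (generic sketch, not code from this
series):

    pinned = pin_user_pages(uaddr, nr, FOLL_WRITE | FOLL_LONGTERM, pages);
    ...
    unpin_user_pages(pages, pinned);               /* fine: the exact pages */

    for (i = 0; i < pinned; i++)                   /* also fine */
        unpin_user_folio(page_folio(pages[i]), 1);

    unpin_user_page(compound_head(pages[0]));      /* what the coalescing path
                                                      effectively did: wrong if
                                                      pages[0] is a tail page */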
>
> Cc: stable@vger.kernel.org
> Reported-by: David Hildenbrand <david@redhat.com>
Probably should be:
Debugged-by: David Hildenbrand <david@redhat.com>
Reported-by: syzbot+1d335893772467199ab6@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/683f1551.050a0220.55ceb.0017.GAE@google.com
> Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing")
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v1 1/3] io_uring/rsrc: fix folio unpinning
2025-06-24 11:57 ` David Hildenbrand
@ 2025-06-24 12:08 ` Pavel Begunkov
0 siblings, 0 replies; 14+ messages in thread
From: Pavel Begunkov @ 2025-06-24 12:08 UTC (permalink / raw)
To: David Hildenbrand, io-uring
On 6/24/25 12:57, David Hildenbrand wrote:
> On 24.06.25 12:35, Pavel Begunkov wrote:
>> [ 108.070381][ T14] kernel BUG at mm/gup.c:71!
>> [ 108.070502][ T14] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>> [ 108.123672][ T14] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20250221-8.fc42 02/21/2025
>> [ 108.127458][ T14] Workqueue: iou_exit io_ring_exit_work
>> [ 108.174205][ T14] Call trace:
>> [ 108.175649][ T14] sanity_check_pinned_pages+0x7cc/0x7d0 (P)
>> [ 108.178138][ T14] unpin_user_page+0x80/0x10c
>> [ 108.180189][ T14] io_release_ubuf+0x84/0xf8
>> [ 108.182196][ T14] io_free_rsrc_node+0x250/0x57c
>> [ 108.184345][ T14] io_rsrc_data_free+0x148/0x298
>> [ 108.186493][ T14] io_sqe_buffers_unregister+0x84/0xa0
>> [ 108.188991][ T14] io_ring_ctx_free+0x48/0x480
>> [ 108.191057][ T14] io_ring_exit_work+0x764/0x7d8
>> [ 108.193207][ T14] process_one_work+0x7e8/0x155c
>> [ 108.195431][ T14] worker_thread+0x958/0xed8
>> [ 108.197561][ T14] kthread+0x5fc/0x75c
>> [ 108.199362][ T14] ret_from_fork+0x10/0x20
>>
>> We can pin a tail page of a folio, but io_uring will then try to unpin
>> the head page of the folio. While that should be fine in terms of
>> keeping the page alive, the mm folks say it's wrong and it triggers a
>> debug warning. Use unpin_user_folio() instead of
>> unpin_user_page*.
>
> Right, unpin_user_pages() expects that you unpin the exact pages you pinned,
> not some other pages of the same folio.
>
>>
>> Cc: stable@vger.kernel.org
>> Reported-by: David Hildenbrand <david@redhat.com>
>
> Probably should be:
>
> Debugged-by: David Hildenbrand <david@redhat.com>
> Reported-by: syzbot+1d335893772467199ab6@syzkaller.appspotmail.com
> Closes: https://lkml.kernel.org/r/683f1551.050a0220.55ceb.0017.GAE@google.com
Sure, we can do that
--
Pavel Begunkov
* Re: [PATCH v1 2/3] io_uring/rsrc: don't rely on user vaddr alignment
2025-06-24 11:53 ` David Hildenbrand
@ 2025-06-24 12:20 ` Pavel Begunkov
2025-06-24 12:26 ` David Hildenbrand
2025-06-24 12:30 ` Pavel Begunkov
0 siblings, 2 replies; 14+ messages in thread
From: Pavel Begunkov @ 2025-06-24 12:20 UTC (permalink / raw)
To: David Hildenbrand, io-uring
On 6/24/25 12:53, David Hildenbrand wrote:
> On 24.06.25 12:35, Pavel Begunkov wrote:
>> There is no guaranteed alignment for user pointers, however the
>> calculation of an offset of the first page into a folio after
>> coalescing uses some weird bit mask logic, get rid of it.
>>
>> Cc: stable@vger.kernel.org
>> Reported-by: David Hildenbrand <david@redhat.com>
>> Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing")
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>> io_uring/rsrc.c | 8 +++++++-
>> io_uring/rsrc.h | 1 +
>> 2 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
>> index e83a294c718b..5132f8df600f 100644
>> --- a/io_uring/rsrc.c
>> +++ b/io_uring/rsrc.c
>> @@ -734,6 +734,8 @@ bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
>> data->nr_pages_mid = folio_nr_pages(folio);
>> data->folio_shift = folio_shift(folio);
>> + data->first_page_offset = page_array[0] - compound_head(page_array[0]);
>> + data->first_page_offset <<= PAGE_SHIFT;
>
> Would that also cover when we have something like
>
> nr_pages = 4
> pages[0] = folio_page(folio, 1);
> pages[1] = folio_page(folio, 2);
> pages[2] = folio_page(folio2, 1);
> pages[3] = folio_page(folio2, 2);
>
> Note that we can create all kinds of crazy partially-mapped THP layouts using VMAs.
It'll see that pages[2] is not the first page of folio2
and return that it can't be coalesced
if (/* ... */ || folio_page_idx(folio, page_array[i]) != 0)
return false;
--
Pavel Begunkov
* Re: [PATCH v1 2/3] io_uring/rsrc: don't rely on user vaddr alignment
2025-06-24 12:20 ` Pavel Begunkov
@ 2025-06-24 12:26 ` David Hildenbrand
2025-06-24 12:37 ` Pavel Begunkov
2025-06-24 12:30 ` Pavel Begunkov
1 sibling, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2025-06-24 12:26 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 24.06.25 14:20, Pavel Begunkov wrote:
> On 6/24/25 12:53, David Hildenbrand wrote:
>> On 24.06.25 12:35, Pavel Begunkov wrote:
>>> There is no guaranteed alignment for user pointers, however the
>>> calculation of an offset of the first page into a folio after
>>> coalescing uses some weird bit mask logic, get rid of it.
>>>
>>> Cc: stable@vger.kernel.org
>>> Reported-by: David Hildenbrand <david@redhat.com>
>>> Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing")
>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>> ---
>>> io_uring/rsrc.c | 8 +++++++-
>>> io_uring/rsrc.h | 1 +
>>> 2 files changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
>>> index e83a294c718b..5132f8df600f 100644
>>> --- a/io_uring/rsrc.c
>>> +++ b/io_uring/rsrc.c
>>> @@ -734,6 +734,8 @@ bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
>>> data->nr_pages_mid = folio_nr_pages(folio);
>>> data->folio_shift = folio_shift(folio);
>>> + data->first_page_offset = page_array[0] - compound_head(page_array[0]);
>>> + data->first_page_offset <<= PAGE_SHIFT;
>>
>> Would that also cover when we have something like
>>
>> nr_pages = 4
>> pages[0] = folio_page(folio, 1);
>> pages[1] = folio_page(folio, 2);
>> pages[2] = folio_page(folio2, 1);
>> pages[3] = folio_page(folio2, 2);
>>
>> Note that we can create all kinds of crazy partially-mapped THP layouts using VMAs.
>
> It'll see that pages[2] is not the first page of folio2
> and return that it can't be coalesced
>
> if (/* ... */ || folio_page_idx(folio, page_array[i]) != 0)
> return false;
Ah okay, that makes sense.
It might be clearer at some point to coalesce folio ranges (e.g.,
folio,idx,len) instead, representing them in a different temporary
structure.
--
Cheers,
David / dhildenb
* Re: [PATCH v1 2/3] io_uring/rsrc: don't rely on user vaddr alignment
2025-06-24 12:20 ` Pavel Begunkov
2025-06-24 12:26 ` David Hildenbrand
@ 2025-06-24 12:30 ` Pavel Begunkov
1 sibling, 0 replies; 14+ messages in thread
From: Pavel Begunkov @ 2025-06-24 12:30 UTC (permalink / raw)
To: David Hildenbrand, io-uring
On 6/24/25 13:20, Pavel Begunkov wrote:
> On 6/24/25 12:53, David Hildenbrand wrote:
>> On 24.06.25 12:35, Pavel Begunkov wrote:
>>> There is no guaranteed alignment for user pointers, however the
>>> calculation of an offset of the first page into a folio after
>>> coalescing uses some weird bit mask logic, get rid of it.
>>>
>>> Cc: stable@vger.kernel.org
>>> Reported-by: David Hildenbrand <david@redhat.com>
>>> Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing")
>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>> ---
>>> io_uring/rsrc.c | 8 +++++++-
>>> io_uring/rsrc.h | 1 +
>>> 2 files changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
>>> index e83a294c718b..5132f8df600f 100644
>>> --- a/io_uring/rsrc.c
>>> +++ b/io_uring/rsrc.c
>>> @@ -734,6 +734,8 @@ bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
>>> data->nr_pages_mid = folio_nr_pages(folio);
>>> data->folio_shift = folio_shift(folio);
>>> + data->first_page_offset = page_array[0] - compound_head(page_array[0]);
>>> + data->first_page_offset <<= PAGE_SHIFT;
>>
>> Would that also cover when we have something like
>>
>> nr_pages = 4
>> pages[0] = folio_page(folio, 1);
>> pages[1] = folio_page(folio, 2);
>> pages[2] = folio_page(folio2, 1);
>> pages[3] = folio_page(folio2, 2);
>>
>> Note that we can create all kinds of crazy partially-mapped THP layouts using VMAs.
>
> It'll see that pages[2] is not the first page of folio2
> and return that it can't be coalesced
>
> if (/* ... */ || folio_page_idx(folio, page_array[i]) != 0)
> return false;
To elaborate, we only coalesce if every resulting bvec segment except
the first starts at the beginning of its folio, and every segment
except the last ends at the end of its folio. IOW, all middle bvecs
must fully cover their folios, while the first and the last bvecs must
be aligned to the right and left borders of their folios respectively.
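Roughly, for the multi-folio case (a restatement, not the literal
checks in io_check_coalesce_buffer()):

    bvec[0]         may start mid-folio, must run to the end of its folio
    bvec[1..n-2]    must cover their folios exactly (start at page 0,
                    full folio length)
    bvec[n-1]       must start at page 0 of its folio, may end early

If a page run violates this, io_check_coalesce_buffer() returns false
and the buffer is registered with one bvec per page as before.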
--
Pavel Begunkov
* Re: [PATCH v1 2/3] io_uring/rsrc: don't rely on user vaddr alignment
2025-06-24 12:26 ` David Hildenbrand
@ 2025-06-24 12:37 ` Pavel Begunkov
0 siblings, 0 replies; 14+ messages in thread
From: Pavel Begunkov @ 2025-06-24 12:37 UTC (permalink / raw)
To: David Hildenbrand, io-uring
On 6/24/25 13:26, David Hildenbrand wrote:
> On 24.06.25 14:20, Pavel Begunkov wrote:
>> On 6/24/25 12:53, David Hildenbrand wrote:
>>> On 24.06.25 12:35, Pavel Begunkov wrote:
>>>> There is no guaranteed alignment for user pointers, however the
>>>> calculation of an offset of the first page into a folio after
>>>> coalescing uses some weird bit mask logic, get rid of it.
>>>>
>>>> Cc: stable@vger.kernel.org
>>>> Reported-by: David Hildenbrand <david@redhat.com>
>>>> Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing")
>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>> ---
>>>> io_uring/rsrc.c | 8 +++++++-
>>>> io_uring/rsrc.h | 1 +
>>>> 2 files changed, 8 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
>>>> index e83a294c718b..5132f8df600f 100644
>>>> --- a/io_uring/rsrc.c
>>>> +++ b/io_uring/rsrc.c
>>>> @@ -734,6 +734,8 @@ bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
>>>> data->nr_pages_mid = folio_nr_pages(folio);
>>>> data->folio_shift = folio_shift(folio);
>>>> + data->first_page_offset = page_array[0] - compound_head(page_array[0]);
>>>> + data->first_page_offset <<= PAGE_SHIFT;
>>>
>>> Would that also cover when we have something like
>>>
>>> nr_pages = 4
>>> pages[0] = folio_page(folio, 1);
>>> pages[1] = folio_page(folio, 2);
>>> pages[2] = folio_page(folio2, 1);
>>> pages[3] = folio_page(folio2, 2);
>>>
>>> Note that we can create all kinds of crazy partially-mapped THP layouts using VMAs.
>>
>> It'll see that pages[2] is not the first page of folio2
>> and return that it can't be coalesced
>>
>> if (/* ... */ || folio_page_idx(folio, page_array[i]) != 0)
>> return false;
>
> Ah okay, that makes sense.
>
> It might be clearer at some point to coalesce folio ranges (e.g., folio,idx,len) instead, representing them in a different temporary structure.
It could, but it needs a fast way to apply an arbitrary offset
and iov_iter_advance() is not exactly fast, especially so for a
long bvec array. That's why it keeps all segments of the same
length (apart from first/last).
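To sketch what the uniform segments buy: resolving an arbitrary byte
offset into the registered buffer stays O(1),

    src_bvec = imu->bvec + (offset >> imu->folio_shift);
    offset &= folio_mask;

whereas variable-length folio ranges would need a linear walk over the
array, which is essentially what iov_iter_advance() ends up doing for
bvec iterators.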
--
Pavel Begunkov
* Re: [PATCH v1 2/3] io_uring/rsrc: don't rely on user vaddr alignment
2025-06-24 10:35 ` [PATCH v1 2/3] io_uring/rsrc: don't rely on user vaddr alignment Pavel Begunkov
2025-06-24 11:53 ` David Hildenbrand
@ 2025-06-24 12:42 ` David Hildenbrand
2025-06-24 12:54 ` Pavel Begunkov
1 sibling, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2025-06-24 12:42 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 24.06.25 12:35, Pavel Begunkov wrote:
> There is no guaranteed alignment for user pointers, however the
> calculation of an offset of the first page into a folio after
> coalescing uses some weird bit mask logic, get rid of it.
>
> Cc: stable@vger.kernel.org
> Reported-by: David Hildenbrand <david@redhat.com>
> Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing")
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
> io_uring/rsrc.c | 8 +++++++-
> io_uring/rsrc.h | 1 +
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index e83a294c718b..5132f8df600f 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -734,6 +734,8 @@ bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
>
> data->nr_pages_mid = folio_nr_pages(folio);
> data->folio_shift = folio_shift(folio);
> + data->first_page_offset = page_array[0] - compound_head(page_array[0]);
Note: pointer arithmetic on "struct page" does not work reliably for
very large folios (e.g., 1 GiB hugetlb) in all configs
(!CONFIG_SPARSEMEM_VMEMMAP).
I assume this can be
data->first_page_offset = folio_page_idx(folio, page_array[0]);
> + data->first_page_offset <<= PAGE_SHIFT;
>
> /*
> * Check if pages are contiguous inside a folio, and all folios have
> @@ -830,7 +832,11 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
> if (coalesced)
> imu->folio_shift = data.folio_shift;
> refcount_set(&imu->refs, 1);
> - off = (unsigned long) iov->iov_base & ((1UL << imu->folio_shift) - 1);
> +
> + off = (unsigned long)iov->iov_base & ~PAGE_MASK;
> + if (coalesced)
> + off += data.first_page_offset;
> +
> node->buf = imu;
> ret = 0;
>
> diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
> index 0d2138f16322..d823554a8817 100644
> --- a/io_uring/rsrc.h
> +++ b/io_uring/rsrc.h
> @@ -49,6 +49,7 @@ struct io_imu_folio_data {
> unsigned int nr_pages_mid;
> unsigned int folio_shift;
> unsigned int nr_folios;
> + unsigned long first_page_offset;
Heh, is it actually "first_folio_offset" ?
Alternatively, call it "first_folio_page_idx" or sth like that and leave
the shift to the user.
--
Cheers,
David / dhildenb
* Re: [PATCH v1 2/3] io_uring/rsrc: don't rely on user vaddr alignment
2025-06-24 12:42 ` David Hildenbrand
@ 2025-06-24 12:54 ` Pavel Begunkov
0 siblings, 0 replies; 14+ messages in thread
From: Pavel Begunkov @ 2025-06-24 12:54 UTC (permalink / raw)
To: David Hildenbrand, io-uring
On 6/24/25 13:42, David Hildenbrand wrote:
> On 24.06.25 12:35, Pavel Begunkov wrote:
>> There is no guaranteed alignment for user pointers, however the
>> calculation of an offset of the first page into a folio after
>> coalescing uses some weird bit mask logic, get rid of it.
>>
>> Cc: stable@vger.kernel.org
>> Reported-by: David Hildenbrand <david@redhat.com>
>> Fixes: a8edbb424b139 ("io_uring/rsrc: enable multi-hugepage buffer coalescing")
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>> io_uring/rsrc.c | 8 +++++++-
>> io_uring/rsrc.h | 1 +
>> 2 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
>> index e83a294c718b..5132f8df600f 100644
>> --- a/io_uring/rsrc.c
>> +++ b/io_uring/rsrc.c
>> @@ -734,6 +734,8 @@ bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
>> data->nr_pages_mid = folio_nr_pages(folio);
>> data->folio_shift = folio_shift(folio);
>> + data->first_page_offset = page_array[0] - compound_head(page_array[0]);
>
> Note: pointer arithmetic on "struct page" does not work reliably for very large folios (eg., 1 GiB hugetlb) in all configs (!CONFIG_SPARSEMEM_VMEMMAP)
>
> I assume this can be
>
> data->first_page_offset = folio_page_idx(folio, page_array[0]);
Yep, already changed it in v2
...>> diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
>> index 0d2138f16322..d823554a8817 100644
>> --- a/io_uring/rsrc.h
>> +++ b/io_uring/rsrc.h
>> @@ -49,6 +49,7 @@ struct io_imu_folio_data {
>> unsigned int nr_pages_mid;
>> unsigned int folio_shift;
>> unsigned int nr_folios;
>> + unsigned long first_page_offset;
>
> Heh, is it actually "first_folio_offset" ?
It's an offset into the folio, or the offset of the page within the
folio, and first_folio_offset could also be interpreted the wrong way.
The usual naming problem.
> Alternatively, call it "first_folio_page_idx" or sth like that and leave the shift to the user.
Let's do that
--
Pavel Begunkov