public inbox for io-uring@vger.kernel.org
From: Pavel Begunkov <asml.silence@gmail.com>
To: Jens Axboe <axboe@kernel.dk>, Yuhao Jiang <danisjiang@gmail.com>
Cc: io-uring@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass by removing cross-buffer accounting
Date: Sat, 24 Jan 2026 11:04:31 +0000
Message-ID: <9317bad6-aa89-4e93-b7d2-9e28f5d17cc8@gmail.com>
In-Reply-To: <eea0d7c3-9aed-4c1f-8146-23b82e611899@kernel.dk>

On 1/23/26 16:52, Jens Axboe wrote:
> On 1/23/26 8:04 AM, Jens Axboe wrote:
>> On 1/23/26 7:50 AM, Jens Axboe wrote:
>>> On 1/23/26 7:26 AM, Pavel Begunkov wrote:
>>>> On 1/22/26 21:51, Pavel Begunkov wrote:
>>>> ...
>>>>>>>> I already briefly touched on that earlier, for sure not going to be of
>>>>>>>> any practical concern.
>>>>>>>
>>>>>>> Modest 16 GB can give 1M entries. Assuming 50ns-100ns per entry for the
>>>>>>> xarray business, that's 50-100ms. It's all serialised, so multiply by
>>>>>>> the number of CPUs/threads, e.g. 10-100, that's 0.5-10s. Account sky
>>>>>>> high spinlock contention, and it jumps again, and there can be more
>>>>>>> memory / CPUs / numa nodes. Not saying that it's worse than the
>>>>>>> current O(n^2), I have a test program that borderline hangs the
>>>>>>> system.
...
>> Should've tried 32x32 as well, that ends up going deep into "this sucks"
>> territory:
>>
>> git
>>
>> good luck

FWIW, the current code scales perfectly with CPUs, so just 1 thread
should be enough for testing.

>> git + user_struct
>>
>> axboe@r7625 ~> time ./ppage 32 32
>> register 32 GB, num threads 32
>>
>> ________________________________________________________
>> Executed in   16.34 secs    fish           external

That's about as close to the calculations above as it could be; those
were for 100x16GB, but that should only differ by a factor of ~1.5.
Without anchoring to this particular number, the problem is that
the wall clock runtime for the accounting will linearly depend on
the number of threads, so this 16 sec is what seemed concerning.
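For reference, the back-of-the-envelope estimate can be replayed as a
short sketch. It assumes only the figures quoted earlier in the thread
(1M entries for 16 GB, 50-100ns per entry, 10-100 threads, fully
serialised accounting); the function and variable names are made up for
illustration and do not correspond to anything in the kernel.

```python
def serialised_walltime_s(entries, ns_per_entry, threads):
    """Wall-clock seconds if every thread's accounting runs one after another."""
    total_ns = entries * ns_per_entry * threads
    return total_ns / 1e9

# Figure quoted in the thread: 16 GB gives about 1M entries.
ENTRIES_PER_16GB = 1_000_000

# Best case: 50ns per entry, 10 threads. Worst case: 100ns, 100 threads.
low = serialised_walltime_s(ENTRIES_PER_16GB, 50, 10)
high = serialised_walltime_s(ENTRIES_PER_16GB, 100, 100)

# Total work in the estimate (100 threads x 16 GB) relative to the
# measured run (32 threads x 32 GB).
work_ratio = (100 * 16) / (32 * 32)

print(f"estimated range: {low:.1f}s to {high:.1f}s")
print(f"work ratio vs the 32x32 run: {work_ratio:.2f}")
```

The model ignores lock contention, so, as the quoted estimate itself
notes, real numbers can come out higher.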

>>     usr time    0.54 secs  497.00 micros    0.54 secs
>>     sys time  451.94 secs   55.00 micros  451.94 secs
> 
...
> and the crazier cases:

I don't think it's even crazy; think of databases with lots of
caches that they want to read into / write from. 100GB+
shouldn't be surprising.

> axboe@r7625 ~> time ./ppage 32 32
> register 32 GB, num threads 32
> 
> ________________________________________________________
> Executed in    2.81 secs    fish           external
>     usr time    0.71 secs  497.00 micros    0.71 secs
>     sys time   19.57 secs  183.00 micros   19.57 secs
> 
> which isn't insane. Obviously also needs conditional rescheduling in the
> page loops, as those can take a loooong time for large amounts of
> memory.

2.8 sec sounds like a lot as well, which makes me wonder what part of
that is mm, but mm should scale fine-ish. Surely there will be
contention on page refcounts but at least the table walk is
lockless in the best case scenario and otherwise seems to be read
protected by an rw lock.
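
One rough way to compare the two quoted 32x32 runs is to divide sys time
by wall-clock time, which approximates how many CPUs were busy in the
kernel on average. The sketch below uses only the timings quoted above;
the helper name is hypothetical, and the ratio says nothing about which
part of the time is mm versus accounting.

```python
def avg_busy_cpus(sys_s, wall_s):
    """Average number of CPUs spending time in the kernel over the run."""
    return sys_s / wall_s

# Timings from the quoted `time ./ppage 32 32` outputs.
run_16s = avg_busy_cpus(451.94, 16.34)  # the 16.34 s run
run_3s = avg_busy_cpus(19.57, 2.81)     # the 2.81 s run

print(f"16.34 s run: ~{run_16s:.1f} CPUs busy in the kernel")
print(f"2.81 s run:  ~{run_3s:.1f} CPUs busy in the kernel")
```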

-- 
Pavel Begunkov


