Re: [PATCH v2] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass by removing cross-buffer accounting

From: Pavel Begunkov <asml.silence@gmail.com>
To: Jens Axboe <axboe@kernel.dk>, Yuhao Jiang <danisjiang@gmail.com>
Cc: io-uring@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass by removing cross-buffer accounting
Date: Thu, 22 Jan 2026 11:43:28 +0000
Message-ID: <d2fc2ff2-98d9-49f8-af95-968100174d55@gmail.com>
In-Reply-To: <2fcf583a-f521-4e8d-9a89-0985681ca85b@kernel.dk>

On 1/21/26 14:58, Jens Axboe wrote:
> On 1/20/26 2:45 PM, Pavel Begunkov wrote:
>> On 1/20/26 17:03, Jens Axboe wrote:
>>> On 1/20/26 5:05 AM, Pavel Begunkov wrote:
>>>> On 1/20/26 07:05, Yuhao Jiang wrote:
>> ...
>>>>>
>>>>> I've been implementing the xarray-based ref tracking approach for v3.
>>>>> While working on it, I discovered an issue with buffer cloning.
>>>>>
>>>>> If ctx1 has two buffers sharing a huge page, ctx1->hpage_acct[page] = 2.
>>>>> Clone to ctx2, now both have a refcount of 2. On cleanup both hit zero
>>>>> and unaccount, so we double-unaccount and user->locked_vm goes negative.
>>>>>
>>>>> The per-context xarray can't coordinate across clones - each context
>>>>> tracks its own refcount independently. I think we either need a global
>>>>> xarray (shared across all contexts), or just go back to v2. What do
>>>>> you think?
>>>>
>>>> Jens' diff is functionally equivalent to your v1 and has
>>>> exactly the same problems. Global tracking won't work well.
>>>
>>> Why not? My thinking was that we just use xa_lock() for this, with
>>> a global xarray. It's not like register+unregister is a high frequency
>>> thing. And if they are, then we've got much bigger problems than the
>>> single lock as the runtime complexity isn't ideal.
>>
>> 1. There could be quite a lot of entries even for a single ring
>> with a realistic amount of memory. If lots of threads start up
>> at the same time taking the lock in a loop, it might become a choke
>> point for large systems. Should be even more spectacular for
>> some NUMA setups.
> 
> I already briefly touched on that earlier, for sure not going to be of
> any practical concern.

A modest 16 GB can give 1M entries. Assuming 50-100ns per entry for the
xarray business, that's 50-100ms. It's all serialised, so multiply by
the number of CPUs/threads, e.g. 10-100, and that's 0.5-10s. Account for
sky-high spinlock contention and it jumps again, and there can be more
memory / CPUs / NUMA nodes. I'm not saying that it's worse than the
current O(n^2); I have a test program for that which borderline hangs
the system.
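
The arithmetic above can be reproduced with a minimal Python sketch. The
function name and the model (every thread walks all entries while holding
one global lock, so the per-thread costs simply add up) are assumptions
made for illustration, not a measurement and not kernel code. The inputs
are the figures from the paragraph above: 1M entries, 50-100ns per entry,
10-100 threads. Extra spinlock contention is not modelled.

```python
def serialized_seconds(entries, ns_per_entry, threads):
    """Estimated wall time, in seconds, when `threads` threads each
    process `entries` xarray entries under a single global lock.

    Hypothetical model: the work is fully serialised, so the total is
    entries * ns_per_entry * threads nanoseconds.
    """
    return entries * ns_per_entry * threads / 1e9


ENTRIES = 1_000_000  # entry count quoted above for 16 GB

# Cost for a single thread: the 50-100ms range.
single_lo = serialized_seconds(ENTRIES, 50, 1)
single_hi = serialized_seconds(ENTRIES, 100, 1)

# Cost with 10-100 threads serialised on the lock: the 0.5-10s range.
total_lo = serialized_seconds(ENTRIES, 50, 10)
total_hi = serialized_seconds(ENTRIES, 100, 100)

print(f"single thread: {single_lo * 1e3:.0f}-{single_hi * 1e3:.0f} ms")
print(f"serialised:    {total_lo:.1f}-{total_hi:.1f} s")
```

Since the model is linear in every input, doubling memory, thread count
or per-entry cost doubles the estimate.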

Look, I don't care what it'd be, whether it stutters or blows up the
kernel. I only took a quick look since you pinged me and were asking
"why not". If you don't want to consider my reasoning, as the
maintainer you can merge whatever you like, and it'll be easier for
me as I won't be wasting more time.

-- 
Pavel Begunkov

Thread overview: 22+ messages
2026-01-19  7:10 [PATCH v2] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass by removing cross-buffer accounting Yuhao Jiang
2026-01-19 17:03 ` Jens Axboe
2026-01-19 23:34   ` Yuhao Jiang
2026-01-19 23:40     ` Jens Axboe
2026-01-20  7:05       ` Yuhao Jiang
2026-01-20 12:04         ` Jens Axboe
2026-01-20 12:05         ` Pavel Begunkov
2026-01-20 17:03           ` Jens Axboe
2026-01-20 21:45             ` Pavel Begunkov
2026-01-21 14:58               ` Jens Axboe
2026-01-22 11:43                 ` Pavel Begunkov [this message]
2026-01-22 17:47                   ` Jens Axboe
2026-01-22 21:51                     ` Pavel Begunkov
2026-01-23 14:26                       ` Pavel Begunkov
2026-01-23 14:50                         ` Jens Axboe
2026-01-23 15:04                           ` Jens Axboe
2026-01-23 16:52                             ` Jens Axboe
2026-01-24 11:04                               ` Pavel Begunkov
2026-01-24 15:14                                 ` Jens Axboe
2026-01-24 15:55                                   ` Jens Axboe
2026-01-24 16:30                                     ` Pavel Begunkov
2026-01-24 18:44                                     ` Jens Axboe