Re: [PATCH] io_uring: make overflowing cqe subject to OOM


From: Jens Axboe <axboe@kernel.dk>
To: Alexandre Negrel <alexandre@negrel.dev>
Cc: io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] io_uring: make overflowing cqe subject to OOM
Date: Tue, 30 Dec 2025 09:01:46 -0700
Message-ID: <d2e534de-f478-4dc6-8f17-a080275c2c5f@kernel.dk>
In-Reply-To: <5YHjvAsQKKhRWwp95PB0tGlW7nmplpjVW0b5mruoUD73qmg89ntObcPe63oCPf1mhBUh-Y3ARNMcPueF2dUttoWCyWv_KiG3VMIbguuOJHY=@negrel.dev>

On 12/30/25 7:50 AM, Alexandre Negrel wrote:
>> I'm assuming the issue here is that memcg will look at __GFP_HIGH
>> somehow and allow it to proceed?
> 
> Exactly, the allocation succeeds even though it exceeds the cgroup limits.
> After digging through try_charge_memcg(), it seems that the OOM killer
> isn't involved unless the __GFP_DIRECT_RECLAIM bit is set (see
> gfpflags_allow_blocking()).
> 
> https://github.com/torvalds/linux/blob/8640b74557fc8b4c300030f6ccb8cd078f665ec8/mm/memcontrol.c#L2329
> https://github.com/torvalds/linux/blob/8640b74557fc8b4c300030f6ccb8cd078f665ec8/include/linux/gfp.h#L38
> 
>> In any case, the below should then do the same. Can you test?
> 
> I tried it and it seems to fix the issue, but in a different way.
> try_charge_memcg() now returns -ENOMEM and the allocation fails. The
> completion queue entry is "dropped on the floor" in
> io_cqring_add_overflow().
>
> So I see 3 options here:
> * use GFP_NOWAIT if dropping the CQE is ok

We're utterly out of memory at that point, so something has to give. We
can't invent memory out of thin air. Hence dropping the event, and
logging it as such, is imho the way to go. Same thing would've happened
with GFP_ATOMIC, just a bit earlier in the process.
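
For readers following along, here is a small userspace sketch of how the
flags discussed in this thread relate to each other. It is not kernel
code. The SK_* names and bit positions are made up for illustration (the
real definitions live in the kernel's gfp headers and change between
versions), and bits not relevant here are left out. Only the composition
of the masks and the shape of the gfpflags_allow_blocking() check are
meant to mirror the kernel.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Hypothetical stand-ins for the kernel's __GFP_* bits.
 * The bit positions are illustrative, NOT the kernel's real values. */
#define SK_GFP_HIGH            (1u << 0)  /* stands in for __GFP_HIGH */
#define SK_GFP_DIRECT_RECLAIM  (1u << 1)  /* stands in for __GFP_DIRECT_RECLAIM */
#define SK_GFP_KSWAPD_RECLAIM  (1u << 2)  /* stands in for __GFP_KSWAPD_RECLAIM */
#define SK_GFP_IO              (1u << 3)  /* stands in for __GFP_IO */
#define SK_GFP_FS              (1u << 4)  /* stands in for __GFP_FS */
#define SK_GFP_ACCOUNT         (1u << 5)  /* stands in for __GFP_ACCOUNT */

/* Composite masks, built the way the kernel builds them
 * (unrelated bits omitted). */
#define SK_GFP_RECLAIM         (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)
#define SK_GFP_ATOMIC          (SK_GFP_HIGH | SK_GFP_KSWAPD_RECLAIM)
#define SK_GFP_NOWAIT          (SK_GFP_KSWAPD_RECLAIM)
#define SK_GFP_KERNEL          (SK_GFP_RECLAIM | SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_KERNEL_ACCOUNT  (SK_GFP_KERNEL | SK_GFP_ACCOUNT)

/* Same shape as gfpflags_allow_blocking(): an allocation may sleep
 * only if direct reclaim is allowed. */
static inline bool sk_allow_blocking(gfp_t flags)
{
	return !!(flags & SK_GFP_DIRECT_RECLAIM);
}

/* True if the mask carries the high-priority bit. */
static inline bool sk_is_high_prio(gfp_t flags)
{
	return !!(flags & SK_GFP_HIGH);
}

/* Neither atomic-context mask permits direct reclaim, so neither can
 * block while the completion lock is held. */
_Static_assert(!(SK_GFP_ATOMIC & SK_GFP_DIRECT_RECLAIM), "ATOMIC must not block");
_Static_assert(!(SK_GFP_NOWAIT & SK_GFP_DIRECT_RECLAIM), "NOWAIT must not block");
```

In this model the only difference between the two non-blocking masks is
the high-priority bit: sk_is_high_prio() is true for SK_GFP_ATOMIC and
false for SK_GFP_NOWAIT, while sk_allow_blocking() is false for both and
true for SK_GFP_KERNEL_ACCOUNT.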

It's worth noting that these are extreme circumstances - the kernel is
completely out of memory, and this will cause various spurious failures
to complete syscalls or other events. Additionally, this is the
non-DEFER_TASKRUN case, and DEFER_TASKRUN is what people should be using
anyway.

> * allocate using GFP_KERNEL_ACCOUNT without holding the lock, then add the
>   overflowing entries while holding the completion_lock (iterating twice over
>   compl_reqs)

Only viable way to do that would be to allocate it upfront, which is a
huge waste of time for the normal case where the CQ ring isn't
overflowing. We should not optimize for the slow/broken case, where
userspace overflows the ring.

> * charge memory after releasing the lock. I don't know if this is possible but
>   doing kfree(kmalloc(1, GFP_KERNEL_ACCOUNT)) after releasing the lock does the
>   job (even though it's dirty).

And that's definitely a no-go as well.

-- 
Jens Axboe

