[PATCH] mm: io_uring: allow oom-killer from io_uring

public inbox for [email protected]
 help / color / mirror / Atom feed

* [PATCH] mm: io_uring: allow oom-killer from io_uring_setup
@ 2022-01-25  5:17 Shakeel Butt
  2022-01-25 18:35 ` David Rientjes
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Shakeel Butt @ 2022-01-25  5:17 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov
  Cc: Andrew Morton, linux-mm, linux-kernel, io-uring, Shakeel Butt

On an overcommitted system which is running multiple workloads of
varying priorities, it is preferred to trigger an oom-killer to kill a
low priority workload than to let the high priority workload receiving
ENOMEMs. On our memory overcommitted systems, we are seeing a lot of
ENOMEMs instead of oom-kills because io_uring_setup callchain is using
__GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and
allow the oom-killer to kill a lower priority job.

Signed-off-by: Shakeel Butt <[email protected]>
---
 fs/io_uring.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e54c4127422e..d9eeb202363c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8928,10 +8928,9 @@ static void io_mem_free(void *ptr)
 
 static void *io_mem_alloc(size_t size)
 {
-	gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP |
-				__GFP_NORETRY | __GFP_ACCOUNT;
+	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
 
-	return (void *) __get_free_pages(gfp_flags, get_order(size));
+	return (void *) __get_free_pages(gfp, get_order(size));
 }
 
 static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
-- 
2.35.0.rc0.227.g00780c9af4-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: io_uring: allow oom-killer from io_uring_setup
  2022-01-25  5:17 [PATCH] mm: io_uring: allow oom-killer from io_uring_setup Shakeel Butt
@ 2022-01-25 18:35 ` David Rientjes
  2022-01-25 22:57   ` Shakeel Butt
  2022-02-05  6:32 ` Shakeel Butt
  2022-02-07 15:44 ` Jens Axboe
  2 siblings, 1 reply; 6+ messages in thread
From: David Rientjes @ 2022-01-25 18:35 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Jens Axboe, Pavel Begunkov, Andrew Morton, linux-mm, linux-kernel,
	io-uring

On Mon, 24 Jan 2022, Shakeel Butt wrote:

> On an overcommitted system which is running multiple workloads of
> varying priorities, it is preferred to trigger an oom-killer to kill a
> low priority workload than to let the high priority workload receiving
> ENOMEMs. On our memory overcommitted systems, we are seeing a lot of
> ENOMEMs instead of oom-kills because io_uring_setup callchain is using
> __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and
> allow the oom-killer to kill a lower priority job.
> 

What is the size of the allocations that io_mem_alloc() is doing?

If get_order(size) > PAGE_ALLOC_COSTLY_ORDER, then this will fail even 
without the __GFP_NORETRY.  To make the guarantee that workloads are not 
receiving ENOMEM, it seems like we'd need to guarantee that allocations 
going through io_mem_alloc() are sufficiently small.

(And if we're really serious about it, then even something like a 
BUILD_BUG_ON().)

> Signed-off-by: Shakeel Butt <[email protected]>
> ---
>  fs/io_uring.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index e54c4127422e..d9eeb202363c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8928,10 +8928,9 @@ static void io_mem_free(void *ptr)
>  
>  static void *io_mem_alloc(size_t size)
>  {
> -	gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP |
> -				__GFP_NORETRY | __GFP_ACCOUNT;
> +	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
>  
> -	return (void *) __get_free_pages(gfp_flags, get_order(size));
> +	return (void *) __get_free_pages(gfp, get_order(size));
>  }
>  
>  static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
> -- 
> 2.35.0.rc0.227.g00780c9af4-goog
> 
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: io_uring: allow oom-killer from io_uring_setup
  2022-01-25 18:35 ` David Rientjes
@ 2022-01-25 22:57   ` Shakeel Butt
  2022-01-26  1:42     ` David Rientjes
  0 siblings, 1 reply; 6+ messages in thread
From: Shakeel Butt @ 2022-01-25 22:57 UTC (permalink / raw)
  To: David Rientjes
  Cc: Jens Axboe, Pavel Begunkov, Andrew Morton, Linux MM, LKML,
	io-uring

On Tue, Jan 25, 2022 at 10:35 AM David Rientjes <[email protected]> wrote:
>
> On Mon, 24 Jan 2022, Shakeel Butt wrote:
>
> > On an overcommitted system which is running multiple workloads of
> > varying priorities, it is preferred to trigger an oom-killer to kill a
> > low priority workload than to let the high priority workload receiving
> > ENOMEMs. On our memory overcommitted systems, we are seeing a lot of
> > ENOMEMs instead of oom-kills because io_uring_setup callchain is using
> > __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and
> > allow the oom-killer to kill a lower priority job.
> >
>
> What is the size of the allocations that io_mem_alloc() is doing?
>
> If get_order(size) > PAGE_ALLOC_COSTLY_ORDER, then this will fail even
> without the __GFP_NORETRY.  To make the guarantee that workloads are not
> receiving ENOMEM, it seems like we'd need to guarantee that allocations
> going through io_mem_alloc() are sufficiently small.
>
> (And if we're really serious about it, then even something like a
> BUILD_BUG_ON().)
>

The test case provided to me for which the user was seeing ENOMEMs was
io_uring_setup() with 64 entries (nothing else).

If I understand rings_size() calculations correctly then the 0 order
allocation was requested in io_mem_alloc().

For order > PAGE_ALLOC_COSTLY_ORDER, maybe we can use
__GFP_RETRY_MAYFAIL. It will at least do more aggressive reclaim
though I think that is a separate discussion. For this issue, we are
seeing ENOMEMs even for order 0 allocations.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: io_uring: allow oom-killer from io_uring_setup
  2022-01-25 22:57   ` Shakeel Butt
@ 2022-01-26  1:42     ` David Rientjes
  0 siblings, 0 replies; 6+ messages in thread
From: David Rientjes @ 2022-01-26  1:42 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Jens Axboe, Pavel Begunkov, Andrew Morton, Linux MM, LKML,
	io-uring

On Tue, 25 Jan 2022, Shakeel Butt wrote:

> > > On an overcommitted system which is running multiple workloads of
> > > varying priorities, it is preferred to trigger an oom-killer to kill a
> > > low priority workload than to let the high priority workload receiving
> > > ENOMEMs. On our memory overcommitted systems, we are seeing a lot of
> > > ENOMEMs instead of oom-kills because io_uring_setup callchain is using
> > > __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and
> > > allow the oom-killer to kill a lower priority job.
> > >
> >
> > What is the size of the allocations that io_mem_alloc() is doing?
> >
> > If get_order(size) > PAGE_ALLOC_COSTLY_ORDER, then this will fail even
> > without the __GFP_NORETRY.  To make the guarantee that workloads are not
> > receiving ENOMEM, it seems like we'd need to guarantee that allocations
> > going through io_mem_alloc() are sufficiently small.
> >
> > (And if we're really serious about it, then even something like a
> > BUILD_BUG_ON().)
> >
> 
> The test case provided to me for which the user was seeing ENOMEMs was
> io_uring_setup() with 64 entries (nothing else).
> 
> If I understand rings_size() calculations correctly then the 0 order
> allocation was requested in io_mem_alloc().
> 
> For order > PAGE_ALLOC_COSTLY_ORDER, maybe we can use
> __GFP_RETRY_MAYFAIL. It will at least do more aggressive reclaim
> though I think that is a separate discussion. For this issue, we are
> seeing ENOMEMs even for order 0 allocations.
> 

Ah, gotcha, thanks for the background.  IIUC, io_uring_setup() can be done 
with anything with CAP_SYS_NICE so my only concern would be whether this 
could be used maliciously on a system not using memcg, but in that case we 
can already fork many small processes that consume all memory and oom kill 
everything else on the system already.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: io_uring: allow oom-killer from io_uring_setup
  2022-01-25  5:17 [PATCH] mm: io_uring: allow oom-killer from io_uring_setup Shakeel Butt
  2022-01-25 18:35 ` David Rientjes
@ 2022-02-05  6:32 ` Shakeel Butt
  2022-02-07 15:44 ` Jens Axboe
  2 siblings, 0 replies; 6+ messages in thread
From: Shakeel Butt @ 2022-02-05  6:32 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov; +Cc: Andrew Morton, Linux MM, LKML, io-uring

On Mon, Jan 24, 2022 at 9:17 PM Shakeel Butt <[email protected]> wrote:
>
> On an overcommitted system which is running multiple workloads of
> varying priorities, it is preferred to trigger an oom-killer to kill a
> low priority workload than to let the high priority workload receiving
> ENOMEMs. On our memory overcommitted systems, we are seeing a lot of
> ENOMEMs instead of oom-kills because io_uring_setup callchain is using
> __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and
> allow the oom-killer to kill a lower priority job.
>
> Signed-off-by: Shakeel Butt <[email protected]>

Jens, any comments or concerns on this patch?

> ---
>  fs/io_uring.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index e54c4127422e..d9eeb202363c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8928,10 +8928,9 @@ static void io_mem_free(void *ptr)
>
>  static void *io_mem_alloc(size_t size)
>  {
> -       gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP |
> -                               __GFP_NORETRY | __GFP_ACCOUNT;
> +       gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
>
> -       return (void *) __get_free_pages(gfp_flags, get_order(size));
> +       return (void *) __get_free_pages(gfp, get_order(size));
>  }
>
>  static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
> --
> 2.35.0.rc0.227.g00780c9af4-goog
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: io_uring: allow oom-killer from io_uring_setup
  2022-01-25  5:17 [PATCH] mm: io_uring: allow oom-killer from io_uring_setup Shakeel Butt
  2022-01-25 18:35 ` David Rientjes
  2022-02-05  6:32 ` Shakeel Butt
@ 2022-02-07 15:44 ` Jens Axboe
  2 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2022-02-07 15:44 UTC (permalink / raw)
  To: Pavel Begunkov, Shakeel Butt
  Cc: linux-kernel, linux-mm, io-uring, Andrew Morton

On Mon, 24 Jan 2022 21:17:36 -0800, Shakeel Butt wrote:
> On an overcommitted system which is running multiple workloads of
> varying priorities, it is preferred to trigger an oom-killer to kill a
> low priority workload than to let the high priority workload receiving
> ENOMEMs. On our memory overcommitted systems, we are seeing a lot of
> ENOMEMs instead of oom-kills because io_uring_setup callchain is using
> __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and
> allow the oom-killer to kill a lower priority job.
> 
> [...]

Applied, thanks!

[1/1] mm: io_uring: allow oom-killer from io_uring_setup
      commit: 0a3f1e0beacf6cc8ae5f846b0641c1df476e83d6

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-02-07 15:46 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-01-25  5:17 [PATCH] mm: io_uring: allow oom-killer from io_uring_setup Shakeel Butt
2022-01-25 18:35 ` David Rientjes
2022-01-25 22:57   ` Shakeel Butt
2022-01-26  1:42     ` David Rientjes
2022-02-05  6:32 ` Shakeel Butt
2022-02-07 15:44 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox