* [PATCH v2 0/1] Add a sysctl to disable io_uring system-wide
From: Matteo Rizzo @ 2023-06-29 13:27 UTC
  To: linux-doc, linux-kernel, io-uring
  Cc: matteorizzo, jordyzomer, evn, poprdi, corbet, axboe, asml.silence,
	akpm, keescook, rostedt, dave.hansen, ribalda, chenhuacai, steve,
	gpiccoli, ldufour, bhe, oleksandr

Over the last few years we've seen many critical vulnerabilities in
io_uring[1] which could be exploited by an unprivileged process to gain
control over the kernel. This patch introduces a new sysctl which disables
the creation of new io_uring instances system-wide.

The goal of this patch is to give distros, system admins, and cloud
providers a way to reduce the risk of privilege escalation through io_uring
where disabling it with seccomp or at compile time is not practical. For
example, a distro or cloud provider might want to disable io_uring by
default and have users enable it again if they need to run a program that
requires it. The new sysctl is designed to let a user with root privileges
enable and disable io_uring system-wide at runtime without requiring a
kernel recompilation or a reboot.
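For illustration only: because the patch registers its table under "kernel", the knob should show up as /proc/sys/kernel/io_uring_disabled, and root could flip it at runtime with something like `sysctl -w kernel.io_uring_disabled=2`. Below is a minimal userspace sketch of a tool reading the current level. The helper name `read_io_uring_disabled` is hypothetical, and the sketch assumes procfs is mounted at /proc.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of this patch): read the io_uring_disabled
 * level from a sysctl file such as /proc/sys/kernel/io_uring_disabled.
 * Returns 0, 1 or 2 on success, or -1 if the file cannot be opened (for
 * example on a kernel without this sysctl) or holds an unexpected value.
 */
static int read_io_uring_disabled(const char *path)
{
	FILE *f = fopen(path, "r");
	int level;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &level) != 1)
		level = -1;	/* empty or non-numeric contents */
	fclose(f);

	/* The patch restricts writes to [0, 2] via proc_dointvec_minmax. */
	if (level < 0 || level > 2)
		return -1;
	return level;
}
```

A caller would pass "/proc/sys/kernel/io_uring_disabled"; a return of -1 most likely means the running kernel does not expose this sysctl.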

[1] Link: https://goo.gle/limit-iouring

---
v2:
	* Documentation style fixes
	* Add a third level that only disables io_uring for unprivileged
	  processes


Matteo Rizzo (1):
  Add a new sysctl to disable io_uring system-wide

 Documentation/admin-guide/sysctl/kernel.rst | 19 +++++++++++++
 io_uring/io_uring.c                         | 30 +++++++++++++++++++++
 2 files changed, 49 insertions(+)

-- 
2.41.0.162.gfafddb0af9-goog



* [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Matteo Rizzo @ 2023-06-29 13:27 UTC
  To: linux-doc, linux-kernel, io-uring
  Cc: matteorizzo, jordyzomer, evn, poprdi, corbet, axboe, asml.silence,
	akpm, keescook, rostedt, dave.hansen, ribalda, chenhuacai, steve,
	gpiccoli, ldufour, bhe, oleksandr

Introduce a new sysctl (io_uring_disabled) which can be either 0, 1,
or 2. When 0 (the default), all processes are allowed to create io_uring
instances, which is the current behavior. When 1, all calls to
io_uring_setup fail with -EPERM unless the calling process has
CAP_SYS_ADMIN. When 2, calls to io_uring_setup fail with -EPERM
regardless of privilege.

Signed-off-by: Matteo Rizzo <[email protected]>
---
 Documentation/admin-guide/sysctl/kernel.rst | 19 +++++++++++++
 io_uring/io_uring.c                         | 30 +++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 3800fab1619b..ee65f7aeb0cf 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -450,6 +450,25 @@ this allows system administrators to override the
 ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
 
 
+io_uring_disabled
+=================
+
+Prevents all processes from creating new io_uring instances. Enabling this
+shrinks the kernel's attack surface.
+
+= ==================================================================
+0 All processes can create io_uring instances as normal. This is the
+  default setting.
+1 io_uring creation is disabled for unprivileged processes.
+  io_uring_setup fails with -EPERM unless the calling process is
+  privileged (CAP_SYS_ADMIN). Existing io_uring instances can
+  still be used.
+2 io_uring creation is disabled for all processes. io_uring_setup
+  always fails with -EPERM. Existing io_uring instances can still be
+  used.
+= ==================================================================
+
+
 kexec_load_disabled
 ===================
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1b53a2ab0a27..2343ae518546 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -153,6 +153,22 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx);
 
 struct kmem_cache *req_cachep;
 
+static int __read_mostly sysctl_io_uring_disabled;
+#ifdef CONFIG_SYSCTL
+static struct ctl_table kernel_io_uring_disabled_table[] = {
+	{
+		.procname	= "io_uring_disabled",
+		.data		= &sysctl_io_uring_disabled,
+		.maxlen		= sizeof(sysctl_io_uring_disabled),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_TWO,
+	},
+	{},
+};
+#endif
+
 struct sock *io_uring_get_socket(struct file *file)
 {
 #if defined(CONFIG_UNIX)
@@ -4000,9 +4016,18 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 	return io_uring_create(entries, &p, params);
 }
 
+static inline bool io_uring_allowed(void)
+{
+	return sysctl_io_uring_disabled == 0 ||
+		(sysctl_io_uring_disabled == 1 && capable(CAP_SYS_ADMIN));
+}
+
 SYSCALL_DEFINE2(io_uring_setup, u32, entries,
 		struct io_uring_params __user *, params)
 {
+	if (!io_uring_allowed())
+		return -EPERM;
+
 	return io_uring_setup(entries, params);
 }
 
@@ -4577,6 +4602,11 @@ static int __init io_uring_init(void)
 
 	req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
 				SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU);
+
+#ifdef CONFIG_SYSCTL
+	register_sysctl_init("kernel", kernel_io_uring_disabled_table);
+#endif
+
 	return 0;
 };
 __initcall(io_uring_init);
-- 
2.41.0.162.gfafddb0af9-goog



* Re: [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Bart Van Assche @ 2023-06-29 15:15 UTC
  To: Matteo Rizzo, linux-doc, linux-kernel, io-uring
  Cc: jordyzomer, evn, poprdi, corbet, axboe, asml.silence, akpm,
	keescook, rostedt, dave.hansen, ribalda, chenhuacai, steve,
	gpiccoli, ldufour, bhe, oleksandr

On 6/29/23 06:27, Matteo Rizzo wrote:
> +static int __read_mostly sysctl_io_uring_disabled;

Shouldn't this be a static key instead of an int in order to minimize the
performance impact on the io_uring_setup() system call? See also
Documentation/staging/static-keys.rst.

Thanks,

Bart.


* Re: [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Matteo Rizzo @ 2023-06-29 15:28 UTC
  To: Bart Van Assche
  Cc: linux-doc, linux-kernel, io-uring, jordyzomer, evn, poprdi,
	corbet, axboe, asml.silence, akpm, keescook, rostedt, dave.hansen,
	ribalda, chenhuacai, steve, gpiccoli, ldufour, bhe, oleksandr

On Thu, 29 Jun 2023 at 17:16, Bart Van Assche <[email protected]> wrote:
>
> On 6/29/23 06:27, Matteo Rizzo wrote:
> > +static int __read_mostly sysctl_io_uring_disabled;
>
> Shouldn't this be a static key instead of an int in order to minimize the
> performance impact on the io_uring_setup() system call? See also
> Documentation/staging/static-keys.rst.
>
> Thanks,
>
> Bart.

Is io_uring_setup in any hot path? io_uring_create is marked as __cold.

--
Matteo


* Re: [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Jeff Moyer @ 2023-06-29 16:17 UTC
  To: Matteo Rizzo
  Cc: linux-doc, linux-kernel, io-uring, jordyzomer, evn, poprdi,
	corbet, axboe, asml.silence, akpm, keescook, rostedt, dave.hansen,
	ribalda, chenhuacai, steve, gpiccoli, ldufour, bhe, oleksandr

Matteo Rizzo <[email protected]> writes:

> Introduce a new sysctl (io_uring_disabled) which can be either 0, 1,
> or 2. When 0 (the default), all processes are allowed to create io_uring
> instances, which is the current behavior. When 1, all calls to
> io_uring_setup fail with -EPERM unless the calling process has
> CAP_SYS_ADMIN. When 2, calls to io_uring_setup fail with -EPERM
> regardless of privilege.
>
> Signed-off-by: Matteo Rizzo <[email protected]>

This looks good to me.  You may also consider updating the
io_uring_setup(2) man page (part of liburing) to reflect this new
meaning for -EPERM.
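To make that new -EPERM meaning concrete for callers: a program that issues io_uring_setup directly through syscall(2), which returns -1 and sets errno, might decide between falling back to another I/O path and reporting a real error roughly as follows. This is a hypothetical sketch, not liburing code; the name `should_fall_back` and the choice to treat ENOSYS as "no io_uring in this kernel" are assumptions. Note that EPERM can also come from a seccomp filter, and that per the documentation above, rings created before the sysctl was set keep working, so the decision only matters at setup time.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: given the errno from a failed io_uring_setup(),
 * decide whether the caller should quietly fall back to a non-io_uring
 * I/O path (true) or treat the failure as a genuine error (false).
 */
static bool should_fall_back(int err)
{
	switch (err) {
	case EPERM:	/* kernel.io_uring_disabled (or a seccomp filter) forbids it */
	case ENOSYS:	/* assumed: kernel built without io_uring */
		return true;
	default:	/* e.g. EINVAL from bad parameters: report it */
		return false;
	}
}
```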

Reviewed-by: Jeff Moyer <[email protected]>




* Re: [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Bart Van Assche @ 2023-06-29 17:37 UTC
  To: Matteo Rizzo
  Cc: linux-doc, linux-kernel, io-uring, jordyzomer, evn, poprdi,
	corbet, axboe, asml.silence, akpm, keescook, rostedt, dave.hansen,
	ribalda, chenhuacai, steve, gpiccoli, ldufour, bhe, oleksandr

On 6/29/23 08:28, Matteo Rizzo wrote:
> On Thu, 29 Jun 2023 at 17:16, Bart Van Assche <[email protected]> wrote:
>>
>> On 6/29/23 06:27, Matteo Rizzo wrote:
>>> +static int __read_mostly sysctl_io_uring_disabled;
>>
>> Shouldn't this be a static key instead of an int in order to minimize the
>> performance impact on the io_uring_setup() system call? See also
>> Documentation/staging/static-keys.rst.
>>
> Is io_uring_setup in any hot path? io_uring_create is marked as __cold.

I confused io_uring_setup() with io_uring_enter() so please ignore my comment.

Bart.


* Re: [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Gabriel Krisman Bertazi @ 2023-06-29 18:36 UTC
  To: Matteo Rizzo
  Cc: linux-doc, linux-kernel, io-uring, jordyzomer, evn, poprdi,
	corbet, axboe, asml.silence, akpm, keescook, rostedt, dave.hansen,
	ribalda, chenhuacai, steve, gpiccoli, ldufour, bhe, oleksandr

Matteo Rizzo <[email protected]> writes:

> Introduce a new sysctl (io_uring_disabled) which can be either 0, 1,
> or 2. When 0 (the default), all processes are allowed to create io_uring
> instances, which is the current behavior. When 1, all calls to
> io_uring_setup fail with -EPERM unless the calling process has
> CAP_SYS_ADMIN. When 2, calls to io_uring_setup fail with -EPERM
> regardless of privilege.
>
> Signed-off-by: Matteo Rizzo <[email protected]>
> ---

Thanks for adding the extra level for root-only rings.

The patch looks good to me.

Reviewed-by: Gabriel Krisman Bertazi <[email protected]>

-- 
Gabriel Krisman Bertazi


* Re: [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Matteo Rizzo @ 2023-06-30 15:04 UTC
  To: Gabriel Krisman Bertazi
  Cc: linux-doc, linux-kernel, io-uring, jordyzomer, evn, poprdi,
	corbet, axboe, asml.silence, akpm, keescook, rostedt, dave.hansen,
	ribalda, chenhuacai, steve, gpiccoli, ldufour, bhe, oleksandr,
	Bart Van Assche, jmoyer, Jann Horn

On Thu, 29 Jun 2023 at 20:36, Gabriel Krisman Bertazi <[email protected]> wrote:
>
> Thanks for adding the extra level for root-only rings.
>
> The patch looks good to me.
>
> Reviewed-by: Gabriel Krisman Bertazi <[email protected]>

Thanks everyone for the reviews! Unfortunately I forgot the subsystem name
in the commit message. Jann also pointed out to me internally that the
check in io_uring_allowed could race with another process that is trying to
change the sysctl. I will send a v3 that fixes both issues.
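On the race: in v2, io_uring_allowed() reads the plain int sysctl_io_uring_disabled up to twice. For example, if a writer changes the value from 2 to 1 between the two loads, the first load sees 2 (not 0), the second sees 1, and a CAP_SYS_ADMIN caller is let through even though the setting was 2 when the check began. One way to close this is to take a single snapshot and decide everything from it. The sketch below models that in userspace with C11 atomics, a relaxed load standing in for the kernel's READ_ONCE(). The names `io_uring_disabled_level` and `io_uring_allowed_model` are hypothetical, `privileged` stands in for capable(CAP_SYS_ADMIN), and this is not the v3 patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for sysctl_io_uring_disabled; another thread
 * (playing the role of "sysctl -w") may store to it at any time. */
static _Atomic int io_uring_disabled_level;

/* Decide from one snapshot of the level so a concurrent write cannot
 * be observed partway through the check. */
static bool io_uring_allowed_model(bool privileged)
{
	int disabled = atomic_load_explicit(&io_uring_disabled_level,
					    memory_order_relaxed);

	if (disabled == 0)
		return true;		/* 0: everyone may create rings */
	if (disabled == 1)
		return privileged;	/* 1: only CAP_SYS_ADMIN callers */
	return false;			/* 2: nobody may create rings */
}
```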

Thanks,
--
Matteo

