public inbox for [email protected]
* [PATCH 0/1] Add a sysctl to disable io_uring system-wide
@ 2023-06-27 12:00 Matteo Rizzo
  2023-06-27 12:00 ` [PATCH 1/1] Add a new " Matteo Rizzo
  0 siblings, 1 reply; 11+ messages in thread
From: Matteo Rizzo @ 2023-06-27 12:00 UTC
  To: linux-doc, linux-kernel, io-uring
  Cc: matteorizzo, jordyzomer, evn, poprdi, corbet, axboe,
	asml.silence, akpm, keescook, rostedt, dave.hansen, ribalda,
	chenhuacai, steve, gpiccoli, ldufour

Over the last few years we've seen many critical vulnerabilities in
io_uring (https://goo.gle/limit-iouring) which could be exploited by
an unprivileged process. There is currently no way to disable io_uring
system-wide except by compiling it out of the kernel entirely. The only
way to prevent a process from accessing io_uring is to use a seccomp
filter, but seccomp cannot be applied system-wide. This patch introduces a
new sysctl which disables the creation of new io_uring instances
system-wide. This gives system admins a way to reduce the kernel's attack
surface on systems where io_uring is not used.
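
For comparison, here is a rough sketch of the per-process seccomp alternative mentioned above. It is a minimal classic-BPF filter that makes io_uring_setup(2) fail with EPERM for the calling process and its future children. It is not part of the patch. The helper name deny_io_uring_setup() and the SKETCH_AUDIT_ARCH macro are hypothetical. The fallback syscall number 425 is an assumption for architectures that use the common syscall table, and x32 system calls on x86-64 are not handled.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425 /* assumption: common syscall table number */
#endif

/* Hypothetical helper macro: the audit arch this filter expects. */
#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Install a filter on the calling process (inherited by its children) that
 * makes io_uring_setup(2) fail with EPERM. Returns 0 on success, -1 on error.
 * Note: x32 syscalls (nr | __X32_SYSCALL_BIT) are not matched here. */
static int deny_io_uring_setup(void)
{
	struct sock_filter filter[] = {
		/* Kill the process if the syscall ABI is not the expected one. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
		/* Load the syscall number and compare it with io_uring_setup. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_io_uring_setup, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
		/* Everything else is allowed. */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};

	/* Required so that an unprivileged process may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Each process tree has to install such a filter itself, typically through a service manager or container runtime. That is why it cannot cover a whole system, and that gap is what the sysctl is meant to fill.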


Matteo Rizzo (1):
  Add a new sysctl to disable io_uring system-wide

 Documentation/admin-guide/sysctl/kernel.rst | 14 ++++++++++++
 io_uring/io_uring.c                         | 24 +++++++++++++++++++++
 2 files changed, 38 insertions(+)

-- 
2.41.0.162.gfafddb0af9-goog



* [PATCH 1/1] Add a new sysctl to disable io_uring system-wide
  2023-06-27 12:00 [PATCH 0/1] Add a sysctl to disable io_uring system-wide Matteo Rizzo
@ 2023-06-27 12:00 ` Matteo Rizzo
  2023-06-27 16:23   ` Randy Dunlap
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Matteo Rizzo @ 2023-06-27 12:00 UTC
  To: linux-doc, linux-kernel, io-uring
  Cc: matteorizzo, jordyzomer, evn, poprdi, corbet, axboe,
	asml.silence, akpm, keescook, rostedt, dave.hansen, ribalda,
	chenhuacai, steve, gpiccoli, ldufour

Introduce a new sysctl (io_uring_disabled) which can be either 0 or 1.
When 0 (the default), all processes are allowed to create io_uring
instances, which is the current behavior. When 1, all calls to
io_uring_setup fail with -EPERM.

Signed-off-by: Matteo Rizzo <[email protected]>
---
 Documentation/admin-guide/sysctl/kernel.rst | 14 ++++++++++++
 io_uring/io_uring.c                         | 24 +++++++++++++++++++++
 2 files changed, 38 insertions(+)
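
Below is a minimal sketch of how userspace might tell these cases apart, assuming the patch as posted. The helper names iou_probe() and iou_classify() are hypothetical, and the fallback syscall number 425 is an assumption. The probe deliberately passes a NULL params pointer. The sketch assumes that io_uring_setup() copies params from userspace before doing any other work, as in the tree this patch targets. Under that assumption, an enabled kernel fails with EFAULT instead of allocating a ring, while io_uring_disabled=1 returns EPERM before params are touched. EPERM can also come from a seccomp filter or an LSM, so it only means "blocked", not necessarily "blocked by this sysctl".

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425 /* assumption: common syscall table number */
#endif

enum iou_status {
	IOU_AVAILABLE,  /* the call reached io_uring_setup() itself */
	IOU_BLOCKED,    /* EPERM: this sysctl, a seccomp filter, or an LSM */
	IOU_NOT_BUILT,  /* ENOSYS: CONFIG_IO_URING=n (or a filter faking it) */
	IOU_UNKNOWN,    /* any other error */
};

/* Map the raw result of the probe below to a status. */
static enum iou_status iou_classify(long ret, int err)
{
	if (ret >= 0)
		return IOU_AVAILABLE;
	switch (err) {
	case EFAULT:  /* NULL params rejected while copying from userspace */
		return IOU_AVAILABLE;
	case EPERM:
		return IOU_BLOCKED;
	case ENOSYS:
		return IOU_NOT_BUILT;
	default:
		return IOU_UNKNOWN;
	}
}

/* Probe without creating a ring: params is deliberately NULL. */
static enum iou_status iou_probe(void)
{
	long ret = syscall(__NR_io_uring_setup, 1, NULL);
	int err = errno;  /* save before anything else can clobber it */

	if (ret >= 0)
		close((int)ret);  /* not expected with NULL params */
	return iou_classify(ret, err);
}
```
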

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index d85d90f5d000..3c53a238332a 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -450,6 +450,20 @@ this allows system administrators to override the
 ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
 
 
+io_uring_disabled
+=========================
+
+Prevents all processes from creating new io_uring instances. Enabling this
+shrinks the kernel's attack surface.
+
+= =============================================================
+0 All processes can create io_uring instances as normal. This is the default
+  setting.
+1 io_uring is disabled. io_uring_setup always fails with -EPERM. Existing
+  io_uring instances can still be used.
+= =============================================================
+
+
 kexec_load_disabled
 ===================
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1b53a2ab0a27..0496ae7017f7 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -153,6 +153,22 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx);
 
 struct kmem_cache *req_cachep;
 
+static int __read_mostly sysctl_io_uring_disabled;
+#ifdef CONFIG_SYSCTL
+static struct ctl_table kernel_io_uring_disabled_table[] = {
+	{
+		.procname	= "io_uring_disabled",
+		.data		= &sysctl_io_uring_disabled,
+		.maxlen		= sizeof(sysctl_io_uring_disabled),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+	{},
+};
+#endif
+
 struct sock *io_uring_get_socket(struct file *file)
 {
 #if defined(CONFIG_UNIX)
@@ -4003,6 +4019,9 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 SYSCALL_DEFINE2(io_uring_setup, u32, entries,
 		struct io_uring_params __user *, params)
 {
+	if (sysctl_io_uring_disabled)
+		return -EPERM;
+
 	return io_uring_setup(entries, params);
 }
 
@@ -4577,6 +4596,11 @@ static int __init io_uring_init(void)
 
 	req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
 				SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU);
+
+#ifdef CONFIG_SYSCTL
+	register_sysctl_init("kernel", kernel_io_uring_disabled_table);
+#endif
+
 	return 0;
 };
 __initcall(io_uring_init);
-- 
2.41.0.162.gfafddb0af9-goog
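
From userspace, the knob is just a file under /proc/sys/kernel, created by the register_sysctl_init("kernel", ...) call above. Either "sysctl -w kernel.io_uring_disabled=1" or a plain write will set it. The C sketch below uses hypothetical helper names. Writes require root. Because the table uses proc_dointvec_minmax with extra1/extra2 set to SYSCTL_ZERO/SYSCTL_ONE, a value outside 0..1 should be rejected with EINVAL when the buffered data is flushed.

```c
#include <errno.h>
#include <stdio.h>

/* Path created by register_sysctl_init("kernel", ...) in the patch. */
#define IO_URING_DISABLED_PATH "/proc/sys/kernel/io_uring_disabled"

/* Read a single integer from a sysctl-style file.
 * Returns 0 and stores the value in *out, or -1 with errno set. */
static int read_sysctl_int(const char *path, int *out)
{
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;
	int ok = fscanf(f, "%d", out) == 1;
	fclose(f);
	if (!ok) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}

/* Write an integer to a sysctl-style file (root is needed for /proc/sys).
 * Returns 0 on success, -1 on failure. */
static int write_sysctl_int(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fprintf(f, "%d\n", value) > 0;
	/* fclose() flushes the buffer; a range error from the kernel shows up here. */
	if (fclose(f) != 0 || !ok)
		return -1;
	return 0;
}
```

On a kernel without this patch, IO_URING_DISABLED_PATH does not exist, so read_sysctl_int() fails with ENOENT.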



* Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide
  2023-06-27 12:00 ` [PATCH 1/1] Add a new " Matteo Rizzo
@ 2023-06-27 16:23   ` Randy Dunlap
  2023-06-27 17:10   ` Bart Van Assche
  2023-06-28 13:50   ` Gabriel Krisman Bertazi
  2 siblings, 0 replies; 11+ messages in thread
From: Randy Dunlap @ 2023-06-27 16:23 UTC
  To: Matteo Rizzo, linux-doc, linux-kernel, io-uring
  Cc: jordyzomer, evn, poprdi, corbet, axboe, asml.silence, akpm,
	keescook, rostedt, dave.hansen, ribalda, chenhuacai, steve,
	gpiccoli, ldufour

Hi--

On 6/27/23 05:00, Matteo Rizzo wrote:
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index d85d90f5d000..3c53a238332a 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -450,6 +450,20 @@ this allows system administrators to override the
>  ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
>  
>  
> +io_uring_disabled
> +=========================
> +
> +Prevents all processes from creating new io_uring instances. Enabling this
> +shrinks the kernel's attack surface.
> +
> += =============================================================
> +0 All processes can create io_uring instances as normal. This is the default
> +  setting.
> +1 io_uring is disabled. io_uring_setup always fails with -EPERM. Existing
> +  io_uring instances can still be used.
> += =============================================================

These table lines should be extended at least as far as the text that they
enclose. I.e., the top and bottom lines should be like:

> += ==========================================================================

thanks.
-- 
~Randy


* Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide
  2023-06-27 12:00 ` [PATCH 1/1] Add a new " Matteo Rizzo
  2023-06-27 16:23   ` Randy Dunlap
@ 2023-06-27 17:10   ` Bart Van Assche
  2023-06-27 18:15     ` Matteo Rizzo
  2023-06-28 13:50   ` Gabriel Krisman Bertazi
  2 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2023-06-27 17:10 UTC
  To: Matteo Rizzo, linux-doc, linux-kernel, io-uring
  Cc: jordyzomer, evn, poprdi, corbet, axboe, asml.silence, akpm,
	keescook, rostedt, dave.hansen, ribalda, chenhuacai, steve,
	gpiccoli, ldufour

On 6/27/23 05:00, Matteo Rizzo wrote:
> +Prevents all processes from creating new io_uring instances. Enabling this
> +shrinks the kernel's attack surface.
> +
> += =============================================================
> +0 All processes can create io_uring instances as normal. This is the default
> +  setting.
> +1 io_uring is disabled. io_uring_setup always fails with -EPERM. Existing
> +  io_uring instances can still be used.
> += =============================================================

I'm using fio + io_uring all the time on Android devices. I think we need a
better solution than disabling io_uring system-wide, e.g. a mechanism based
on SELinux that disables io_uring for apps and that keeps io_uring enabled
for processes started via 'adb root && adb shell ...'

Bart.



* Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide
  2023-06-27 17:10   ` Bart Van Assche
@ 2023-06-27 18:15     ` Matteo Rizzo
  2023-06-28 11:36       ` Ricardo Ribalda
  0 siblings, 1 reply; 11+ messages in thread
From: Matteo Rizzo @ 2023-06-27 18:15 UTC
  To: Bart Van Assche
  Cc: linux-doc, linux-kernel, io-uring, jordyzomer, evn, poprdi,
	corbet, axboe, asml.silence, akpm, keescook, rostedt,
	dave.hansen, ribalda, chenhuacai, steve, gpiccoli, ldufour

On Tue, 27 Jun 2023 at 19:10, Bart Van Assche <[email protected]> wrote:
> I'm using fio + io_uring all the time on Android devices. I think we need a
> better solution than disabling io_uring system-wide, e.g. a mechanism based
> on SELinux that disables io_uring for apps and that keeps io_uring enabled
> for processes started via 'adb root && adb shell ...'

Android already uses seccomp to prevent untrusted applications from using
io_uring. This patch is aimed at server/desktop environments where there is
no easy way to set a system-wide seccomp policy and right now the only way
to disable io_uring system-wide is to compile it out of the kernel entirely
(not really feasible for e.g. a general-purpose distro).

I thought about adding a capability check that lets privileged processes
bypass this sysctl, but it wasn't clear to me which capability I should use.
For userfaultfd the kernel uses CAP_SYS_PTRACE, but I wasn't sure that's
the best choice here since io_uring has nothing to do with ptrace.
If anyone has any suggestions please let me know. An LSM hook also sounds
like an option but it would be more complicated to implement and use.


* Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide
  2023-06-27 18:15     ` Matteo Rizzo
@ 2023-06-28 11:36       ` Ricardo Ribalda
  2023-06-28 15:12         ` Matteo Rizzo
  0 siblings, 1 reply; 11+ messages in thread
From: Ricardo Ribalda @ 2023-06-28 11:36 UTC
  To: Matteo Rizzo
  Cc: Bart Van Assche, linux-doc, linux-kernel, io-uring, jordyzomer,
	evn, poprdi, corbet, axboe, asml.silence, akpm, keescook,
	rostedt, dave.hansen, chenhuacai, steve, gpiccoli, ldufour

Hi Matteo

On Tue, 27 Jun 2023 at 20:15, Matteo Rizzo <[email protected]> wrote:
>
> On Tue, 27 Jun 2023 at 19:10, Bart Van Assche <[email protected]> wrote:
> > I'm using fio + io_uring all the time on Android devices. I think we need a
> > better solution than disabling io_uring system-wide, e.g. a mechanism based
> > on SELinux that disables io_uring for apps and that keeps io_uring enabled
> > for processes started via 'adb root && adb shell ...'
>
> Android already uses seccomp to prevent untrusted applications from using
> io_uring. This patch is aimed at server/desktop environments where there is
> no easy way to set a system-wide seccomp policy and right now the only way
> to disable io_uring system-wide is to compile it out of the kernel entirely
> (not really feasible for e.g. a general-purpose distro).
>
> I thought about adding a capability check that lets privileged processes
> bypass this sysctl, but it wasn't clear to me which capability I should use.
> For userfaultfd the kernel uses CAP_SYS_PTRACE, but I wasn't sure that's
> the best choice here since io_uring has nothing to do with ptrace.
> If anyone has any suggestions please let me know. An LSM hook also sounds
> like an option but it would be more complicated to implement and use.

Have you considered making the new sysctl "sticky", like kexec_load_disabled?
When the user disables it, there is no way to turn it back on until the
system is rebooted.

Best regards!

-- 
Ricardo Ribalda


* Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide
  2023-06-27 12:00 ` [PATCH 1/1] Add a new " Matteo Rizzo
  2023-06-27 16:23   ` Randy Dunlap
  2023-06-27 17:10   ` Bart Van Assche
@ 2023-06-28 13:50   ` Gabriel Krisman Bertazi
  2023-06-28 15:59     ` Jeff Moyer
  2 siblings, 1 reply; 11+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-06-28 13:50 UTC
  To: Matteo Rizzo
  Cc: linux-doc, linux-kernel, io-uring, jordyzomer, evn, poprdi,
	corbet, axboe, asml.silence, akpm, keescook, rostedt,
	dave.hansen, ribalda, chenhuacai, steve, gpiccoli, ldufour

Matteo Rizzo <[email protected]> writes:

> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index d85d90f5d000..3c53a238332a 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -450,6 +450,20 @@ this allows system administrators to override the
>  ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
>  
>  
> +io_uring_disabled
> +=========================
> +
> +Prevents all processes from creating new io_uring instances. Enabling this
> +shrinks the kernel's attack surface.
> +
> += =============================================================
> +0 All processes can create io_uring instances as normal. This is the default
> +  setting.
> +1 io_uring is disabled. io_uring_setup always fails with -EPERM. Existing
> +  io_uring instances can still be used.
> += =============================================================

I had an internal request for something like this recently.  If we go
this route, we could use an intermediary option that limits io_uring
to root processes only.

-- 
Gabriel Krisman Bertazi


* Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide
  2023-06-28 11:36       ` Ricardo Ribalda
@ 2023-06-28 15:12         ` Matteo Rizzo
  2023-06-28 15:59           ` Jeff Moyer
  2023-06-28 15:59           ` Ricardo Ribalda
  0 siblings, 2 replies; 11+ messages in thread
From: Matteo Rizzo @ 2023-06-28 15:12 UTC
  To: Ricardo Ribalda
  Cc: Bart Van Assche, linux-doc, linux-kernel, io-uring, jordyzomer,
	evn, poprdi, corbet, axboe, asml.silence, akpm, keescook,
	rostedt, dave.hansen, chenhuacai, steve, gpiccoli, ldufour

On Wed, 28 Jun 2023 at 13:44, Ricardo Ribalda <[email protected]> wrote:
>
> Have you considered making the new sysctl "sticky", like kexec_load_disabled?
> When the user disables it, there is no way to turn it back on until the
> system is rebooted.

Are you suggesting making this sysctl sticky? Are there any examples of how to
implement a sticky sysctl that can take more than 2 values in case we want to
add an intermediate level that still allows privileged processes to use
io_uring? Also, what would be the use case? Preventing privileged processes
from re-enabling io_uring?

Thanks!
--
Matteo


* Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide
  2023-06-28 13:50   ` Gabriel Krisman Bertazi
@ 2023-06-28 15:59     ` Jeff Moyer
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff Moyer @ 2023-06-28 15:59 UTC
  To: Gabriel Krisman Bertazi
  Cc: Matteo Rizzo, linux-doc, linux-kernel, io-uring, jordyzomer, evn,
	poprdi, corbet, axboe, asml.silence, akpm, keescook, rostedt,
	dave.hansen, ribalda, chenhuacai, steve, gpiccoli, ldufour

Gabriel Krisman Bertazi <[email protected]> writes:

> Matteo Rizzo <[email protected]> writes:
>
>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
>> index d85d90f5d000..3c53a238332a 100644
>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>> @@ -450,6 +450,20 @@ this allows system administrators to override the
>>  ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
>>  
>>  
>> +io_uring_disabled
>> +=========================
>> +
>> +Prevents all processes from creating new io_uring instances. Enabling this
>> +shrinks the kernel's attack surface.
>> +
>> += =============================================================
>> +0 All processes can create io_uring instances as normal. This is the default
>> +  setting.
>> +1 io_uring is disabled. io_uring_setup always fails with -EPERM. Existing
>> +  io_uring instances can still be used.
>> += =============================================================
>
> I had an internal request for something like this recently.  If we go
> this route, we could use an intermediary option that limits io_uring
> to root processes only.

This is all regrettable, but this option makes the most sense to me.
Testing for CAP_SYS_ADMIN or CAP_SYS_RAWIO would work for that third
option, I think.

-Jeff
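
The intermediate level suggested here could be modelled as shown below. This is a userspace sketch, not kernel code. The level numbering (0 = everyone, 1 = privileged only, 2 = disabled), the enum, and io_uring_setup_gate() are all hypothetical. The privileged flag stands in for whichever capability check is chosen; capable(CAP_SYS_ADMIN) and CAP_SYS_RAWIO are the candidates named above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical levels for a three-valued io_uring_disabled. */
enum io_uring_level {
	IOU_LEVEL_ALL = 0,         /* current behaviour: anyone may set up a ring */
	IOU_LEVEL_PRIVILEGED = 1,  /* only callers passing the capability check */
	IOU_LEVEL_DISABLED = 2,    /* io_uring_setup always fails with -EPERM */
};

/* Model of the gate that would sit at the top of
 * SYSCALL_DEFINE2(io_uring_setup, ...). Returns 0 or -EPERM. */
static int io_uring_setup_gate(int level, bool privileged)
{
	switch (level) {
	case IOU_LEVEL_ALL:
		return 0;
	case IOU_LEVEL_PRIVILEGED:
		return privileged ? 0 : -EPERM;
	default:  /* IOU_LEVEL_DISABLED or an unexpected value: fail closed */
		return -EPERM;
	}
}
```

In a real implementation, the capability check would probably be evaluated only when the level requires it. capable() is not free of side effects; for example, an LSM may log a denial.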



* Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide
  2023-06-28 15:12         ` Matteo Rizzo
@ 2023-06-28 15:59           ` Jeff Moyer
  2023-06-28 15:59           ` Ricardo Ribalda
  1 sibling, 0 replies; 11+ messages in thread
From: Jeff Moyer @ 2023-06-28 15:59 UTC
  To: Matteo Rizzo
  Cc: Ricardo Ribalda, Bart Van Assche, linux-doc, linux-kernel,
	io-uring, jordyzomer, evn, poprdi, corbet, axboe, asml.silence,
	akpm, keescook, rostedt, dave.hansen, chenhuacai, steve,
	gpiccoli, ldufour

Matteo Rizzo <[email protected]> writes:

> On Wed, 28 Jun 2023 at 13:44, Ricardo Ribalda <[email protected]> wrote:
>>
>> Have you considered making the new sysctl "sticky", like kexec_load_disabled?
>> When the user disables it, there is no way to turn it back on until the
>> system is rebooted.
>
> Are you suggesting making this sysctl sticky? Are there any examples of how to
> implement a sticky sysctl that can take more than 2 values in case we want to
> add an intermediate level that still allows privileged processes to use
> io_uring? Also, what would be the use case? Preventing privileged processes
> from re-enabling io_uring?

See unprivileged_bpf_disabled for an example.  I can't speak to the use
case for a sticky value.

-Jeff



* Re: [PATCH 1/1] Add a new sysctl to disable io_uring system-wide
  2023-06-28 15:12         ` Matteo Rizzo
  2023-06-28 15:59           ` Jeff Moyer
@ 2023-06-28 15:59           ` Ricardo Ribalda
  1 sibling, 0 replies; 11+ messages in thread
From: Ricardo Ribalda @ 2023-06-28 15:59 UTC
  To: Matteo Rizzo
  Cc: Bart Van Assche, linux-doc, linux-kernel, io-uring, jordyzomer,
	evn, poprdi, corbet, axboe, asml.silence, akpm, keescook,
	rostedt, dave.hansen, chenhuacai, steve, gpiccoli, ldufour

Hi Matteo

On Wed, 28 Jun 2023 at 17:12, Matteo Rizzo <[email protected]> wrote:
>
> On Wed, 28 Jun 2023 at 13:44, Ricardo Ribalda <[email protected]> wrote:
> >
> > Have you considered making the new sysctl "sticky", like kexec_load_disabled?
> > When the user disables it, there is no way to turn it back on until the
> > system is rebooted.
>
> Are you suggesting making this sysctl sticky? Are there any examples of how to
> implement a sticky sysctl that can take more than 2 values in case we want to
> add an intermediate level that still allows privileged processes to use
> io_uring? Also, what would be the use case? Preventing privileged processes
> from re-enabling io_uring?

Yes, if this sysctl is accepted, I think it would make sense to make it sticky.

For more than one value, take a look at kexec_load_limit_reboot and
kexec_load_limit_panic.

Thanks!

>
> Thanks!
> --
> Matteo



-- 
Ricardo Ribalda
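
The kexec_load_disabled and kexec_load_limit_* knobs each let their value move in one direction only. By analogy, one possible "sticky" rule for a multi-valued knob is that writes may only move toward the more restrictive end. Below is a userspace sketch of that rule, assuming larger values are stricter. sticky_level_write() is hypothetical. A kernel version would live in a custom proc_handler that parses into a temporary and applies the check under a lock.

```c
#include <errno.h>

/* Hypothetical sticky update rule: reject out-of-range values, and refuse any
 * write that would lower (relax) the current level. Returns 0 or -errno. */
static int sticky_level_write(int *cur, int new_val, int min, int max)
{
	if (new_val < min || new_val > max)
		return -EINVAL;  /* same range check proc_dointvec_minmax performs */
	if (new_val < *cur)
		return -EPERM;   /* cannot relax a restriction until reboot */
	*cur = new_val;
	return 0;
}
```

With this rule, a privileged process can still tighten the setting at runtime but can never re-enable io_uring once it has been restricted.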


end of thread [~2023-06-28 16:00 UTC]

Thread overview: 11+ messages
2023-06-27 12:00 [PATCH 0/1] Add a sysctl to disable io_uring system-wide Matteo Rizzo
2023-06-27 12:00 ` [PATCH 1/1] Add a new " Matteo Rizzo
2023-06-27 16:23   ` Randy Dunlap
2023-06-27 17:10   ` Bart Van Assche
2023-06-27 18:15     ` Matteo Rizzo
2023-06-28 11:36       ` Ricardo Ribalda
2023-06-28 15:12         ` Matteo Rizzo
2023-06-28 15:59           ` Jeff Moyer
2023-06-28 15:59           ` Ricardo Ribalda
2023-06-28 13:50   ` Gabriel Krisman Bertazi
2023-06-28 15:59     ` Jeff Moyer
