* [PATCH v2 0/1] Add a sysctl to disable io_uring system-wide
From: Matteo Rizzo @ 2023-06-29 13:27 UTC
To: linux-doc, linux-kernel, io-uring
Cc: matteorizzo, jordyzomer, evn, poprdi, corbet, axboe, asml.silence, akpm, keescook, rostedt, dave.hansen, ribalda, chenhuacai, steve, gpiccoli, ldufour, bhe, oleksandr

Over the last few years we've seen many critical vulnerabilities in
io_uring[1] which could be exploited by an unprivileged process to gain
control over the kernel. This patch introduces a new sysctl which disables
the creation of new io_uring instances system-wide.

The goal of this patch is to give distros, system admins, and cloud
providers a way to reduce the risk of privilege escalation through io_uring
where disabling it with seccomp or at compile time is not practical. For
example a distro or cloud provider might want to disable io_uring by
default and have users enable it again if they need to run a program that
requires it. The new sysctl is designed to let a user with root on the
machine enable and disable io_uring systemwide at runtime without requiring
a kernel recompilation or a reboot.

[1] Link: https://goo.gle/limit-iouring

---
v2:
	* Documentation style fixes
	* Add a third level that only disables io_uring for unprivileged
	  processes

Matteo Rizzo (1):
  Add a new sysctl to disable io_uring system-wide

 Documentation/admin-guide/sysctl/kernel.rst | 19 +++++++++++++
 io_uring/io_uring.c                         | 30 +++++++++++++++++++++
 2 files changed, 49 insertions(+)

--
2.41.0.162.gfafddb0af9-goog

* [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Matteo Rizzo @ 2023-06-29 13:27 UTC
To: linux-doc, linux-kernel, io-uring
Cc: matteorizzo, jordyzomer, evn, poprdi, corbet, axboe, asml.silence, akpm, keescook, rostedt, dave.hansen, ribalda, chenhuacai, steve, gpiccoli, ldufour, bhe, oleksandr

Introduce a new sysctl (io_uring_disabled) which can be either 0, 1,
or 2. When 0 (the default), all processes are allowed to create io_uring
instances, which is the current behavior. When 1, all calls to
io_uring_setup fail with -EPERM unless the calling process has
CAP_SYS_ADMIN. When 2, calls to io_uring_setup fail with -EPERM
regardless of privilege.

Signed-off-by: Matteo Rizzo <[email protected]>
---
 Documentation/admin-guide/sysctl/kernel.rst | 19 +++++++++++++
 io_uring/io_uring.c                         | 30 +++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 3800fab1619b..ee65f7aeb0cf 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -450,6 +450,25 @@ this allows system administrators to override the
 ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
 
 
+io_uring_disabled
+=================
+
+Prevents all processes from creating new io_uring instances. Enabling this
+shrinks the kernel's attack surface.
+
+= ==================================================================
+0 All processes can create io_uring instances as normal. This is the
+  default setting.
+1 io_uring creation is disabled for unprivileged processes.
+  io_uring_setup fails with -EPERM unless the calling process is
+  privileged (CAP_SYS_ADMIN). Existing io_uring instances can
+  still be used.
+2 io_uring creation is disabled for all processes. io_uring_setup
+  always fails with -EPERM. Existing io_uring instances can still be
+  used.
+= ==================================================================
+
+
 kexec_load_disabled
 ===================
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1b53a2ab0a27..2343ae518546 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -153,6 +153,22 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx);
 
 struct kmem_cache *req_cachep;
 
+static int __read_mostly sysctl_io_uring_disabled;
+#ifdef CONFIG_SYSCTL
+static struct ctl_table kernel_io_uring_disabled_table[] = {
+	{
+		.procname	= "io_uring_disabled",
+		.data		= &sysctl_io_uring_disabled,
+		.maxlen		= sizeof(sysctl_io_uring_disabled),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_TWO,
+	},
+	{},
+};
+#endif
+
 struct sock *io_uring_get_socket(struct file *file)
 {
 #if defined(CONFIG_UNIX)
@@ -4000,9 +4016,18 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 	return io_uring_create(entries, &p, params);
 }
 
+static inline bool io_uring_allowed(void)
+{
+	return sysctl_io_uring_disabled == 0 ||
+		(sysctl_io_uring_disabled == 1 && capable(CAP_SYS_ADMIN));
+}
+
 SYSCALL_DEFINE2(io_uring_setup, u32, entries,
 		struct io_uring_params __user *, params)
 {
+	if (!io_uring_allowed())
+		return -EPERM;
+
 	return io_uring_setup(entries, params);
 }
 
@@ -4577,6 +4602,11 @@ static int __init io_uring_init(void)
 
 	req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
 				SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU);
+
+#ifdef CONFIG_SYSCTL
+	register_sysctl_init("kernel", kernel_io_uring_disabled_table);
+#endif
+
 	return 0;
 };
 __initcall(io_uring_init);
--
2.41.0.162.gfafddb0af9-goog

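For readers who want to inspect the knob from user space, here is a minimal sketch. It assumes that, because the patch registers `io_uring_disabled` under the "kernel" sysctl directory, the value is exposed as /proc/sys/kernel/io_uring_disabled. The helper name `read_io_uring_disabled` is hypothetical and not part of the patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): read the current
 * io_uring_disabled level from the given procfs path.
 *
 * Returns 0, 1, or 2 on success, or -1 if the file cannot be opened
 * (for example on a kernel without this sysctl) or does not contain
 * a value in the documented range.
 */
int read_io_uring_disabled(const char *path)
{
	FILE *f = fopen(path, "r");
	int level;

	if (!f)
		return -1;

	if (fscanf(f, "%d", &level) != 1)
		level = -1;
	fclose(f);

	/* The patch restricts the value to SYSCTL_ZERO..SYSCTL_TWO. */
	if (level < 0 || level > 2)
		return -1;
	return level;
}
```

A caller would typically pass "/proc/sys/kernel/io_uring_disabled" and treat -1 as "the sysctl is not available on this kernel".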
* Re: [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Bart Van Assche @ 2023-06-29 15:15 UTC
To: Matteo Rizzo, linux-doc, linux-kernel, io-uring
Cc: jordyzomer, evn, poprdi, corbet, axboe, asml.silence, akpm, keescook, rostedt, dave.hansen, ribalda, chenhuacai, steve, gpiccoli, ldufour, bhe, oleksandr

On 6/29/23 06:27, Matteo Rizzo wrote:
> +static int __read_mostly sysctl_io_uring_disabled;

Shouldn't this be a static key instead of an int in order to minimize the
performance impact on the io_uring_setup() system call? See also
Documentation/staging/static-keys.rst.

Thanks,

Bart.

* Re: [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Matteo Rizzo @ 2023-06-29 15:28 UTC
To: Bart Van Assche
Cc: linux-doc, linux-kernel, io-uring, jordyzomer, evn, poprdi, corbet, axboe, asml.silence, akpm, keescook, rostedt, dave.hansen, ribalda, chenhuacai, steve, gpiccoli, ldufour, bhe, oleksandr

On Thu, 29 Jun 2023 at 17:16, Bart Van Assche <[email protected]> wrote:
>
> On 6/29/23 06:27, Matteo Rizzo wrote:
> > +static int __read_mostly sysctl_io_uring_disabled;
>
> Shouldn't this be a static key instead of an int in order to minimize the
> performance impact on the io_uring_setup() system call? See also
> Documentation/staging/static-keys.rst.
>
> Thanks,
>
> Bart.

Is io_uring_setup in any hot path? io_uring_create is marked as __cold.

--
Matteo

* Re: [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Bart Van Assche @ 2023-06-29 17:37 UTC
To: Matteo Rizzo
Cc: linux-doc, linux-kernel, io-uring, jordyzomer, evn, poprdi, corbet, axboe, asml.silence, akpm, keescook, rostedt, dave.hansen, ribalda, chenhuacai, steve, gpiccoli, ldufour, bhe, oleksandr

On 6/29/23 08:28, Matteo Rizzo wrote:
> On Thu, 29 Jun 2023 at 17:16, Bart Van Assche <[email protected]> wrote:
>>
>> On 6/29/23 06:27, Matteo Rizzo wrote:
>>> +static int __read_mostly sysctl_io_uring_disabled;
>>
>> Shouldn't this be a static key instead of an int in order to minimize the
>> performance impact on the io_uring_setup() system call? See also
>> Documentation/staging/static-keys.rst.
>>
> Is io_uring_setup in any hot path? io_uring_create is marked as __cold.

I confused io_uring_setup() with io_uring_enter() so please ignore my
comment.

Bart.

* Re: [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Jeff Moyer @ 2023-06-29 16:17 UTC
To: Matteo Rizzo
Cc: linux-doc, linux-kernel, io-uring, jordyzomer, evn, poprdi, corbet, axboe, asml.silence, akpm, keescook, rostedt, dave.hansen, ribalda, chenhuacai, steve, gpiccoli, ldufour, bhe, oleksandr

Matteo Rizzo <[email protected]> writes:

> Introduce a new sysctl (io_uring_disabled) which can be either 0, 1,
> or 2. When 0 (the default), all processes are allowed to create io_uring
> instances, which is the current behavior. When 1, all calls to
> io_uring_setup fail with -EPERM unless the calling process has
> CAP_SYS_ADMIN. When 2, calls to io_uring_setup fail with -EPERM
> regardless of privilege.
>
> Signed-off-by: Matteo Rizzo <[email protected]>

This looks good to me. You may also consider updating the
io_uring_setup(2) man page (part of liburing) to reflect this new
meaning for -EPERM.

Reviewed-by: Jeff Moyer <[email protected]>

> ---
>  Documentation/admin-guide/sysctl/kernel.rst | 19 +++++++++++++
>  io_uring/io_uring.c                         | 30 +++++++++++++++++++++
>  2 files changed, 49 insertions(+)
>
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 3800fab1619b..ee65f7aeb0cf 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -450,6 +450,25 @@ this allows system administrators to override the
>  ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
>
>
> +io_uring_disabled
> +=================
> +
> +Prevents all processes from creating new io_uring instances. Enabling this
> +shrinks the kernel's attack surface.
> +
> += ==================================================================
> +0 All processes can create io_uring instances as normal. This is the
> +  default setting.
> +1 io_uring creation is disabled for unprivileged processes.
> +  io_uring_setup fails with -EPERM unless the calling process is
> +  privileged (CAP_SYS_ADMIN). Existing io_uring instances can
> +  still be used.
> +2 io_uring creation is disabled for all processes. io_uring_setup
> +  always fails with -EPERM. Existing io_uring instances can still be
> +  used.
> += ==================================================================
> +
> +
>  kexec_load_disabled
>  ===================
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 1b53a2ab0a27..2343ae518546 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -153,6 +153,22 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx);
>
>  struct kmem_cache *req_cachep;
>
> +static int __read_mostly sysctl_io_uring_disabled;
> +#ifdef CONFIG_SYSCTL
> +static struct ctl_table kernel_io_uring_disabled_table[] = {
> +	{
> +		.procname	= "io_uring_disabled",
> +		.data		= &sysctl_io_uring_disabled,
> +		.maxlen		= sizeof(sysctl_io_uring_disabled),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= SYSCTL_TWO,
> +	},
> +	{},
> +};
> +#endif
> +
>  struct sock *io_uring_get_socket(struct file *file)
>  {
>  #if defined(CONFIG_UNIX)
> @@ -4000,9 +4016,18 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
>  	return io_uring_create(entries, &p, params);
>  }
>
> +static inline bool io_uring_allowed(void)
> +{
> +	return sysctl_io_uring_disabled == 0 ||
> +		(sysctl_io_uring_disabled == 1 && capable(CAP_SYS_ADMIN));
> +}
> +
>  SYSCALL_DEFINE2(io_uring_setup, u32, entries,
>  		struct io_uring_params __user *, params)
>  {
> +	if (!io_uring_allowed())
> +		return -EPERM;
> +
>  	return io_uring_setup(entries, params);
>  }
>
> @@ -4577,6 +4602,11 @@ static int __init io_uring_init(void)
>
>  	req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
>  				SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU);
> +
> +#ifdef CONFIG_SYSCTL
> +	register_sysctl_init("kernel", kernel_io_uring_disabled_table);
> +#endif
> +
>  	return 0;
>  };
>  __initcall(io_uring_init);

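To make the "new meaning for -EPERM" concrete from an application's point of view, the following is a rough sketch of how a program might probe for io_uring and fall back when ring creation is refused. It assumes a Linux build environment whose headers provide `__NR_io_uring_setup` and `<linux/io_uring.h>`. The enum and the function name `probe_io_uring` are made up for illustration. Note that EPERM is not proof that this sysctl is the cause; a seccomp filter, for instance, could return the same error.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

/* Hypothetical classification of a ring-creation attempt. */
enum ring_probe {
	RING_OK,          /* a ring could be created */
	RING_DENIED,      /* EPERM: possibly io_uring_disabled, possibly seccomp */
	RING_UNSUPPORTED, /* ENOSYS: kernel built without io_uring */
	RING_OTHER_ERROR  /* anything else, e.g. ENOMEM */
};

/* Try to create a tiny ring with the raw syscall and classify the result. */
enum ring_probe probe_io_uring(void)
{
	struct io_uring_params p;
	long fd;

	memset(&p, 0, sizeof(p));
	fd = syscall(__NR_io_uring_setup, 4, &p);
	if (fd >= 0) {
		/* Only probing: release the ring immediately. */
		close((int)fd);
		return RING_OK;
	}

	switch (errno) {
	case EPERM:
		return RING_DENIED;
	case ENOSYS:
		return RING_UNSUPPORTED;
	default:
		return RING_OTHER_ERROR;
	}
}
```

An application could call `probe_io_uring()` once at startup and select a different I/O backend for any result other than `RING_OK`.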
* Re: [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Gabriel Krisman Bertazi @ 2023-06-29 18:36 UTC
To: Matteo Rizzo
Cc: linux-doc, linux-kernel, io-uring, jordyzomer, evn, poprdi, corbet, axboe, asml.silence, akpm, keescook, rostedt, dave.hansen, ribalda, chenhuacai, steve, gpiccoli, ldufour, bhe, oleksandr

Matteo Rizzo <[email protected]> writes:

> Introduce a new sysctl (io_uring_disabled) which can be either 0, 1,
> or 2. When 0 (the default), all processes are allowed to create io_uring
> instances, which is the current behavior. When 1, all calls to
> io_uring_setup fail with -EPERM unless the calling process has
> CAP_SYS_ADMIN. When 2, calls to io_uring_setup fail with -EPERM
> regardless of privilege.
>
> Signed-off-by: Matteo Rizzo <[email protected]>
> ---

Thanks for adding the extra level for root-only rings.

The patch looks good to me.

Reviewed-by: Gabriel Krisman Bertazi <[email protected]>

--
Gabriel Krisman Bertazi

* Re: [PATCH v2 1/1] Add a new sysctl to disable io_uring system-wide
From: Matteo Rizzo @ 2023-06-30 15:04 UTC
To: Gabriel Krisman Bertazi
Cc: linux-doc, linux-kernel, io-uring, jordyzomer, evn, poprdi, corbet, axboe, asml.silence, akpm, keescook, rostedt, dave.hansen, ribalda, chenhuacai, steve, gpiccoli, ldufour, bhe, oleksandr, Bart Van Assche, jmoyer, Jann Horn

On Thu, 29 Jun 2023 at 20:36, Gabriel Krisman Bertazi <[email protected]> wrote:
>
> Thanks for adding the extra level for root-only rings.
>
> The patch looks good to me.
>
> Reviewed-by: Gabriel Krisman Bertazi <[email protected]>

Thanks everyone for the reviews! Unfortunately I forgot the subsystem name
in the commit message. Jann also pointed out to me internally that the
check in io_uring_allowed could race with another process that is trying
to change the sysctl. I will send a v3 that fixes both issues.

Thanks,
--
Matteo

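The thread does not say how v3 addresses the race, so the following is only a sketch of one plausible approach: read the sysctl value once and make the whole decision from that snapshot. In v2, io_uring_allowed() reads sysctl_io_uring_disabled twice; if the value changes from 1 to 0 between the two comparisons, a CAP_SYS_ADMIN caller would be refused even though both the old and the new setting permit it. The code below is a user-space model built on C11 atomics, not kernel code. The names `io_uring_disabled_model` and `io_uring_allowed_model` are hypothetical, and the capability check is replaced by a boolean parameter. In the kernel, the single read would presumably use something like READ_ONCE().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* User-space stand-in for the kernel's sysctl_io_uring_disabled (0, 1, or 2). */
static atomic_int io_uring_disabled_model;

/*
 * Model of the permission check with a single load. The caller's
 * CAP_SYS_ADMIN status is passed in as a flag because capable() is
 * not available outside the kernel.
 */
bool io_uring_allowed_model(bool has_cap_sys_admin)
{
	/* One snapshot: a concurrent writer cannot change it mid-decision. */
	int disabled = atomic_load_explicit(&io_uring_disabled_model,
					    memory_order_relaxed);

	return disabled == 0 || (disabled == 1 && has_cap_sys_admin);
}
```

With a single snapshot, the result always corresponds to either the old or the new setting, never to a mixture of the two.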