* [PATCH v4 1/8] audit: prepare audit_context for use in calling contexts beyond syscalls
2021-09-15 16:49 [PATCH v4 0/8] Add LSM access controls and auditing to io_uring Paul Moore
@ 2021-09-15 16:49 ` Paul Moore
2021-09-15 16:49 ` [PATCH v4 2/8] audit,io_uring,io-wq: add some basic audit support to io_uring Paul Moore
` (7 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Paul Moore @ 2021-09-15 16:49 UTC (permalink / raw)
To: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
This patch cleans up some of our audit_context handling by
abstracting out the reset and return code fixup handling to dedicated
functions. Not only does this help make things easier to read and
inspect, it allows for easier reuse by future patches. We also
convert the simple audit_context->in_syscall flag into an enum which
can be used to by future patches to indicate a calling context other
than the syscall context.
Thanks to Richard Guy Briggs for review and feedback.
Acked-by: Richard Guy Briggs <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
---
v4:
- fix some spelling errors in the comments
v3:
- removed work-in-progress warning from the description
v2:
- no change
v1:
- initial draft
---
kernel/audit.h | 5 +
kernel/auditsc.c | 256 ++++++++++++++++++++++++++++++++++--------------------
2 files changed, 167 insertions(+), 94 deletions(-)
diff --git a/kernel/audit.h b/kernel/audit.h
index d6a2c899a8db..13abc48de0bd 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -100,7 +100,10 @@ struct audit_proctitle {
/* The per-task audit context. */
struct audit_context {
int dummy; /* must be the first element */
- int in_syscall; /* 1 if task is in a syscall */
+ enum {
+ AUDIT_CTX_UNUSED, /* audit_context is currently unused */
+ AUDIT_CTX_SYSCALL, /* in use by syscall */
+ } context;
enum audit_state state, current_state;
unsigned int serial; /* serial number for record */
int major; /* syscall number */
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 8dd73a64f921..f3d309b05c2d 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -915,10 +915,80 @@ static inline void audit_free_aux(struct audit_context *context)
context->aux = aux->next;
kfree(aux);
}
+ context->aux = NULL;
while ((aux = context->aux_pids)) {
context->aux_pids = aux->next;
kfree(aux);
}
+ context->aux_pids = NULL;
+}
+
+/**
+ * audit_reset_context - reset a audit_context structure
+ * @ctx: the audit_context to reset
+ *
+ * All fields in the audit_context will be reset to an initial state, all
+ * references held by fields will be dropped, and private memory will be
+ * released. When this function returns the audit_context will be suitable
+ * for reuse, so long as the passed context is not NULL or a dummy context.
+ */
+static void audit_reset_context(struct audit_context *ctx)
+{
+ if (!ctx)
+ return;
+
+ /* if ctx is non-null, reset the "ctx->state" regardless */
+ ctx->context = AUDIT_CTX_UNUSED;
+ if (ctx->dummy)
+ return;
+
+ /*
+ * NOTE: It shouldn't matter in what order we release the fields, so
+ * release them in the order in which they appear in the struct;
+ * this gives us some hope of quickly making sure we are
+ * resetting the audit_context properly.
+ *
+ * Other things worth mentioning:
+ * - we don't reset "dummy"
+ * - we don't reset "state", we do reset "current_state"
+ * - we preserve "filterkey" if "state" is AUDIT_STATE_RECORD
+ * - much of this is likely overkill, but play it safe for now
+ * - we really need to work on improving the audit_context struct
+ */
+
+ ctx->current_state = ctx->state;
+ ctx->serial = 0;
+ ctx->major = 0;
+ ctx->ctime = (struct timespec64){ .tv_sec = 0, .tv_nsec = 0 };
+ memset(ctx->argv, 0, sizeof(ctx->argv));
+ ctx->return_code = 0;
+ ctx->prio = (ctx->state == AUDIT_STATE_RECORD ? ~0ULL : 0);
+ ctx->return_valid = AUDITSC_INVALID;
+ audit_free_names(ctx);
+ if (ctx->state != AUDIT_STATE_RECORD) {
+ kfree(ctx->filterkey);
+ ctx->filterkey = NULL;
+ }
+ audit_free_aux(ctx);
+ kfree(ctx->sockaddr);
+ ctx->sockaddr = NULL;
+ ctx->sockaddr_len = 0;
+ ctx->pid = ctx->ppid = 0;
+ ctx->uid = ctx->euid = ctx->suid = ctx->fsuid = KUIDT_INIT(0);
+ ctx->gid = ctx->egid = ctx->sgid = ctx->fsgid = KGIDT_INIT(0);
+ ctx->personality = 0;
+ ctx->arch = 0;
+ ctx->target_pid = 0;
+ ctx->target_auid = ctx->target_uid = KUIDT_INIT(0);
+ ctx->target_sessionid = 0;
+ ctx->target_sid = 0;
+ ctx->target_comm[0] = '\0';
+ unroll_tree_refs(ctx, NULL, 0);
+ WARN_ON(!list_empty(&ctx->killed_trees));
+ ctx->type = 0;
+ audit_free_module(ctx);
+ ctx->fds[0] = -1;
+ audit_proctitle_free(ctx);
}
static inline struct audit_context *audit_alloc_context(enum audit_state state)
@@ -928,6 +998,7 @@ static inline struct audit_context *audit_alloc_context(enum audit_state state)
context = kzalloc(sizeof(*context), GFP_KERNEL);
if (!context)
return NULL;
+ context->context = AUDIT_CTX_UNUSED;
context->state = state;
context->prio = state == AUDIT_STATE_RECORD ? ~0ULL : 0;
INIT_LIST_HEAD(&context->killed_trees);
@@ -953,7 +1024,7 @@ int audit_alloc(struct task_struct *tsk)
char *key = NULL;
if (likely(!audit_ever_enabled))
- return 0; /* Return if not auditing. */
+ return 0;
state = audit_filter_task(tsk, &key);
if (state == AUDIT_STATE_DISABLED) {
@@ -975,14 +1046,10 @@ int audit_alloc(struct task_struct *tsk)
static inline void audit_free_context(struct audit_context *context)
{
- audit_free_module(context);
- audit_free_names(context);
- unroll_tree_refs(context, NULL, 0);
+ /* resetting is extra work, but it is likely just noise */
+ audit_reset_context(context);
free_tree_refs(context);
- audit_free_aux(context);
kfree(context->filterkey);
- kfree(context->sockaddr);
- audit_proctitle_free(context);
kfree(context);
}
@@ -1489,29 +1556,35 @@ static void audit_log_exit(void)
context->personality = current->personality;
- ab = audit_log_start(context, GFP_KERNEL, AUDIT_SYSCALL);
- if (!ab)
- return; /* audit_panic has been called */
- audit_log_format(ab, "arch=%x syscall=%d",
- context->arch, context->major);
- if (context->personality != PER_LINUX)
- audit_log_format(ab, " per=%lx", context->personality);
- if (context->return_valid != AUDITSC_INVALID)
- audit_log_format(ab, " success=%s exit=%ld",
- (context->return_valid==AUDITSC_SUCCESS)?"yes":"no",
- context->return_code);
-
- audit_log_format(ab,
- " a0=%lx a1=%lx a2=%lx a3=%lx items=%d",
- context->argv[0],
- context->argv[1],
- context->argv[2],
- context->argv[3],
- context->name_count);
-
- audit_log_task_info(ab);
- audit_log_key(ab, context->filterkey);
- audit_log_end(ab);
+ switch (context->context) {
+ case AUDIT_CTX_SYSCALL:
+ ab = audit_log_start(context, GFP_KERNEL, AUDIT_SYSCALL);
+ if (!ab)
+ return;
+ audit_log_format(ab, "arch=%x syscall=%d",
+ context->arch, context->major);
+ if (context->personality != PER_LINUX)
+ audit_log_format(ab, " per=%lx", context->personality);
+ if (context->return_valid != AUDITSC_INVALID)
+ audit_log_format(ab, " success=%s exit=%ld",
+ (context->return_valid == AUDITSC_SUCCESS ?
+ "yes" : "no"),
+ context->return_code);
+ audit_log_format(ab,
+ " a0=%lx a1=%lx a2=%lx a3=%lx items=%d",
+ context->argv[0],
+ context->argv[1],
+ context->argv[2],
+ context->argv[3],
+ context->name_count);
+ audit_log_task_info(ab);
+ audit_log_key(ab, context->filterkey);
+ audit_log_end(ab);
+ break;
+ default:
+ BUG();
+ break;
+ }
for (aux = context->aux; aux; aux = aux->next) {
@@ -1602,14 +1675,15 @@ static void audit_log_exit(void)
audit_log_name(context, n, NULL, i++, &call_panic);
}
- audit_log_proctitle();
+ if (context->context == AUDIT_CTX_SYSCALL)
+ audit_log_proctitle();
/* Send end of event record to help user space know we are finished */
ab = audit_log_start(context, GFP_KERNEL, AUDIT_EOE);
if (ab)
audit_log_end(ab);
if (call_panic)
- audit_panic("error converting sid to string");
+ audit_panic("error in audit_log_exit()");
}
/**
@@ -1625,6 +1699,7 @@ void __audit_free(struct task_struct *tsk)
if (!context)
return;
+ /* this may generate CONFIG_CHANGE records */
if (!list_empty(&context->killed_trees))
audit_kill_trees(context);
@@ -1633,7 +1708,8 @@ void __audit_free(struct task_struct *tsk)
* random task_struct that doesn't doesn't have any meaningful data we
* need to log via audit_log_exit().
*/
- if (tsk == current && !context->dummy && context->in_syscall) {
+ if (tsk == current && !context->dummy &&
+ context->context == AUDIT_CTX_SYSCALL) {
context->return_valid = AUDITSC_INVALID;
context->return_code = 0;
@@ -1647,6 +1723,34 @@ void __audit_free(struct task_struct *tsk)
audit_free_context(context);
}
+/**
+ * audit_return_fixup - fixup the return codes in the audit_context
+ * @ctx: the audit_context
+ * @success: true/false value to indicate if the operation succeeded or not
+ * @code: operation return code
+ *
+ * We need to fixup the return code in the audit logs if the actual return
+ * codes are later going to be fixed by the arch specific signal handlers.
+ */
+static void audit_return_fixup(struct audit_context *ctx,
+ int success, long code)
+{
+ /*
+ * This is actually a test for:
+ * (rc == ERESTARTSYS ) || (rc == ERESTARTNOINTR) ||
+ * (rc == ERESTARTNOHAND) || (rc == ERESTART_RESTARTBLOCK)
+ *
+ * but is faster than a bunch of ||
+ */
+ if (unlikely(code <= -ERESTARTSYS) &&
+ (code >= -ERESTART_RESTARTBLOCK) &&
+ (code != -ENOIOCTLCMD))
+ ctx->return_code = -EINTR;
+ else
+ ctx->return_code = code;
+ ctx->return_valid = (success ? AUDITSC_SUCCESS : AUDITSC_FAILURE);
+}
+
/**
* __audit_syscall_entry - fill in an audit record at syscall entry
* @major: major syscall type (function)
@@ -1672,7 +1776,12 @@ void __audit_syscall_entry(int major, unsigned long a1, unsigned long a2,
if (!audit_enabled || !context)
return;
- BUG_ON(context->in_syscall || context->name_count);
+ WARN_ON(context->context != AUDIT_CTX_UNUSED);
+ WARN_ON(context->name_count);
+ if (context->context != AUDIT_CTX_UNUSED || context->name_count) {
+ audit_panic("unrecoverable error in audit_syscall_entry()");
+ return;
+ }
state = context->state;
if (state == AUDIT_STATE_DISABLED)
@@ -1691,10 +1800,8 @@ void __audit_syscall_entry(int major, unsigned long a1, unsigned long a2,
context->argv[1] = a2;
context->argv[2] = a3;
context->argv[3] = a4;
- context->serial = 0;
- context->in_syscall = 1;
+ context->context = AUDIT_CTX_SYSCALL;
context->current_state = state;
- context->ppid = 0;
ktime_get_coarse_real_ts64(&context->ctime);
}
@@ -1711,63 +1818,27 @@ void __audit_syscall_entry(int major, unsigned long a1, unsigned long a2,
*/
void __audit_syscall_exit(int success, long return_code)
{
- struct audit_context *context;
+ struct audit_context *context = audit_context();
- context = audit_context();
- if (!context)
- return;
+ if (!context || context->dummy ||
+ context->context != AUDIT_CTX_SYSCALL)
+ goto out;
+ /* this may generate CONFIG_CHANGE records */
if (!list_empty(&context->killed_trees))
audit_kill_trees(context);
- if (!context->dummy && context->in_syscall) {
- if (success)
- context->return_valid = AUDITSC_SUCCESS;
- else
- context->return_valid = AUDITSC_FAILURE;
-
- /*
- * we need to fix up the return code in the audit logs if the
- * actual return codes are later going to be fixed up by the
- * arch specific signal handlers
- *
- * This is actually a test for:
- * (rc == ERESTARTSYS ) || (rc == ERESTARTNOINTR) ||
- * (rc == ERESTARTNOHAND) || (rc == ERESTART_RESTARTBLOCK)
- *
- * but is faster than a bunch of ||
- */
- if (unlikely(return_code <= -ERESTARTSYS) &&
- (return_code >= -ERESTART_RESTARTBLOCK) &&
- (return_code != -ENOIOCTLCMD))
- context->return_code = -EINTR;
- else
- context->return_code = return_code;
-
- audit_filter_syscall(current, context);
- audit_filter_inodes(current, context);
- if (context->current_state == AUDIT_STATE_RECORD)
- audit_log_exit();
- }
+ /* run through both filters to ensure we set the filterkey properly */
+ audit_filter_syscall(current, context);
+ audit_filter_inodes(current, context);
+ if (context->current_state < AUDIT_STATE_RECORD)
+ goto out;
- context->in_syscall = 0;
- context->prio = context->state == AUDIT_STATE_RECORD ? ~0ULL : 0;
+ audit_return_fixup(context, success, return_code);
+ audit_log_exit();
- audit_free_module(context);
- audit_free_names(context);
- unroll_tree_refs(context, NULL, 0);
- audit_free_aux(context);
- context->aux = NULL;
- context->aux_pids = NULL;
- context->target_pid = 0;
- context->target_sid = 0;
- context->sockaddr_len = 0;
- context->type = 0;
- context->fds[0] = -1;
- if (context->state != AUDIT_STATE_RECORD) {
- kfree(context->filterkey);
- context->filterkey = NULL;
- }
+out:
+ audit_reset_context(context);
}
static inline void handle_one(const struct inode *inode)
@@ -1919,7 +1990,7 @@ void __audit_getname(struct filename *name)
struct audit_context *context = audit_context();
struct audit_names *n;
- if (!context->in_syscall)
+ if (context->context == AUDIT_CTX_UNUSED)
return;
n = audit_alloc_name(context, AUDIT_TYPE_UNKNOWN);
@@ -1991,7 +2062,7 @@ void __audit_inode(struct filename *name, const struct dentry *dentry,
struct list_head *list = &audit_filter_list[AUDIT_FILTER_FS];
int i;
- if (!context->in_syscall)
+ if (context->context == AUDIT_CTX_UNUSED)
return;
rcu_read_lock();
@@ -2109,7 +2180,7 @@ void __audit_inode_child(struct inode *parent,
struct list_head *list = &audit_filter_list[AUDIT_FILTER_FS];
int i;
- if (!context->in_syscall)
+ if (context->context == AUDIT_CTX_UNUSED)
return;
rcu_read_lock();
@@ -2208,7 +2279,7 @@ EXPORT_SYMBOL_GPL(__audit_inode_child);
int auditsc_get_stamp(struct audit_context *ctx,
struct timespec64 *t, unsigned int *serial)
{
- if (!ctx->in_syscall)
+ if (ctx->context == AUDIT_CTX_UNUSED)
return 0;
if (!ctx->serial)
ctx->serial = audit_serial();
@@ -2706,8 +2777,7 @@ void audit_seccomp_actions_logged(const char *names, const char *old_names,
struct list_head *audit_killed_trees(void)
{
struct audit_context *ctx = audit_context();
-
- if (likely(!ctx || !ctx->in_syscall))
+ if (likely(!ctx || ctx->context == AUDIT_CTX_UNUSED))
return NULL;
return &ctx->killed_trees;
}
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 2/8] audit,io_uring,io-wq: add some basic audit support to io_uring
2021-09-15 16:49 [PATCH v4 0/8] Add LSM access controls and auditing to io_uring Paul Moore
2021-09-15 16:49 ` [PATCH v4 1/8] audit: prepare audit_context for use in calling contexts beyond syscalls Paul Moore
@ 2021-09-15 16:49 ` Paul Moore
2021-09-16 13:33 ` Richard Guy Briggs
2021-09-15 16:49 ` [PATCH v4 3/8] audit: add filtering for io_uring records Paul Moore
` (6 subsequent siblings)
8 siblings, 1 reply; 15+ messages in thread
From: Paul Moore @ 2021-09-15 16:49 UTC (permalink / raw)
To: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
This patch adds basic auditing to io_uring operations, regardless of
their context. This is accomplished by allocating audit_context
structures for the io-wq worker and io_uring SQPOLL kernel threads
as well as explicitly auditing the io_uring operations in
io_issue_sqe(). Individual io_uring operations can bypass auditing
through the "audit_skip" field in the struct io_op_def definition for
the operation; although great care must be taken so that security
relevant io_uring operations do not bypass auditing; please contact
the audit mailing list (see the MAINTAINERS file) with any questions.
The io_uring operations are audited using a new AUDIT_URINGOP record,
an example is shown below:
type=UNKNOWN[1336] msg=audit(1630523381.288:260):
uring_op=19 success=yes exit=0 items=0 ppid=853 pid=1204
uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0
subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
key=(null)
AUID="root" UID="root" GID="root" EUID="root" SUID="root"
FSUID="root" EGID="root" SGID="root" FSGID="root"
Thanks to Richard Guy Briggs for review and feedback.
Signed-off-by: Paul Moore <[email protected]>
---
v4:
- removed some work-in-progress comments
- removed the auid logging in audit_log_uring()
v3:
- removed work-in-progress warning from the description
v2:
- added dummy funcs for audit_uring_{entry,exit}()
- replaced opcode checks in io_issue_sqe() with audit_skip checks
- moved fastpath checks into audit_uring_{entry,exit}()
- audit_log_uring() uses GFP_ATOMIC
- don't record the arch in __audit_uring_entry()
v1:
- initial draft
---
fs/io-wq.c | 4 +
fs/io_uring.c | 55 +++++++++++++--
include/linux/audit.h | 26 +++++++
include/uapi/linux/audit.h | 1
kernel/audit.h | 2 +
kernel/auditsc.c | 166 ++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 248 insertions(+), 6 deletions(-)
diff --git a/fs/io-wq.c b/fs/io-wq.c
index 6c55362c1f99..dac5c5961c9d 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -14,6 +14,7 @@
#include <linux/rculist_nulls.h>
#include <linux/cpu.h>
#include <linux/tracehook.h>
+#include <linux/audit.h>
#include "io-wq.h"
@@ -562,6 +563,8 @@ static int io_wqe_worker(void *data)
snprintf(buf, sizeof(buf), "iou-wrk-%d", wq->task->pid);
set_task_comm(current, buf);
+ audit_alloc_kernel(current);
+
while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
long ret;
@@ -601,6 +604,7 @@ static int io_wqe_worker(void *data)
io_worker_handle_work(worker);
}
+ audit_free(current);
io_worker_exit(worker);
return 0;
}
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 16fb7436043c..388754b24785 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -79,6 +79,7 @@
#include <linux/pagemap.h>
#include <linux/io_uring.h>
#include <linux/tracehook.h>
+#include <linux/audit.h>
#define CREATE_TRACE_POINTS
#include <trace/events/io_uring.h>
@@ -917,6 +918,8 @@ struct io_op_def {
unsigned needs_async_setup : 1;
/* should block plug */
unsigned plug : 1;
+ /* skip auditing */
+ unsigned audit_skip : 1;
/* size of async data needed, if any */
unsigned short async_size;
};
@@ -930,6 +933,7 @@ static const struct io_op_def io_op_defs[] = {
.buffer_select = 1,
.needs_async_setup = 1,
.plug = 1,
+ .audit_skip = 1,
.async_size = sizeof(struct io_async_rw),
},
[IORING_OP_WRITEV] = {
@@ -939,16 +943,19 @@ static const struct io_op_def io_op_defs[] = {
.pollout = 1,
.needs_async_setup = 1,
.plug = 1,
+ .audit_skip = 1,
.async_size = sizeof(struct io_async_rw),
},
[IORING_OP_FSYNC] = {
.needs_file = 1,
+ .audit_skip = 1,
},
[IORING_OP_READ_FIXED] = {
.needs_file = 1,
.unbound_nonreg_file = 1,
.pollin = 1,
.plug = 1,
+ .audit_skip = 1,
.async_size = sizeof(struct io_async_rw),
},
[IORING_OP_WRITE_FIXED] = {
@@ -957,15 +964,20 @@ static const struct io_op_def io_op_defs[] = {
.unbound_nonreg_file = 1,
.pollout = 1,
.plug = 1,
+ .audit_skip = 1,
.async_size = sizeof(struct io_async_rw),
},
[IORING_OP_POLL_ADD] = {
.needs_file = 1,
.unbound_nonreg_file = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_POLL_REMOVE] = {
+ .audit_skip = 1,
},
- [IORING_OP_POLL_REMOVE] = {},
[IORING_OP_SYNC_FILE_RANGE] = {
.needs_file = 1,
+ .audit_skip = 1,
},
[IORING_OP_SENDMSG] = {
.needs_file = 1,
@@ -983,18 +995,23 @@ static const struct io_op_def io_op_defs[] = {
.async_size = sizeof(struct io_async_msghdr),
},
[IORING_OP_TIMEOUT] = {
+ .audit_skip = 1,
.async_size = sizeof(struct io_timeout_data),
},
[IORING_OP_TIMEOUT_REMOVE] = {
/* used by timeout updates' prep() */
+ .audit_skip = 1,
},
[IORING_OP_ACCEPT] = {
.needs_file = 1,
.unbound_nonreg_file = 1,
.pollin = 1,
},
- [IORING_OP_ASYNC_CANCEL] = {},
+ [IORING_OP_ASYNC_CANCEL] = {
+ .audit_skip = 1,
+ },
[IORING_OP_LINK_TIMEOUT] = {
+ .audit_skip = 1,
.async_size = sizeof(struct io_timeout_data),
},
[IORING_OP_CONNECT] = {
@@ -1009,14 +1026,19 @@ static const struct io_op_def io_op_defs[] = {
},
[IORING_OP_OPENAT] = {},
[IORING_OP_CLOSE] = {},
- [IORING_OP_FILES_UPDATE] = {},
- [IORING_OP_STATX] = {},
+ [IORING_OP_FILES_UPDATE] = {
+ .audit_skip = 1,
+ },
+ [IORING_OP_STATX] = {
+ .audit_skip = 1,
+ },
[IORING_OP_READ] = {
.needs_file = 1,
.unbound_nonreg_file = 1,
.pollin = 1,
.buffer_select = 1,
.plug = 1,
+ .audit_skip = 1,
.async_size = sizeof(struct io_async_rw),
},
[IORING_OP_WRITE] = {
@@ -1025,39 +1047,50 @@ static const struct io_op_def io_op_defs[] = {
.unbound_nonreg_file = 1,
.pollout = 1,
.plug = 1,
+ .audit_skip = 1,
.async_size = sizeof(struct io_async_rw),
},
[IORING_OP_FADVISE] = {
.needs_file = 1,
+ .audit_skip = 1,
},
[IORING_OP_MADVISE] = {},
[IORING_OP_SEND] = {
.needs_file = 1,
.unbound_nonreg_file = 1,
.pollout = 1,
+ .audit_skip = 1,
},
[IORING_OP_RECV] = {
.needs_file = 1,
.unbound_nonreg_file = 1,
.pollin = 1,
.buffer_select = 1,
+ .audit_skip = 1,
},
[IORING_OP_OPENAT2] = {
},
[IORING_OP_EPOLL_CTL] = {
.unbound_nonreg_file = 1,
+ .audit_skip = 1,
},
[IORING_OP_SPLICE] = {
.needs_file = 1,
.hash_reg_file = 1,
.unbound_nonreg_file = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_PROVIDE_BUFFERS] = {
+ .audit_skip = 1,
+ },
+ [IORING_OP_REMOVE_BUFFERS] = {
+ .audit_skip = 1,
},
- [IORING_OP_PROVIDE_BUFFERS] = {},
- [IORING_OP_REMOVE_BUFFERS] = {},
[IORING_OP_TEE] = {
.needs_file = 1,
.hash_reg_file = 1,
.unbound_nonreg_file = 1,
+ .audit_skip = 1,
},
[IORING_OP_SHUTDOWN] = {
.needs_file = 1,
@@ -6591,6 +6624,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
if ((req->flags & REQ_F_CREDS) && req->creds != current_cred())
creds = override_creds(req->creds);
+ if (!io_op_defs[req->opcode].audit_skip)
+ audit_uring_entry(req->opcode);
+
switch (req->opcode) {
case IORING_OP_NOP:
ret = io_nop(req, issue_flags);
@@ -6706,6 +6742,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
break;
}
+ if (!io_op_defs[req->opcode].audit_skip)
+ audit_uring_exit(!ret, ret);
+
if (creds)
revert_creds(creds);
if (ret)
@@ -7360,6 +7399,8 @@ static int io_sq_thread(void *data)
set_cpus_allowed_ptr(current, cpu_online_mask);
current->flags |= PF_NO_SETAFFINITY;
+ audit_alloc_kernel(current);
+
mutex_lock(&sqd->lock);
while (1) {
bool cap_entries, sqt_spin = false;
@@ -7425,6 +7466,8 @@ static int io_sq_thread(void *data)
io_run_task_work();
mutex_unlock(&sqd->lock);
+ audit_free(current);
+
complete(&sqd->exited);
do_exit(0);
}
diff --git a/include/linux/audit.h b/include/linux/audit.h
index 82b7c1116a85..d656a06dd909 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -286,7 +286,10 @@ static inline int audit_signal_info(int sig, struct task_struct *t)
/* These are defined in auditsc.c */
/* Public API */
extern int audit_alloc(struct task_struct *task);
+extern int audit_alloc_kernel(struct task_struct *task);
extern void __audit_free(struct task_struct *task);
+extern void __audit_uring_entry(u8 op);
+extern void __audit_uring_exit(int success, long code);
extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
unsigned long a2, unsigned long a3);
extern void __audit_syscall_exit(int ret_success, long ret_value);
@@ -323,6 +326,21 @@ static inline void audit_free(struct task_struct *task)
if (unlikely(task->audit_context))
__audit_free(task);
}
+static inline void audit_uring_entry(u8 op)
+{
+ /*
+ * We intentionally check audit_context() before audit_enabled as most
+ * Linux systems (as of ~2021) rely on systemd which forces audit to
+ * be enabled regardless of the user's audit configuration.
+ */
+ if (unlikely(audit_context() && audit_enabled))
+ __audit_uring_entry(op);
+}
+static inline void audit_uring_exit(int success, long code)
+{
+ if (unlikely(!audit_dummy_context()))
+ __audit_uring_exit(success, code);
+}
static inline void audit_syscall_entry(int major, unsigned long a0,
unsigned long a1, unsigned long a2,
unsigned long a3)
@@ -554,8 +572,16 @@ static inline int audit_alloc(struct task_struct *task)
{
return 0;
}
+static inline int audit_alloc_kernel(struct task_struct *task)
+{
+ return 0;
+}
static inline void audit_free(struct task_struct *task)
{ }
+static inline void audit_uring_entry(u8 op)
+{ }
+static inline void audit_uring_exit(int success, long code)
+{ }
static inline void audit_syscall_entry(int major, unsigned long a0,
unsigned long a1, unsigned long a2,
unsigned long a3)
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index daa481729e9b..a1997697c8b1 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -118,6 +118,7 @@
#define AUDIT_TIME_ADJNTPVAL 1333 /* NTP value adjustment */
#define AUDIT_BPF 1334 /* BPF subsystem */
#define AUDIT_EVENT_LISTENER 1335 /* Task joined multicast read socket */
+#define AUDIT_URINGOP 1336 /* io_uring operation */
#define AUDIT_AVC 1400 /* SE Linux avc denial or grant */
#define AUDIT_SELINUX_ERR 1401 /* Internal SE Linux Errors */
diff --git a/kernel/audit.h b/kernel/audit.h
index 13abc48de0bd..d1161e3b83e2 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -103,10 +103,12 @@ struct audit_context {
enum {
AUDIT_CTX_UNUSED, /* audit_context is currently unused */
AUDIT_CTX_SYSCALL, /* in use by syscall */
+ AUDIT_CTX_URING, /* in use by io_uring */
} context;
enum audit_state state, current_state;
unsigned int serial; /* serial number for record */
int major; /* syscall number */
+ int uring_op; /* uring operation */
struct timespec64 ctime; /* time of syscall entry */
unsigned long argv[4]; /* syscall arguments */
long return_code;/* syscall return code */
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index f3d309b05c2d..6dda448fb826 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -959,6 +959,7 @@ static void audit_reset_context(struct audit_context *ctx)
ctx->current_state = ctx->state;
ctx->serial = 0;
ctx->major = 0;
+ ctx->uring_op = 0;
ctx->ctime = (struct timespec64){ .tv_sec = 0, .tv_nsec = 0 };
memset(ctx->argv, 0, sizeof(ctx->argv));
ctx->return_code = 0;
@@ -1044,6 +1045,31 @@ int audit_alloc(struct task_struct *tsk)
return 0;
}
+/**
+ * audit_alloc_kernel - allocate an audit_context for a kernel task
+ * @tsk: the kernel task
+ *
+ * Similar to the audit_alloc() function, but intended for kernel private
+ * threads. Returns zero on success, negative values on failure.
+ */
+int audit_alloc_kernel(struct task_struct *tsk)
+{
+ /*
+ * At the moment we are just going to call into audit_alloc() to
+ * simplify the code, but there two things to keep in mind with this
+ * approach:
+ *
+ * 1. Filtering internal kernel tasks is a bit laughable in almost all
+ * cases, but there is at least one case where there is a benefit:
+ * the '-a task,never' case allows the admin to effectively disable
+ * task auditing at runtime.
+ *
+ * 2. The {set,clear}_task_syscall_work() ops likely have zero effect
+ * on these internal kernel tasks, but they probably don't hurt either.
+ */
+ return audit_alloc(tsk);
+}
+
static inline void audit_free_context(struct audit_context *context)
{
/* resetting is extra work, but it is likely just noise */
@@ -1546,6 +1572,44 @@ static void audit_log_proctitle(void)
audit_log_end(ab);
}
+/**
+ * audit_log_uring - generate a AUDIT_URINGOP record
+ * @ctx: the audit context
+ */
+static void audit_log_uring(struct audit_context *ctx)
+{
+ struct audit_buffer *ab;
+ const struct cred *cred;
+
+ ab = audit_log_start(ctx, GFP_ATOMIC, AUDIT_URINGOP);
+ if (!ab)
+ return;
+ cred = current_cred();
+ audit_log_format(ab, "uring_op=%d", ctx->uring_op);
+ if (ctx->return_valid != AUDITSC_INVALID)
+ audit_log_format(ab, " success=%s exit=%ld",
+ (ctx->return_valid == AUDITSC_SUCCESS ?
+ "yes" : "no"),
+ ctx->return_code);
+ audit_log_format(ab,
+ " items=%d"
+ " ppid=%d pid=%d uid=%u gid=%u euid=%u suid=%u"
+ " fsuid=%u egid=%u sgid=%u fsgid=%u",
+ ctx->name_count,
+ task_ppid_nr(current), task_tgid_nr(current),
+ from_kuid(&init_user_ns, cred->uid),
+ from_kgid(&init_user_ns, cred->gid),
+ from_kuid(&init_user_ns, cred->euid),
+ from_kuid(&init_user_ns, cred->suid),
+ from_kuid(&init_user_ns, cred->fsuid),
+ from_kgid(&init_user_ns, cred->egid),
+ from_kgid(&init_user_ns, cred->sgid),
+ from_kgid(&init_user_ns, cred->fsgid));
+ audit_log_task_context(ab);
+ audit_log_key(ab, ctx->filterkey);
+ audit_log_end(ab);
+}
+
static void audit_log_exit(void)
{
int i, call_panic = 0;
@@ -1581,6 +1645,9 @@ static void audit_log_exit(void)
audit_log_key(ab, context->filterkey);
audit_log_end(ab);
break;
+ case AUDIT_CTX_URING:
+ audit_log_uring(context);
+ break;
default:
BUG();
break;
@@ -1751,6 +1818,105 @@ static void audit_return_fixup(struct audit_context *ctx,
ctx->return_valid = (success ? AUDITSC_SUCCESS : AUDITSC_FAILURE);
}
+/**
+ * __audit_uring_entry - prepare the kernel task's audit context for io_uring
+ * @op: the io_uring opcode
+ *
+ * This is similar to audit_syscall_entry() but is intended for use by io_uring
+ * operations. This function should only ever be called from
+ * audit_uring_entry() as we rely on the audit context checking present in that
+ * function.
+ */
+void __audit_uring_entry(u8 op)
+{
+ struct audit_context *ctx = audit_context();
+
+ if (ctx->state == AUDIT_STATE_DISABLED)
+ return;
+
+ /*
+ * NOTE: It's possible that we can be called from the process' context
+ * before it returns to userspace, and before audit_syscall_exit()
+ * is called. In this case there is not much to do, just record
+ * the io_uring details and return.
+ */
+ ctx->uring_op = op;
+ if (ctx->context == AUDIT_CTX_SYSCALL)
+ return;
+
+ ctx->dummy = !audit_n_rules;
+ if (!ctx->dummy && ctx->state == AUDIT_STATE_BUILD)
+ ctx->prio = 0;
+
+ ctx->context = AUDIT_CTX_URING;
+ ctx->current_state = ctx->state;
+ ktime_get_coarse_real_ts64(&ctx->ctime);
+}
+
+/**
+ * __audit_uring_exit - wrap up the kernel task's audit context after io_uring
+ * @success: true/false value to indicate if the operation succeeded or not
+ * @code: operation return code
+ *
+ * This is similar to audit_syscall_exit() but is intended for use by io_uring
+ * operations. This function should only ever be called from
+ * audit_uring_exit() as we rely on the audit context checking present in that
+ * function.
+ */
+void __audit_uring_exit(int success, long code)
+{
+ struct audit_context *ctx = audit_context();
+
+ /*
+ * TODO: At some point we will likely want to filter on io_uring ops
+ * and other things similar to what we do for syscalls, but that
+ * is something for another day; just record what we can here.
+ */
+
+ if (ctx->context == AUDIT_CTX_SYSCALL) {
+ /*
+ * NOTE: See the note in __audit_uring_entry() about the case
+ * where we may be called from process context before we
+ * return to userspace via audit_syscall_exit(). In this
+ * case we simply emit a URINGOP record and bail, the
+ * normal syscall exit handling will take care of
+ * everything else.
+ * It is also worth mentioning that when we are called,
+ * the current process creds may differ from the creds
+ * used during the normal syscall processing; keep that
+ * in mind if/when we move the record generation code.
+ */
+
+ /*
+ * We need to filter on the syscall info here to decide if we
+ * should emit a URINGOP record. I know it seems odd but this
+ * solves the problem where users have a filter to block *all*
+ * syscall records in the "exit" filter; we want to preserve
+ * the behavior here.
+ */
+ audit_filter_syscall(current, ctx);
+ audit_filter_inodes(current, ctx);
+ if (ctx->current_state != AUDIT_STATE_RECORD)
+ return;
+
+ audit_log_uring(ctx);
+ return;
+ }
+
+ /* this may generate CONFIG_CHANGE records */
+ if (!list_empty(&ctx->killed_trees))
+ audit_kill_trees(ctx);
+
+ audit_filter_inodes(current, ctx);
+ if (ctx->current_state != AUDIT_STATE_RECORD)
+ goto out;
+ audit_return_fixup(ctx, success, code);
+ audit_log_exit();
+
+out:
+ audit_reset_context(ctx);
+}
+
/**
* __audit_syscall_entry - fill in an audit record at syscall entry
* @major: major syscall type (function)
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v4 2/8] audit,io_uring,io-wq: add some basic audit support to io_uring
2021-09-15 16:49 ` [PATCH v4 2/8] audit,io_uring,io-wq: add some basic audit support to io_uring Paul Moore
@ 2021-09-16 13:33 ` Richard Guy Briggs
2021-09-16 14:02 ` Paul Moore
0 siblings, 1 reply; 15+ messages in thread
From: Richard Guy Briggs @ 2021-09-16 13:33 UTC (permalink / raw)
To: Paul Moore
Cc: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
On 2021-09-15 12:49, Paul Moore wrote:
> This patch adds basic auditing to io_uring operations, regardless of
> their context. This is accomplished by allocating audit_context
> structures for the io-wq worker and io_uring SQPOLL kernel threads
> as well as explicitly auditing the io_uring operations in
> io_issue_sqe(). Individual io_uring operations can bypass auditing
> through the "audit_skip" field in the struct io_op_def definition for
> the operation; although great care must be taken so that security
> relevant io_uring operations do not bypass auditing; please contact
> the audit mailing list (see the MAINTAINERS file) with any questions.
>
> The io_uring operations are audited using a new AUDIT_URINGOP record,
> an example is shown below:
>
> type=UNKNOWN[1336] msg=audit(1630523381.288:260):
> uring_op=19 success=yes exit=0 items=0 ppid=853 pid=1204
> uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0
> subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> key=(null)
> AUID="root" UID="root" GID="root" EUID="root" SUID="root"
> FSUID="root" EGID="root" SGID="root" FSGID="root"
>
> Thanks to Richard Guy Briggs for review and feedback.
I share Steve's concerns about the missing auid and ses. The userspace
log interpreter conjured up AUID="root" from the absent auid=.
Some of the creds are here including ppid, pid, a herd of *id and subj.
*Something* initiated this action and then delegated it to iouring to
carry out. That should be in there somewhere. You had a concern about
shared queues and mis-attribution. All of these creds including auid
and ses should be kept together to get this right.
> Signed-off-by: Paul Moore <[email protected]>
>
> ---
> v4:
> - removed some work-in-progress comments
> - removed the auid logging in audit_log_uring()
> v3:
> - removed work-in-progress warning from the description
> v2:
> - added dummy funcs for audit_uring_{entry,exit}()
> - replaced opcode checks in io_issue_sqe() with audit_skip checks
> - moved fastpath checks into audit_uring_{entry,exit}()
> - audit_log_uring() uses GFP_ATOMIC
> - don't record the arch in __audit_uring_entry()
> v1:
> - initial draft
> ---
> fs/io-wq.c | 4 +
> fs/io_uring.c | 55 +++++++++++++--
> include/linux/audit.h | 26 +++++++
> include/uapi/linux/audit.h | 1
> kernel/audit.h | 2 +
> kernel/auditsc.c | 166 ++++++++++++++++++++++++++++++++++++++++++++
> 6 files changed, 248 insertions(+), 6 deletions(-)
>
> diff --git a/fs/io-wq.c b/fs/io-wq.c
> index 6c55362c1f99..dac5c5961c9d 100644
> --- a/fs/io-wq.c
> +++ b/fs/io-wq.c
> @@ -14,6 +14,7 @@
> #include <linux/rculist_nulls.h>
> #include <linux/cpu.h>
> #include <linux/tracehook.h>
> +#include <linux/audit.h>
>
> #include "io-wq.h"
>
> @@ -562,6 +563,8 @@ static int io_wqe_worker(void *data)
> snprintf(buf, sizeof(buf), "iou-wrk-%d", wq->task->pid);
> set_task_comm(current, buf);
>
> + audit_alloc_kernel(current);
> +
> while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
> long ret;
>
> @@ -601,6 +604,7 @@ static int io_wqe_worker(void *data)
> io_worker_handle_work(worker);
> }
>
> + audit_free(current);
> io_worker_exit(worker);
> return 0;
> }
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 16fb7436043c..388754b24785 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -79,6 +79,7 @@
> #include <linux/pagemap.h>
> #include <linux/io_uring.h>
> #include <linux/tracehook.h>
> +#include <linux/audit.h>
>
> #define CREATE_TRACE_POINTS
> #include <trace/events/io_uring.h>
> @@ -917,6 +918,8 @@ struct io_op_def {
> unsigned needs_async_setup : 1;
> /* should block plug */
> unsigned plug : 1;
> + /* skip auditing */
> + unsigned audit_skip : 1;
> /* size of async data needed, if any */
> unsigned short async_size;
> };
> @@ -930,6 +933,7 @@ static const struct io_op_def io_op_defs[] = {
> .buffer_select = 1,
> .needs_async_setup = 1,
> .plug = 1,
> + .audit_skip = 1,
> .async_size = sizeof(struct io_async_rw),
> },
> [IORING_OP_WRITEV] = {
> @@ -939,16 +943,19 @@ static const struct io_op_def io_op_defs[] = {
> .pollout = 1,
> .needs_async_setup = 1,
> .plug = 1,
> + .audit_skip = 1,
> .async_size = sizeof(struct io_async_rw),
> },
> [IORING_OP_FSYNC] = {
> .needs_file = 1,
> + .audit_skip = 1,
> },
> [IORING_OP_READ_FIXED] = {
> .needs_file = 1,
> .unbound_nonreg_file = 1,
> .pollin = 1,
> .plug = 1,
> + .audit_skip = 1,
> .async_size = sizeof(struct io_async_rw),
> },
> [IORING_OP_WRITE_FIXED] = {
> @@ -957,15 +964,20 @@ static const struct io_op_def io_op_defs[] = {
> .unbound_nonreg_file = 1,
> .pollout = 1,
> .plug = 1,
> + .audit_skip = 1,
> .async_size = sizeof(struct io_async_rw),
> },
> [IORING_OP_POLL_ADD] = {
> .needs_file = 1,
> .unbound_nonreg_file = 1,
> + .audit_skip = 1,
> + },
> + [IORING_OP_POLL_REMOVE] = {
> + .audit_skip = 1,
> },
> - [IORING_OP_POLL_REMOVE] = {},
> [IORING_OP_SYNC_FILE_RANGE] = {
> .needs_file = 1,
> + .audit_skip = 1,
> },
> [IORING_OP_SENDMSG] = {
> .needs_file = 1,
> @@ -983,18 +995,23 @@ static const struct io_op_def io_op_defs[] = {
> .async_size = sizeof(struct io_async_msghdr),
> },
> [IORING_OP_TIMEOUT] = {
> + .audit_skip = 1,
> .async_size = sizeof(struct io_timeout_data),
> },
> [IORING_OP_TIMEOUT_REMOVE] = {
> /* used by timeout updates' prep() */
> + .audit_skip = 1,
> },
> [IORING_OP_ACCEPT] = {
> .needs_file = 1,
> .unbound_nonreg_file = 1,
> .pollin = 1,
> },
> - [IORING_OP_ASYNC_CANCEL] = {},
> + [IORING_OP_ASYNC_CANCEL] = {
> + .audit_skip = 1,
> + },
> [IORING_OP_LINK_TIMEOUT] = {
> + .audit_skip = 1,
> .async_size = sizeof(struct io_timeout_data),
> },
> [IORING_OP_CONNECT] = {
> @@ -1009,14 +1026,19 @@ static const struct io_op_def io_op_defs[] = {
> },
> [IORING_OP_OPENAT] = {},
> [IORING_OP_CLOSE] = {},
> - [IORING_OP_FILES_UPDATE] = {},
> - [IORING_OP_STATX] = {},
> + [IORING_OP_FILES_UPDATE] = {
> + .audit_skip = 1,
> + },
> + [IORING_OP_STATX] = {
> + .audit_skip = 1,
> + },
> [IORING_OP_READ] = {
> .needs_file = 1,
> .unbound_nonreg_file = 1,
> .pollin = 1,
> .buffer_select = 1,
> .plug = 1,
> + .audit_skip = 1,
> .async_size = sizeof(struct io_async_rw),
> },
> [IORING_OP_WRITE] = {
> @@ -1025,39 +1047,50 @@ static const struct io_op_def io_op_defs[] = {
> .unbound_nonreg_file = 1,
> .pollout = 1,
> .plug = 1,
> + .audit_skip = 1,
> .async_size = sizeof(struct io_async_rw),
> },
> [IORING_OP_FADVISE] = {
> .needs_file = 1,
> + .audit_skip = 1,
> },
> [IORING_OP_MADVISE] = {},
> [IORING_OP_SEND] = {
> .needs_file = 1,
> .unbound_nonreg_file = 1,
> .pollout = 1,
> + .audit_skip = 1,
> },
> [IORING_OP_RECV] = {
> .needs_file = 1,
> .unbound_nonreg_file = 1,
> .pollin = 1,
> .buffer_select = 1,
> + .audit_skip = 1,
> },
> [IORING_OP_OPENAT2] = {
> },
> [IORING_OP_EPOLL_CTL] = {
> .unbound_nonreg_file = 1,
> + .audit_skip = 1,
> },
> [IORING_OP_SPLICE] = {
> .needs_file = 1,
> .hash_reg_file = 1,
> .unbound_nonreg_file = 1,
> + .audit_skip = 1,
> + },
> + [IORING_OP_PROVIDE_BUFFERS] = {
> + .audit_skip = 1,
> + },
> + [IORING_OP_REMOVE_BUFFERS] = {
> + .audit_skip = 1,
> },
> - [IORING_OP_PROVIDE_BUFFERS] = {},
> - [IORING_OP_REMOVE_BUFFERS] = {},
> [IORING_OP_TEE] = {
> .needs_file = 1,
> .hash_reg_file = 1,
> .unbound_nonreg_file = 1,
> + .audit_skip = 1,
> },
> [IORING_OP_SHUTDOWN] = {
> .needs_file = 1,
> @@ -6591,6 +6624,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> if ((req->flags & REQ_F_CREDS) && req->creds != current_cred())
> creds = override_creds(req->creds);
>
> + if (!io_op_defs[req->opcode].audit_skip)
> + audit_uring_entry(req->opcode);
> +
> switch (req->opcode) {
> case IORING_OP_NOP:
> ret = io_nop(req, issue_flags);
> @@ -6706,6 +6742,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> break;
> }
>
> + if (!io_op_defs[req->opcode].audit_skip)
> + audit_uring_exit(!ret, ret);
> +
> if (creds)
> revert_creds(creds);
> if (ret)
> @@ -7360,6 +7399,8 @@ static int io_sq_thread(void *data)
> set_cpus_allowed_ptr(current, cpu_online_mask);
> current->flags |= PF_NO_SETAFFINITY;
>
> + audit_alloc_kernel(current);
> +
> mutex_lock(&sqd->lock);
> while (1) {
> bool cap_entries, sqt_spin = false;
> @@ -7425,6 +7466,8 @@ static int io_sq_thread(void *data)
> io_run_task_work();
> mutex_unlock(&sqd->lock);
>
> + audit_free(current);
> +
> complete(&sqd->exited);
> do_exit(0);
> }
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 82b7c1116a85..d656a06dd909 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -286,7 +286,10 @@ static inline int audit_signal_info(int sig, struct task_struct *t)
> /* These are defined in auditsc.c */
> /* Public API */
> extern int audit_alloc(struct task_struct *task);
> +extern int audit_alloc_kernel(struct task_struct *task);
> extern void __audit_free(struct task_struct *task);
> +extern void __audit_uring_entry(u8 op);
> +extern void __audit_uring_exit(int success, long code);
> extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
> unsigned long a2, unsigned long a3);
> extern void __audit_syscall_exit(int ret_success, long ret_value);
> @@ -323,6 +326,21 @@ static inline void audit_free(struct task_struct *task)
> if (unlikely(task->audit_context))
> __audit_free(task);
> }
> +static inline void audit_uring_entry(u8 op)
> +{
> + /*
> + * We intentionally check audit_context() before audit_enabled as most
> + * Linux systems (as of ~2021) rely on systemd which forces audit to
> + * be enabled regardless of the user's audit configuration.
> + */
> + if (unlikely(audit_context() && audit_enabled))
> + __audit_uring_entry(op);
> +}
> +static inline void audit_uring_exit(int success, long code)
> +{
> + if (unlikely(!audit_dummy_context()))
> + __audit_uring_exit(success, code);
> +}
> static inline void audit_syscall_entry(int major, unsigned long a0,
> unsigned long a1, unsigned long a2,
> unsigned long a3)
> @@ -554,8 +572,16 @@ static inline int audit_alloc(struct task_struct *task)
> {
> return 0;
> }
> +static inline int audit_alloc_kernel(struct task_struct *task)
> +{
> + return 0;
> +}
> static inline void audit_free(struct task_struct *task)
> { }
> +static inline void audit_uring_entry(u8 op)
> +{ }
> +static inline void audit_uring_exit(int success, long code)
> +{ }
> static inline void audit_syscall_entry(int major, unsigned long a0,
> unsigned long a1, unsigned long a2,
> unsigned long a3)
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index daa481729e9b..a1997697c8b1 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -118,6 +118,7 @@
> #define AUDIT_TIME_ADJNTPVAL 1333 /* NTP value adjustment */
> #define AUDIT_BPF 1334 /* BPF subsystem */
> #define AUDIT_EVENT_LISTENER 1335 /* Task joined multicast read socket */
> +#define AUDIT_URINGOP 1336 /* io_uring operation */
>
> #define AUDIT_AVC 1400 /* SE Linux avc denial or grant */
> #define AUDIT_SELINUX_ERR 1401 /* Internal SE Linux Errors */
> diff --git a/kernel/audit.h b/kernel/audit.h
> index 13abc48de0bd..d1161e3b83e2 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -103,10 +103,12 @@ struct audit_context {
> enum {
> AUDIT_CTX_UNUSED, /* audit_context is currently unused */
> AUDIT_CTX_SYSCALL, /* in use by syscall */
> + AUDIT_CTX_URING, /* in use by io_uring */
> } context;
> enum audit_state state, current_state;
> unsigned int serial; /* serial number for record */
> int major; /* syscall number */
> + int uring_op; /* uring operation */
> struct timespec64 ctime; /* time of syscall entry */
> unsigned long argv[4]; /* syscall arguments */
> long return_code;/* syscall return code */
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index f3d309b05c2d..6dda448fb826 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -959,6 +959,7 @@ static void audit_reset_context(struct audit_context *ctx)
> ctx->current_state = ctx->state;
> ctx->serial = 0;
> ctx->major = 0;
> + ctx->uring_op = 0;
> ctx->ctime = (struct timespec64){ .tv_sec = 0, .tv_nsec = 0 };
> memset(ctx->argv, 0, sizeof(ctx->argv));
> ctx->return_code = 0;
> @@ -1044,6 +1045,31 @@ int audit_alloc(struct task_struct *tsk)
> return 0;
> }
>
> +/**
> + * audit_alloc_kernel - allocate an audit_context for a kernel task
> + * @tsk: the kernel task
> + *
> + * Similar to the audit_alloc() function, but intended for kernel private
> + * threads. Returns zero on success, negative values on failure.
> + */
> +int audit_alloc_kernel(struct task_struct *tsk)
> +{
> + /*
> + * At the moment we are just going to call into audit_alloc() to
> + * simplify the code, but there two things to keep in mind with this
> + * approach:
> + *
> + * 1. Filtering internal kernel tasks is a bit laughable in almost all
> + * cases, but there is at least one case where there is a benefit:
> + * the '-a task,never' case allows the admin to effectively disable
> + * task auditing at runtime.
> + *
> + * 2. The {set,clear}_task_syscall_work() ops likely have zero effect
> + * on these internal kernel tasks, but they probably don't hurt either.
> + */
> + return audit_alloc(tsk);
> +}
> +
> static inline void audit_free_context(struct audit_context *context)
> {
> /* resetting is extra work, but it is likely just noise */
> @@ -1546,6 +1572,44 @@ static void audit_log_proctitle(void)
> audit_log_end(ab);
> }
>
> +/**
> + * audit_log_uring - generate a AUDIT_URINGOP record
> + * @ctx: the audit context
> + */
> +static void audit_log_uring(struct audit_context *ctx)
> +{
> + struct audit_buffer *ab;
> + const struct cred *cred;
> +
> + ab = audit_log_start(ctx, GFP_ATOMIC, AUDIT_URINGOP);
> + if (!ab)
> + return;
> + cred = current_cred();
> + audit_log_format(ab, "uring_op=%d", ctx->uring_op);
> + if (ctx->return_valid != AUDITSC_INVALID)
> + audit_log_format(ab, " success=%s exit=%ld",
> + (ctx->return_valid == AUDITSC_SUCCESS ?
> + "yes" : "no"),
> + ctx->return_code);
> + audit_log_format(ab,
> + " items=%d"
> + " ppid=%d pid=%d uid=%u gid=%u euid=%u suid=%u"
> + " fsuid=%u egid=%u sgid=%u fsgid=%u",
> + ctx->name_count,
> + task_ppid_nr(current), task_tgid_nr(current),
> + from_kuid(&init_user_ns, cred->uid),
> + from_kgid(&init_user_ns, cred->gid),
> + from_kuid(&init_user_ns, cred->euid),
> + from_kuid(&init_user_ns, cred->suid),
> + from_kuid(&init_user_ns, cred->fsuid),
> + from_kgid(&init_user_ns, cred->egid),
> + from_kgid(&init_user_ns, cred->sgid),
> + from_kgid(&init_user_ns, cred->fsgid));
> + audit_log_task_context(ab);
> + audit_log_key(ab, ctx->filterkey);
> + audit_log_end(ab);
> +}
> +
> static void audit_log_exit(void)
> {
> int i, call_panic = 0;
> @@ -1581,6 +1645,9 @@ static void audit_log_exit(void)
> audit_log_key(ab, context->filterkey);
> audit_log_end(ab);
> break;
> + case AUDIT_CTX_URING:
> + audit_log_uring(context);
> + break;
> default:
> BUG();
> break;
> @@ -1751,6 +1818,105 @@ static void audit_return_fixup(struct audit_context *ctx,
> ctx->return_valid = (success ? AUDITSC_SUCCESS : AUDITSC_FAILURE);
> }
>
> +/**
> + * __audit_uring_entry - prepare the kernel task's audit context for io_uring
> + * @op: the io_uring opcode
> + *
> + * This is similar to audit_syscall_entry() but is intended for use by io_uring
> + * operations. This function should only ever be called from
> + * audit_uring_entry() as we rely on the audit context checking present in that
> + * function.
> + */
> +void __audit_uring_entry(u8 op)
> +{
> + struct audit_context *ctx = audit_context();
> +
> + if (ctx->state == AUDIT_STATE_DISABLED)
> + return;
> +
> + /*
> + * NOTE: It's possible that we can be called from the process' context
> + * before it returns to userspace, and before audit_syscall_exit()
> + * is called. In this case there is not much to do, just record
> + * the io_uring details and return.
> + */
> + ctx->uring_op = op;
> + if (ctx->context == AUDIT_CTX_SYSCALL)
> + return;
> +
> + ctx->dummy = !audit_n_rules;
> + if (!ctx->dummy && ctx->state == AUDIT_STATE_BUILD)
> + ctx->prio = 0;
> +
> + ctx->context = AUDIT_CTX_URING;
> + ctx->current_state = ctx->state;
> + ktime_get_coarse_real_ts64(&ctx->ctime);
> +}
> +
> +/**
> + * __audit_uring_exit - wrap up the kernel task's audit context after io_uring
> + * @success: true/false value to indicate if the operation succeeded or not
> + * @code: operation return code
> + *
> + * This is similar to audit_syscall_exit() but is intended for use by io_uring
> + * operations. This function should only ever be called from
> + * audit_uring_exit() as we rely on the audit context checking present in that
> + * function.
> + */
> +void __audit_uring_exit(int success, long code)
> +{
> + struct audit_context *ctx = audit_context();
> +
> + /*
> + * TODO: At some point we will likely want to filter on io_uring ops
> + * and other things similar to what we do for syscalls, but that
> + * is something for another day; just record what we can here.
> + */
> +
> + if (ctx->context == AUDIT_CTX_SYSCALL) {
> + /*
> + * NOTE: See the note in __audit_uring_entry() about the case
> + * where we may be called from process context before we
> + * return to userspace via audit_syscall_exit(). In this
> + * case we simply emit a URINGOP record and bail, the
> + * normal syscall exit handling will take care of
> + * everything else.
> + * It is also worth mentioning that when we are called,
> + * the current process creds may differ from the creds
> + * used during the normal syscall processing; keep that
> + * in mind if/when we move the record generation code.
> + */
> +
> + /*
> + * We need to filter on the syscall info here to decide if we
> + * should emit a URINGOP record. I know it seems odd but this
> + * solves the problem where users have a filter to block *all*
> + * syscall records in the "exit" filter; we want to preserve
> + * the behavior here.
> + */
> + audit_filter_syscall(current, ctx);
> + audit_filter_inodes(current, ctx);
> + if (ctx->current_state != AUDIT_STATE_RECORD)
> + return;
> +
> + audit_log_uring(ctx);
> + return;
> + }
> +
> + /* this may generate CONFIG_CHANGE records */
> + if (!list_empty(&ctx->killed_trees))
> + audit_kill_trees(ctx);
> +
> + audit_filter_inodes(current, ctx);
> + if (ctx->current_state != AUDIT_STATE_RECORD)
> + goto out;
> + audit_return_fixup(ctx, success, code);
> + audit_log_exit();
> +
> +out:
> + audit_reset_context(ctx);
> +}
> +
> /**
> * __audit_syscall_entry - fill in an audit record at syscall entry
> * @major: major syscall type (function)
>
- RGB
--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 2/8] audit,io_uring,io-wq: add some basic audit support to io_uring
2021-09-16 13:33 ` Richard Guy Briggs
@ 2021-09-16 14:02 ` Paul Moore
2021-09-16 14:19 ` Richard Guy Briggs
0 siblings, 1 reply; 15+ messages in thread
From: Paul Moore @ 2021-09-16 14:02 UTC (permalink / raw)
To: Richard Guy Briggs
Cc: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
On Thu, Sep 16, 2021 at 9:33 AM Richard Guy Briggs <[email protected]> wrote:
> On 2021-09-15 12:49, Paul Moore wrote:
> > This patch adds basic auditing to io_uring operations, regardless of
> > their context. This is accomplished by allocating audit_context
> > structures for the io-wq worker and io_uring SQPOLL kernel threads
> > as well as explicitly auditing the io_uring operations in
> > io_issue_sqe(). Individual io_uring operations can bypass auditing
> > through the "audit_skip" field in the struct io_op_def definition for
> > the operation; although great care must be taken so that security
> > relevant io_uring operations do not bypass auditing; please contact
> > the audit mailing list (see the MAINTAINERS file) with any questions.
> >
> > The io_uring operations are audited using a new AUDIT_URINGOP record,
> > an example is shown below:
> >
> > type=UNKNOWN[1336] msg=audit(1630523381.288:260):
> > uring_op=19 success=yes exit=0 items=0 ppid=853 pid=1204
> > uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0
> > subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> > key=(null)
> > AUID="root" UID="root" GID="root" EUID="root" SUID="root"
> > FSUID="root" EGID="root" SGID="root" FSGID="root"
> >
> > Thanks to Richard Guy Briggs for review and feedback.
>
> I share Steve's concerns about the missing auid and ses. The userspace
> log interpreter conjured up AUID="root" from the absent auid=.
>
> Some of the creds are here including ppid, pid, a herd of *id and subj.
> *Something* initiated this action and then delegated it to iouring to
> carry out. That should be in there somewhere. You had a concern about
> shared queues and mis-attribution. All of these creds including auid
> and ses should be kept together to get this right.
Look, there are a lot of things about io_uring that frustrate me from
a security perspective - this is one of them - but I've run out of
ways to say it's not possible to reliably capture the audit ID or
session ID here. With io_uring it is possible to submit an io_uring
operation, and capture the results, by simply reading and writing to a
mmap'd buffer. Yes, it would be nice to have that information, but I
don't believe there is a practical way to capture it. If you have any
suggestions on how to do so, please share, but please make it
concrete; hand wavy solutions aren't useful at this stage.
As for the userspace mysteriously creating an AUID out of thin air,
that was my mistake: I simply removed the "auid=" field from the
example and didn't remove the additional fields, e.g. AUID, that
auditd appends to the end of the record. I've updated the commit
description with a freshly generated record and removed the auditd
bonus bits as those probably shouldn't be shown in an example of a
kernel generated audit record. I'm not going to repost the patchset
just for this small edit to the description, but I have force-pushed
the update to the selinux/working-io_uring branch.
--
paul moore
www.paul-moore.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 2/8] audit,io_uring,io-wq: add some basic audit support to io_uring
2021-09-16 14:02 ` Paul Moore
@ 2021-09-16 14:19 ` Richard Guy Briggs
2021-09-16 14:47 ` Paul Moore
0 siblings, 1 reply; 15+ messages in thread
From: Richard Guy Briggs @ 2021-09-16 14:19 UTC (permalink / raw)
To: Paul Moore
Cc: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
On 2021-09-16 10:02, Paul Moore wrote:
> On Thu, Sep 16, 2021 at 9:33 AM Richard Guy Briggs <[email protected]> wrote:
> > On 2021-09-15 12:49, Paul Moore wrote:
> > > This patch adds basic auditing to io_uring operations, regardless of
> > > their context. This is accomplished by allocating audit_context
> > > structures for the io-wq worker and io_uring SQPOLL kernel threads
> > > as well as explicitly auditing the io_uring operations in
> > > io_issue_sqe(). Individual io_uring operations can bypass auditing
> > > through the "audit_skip" field in the struct io_op_def definition for
> > > the operation; although great care must be taken so that security
> > > relevant io_uring operations do not bypass auditing; please contact
> > > the audit mailing list (see the MAINTAINERS file) with any questions.
> > >
> > > The io_uring operations are audited using a new AUDIT_URINGOP record,
> > > an example is shown below:
> > >
> > > type=UNKNOWN[1336] msg=audit(1630523381.288:260):
> > > uring_op=19 success=yes exit=0 items=0 ppid=853 pid=1204
> > > uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0
> > > subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> > > key=(null)
> > > AUID="root" UID="root" GID="root" EUID="root" SUID="root"
> > > FSUID="root" EGID="root" SGID="root" FSGID="root"
> > >
> > > Thanks to Richard Guy Briggs for review and feedback.
> >
> > I share Steve's concerns about the missing auid and ses. The userspace
> > log interpreter conjured up AUID="root" from the absent auid=.
> >
> > Some of the creds are here including ppid, pid, a herd of *id and subj.
> > *Something* initiated this action and then delegated it to iouring to
> > carry out. That should be in there somewhere. You had a concern about
> > shared queues and mis-attribution. All of these creds including auid
> > and ses should be kept together to get this right.
>
> Look, there are a lot of things about io_uring that frustrate me from
> a security perspective - this is one of them - but I've run out of
> ways to say it's not possible to reliably capture the audit ID or
> session ID here. With io_uring it is possible to submit an io_uring
> operation, and capture the results, by simply reading and writing to a
> mmap'd buffer. Yes, it would be nice to have that information, but I
> don't believe there is a practical way to capture it. If you have any
> suggestions on how to do so, please share, but please make it
> concrete; hand wavy solutions aren't useful at this stage.
I was hoping to give a more concrete solution but have other
distractions at the moment. My concern is adding it later once the
message format is committed. We have too many field orderings already.
Recognizing this adds useless characters to the record type at this
time, I'm even thinking auid=? ses=? until a solution can be found.
So you are sure the rest of the creds are correct?
> As for the userspace mysteriously creating an AUID out of thin air,
> that was my mistake: I simply removed the "auid=" field from the
> example and didn't remove the additional fields, e.g. AUID, that
> auditd appends to the end of the record. I've updated the commit
> description with a freshly generated record and removed the auditd
> bonus bits as those probably shouldn't be shown in an example of a
> kernel generated audit record. I'm not going to repost the patchset
> just for this small edit to the description, but I have force-pushed
> the update to the selinux/working-io_uring branch.
Understood, no problem here.
> paul moore
- RGB
--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 2/8] audit,io_uring,io-wq: add some basic audit support to io_uring
2021-09-16 14:19 ` Richard Guy Briggs
@ 2021-09-16 14:47 ` Paul Moore
0 siblings, 0 replies; 15+ messages in thread
From: Paul Moore @ 2021-09-16 14:47 UTC (permalink / raw)
To: Richard Guy Briggs
Cc: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
On Thu, Sep 16, 2021 at 10:19 AM Richard Guy Briggs <[email protected]> wrote:
> On 2021-09-16 10:02, Paul Moore wrote:
> > On Thu, Sep 16, 2021 at 9:33 AM Richard Guy Briggs <[email protected]> wrote:
> > > On 2021-09-15 12:49, Paul Moore wrote:
> > > > This patch adds basic auditing to io_uring operations, regardless of
> > > > their context. This is accomplished by allocating audit_context
> > > > structures for the io-wq worker and io_uring SQPOLL kernel threads
> > > > as well as explicitly auditing the io_uring operations in
> > > > io_issue_sqe(). Individual io_uring operations can bypass auditing
> > > > through the "audit_skip" field in the struct io_op_def definition for
> > > > the operation; although great care must be taken so that security
> > > > relevant io_uring operations do not bypass auditing; please contact
> > > > the audit mailing list (see the MAINTAINERS file) with any questions.
> > > >
> > > > The io_uring operations are audited using a new AUDIT_URINGOP record,
> > > > an example is shown below:
> > > >
> > > > type=UNKNOWN[1336] msg=audit(1630523381.288:260):
> > > > uring_op=19 success=yes exit=0 items=0 ppid=853 pid=1204
> > > > uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0
> > > > subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> > > > key=(null)
> > > > AUID="root" UID="root" GID="root" EUID="root" SUID="root"
> > > > FSUID="root" EGID="root" SGID="root" FSGID="root"
> > > >
> > > > Thanks to Richard Guy Briggs for review and feedback.
> > >
> > > I share Steve's concerns about the missing auid and ses. The userspace
> > > log interpreter conjured up AUID="root" from the absent auid=.
> > >
> > > Some of the creds are here including ppid, pid, a herd of *id and subj.
> > > *Something* initiated this action and then delegated it to iouring to
> > > carry out. That should be in there somewhere. You had a concern about
> > > shared queues and mis-attribution. All of these creds including auid
> > > and ses should be kept together to get this right.
> >
> > Look, there are a lot of things about io_uring that frustrate me from
> > a security perspective - this is one of them - but I've run out of
> > ways to say it's not possible to reliably capture the audit ID or
> > session ID here. With io_uring it is possible to submit an io_uring
> > operation, and capture the results, by simply reading and writing to a
> > mmap'd buffer. Yes, it would be nice to have that information, but I
> > don't believe there is a practical way to capture it. If you have any
> > suggestions on how to do so, please share, but please make it
> > concrete; hand wavy solutions aren't useful at this stage.
>
> I was hoping to give a more concrete solution but have other
> distractions at the moment. My concern is adding it later once the
> message format is committed. We have too many field orderings already.
> Recognizing this adds useless characters to the record type at this
> time, I'm even thinking auid=? ses=? until a solution can be found.
You know my feelings on audit record field orderings, so let's not
follow that distraction right now.
Regarding proactively inserting a placeholder for auid= and ses=, I'm
reasonably convinced it is not practical, and likely not possible, to
capture that information for the audit record. As a result, I see no
reason to waste the space in the record. However, if you (or anyone
else for that matter) can show that we can reliably capture that
information then I'm in complete agreement that it should be added.
What I'm not going to do is hold this patchset any longer on a vague
feeling that it should be possible. On the plus side I'm only merging
this into selinux/next, not selinux/stable-5.15, so worst case you
have a couple more weeks before things are "set". I realize we all
have competing priorities, but as the saying goes, "it's time to put
up or shut up" ;)
> So you are sure the rest of the creds are correct?
Yes, the credentials that are logged in audit_log_uring() are all
taken from the currently executing task via a call to current_cred().
Regardless of the io_uring calling context (synchronous, io-wq,
sqpoll) the current_cred() call should always return the correct
credentials; look at how io_uring manages credentials in
io_issue_sqe(). While the audit log treats the audit ID just as if it
were another credential on the system, it is definitely not like a
normal credential and requires special handling; this is why there was
the mixup where I mistakenly left it in the audit record, and one of
the reasons why we will not always have a valid audit ID for io_uring
operations.
--
paul moore
www.paul-moore.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v4 3/8] audit: add filtering for io_uring records
2021-09-15 16:49 [PATCH v4 0/8] Add LSM access controls and auditing to io_uring Paul Moore
2021-09-15 16:49 ` [PATCH v4 1/8] audit: prepare audit_context for use in calling contexts beyond syscalls Paul Moore
2021-09-15 16:49 ` [PATCH v4 2/8] audit,io_uring,io-wq: add some basic audit support to io_uring Paul Moore
@ 2021-09-15 16:49 ` Paul Moore
2021-09-15 21:48 ` Richard Guy Briggs
2021-09-15 16:49 ` [PATCH v4 4/8] fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure() Paul Moore
` (5 subsequent siblings)
8 siblings, 1 reply; 15+ messages in thread
From: Paul Moore @ 2021-09-15 16:49 UTC (permalink / raw)
To: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
This patch adds basic audit io_uring filtering, using as much of the
existing audit filtering infrastructure as possible. In order to do
this we reuse the audit filter rule's syscall mask for the io_uring
operation and we create a new filter for io_uring operations as
AUDIT_FILTER_URING_EXIT/audit_filter_list[7].
Thanks to Richard Guy Briggs for his review, feedback, and work on
the corresponding audit userspace changes.
Signed-off-by: Paul Moore <[email protected]>
---
v4:
- no change
v3:
- removed work-in-progress warning from the description
v2:
- incorporate feedback from Richard
v1:
- initial draft
---
include/uapi/linux/audit.h | 3 +-
kernel/audit_tree.c | 3 +-
kernel/audit_watch.c | 3 +-
kernel/auditfilter.c | 15 +++++++++--
kernel/auditsc.c | 61 ++++++++++++++++++++++++++++++++++----------
5 files changed, 65 insertions(+), 20 deletions(-)
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index a1997697c8b1..ecf1edd2affa 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -167,8 +167,9 @@
#define AUDIT_FILTER_EXCLUDE 0x05 /* Apply rule before record creation */
#define AUDIT_FILTER_TYPE AUDIT_FILTER_EXCLUDE /* obsolete misleading naming */
#define AUDIT_FILTER_FS 0x06 /* Apply rule at __audit_inode_child */
+#define AUDIT_FILTER_URING_EXIT 0x07 /* Apply rule at io_uring op exit */
-#define AUDIT_NR_FILTERS 7
+#define AUDIT_NR_FILTERS 8
#define AUDIT_FILTER_PREPEND 0x10 /* Prepend to front of list */
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 2cd7b5694422..338c53a961c5 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -726,7 +726,8 @@ int audit_make_tree(struct audit_krule *rule, char *pathname, u32 op)
{
if (pathname[0] != '/' ||
- rule->listnr != AUDIT_FILTER_EXIT ||
+ (rule->listnr != AUDIT_FILTER_EXIT &&
+ rule->listnr != AUDIT_FILTER_URING_EXIT) ||
op != Audit_equal ||
rule->inode_f || rule->watch || rule->tree)
return -EINVAL;
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 2acf7ca49154..698b62b4a2ec 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -183,7 +183,8 @@ int audit_to_watch(struct audit_krule *krule, char *path, int len, u32 op)
return -EOPNOTSUPP;
if (path[0] != '/' || path[len-1] == '/' ||
- krule->listnr != AUDIT_FILTER_EXIT ||
+ (krule->listnr != AUDIT_FILTER_EXIT &&
+ krule->listnr != AUDIT_FILTER_URING_EXIT) ||
op != Audit_equal ||
krule->inode_f || krule->watch || krule->tree)
return -EINVAL;
diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index db2c6b59dfc3..d75acb014ccd 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -44,7 +44,8 @@ struct list_head audit_filter_list[AUDIT_NR_FILTERS] = {
LIST_HEAD_INIT(audit_filter_list[4]),
LIST_HEAD_INIT(audit_filter_list[5]),
LIST_HEAD_INIT(audit_filter_list[6]),
-#if AUDIT_NR_FILTERS != 7
+ LIST_HEAD_INIT(audit_filter_list[7]),
+#if AUDIT_NR_FILTERS != 8
#error Fix audit_filter_list initialiser
#endif
};
@@ -56,6 +57,7 @@ static struct list_head audit_rules_list[AUDIT_NR_FILTERS] = {
LIST_HEAD_INIT(audit_rules_list[4]),
LIST_HEAD_INIT(audit_rules_list[5]),
LIST_HEAD_INIT(audit_rules_list[6]),
+ LIST_HEAD_INIT(audit_rules_list[7]),
};
DEFINE_MUTEX(audit_filter_mutex);
@@ -151,7 +153,8 @@ char *audit_unpack_string(void **bufp, size_t *remain, size_t len)
static inline int audit_to_inode(struct audit_krule *krule,
struct audit_field *f)
{
- if (krule->listnr != AUDIT_FILTER_EXIT ||
+ if ((krule->listnr != AUDIT_FILTER_EXIT &&
+ krule->listnr != AUDIT_FILTER_URING_EXIT) ||
krule->inode_f || krule->watch || krule->tree ||
(f->op != Audit_equal && f->op != Audit_not_equal))
return -EINVAL;
@@ -248,6 +251,7 @@ static inline struct audit_entry *audit_to_entry_common(struct audit_rule_data *
pr_err("AUDIT_FILTER_ENTRY is deprecated\n");
goto exit_err;
case AUDIT_FILTER_EXIT:
+ case AUDIT_FILTER_URING_EXIT:
case AUDIT_FILTER_TASK:
#endif
case AUDIT_FILTER_USER:
@@ -332,6 +336,10 @@ static int audit_field_valid(struct audit_entry *entry, struct audit_field *f)
if (entry->rule.listnr != AUDIT_FILTER_FS)
return -EINVAL;
break;
+ case AUDIT_PERM:
+ if (entry->rule.listnr == AUDIT_FILTER_URING_EXIT)
+ return -EINVAL;
+ break;
}
switch (entry->rule.listnr) {
@@ -980,7 +988,8 @@ static inline int audit_add_rule(struct audit_entry *entry)
}
entry->rule.prio = ~0ULL;
- if (entry->rule.listnr == AUDIT_FILTER_EXIT) {
+ if (entry->rule.listnr == AUDIT_FILTER_EXIT ||
+ entry->rule.listnr == AUDIT_FILTER_URING_EXIT) {
if (entry->rule.flags & AUDIT_FILTER_PREPEND)
entry->rule.prio = ++prio_high;
else
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 6dda448fb826..7c66a9fea5e6 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -805,6 +805,35 @@ static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
return rule->mask[word] & bit;
}
+/**
+ * audit_filter_uring - apply filters to an io_uring operation
+ * @tsk: associated task
+ * @ctx: audit context
+ */
+static void audit_filter_uring(struct task_struct *tsk,
+ struct audit_context *ctx)
+{
+ struct audit_entry *e;
+ enum audit_state state;
+
+ if (auditd_test_task(tsk))
+ return;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_URING_EXIT],
+ list) {
+ if (audit_in_mask(&e->rule, ctx->uring_op) &&
+ audit_filter_rules(tsk, &e->rule, ctx, NULL, &state,
+ false)) {
+ rcu_read_unlock();
+ ctx->current_state = state;
+ return;
+ }
+ }
+ rcu_read_unlock();
+ return;
+}
+
/* At syscall exit time, this filter is called if the audit_state is
* not low enough that auditing cannot take place, but is also not
* high enough that we already know we have to write an audit record
@@ -1757,7 +1786,7 @@ static void audit_log_exit(void)
* __audit_free - free a per-task audit context
* @tsk: task whose audit context block to free
*
- * Called from copy_process and do_exit
+ * Called from copy_process, do_exit, and the io_uring code
*/
void __audit_free(struct task_struct *tsk)
{
@@ -1775,15 +1804,21 @@ void __audit_free(struct task_struct *tsk)
* random task_struct that doesn't doesn't have any meaningful data we
* need to log via audit_log_exit().
*/
- if (tsk == current && !context->dummy &&
- context->context == AUDIT_CTX_SYSCALL) {
+ if (tsk == current && !context->dummy) {
context->return_valid = AUDITSC_INVALID;
context->return_code = 0;
-
- audit_filter_syscall(tsk, context);
- audit_filter_inodes(tsk, context);
- if (context->current_state == AUDIT_STATE_RECORD)
- audit_log_exit();
+ if (context->context == AUDIT_CTX_SYSCALL) {
+ audit_filter_syscall(tsk, context);
+ audit_filter_inodes(tsk, context);
+ if (context->current_state == AUDIT_STATE_RECORD)
+ audit_log_exit();
+ } else if (context->context == AUDIT_CTX_URING) {
+ /* TODO: verify this case is real and valid */
+ audit_filter_uring(tsk, context);
+ audit_filter_inodes(tsk, context);
+ if (context->current_state == AUDIT_STATE_RECORD)
+ audit_log_uring(context);
+ }
}
audit_set_context(tsk, NULL);
@@ -1867,12 +1902,6 @@ void __audit_uring_exit(int success, long code)
{
struct audit_context *ctx = audit_context();
- /*
- * TODO: At some point we will likely want to filter on io_uring ops
- * and other things similar to what we do for syscalls, but that
- * is something for another day; just record what we can here.
- */
-
if (ctx->context == AUDIT_CTX_SYSCALL) {
/*
* NOTE: See the note in __audit_uring_entry() about the case
@@ -1895,6 +1924,8 @@ void __audit_uring_exit(int success, long code)
* the behavior here.
*/
audit_filter_syscall(current, ctx);
+ if (ctx->current_state != AUDIT_STATE_RECORD)
+ audit_filter_uring(current, ctx);
audit_filter_inodes(current, ctx);
if (ctx->current_state != AUDIT_STATE_RECORD)
return;
@@ -1907,6 +1938,8 @@ void __audit_uring_exit(int success, long code)
if (!list_empty(&ctx->killed_trees))
audit_kill_trees(ctx);
+ /* run through both filters to ensure we set the filterkey properly */
+ audit_filter_uring(current, ctx);
audit_filter_inodes(current, ctx);
if (ctx->current_state != AUDIT_STATE_RECORD)
goto out;
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v4 3/8] audit: add filtering for io_uring records
2021-09-15 16:49 ` [PATCH v4 3/8] audit: add filtering for io_uring records Paul Moore
@ 2021-09-15 21:48 ` Richard Guy Briggs
0 siblings, 0 replies; 15+ messages in thread
From: Richard Guy Briggs @ 2021-09-15 21:48 UTC (permalink / raw)
To: Paul Moore
Cc: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
On 2021-09-15 12:49, Paul Moore wrote:
> This patch adds basic audit io_uring filtering, using as much of the
> existing audit filtering infrastructure as possible. In order to do
> this we reuse the audit filter rule's syscall mask for the io_uring
> operation and we create a new filter for io_uring operations as
> AUDIT_FILTER_URING_EXIT/audit_filter_list[7].
>
> Thanks to Richard Guy Briggs for his review, feedback, and work on
> the corresponding audit userspace changes.
>
> Signed-off-by: Paul Moore <[email protected]>
Looks good.
Acked-by: Richard Guy Briggs <[email protected]>
> ---
> v4:
> - no change
> v3:
> - removed work-in-progress warning from the description
> v2:
> - incorporate feedback from Richard
> v1:
> - initial draft
> ---
> include/uapi/linux/audit.h | 3 +-
> kernel/audit_tree.c | 3 +-
> kernel/audit_watch.c | 3 +-
> kernel/auditfilter.c | 15 +++++++++--
> kernel/auditsc.c | 61 ++++++++++++++++++++++++++++++++++----------
> 5 files changed, 65 insertions(+), 20 deletions(-)
>
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index a1997697c8b1..ecf1edd2affa 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -167,8 +167,9 @@
> #define AUDIT_FILTER_EXCLUDE 0x05 /* Apply rule before record creation */
> #define AUDIT_FILTER_TYPE AUDIT_FILTER_EXCLUDE /* obsolete misleading naming */
> #define AUDIT_FILTER_FS 0x06 /* Apply rule at __audit_inode_child */
> +#define AUDIT_FILTER_URING_EXIT 0x07 /* Apply rule at io_uring op exit */
>
> -#define AUDIT_NR_FILTERS 7
> +#define AUDIT_NR_FILTERS 8
>
> #define AUDIT_FILTER_PREPEND 0x10 /* Prepend to front of list */
>
> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index 2cd7b5694422..338c53a961c5 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -726,7 +726,8 @@ int audit_make_tree(struct audit_krule *rule, char *pathname, u32 op)
> {
>
> if (pathname[0] != '/' ||
> - rule->listnr != AUDIT_FILTER_EXIT ||
> + (rule->listnr != AUDIT_FILTER_EXIT &&
> + rule->listnr != AUDIT_FILTER_URING_EXIT) ||
> op != Audit_equal ||
> rule->inode_f || rule->watch || rule->tree)
> return -EINVAL;
> diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
> index 2acf7ca49154..698b62b4a2ec 100644
> --- a/kernel/audit_watch.c
> +++ b/kernel/audit_watch.c
> @@ -183,7 +183,8 @@ int audit_to_watch(struct audit_krule *krule, char *path, int len, u32 op)
> return -EOPNOTSUPP;
>
> if (path[0] != '/' || path[len-1] == '/' ||
> - krule->listnr != AUDIT_FILTER_EXIT ||
> + (krule->listnr != AUDIT_FILTER_EXIT &&
> + krule->listnr != AUDIT_FILTER_URING_EXIT) ||
> op != Audit_equal ||
> krule->inode_f || krule->watch || krule->tree)
> return -EINVAL;
> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index db2c6b59dfc3..d75acb014ccd 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -44,7 +44,8 @@ struct list_head audit_filter_list[AUDIT_NR_FILTERS] = {
> LIST_HEAD_INIT(audit_filter_list[4]),
> LIST_HEAD_INIT(audit_filter_list[5]),
> LIST_HEAD_INIT(audit_filter_list[6]),
> -#if AUDIT_NR_FILTERS != 7
> + LIST_HEAD_INIT(audit_filter_list[7]),
> +#if AUDIT_NR_FILTERS != 8
> #error Fix audit_filter_list initialiser
> #endif
> };
> @@ -56,6 +57,7 @@ static struct list_head audit_rules_list[AUDIT_NR_FILTERS] = {
> LIST_HEAD_INIT(audit_rules_list[4]),
> LIST_HEAD_INIT(audit_rules_list[5]),
> LIST_HEAD_INIT(audit_rules_list[6]),
> + LIST_HEAD_INIT(audit_rules_list[7]),
> };
>
> DEFINE_MUTEX(audit_filter_mutex);
> @@ -151,7 +153,8 @@ char *audit_unpack_string(void **bufp, size_t *remain, size_t len)
> static inline int audit_to_inode(struct audit_krule *krule,
> struct audit_field *f)
> {
> - if (krule->listnr != AUDIT_FILTER_EXIT ||
> + if ((krule->listnr != AUDIT_FILTER_EXIT &&
> + krule->listnr != AUDIT_FILTER_URING_EXIT) ||
> krule->inode_f || krule->watch || krule->tree ||
> (f->op != Audit_equal && f->op != Audit_not_equal))
> return -EINVAL;
> @@ -248,6 +251,7 @@ static inline struct audit_entry *audit_to_entry_common(struct audit_rule_data *
> pr_err("AUDIT_FILTER_ENTRY is deprecated\n");
> goto exit_err;
> case AUDIT_FILTER_EXIT:
> + case AUDIT_FILTER_URING_EXIT:
> case AUDIT_FILTER_TASK:
> #endif
> case AUDIT_FILTER_USER:
> @@ -332,6 +336,10 @@ static int audit_field_valid(struct audit_entry *entry, struct audit_field *f)
> if (entry->rule.listnr != AUDIT_FILTER_FS)
> return -EINVAL;
> break;
> + case AUDIT_PERM:
> + if (entry->rule.listnr == AUDIT_FILTER_URING_EXIT)
> + return -EINVAL;
> + break;
> }
>
> switch (entry->rule.listnr) {
> @@ -980,7 +988,8 @@ static inline int audit_add_rule(struct audit_entry *entry)
> }
>
> entry->rule.prio = ~0ULL;
> - if (entry->rule.listnr == AUDIT_FILTER_EXIT) {
> + if (entry->rule.listnr == AUDIT_FILTER_EXIT ||
> + entry->rule.listnr == AUDIT_FILTER_URING_EXIT) {
> if (entry->rule.flags & AUDIT_FILTER_PREPEND)
> entry->rule.prio = ++prio_high;
> else
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 6dda448fb826..7c66a9fea5e6 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -805,6 +805,35 @@ static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
> return rule->mask[word] & bit;
> }
>
> +/**
> + * audit_filter_uring - apply filters to an io_uring operation
> + * @tsk: associated task
> + * @ctx: audit context
> + */
> +static void audit_filter_uring(struct task_struct *tsk,
> + struct audit_context *ctx)
> +{
> + struct audit_entry *e;
> + enum audit_state state;
> +
> + if (auditd_test_task(tsk))
> + return;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_URING_EXIT],
> + list) {
> + if (audit_in_mask(&e->rule, ctx->uring_op) &&
> + audit_filter_rules(tsk, &e->rule, ctx, NULL, &state,
> + false)) {
> + rcu_read_unlock();
> + ctx->current_state = state;
> + return;
> + }
> + }
> + rcu_read_unlock();
> + return;
> +}
> +
> /* At syscall exit time, this filter is called if the audit_state is
> * not low enough that auditing cannot take place, but is also not
> * high enough that we already know we have to write an audit record
> @@ -1757,7 +1786,7 @@ static void audit_log_exit(void)
> * __audit_free - free a per-task audit context
> * @tsk: task whose audit context block to free
> *
> - * Called from copy_process and do_exit
> + * Called from copy_process, do_exit, and the io_uring code
> */
> void __audit_free(struct task_struct *tsk)
> {
> @@ -1775,15 +1804,21 @@ void __audit_free(struct task_struct *tsk)
> * random task_struct that doesn't doesn't have any meaningful data we
> * need to log via audit_log_exit().
> */
> - if (tsk == current && !context->dummy &&
> - context->context == AUDIT_CTX_SYSCALL) {
> + if (tsk == current && !context->dummy) {
> context->return_valid = AUDITSC_INVALID;
> context->return_code = 0;
> -
> - audit_filter_syscall(tsk, context);
> - audit_filter_inodes(tsk, context);
> - if (context->current_state == AUDIT_STATE_RECORD)
> - audit_log_exit();
> + if (context->context == AUDIT_CTX_SYSCALL) {
> + audit_filter_syscall(tsk, context);
> + audit_filter_inodes(tsk, context);
> + if (context->current_state == AUDIT_STATE_RECORD)
> + audit_log_exit();
> + } else if (context->context == AUDIT_CTX_URING) {
> + /* TODO: verify this case is real and valid */
> + audit_filter_uring(tsk, context);
> + audit_filter_inodes(tsk, context);
> + if (context->current_state == AUDIT_STATE_RECORD)
> + audit_log_uring(context);
> + }
> }
>
> audit_set_context(tsk, NULL);
> @@ -1867,12 +1902,6 @@ void __audit_uring_exit(int success, long code)
> {
> struct audit_context *ctx = audit_context();
>
> - /*
> - * TODO: At some point we will likely want to filter on io_uring ops
> - * and other things similar to what we do for syscalls, but that
> - * is something for another day; just record what we can here.
> - */
> -
> if (ctx->context == AUDIT_CTX_SYSCALL) {
> /*
> * NOTE: See the note in __audit_uring_entry() about the case
> @@ -1895,6 +1924,8 @@ void __audit_uring_exit(int success, long code)
> * the behavior here.
> */
> audit_filter_syscall(current, ctx);
> + if (ctx->current_state != AUDIT_STATE_RECORD)
> + audit_filter_uring(current, ctx);
> audit_filter_inodes(current, ctx);
> if (ctx->current_state != AUDIT_STATE_RECORD)
> return;
> @@ -1907,6 +1938,8 @@ void __audit_uring_exit(int success, long code)
> if (!list_empty(&ctx->killed_trees))
> audit_kill_trees(ctx);
>
> + /* run through both filters to ensure we set the filterkey properly */
> + audit_filter_uring(current, ctx);
> audit_filter_inodes(current, ctx);
> if (ctx->current_state != AUDIT_STATE_RECORD)
> goto out;
>
- RGB
--
Richard Guy Briggs <[email protected]>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v4 4/8] fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure()
2021-09-15 16:49 [PATCH v4 0/8] Add LSM access controls and auditing to io_uring Paul Moore
` (2 preceding siblings ...)
2021-09-15 16:49 ` [PATCH v4 3/8] audit: add filtering for io_uring records Paul Moore
@ 2021-09-15 16:49 ` Paul Moore
2021-09-15 16:49 ` [PATCH v4 5/8] io_uring: convert io_uring to the secure anon inode interface Paul Moore
` (4 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Paul Moore @ 2021-09-15 16:49 UTC (permalink / raw)
To: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
Extending the secure anonymous inode support to other subsystems
requires that we have a secure anon_inode_getfile() variant in
addition to the existing secure anon_inode_getfd() variant.
Thankfully we can reuse the existing __anon_inode_getfile() function
and just wrap it with the proper arguments.
Acked-by: Mickaël Salaün <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
---
v4:
- no change
v3:
- no change
v2:
- no change
v1:
- initial draft
---
fs/anon_inodes.c | 29 +++++++++++++++++++++++++++++
include/linux/anon_inodes.h | 4 ++++
2 files changed, 33 insertions(+)
diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index a280156138ed..e0c3e33c4177 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -148,6 +148,35 @@ struct file *anon_inode_getfile(const char *name,
}
EXPORT_SYMBOL_GPL(anon_inode_getfile);
+/**
+ * anon_inode_getfile_secure - Like anon_inode_getfile(), but creates a new
+ * !S_PRIVATE anon inode rather than reuse the
+ * singleton anon inode and calls the
+ * inode_init_security_anon() LSM hook. This
+ * allows for both the inode to have its own
+ * security context and for the LSM to enforce
+ * policy on the inode's creation.
+ *
+ * @name: [in] name of the "class" of the new file
+ * @fops: [in] file operations for the new file
+ * @priv: [in] private data for the new file (will be file's private_data)
+ * @flags: [in] flags
+ * @context_inode:
+ * [in] the logical relationship with the new inode (optional)
+ *
+ * The LSM may use @context_inode in inode_init_security_anon(), but a
+ * reference to it is not held. Returns the newly created file* or an error
+ * pointer. See the anon_inode_getfile() documentation for more information.
+ */
+struct file *anon_inode_getfile_secure(const char *name,
+ const struct file_operations *fops,
+ void *priv, int flags,
+ const struct inode *context_inode)
+{
+ return __anon_inode_getfile(name, fops, priv, flags,
+ context_inode, true);
+}
+
static int __anon_inode_getfd(const char *name,
const struct file_operations *fops,
void *priv, int flags,
diff --git a/include/linux/anon_inodes.h b/include/linux/anon_inodes.h
index 71881a2b6f78..5deaddbd7927 100644
--- a/include/linux/anon_inodes.h
+++ b/include/linux/anon_inodes.h
@@ -15,6 +15,10 @@ struct inode;
struct file *anon_inode_getfile(const char *name,
const struct file_operations *fops,
void *priv, int flags);
+struct file *anon_inode_getfile_secure(const char *name,
+ const struct file_operations *fops,
+ void *priv, int flags,
+ const struct inode *context_inode);
int anon_inode_getfd(const char *name, const struct file_operations *fops,
void *priv, int flags);
int anon_inode_getfd_secure(const char *name,
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 5/8] io_uring: convert io_uring to the secure anon inode interface
2021-09-15 16:49 [PATCH v4 0/8] Add LSM access controls and auditing to io_uring Paul Moore
` (3 preceding siblings ...)
2021-09-15 16:49 ` [PATCH v4 4/8] fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure() Paul Moore
@ 2021-09-15 16:49 ` Paul Moore
2021-09-15 16:49 ` [PATCH v4 6/8] lsm,io_uring: add LSM hooks to io_uring Paul Moore
` (3 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Paul Moore @ 2021-09-15 16:49 UTC (permalink / raw)
To: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
Converting io_uring's anonymous inode to the secure anon inode API
enables LSMs to enforce policy on the io_uring anonymous inodes if
they chose to do so. This is an important first step towards
providing the necessary mechanisms so that LSMs can apply security
policy to io_uring operations.
Signed-off-by: Paul Moore <[email protected]>
---
v4:
- no change
v3:
- no change
v2:
- no change
v1:
- initial draft
---
fs/io_uring.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 388754b24785..56cc9aba0d01 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -10155,8 +10155,8 @@ static struct file *io_uring_get_file(struct io_ring_ctx *ctx)
return ERR_PTR(ret);
#endif
- file = anon_inode_getfile("[io_uring]", &io_uring_fops, ctx,
- O_RDWR | O_CLOEXEC);
+ file = anon_inode_getfile_secure("[io_uring]", &io_uring_fops, ctx,
+ O_RDWR | O_CLOEXEC, NULL);
#if defined(CONFIG_UNIX)
if (IS_ERR(file)) {
sock_release(ctx->ring_sock);
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 6/8] lsm,io_uring: add LSM hooks to io_uring
2021-09-15 16:49 [PATCH v4 0/8] Add LSM access controls and auditing to io_uring Paul Moore
` (4 preceding siblings ...)
2021-09-15 16:49 ` [PATCH v4 5/8] io_uring: convert io_uring to the secure anon inode interface Paul Moore
@ 2021-09-15 16:49 ` Paul Moore
2021-09-15 16:50 ` [PATCH v4 7/8] selinux: add support for the io_uring access controls Paul Moore
` (2 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Paul Moore @ 2021-09-15 16:49 UTC (permalink / raw)
To: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
A full expalantion of io_uring is beyond the scope of this commit
description, but in summary it is an asynchronous I/O mechanism
which allows for I/O requests and the resulting data to be queued
in memory mapped "rings" which are shared between the kernel and
userspace. Optionally, io_uring offers the ability for applications
to spawn kernel threads to dequeue I/O requests from the ring and
submit the requests in the kernel, helping to minimize the syscall
overhead. Rings are accessed in userspace by memory mapping a file
descriptor provided by the io_uring_setup(2), and can be shared
between applications as one might do with any open file descriptor.
Finally, process credentials can be registered with a given ring
and any process with access to that ring can submit I/O requests
using any of the registered credentials.
While the io_uring functionality is widely recognized as offering a
vastly improved, and high performing asynchronous I/O mechanism, its
ability to allow processes to submit I/O requests with credentials
other than its own presents a challenge to LSMs. When a process
creates a new io_uring ring the ring's credentials are inhertied
from the calling process; if this ring is shared with another
process operating with different credentials there is the potential
to bypass the LSMs security policy. Similarly, registering
credentials with a given ring allows any process with access to that
ring to submit I/O requests with those credentials.
In an effort to allow LSMs to apply security policy to io_uring I/O
operations, this patch adds two new LSM hooks. These hooks, in
conjunction with the LSM anonymous inode support previously
submitted, allow an LSM to apply access control policy to the
sharing of io_uring rings as well as any io_uring credential changes
requested by a process.
The new LSM hooks are described below:
* int security_uring_override_creds(cred)
Controls if the current task, executing an io_uring operation,
is allowed to override it's credentials with @cred. In cases
where the current task is a user application, the current
credentials will be those of the user application. In cases
where the current task is a kernel thread servicing io_uring
requests the current credentials will be those of the io_uring
ring (inherited from the process that created the ring).
* int security_uring_sqpoll(void)
Controls if the current task is allowed to create an io_uring
polling thread (IORING_SETUP_SQPOLL). Without a SQPOLL thread
in the kernel processes must submit I/O requests via
io_uring_enter(2) which allows us to compare any requested
credential changes against the application making the request.
With a SQPOLL thread, we can no longer compare requested
credential changes against the application making the request,
the comparison is made against the ring's credentials.
Signed-off-by: Paul Moore <[email protected]>
---
v4:
- no change
v3:
- removed work-in-progress warning from the description
v2:
- no change
v1:
- initial draft
---
fs/io_uring.c | 10 ++++++++++
include/linux/lsm_hook_defs.h | 5 +++++
include/linux/lsm_hooks.h | 13 +++++++++++++
include/linux/security.h | 16 ++++++++++++++++
security/security.c | 12 ++++++++++++
5 files changed, 56 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 56cc9aba0d01..f89d00af3a67 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -80,6 +80,7 @@
#include <linux/io_uring.h>
#include <linux/tracehook.h>
#include <linux/audit.h>
+#include <linux/security.h>
#define CREATE_TRACE_POINTS
#include <trace/events/io_uring.h>
@@ -7070,6 +7071,11 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
if (!req->creds)
return -EINVAL;
get_cred(req->creds);
+ ret = security_uring_override_creds(req->creds);
+ if (ret) {
+ put_cred(req->creds);
+ return ret;
+ }
req->flags |= REQ_F_CREDS;
}
state = &ctx->submit_state;
@@ -8566,6 +8572,10 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
struct io_sq_data *sqd;
bool attached;
+ ret = security_uring_sqpoll();
+ if (ret)
+ return ret;
+
sqd = io_get_sq_data(p, &attached);
if (IS_ERR(sqd)) {
ret = PTR_ERR(sqd);
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 2adeea44c0d5..b3c525353769 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -402,3 +402,8 @@ LSM_HOOK(void, LSM_RET_VOID, perf_event_free, struct perf_event *event)
LSM_HOOK(int, 0, perf_event_read, struct perf_event *event)
LSM_HOOK(int, 0, perf_event_write, struct perf_event *event)
#endif /* CONFIG_PERF_EVENTS */
+
+#ifdef CONFIG_IO_URING
+LSM_HOOK(int, 0, uring_override_creds, const struct cred *new)
+LSM_HOOK(int, 0, uring_sqpoll, void)
+#endif /* CONFIG_IO_URING */
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 5c4c5c0602cb..0eb0ae95c4c4 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1557,6 +1557,19 @@
* Read perf_event security info if allowed.
* @perf_event_write:
* Write perf_event security info if allowed.
+ *
+ * Security hooks for io_uring
+ *
+ * @uring_override_creds:
+ * Check if the current task, executing an io_uring operation, is allowed
+ * to override it's credentials with @new.
+ *
+ * @new: the new creds to use
+ *
+ * @uring_sqpoll:
+ * Check whether the current task is allowed to spawn a io_uring polling
+ * thread (IORING_SETUP_SQPOLL).
+ *
*/
union security_list_options {
#define LSM_HOOK(RET, DEFAULT, NAME, ...) RET (*NAME)(__VA_ARGS__);
diff --git a/include/linux/security.h b/include/linux/security.h
index 5b7288521300..7979b9629a42 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -2038,4 +2038,20 @@ static inline int security_perf_event_write(struct perf_event *event)
#endif /* CONFIG_SECURITY */
#endif /* CONFIG_PERF_EVENTS */
+#ifdef CONFIG_IO_URING
+#ifdef CONFIG_SECURITY
+extern int security_uring_override_creds(const struct cred *new);
+extern int security_uring_sqpoll(void);
+#else
+static inline int security_uring_override_creds(const struct cred *new)
+{
+ return 0;
+}
+static inline int security_uring_sqpoll(void)
+{
+ return 0;
+}
+#endif /* CONFIG_SECURITY */
+#endif /* CONFIG_IO_URING */
+
#endif /* ! __LINUX_SECURITY_H */
diff --git a/security/security.c b/security/security.c
index 9ffa9e9c5c55..c49a2c0cc1c1 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2625,3 +2625,15 @@ int security_perf_event_write(struct perf_event *event)
return call_int_hook(perf_event_write, 0, event);
}
#endif /* CONFIG_PERF_EVENTS */
+
+#ifdef CONFIG_IO_URING
+int security_uring_override_creds(const struct cred *new)
+{
+ return call_int_hook(uring_override_creds, 0, new);
+}
+
+int security_uring_sqpoll(void)
+{
+ return call_int_hook(uring_sqpoll, 0);
+}
+#endif /* CONFIG_IO_URING */
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 7/8] selinux: add support for the io_uring access controls
2021-09-15 16:49 [PATCH v4 0/8] Add LSM access controls and auditing to io_uring Paul Moore
` (5 preceding siblings ...)
2021-09-15 16:49 ` [PATCH v4 6/8] lsm,io_uring: add LSM hooks to io_uring Paul Moore
@ 2021-09-15 16:50 ` Paul Moore
2021-09-15 16:50 ` [PATCH v4 8/8] Smack: Brutalist io_uring support Paul Moore
2021-09-20 2:44 ` [PATCH v4 0/8] Add LSM access controls and auditing to io_uring Paul Moore
8 siblings, 0 replies; 15+ messages in thread
From: Paul Moore @ 2021-09-15 16:50 UTC (permalink / raw)
To: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
This patch implements two new io_uring access controls, specifically
support for controlling the io_uring "personalities" and
IORING_SETUP_SQPOLL. Controlling the sharing of io_urings themselves
is handled via the normal file/inode labeling and sharing mechanisms.
The io_uring { override_creds } permission restricts which domains
the subject domain can use to override it's own credentials.
Granting a domain the io_uring { override_creds } permission allows
it to impersonate another domain in io_uring operations.
The io_uring { sqpoll } permission restricts which domains can create
asynchronous io_uring polling threads. This is important from a
security perspective as operations queued by this asynchronous thread
inherit the credentials of the thread creator by default; if an
io_uring is shared across process/domain boundaries this could result
in one domain impersonating another. Controlling the creation of
sqpoll threads, and the sharing of io_urings across processes, allow
policy authors to restrict the ability of one domain to impersonate
another via io_uring.
As a quick summary, this patch adds a new object class with two
permissions:
io_uring { override_creds sqpoll }
These permissions can be seen in the two simple policy statements
below:
allow domA_t domB_t : io_uring { override_creds };
allow domA_t self : io_uring { sqpoll };
Signed-off-by: Paul Moore <[email protected]>
---
v4:
- no change
v3:
- removed work-in-progress warning from the description
v2:
- made the selinux_uring_* funcs static
- removed the debugging code
v1:
- initial draft
---
security/selinux/hooks.c | 34 ++++++++++++++++++++++++++++++++++
security/selinux/include/classmap.h | 2 ++
2 files changed, 36 insertions(+)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6517f221d52c..012e8504ed9e 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -7111,6 +7111,35 @@ static int selinux_perf_event_write(struct perf_event *event)
}
#endif
+#ifdef CONFIG_IO_URING
+/**
+ * selinux_uring_override_creds - check the requested cred override
+ * @new: the target creds
+ *
+ * Check to see if the current task is allowed to override it's credentials
+ * to service an io_uring operation.
+ */
+static int selinux_uring_override_creds(const struct cred *new)
+{
+ return avc_has_perm(&selinux_state, current_sid(), cred_sid(new),
+ SECCLASS_IO_URING, IO_URING__OVERRIDE_CREDS, NULL);
+}
+
+/**
+ * selinux_uring_sqpoll - check if a io_uring polling thread can be created
+ *
+ * Check to see if the current task is allowed to create a new io_uring
+ * kernel polling thread.
+ */
+static int selinux_uring_sqpoll(void)
+{
+ int sid = current_sid();
+
+ return avc_has_perm(&selinux_state, sid, sid,
+ SECCLASS_IO_URING, IO_URING__SQPOLL, NULL);
+}
+#endif /* CONFIG_IO_URING */
+
/*
* IMPORTANT NOTE: When adding new hooks, please be careful to keep this order:
* 1. any hooks that don't belong to (2.) or (3.) below,
@@ -7349,6 +7378,11 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(perf_event_write, selinux_perf_event_write),
#endif
+#ifdef CONFIG_IO_URING
+ LSM_HOOK_INIT(uring_override_creds, selinux_uring_override_creds),
+ LSM_HOOK_INIT(uring_sqpoll, selinux_uring_sqpoll),
+#endif
+
LSM_HOOK_INIT(locked_down, selinux_lockdown),
/*
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 084757ff4390..698ccfdaf82d 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -254,6 +254,8 @@ struct security_class_mapping secclass_map[] = {
{ "integrity", "confidentiality", NULL } },
{ "anon_inode",
{ COMMON_FILE_PERMS, NULL } },
+ { "io_uring",
+ { "override_creds", "sqpoll", NULL } },
{ NULL }
};
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 8/8] Smack: Brutalist io_uring support
2021-09-15 16:49 [PATCH v4 0/8] Add LSM access controls and auditing to io_uring Paul Moore
` (6 preceding siblings ...)
2021-09-15 16:50 ` [PATCH v4 7/8] selinux: add support for the io_uring access controls Paul Moore
@ 2021-09-15 16:50 ` Paul Moore
2021-09-20 2:44 ` [PATCH v4 0/8] Add LSM access controls and auditing to io_uring Paul Moore
8 siblings, 0 replies; 15+ messages in thread
From: Paul Moore @ 2021-09-15 16:50 UTC (permalink / raw)
To: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
From: Casey Schaufler <[email protected]>
Add Smack privilege checks for io_uring. Use CAP_MAC_OVERRIDE
for the override_creds case and CAP_MAC_ADMIN for creating a
polling thread. These choices are based on conjecture regarding
the intent of the surrounding code.
Signed-off-by: Casey Schaufler <[email protected]>
[PM: make the smack_uring_* funcs static, remove debug code]
Signed-off-by: Paul Moore <[email protected]>
---
v4:
- updated subject line
v3:
- removed debug code
v2:
- made the smack_uring_* funcs static
v1:
- initial draft
---
security/smack/smack_lsm.c | 46 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index cacbe7518519..f90ab1efeb6d 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -4691,6 +4691,48 @@ static int smack_dentry_create_files_as(struct dentry *dentry, int mode,
return 0;
}
+#ifdef CONFIG_IO_URING
+/**
+ * smack_uring_override_creds - Is io_uring cred override allowed?
+ * @new: the target creds
+ *
+ * Check to see if the current task is allowed to override it's credentials
+ * to service an io_uring operation.
+ */
+static int smack_uring_override_creds(const struct cred *new)
+{
+ struct task_smack *tsp = smack_cred(current_cred());
+ struct task_smack *nsp = smack_cred(new);
+
+ /*
+ * Allow the degenerate case where the new Smack value is
+ * the same as the current Smack value.
+ */
+ if (tsp->smk_task == nsp->smk_task)
+ return 0;
+
+ if (smack_privileged_cred(CAP_MAC_OVERRIDE, current_cred()))
+ return 0;
+
+ return -EPERM;
+}
+
+/**
+ * smack_uring_sqpoll - check if a io_uring polling thread can be created
+ *
+ * Check to see if the current task is allowed to create a new io_uring
+ * kernel polling thread.
+ */
+static int smack_uring_sqpoll(void)
+{
+ if (smack_privileged_cred(CAP_MAC_ADMIN, current_cred()))
+ return 0;
+
+ return -EPERM;
+}
+
+#endif /* CONFIG_IO_URING */
+
struct lsm_blob_sizes smack_blob_sizes __lsm_ro_after_init = {
.lbs_cred = sizeof(struct task_smack),
.lbs_file = sizeof(struct smack_known *),
@@ -4843,6 +4885,10 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(inode_copy_up, smack_inode_copy_up),
LSM_HOOK_INIT(inode_copy_up_xattr, smack_inode_copy_up_xattr),
LSM_HOOK_INIT(dentry_create_files_as, smack_dentry_create_files_as),
+#ifdef CONFIG_IO_URING
+ LSM_HOOK_INIT(uring_override_creds, smack_uring_override_creds),
+ LSM_HOOK_INIT(uring_sqpoll, smack_uring_sqpoll),
+#endif
};
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v4 0/8] Add LSM access controls and auditing to io_uring
2021-09-15 16:49 [PATCH v4 0/8] Add LSM access controls and auditing to io_uring Paul Moore
` (7 preceding siblings ...)
2021-09-15 16:50 ` [PATCH v4 8/8] Smack: Brutalist io_uring support Paul Moore
@ 2021-09-20 2:44 ` Paul Moore
8 siblings, 0 replies; 15+ messages in thread
From: Paul Moore @ 2021-09-20 2:44 UTC (permalink / raw)
To: linux-security-module, selinux, linux-audit, io-uring,
linux-fsdevel, Jens Axboe, Pavel Begunkov,
Kumar Kartikeya Dwivedi
On Wed, Sep 15, 2021 at 12:49 PM Paul Moore <[email protected]> wrote:
>
> A quick update to the v3 patchset with a small change to the audit
> record format (remove the audit login ID on io_uring records) and
> a subject line fix on the Smack patch. I also caught a few minor
> things in the code comments and fixed those up. All told, nothing
> significant but I really dislike merging patches that haven't hit
> the list so here ya go ...
>
> As a reminder, I'm planning to merge these in the selinux/next tree
> later this week and it would be *really* nice to get some ACKs from
> the io_uring folks; this patchset is implementing the ideas we all
> agreed to back in the v1 patchset so there shouldn't be anything
> surprising in here.
>
> For reference the v3 patchset can be found here:
> https://lore.kernel.org/linux-security-module/163159032713.470089.11728103630366176255.stgit@olly/T/#t
>
> Those who would prefer to fetch these patches directly from git can
> do so using the tree/branch below:
> git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git
> (checkout branch "working-io_uring")
>
> ---
>
> Casey Schaufler (1):
> Smack: Brutalist io_uring support
>
> Paul Moore (7):
> audit: prepare audit_context for use in calling contexts beyond syscalls
> audit,io_uring,io-wq: add some basic audit support to io_uring
> audit: add filtering for io_uring records
> fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure()
> io_uring: convert io_uring to the secure anon inode interface
> lsm,io_uring: add LSM hooks to io_uring
> selinux: add support for the io_uring access controls
>
>
> fs/anon_inodes.c | 29 ++
> fs/io-wq.c | 4 +
> fs/io_uring.c | 69 +++-
> include/linux/anon_inodes.h | 4 +
> include/linux/audit.h | 26 ++
> include/linux/lsm_hook_defs.h | 5 +
> include/linux/lsm_hooks.h | 13 +
> include/linux/security.h | 16 +
> include/uapi/linux/audit.h | 4 +-
> kernel/audit.h | 7 +-
> kernel/audit_tree.c | 3 +-
> kernel/audit_watch.c | 3 +-
> kernel/auditfilter.c | 15 +-
> kernel/auditsc.c | 469 ++++++++++++++++++++++------
> security/security.c | 12 +
> security/selinux/hooks.c | 34 ++
> security/selinux/include/classmap.h | 2 +
> security/smack/smack_lsm.c | 46 +++
> 18 files changed, 646 insertions(+), 115 deletions(-)
With no serious objections or outstanding comments, I just merged
these patches into selinux/next. If anyone has any follow-on patches
please base them against selinux/next, thanks.
--
paul moore
www.paul-moore.com
^ permalink raw reply [flat|nested] 15+ messages in thread