public inbox for [email protected]
* [PATCH] exec: Make sure task->comm is always NUL-terminated
@ 2024-11-30  4:49 Kees Cook
  2024-11-30  7:15 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Kees Cook @ 2024-11-30  4:49 UTC (permalink / raw)
  To: Eric Biederman
  Cc: Kees Cook, Linus Torvalds, Alexander Viro, Christian Brauner,
	Jan Kara, linux-mm, linux-fsdevel, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Jens Axboe,
	Pavel Begunkov, Andrew Morton, Chen Yu, Shuah Khan,
	Mickaël Salaün, linux-kernel, io-uring, linux-hardening

Using strscpy() meant that the final character in task->comm may be
non-NUL for a moment before the "string too long" truncation happens.

Instead of adding a new use of the ambiguous strncpy(), we'd want to
use memtostr_pad() which enforces being able to check at compile time
that sizes are sensible, but this requires being able to see string
buffer lengths. Instead of trying to inline __set_task_comm() (which
needs to call trace and perf functions), just open-code it. But to
make sure we're always safe, add compile-time checking like we already
do for get_task_comm().

Suggested-by: Linus Torvalds <[email protected]>
Suggested-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
---
Cc: Eric Biederman <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: [email protected]
Cc: [email protected]

Here's what I'd prefer to use to clean up set_task_comm(). I merged
Linus and Eric's suggestions and open-coded memtostr_pad().
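
For readers following along, the compile-time check can be illustrated in
user space roughly as follows. This is only a sketch under assumptions:
struct demo_task, demo_set_comm_impl() and demo_set_comm() are made-up names,
and C11 _Static_assert stands in for BUILD_BUG_ON(). It is not the kernel
code from the diff below.

```c
#include <stddef.h>
#include <string.h>

#define TASK_COMM_LEN 16

/* Hypothetical stand-in for task_struct; only comm matters here. */
struct demo_task {
	char comm[TASK_COMM_LEN];
};

/* Out-of-line worker, similar in spirit to __set_task_comm(). */
static void demo_set_comm_impl(struct demo_task *tsk, const char *from)
{
	size_t len = strlen(from);

	if (len > sizeof(tsk->comm) - 1)
		len = sizeof(tsk->comm) - 1;
	memcpy(tsk->comm, from, len);
	memset(&tsk->comm[len], 0, sizeof(tsk->comm) - len);
}

/*
 * The macro sees the caller's expression, so sizeof(from) is the size of
 * the caller's object. Anything that is not exactly TASK_COMM_LEN bytes
 * (for example the 9-byte literal "kthreadd") is rejected at build time.
 */
#define demo_set_comm(tsk, from) do {					\
	_Static_assert(sizeof(from) == TASK_COMM_LEN,			\
		       "source must be a char[TASK_COMM_LEN] array");	\
	demo_set_comm_impl(tsk, from);					\
} while (0)
```

A caller would declare char name[TASK_COMM_LEN] = "kthreadd"; and then call
demo_set_comm(&task, name);. This is the same reason the kthreadd() hunk
introduces a fixed-size array instead of passing the string literal directly.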
---
 fs/exec.c             | 12 ++++++------
 include/linux/sched.h |  9 ++++-----
 io_uring/io-wq.c      |  2 +-
 io_uring/sqpoll.c     |  2 +-
 kernel/kthread.c      |  3 ++-
 5 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index e0435b31a811..5f16500ac325 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1200,16 +1200,16 @@ char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
 EXPORT_SYMBOL_GPL(__get_task_comm);
 
 /*
- * These functions flushes out all traces of the currently running executable
- * so that a new one can be started
+ * This is unlocked -- the string will always be NUL-terminated, but
+ * may show overlapping contents if racing concurrent reads.
  */
-
 void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec)
 {
-	task_lock(tsk);
+	size_t len = min(strlen(buf), sizeof(tsk->comm) - 1);
+
 	trace_task_rename(tsk, buf);
-	strscpy_pad(tsk->comm, buf, sizeof(tsk->comm));
-	task_unlock(tsk);
+	memcpy(tsk->comm, buf, len);
+	memset(&tsk->comm[len], 0, sizeof(tsk->comm) - len);
 	perf_event_comm(tsk, exec);
 }
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e6ee4258169a..ac9f429ddc17 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1932,11 +1932,10 @@ static inline void kick_process(struct task_struct *tsk) { }
 #endif
 
 extern void __set_task_comm(struct task_struct *tsk, const char *from, bool exec);
-
-static inline void set_task_comm(struct task_struct *tsk, const char *from)
-{
-	__set_task_comm(tsk, from, false);
-}
+#define set_task_comm(tsk, from) ({			\
+	BUILD_BUG_ON(sizeof(from) != TASK_COMM_LEN);	\
+	__set_task_comm(tsk, from, false);		\
+})
 
 extern char *__get_task_comm(char *to, size_t len, struct task_struct *tsk);
 #define get_task_comm(buf, tsk) ({			\
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index a38f36b68060..5d0928f37471 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -634,7 +634,7 @@ static int io_wq_worker(void *data)
 	struct io_wq_acct *acct = io_wq_get_acct(worker);
 	struct io_wq *wq = worker->wq;
 	bool exit_mask = false, last_timeout = false;
-	char buf[TASK_COMM_LEN];
+	char buf[TASK_COMM_LEN] = {};
 
 	set_mask_bits(&worker->flags, 0,
 		      BIT(IO_WORKER_F_UP) | BIT(IO_WORKER_F_RUNNING));
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index a26593979887..90011f06c7fb 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -271,7 +271,7 @@ static int io_sq_thread(void *data)
 	struct io_ring_ctx *ctx;
 	struct rusage start;
 	unsigned long timeout = 0;
-	char buf[TASK_COMM_LEN];
+	char buf[TASK_COMM_LEN] = {};
 	DEFINE_WAIT(wait);
 
 	/* offload context creation failed, just exit */
diff --git a/kernel/kthread.c b/kernel/kthread.c
index db4ceb0f503c..162d55811744 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -736,10 +736,11 @@ EXPORT_SYMBOL(kthread_stop_put);
 
 int kthreadd(void *unused)
 {
+	static const char comm[TASK_COMM_LEN] = "kthreadd";
 	struct task_struct *tsk = current;
 
 	/* Setup a clean context for our children to inherit. */
-	set_task_comm(tsk, "kthreadd");
+	set_task_comm(tsk, comm);
 	ignore_signals(tsk);
 	set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_TYPE_KTHREAD));
 	set_mems_allowed(node_states[N_MEMORY]);
-- 
2.34.1



* Re: [PATCH] exec: Make sure task->comm is always NUL-terminated
  2024-11-30  4:49 [PATCH] exec: Make sure task->comm is always NUL-terminated Kees Cook
@ 2024-11-30  7:15 ` Linus Torvalds
  2024-11-30 21:05   ` Kees Cook
  2024-12-01 20:23   ` Linus Torvalds
  2024-11-30 21:40 ` David Laight
  2024-12-01 21:49 ` Jens Axboe
  2 siblings, 2 replies; 7+ messages in thread
From: Linus Torvalds @ 2024-11-30  7:15 UTC (permalink / raw)
  To: Kees Cook
  Cc: Eric Biederman, Alexander Viro, Christian Brauner, Jan Kara,
	linux-mm, linux-fsdevel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Jens Axboe, Pavel Begunkov,
	Andrew Morton, Chen Yu, Shuah Khan, Mickaël Salaün,
	linux-kernel, io-uring, linux-hardening

[-- Attachment #1: Type: text/plain, Size: 1706 bytes --]

Edited down to just the end result:

On Fri, 29 Nov 2024 at 20:49, Kees Cook <[email protected]> wrote:
>
>  void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec)
>  {
>         size_t len = min(strlen(buf), sizeof(tsk->comm) - 1);
>
>         trace_task_rename(tsk, buf);
>         memcpy(tsk->comm, buf, len);
>         memset(&tsk->comm[len], 0, sizeof(tsk->comm) - len);
>         perf_event_comm(tsk, exec);
>  }

I actually don't think that's super-safe either. Yeah, it works in
practice, and the last byte is certainly always going to be 0, but it
might not be reliably padded.

Why? It walks over the source twice. First at strlen() time, then at
memcpy. So if the source isn't stable, the end result might have odd
results with NUL characters in the middle.
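
The double walk can be made concrete with a small user-space sketch. The
names here (COMM_LEN, demo_two_pass_copy) are hypothetical, and the
"concurrent" writer is simulated inline so that the outcome is repeatable; a
real race would be nondeterministic.

```c
#include <stddef.h>
#include <string.h>

#define COMM_LEN 16

/*
 * Two-pass copy with a source that changes between the strlen() pass and
 * the memcpy() pass. src must be a writable buffer of at least 3 bytes.
 */
static void demo_two_pass_copy(char dst[COMM_LEN], char *src)
{
	size_t len = strlen(src);		/* pass 1: measure */

	if (len > COMM_LEN - 1)
		len = COMM_LEN - 1;

	src[2] = '\0';				/* simulated concurrent writer */

	memcpy(dst, src, len);			/* pass 2: copy using the stale length */
	memset(&dst[len], 0, COMM_LEN - len);	/* pad from the stale length */
}
```

With src initially holding "abcdefgh", dst ends up as 'a' 'b' NUL 'd' 'e'
'f' 'g' 'h' followed by zeros: still NUL-terminated, but with non-zero bytes
after the first NUL, so the padding is not consistent.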

And strscpy() really was *supposed* to be safe even in this case, and
I thought it was until I looked closer.

But I think strscpy() can be saved.

Something (UNTESTED!) like the attached I think does the right thing.
I added a couple of "READ_ONCE()" things to make it really super-clear
that strscpy() reads the source exactly once, and to not allow any
compiler re-materialization of the reads (although I think that when I
asked people, it turns out neither gcc nor clang rematerialize memory
accesses, so that READ_ONCE is likely more a documentation and
theoretical thing than a real thing).

And yes, we could make the word-at-a-time case also know about masking
the last word, but it's kind of annoying and depends on byte ordering.

Hmm? I don't think your version is wrong, but I also think we'd be
better off making our 'strscpy()' infrastructure explicitly safe wrt
unstable source strings.

          Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1168 bytes --]

 lib/string.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/string.c b/lib/string.c
index 76327b51e36f..a2a678e45389 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -137,7 +137,7 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
 	if (IS_ENABLED(CONFIG_KMSAN))
 		max = 0;
 
-	while (max >= sizeof(unsigned long)) {
+	while (max > sizeof(unsigned long)) {
 		unsigned long c, data;
 
 		c = read_word_at_a_time(src+res);
@@ -153,10 +153,10 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
 		max -= sizeof(unsigned long);
 	}
 
-	while (count) {
+	while (count > 0) {
 		char c;
 
-		c = src[res];
+		c = READ_ONCE(src[res]);
 		dest[res] = c;
 		if (!c)
 			return res;
@@ -164,11 +164,11 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
 		count--;
 	}
 
-	/* Hit buffer length without finding a NUL; force NUL-termination. */
-	if (res)
-		dest[res-1] = '\0';
+	/* Final byte - force NUL termination */
+	dest[res] = 0;
 
-	return -E2BIG;
+	/* Return -E2BIG if the source continued.. */
+	return READ_ONCE(src[res]) ? -E2BIG : res;
 }
 EXPORT_SYMBOL(sized_strscpy);
 


* Re: [PATCH] exec: Make sure task->comm is always NUL-terminated
  2024-11-30  7:15 ` Linus Torvalds
@ 2024-11-30 21:05   ` Kees Cook
  2024-11-30 21:33     ` Linus Torvalds
  2024-12-01 20:23   ` Linus Torvalds
  1 sibling, 1 reply; 7+ messages in thread
From: Kees Cook @ 2024-11-30 21:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Biederman, Alexander Viro, Christian Brauner, Jan Kara,
	linux-mm, linux-fsdevel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Jens Axboe, Pavel Begunkov,
	Andrew Morton, Chen Yu, Shuah Khan, Mickaël Salaün,
	linux-kernel, io-uring, linux-hardening

On Fri, Nov 29, 2024 at 11:15:44PM -0800, Linus Torvalds wrote:
> Edited down to just the end result:
> 
> On Fri, 29 Nov 2024 at 20:49, Kees Cook <[email protected]> wrote:
> >
> >  void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec)
> >  {
> >         size_t len = min(strlen(buf), sizeof(tsk->comm) - 1);
> >
> >         trace_task_rename(tsk, buf);
> >         memcpy(tsk->comm, buf, len);
> >         memset(&tsk->comm[len], 0, sizeof(tsk->comm) - len);
> >         perf_event_comm(tsk, exec);
> >  }
> 
> I actually don't think that's super-safe either. Yeah, it works in
> practice, and the last byte is certainly always going to be 0, but it
> might not be reliably padded.

Right, my concern over comm is strictly about unterminated reads (i.e.
exposing memory contents stored after "comm" in the task_struct). I've not
been worried about "uninitialized content" exposure because the starting
contents have always been wiped and will (now) always end with a NUL,
so the worst exposure is seeing prior or racing bytes of whatever is
being written into comm concurrently.

> Why? It walks over the source twice. First at strlen() time, then at
> memcpy. So if the source isn't stable, the end result might have odd
> results with NUL characters in the middle.

Yeah, this just means it has greater potential to be garbled.

> And strscpy() really was *supposed* to be safe even in this case, and
> I thought it was until I looked closer.
> 
> But I think strscpy() can be saved.

Yeah, fixing the final NUL byte write is needed.

> Something (UNTESTED!) like the attached I think does the right thing.
> I added a couple of "READ_ONCE()" things to make it really super-clear
> that strscpy() reads the source exactly once, and to not allow any
> compiler re-materialization of the reads (although I think that when I
> asked people, it turns out neither gcc nor clang rematerialize memory
> accesses, so that READ_ONCE is likely more a documentation and
> theoretical thing than a real thing).

This is fine, but it solves neither an unstable source nor
concurrent writers to dest. If source changes out from under strscpy,
we can still copy a "torn" write. If destination changes out from under
strscpy, we just get a potentially interleaved output (but with the
NUL-write change, we never have a dest that _lacks_ a NUL terminator).

So yeah, let's change the loop as you have it. I'm fine with the
READ_ONCE() additions, but I'm not clear on what benefit it has.

> Hmm? I don't think your version is wrong, but I also think we'd be
> better off making our 'strscpy()' infrastructure explicitly safe wrt
> unstable source strings.

Agreed. I'll get this tested against our string handling selftests...

-- 
Kees Cook


* Re: [PATCH] exec: Make sure task->comm is always NUL-terminated
  2024-11-30 21:05   ` Kees Cook
@ 2024-11-30 21:33     ` Linus Torvalds
  0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2024-11-30 21:33 UTC (permalink / raw)
  To: Kees Cook
  Cc: Eric Biederman, Alexander Viro, Christian Brauner, Jan Kara,
	linux-mm, linux-fsdevel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Jens Axboe, Pavel Begunkov,
	Andrew Morton, Chen Yu, Shuah Khan, Mickaël Salaün,
	linux-kernel, io-uring, linux-hardening

On Sat, 30 Nov 2024 at 13:05, Kees Cook <[email protected]> wrote:
>
> Yeah, this just means it has greater potential to be garbled.

Garbled is fine. I'd just rather it be "consistently padded".

> This is fine, but it solves neither an unstable source nor
> concurrent writers to dest.

Yeah, I guess concurrent writers will also cause possibly inconsistent padding.

Maybe we just don't care. As long as it's NUL-terminated, it's a
string. If somebody is messing with the kernel, they get to keep the
garbled string parts.

           Linus


* RE: [PATCH] exec: Make sure task->comm is always NUL-terminated
  2024-11-30  4:49 [PATCH] exec: Make sure task->comm is always NUL-terminated Kees Cook
  2024-11-30  7:15 ` Linus Torvalds
@ 2024-11-30 21:40 ` David Laight
  2024-12-01 21:49 ` Jens Axboe
  2 siblings, 0 replies; 7+ messages in thread
From: David Laight @ 2024-11-30 21:40 UTC (permalink / raw)
  To: 'Kees Cook', Eric Biederman
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	[email protected], [email protected], Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Jens Axboe, Pavel Begunkov, Andrew Morton, Chen Yu, Shuah Khan,
	Mickaël Salaün, [email protected],
	[email protected], [email protected]

From: Kees Cook
> Sent: 30 November 2024 04:49
>
> Instead of adding a new use of the ambiguous strncpy(), we'd want to
> use memtostr_pad() which enforces being able to check at compile time
> that sizes are sensible, but this requires being able to see string
> buffer lengths. Instead of trying to inline __set_task_comm() (which
> needs to call trace and perf functions), just open-code it. But to
> make sure we're always safe, add compile-time checking like we already
> do for get_task_comm().
...
> Here's what I'd prefer to use to clean up set_task_comm(). I merged
> Linus and Eric's suggestions and open-coded memtostr_pad().
> ---
>  fs/exec.c             | 12 ++++++------
>  include/linux/sched.h |  9 ++++-----
>  io_uring/io-wq.c      |  2 +-
>  io_uring/sqpoll.c     |  2 +-
>  kernel/kthread.c      |  3 ++-
>  5 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index e0435b31a811..5f16500ac325 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1200,16 +1200,16 @@ char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
>  EXPORT_SYMBOL_GPL(__get_task_comm);
> 
>  /*
> - * These functions flushes out all traces of the currently running executable
> - * so that a new one can be started
> + * This is unlocked -- the string will always be NUL-terminated, but
> + * may show overlapping contents if racing concurrent reads.
>   */
> -
>  void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec)
>  {
> -	task_lock(tsk);
> +	size_t len = min(strlen(buf), sizeof(tsk->comm) - 1);
> +
>  	trace_task_rename(tsk, buf);
> -	strscpy_pad(tsk->comm, buf, sizeof(tsk->comm));
> -	task_unlock(tsk);
> +	memcpy(tsk->comm, buf, len);
> +	memset(&tsk->comm[len], 0, sizeof(tsk->comm) - len);
>  	perf_event_comm(tsk, exec);

Why not do strscpy_pad() into a local char[16] and then do a 16 byte
memcpy() into the target buffer?

Then non-constant input data will always give a valid '\0' terminated string
regardless of how strscpy_pad() is implemented.
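
A user-space sketch of that bounce-buffer idea, under stated assumptions:
demo_copy_pad() is a hypothetical stand-in for strscpy_pad(), and
demo_set_comm_bounce() is an illustrative name, not an existing kernel
function.

```c
#include <stddef.h>
#include <string.h>

#define COMM_LEN 16

/*
 * Hypothetical stand-in for strscpy_pad(): copy at most size - 1 bytes,
 * loading each source byte into a local once, then zero-fill the rest.
 * size must be at least 1.
 */
static void demo_copy_pad(char *dst, const char *src, size_t size)
{
	size_t i = 0;

	while (i < size - 1) {
		char c = src[i];

		if (c == '\0')
			break;
		dst[i++] = c;
	}
	memset(&dst[i], 0, size - i);
}

/* Build the padded string privately, then publish it with one fixed-size copy. */
static void demo_set_comm_bounce(char comm[COMM_LEN], const char *src)
{
	char tmp[COMM_LEN];

	demo_copy_pad(tmp, src, sizeof(tmp));	/* tmp is local, so nothing else changes it */
	memcpy(comm, tmp, sizeof(tmp));		/* every byte written comes from a valid string */
}
```

The design point is that any instability in src only affects the private
copy. Note that the final memcpy() is still not atomic, so a concurrent
reader or a second writer can observe a mix of old and new bytes.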

	David




* Re: [PATCH] exec: Make sure task->comm is always NUL-terminated
  2024-11-30  7:15 ` Linus Torvalds
  2024-11-30 21:05   ` Kees Cook
@ 2024-12-01 20:23   ` Linus Torvalds
  1 sibling, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2024-12-01 20:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: Eric Biederman, Alexander Viro, Christian Brauner, Jan Kara,
	linux-mm, linux-fsdevel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Jens Axboe, Pavel Begunkov,
	Andrew Morton, Chen Yu, Shuah Khan, Mickaël Salaün,
	linux-kernel, io-uring, linux-hardening

On Fri, 29 Nov 2024 at 23:15, Linus Torvalds
<[email protected]> wrote:
>
> And yes, we could make the word-at-a-time case also know about masking
> the last word, but it's kind of annoying and depends on byte ordering.

Actually, it turned out to be really trivial to do. It does depend on
byte order, but not in a very complex way.

Also, doing the memory accesses with READ_ONCE() might be good for
clarity, but it makes gcc have conniptions and makes the code
generation noticeably worse.

I'm not sure why, but gcc stops doing address generation in the memory
instruction for volatile accesses. I've seen that before, but
completely forgot about how odd the code generation becomes.

This actually generates quite good code - apart from the later
'memset()' by strscpy_pad().  Kind of sad, since the word-at-a-time
code by 'strscpy()' actually handles comm[] really well (the buffer is
a nice multiple of the word length), and extending it to padding would
be trivial.

The whole sized_strscpy_pad() macro is in fact all kinds of stupid. It does

        __wrote = sized_strscpy(__dst, __src, __count);
        if (__wrote >= 0 && __wrote < __count)

and that '__wrote' name is actively misleading, and the "__wrote <
__count" test is pointless.

The underlying sized_strscpy() function doesn't return how many
characters it wrote, it returns the length of the resulting string (or
error if it truncated it), so the return value is *always* smaller
than __count.

That's the whole point of the function, after all.
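
The return-value contract described here can be sketched in user space as
below. This is a simplified byte-at-a-time illustration with a hypothetical
name (demo_strscpy), assuming a POSIX environment for ssize_t and E2BIG. It
is not the kernel's word-at-a-time sized_strscpy().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

/*
 * Copy src into dst (capacity count), always NUL-terminating when
 * count > 0. Returns the length of the resulting string, or -E2BIG if
 * the source did not fit.
 */
static ssize_t demo_strscpy(char *dst, const char *src, size_t count)
{
	size_t res = 0;

	if (count == 0)
		return -E2BIG;

	while (res < count - 1) {
		char c = src[res];

		dst[res] = c;
		if (c == '\0')
			return (ssize_t)res;	/* string length, NUL not counted */
		res++;
	}

	dst[res] = '\0';			/* res == count - 1 here */
	return src[res] != '\0' ? -E2BIG : (ssize_t)res;
}
```

The largest non-error return is count - 1, so a check of the form
"result < count" after a non-negative result can never be false.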

Oh well. I'll just commit my strscpy() improvement as a fix.

And I'll think about how to do the "pad" version better too. Just because.

                Linus


* Re: [PATCH] exec: Make sure task->comm is always NUL-terminated
  2024-11-30  4:49 [PATCH] exec: Make sure task->comm is always NUL-terminated Kees Cook
  2024-11-30  7:15 ` Linus Torvalds
  2024-11-30 21:40 ` David Laight
@ 2024-12-01 21:49 ` Jens Axboe
  2 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2024-12-01 21:49 UTC (permalink / raw)
  To: Kees Cook, Eric Biederman
  Cc: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
	linux-mm, linux-fsdevel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Pavel Begunkov, Andrew Morton,
	Chen Yu, Shuah Khan, Mickaël Salaün, linux-kernel,
	io-uring, linux-hardening

On 11/29/24 9:49 PM, Kees Cook wrote:
> Using strscpy() meant that the final character in task->comm may be
> non-NUL for a moment before the "string too long" truncation happens.
> 
> Instead of adding a new use of the ambiguous strncpy(), we'd want to
> use memtostr_pad() which enforces being able to check at compile time
> that sizes are sensible, but this requires being able to see string
> buffer lengths. Instead of trying to inline __set_task_comm() (which
> needs to call trace and perf functions), just open-code it. But to
> make sure we're always safe, add compile-time checking like we already
> do for get_task_comm().

In terms of the io_uring changes, both of those look fine to me. Feel
free to bundle it with something else. If you're still changing things,
then I do prefer = { }; rather than no space...

-- 
Jens Axboe

