public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
@ 2021-03-25 16:43 Jens Axboe
  2021-03-25 16:43 ` [PATCH 1/2] kernel: don't include PF_IO_WORKERs as part of same_thread_group() Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Jens Axboe @ 2021-03-25 16:43 UTC (permalink / raw)
  To: io-uring; +Cc: torvalds, ebiederm, linux-kernel, oleg, metze

Hi,

Stefan reports that attaching to a task with io_uring will leave gdb
very confused and just repeatedly attempting to attach to the IO threads,
even though it receives an -EPERM every time. This patchset proposes to
skip PF_IO_WORKER threads as same_thread_group(), except for accounting
purposes which we still desire.

We also skip listing the IO threads in /proc/<pid>/task/ so that gdb
doesn't think it should stop and attach to them. This makes us consistent
with earlier kernels, where these async threads were not related to the
ring owning task, and hence gdb (and others) ignored them anyway.

Seems to me that this is the right approach, but open to comments on if
others agree with this. Oleg, I did see your messages as well on SIGSTOP,
and as was discussed with Eric as well, this is something we most
certainly can revisit. I do think that the visibility of these threads
is a separate issue. Even with SIGSTOP implemented (which I did try as
well), we're never going to allow ptrace attach and hence gdb would still
be broken. Hence I'd rather treat them as separate issues to attack.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2] kernel: don't include PF_IO_WORKERs as part of same_thread_group()
  2021-03-25 16:43 [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/ Jens Axboe
@ 2021-03-25 16:43 ` Jens Axboe
  2021-03-25 16:43 ` [PATCH 2/2] proc: don't show PF_IO_WORKER threads as threads in /proc/<pid>/task/ Jens Axboe
  2021-03-25 19:33 ` [PATCH 0/2] Don't show PF_IO_WORKER " Eric W. Biederman
  2 siblings, 0 replies; 30+ messages in thread
From: Jens Axboe @ 2021-03-25 16:43 UTC (permalink / raw)
  To: io-uring; +Cc: torvalds, ebiederm, linux-kernel, oleg, metze, Jens Axboe

Don't pretend that the IO threads are in the same thread group, the only
case where that seems to be desired is for accounting purposes. Add
a special accounting function for that and make the scheduler side use it.

For signals and ptrace, we don't allow them to be treated as threads
anyway.

Reported-by: Stefan Metzmacher <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
---
 include/linux/sched/signal.h | 9 ++++++++-
 kernel/sched/cputime.c       | 2 +-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3f6a0fcaa10c..4f621e386abf 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -668,11 +668,18 @@ static inline bool thread_group_leader(struct task_struct *p)
 }
 
 static inline
-bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
+bool same_thread_group_account(struct task_struct *p1, struct task_struct *p2)
 {
 	return p1->signal == p2->signal;
 }
 
+static inline
+bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
+{
+	return same_thread_group_account(p1, p2) &&
+			!((p1->flags | p2->flags) & PF_IO_WORKER);
+}
+
 static inline struct task_struct *next_thread(const struct task_struct *p)
 {
 	return list_entry_rcu(p->thread_group.next,
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5f611658eeab..625110cacc2a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -307,7 +307,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 	 * those pending times and rely only on values updated on tick or
 	 * other scheduler action.
 	 */
-	if (same_thread_group(current, tsk))
+	if (same_thread_group_account(current, tsk))
 		(void) task_sched_runtime(current);
 
 	rcu_read_lock();
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2] proc: don't show PF_IO_WORKER threads as threads in /proc/<pid>/task/
  2021-03-25 16:43 [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/ Jens Axboe
  2021-03-25 16:43 ` [PATCH 1/2] kernel: don't include PF_IO_WORKERs as part of same_thread_group() Jens Axboe
@ 2021-03-25 16:43 ` Jens Axboe
  2021-03-25 19:33 ` [PATCH 0/2] Don't show PF_IO_WORKER " Eric W. Biederman
  2 siblings, 0 replies; 30+ messages in thread
From: Jens Axboe @ 2021-03-25 16:43 UTC (permalink / raw)
  To: io-uring; +Cc: torvalds, ebiederm, linux-kernel, oleg, metze, Jens Axboe

We don't allow SIGSTOP and ptrace attach to these threads, and that
confuses applications like gdb that assume they can attach to any thread
listed in /proc/<pid>/task/. gdb then enters an infinite loop of retrying
attach, even though it fails with the same error (-EPERM) every time.

Skip over PF_IO_WORKER threads in the proc task setup. We can't just
terminate the when we find a PF_IO_WORKER thread, as there's no real
ordering here. It's perfectly feasible to have the first thread be an
IO worker, and then a real thread after that. Hence just implement the
skip.

Reported-by: Stefan Metzmacher <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
---
 fs/proc/base.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 3851bfcdba56..abff2fe10bfa 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3723,7 +3723,7 @@ static struct task_struct *first_tid(struct pid *pid, int tid, loff_t f_pos,
 	 */
 	pos = task = task->group_leader;
 	do {
-		if (!nr--)
+		if (same_thread_group(task, pos) && !nr--)
 			goto found;
 	} while_each_thread(task, pos);
 fail:
@@ -3744,16 +3744,22 @@ static struct task_struct *first_tid(struct pid *pid, int tid, loff_t f_pos,
  */
 static struct task_struct *next_tid(struct task_struct *start)
 {
-	struct task_struct *pos = NULL;
+	struct task_struct *tmp, *pos = NULL;
+
 	rcu_read_lock();
-	if (pid_alive(start)) {
-		pos = next_thread(start);
-		if (thread_group_leader(pos))
-			pos = NULL;
-		else
-			get_task_struct(pos);
+	if (!pid_alive(start))
+		goto no_thread;
+	list_for_each_entry_rcu(tmp, &start->thread_group, thread_group) {
+		if (!thread_group_leader(tmp) && same_thread_group(start, tmp)) {
+			get_task_struct(tmp);
+			pos = tmp;
+			break;
+		}
 	}
+no_thread:
 	rcu_read_unlock();
+	if (!pos)
+		return NULL;
 	put_task_struct(start);
 	return pos;
 }
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 16:43 [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/ Jens Axboe
  2021-03-25 16:43 ` [PATCH 1/2] kernel: don't include PF_IO_WORKERs as part of same_thread_group() Jens Axboe
  2021-03-25 16:43 ` [PATCH 2/2] proc: don't show PF_IO_WORKER threads as threads in /proc/<pid>/task/ Jens Axboe
@ 2021-03-25 19:33 ` Eric W. Biederman
  2021-03-25 19:38   ` Linus Torvalds
  2021-03-25 19:40   ` Jens Axboe
  2 siblings, 2 replies; 30+ messages in thread
From: Eric W. Biederman @ 2021-03-25 19:33 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, torvalds, linux-kernel, oleg, metze

Jens Axboe <[email protected]> writes:

> Hi,
>
> Stefan reports that attaching to a task with io_uring will leave gdb
> very confused and just repeatedly attempting to attach to the IO threads,
> even though it receives an -EPERM every time. This patchset proposes to
> skip PF_IO_WORKER threads as same_thread_group(), except for accounting
> purposes which we still desire.
>
> We also skip listing the IO threads in /proc/<pid>/task/ so that gdb
> doesn't think it should stop and attach to them. This makes us consistent
> with earlier kernels, where these async threads were not related to the
> ring owning task, and hence gdb (and others) ignored them anyway.
>
> Seems to me that this is the right approach, but open to comments on if
> others agree with this. Oleg, I did see your messages as well on SIGSTOP,
> and as was discussed with Eric as well, this is something we most
> certainly can revisit. I do think that the visibility of these threads
> is a separate issue. Even with SIGSTOP implemented (which I did try as
> well), we're never going to allow ptrace attach and hence gdb would still
> be broken. Hence I'd rather treat them as separate issues to attack.

A quick skim shows that these threads are not showing up anywhere in
proc which appears to be a problem, as it hides them from top.

Sysadmins need the ability to dig into a system and find out where all
their cpu usage or io's have gone when there is a problem.  I general I
think this argues that these threads should show up as threads of the
process so I am not even certain this is the right fix to deal with gdb.

Eric

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 19:33 ` [PATCH 0/2] Don't show PF_IO_WORKER " Eric W. Biederman
@ 2021-03-25 19:38   ` Linus Torvalds
  2021-03-25 19:40     ` Jens Axboe
  2021-03-25 19:42     ` Linus Torvalds
  2021-03-25 19:40   ` Jens Axboe
  1 sibling, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2021-03-25 19:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jens Axboe, io-uring, Linux Kernel Mailing List, Oleg Nesterov,
	Stefan Metzmacher

On Thu, Mar 25, 2021 at 12:34 PM Eric W. Biederman
<[email protected]> wrote:
>
> A quick skim shows that these threads are not showing up anywhere in
> proc which appears to be a problem, as it hides them from top.
>
> Sysadmins need the ability to dig into a system and find out where all
> their cpu usage or io's have gone when there is a problem.  I general I
> think this argues that these threads should show up as threads of the
> process so I am not even certain this is the right fix to deal with gdb.

Yeah, I do think that hiding them is the wrong model, because it also
hides them from "ps" etc, which is very wrong.

I don't know what the gdb logic is, but maybe there's some other
option that makes gdb not react to them?

          Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 19:33 ` [PATCH 0/2] Don't show PF_IO_WORKER " Eric W. Biederman
  2021-03-25 19:38   ` Linus Torvalds
@ 2021-03-25 19:40   ` Jens Axboe
  2021-03-25 20:32     ` Oleg Nesterov
  1 sibling, 1 reply; 30+ messages in thread
From: Jens Axboe @ 2021-03-25 19:40 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: io-uring, torvalds, linux-kernel, oleg, metze

On 3/25/21 1:33 PM, Eric W. Biederman wrote:
> Jens Axboe <[email protected]> writes:
> 
>> Hi,
>>
>> Stefan reports that attaching to a task with io_uring will leave gdb
>> very confused and just repeatedly attempting to attach to the IO threads,
>> even though it receives an -EPERM every time. This patchset proposes to
>> skip PF_IO_WORKER threads as same_thread_group(), except for accounting
>> purposes which we still desire.
>>
>> We also skip listing the IO threads in /proc/<pid>/task/ so that gdb
>> doesn't think it should stop and attach to them. This makes us consistent
>> with earlier kernels, where these async threads were not related to the
>> ring owning task, and hence gdb (and others) ignored them anyway.
>>
>> Seems to me that this is the right approach, but open to comments on if
>> others agree with this. Oleg, I did see your messages as well on SIGSTOP,
>> and as was discussed with Eric as well, this is something we most
>> certainly can revisit. I do think that the visibility of these threads
>> is a separate issue. Even with SIGSTOP implemented (which I did try as
>> well), we're never going to allow ptrace attach and hence gdb would still
>> be broken. Hence I'd rather treat them as separate issues to attack.
> 
> A quick skim shows that these threads are not showing up anywhere in
> proc which appears to be a problem, as it hides them from top.
> 
> Sysadmins need the ability to dig into a system and find out where all
> their cpu usage or io's have gone when there is a problem.  I general I
> think this argues that these threads should show up as threads of the
> process so I am not even certain this is the right fix to deal with gdb.

That's a good point, overall hiding was not really what I desired, just
getting them out of gdb's hands. And arguably it _is_ a gdb bug, but...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 19:38   ` Linus Torvalds
@ 2021-03-25 19:40     ` Jens Axboe
  2021-03-25 19:42     ` Linus Torvalds
  1 sibling, 0 replies; 30+ messages in thread
From: Jens Axboe @ 2021-03-25 19:40 UTC (permalink / raw)
  To: Linus Torvalds, Eric W. Biederman
  Cc: io-uring, Linux Kernel Mailing List, Oleg Nesterov,
	Stefan Metzmacher

On 3/25/21 1:38 PM, Linus Torvalds wrote:
> On Thu, Mar 25, 2021 at 12:34 PM Eric W. Biederman
> <[email protected]> wrote:
>>
>> A quick skim shows that these threads are not showing up anywhere in
>> proc which appears to be a problem, as it hides them from top.
>>
>> Sysadmins need the ability to dig into a system and find out where all
>> their cpu usage or io's have gone when there is a problem.  I general I
>> think this argues that these threads should show up as threads of the
>> process so I am not even certain this is the right fix to deal with gdb.
> 
> Yeah, I do think that hiding them is the wrong model, because it also
> hides them from "ps" etc, which is very wrong.

Totally agree.

> I don't know what the gdb logic is, but maybe there's some other
> option that makes gdb not react to them?

Guess it's time to dig out the gdb source... I'll take a look.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 19:38   ` Linus Torvalds
  2021-03-25 19:40     ` Jens Axboe
@ 2021-03-25 19:42     ` Linus Torvalds
  2021-03-25 19:46       ` Jens Axboe
  2021-03-25 20:12       ` Linus Torvalds
  1 sibling, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2021-03-25 19:42 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jens Axboe, io-uring, Linux Kernel Mailing List, Oleg Nesterov,
	Stefan Metzmacher

On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
<[email protected]> wrote:
>
> I don't know what the gdb logic is, but maybe there's some other
> option that makes gdb not react to them?

.. maybe we could have a different name for them under the task/
subdirectory, for example (not  just the pid)? Although that probably
messes up 'ps' too..

          Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 19:42     ` Linus Torvalds
@ 2021-03-25 19:46       ` Jens Axboe
  2021-03-25 20:21         ` Eric W. Biederman
  2021-03-25 20:12       ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Jens Axboe @ 2021-03-25 19:46 UTC (permalink / raw)
  To: Linus Torvalds, Eric W. Biederman
  Cc: io-uring, Linux Kernel Mailing List, Oleg Nesterov,
	Stefan Metzmacher

On 3/25/21 1:42 PM, Linus Torvalds wrote:
> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
> <[email protected]> wrote:
>>
>> I don't know what the gdb logic is, but maybe there's some other
>> option that makes gdb not react to them?
> 
> .. maybe we could have a different name for them under the task/
> subdirectory, for example (not  just the pid)? Although that probably
> messes up 'ps' too..

Heh, I can try, but my guess is that it would mess up _something_, if
not ps/top.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 19:42     ` Linus Torvalds
  2021-03-25 19:46       ` Jens Axboe
@ 2021-03-25 20:12       ` Linus Torvalds
  2021-03-25 20:40         ` Jens Axboe
                           ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Linus Torvalds @ 2021-03-25 20:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jens Axboe, io-uring, Linux Kernel Mailing List, Oleg Nesterov,
	Stefan Metzmacher

On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
<[email protected]> wrote:
>
> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > I don't know what the gdb logic is, but maybe there's some other
> > option that makes gdb not react to them?
>
> .. maybe we could have a different name for them under the task/
> subdirectory, for example (not  just the pid)? Although that probably
> messes up 'ps' too..

Actually, maybe the right model is to simply make all the io threads
take signals, and get rid of all the special cases.

Sure, the signals will never be delivered to user space, but if we

 - just made the thread loop do "get_signal()" when there are pending signals

 - allowed ptrace_attach on them

they'd look pretty much like regular threads that just never do the
user-space part of signal handling.

The whole "signals are very special for IO threads" thing has caused
so many problems, that maybe the solution is simply to _not_ make them
special?

           Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 19:46       ` Jens Axboe
@ 2021-03-25 20:21         ` Eric W. Biederman
  2021-03-25 20:40           ` Oleg Nesterov
  2021-03-25 20:42           ` Jens Axboe
  0 siblings, 2 replies; 30+ messages in thread
From: Eric W. Biederman @ 2021-03-25 20:21 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, io-uring, Linux Kernel Mailing List,
	Oleg Nesterov, Stefan Metzmacher

Jens Axboe <[email protected]> writes:

> On 3/25/21 1:42 PM, Linus Torvalds wrote:
>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> I don't know what the gdb logic is, but maybe there's some other
>>> option that makes gdb not react to them?
>> 
>> .. maybe we could have a different name for them under the task/
>> subdirectory, for example (not  just the pid)? Although that probably
>> messes up 'ps' too..
>
> Heh, I can try, but my guess is that it would mess up _something_, if
> not ps/top.

Hmm.

So looking quickly the flip side of the coin is gdb (and other
debuggers) needs a way to know these threads are special, so it can know
not to attach.

I suspect getting -EPERM (or possibly a different error code) when
attempting attach is the right was to know that a thread is not
available to be debugged.

Eric


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 19:40   ` Jens Axboe
@ 2021-03-25 20:32     ` Oleg Nesterov
  0 siblings, 0 replies; 30+ messages in thread
From: Oleg Nesterov @ 2021-03-25 20:32 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Eric W. Biederman, io-uring, torvalds, linux-kernel, metze

I didn't even try to read this series yet, will try tomorrow.

But sorry, I can't resist...

On 03/25, Jens Axboe wrote:
>
> On 3/25/21 1:33 PM, Eric W. Biederman wrote:
> > Jens Axboe <[email protected]> writes:
> >
> >> Hi,
> >>
> >> Stefan reports that attaching to a task with io_uring will leave gdb
> >> very confused and just repeatedly attempting to attach to the IO threads,
> >> even though it receives an -EPERM every time.

Heh. As expected :/

> And arguably it _is_ a gdb bug, but...

Why do you think so?

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 20:12       ` Linus Torvalds
@ 2021-03-25 20:40         ` Jens Axboe
  2021-03-25 21:44           ` Jens Axboe
  2021-03-25 20:43         ` Eric W. Biederman
  2021-03-25 20:44         ` Oleg Nesterov
  2 siblings, 1 reply; 30+ messages in thread
From: Jens Axboe @ 2021-03-25 20:40 UTC (permalink / raw)
  To: Linus Torvalds, Eric W. Biederman
  Cc: io-uring, Linux Kernel Mailing List, Oleg Nesterov,
	Stefan Metzmacher

On 3/25/21 2:12 PM, Linus Torvalds wrote:
> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
> <[email protected]> wrote:
>>
>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> I don't know what the gdb logic is, but maybe there's some other
>>> option that makes gdb not react to them?
>>
>> .. maybe we could have a different name for them under the task/
>> subdirectory, for example (not  just the pid)? Although that probably
>> messes up 'ps' too..
> 
> Actually, maybe the right model is to simply make all the io threads
> take signals, and get rid of all the special cases.
> 
> Sure, the signals will never be delivered to user space, but if we
> 
>  - just made the thread loop do "get_signal()" when there are pending signals
> 
>  - allowed ptrace_attach on them
> 
> they'd look pretty much like regular threads that just never do the
> user-space part of signal handling.
> 
> The whole "signals are very special for IO threads" thing has caused
> so many problems, that maybe the solution is simply to _not_ make them
> special?

Just to wrap up the previous one, yes it broke all sorts of things to
make the 'tid' directory different. They just end up being hidden anyway
through that, for both ps and top.

Yes, I do think that maybe it's better to just embrace maybe just
embrace the signals, and have everything just work by default. It's
better than continually trying to make the threads special. I'll see
if there are some demons lurking down that path.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 20:21         ` Eric W. Biederman
@ 2021-03-25 20:40           ` Oleg Nesterov
  2021-03-25 20:43             ` Jens Axboe
  2021-03-25 20:48             ` Eric W. Biederman
  2021-03-25 20:42           ` Jens Axboe
  1 sibling, 2 replies; 30+ messages in thread
From: Oleg Nesterov @ 2021-03-25 20:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jens Axboe, Linus Torvalds, io-uring, Linux Kernel Mailing List,
	Stefan Metzmacher

On 03/25, Eric W. Biederman wrote:
>
> So looking quickly the flip side of the coin is gdb (and other
> debuggers) needs a way to know these threads are special, so it can know
> not to attach.

may be,

> I suspect getting -EPERM (or possibly a different error code) when
> attempting attach is the right was to know that a thread is not
> available to be debugged.

may be.

But I don't think we can blame gdb. The kernel changed the rules, and this
broke gdb. IOW, I don't agree this is gdb bug.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 20:21         ` Eric W. Biederman
  2021-03-25 20:40           ` Oleg Nesterov
@ 2021-03-25 20:42           ` Jens Axboe
  1 sibling, 0 replies; 30+ messages in thread
From: Jens Axboe @ 2021-03-25 20:42 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, io-uring, Linux Kernel Mailing List,
	Oleg Nesterov, Stefan Metzmacher

On 3/25/21 2:21 PM, Eric W. Biederman wrote:
> Jens Axboe <[email protected]> writes:
> 
>> On 3/25/21 1:42 PM, Linus Torvalds wrote:
>>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>>> <[email protected]> wrote:
>>>>
>>>> I don't know what the gdb logic is, but maybe there's some other
>>>> option that makes gdb not react to them?
>>>
>>> .. maybe we could have a different name for them under the task/
>>> subdirectory, for example (not  just the pid)? Although that probably
>>> messes up 'ps' too..
>>
>> Heh, I can try, but my guess is that it would mess up _something_, if
>> not ps/top.
> 
> Hmm.
> 
> So looking quickly the flip side of the coin is gdb (and other
> debuggers) needs a way to know these threads are special, so it can know
> not to attach.
> 
> I suspect getting -EPERM (or possibly a different error code) when
> attempting attach is the right was to know that a thread is not
> available to be debugged.

But that's what's being returned right now, and gdb seemingly doesn't
really handle that. And even if it was just a gdb issue that could be
fixed it (it definitely is), I'd still greatly prefer not having to do
that. It just takes too long for packages to get updated in distros, and
it'd be years until it got fixed widely. Secondly, I'm even more worried
about cases that we haven't seen yet. I doubt that gdb is the only thing
that'd fall over, not expecting threads in there that it cannot attach
to.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 20:40           ` Oleg Nesterov
@ 2021-03-25 20:43             ` Jens Axboe
  2021-03-25 20:48             ` Eric W. Biederman
  1 sibling, 0 replies; 30+ messages in thread
From: Jens Axboe @ 2021-03-25 20:43 UTC (permalink / raw)
  To: Oleg Nesterov, Eric W. Biederman
  Cc: Linus Torvalds, io-uring, Linux Kernel Mailing List,
	Stefan Metzmacher

On 3/25/21 2:40 PM, Oleg Nesterov wrote:
> On 03/25, Eric W. Biederman wrote:
>>
>> So looking quickly the flip side of the coin is gdb (and other
>> debuggers) needs a way to know these threads are special, so it can know
>> not to attach.
> 
> may be,
> 
>> I suspect getting -EPERM (or possibly a different error code) when
>> attempting attach is the right was to know that a thread is not
>> available to be debugged.
> 
> may be.
> 
> But I don't think we can blame gdb. The kernel changed the rules, and this
> broke gdb. IOW, I don't agree this is gdb bug.

Right, that's what I was getting at too - and it's likely not just gdb.
We have to ensure that we don't break this use case, which seems to
imply that we:

1) Just make it work, or
2) Make them hidden in such a way that gdb doesn't see them, but
   regular tooling does

#2 seems fraught with peril, and maybe not even possible.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 20:12       ` Linus Torvalds
  2021-03-25 20:40         ` Jens Axboe
@ 2021-03-25 20:43         ` Eric W. Biederman
  2021-03-25 21:50           ` Jens Axboe
  2021-03-25 20:44         ` Oleg Nesterov
  2 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2021-03-25 20:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, io-uring, Linux Kernel Mailing List, Oleg Nesterov,
	Stefan Metzmacher

Linus Torvalds <[email protected]> writes:

> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
> <[email protected]> wrote:
>>
>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>> <[email protected]> wrote:
>> >
>> > I don't know what the gdb logic is, but maybe there's some other
>> > option that makes gdb not react to them?
>>
>> .. maybe we could have a different name for them under the task/
>> subdirectory, for example (not  just the pid)? Although that probably
>> messes up 'ps' too..
>
> Actually, maybe the right model is to simply make all the io threads
> take signals, and get rid of all the special cases.
>
> Sure, the signals will never be delivered to user space, but if we
>
>  - just made the thread loop do "get_signal()" when there are pending signals
>
>  - allowed ptrace_attach on them
>
> they'd look pretty much like regular threads that just never do the
> user-space part of signal handling.
>
> The whole "signals are very special for IO threads" thing has caused
> so many problems, that maybe the solution is simply to _not_ make them
> special?

The special case in check_kill_permission is certainly unnecessary.
Having the signal blocked is enough to prevent signal_pending() from
being true. 


The most straight forward thing I can see is to allow ptrace_attach and
to modify ptrace_check_attach to always return -ESRCH for io workers
unless ignore_state is set causing none of the other ptrace operations
to work.

That is what a long running in-kernel thread would do today so
user-space aka gdb may actually cope with it.


We might be able to support if io workers start supporting SIGSTOP but I
am not at all certain.

Eric


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 20:12       ` Linus Torvalds
  2021-03-25 20:40         ` Jens Axboe
  2021-03-25 20:43         ` Eric W. Biederman
@ 2021-03-25 20:44         ` Oleg Nesterov
  2021-03-25 20:55           ` Eric W. Biederman
  2 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2021-03-25 20:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Jens Axboe, io-uring,
	Linux Kernel Mailing List, Stefan Metzmacher

On 03/25, Linus Torvalds wrote:
>
> The whole "signals are very special for IO threads" thing has caused
> so many problems, that maybe the solution is simply to _not_ make them
> special?

Or may be IO threads should not abuse CLONE_THREAD?

Why does create_io_thread() abuse CLONE_THREAD ?

One reason (I think) is that this implies SIGKILL when the process exits/execs,
anything else?

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 20:40           ` Oleg Nesterov
  2021-03-25 20:43             ` Jens Axboe
@ 2021-03-25 20:48             ` Eric W. Biederman
  1 sibling, 0 replies; 30+ messages in thread
From: Eric W. Biederman @ 2021-03-25 20:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jens Axboe, Linus Torvalds, io-uring, Linux Kernel Mailing List,
	Stefan Metzmacher

Oleg Nesterov <[email protected]> writes:

> On 03/25, Eric W. Biederman wrote:
>>
>> So looking quickly the flip side of the coin is gdb (and other
>> debuggers) needs a way to know these threads are special, so it can know
>> not to attach.
>
> may be,
>
>> I suspect getting -EPERM (or possibly a different error code) when
>> attempting attach is the right was to know that a thread is not
>> available to be debugged.
>
> may be.
>
> But I don't think we can blame gdb. The kernel changed the rules, and this
> broke gdb. IOW, I don't agree this is gdb bug.

My point would be it is not strictly a regression either.  It is gdb not
handling new functionality.

If we can be backwards compatible and make ptrace_attach work that is
preferable.  If we can't saying the handful of ptrace using applications
need an upgrade to support processes that use io_uring may be
acceptable.

I don't see any easy to implement path that is guaranteed to work.

Eric




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 20:44         ` Oleg Nesterov
@ 2021-03-25 20:55           ` Eric W. Biederman
  2021-03-25 21:20             ` Stefan Metzmacher
  0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2021-03-25 20:55 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Jens Axboe, io-uring, Linux Kernel Mailing List,
	Stefan Metzmacher

Oleg Nesterov <[email protected]> writes:

> On 03/25, Linus Torvalds wrote:
>>
>> The whole "signals are very special for IO threads" thing has caused
>> so many problems, that maybe the solution is simply to _not_ make them
>> special?
>
> Or may be IO threads should not abuse CLONE_THREAD?
>
> Why does create_io_thread() abuse CLONE_THREAD ?
>
> One reason (I think) is that this implies SIGKILL when the process exits/execs,
> anything else?

A lot.

The io workers perform work on behave of the ordinary userspace threads.
Some of that work is opening files.  For things like rlimits to work
properly you need to share the signal_struct.  But odds are if you find
anything in signal_struct (not counting signals) there will be an
io_uring code path that can exercise it as io_uring can traverse the
filesystem, open files and read/write files.  So io_uring can exercise
all of proc.

Using create_io_thread with CLONE_THREAD is the least problematic way
(including all of the signal and ptrace problems we are looking at right
now) to implement the io worker threads.

They _really_ are threads of the process that just never execute any
code in userspace.

Eric


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 20:55           ` Eric W. Biederman
@ 2021-03-25 21:20             ` Stefan Metzmacher
  2021-03-25 21:48               ` Stefan Metzmacher
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Metzmacher @ 2021-03-25 21:20 UTC (permalink / raw)
  To: Eric W. Biederman, Oleg Nesterov
  Cc: Linus Torvalds, Jens Axboe, io-uring, Linux Kernel Mailing List


Am 25.03.21 um 21:55 schrieb Eric W. Biederman:
> Oleg Nesterov <[email protected]> writes:
> 
>> On 03/25, Linus Torvalds wrote:
>>>
>>> The whole "signals are very special for IO threads" thing has caused
>>> so many problems, that maybe the solution is simply to _not_ make them
>>> special?
>>
>> Or may be IO threads should not abuse CLONE_THREAD?
>>
>> Why does create_io_thread() abuse CLONE_THREAD ?
>>
>> One reason (I think) is that this implies SIGKILL when the process exits/execs,
>> anything else?
> 
> A lot.
> 
> The io workers perform work on behave of the ordinary userspace threads.
> Some of that work is opening files.  For things like rlimits to work
> properly you need to share the signal_struct.  But odds are if you find
> anything in signal_struct (not counting signals) there will be an
> io_uring code path that can exercise it as io_uring can traverse the
> filesystem, open files and read/write files.  So io_uring can exercise
> all of proc.
> 
> Using create_io_thread with CLONE_THREAD is the least problematic way
> (including all of the signal and ptrace problems we are looking at right
> now) to implement the io worker threads.
> 
> They _really_ are threads of the process that just never execute any
> code in userspace.

So they should look like a userspace thread sitting in something like
epoll_pwait() with all signals blocked, which will never return to userspace again?

I think that would be useful, but I also think that userspace should see:
- /proc/$tidofiothread/cmdline as empty (in order to let ps and top use [iou-wrk-$tidofuserspacethread])
- /proc/$tidofiothread/exe as symlink to that not exists
- all of /proc/$tidofiothread/ shows root.root as owner and group
  and things which still allow write access to /proc/$tidofiothread/comm similar things
  with rw permissions should still disallow modifications:

For the other kernel threads e.g. "[cryptd]" I see the following:

LANG=C ls -l /proc/653 | grep rw
ls: cannot read symbolic link '/proc/653/exe': No such file or directory
-rw-r--r--  1 root root 0 Mar 25 22:09 autogroup
-rw-r--r--  1 root root 0 Mar 25 22:09 comm
-rw-r--r--  1 root root 0 Mar 25 22:09 coredump_filter
lrwxrwxrwx  1 root root 0 Mar 25 22:09 cwd -> /
lrwxrwxrwx  1 root root 0 Mar 25 22:09 exe
-rw-r--r--  1 root root 0 Mar 25 22:09 gid_map
-rw-r--r--  1 root root 0 Mar 25 22:09 loginuid
-rw-------  1 root root 0 Mar 25 22:09 mem
-rw-r--r--  1 root root 0 Mar 25 22:09 oom_adj
-rw-r--r--  1 root root 0 Mar 25 22:09 oom_score_adj
-rw-r--r--  1 root root 0 Mar 25 22:09 projid_map
lrwxrwxrwx  1 root root 0 Mar 25 22:09 root -> /
-rw-r--r--  1 root root 0 Mar 25 22:09 sched
-rw-r--r--  1 root root 0 Mar 25 22:09 setgroups
-rw-r--r--  1 root root 0 Mar 25 22:09 timens_offsets
-rw-rw-rw-  1 root root 0 Mar 25 22:09 timerslack_ns
-rw-r--r--  1 root root 0 Mar 25 22:09 uid_map

And this:

LANG=C echo "bla" > /proc/653/comm
-bash: echo: write error: Invalid argument

LANG=C echo "bla" > /proc/653/gid_map
-bash: echo: write error: Operation not permitted

Can't we do the same for iothreads regarding /proc?
Just make things read only there and empty "cmdline"/"exe"?

Maybe I'm too naive, but that what I'd assume as a userspace developer/admin.

Does at least parts of it make any sense?

metze

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 20:40         ` Jens Axboe
@ 2021-03-25 21:44           ` Jens Axboe
  2021-03-25 21:57             ` Stefan Metzmacher
  2021-03-25 22:37             ` Linus Torvalds
  0 siblings, 2 replies; 30+ messages in thread
From: Jens Axboe @ 2021-03-25 21:44 UTC (permalink / raw)
  To: Linus Torvalds, Eric W. Biederman
  Cc: io-uring, Linux Kernel Mailing List, Oleg Nesterov,
	Stefan Metzmacher

On 3/25/21 2:40 PM, Jens Axboe wrote:
> On 3/25/21 2:12 PM, Linus Torvalds wrote:
>> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>>> <[email protected]> wrote:
>>>>
>>>> I don't know what the gdb logic is, but maybe there's some other
>>>> option that makes gdb not react to them?
>>>
>>> .. maybe we could have a different name for them under the task/
>>> subdirectory, for example (not  just the pid)? Although that probably
>>> messes up 'ps' too..
>>
>> Actually, maybe the right model is to simply make all the io threads
>> take signals, and get rid of all the special cases.
>>
>> Sure, the signals will never be delivered to user space, but if we
>>
>>  - just made the thread loop do "get_signal()" when there are pending signals
>>
>>  - allowed ptrace_attach on them
>>
>> they'd look pretty much like regular threads that just never do the
>> user-space part of signal handling.
>>
>> The whole "signals are very special for IO threads" thing has caused
>> so many problems, that maybe the solution is simply to _not_ make them
>> special?
> 
> Just to wrap up the previous one, yes it broke all sorts of things to
> make the 'tid' directory different. They just end up being hidden anyway
> through that, for both ps and top.
> 
> Yes, I do think that maybe it's better to just embrace maybe just
> embrace the signals, and have everything just work by default. It's
> better than continually trying to make the threads special. I'll see
> if there are some demons lurking down that path.

In the spirit of "let's just try it", I ran with the below patch. With
that, I can gdb attach just fine to a test case that creates an io_uring
and a regular thread with pthread_create(). The regular thread uses
the ring, so you end up with two iou-mgr threads. Attach:

[root@archlinux ~]# gdb -p 360
[snip gdb noise]
Attaching to process 360
[New LWP 361]
[New LWP 362]
[New LWP 363]

warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386

warning: Architecture rejected target-supplied description
Error while reading shared library symbols for /usr/lib/libpthread.so.0:
Cannot find user-level thread for LWP 363: generic error
0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
(gdb) info threads
  Id   Target Id             Frame 
* 1    LWP 360 "io_uring"    0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 ()
   from /usr/lib/libc.so.6
  2    LWP 361 "iou-mgr-360" 0x0000000000000000 in ?? ()
  3    LWP 362 "io_uring"    0x00007f7aa52a0a9d in syscall () from /usr/lib/libc.so.6
  4    LWP 363 "iou-mgr-362" 0x0000000000000000 in ?? ()
(gdb) thread 2
[Switching to thread 2 (LWP 361)]
#0  0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0
(gdb) cont
Continuing.
^C
Thread 1 "io_uring" received signal SIGINT, Interrupt.
[Switching to LWP 360]
0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
(gdb) q
A debugging session is active.

	Inferior 1 [process 360] will be detached.

Quit anyway? (y or n) y
Detaching from program: /root/git/fio/t/io_uring, process 360
[Inferior 1 (process 360) detached]

The iou-mgr-x threads are stopped just fine, gdb obviously can't get any
real info out of them. But it works... Regular test cases work fine too,
just a sanity check. Didn't expect them not to.

Only thing that I dislike a bit, but I guess that's just a Linuxism, is
that if can now kill an io_uring owning task by sending a signal to one
of its IO thread workers.


diff --git a/fs/io-wq.c b/fs/io-wq.c
index b7c1fa932cb3..2dbdc552f3ba 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -505,8 +505,14 @@ static int io_wqe_worker(void *data)
 		ret = schedule_timeout(WORKER_IDLE_TIMEOUT);
 		if (try_to_freeze() || ret)
 			continue;
-		if (fatal_signal_pending(current))
-			break;
+		if (signal_pending(current)) {
+			struct ksignal ksig;
+
+			if (fatal_signal_pending(current))
+				break;
+			get_signal(&ksig);
+			continue;
+		}
 		/* timed out, exit unless we're the fixed worker */
 		if (test_bit(IO_WQ_BIT_EXIT, &wq->state) ||
 		    !(worker->flags & IO_WORKER_F_FIXED))
@@ -715,8 +721,15 @@ static int io_wq_manager(void *data)
 		io_wq_check_workers(wq);
 		schedule_timeout(HZ);
 		try_to_freeze();
-		if (fatal_signal_pending(current))
-			set_bit(IO_WQ_BIT_EXIT, &wq->state);
+		if (signal_pending(current)) {
+			struct ksignal ksig;
+
+			if (fatal_signal_pending(current))
+				set_bit(IO_WQ_BIT_EXIT, &wq->state);
+			else
+				get_signal(&ksig);
+			continue;
+		}
 	} while (!test_bit(IO_WQ_BIT_EXIT, &wq->state));
 
 	io_wq_check_workers(wq);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 54ea561db4a5..3a9d021db328 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6765,8 +6765,14 @@ static int io_sq_thread(void *data)
 			timeout = jiffies + sqd->sq_thread_idle;
 			continue;
 		}
-		if (fatal_signal_pending(current))
-			break;
+		if (signal_pending(current)) {
+			struct ksignal ksig;
+
+			if (fatal_signal_pending(current))
+				break;
+			get_signal(&ksig);
+			continue;
+		}
 		sqt_spin = false;
 		cap_entries = !list_is_singular(&sqd->ctx_list);
 		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
diff --git a/kernel/fork.c b/kernel/fork.c
index d3171e8e88e5..3b45d0f04044 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2436,6 +2436,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
 	if (!IS_ERR(tsk)) {
 		sigfillset(&tsk->blocked);
 		sigdelsetmask(&tsk->blocked, sigmask(SIGKILL));
+		sigdelsetmask(&tsk->blocked, sigmask(SIGSTOP));
 	}
 	return tsk;
 }
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 821cf1723814..61db50f7ca86 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -375,7 +375,7 @@ static int ptrace_attach(struct task_struct *task, long request,
 	audit_ptrace(task);
 
 	retval = -EPERM;
-	if (unlikely(task->flags & (PF_KTHREAD | PF_IO_WORKER)))
+	if (unlikely(task->flags & PF_KTHREAD))
 		goto out;
 	if (same_thread_group(task, current))
 		goto out;
diff --git a/kernel/signal.c b/kernel/signal.c
index f2a1b898da29..a5700557eb50 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -91,7 +91,7 @@ static bool sig_task_ignored(struct task_struct *t, int sig, bool force)
 		return true;
 
 	/* Only allow kernel generated signals to this kthread */
-	if (unlikely((t->flags & (PF_KTHREAD | PF_IO_WORKER)) &&
+	if (unlikely((t->flags & PF_KTHREAD) &&
 		     (handler == SIG_KTHREAD_KERNEL) && !force))
 		return true;
 
@@ -288,8 +288,7 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask)
 			JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING));
 	BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK));
 
-	if (unlikely(fatal_signal_pending(task) ||
-		     (task->flags & (PF_EXITING | PF_IO_WORKER))))
+	if (unlikely(fatal_signal_pending(task) || task->flags & PF_EXITING))
 		return false;
 
 	if (mask & JOBCTL_STOP_SIGMASK)
@@ -834,9 +833,6 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,
 
 	if (!valid_signal(sig))
 		return -EINVAL;
-	/* PF_IO_WORKER threads don't take any signals */
-	if (t->flags & PF_IO_WORKER)
-		return -ESRCH;
 
 	if (!si_fromuser(info))
 		return 0;

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 21:20             ` Stefan Metzmacher
@ 2021-03-25 21:48               ` Stefan Metzmacher
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Metzmacher @ 2021-03-25 21:48 UTC (permalink / raw)
  To: Eric W. Biederman, Oleg Nesterov
  Cc: Linus Torvalds, Jens Axboe, io-uring, Linux Kernel Mailing List


Am 25.03.21 um 22:20 schrieb Stefan Metzmacher:
> 
> Am 25.03.21 um 21:55 schrieb Eric W. Biederman:
>> Oleg Nesterov <[email protected]> writes:
>>
>>> On 03/25, Linus Torvalds wrote:
>>>>
>>>> The whole "signals are very special for IO threads" thing has caused
>>>> so many problems, that maybe the solution is simply to _not_ make them
>>>> special?
>>>
>>> Or may be IO threads should not abuse CLONE_THREAD?
>>>
>>> Why does create_io_thread() abuse CLONE_THREAD ?
>>>
>>> One reason (I think) is that this implies SIGKILL when the process exits/execs,
>>> anything else?
>>
>> A lot.
>>
>> The io workers perform work on behave of the ordinary userspace threads.
>> Some of that work is opening files.  For things like rlimits to work
>> properly you need to share the signal_struct.  But odds are if you find
>> anything in signal_struct (not counting signals) there will be an
>> io_uring code path that can exercise it as io_uring can traverse the
>> filesystem, open files and read/write files.  So io_uring can exercise
>> all of proc.
>>
>> Using create_io_thread with CLONE_THREAD is the least problematic way
>> (including all of the signal and ptrace problems we are looking at right
>> now) to implement the io worker threads.
>>
>> They _really_ are threads of the process that just never execute any
>> code in userspace.
> 
> So they should look like a userspace thread sitting in something like
> epoll_pwait() with all signals blocked, which will never return to userspace again?

Would gdb work with that?
The question is what backtrace gdb would show for that thread.

Is it possible to block SIGSTOP/SIGCONT?

I also think that all signals to an iothread should not be delivered to
other threads and it may only react on a direct SIGSTOP/SIGCONT.
I guess even SIGKILL should be ignored as the shutdown should happen
via the exit path of the iothread parent only.

> I think that would be useful, but I also think that userspace should see:
> - /proc/$tidofiothread/cmdline as empty (in order to let ps and top use [iou-wrk-$tidofuserspacethread])
> - /proc/$tidofiothread/exe as symlink to that not exists
> - all of /proc/$tidofiothread/ shows root.root as owner and group
>   and things which still allow write access to /proc/$tidofiothread/comm similar things
>   with rw permissions should still disallow modifications:
> 
> For the other kernel threads e.g. "[cryptd]" I see the following:
> 
> LANG=C ls -l /proc/653 | grep rw
> ls: cannot read symbolic link '/proc/653/exe': No such file or directory
> -rw-r--r--  1 root root 0 Mar 25 22:09 autogroup
> -rw-r--r--  1 root root 0 Mar 25 22:09 comm
> -rw-r--r--  1 root root 0 Mar 25 22:09 coredump_filter
> lrwxrwxrwx  1 root root 0 Mar 25 22:09 cwd -> /
> lrwxrwxrwx  1 root root 0 Mar 25 22:09 exe
> -rw-r--r--  1 root root 0 Mar 25 22:09 gid_map
> -rw-r--r--  1 root root 0 Mar 25 22:09 loginuid
> -rw-------  1 root root 0 Mar 25 22:09 mem
> -rw-r--r--  1 root root 0 Mar 25 22:09 oom_adj
> -rw-r--r--  1 root root 0 Mar 25 22:09 oom_score_adj
> -rw-r--r--  1 root root 0 Mar 25 22:09 projid_map
> lrwxrwxrwx  1 root root 0 Mar 25 22:09 root -> /
> -rw-r--r--  1 root root 0 Mar 25 22:09 sched
> -rw-r--r--  1 root root 0 Mar 25 22:09 setgroups
> -rw-r--r--  1 root root 0 Mar 25 22:09 timens_offsets
> -rw-rw-rw-  1 root root 0 Mar 25 22:09 timerslack_ns
> -rw-r--r--  1 root root 0 Mar 25 22:09 uid_map
> 
> And this:
> 
> LANG=C echo "bla" > /proc/653/comm
> -bash: echo: write error: Invalid argument
> 
> LANG=C echo "bla" > /proc/653/gid_map
> -bash: echo: write error: Operation not permitted
> 
> Can't we do the same for iothreads regarding /proc?
> Just make things read only there and empty "cmdline"/"exe"?
> 
> Maybe I'm too naive, but that what I'd assume as a userspace developer/admin.
> 
> Does at least parts of it make any sense?

I think the strange glibc setuid() behavior should also be tests here,
I guess we don't want that to reset the credentials of an iothread!

Another idea would be to have the iothreads as a child process with it's threads,
but again I'm only looking as an admin to what I'd except to see under /proc
via ps and top.

metze

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 20:43         ` Eric W. Biederman
@ 2021-03-25 21:50           ` Jens Axboe
  0 siblings, 0 replies; 30+ messages in thread
From: Jens Axboe @ 2021-03-25 21:50 UTC (permalink / raw)
  To: Eric W. Biederman, Linus Torvalds
  Cc: io-uring, Linux Kernel Mailing List, Oleg Nesterov,
	Stefan Metzmacher

On 3/25/21 2:43 PM, Eric W. Biederman wrote:
> Linus Torvalds <[email protected]> writes:
> 
>> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>>> <[email protected]> wrote:
>>>>
>>>> I don't know what the gdb logic is, but maybe there's some other
>>>> option that makes gdb not react to them?
>>>
>>> .. maybe we could have a different name for them under the task/
>>> subdirectory, for example (not  just the pid)? Although that probably
>>> messes up 'ps' too..
>>
>> Actually, maybe the right model is to simply make all the io threads
>> take signals, and get rid of all the special cases.
>>
>> Sure, the signals will never be delivered to user space, but if we
>>
>>  - just made the thread loop do "get_signal()" when there are pending signals
>>
>>  - allowed ptrace_attach on them
>>
>> they'd look pretty much like regular threads that just never do the
>> user-space part of signal handling.
>>
>> The whole "signals are very special for IO threads" thing has caused
>> so many problems, that maybe the solution is simply to _not_ make them
>> special?
> 
> The special case in check_kill_permission is certainly unnecessary.
> Having the signal blocked is enough to prevent signal_pending() from
> being true. 
> 
> 
> The most straight forward thing I can see is to allow ptrace_attach and
> to modify ptrace_check_attach to always return -ESRCH for io workers
> unless ignore_state is set causing none of the other ptrace operations
> to work.
> 
> That is what a long running in-kernel thread would do today so
> user-space aka gdb may actually cope with it.
> 
> 
> We might be able to support if io workers start supporting SIGSTOP but I
> am not at all certain.

See patch just send out as a POC, mostly, not fully sanitized yet. But
I did try to return -ESRCH from ptrace_check_attach() if it's an IO
thread and ignore_state isn't set:

if (!ignore_state && child->flags & PF_IO_WORKER)
	return -ESRCH;

and that causes gdb to abort at that thread. For the same test case
as in the previous email, you get:

Attaching to process 358
[New LWP 359]
[New LWP 360]
[New LWP 361]
Couldn't get CS register: No such process.
(gdb) 0x00007ffa58537125 in ?? ()

(gdb) bt
#0  0x00007ffa58537125 in ?? ()
#1  0x0000000000000000 in ?? ()
(gdb) info threads
  Id   Target Id             Frame 
* 1    LWP 358 "io_uring"    0x00007ffa58537125 in ?? ()
  2    LWP 359 "iou-mgr-358" Couldn't get registers: No such process.
(gdb) q
A debugging session is active.

	Inferior 1 [process 358] will be detached.

Quit anyway? (y or n) y
Couldn't write debug register: No such process.

where 360 here is a regular pthread created thread, and 361 is another
iou-mgr-x task. While gdb behaves better in this case, it does still
prevent you from inspecting thread 3 which would be totally valid.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 21:44           ` Jens Axboe
@ 2021-03-25 21:57             ` Stefan Metzmacher
  2021-03-26  0:11               ` Jens Axboe
  2021-03-25 22:37             ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Stefan Metzmacher @ 2021-03-25 21:57 UTC (permalink / raw)
  To: Jens Axboe, Linus Torvalds, Eric W. Biederman
  Cc: io-uring, Linux Kernel Mailing List, Oleg Nesterov


Am 25.03.21 um 22:44 schrieb Jens Axboe:
> On 3/25/21 2:40 PM, Jens Axboe wrote:
>> On 3/25/21 2:12 PM, Linus Torvalds wrote:
>>> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
>>> <[email protected]> wrote:
>>>>
>>>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>>>> <[email protected]> wrote:
>>>>>
>>>>> I don't know what the gdb logic is, but maybe there's some other
>>>>> option that makes gdb not react to them?
>>>>
>>>> .. maybe we could have a different name for them under the task/
>>>> subdirectory, for example (not  just the pid)? Although that probably
>>>> messes up 'ps' too..
>>>
>>> Actually, maybe the right model is to simply make all the io threads
>>> take signals, and get rid of all the special cases.
>>>
>>> Sure, the signals will never be delivered to user space, but if we
>>>
>>>  - just made the thread loop do "get_signal()" when there are pending signals
>>>
>>>  - allowed ptrace_attach on them
>>>
>>> they'd look pretty much like regular threads that just never do the
>>> user-space part of signal handling.
>>>
>>> The whole "signals are very special for IO threads" thing has caused
>>> so many problems, that maybe the solution is simply to _not_ make them
>>> special?
>>
>> Just to wrap up the previous one, yes it broke all sorts of things to
>> make the 'tid' directory different. They just end up being hidden anyway
>> through that, for both ps and top.
>>
>> Yes, I do think that maybe it's better to just embrace maybe just
>> embrace the signals, and have everything just work by default. It's
>> better than continually trying to make the threads special. I'll see
>> if there are some demons lurking down that path.
> 
> In the spirit of "let's just try it", I ran with the below patch. With
> that, I can gdb attach just fine to a test case that creates an io_uring
> and a regular thread with pthread_create(). The regular thread uses
> the ring, so you end up with two iou-mgr threads. Attach:
> 
> [root@archlinux ~]# gdb -p 360
> [snip gdb noise]
> Attaching to process 360
> [New LWP 361]
> [New LWP 362]
> [New LWP 363]
> 
> warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386
> 
> warning: Architecture rejected target-supplied description
> Error while reading shared library symbols for /usr/lib/libpthread.so.0:
> Cannot find user-level thread for LWP 363: generic error
> 0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
> (gdb) info threads
>   Id   Target Id             Frame 
> * 1    LWP 360 "io_uring"    0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 ()
>    from /usr/lib/libc.so.6
>   2    LWP 361 "iou-mgr-360" 0x0000000000000000 in ?? ()
>   3    LWP 362 "io_uring"    0x00007f7aa52a0a9d in syscall () from /usr/lib/libc.so.6
>   4    LWP 363 "iou-mgr-362" 0x0000000000000000 in ?? ()
> (gdb) thread 2
> [Switching to thread 2 (LWP 361)]
> #0  0x0000000000000000 in ?? ()
> (gdb) bt
> #0  0x0000000000000000 in ?? ()
> Backtrace stopped: Cannot access memory at address 0x0
> (gdb) cont
> Continuing.
> ^C
> Thread 1 "io_uring" received signal SIGINT, Interrupt.
> [Switching to LWP 360]
> 0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
> (gdb) q
> A debugging session is active.
> 
> 	Inferior 1 [process 360] will be detached.
> 
> Quit anyway? (y or n) y
> Detaching from program: /root/git/fio/t/io_uring, process 360
> [Inferior 1 (process 360) detached]
> 
> The iou-mgr-x threads are stopped just fine, gdb obviously can't get any
> real info out of them. But it works... Regular test cases work fine too,
> just a sanity check. Didn't expect them not to.

I guess that's basically what I tried to describe when I said they should
look like a userspace process that is blocked in a syscall forever.

> Only thing that I dislike a bit, but I guess that's just a Linuxism, is
> that if can now kill an io_uring owning task by sending a signal to one
> of its IO thread workers.

Can't we just only allow SIGSTOP, which will be only delivered to
the iothread itself? And also SIGKILL should not be allowed from userspace.

And /proc/$iothread/ should be read only and owned by root with
"cmdline" and "exe" being empty.

Thanks!
metze

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 21:44           ` Jens Axboe
  2021-03-25 21:57             ` Stefan Metzmacher
@ 2021-03-25 22:37             ` Linus Torvalds
  2021-03-26  0:08               ` Jens Axboe
  1 sibling, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2021-03-25 22:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Eric W. Biederman, io-uring, Linux Kernel Mailing List,
	Oleg Nesterov, Stefan Metzmacher

On Thu, Mar 25, 2021 at 2:44 PM Jens Axboe <[email protected]> wrote:
>
> In the spirit of "let's just try it", I ran with the below patch. With
> that, I can gdb attach just fine to a test case that creates an io_uring
> and a regular thread with pthread_create(). The regular thread uses
> the ring, so you end up with two iou-mgr threads. Attach:
>
> [root@archlinux ~]# gdb -p 360
> [snip gdb noise]
> Attaching to process 360
> [New LWP 361]
> [New LWP 362]
> [New LWP 363]
[..]

Looks fairly sane to me.

I think this ends up being the right approach - just the final part
(famous last words) of "io_uring threads act like normal threads".

Doing it for VM and FS got rid of all the special cases there, and now
doing it for signal handling gets rid of all these ptrace etc issues.

And the fact that a noticeable part of the patch was removing the
PF_IO_WORKER tests again looks like a very good sign to me.

In fact, I think you could now remove all the freezer hacks too -
because get_signal() will now do the proper try_to_freeze(), so all
those freezer things are stale as well.

Yeah, it's still going to be different in that there's no real user
space return, and so it will never look _entirely_ like a normal
thread, but on the whole I really like how this does seem to get rid
of another batch of special cases.

               Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 22:37             ` Linus Torvalds
@ 2021-03-26  0:08               ` Jens Axboe
  0 siblings, 0 replies; 30+ messages in thread
From: Jens Axboe @ 2021-03-26  0:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, io-uring, Linux Kernel Mailing List,
	Oleg Nesterov, Stefan Metzmacher

On 3/25/21 4:37 PM, Linus Torvalds wrote:
> On Thu, Mar 25, 2021 at 2:44 PM Jens Axboe <[email protected]> wrote:
>>
>> In the spirit of "let's just try it", I ran with the below patch. With
>> that, I can gdb attach just fine to a test case that creates an io_uring
>> and a regular thread with pthread_create(). The regular thread uses
>> the ring, so you end up with two iou-mgr threads. Attach:
>>
>> [root@archlinux ~]# gdb -p 360
>> [snip gdb noise]
>> Attaching to process 360
>> [New LWP 361]
>> [New LWP 362]
>> [New LWP 363]
> [..]
> 
> Looks fairly sane to me.
> 
> I think this ends up being the right approach - just the final part
> (famous last words) of "io_uring threads act like normal threads".
> 
> Doing it for VM and FS got rid of all the special cases there, and now
> doing it for signal handling gets rid of all these ptrace etc issues.
> 
> And the fact that a noticeable part of the patch was removing the
> PF_IO_WORKER tests again looks like a very good sign to me.

I agree, and in fact there are more PF_IO_WORKER checks that can go
too. The patch is just the bare minimum.

> In fact, I think you could now remove all the freezer hacks too -
> because get_signal() will now do the proper try_to_freeze(), so all
> those freezer things are stale as well.

Yep

> Yeah, it's still going to be different in that there's no real user
> space return, and so it will never look _entirely_ like a normal
> thread, but on the whole I really like how this does seem to get rid
> of another batch of special cases.

That's what makes me feel better too. I think was so hung up on the
"never take signals" that it just didn't occur to me to go this
route instead.

I'll send out a clean series.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-25 21:57             ` Stefan Metzmacher
@ 2021-03-26  0:11               ` Jens Axboe
  2021-03-26 11:59                 ` Stefan Metzmacher
  0 siblings, 1 reply; 30+ messages in thread
From: Jens Axboe @ 2021-03-26  0:11 UTC (permalink / raw)
  To: Stefan Metzmacher, Linus Torvalds, Eric W. Biederman
  Cc: io-uring, Linux Kernel Mailing List, Oleg Nesterov

On 3/25/21 3:57 PM, Stefan Metzmacher wrote:
> 
> Am 25.03.21 um 22:44 schrieb Jens Axboe:
>> On 3/25/21 2:40 PM, Jens Axboe wrote:
>>> On 3/25/21 2:12 PM, Linus Torvalds wrote:
>>>> On Thu, Mar 25, 2021 at 12:42 PM Linus Torvalds
>>>> <[email protected]> wrote:
>>>>>
>>>>> On Thu, Mar 25, 2021 at 12:38 PM Linus Torvalds
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> I don't know what the gdb logic is, but maybe there's some other
>>>>>> option that makes gdb not react to them?
>>>>>
>>>>> .. maybe we could have a different name for them under the task/
>>>>> subdirectory, for example (not  just the pid)? Although that probably
>>>>> messes up 'ps' too..
>>>>
>>>> Actually, maybe the right model is to simply make all the io threads
>>>> take signals, and get rid of all the special cases.
>>>>
>>>> Sure, the signals will never be delivered to user space, but if we
>>>>
>>>>  - just made the thread loop do "get_signal()" when there are pending signals
>>>>
>>>>  - allowed ptrace_attach on them
>>>>
>>>> they'd look pretty much like regular threads that just never do the
>>>> user-space part of signal handling.
>>>>
>>>> The whole "signals are very special for IO threads" thing has caused
>>>> so many problems, that maybe the solution is simply to _not_ make them
>>>> special?
>>>
>>> Just to wrap up the previous one, yes it broke all sorts of things to
>>> make the 'tid' directory different. They just end up being hidden anyway
>>> through that, for both ps and top.
>>>
>>> Yes, I do think that maybe it's better to just embrace maybe just
>>> embrace the signals, and have everything just work by default. It's
>>> better than continually trying to make the threads special. I'll see
>>> if there are some demons lurking down that path.
>>
>> In the spirit of "let's just try it", I ran with the below patch. With
>> that, I can gdb attach just fine to a test case that creates an io_uring
>> and a regular thread with pthread_create(). The regular thread uses
>> the ring, so you end up with two iou-mgr threads. Attach:
>>
>> [root@archlinux ~]# gdb -p 360
>> [snip gdb noise]
>> Attaching to process 360
>> [New LWP 361]
>> [New LWP 362]
>> [New LWP 363]
>>
>> warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386
>>
>> warning: Architecture rejected target-supplied description
>> Error while reading shared library symbols for /usr/lib/libpthread.so.0:
>> Cannot find user-level thread for LWP 363: generic error
>> 0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
>> (gdb) info threads
>>   Id   Target Id             Frame 
>> * 1    LWP 360 "io_uring"    0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 ()
>>    from /usr/lib/libc.so.6
>>   2    LWP 361 "iou-mgr-360" 0x0000000000000000 in ?? ()
>>   3    LWP 362 "io_uring"    0x00007f7aa52a0a9d in syscall () from /usr/lib/libc.so.6
>>   4    LWP 363 "iou-mgr-362" 0x0000000000000000 in ?? ()
>> (gdb) thread 2
>> [Switching to thread 2 (LWP 361)]
>> #0  0x0000000000000000 in ?? ()
>> (gdb) bt
>> #0  0x0000000000000000 in ?? ()
>> Backtrace stopped: Cannot access memory at address 0x0
>> (gdb) cont
>> Continuing.
>> ^C
>> Thread 1 "io_uring" received signal SIGINT, Interrupt.
>> [Switching to LWP 360]
>> 0x00007f7aa526e125 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
>> (gdb) q
>> A debugging session is active.
>>
>> 	Inferior 1 [process 360] will be detached.
>>
>> Quit anyway? (y or n) y
>> Detaching from program: /root/git/fio/t/io_uring, process 360
>> [Inferior 1 (process 360) detached]
>>
>> The iou-mgr-x threads are stopped just fine, gdb obviously can't get any
>> real info out of them. But it works... Regular test cases work fine too,
>> just a sanity check. Didn't expect them not to.
> 
> I guess that's basically what I tried to describe when I said they
> should look like a userspace process that is blocked in a syscall
> forever.

Right, that's almost what they look like, in practice that is what they
look like.

>> Only thing that I dislike a bit, but I guess that's just a Linuxism, is
>> that if can now kill an io_uring owning task by sending a signal to one
>> of its IO thread workers.
> 
> Can't we just only allow SIGSTOP, which will be only delivered to
> the iothread itself? And also SIGKILL should not be allowed from userspace.

I don't think we can sanely block them, and we to cleanup and teardown
normally regardless of who gets the signal (owner or one of the
threads). So I'm not _too_ hung up on the "io thread gets signal goes to
owner" as that is what happens with normal threads too, though I would
prefer if that wasn't the case. But overall I feel better just embracing
the thread model, rather than having something that kinda sorta looks
like a thread, but differs in odd ways.

> And /proc/$iothread/ should be read only and owned by root with
> "cmdline" and "exe" being empty.

I know you brought this one up as part of your series, not sure I get
why you want it owned by root and read-only? cmdline and exe, yeah those
could be hidden, but is there really any point?

Maybe I'm missing something here, if so, do clue me in!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-26  0:11               ` Jens Axboe
@ 2021-03-26 11:59                 ` Stefan Metzmacher
  2021-04-01 14:40                   ` Stefan Metzmacher
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Metzmacher @ 2021-03-26 11:59 UTC (permalink / raw)
  To: Jens Axboe, Linus Torvalds, Eric W. Biederman
  Cc: io-uring, Linux Kernel Mailing List, Oleg Nesterov


Hi Jens,

>> And /proc/$iothread/ should be read only and owned by root with
>> "cmdline" and "exe" being empty.
> 
> I know you brought this one up as part of your series, not sure I get
> why you want it owned by root and read-only? cmdline and exe, yeah those
> could be hidden, but is there really any point?
> 
> Maybe I'm missing something here, if so, do clue me in!

I looked through /proc and I think it's mostly similar to
the unshare() case, if userspace wants to do stupid things
like changing "comm" of iothreads, it gets what was asked for.

But the "cmdline" hiding would be very useful.

While most tools use "comm", by default.

ps -eLf or 'iotop' use "cmdline".

Some processes use setproctitle to change "cmdline" in order
to identify the process better, without the 15 chars comm restriction,
that's why I very often press 'c' in 'top' to see the cmdline,
in that case it would be very helpful to see '[iou-wrk-1234]'
instead of the seeing the cmdline.

So I'd very much prefer if this could be applied:
https://lore.kernel.org/io-uring/d4487f959c778d0b1d4c5738b75bcff17d21df5b.1616197787.git.metze@samba.org/T/#u

If you want I can add a comment and a more verbose commit message...

metze

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/
  2021-03-26 11:59                 ` Stefan Metzmacher
@ 2021-04-01 14:40                   ` Stefan Metzmacher
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Metzmacher @ 2021-04-01 14:40 UTC (permalink / raw)
  To: Jens Axboe, Linus Torvalds, Eric W. Biederman
  Cc: io-uring, Linux Kernel Mailing List, Oleg Nesterov

Hi Jens,

>> I know you brought this one up as part of your series, not sure I get
>> why you want it owned by root and read-only? cmdline and exe, yeah those
>> could be hidden, but is there really any point?
>>
>> Maybe I'm missing something here, if so, do clue me in!
> 
> I looked through /proc and I think it's mostly similar to
> the unshare() case, if userspace wants to do stupid things
> like changing "comm" of iothreads, it gets what was asked for.
> 
> But the "cmdline" hiding would be very useful.
> 
> While most tools use "comm", by default.
> 
> ps -eLf or 'iotop' use "cmdline".
> 
> Some processes use setproctitle to change "cmdline" in order
> to identify the process better, without the 15 chars comm restriction,
> that's why I very often press 'c' in 'top' to see the cmdline,
> in that case it would be very helpful to see '[iou-wrk-1234]'
> instead of the seeing the cmdline.
> 
> So I'd very much prefer if this could be applied:
> https://lore.kernel.org/io-uring/d4487f959c778d0b1d4c5738b75bcff17d21df5b.1616197787.git.metze@samba.org/T/#u
> 
> If you want I can add a comment and a more verbose commit message...

I noticed that 'iotop' actually appends ' [iou-wrk-1234]' to the cmdline value,
so that leaves us with 'ps -eLf' and 'top' (with 'c').

pstree -a -t -p is also fine:
      │   └─io_uring-cp,1315 /root/kernel/linux-image-5.12.0-rc2+-dbg_5.12.0-rc2+-5_amd64.deb file
      │       ├─{iou-mgr-1315},1316
      │       ├─{iou-wrk-1315},1317
      │       ├─{iou-wrk-1315},1318
      │       ├─{iou-wrk-1315},1319
      │       ├─{iou-wrk-1315},1320


In the spirit of "avoid special PF_IO_WORKER checks" I guess it's ok
to leave of as is...

metze

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-04-01 18:19 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-03-25 16:43 [PATCH 0/2] Don't show PF_IO_WORKER in /proc/<pid>/task/ Jens Axboe
2021-03-25 16:43 ` [PATCH 1/2] kernel: don't include PF_IO_WORKERs as part of same_thread_group() Jens Axboe
2021-03-25 16:43 ` [PATCH 2/2] proc: don't show PF_IO_WORKER threads as threads in /proc/<pid>/task/ Jens Axboe
2021-03-25 19:33 ` [PATCH 0/2] Don't show PF_IO_WORKER " Eric W. Biederman
2021-03-25 19:38   ` Linus Torvalds
2021-03-25 19:40     ` Jens Axboe
2021-03-25 19:42     ` Linus Torvalds
2021-03-25 19:46       ` Jens Axboe
2021-03-25 20:21         ` Eric W. Biederman
2021-03-25 20:40           ` Oleg Nesterov
2021-03-25 20:43             ` Jens Axboe
2021-03-25 20:48             ` Eric W. Biederman
2021-03-25 20:42           ` Jens Axboe
2021-03-25 20:12       ` Linus Torvalds
2021-03-25 20:40         ` Jens Axboe
2021-03-25 21:44           ` Jens Axboe
2021-03-25 21:57             ` Stefan Metzmacher
2021-03-26  0:11               ` Jens Axboe
2021-03-26 11:59                 ` Stefan Metzmacher
2021-04-01 14:40                   ` Stefan Metzmacher
2021-03-25 22:37             ` Linus Torvalds
2021-03-26  0:08               ` Jens Axboe
2021-03-25 20:43         ` Eric W. Biederman
2021-03-25 21:50           ` Jens Axboe
2021-03-25 20:44         ` Oleg Nesterov
2021-03-25 20:55           ` Eric W. Biederman
2021-03-25 21:20             ` Stefan Metzmacher
2021-03-25 21:48               ` Stefan Metzmacher
2021-03-25 19:40   ` Jens Axboe
2021-03-25 20:32     ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox