* [PATCH v2 RESEND] io_uring/fdinfo: add timeout_list to fdinfo
       [not found] <CGME20240925085815epcas5p16fa977581284a81dae7b67da8bc96a85@epcas5p1.samsung.com>
@ 2024-09-25  8:58 ` Ruyi Zhang
  2024-09-25 11:58   ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Ruyi Zhang @ 2024-09-25  8:58 UTC (permalink / raw)
  To: axboe, asml.silence; +Cc: io-uring, linux-kernel, peiwei.li, Ruyi Zhang

io_uring fdinfo contains most of the runtime information, which is
helpful for debugging io_uring applications. However, it currently
lacks timeout-related information, so this patch adds timeout_list
information.

--
changes since v1:
- Use the _irq variant of spin_lock.
- Fix formatting issues and delete redundant code.
- v1: https://lore.kernel.org/io-uring/[email protected]/
--

Signed-off-by: Ruyi Zhang <[email protected]>
---
 io_uring/fdinfo.c  | 14 ++++++++++++++
 io_uring/timeout.c | 12 ------------
 io_uring/timeout.h | 12 ++++++++++++
 3 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index d43e1b5fcb36..f524c3cd6f57 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -14,6 +14,7 @@
 #include "fdinfo.h"
 #include "cancel.h"
 #include "rsrc.h"
+#include "timeout.h"
 
 #ifdef CONFIG_PROC_FS
 static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id,
@@ -55,6 +56,7 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *file)
 	struct io_ring_ctx *ctx = file->private_data;
 	struct io_overflow_cqe *ocqe;
 	struct io_rings *r = ctx->rings;
+	struct io_timeout *timeout;
 	struct rusage sq_usage;
 	unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
 	unsigned int sq_head = READ_ONCE(r->sq.head);
@@ -235,5 +237,17 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *file)
 		seq_puts(m, "NAPI:\tdisabled\n");
 	}
 #endif
+
+	seq_puts(m, "TimeoutList:\n");
+	spin_lock_irq(&ctx->timeout_lock);
+	list_for_each_entry(timeout, &ctx->timeout_list, list) {
+		struct io_timeout_data *data;
+
+		data = cmd_to_io_kiocb(timeout)->async_data;
+		seq_printf(m, "  off=%u, repeats=%u, sec=%lld, nsec=%ld\n",
+			   timeout->off, timeout->repeats, data->ts.tv_sec,
+			   data->ts.tv_nsec);
+	}
+	spin_unlock_irq(&ctx->timeout_lock);
 }
 #endif
diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 9973876d91b0..4449e139e371 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -13,18 +13,6 @@
 #include "cancel.h"
 #include "timeout.h"
 
-struct io_timeout {
-	struct file			*file;
-	u32				off;
-	u32				target_seq;
-	u32				repeats;
-	struct list_head		list;
-	/* head of the link, used by linked timeouts only */
-	struct io_kiocb			*head;
-	/* for linked completions */
-	struct io_kiocb			*prev;
-};
-
 struct io_timeout_rem {
 	struct file			*file;
 	u64				addr;
diff --git a/io_uring/timeout.h b/io_uring/timeout.h
index a6939f18313e..befd489a6286 100644
--- a/io_uring/timeout.h
+++ b/io_uring/timeout.h
@@ -1,5 +1,17 @@
 // SPDX-License-Identifier: GPL-2.0
 
+struct io_timeout {
+	struct file			*file;
+	u32				off;
+	u32				target_seq;
+	u32				repeats;
+	struct list_head		list;
+	/* head of the link, used by linked timeouts only */
+	struct io_kiocb			*head;
+	/* for linked completions */
+	struct io_kiocb			*prev;
+};
+
 struct io_timeout_data {
 	struct io_kiocb			*req;
 	struct hrtimer			timer;
-- 
2.43.0



* Re: [PATCH v2 RESEND] io_uring/fdinfo: add timeout_list to fdinfo
  2024-09-25  8:58 ` [PATCH v2 RESEND] io_uring/fdinfo: add timeout_list to fdinfo Ruyi Zhang
@ 2024-09-25 11:58   ` Pavel Begunkov
       [not found]     ` <CGME20241010092012epcas5p2bc333a1f880209003523e71d97ba3298@epcas5p2.samsung.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2024-09-25 11:58 UTC (permalink / raw)
  To: Ruyi Zhang, axboe; +Cc: io-uring, linux-kernel, peiwei.li

On 9/25/24 09:58, Ruyi Zhang wrote:
> io_uring fdinfo contains most of the runtime information,which is
> helpful for debugging io_uring applications; However, there is
> currently a lack of timeout-related information, and this patch adds
> timeout_list information.

Please refer to unaddressed comments from v1. We can't have irqs
disabled for that long. And it's too verbose (i.e. depends on
the number of timeouts).


> --
> changes since v1:
> - use _irq version spin_lock.
> - Fixed formatting issues and delete redundant code.
> - v1 :https://lore.kernel.org/io-uring/[email protected]/
> --
> 
> Signed-off-by: Ruyi Zhang <[email protected]>

-- 
Pavel Begunkov


* RE: Re: [PATCH v2 RESEND] io_uring/fdinfo: add timeout_list to fdinfo
       [not found]     ` <CGME20241010092012epcas5p2bc333a1f880209003523e71d97ba3298@epcas5p2.samsung.com>
@ 2024-10-10  9:20       ` Ruyi Zhang
  2024-10-10 15:35         ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Ruyi Zhang @ 2024-10-10  9:20 UTC (permalink / raw)
  To: asml.silence, axboe; +Cc: io-uring, linux-kernel, peiwei.li, ruyi.zhang

---
On 25 Sep 2024 12:58 Pavel Begunkov wrote
> On 9/25/24 09:58, Ruyi Zhang wrote:
>> io_uring fdinfo contains most of the runtime information,which is
>> helpful for debugging io_uring applications; However, there is
>> currently a lack of timeout-related information, and this patch adds
>> timeout_list information.

> Please refer to unaddressed comments from v1. We can't have irqs
> disabled for that long. And it's too verbose (i.e. depends on
> the number of timeouts).

Two questions:

1. I agree with you, we shouldn't walk a potentially very long list
under a spinlock, but I can't find any other way to get all the timeout
information than to walk the timeout_list. Do you have any good ideas?

2. I also agree that seq_printf is heavier. If we use seq_put_decimal_ull
and seq_puts to concatenate strings instead, I haven't tested whether
it's more efficient, but the code is certainly not as readable as the
former. It's also possible that I don't fully understand what you mean,
so I'd like to hear your opinion.
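
To make the readability trade-off in point 2 concrete, here is a minimal
userspace sketch, not kernel code. put_str() and put_decimal_ull() are
hypothetical stand-ins that append to a fixed buffer roughly the way
seq_puts() and seq_put_decimal_ull() append to a seq_file, and struct
out_buf is likewise made up for illustration. Both variants are meant to
produce the same line for non-negative values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a seq_file: a fixed buffer plus a fill count. */
struct out_buf {
	char data[256];
	size_t len;
};

/* Stand-in for seq_puts(): append a string, truncating on overflow. */
static void put_str(struct out_buf *b, const char *s)
{
	size_t room, n = strlen(s);

	assert(b->len < sizeof(b->data));
	room = sizeof(b->data) - 1 - b->len;
	if (n > room)
		n = room;
	memcpy(b->data + b->len, s, n);
	b->len += n;
	b->data[b->len] = '\0';
}

/* Stand-in for seq_put_decimal_ull(): append a delimiter, then the number. */
static void put_decimal_ull(struct out_buf *b, const char *delim,
			    unsigned long long num)
{
	char tmp[24];

	snprintf(tmp, sizeof(tmp), "%llu", num);
	put_str(b, delim);
	put_str(b, tmp);
}

/* Variant A: a single format string, as in the patch. Overwrites the buffer. */
static void show_printf(struct out_buf *b, unsigned int off,
			unsigned int repeats, long long sec, long nsec)
{
	int n = snprintf(b->data, sizeof(b->data),
			 "  off=%u, repeats=%u, sec=%lld, nsec=%ld\n",
			 off, repeats, sec, nsec);

	if (n < 0)
		n = 0;
	else if ((size_t)n >= sizeof(b->data))
		n = sizeof(b->data) - 1;
	b->len = (size_t)n;
}

/*
 * Variant B: the same line built piece by piece. Appends to the buffer.
 * Assumes sec and nsec are non-negative; signed values would need a
 * signed helper instead of the unsigned one used here.
 */
static void show_pieces(struct out_buf *b, unsigned int off,
			unsigned int repeats, long long sec, long nsec)
{
	put_decimal_ull(b, "  off=", off);
	put_decimal_ull(b, ", repeats=", repeats);
	put_decimal_ull(b, ", sec=", (unsigned long long)sec);
	put_decimal_ull(b, ", nsec=", (unsigned long long)nsec);
	put_str(b, "\n");
}
```

Calling show_printf() and show_pieces() on two zero-initialized buffers
with the same arguments should leave identical strings in both. The
piecewise variant splits the format across five calls, which is the
readability cost mentioned above.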

---
Ruyi Zhang


* Re: [PATCH v2 RESEND] io_uring/fdinfo: add timeout_list to fdinfo
  2024-10-10  9:20       ` Ruyi Zhang
@ 2024-10-10 15:35         ` Pavel Begunkov
       [not found]           ` <CGME20241012091032epcas5p2dec0e3db5a72854f4566b251791b84ad@epcas5p2.samsung.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2024-10-10 15:35 UTC (permalink / raw)
  To: Ruyi Zhang, axboe; +Cc: io-uring, linux-kernel, peiwei.li

On 10/10/24 10:20, Ruyi Zhang wrote:
> ---
> On 25 Sep 2024 12:58 Pavel Begunkov wrote
>> On 9/25/24 09:58, Ruyi Zhang wrote:
>>> io_uring fdinfo contains most of the runtime information,which is
>>> helpful for debugging io_uring applications; However, there is
>>> currently a lack of timeout-related information, and this patch adds
>>> timeout_list information.
> 
>> Please refer to unaddressed comments from v1. We can't have irqs
>> disabled for that long. And it's too verbose (i.e. depends on
>> the number of timeouts).
> 
> Two questions:
> 
> 1. I agree with you, we shouldn't walk a potentially very long list
> under spinlock. but i can't find any other way to get all the timeout

If only it's just under the spin, but with disabled irqs...

> information than to walk the timeout_list. Do you have any good ideas?

In the long run it'd be great to replace the spinlock
with a mutex, i.e. just ->uring_lock, but that might be
a bit involved, as we'd need to move handling to task context.

> 2. I also agree seq_printf heavier, if we use seq_put_decimal_ull and
> seq_puts to concatenate strings, I haven't tested whether it's more
> efficient or not, but the code is certainly not as readable as the
> former. It's also possible that I don't fully understand what you mean
> and want to hear your opinion.

I don't think there is any difference, it'd be a matter of
doubling the number of in flight timeouts to achieve same
timings. Tell me, do you really have a good case where you
need that (pretty verbose)? Why not drgn / bpftrace it out
of the kernel instead?

-- 
Pavel Begunkov


* Re: [PATCH v2 RESEND] io_uring/fdinfo: add timeout_list to fdinfo
       [not found]           ` <CGME20241012091032epcas5p2dec0e3db5a72854f4566b251791b84ad@epcas5p2.samsung.com>
@ 2024-10-12  9:10             ` Ruyi Zhang
  2024-10-24 17:31               ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Ruyi Zhang @ 2024-10-12  9:10 UTC (permalink / raw)
  To: asml.silence; +Cc: axboe, io-uring, linux-kernel, peiwei.li, ruyi.zhang

---
On 2024-10-10 15:35 Pavel Begunkov wrote:
>> Two questions:
>> 
>> 1. I agree with you, we shouldn't walk a potentially very
>> long list under spinlock. but i can't find any other way
>> to get all the timeout

> If only it's just under the spin, but with disabled irqs...

>> information than to walk the timeout_list. Do you have any
>> good ideas?

> In the long run it'd be great to replace the spinlock
> with a mutex, i.e. just ->uring_lock, but that would might be
> a bit involving as need to move handling to the task context.
 
Yes, it makes more sense to replace the spinlock, but that would
require modifying other related logic, and I don't think it's
wise to do that for the sake of a piece of debugging
information.

>> 2. I also agree seq_printf heavier, if we use
>> seq_put_decimal_ull and seq_puts to concatenate strings,
>> I haven't tested whether it's more efficient or not, but
>> the code is certainly not as readable as the former. It's
>> also possible that I don't fully understand what you mean
>> and want to hear your opinion.

> I don't think there is any difference, it'd be a matter of
> doubling the number of in flight timeouts to achieve same
> timings. Tell me, do you really have a good case where you
> need that (pretty verbose)? Why not drgn / bpftrace it out
> of the kernel instead?

Of course, this information is available through existing tools.
But most of the io_uring metadata is already exported through the
fdinfo file, and the purpose of adding the timeout information is
the same as before: ease of use. This way, I don't have to write
additional scripts to get all kinds of data.

And as far as I know, the io_uring_show_fdinfo function is called
only once each time the user reads the /proc/xxx/fdinfo/x file.
I don't think we normally need to look at this file often, only
when the program misbehaves, and the timeout_list is very long only
in extreme cases, so I think the performance impact of adding this
code is limited.

---
Ruyi Zhang


* Re: [PATCH v2 RESEND] io_uring/fdinfo: add timeout_list to fdinfo
  2024-10-12  9:10             ` Ruyi Zhang
@ 2024-10-24 17:31               ` Jens Axboe
  2024-10-24 18:10                 ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2024-10-24 17:31 UTC (permalink / raw)
  To: Ruyi Zhang; +Cc: asml.silence, io-uring, linux-kernel, peiwei.li, ruyi.zhang

On Sat, Oct 12, 2024 at 3:30 AM Ruyi Zhang <[email protected]> wrote:
>
> ---
> On 2024-10-10 15:35 Pavel Begunkov wrote:
> >> Two questions:
> >>
> >> 1. I agree with you, we shouldn't walk a potentially very
> >> long list under spinlock. but i can't find any other way
> >> to get all the timeout
>
> > If only it's just under the spin, but with disabled irqs...
>
> >> information than to walk the timeout_list. Do you have any
> >> good ideas?
>
> > In the long run it'd be great to replace the spinlock
> > with a mutex, i.e. just ->uring_lock, but that would might be
> > a bit involving as need to move handling to the task context.
>
>  Yes, it makes more sense to replace spin_lock, but that would
>  require other related logic to be modified, and I don't think
>  it's wise to do that for the sake of a piece of debugging
>  information.
>
> >> 2. I also agree seq_printf heavier, if we use
> >> seq_put_decimal_ull and seq_puts to concatenate strings,
> >> I haven't tested whether it's more efficient or not, but
> >> the code is certainly not as readable as the former. It's
> >> also possible that I don't fully understand what you mean
> >> and want to hear your opinion.
>
> > I don't think there is any difference, it'd be a matter of
> > doubling the number of in flight timeouts to achieve same
> > timings. Tell me, do you really have a good case where you
> > need that (pretty verbose)? Why not drgn / bpftrace it out
> > of the kernel instead?
>
>  Of course, this information is available through existing tools.
>  But I think that most of the io_uring metadata has been exported
>  from the fdinfo file, and the purpose of adding the timeout
>  information is the same as before, easier to use. This way,
>  I don't have to write additional scripts to get all kinds of data.
>
>  And as far as I know, the io_uring_show_fdinfo function is
>  only called once when the user is viewing the
>  /proc/xxx/fdinfo/x file once. I don't think we normally need to
>  look at this file as often, and only look at it when the program
>  is abnormal, and the timeout_list is very long in the extreme case,
>  so I think the performance impact of adding this code is limited.

I do think it's useful, sometimes the only thing you have to poke at
after-the-fact is the fdinfo information. At the same time, would it be
more useful to dump _some_ of the info, even if we can't get all of it?
Would not be too hard to just stop dumping if need_resched() is set, and
even note that - you can always retry, as this info is generally grabbed
from the console anyway, not programmatically. That avoids the worst
possible scenario, which is a malicious setup with a shit ton of pending
timers, while still allowing it to be useful for a normal setup. And
this patch could just do that, rather than attempt to re-architect how
the timers are tracked and which locking it uses.

-- 
Jens Axboe



* Re: [PATCH v2 RESEND] io_uring/fdinfo: add timeout_list to fdinfo
  2024-10-24 17:31               ` Jens Axboe
@ 2024-10-24 18:10                 ` Pavel Begunkov
  2024-10-24 23:25                   ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2024-10-24 18:10 UTC (permalink / raw)
  To: Jens Axboe, Ruyi Zhang; +Cc: io-uring, linux-kernel, peiwei.li

On 10/24/24 18:31, Jens Axboe wrote:
> On Sat, Oct 12, 2024 at 3:30 AM Ruyi Zhang <[email protected]> wrote:
...
>>> I don't think there is any difference, it'd be a matter of
>>> doubling the number of in flight timeouts to achieve same
>>> timings. Tell me, do you really have a good case where you
>>> need that (pretty verbose)? Why not drgn / bpftrace it out
>>> of the kernel instead?
>>
>>   Of course, this information is available through existing tools.
>>   But I think that most of the io_uring metadata has been exported
>>   from the fdinfo file, and the purpose of adding the timeout
>>   information is the same as before, easier to use. This way,
>>   I don't have to write additional scripts to get all kinds of data.
>>
>>   And as far as I know, the io_uring_show_fdinfo function is
>>   only called once when the user is viewing the
>>   /proc/xxx/fdinfo/x file once. I don't think we normally need to
>>   look at this file as often, and only look at it when the program
>>   is abnormal, and the timeout_list is very long in the extreme case,
>>   so I think the performance impact of adding this code is limited.
> 
> I do think it's useful, sometimes the only thing you have to poke at
> after-the-fact is the fdinfo information. At the same time, would it be

If you have an fd to print fdinfo, you can just as well run drgn
or any other debugging tool. We keep adding debugging code for
information that can be extracted with bpf and other tools, and
not only does it bloat the code, it potentially cripples the
entire kernel.

> more useful to dump _some_ of the info, even if we can't get all of it?
> Would not be too hard to just stop dumping if need_resched() is set, and

need_resched() takes eternity in the eyes of hard irqs, that is
surely one way to make the system unusable. Will we even get the
request for rescheduling considering that irqs are off => timers
can't run?

> even note that - you can always retry, as this info is generally grabbed
> from the console anyway, not programmatically. That avoids the worst
> possible scenario, which is a malicious setup with a shit ton of pending
> timers, while still allowing it to be useful for a normal setup. And
> this patch could just do that, rather than attempt to re-architect how
> the timers are tracked and which locking it uses.

Or it can be done with one of the tools that already exist
specifically for that purpose. They don't need any additional
custom handling in the kernel, and users can run them right away
instead of waiting until the patch lands in their kernel.

-- 
Pavel Begunkov


* Re: [PATCH v2 RESEND] io_uring/fdinfo: add timeout_list to fdinfo
  2024-10-24 18:10                 ` Pavel Begunkov
@ 2024-10-24 23:25                   ` Jens Axboe
  2024-10-30  1:29                     ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2024-10-24 23:25 UTC (permalink / raw)
  To: Pavel Begunkov, Ruyi Zhang; +Cc: io-uring, linux-kernel, peiwei.li

On 10/24/24 12:10 PM, Pavel Begunkov wrote:
> On 10/24/24 18:31, Jens Axboe wrote:
>> On Sat, Oct 12, 2024 at 3:30 AM Ruyi Zhang <[email protected]> wrote:
> ...
>>>> I don't think there is any difference, it'd be a matter of
>>>> doubling the number of in flight timeouts to achieve same
>>>> timings. Tell me, do you really have a good case where you
>>>> need that (pretty verbose)? Why not drgn / bpftrace it out
>>>> of the kernel instead?
>>>
>>>   Of course, this information is available through existing tools.
>>>   But I think that most of the io_uring metadata has been exported
>>>   from the fdinfo file, and the purpose of adding the timeout
>>>   information is the same as before, easier to use. This way,
>>>   I don't have to write additional scripts to get all kinds of data.
>>>
>>>   And as far as I know, the io_uring_show_fdinfo function is
>>>   only called once when the user is viewing the
>>>   /proc/xxx/fdinfo/x file once. I don't think we normally need to
>>>   look at this file as often, and only look at it when the program
>>>   is abnormal, and the timeout_list is very long in the extreme case,
>>>   so I think the performance impact of adding this code is limited.
>>
>> I do think it's useful, sometimes the only thing you have to poke at
>> after-the-fact is the fdinfo information. At the same time, would it be
> 
> If you have an fd to print fdinfo, you can just well run drgn
> or any other debugging tool. We keep pushing more debugging code
> that can be extracted with bpf and other tools, and not only
> it bloats the code, but potentially cripples the entire kernel.

While that is certainly true, it's also a much harder barrier to entry.
If you're already setup with eg drgn, then yeah fdinfo is useless as you
can grab much more info out by just using drgn.

I'm fine punting this to "needs more advanced debugging than fdinfo".
It's just important we get closure on these patches, so they don't
linger forever in no man's land.

-- 
Jens Axboe


* Re: [PATCH v2 RESEND] io_uring/fdinfo: add timeout_list to fdinfo
  2024-10-24 23:25                   ` Jens Axboe
@ 2024-10-30  1:29                     ` Pavel Begunkov
  0 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2024-10-30  1:29 UTC (permalink / raw)
  To: Jens Axboe, Ruyi Zhang; +Cc: io-uring, linux-kernel, peiwei.li

On 10/25/24 00:25, Jens Axboe wrote:
> On 10/24/24 12:10 PM, Pavel Begunkov wrote:
>> On 10/24/24 18:31, Jens Axboe wrote:
>>> On Sat, Oct 12, 2024 at 3:30 AM Ruyi Zhang <[email protected]> wrote:
>> ...
>>>>> I don't think there is any difference, it'd be a matter of
>>>>> doubling the number of in flight timeouts to achieve same
>>>>> timings. Tell me, do you really have a good case where you
>>>>> need that (pretty verbose)? Why not drgn / bpftrace it out
>>>>> of the kernel instead?
>>>>
>>>>    Of course, this information is available through existing tools.
>>>>    But I think that most of the io_uring metadata has been exported
>>>>    from the fdinfo file, and the purpose of adding the timeout
>>>>    information is the same as before, easier to use. This way,
>>>>    I don't have to write additional scripts to get all kinds of data.
>>>>
>>>>    And as far as I know, the io_uring_show_fdinfo function is
>>>>    only called once when the user is viewing the
>>>>    /proc/xxx/fdinfo/x file once. I don't think we normally need to
>>>>    look at this file as often, and only look at it when the program
>>>>    is abnormal, and the timeout_list is very long in the extreme case,
>>>>    so I think the performance impact of adding this code is limited.
>>>
>>> I do think it's useful, sometimes the only thing you have to poke at
>>> after-the-fact is the fdinfo information. At the same time, would it be
>>
>> If you have an fd to print fdinfo, you can just well run drgn
>> or any other debugging tool. We keep pushing more debugging code
>> that can be extracted with bpf and other tools, and not only
>> it bloats the code, but potentially cripples the entire kernel.
> 
> While that is certainly true, it's also a much harder barrier to entry.
> If you're already setup with eg drgn, then yeah fdinfo is useless as you
> can grab much more info out by just using drgn.

drgn is simple, not much harder than patching fdinfo. We can add
liburing/scripts and push the script there, so that nobody needs
to rewrite it each time.

> I'm fine punting this to "needs more advanced debugging than fdinfo".
> It's just important we get closure on these patches, so they don't
> linger forever in no man's land.

The only option I see is to dump first ~5 and stop there, but
I still think the tooling option is better.
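
A minimal userspace sketch of the "dump the first few and stop" option,
under stated assumptions: MAX_DUMP, struct timeout_entry, dump_timeouts(),
the "(truncated)" marker and the lock stubs are all hypothetical, and a
plain singly linked list stands in for ctx->timeout_list. The idea shown
is to copy at most a fixed number of entries while the lock is held and to
format them only after it is dropped, so the time spent under the lock no
longer depends on how many timeouts are pending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_DUMP 5	/* hypothetical cap, taken from the "first ~5" idea */

/* Hypothetical stand-in for the fields the patch prints per timeout. */
struct timeout_entry {
	unsigned int off;
	unsigned int repeats;
	long long sec;
	long nsec;
	struct timeout_entry *next;
};

/* Stub lock: it only tracks state so the sketch can assert on it. */
static int lock_held;

static void lock_timeouts(void)
{
	assert(!lock_held);
	lock_held = 1;
}

static void unlock_timeouts(void)
{
	assert(lock_held);
	lock_held = 0;
}

/*
 * Copy up to MAX_DUMP entries under the lock, then print them unlocked.
 * Returns the number of entries printed; *truncated is set to 1 if the
 * list had more entries than were copied, 0 otherwise.
 */
static size_t dump_timeouts(FILE *out, const struct timeout_entry *head,
			    int *truncated)
{
	struct timeout_entry snap[MAX_DUMP];
	const struct timeout_entry *t;
	size_t n = 0, i;

	lock_timeouts();
	for (t = head; t && n < MAX_DUMP; t = t->next)
		snap[n++] = *t;
	*truncated = (t != NULL);
	unlock_timeouts();

	/* All formatting happens after the lock has been released. */
	fputs("TimeoutList:\n", out);
	for (i = 0; i < n; i++)
		fprintf(out, "  off=%u, repeats=%u, sec=%lld, nsec=%ld\n",
			snap[i].off, snap[i].repeats, snap[i].sec,
			snap[i].nsec);
	if (*truncated)
		fputs("  (truncated)\n", out);
	return n;
}
```

In the kernel the copy loop would presumably sit between
spin_lock_irq(&ctx->timeout_lock) and spin_unlock_irq(), with the
seq_printf() calls after the unlock. That mapping is an assumption of this
sketch, not something the thread settled on.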

-- 
Pavel Begunkov
