public inbox for [email protected]
* Re: [PATCH 06/13] fuse: Add an interval ring stop worker/monitor
       [not found]       ` <CAJfpeguvCNUEbcy6VQzVJeNOsnNqfDS=LyRaGvSiDTGerB+iuw@mail.gmail.com>
@ 2023-03-23 13:26         ` Ming Lei
       [not found]         ` <[email protected]>
  1 sibling, 0 replies; 4+ messages in thread
From: Ming Lei @ 2023-03-23 13:26 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, [email protected], Dharmendra Singh,
	Amir Goldstein, [email protected],
	Aleksandr Mikhalitsyn, io-uring, Jens Axboe

On Thu, Mar 23, 2023 at 01:35:24PM +0100, Miklos Szeredi wrote:
> On Thu, 23 Mar 2023 at 12:04, Bernd Schubert <[email protected]> wrote:
> >
> > Thanks for looking at these patches!
> >
> > I'm adding in Ming Lei, as I had taken several ideas from ublk. I guess
> > I should also explain in the commit messages and code why it is
> > done that way.
> >
> > On 3/23/23 11:27, Miklos Szeredi wrote:
> > > On Tue, 21 Mar 2023 at 02:11, Bernd Schubert <[email protected]> wrote:
> > >>
> > >> This adds a delayed work queue that runs in intervals
> > >> to check and to stop the ring if needed. Fuse connection
> > >> abort now waits for this worker to complete.
> > >
> > > This seems like a hack.   Can you explain what the problem is?
> > >
> > > The first thing I notice is that you store a reference to the task
> > > that initiated the ring creation.  This already looks fishy, as the
> > > ring could well survive the task (thread) that created it, no?
> >
> > You mean the currently ongoing work, where the daemon can be restarted?
> > Daemon restart will need some work with ring communication, I will take
> > care of that once we have agreed on an approach. [Also added Aleksandr.]
> >
> > fuse_uring_stop_mon() checks if the daemon process is exiting by
> > looking at fc->ring.daemon->flags & PF_EXITING - this is what the
> > process reference is for.
> 
> Okay, so you are saying that the lifetime of the ring is bound to the
> lifetime of the thread that created it?
> 
> Why is that?

Cc Jens and io_uring list

For ublk:

1) ublk is an MQ device, so it is natural to map each queue to a pthread/uring

2) the io_uring context is invisible to the driver, and we don't know when it is
destructed, so we bind the io_uring context to the queue/pthread, because we have
to complete all uring commands before the io_uring context exits. The uring cmd
usage for ublk/fuse is special and unique: it is like a poll request, sent to the
device beforehand, and completed only when the driver has something incoming that
needs userspace to handle - but ublk/fuse may never have anything that needs
userspace to look at.

If io_uring could provide an API for registering an exit callback, things would
be easier for ublk/fuse. However, we still need to know the exact io_uring
context associated with our commands, so either more io_uring implementation
details have to be exposed to the driver, or proper APIs have to be provided.

> 
> It's much more common to bind the lifetime of an object to that of an
> open file.  io_uring_setup() will do that for example.
> 
> It's much easier to hook into the destruction of an open file, than
> into the destruction of a process (as you've observed). And the way
> you do it is even more confusing as the ring is destroyed not when the
> process is destroyed, but when a specific thread is destroyed, making
> this a thread specific behavior that is probably best avoided.
> 
> So the obvious solution would be to destroy the ring(s) in
> fuse_dev_release().  Why wouldn't that work?

io_uring is used for submitting I/O to multiple files, so its lifetime can't be
bound to a single file; also, io_uring is invisible to the driver, if "the
ring(s)" here means io_uring.


thanks,
Ming


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 06/13] fuse: Add an interval ring stop worker/monitor
       [not found]         ` <[email protected]>
@ 2023-03-23 20:51           ` Bernd Schubert
  2023-03-27 13:22             ` Pavel Begunkov
  0 siblings, 1 reply; 4+ messages in thread
From: Bernd Schubert @ 2023-03-23 20:51 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: [email protected], Dharmendra Singh, Amir Goldstein,
	[email protected], Ming Lei, Aleksandr Mikhalitsyn,
	[email protected], Jens Axboe

On 3/23/23 14:18, Bernd Schubert wrote:
> On 3/23/23 13:35, Miklos Szeredi wrote:
>> On Thu, 23 Mar 2023 at 12:04, Bernd Schubert <[email protected]> wrote:
>>>
>>> Thanks for looking at these patches!
>>>
>>> I'm adding in Ming Lei, as I had taken several ideas from ublk. I guess
>>> I should also explain in the commit messages and code why it is
>>> done that way.
>>>
>>> On 3/23/23 11:27, Miklos Szeredi wrote:
>>>> On Tue, 21 Mar 2023 at 02:11, Bernd Schubert <[email protected]> wrote:
>>>>>
>>>>> This adds a delayed work queue that runs in intervals
>>>>> to check and to stop the ring if needed. Fuse connection
>>>>> abort now waits for this worker to complete.
>>>>
>>>> This seems like a hack.   Can you explain what the problem is?
>>>>
>>>> The first thing I notice is that you store a reference to the task
>>>> that initiated the ring creation.  This already looks fishy, as the
>>>> ring could well survive the task (thread) that created it, no?
>>>
>>> You mean the currently ongoing work, where the daemon can be restarted?
>>> Daemon restart will need some work with ring communication, I will take
>>> care of that once we have agreed on an approach. [Also added Aleksandr.]
>>>
>>> fuse_uring_stop_mon() checks if the daemon process is exiting by
>>> looking at fc->ring.daemon->flags & PF_EXITING - this is what the
>>> process reference is for.
>>
>> Okay, so you are saying that the lifetime of the ring is bound to the
>> lifetime of the thread that created it?
>>
>> Why is that?
>>
>> It's much more common to bind the lifetime of an object to that of an
>> open file.  io_uring_setup() will do that for example.
>>
>> It's much easier to hook into the destruction of an open file, than
>> into the destruction of a process (as you've observed). And the way
>> you do it is even more confusing as the ring is destroyed not when the
>> process is destroyed, but when a specific thread is destroyed, making
>> this a thread specific behavior that is probably best avoided.
>>
>> So the obvious solution would be to destroy the ring(s) in
>> fuse_dev_release().  Why wouldn't that work?
>>
> 
> I _think_ I had tried it at the beginning and ran into issues, and then 
> switched to the ublk approach. Going to try again now.
> 

Found the reason why I complete SQEs when the daemon stops - on the daemon 
side I have

ret = io_uring_wait_cqe(&queue->ring, &cqe);

and that hangs when you stop the user side with SIGTERM/SIGINT. Maybe that 
could be solved with io_uring_wait_cqe_timeout(), but would that really be 
a good solution? We would then have periodic CPU activity on the daemon 
side for no good reason - the more often we wake up, the faster 
SIGTERM/SIGINT takes effect. So at best, it should be the uring side that 
stops waiting on receiving a signal.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 06/13] fuse: Add an interval ring stop worker/monitor
  2023-03-23 20:51           ` Bernd Schubert
@ 2023-03-27 13:22             ` Pavel Begunkov
  2023-03-27 14:02               ` Bernd Schubert
  0 siblings, 1 reply; 4+ messages in thread
From: Pavel Begunkov @ 2023-03-27 13:22 UTC (permalink / raw)
  To: Bernd Schubert, Miklos Szeredi
  Cc: [email protected], Dharmendra Singh, Amir Goldstein,
	[email protected], Ming Lei, Aleksandr Mikhalitsyn,
	[email protected], Jens Axboe

On 3/23/23 20:51, Bernd Schubert wrote:
> On 3/23/23 14:18, Bernd Schubert wrote:
>> On 3/23/23 13:35, Miklos Szeredi wrote:
>>> On Thu, 23 Mar 2023 at 12:04, Bernd Schubert <[email protected]> wrote:
[...]
> Found the reason why I complete SQEs when the daemon stops - on the daemon
> side I have
> 
> ret = io_uring_wait_cqe(&queue->ring, &cqe);
> 
> and that hangs when you stop the user side with SIGTERM/SIGINT. Maybe that
> could be solved with io_uring_wait_cqe_timeout(), but would that really be
> a good solution?

It can be some sort of an eventfd triggered from the signal handler
and waited upon by an io_uring poll/read request. Or maybe signalfd.

> We would then have periodic CPU activity on the daemon side for no good
> reason - the more often we wake up, the faster SIGTERM/SIGINT takes effect.
> So at best, it should be the uring side that stops waiting on receiving a
> signal.

FWIW, io_uring (i.e. the kernel side) will stop waiting if there are pending
signals, but we'd need to check that liburing honours it, e.g. doesn't retry
waiting.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 06/13] fuse: Add an interval ring stop worker/monitor
  2023-03-27 13:22             ` Pavel Begunkov
@ 2023-03-27 14:02               ` Bernd Schubert
  0 siblings, 0 replies; 4+ messages in thread
From: Bernd Schubert @ 2023-03-27 14:02 UTC (permalink / raw)
  To: Pavel Begunkov, Miklos Szeredi
  Cc: [email protected], Dharmendra Singh, Amir Goldstein,
	[email protected], Ming Lei, Aleksandr Mikhalitsyn,
	[email protected], Jens Axboe

On 3/27/23 15:22, Pavel Begunkov wrote:
> On 3/23/23 20:51, Bernd Schubert wrote:
>> On 3/23/23 14:18, Bernd Schubert wrote:
>>> On 3/23/23 13:35, Miklos Szeredi wrote:
>>>> On Thu, 23 Mar 2023 at 12:04, Bernd Schubert <[email protected]> wrote:
> [...]
>> Found the reason why I complete SQEs when the daemon stops - on the daemon
>> side I have
>>
>> ret = io_uring_wait_cqe(&queue->ring, &cqe);
>>
>> and that hangs when you stop the user side with SIGTERM/SIGINT. Maybe that
>> could be solved with io_uring_wait_cqe_timeout(), but would that really be
>> a good solution?
> 
> It can be some sort of an eventfd triggered from the signal handler
> and waited upon by an io_uring poll/read request. Or maybe signalfd.
> 
>> We would then have periodic CPU activity on the daemon side for no good
>> reason - the more often we wake up, the faster SIGTERM/SIGINT takes effect.
>> So at best, it should be the uring side that stops waiting on receiving a
>> signal.
> 
> FWIW, io_uring (i.e. the kernel side) will stop waiting if there are pending
> signals, but we'd need to check that liburing honours it, e.g. doesn't retry
> waiting.
> 

I'm going to check where and why it hangs - I'm busy with something else 
today, but by tomorrow I should know what happens.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-03-27 14:02 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <[email protected]>
     [not found] ` <[email protected]>
     [not found]   ` <CAJfpegs6z6pvepUx=3zfAYqisumri=2N-_A-nsYHQd62AQRahA@mail.gmail.com>
     [not found]     ` <[email protected]>
     [not found]       ` <CAJfpeguvCNUEbcy6VQzVJeNOsnNqfDS=LyRaGvSiDTGerB+iuw@mail.gmail.com>
2023-03-23 13:26         ` [PATCH 06/13] fuse: Add an interval ring stop worker/monitor Ming Lei
     [not found]         ` <[email protected]>
2023-03-23 20:51           ` Bernd Schubert
2023-03-27 13:22             ` Pavel Begunkov
2023-03-27 14:02               ` Bernd Schubert
