public inbox for [email protected]
* Re: [PATCH 2/4] io_uring: io_uring: add support for async work inheriting files
       [not found] ` <[email protected]>
@ 2020-01-26 10:12   ` Andres Freund
  2020-01-26 17:10     ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Andres Freund @ 2020-01-26 10:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, io-uring, davem, netdev, jannh

Hi,

On 2019-10-25 11:30:35 -0600, Jens Axboe wrote:
> This is in preparation for adding opcodes that need to add new files
> in a process file table, system calls like open(2) or accept4(2).
> 
> If an opcode needs this, it must set IO_WQ_WORK_NEEDS_FILES in the work
> item. If work that needs to get punted to async context has this
> set, the async worker will assume the original task file table before
> executing the work.
> 
> Note that opcodes that need access to the current files of an
> application cannot be done through IORING_SETUP_SQPOLL.


Unfortunately this partially breaks sharing a uring across forked-off
processes, even though it initially appears to work:


> +static int io_uring_flush(struct file *file, void *data)
> +{
> +	struct io_ring_ctx *ctx = file->private_data;
> +
> +	io_uring_cancel_files(ctx, data);
> +	if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
> +		io_wq_cancel_all(ctx->io_wq);
> +	return 0;
> +}

Once one process having the uring fd open (even if it were just a fork
never touching the uring, I believe) exits, this prevents the uring from
being usable for any async tasks. The process exiting closes the fd,
which triggers flush. io_wq_cancel_all() sets IO_WQ_BIT_CANCEL, which
never gets unset, which causes all future async sqes to be
immediately returned as -ECANCELED by the worker, via io_req_cancelled.

It's not clear to me why a close() should cancel the wq (nor clear
the entire backlog, after 1d7bb1d50fb4)? Couldn't that even just be a
dup()ed fd? Or a fork that immediately exec()s?

After rudely ifdefing out the above if, and reverting 44d282796f81, my
WIP io_uring-using version of postgres appears to pass its tests - which
are very sparse at this point - again with 5.5-rc7.

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/4] io_uring: io_uring: add support for async work inheriting files
  2020-01-26 10:12   ` [PATCH 2/4] io_uring: io_uring: add support for async work inheriting files Andres Freund
@ 2020-01-26 17:10     ` Jens Axboe
  2020-01-26 17:17       ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2020-01-26 17:10 UTC (permalink / raw)
  To: Andres Freund; +Cc: linux-block, io-uring, davem, netdev, jannh

On 1/26/20 3:12 AM, Andres Freund wrote:
> Hi,
> 
> On 2019-10-25 11:30:35 -0600, Jens Axboe wrote:
>> This is in preparation for adding opcodes that need to add new files
>> in a process file table, system calls like open(2) or accept4(2).
>>
>> If an opcode needs this, it must set IO_WQ_WORK_NEEDS_FILES in the work
>> item. If work that needs to get punted to async context has this
>> set, the async worker will assume the original task file table before
>> executing the work.
>>
>> Note that opcodes that need access to the current files of an
>> application cannot be done through IORING_SETUP_SQPOLL.
> 
> 
> Unfortunately this partially breaks sharing a uring across forked-off
> processes, even though it initially appears to work:
> 
> 
>> +static int io_uring_flush(struct file *file, void *data)
>> +{
>> +	struct io_ring_ctx *ctx = file->private_data;
>> +
>> +	io_uring_cancel_files(ctx, data);
>> +	if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
>> +		io_wq_cancel_all(ctx->io_wq);
>> +	return 0;
>> +}
> 
> Once one process having the uring fd open (even if it were just a fork
> never touching the uring, I believe) exits, this prevents the uring from
> being usable for any async tasks. The process exiting closes the fd,
> which triggers flush. io_wq_cancel_all() sets IO_WQ_BIT_CANCEL, which
> never gets unset, which causes all future async sqes to be
> immediately returned as -ECANCELED by the worker, via io_req_cancelled.
> 
> It's not clear to me why a close() should cancel the wq (nor clear
> the entire backlog, after 1d7bb1d50fb4)? Couldn't that even just be a
> dup()ed fd? Or a fork that immediately exec()s?
> 
> After rudely ifdefing out the above if, and reverting 44d282796f81, my
> WIP io_uring-using version of postgres appears to pass its tests - which
> are very sparse at this point - again with 5.5-rc7.

We need to cancel work items using the files from this process if it
exits, but I think we should be fine not canceling all work. Especially
since the setting of IO_WQ_BIT_CANCEL is a one-way street...  I'm assuming
the below works for you?

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e5b502091804..e3ac2a6ff195 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5044,10 +5044,8 @@ static int io_uring_flush(struct file *file, void *data)
 	struct io_ring_ctx *ctx = file->private_data;
 
 	io_uring_cancel_files(ctx, data);
-	if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) {
+	if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
 		io_cqring_overflow_flush(ctx, true);
-		io_wq_cancel_all(ctx->io_wq);
-	}
 	return 0;
 }
 
-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/4] io_uring: io_uring: add support for async work inheriting files
  2020-01-26 17:10     ` Jens Axboe
@ 2020-01-26 17:17       ` Jens Axboe
  2020-01-26 20:07         ` Andres Freund
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2020-01-26 17:17 UTC (permalink / raw)
  To: Andres Freund; +Cc: linux-block, io-uring, davem, netdev, jannh

On 1/26/20 10:10 AM, Jens Axboe wrote:
> On 1/26/20 3:12 AM, Andres Freund wrote:
>> Hi,
>>
>> On 2019-10-25 11:30:35 -0600, Jens Axboe wrote:
>>> This is in preparation for adding opcodes that need to add new files
>>> in a process file table, system calls like open(2) or accept4(2).
>>>
>>> If an opcode needs this, it must set IO_WQ_WORK_NEEDS_FILES in the work
>>> item. If work that needs to get punted to async context has this
>>> set, the async worker will assume the original task file table before
>>> executing the work.
>>>
>>> Note that opcodes that need access to the current files of an
>>> application cannot be done through IORING_SETUP_SQPOLL.
>>
>>
>> Unfortunately this partially breaks sharing a uring across forked-off
>> processes, even though it initially appears to work:
>>
>>
>>> +static int io_uring_flush(struct file *file, void *data)
>>> +{
>>> +	struct io_ring_ctx *ctx = file->private_data;
>>> +
>>> +	io_uring_cancel_files(ctx, data);
>>> +	if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
>>> +		io_wq_cancel_all(ctx->io_wq);
>>> +	return 0;
>>> +}
>>
>> Once one process having the uring fd open (even if it were just a fork
>> never touching the uring, I believe) exits, this prevents the uring from
>> being usable for any async tasks. The process exiting closes the fd,
>> which triggers flush. io_wq_cancel_all() sets IO_WQ_BIT_CANCEL, which
>> never gets unset, which causes all future async sqes to be
>> immediately returned as -ECANCELED by the worker, via io_req_cancelled.
>>
>> It's not clear to me why a close() should cancel the wq (nor clear
>> the entire backlog, after 1d7bb1d50fb4)? Couldn't that even just be a
>> dup()ed fd? Or a fork that immediately exec()s?
>>
>> After rudely ifdefing out the above if, and reverting 44d282796f81, my
>> WIP io_uring-using version of postgres appears to pass its tests - which
>> are very sparse at this point - again with 5.5-rc7.
> 
> We need to cancel work items using the files from this process if it
> exits, but I think we should be fine not canceling all work. Especially
> since the setting of IO_WQ_BIT_CANCEL is a one-way street...  I'm assuming
> the below works for you?

It could be even simpler; for a shared ring setup, it also doesn't
make any sense to flush the cq ring on exit.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e5b502091804..e54556b0fcc6 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5044,10 +5044,6 @@ static int io_uring_flush(struct file *file, void *data)
 	struct io_ring_ctx *ctx = file->private_data;
 
 	io_uring_cancel_files(ctx, data);
-	if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) {
-		io_cqring_overflow_flush(ctx, true);
-		io_wq_cancel_all(ctx->io_wq);
-	}
 	return 0;
 }
 

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/4] io_uring: io_uring: add support for async work inheriting files
  2020-01-26 17:17       ` Jens Axboe
@ 2020-01-26 20:07         ` Andres Freund
  0 siblings, 0 replies; 4+ messages in thread
From: Andres Freund @ 2020-01-26 20:07 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, io-uring, davem, netdev, jannh

Hi,

On 2020-01-26 10:17:00 -0700, Jens Axboe wrote:
> On 1/26/20 10:10 AM, Jens Axboe wrote:
> >> Unfortunately this partially breaks sharing a uring across forked-off
> >> processes, even though it initially appears to work:
> >>
> >>
> >>> +static int io_uring_flush(struct file *file, void *data)
> >>> +{
> >>> +	struct io_ring_ctx *ctx = file->private_data;
> >>> +
> >>> +	io_uring_cancel_files(ctx, data);
> >>> +	if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
> >>> +		io_wq_cancel_all(ctx->io_wq);
> >>> +	return 0;
> >>> +}
> >>
> >> Once one process having the uring fd open (even if it were just a fork
> >> never touching the uring, I believe) exits, this prevents the uring from
> >> being usable for any async tasks. The process exiting closes the fd,
> >> which triggers flush. io_wq_cancel_all() sets IO_WQ_BIT_CANCEL, which
> >> never gets unset, which causes all future async sqes to be
> >> immediately returned as -ECANCELED by the worker, via io_req_cancelled.
> >>
> >> It's not clear to me why a close() should cancel the wq (nor clear
> >> the entire backlog, after 1d7bb1d50fb4)? Couldn't that even just be a
> >> dup()ed fd? Or a fork that immediately exec()s?
> >>
> >> After rudely ifdefing out the above if, and reverting 44d282796f81, my
> >> WIP io_uring-using version of postgres appears to pass its tests - which
> >> are very sparse at this point - again with 5.5-rc7.
> > 
> > We need to cancel work items using the files from this process if it
> > exits, but I think we should be fine not canceling all work. Especially
> > since the setting of IO_WQ_BIT_CANCEL is a one-way street...  I'm assuming
> > the below works for you?
> 
> It could be even simpler; for a shared ring setup, it also doesn't
> make any sense to flush the cq ring on exit.
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index e5b502091804..e54556b0fcc6 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5044,10 +5044,6 @@ static int io_uring_flush(struct file *file, void *data)
>  	struct io_ring_ctx *ctx = file->private_data;
>  
>  	io_uring_cancel_files(ctx, data);
> -	if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) {
> -		io_cqring_overflow_flush(ctx, true);
> -		io_wq_cancel_all(ctx->io_wq);
> -	}
>  	return 0;
>  }

Yeah, that's basically what I did locally, and it seems to work for my
uses.


It took me quite a while to understand why requests inside the
workqueue were being cancelled while submissions to it continued to
succeed, with nothing visibly causing the cancellations.  I wonder
whether it's a good idea to continue enqueuing new jobs if the wq is
already cancelled. E.g. during release, afaict the sqpoll thread is
still running, and could just continue pumping work into the wq?
Perhaps the sqthread (and the enter syscall? Not sure if that's
possible) need to stop submitting once the ring is being closed.


Btw, one of the reasons it took me a while to figure out what was
happening, besides not being familiar with the code, is that the flags
for individual wqes and for the whole wq are named so similarly.
Generally, for the casual reader, the non-distinctive naming of the
different parts makes it somewhat hard to follow along - like which
part is about sq offload (sqo_mm, one might think, but it's also used
for sync work, I believe), which is about the user-facing sq, which
about the workqueue, etc.

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-01-26 20:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <[email protected]>
     [not found] ` <[email protected]>
2020-01-26 10:12   ` [PATCH 2/4] io_uring: io_uring: add support for async work inheriting files Andres Freund
2020-01-26 17:10     ` Jens Axboe
2020-01-26 17:17       ` Jens Axboe
2020-01-26 20:07         ` Andres Freund

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox