Re: [PATCH v3 1/2] io_uring: avoid whole io_wq_work copy for requests completed inline

public inbox for [email protected]
 help / color / mirror / Atom feed

From: Jens Axboe <[email protected]>
To: Xiaoguang Wang <[email protected]>,
	Pavel Begunkov <[email protected]>,
	[email protected]
Cc: [email protected]
Subject: Re: [PATCH v3 1/2] io_uring: avoid whole io_wq_work copy for requests completed inline
Date: Sat, 30 May 2020 07:43:52 -0600	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 5/30/20 7:36 AM, Xiaoguang Wang wrote:
> hi,
> 
>> On 28/05/2020 12:15, Xiaoguang Wang wrote:
>>> If requests can be submitted and completed inline, we don't need to
>>> initialize whole io_wq_work in io_init_req(), which is an expensive
>>> operation, add a new 'REQ_F_WORK_INITIALIZED' to control whether
>>> io_wq_work is initialized.
>>
>> It looks nicer. Especially if you'd add a helper as Jens supposed.
> Sure, I'll add a helper in V4, thanks.
> 
>>
>> The other thing, even though I hate treating a part of the fields differently
>> from others, I don't like ->creds tossing either.
>>
>> Did you consider trying using only ->work.creds without adding req->creds? like
>> in the untested incremental below. init_io_work() there is misleading, should be
>> somehow played around better.
> But if not adding a new req->creds, I think there will be some potential risks.
> In current io_uring mainline codes, look at io_kiocb's memory layout
> crash> struct -o io_kiocb
> struct io_kiocb {
>          union {
>      [0]     struct file *file;
>      [0]     struct io_rw rw;
>      [0]     struct io_poll_iocb poll;
>      [0]     struct io_accept accept;
>      [0]     struct io_sync sync;
>      [0]     struct io_cancel cancel;
>      [0]     struct io_timeout timeout;
>      [0]     struct io_connect connect;
>      [0]     struct io_sr_msg sr_msg;
>      [0]     struct io_open open;
>      [0]     struct io_close close;
>      [0]     struct io_files_update files_update;
>      [0]     struct io_fadvise fadvise;
>      [0]     struct io_madvise madvise;
>      [0]     struct io_epoll epoll;
>      [0]     struct io_splice splice;
>      [0]     struct io_provide_buf pbuf;
>          };
>     [64] struct io_async_ctx *io;
>     [72] int cflags;
>     [76] u8 opcode;
>     [78] u16 buf_index;
>     [80] struct io_ring_ctx *ctx;
>     [88] struct list_head list;
>    [104] unsigned int flags;
>    [108] refcount_t refs;
>    [112] struct task_struct *task;
>    [120] unsigned long fsize;
>    [128] u64 user_data;
>    [136] u32 result;
>    [140] u32 sequence;
>    [144] struct list_head link_list;
>    [160] struct list_head inflight_entry;
>    [176] struct percpu_ref *fixed_file_refs;
>          union {
>              struct {
>    [184]         struct callback_head task_work;
>    [200]         struct hlist_node hash_node;
>    [216]         struct async_poll *apoll;
>              };
>    [184]     struct io_wq_work work;
>          };
> }
> SIZE: 240
> 
> struct io_wq_work {
>     [0] struct io_wq_work_node list;
>     [8] void (*func)(struct io_wq_work **);
>    [16] struct files_struct *files;
>    [24] struct mm_struct *mm;
>    [32] const struct cred *creds;
>    [40] struct fs_struct *fs;
>    [48] unsigned int flags;
>    [52] pid_t task_pid;
> }
> SIZE: 56
> 
> The risk mainly comes from the union:
> union {
> 	/*
> 	 * Only commands that never go async can use the below fields,
> 	 * obviously. Right now only IORING_OP_POLL_ADD uses them, and
> 	 * async armed poll handlers for regular commands. The latter
> 	 * restore the work, if needed.
> 	 */
> 	struct {
> 		struct callback_head	task_work;
> 		struct hlist_node	hash_node;
> 		struct async_poll	*apoll;
> 	};
> 	struct io_wq_work	work;
> };
> 
> 1, apoll and creds are in same memory offset, for 'async armed poll handlers' case,
> apoll will be used, that means creds will be overwrited. In patch "io_uring: avoid
> unnecessary io_wq_work copy for fast poll feature", I use REQ_F_WORK_INITIALIZED
> to control whether to do io_wq_work restore, then your below codes will break:
> 
> static inline void io_req_work_drop_env(struct io_kiocb *req)
> {
> 	/* always init'ed, put before REQ_F_WORK_INITIALIZED check */
> 	if (req->work.creds) {
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Here req->work.creds will be invalid, or I still need to use some space
> to record original req->work.creds, and do creds restore.
> 
> 		put_cred(req->work.creds);
> 		req->work.creds = NULL;
> 	}
> 	if (!(req->flags & REQ_F_WORK_INITIALIZED))
>   		return;
> 
> 2, For IORING_OP_POLL_ADD case, current mainline codes will use task_work and hash_node,
> 32 bytes, that means io_wq_work's member list, func, files and mm would be overwrited,
> but will not touch creds, it's safe now. But if we will add some new member to
> struct {
> 	struct callback_head	task_work;
> 	struct hlist_node	hash_node;
> 	struct async_poll	*apoll;
> };
> say callback_head adds a new member, our check will still break.
> 
> 3. IMO, io_wq_work is just to describe needed running environment for reqs that will be
> punted to io-wq, for reqs submitted and completed inline should not touch this struct
> from software design view, and current io_kiocb is 240 bytes, and a new pointer will be
> 248 bytes, still 4 cache lines for cache line 64 bytes.

I'd say let's do the extra creds member for now. It's safer. We can always look into
shrinking down the size and re-using the io_wq_work part. I'd rather get this
prepped up and ready for 5.8.

Are you sending out a v4 today?

-- 
Jens Axboe

next prev parent reply	other threads:[~2020-05-30 13:43 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-28  9:15 [PATCH v3 1/2] io_uring: avoid whole io_wq_work copy for requests completed inline Xiaoguang Wang
2020-05-28  9:15 ` [PATCH v3 2/2] io_uring: avoid unnecessary io_wq_work copy for fast poll feature Xiaoguang Wang
2020-05-28 17:03 ` [PATCH v3 1/2] io_uring: avoid whole io_wq_work copy for requests completed inline Jens Axboe
2020-05-29  3:58   ` Xiaoguang Wang
2020-05-29  8:58 ` Pavel Begunkov
2020-05-29 14:27   ` Jens Axboe
2020-05-30 13:36   ` Xiaoguang Wang
2020-05-30 13:43     ` Jens Axboe [this message]
2020-05-30 19:38     ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox