public inbox for [email protected]
From: Mark Harmstone <[email protected]>
To: Jens Axboe <[email protected]>,
	"[email protected]" <[email protected]>,
	"[email protected]" <[email protected]>
Subject: Re: [PATCH] btrfs: add io_uring interface for encoded writes
Date: Fri, 15 Nov 2024 17:29:43 +0000
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 12/11/24 21:01, Jens Axboe wrote:
> > 
> On 11/12/24 9:29 AM, Mark Harmstone wrote:
>> Add an io_uring interface for encoded writes, with the same parameters
>> as the BTRFS_IOC_ENCODED_WRITE ioctl.
>>
>> As with the encoded reads code, there's a test program for this at
>> https://github.com/maharmstone/io_uring-encoded, and I'll get this
>> worked into an fstest.
>>
>> How io_uring works is that it initially calls btrfs_uring_cmd with the
>> IO_URING_F_NONBLOCK flag set, and if we return -EAGAIN it tries again in
>> a kthread with the flag cleared.
>      ^^^^^^^^
> 
> Not a kernel thread, it's an io worker. The distinction may seem
> irrelevant, but it's really not - io workers inherit all the properties
> of the original task.
> 
>> Ideally we'd honour this and call try_lock etc., but there's still a lot
>> of work to be done to create non-blocking versions of all the functions
>> in our write path. Instead, just validate the input in
>> btrfs_uring_encoded_write() on the first pass and return -EAGAIN, with a
>> view to properly optimizing the happy path later on.
> 
> But you need to ensure stable state after the first issue, regardless of
> how you handle it. I don't have the other patches handy, but whatever
> you copy from userspace before you return -EAGAIN, you should not be
> copying again. By the time you get the 2nd invocation from io-wq, no
> copying should be taking place, you should be using the state you
> already ensured was stable for the non-blocking issue.
> 
> Maybe this is all handled by the caller of btrfs_uring_encoded_write()
> already? As far as looking at the code below, it just looks like it
> copies everything, then returns -EAGAIN, then copies it again later? Yes
> uring_cmd will make the sqe itself stable, but:
> 
> 	sqe_addr = u64_to_user_ptr(READ_ONCE(cmd->sqe->addr));
> 
> the userspace btrfs_ioctl_encoded_io_args that sqe->addr points to
> should remain stable as well. If not, consider userspace doing:
> 
> some_func()
> {
> 	struct btrfs_ioctl_encoded_io_args args;
> 
> 	fill_in_args(&args);
> 	sqe = io_uring_get_sqe(ring);
> 	sqe->addr = &args;
> 	io_uring_submit();		<- initial invocation here
> }
> 
> main_func()
> {
> 	some_func();
> 				- io-wq invocation perhaps here
> 	wait_on_cqes();
> }
> 
> where io-wq will be reading garbage as args went out of scope, unless
> some_func() used a stable/heap struct that isn't freed until completion.
> some_func() can obviously wait on the cqe, but at that point you'd be
> using it as a sync interface, and there's little point.
> 
> This is why io_kiocb->async_data exists. uring_cmd is already using that
> for the sqe, I think you'd want to add a 2nd "void *op_data" or
> something in there, and have the uring_cmd alloc cache get clear that to
> NULL and have uring_cmd alloc cache put kfree() it if it's non-NULL.
> 
> We'd also need to move the uring_cache struct into
> include/linux/io_uring_types.h so that btrfs can get to it (and probably
> rename it to something saner, uring_cmd_async_data for example).
> 
> static int btrfs_uring_encoded_write(struct io_uring_cmd *cmd, unsigned int issue_flags)
> {
> 	struct io_kiocb *req = cmd_to_io_kiocb(cmd);
> 	struct uring_cmd_async_data *data = req->async_data;
> 	struct btrfs_ioctl_encoded_io_args *args;
> 
> 	if (!data->op_data) {
> 		data->op_data = kmalloc(sizeof(*args), GFP_NOIO);
> 		if (!data->op_data)
> 			return -ENOMEM;
> 		if (copy_from_user(data->op_data, sqe_addr, sizeof(*args)))
> 			return -EFAULT;
> 	}
> 	...
> }
> 
> and have it be stable, then moving your copying into a helper rather
> than inline in btrfs_uring_encoded_write() (it probably should be
> regardless). Ignored the compat above, it's just pseudo code.
> 
> Anyway, hope that helps. I'll be happy to do the uring_cmd bit for you,
> but it really should be pretty straightforward.
> 
> I'm also pondering if the encoded read side suffers from the same issue?
> 

Thanks Jens, that makes sense to me.

Mark
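
The copy-once pattern from the quoted reply can be tried out in user
space. What follows is a minimal sketch under stated assumptions, not
kernel code: fake_args, fake_async_data, fake_copy_from_user(),
fake_encoded_write() and fake_async_data_put() are hypothetical
stand-ins invented for illustration, and op_data is only the field name
proposed in the quoted reply. It shows the first (non-blocking) issue
copying the arguments and returning -EAGAIN, and the second issue
reusing the cached copy without touching the caller's memory again.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct btrfs_ioctl_encoded_io_args. */
struct fake_args {
	long long offset;
	unsigned long long len;
};

/* Hypothetical stand-in for the proposed uring_cmd_async_data. */
struct fake_async_data {
	void *op_data;	/* NULL until the first issue has copied the args */
};

/* Hypothetical stand-in for IO_URING_F_NONBLOCK. */
#define FAKE_F_NONBLOCK 1u

/* Stand-in for copy_from_user(): returns 0 on success, non-zero on fault. */
static int fake_copy_from_user(void *dst, const void *src, size_t n)
{
	if (!src)
		return 1;
	memcpy(dst, src, n);
	return 0;
}

/*
 * Copy the arguments exactly once, on whichever issue comes first.
 * A non-blocking issue then punts with -EAGAIN; the later blocking
 * issue works purely from the cached copy in data->op_data.
 */
static int fake_encoded_write(struct fake_async_data *data,
			      const struct fake_args *user_ptr,
			      unsigned int issue_flags,
			      struct fake_args *out)
{
	if (!data->op_data) {
		data->op_data = malloc(sizeof(struct fake_args));
		if (!data->op_data)
			return -ENOMEM;
		if (fake_copy_from_user(data->op_data, user_ptr,
					sizeof(struct fake_args))) {
			/* Do not leave a half-initialised copy behind. */
			free(data->op_data);
			data->op_data = NULL;
			return -EFAULT;
		}
	}

	if (issue_flags & FAKE_F_NONBLOCK)
		return -EAGAIN;

	/* The "real work": here we only report which args were used. */
	*out = *(struct fake_args *)data->op_data;
	return 0;
}

/* Mirrors the suggested "alloc cache put" step: free op_data if set. */
static void fake_async_data_put(struct fake_async_data *data)
{
	free(data->op_data);
	data->op_data = NULL;
}
```

If the caller overwrites or frees its own struct between the two
calls, the second call still sees the values captured by the first,
which is the stability property asked for above.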

Thread overview: 6+ messages
2024-11-12 16:29 [PATCH] btrfs: add io_uring interface for encoded writes Mark Harmstone
2024-11-12 21:01 ` Jens Axboe
2024-11-12 21:11   ` Jens Axboe
2024-11-20 15:50     ` Mark Harmstone
2024-11-12 21:19   ` Jens Axboe
2024-11-15 17:29   ` Mark Harmstone [this message]
