public inbox for [email protected]
 help / color / mirror / Atom feed
From: Stefan Roesch <[email protected]>
To: Matthew Wilcox <[email protected]>
Cc: [email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected]
Subject: Re: [PATCH v8 06/14] iomap: Return -EAGAIN from iomap_write_iter()
Date: Thu, 9 Jun 2022 11:49:34 -0700	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>



On 6/8/22 12:02 PM, Matthew Wilcox wrote:
> On Wed, Jun 08, 2022 at 10:17:33AM -0700, Stefan Roesch wrote:
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index b06a5c24a4db..f701dcb7c26a 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -829,7 +829,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>>  		length -= status;
>>  	} while (iov_iter_count(i) && length);
>>  
>> -	return written ? written : status;
>> +	if (status == -EAGAIN) {
>> +		iov_iter_revert(i, written);
>> +		return -EAGAIN;
>> +	}
>> +	if (written)
>> +		return written;
>> +	return status;
>>  }
> 
> I still don't understand how this can possibly work.  Walk me through it.
> 
> Let's imagine we have a file laid out such that extent 1 is bytes
> 0-4095 of the file and extent 2 is extent 4096-16385 of the file.
> We do a write of 5000 bytes starting at offset 4000 of the file.
> 
> iomap_iter() tells us about the first extent and we write the first
> 96 bytes of our data to the first extent, returning 96.  iomap_iter()
> tells us about the second extent, and we write the next 4000 bytes to
> the second extent.  Then we get a page fault and get to the -EAGAIN case.
> We rewind the iter 4000 bytes.
> 

We have two data structures, the iomap_iter and iov_iter. After the first
96 bytes, the iov_iter offset get updated in iomap_write_iter() and then the
iomap_iter pos gets updated in iomap_iter()->iomap_iter_advance().

We then get the second extend from iomap_iter(). In iomap_write_iter() the
first page is obtained and written successfully, then the second page is
faulted. At this point the iov offset of the iov_iter has advanced. To reset
it to the state when the function iomap_write_iter() was entered, the iov_iter
is reset to iov_offset - written bytes.

iomap_write_iter() is exited and returns -EAGAIN. As iomap_write_iter() returns
an error, the iomap_iter pos is not updated in iomap_iter(). Only the number
of bytes written in the write of the first extent from iomap_file_buffered_write()
is returned from iomap_file_buffered_write().

In xfs_file_buffered_write we updated the iocb->ki_pos with the number of
bytes written. In io-uring, the io_write() call receives the short write result.
It copies the iov_iter struct into the work context for the io worker.
The io_worker uses that information to complete the rest of the write.

The above reset is required to keep the pos in iomap_iter and the offset in
iov_iter in sync.



Side Note:

I had an earlier version of the patch that was changing the signature of the
function iomap_write_iter(). It was returning a return code and changing the
processed value of the iomap_iter (which then also changes the pos value of 
the iomap_iter). This version (version 7 of the patch) does not require to
reset the offset of the iov_iter. It can update the pos in iomap_iter even
when -EAGAIN is returned.



> How do we not end up writing garbage when the kworker does the retry?
> I'd understand if we rewound the iter all the way to the start.  Or if
> we didn't rewind the iter at all and were able to pick up partway through
> the write.  But rewinding to the start of the extent feels like it can't
> possibly work.


  reply	other threads:[~2022-06-09 18:50 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-08 17:17 [PATCH v8 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 01/14] mm: Move starting of background writeback into the main balancing loop Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 02/14] mm: Move updates of dirty_exceeded into one place Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 03/14] mm: Add balance_dirty_pages_ratelimited_flags() function Stefan Roesch
2022-06-08 18:44   ` Matthew Wilcox
2022-06-08 17:17 ` [PATCH v8 04/14] iomap: Add flags parameter to iomap_page_create() Stefan Roesch
2022-06-08 18:51   ` Matthew Wilcox
2022-06-08 17:17 ` [PATCH v8 05/14] iomap: Add async buffered write support Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 06/14] iomap: Return -EAGAIN from iomap_write_iter() Stefan Roesch
2022-06-08 19:02   ` Matthew Wilcox
2022-06-09 18:49     ` Stefan Roesch [this message]
2022-06-08 17:17 ` [PATCH v8 07/14] fs: Add check for async buffered writes to generic_write_checks Stefan Roesch
2022-06-10 11:50   ` Christian Brauner
2022-06-08 17:17 ` [PATCH v8 08/14] fs: add __remove_file_privs() with flags parameter Stefan Roesch
2022-06-10 11:53   ` Christian Brauner
2022-06-08 17:17 ` [PATCH v8 09/14] fs: Split off inode_needs_update_time and __file_update_time Stefan Roesch
2022-06-10 11:55   ` Christian Brauner
2022-06-08 17:17 ` [PATCH v8 10/14] fs: Add async write file modification handling Stefan Roesch
2022-06-10 12:38   ` Christian Brauner
2022-06-08 17:17 ` [PATCH v8 11/14] io_uring: Add support for async buffered writes Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 12/14] io_uring: Add tracepoint for short writes Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 13/14] xfs: Specify lockmode when calling xfs_ilock_for_iomap() Stefan Roesch
2022-06-08 17:17 ` [PATCH v8 14/14] xfs: Add async buffered write support Stefan Roesch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox