From: Kanchan Joshi <[email protected]>
To: Jens Axboe <[email protected]>
Cc: Christoph Hellwig <[email protected]>,
Kanchan Joshi <[email protected]>,
[email protected], [email protected], [email protected],
[email protected], [email protected],
"Matias Bjørling" <[email protected]>,
[email protected], [email protected],
[email protected], [email protected],
Selvakumar S <[email protected]>,
Nitesh Shetty <[email protected]>,
Javier Gonzalez <[email protected]>
Subject: Re: [PATCH v3 4/4] io_uring: add support for zone-append
Date: Mon, 20 Jul 2020 22:16:28 +0530
Message-ID: <CA+1E3rK9LCmB4Lt8hTLrCx7bXaF6sETWgm=M6=D6grOnGSgiRQ@mail.gmail.com>
In-Reply-To: <[email protected]>
On Fri, Jul 10, 2020 at 7:39 PM Jens Axboe <[email protected]> wrote:
>
> On 7/10/20 7:10 AM, Christoph Hellwig wrote:
> > On Fri, Jul 10, 2020 at 12:35:43AM +0530, Kanchan Joshi wrote:
> >> Append required special treatment (conversion for sector to bytes) for io_uring.
> >> And we were planning a user-space wrapper to abstract that.
> >>
> >> But good part (as it seems now) was: append result went along with cflags at
> >> virtually no additional cost. And uring code changes became super clean/minimal
> >> with further revisions.
> >> While indirect-offset requires doing allocation/mgmt in application,
> >> io-uring submission and in completion path (which seems trickier), and
> >> those CQE flags still get written user-space and serve no purpose for
> >> append-write.
> >
> > I have to say that storing the results in the CQE generally make
> > so much more sense. I wonder if we need a per-fd "large CGE" flag
> > that adds two extra u64s to the CQE, and some ops just require this
> > version.
>
> I have been pondering the same thing, we could make certain ops consume
> two CQEs if it makes sense. It's a bit ugly on the app side with two
> different CQEs for a request, though. We can't just treat it as a large
> CQE, as they might not be sequential if we happen to wrap. But maybe
> it's not too bad.
I did some work on the two-CQE scheme for zone-append.
The first CQE is the same as before, while the second CQE does not carry
res/flags and instead holds a 64-bit result that reports the append location.
It would look like this:
 struct io_uring_cqe {
        __u64   user_data;      /* sqe->data submission passed back */
-       __s32   res;            /* result code for this event */
-       __u32   flags;
+       union {
+               struct {
+                       __s32   res;    /* result code for this event */
+                       __u32   flags;
+               };
+               __u64   append_res;     /* only used for append, in secondary cqe */
+       };
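As a sanity check on the layout above, here is a minimal userspace sketch. It mirrors the proposed struct with stdint types (the name cqe_sketch and the helper cqe_sketch_size are made up for illustration, they are not part of the patch) and asserts at compile time that the union keeps the CQE at 16 bytes and that append_res overlays res/flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the proposed layout (kernel types are __u64/__s32/__u32). */
struct cqe_sketch {
	uint64_t user_data;
	union {
		struct {
			int32_t  res;
			uint32_t flags;
		};
		uint64_t append_res;	/* only meaningful in the secondary CQE */
	};
};

/* The union must not grow the entry: a CQE stays 16 bytes. */
static_assert(sizeof(struct cqe_sketch) == 16, "CQE size changed");

/* append_res overlays res/flags, so a secondary CQE has no res or flags. */
static_assert(offsetof(struct cqe_sketch, append_res) ==
	      offsetof(struct cqe_sketch, res), "append_res must alias res");

/* Hypothetical helper: size of one CQE as seen by this sketch. */
static size_t cqe_sketch_size(void)
{
	return sizeof(struct cqe_sketch);
}
```

Because the two views share storage, the application has to know which of the two CQEs it is looking at before it reads either res/flags or append_res.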
And the kernel will produce two CQEs for an append completion:
 static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
 {
-       struct io_uring_cqe *cqe;
+       struct io_uring_cqe *cqe, *cqe2 = NULL;
-       cqe = io_get_cqring(ctx);
+       if (unlikely(req->flags & REQ_F_ZONE_APPEND))
+               /* obtain two CQEs for append. NULL if two CQEs are not available */
+               cqe = io_get_two_cqring(ctx, &cqe2);
+       else
+               cqe = io_get_cqring(ctx);
+
        if (likely(cqe)) {
                WRITE_ONCE(cqe->user_data, req->user_data);
                WRITE_ONCE(cqe->res, res);
                WRITE_ONCE(cqe->flags, cflags);
+               /* update secondary cqe for zone-append */
+               if (req->flags & REQ_F_ZONE_APPEND) {
+                       WRITE_ONCE(cqe2->append_res,
+                               (u64)req->append_offset << SECTOR_SHIFT);
+                       WRITE_ONCE(cqe2->user_data, req->user_data);
+               }
        mutex_unlock(&ctx->uring_lock);
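The body of io_get_two_cqring() is not shown here, so the following is only a userspace model of what "obtain two CQEs, NULL if two are not available" could mean. The struct cq_model, the function get_two_cqes and the free-running head/tail counters are all assumptions made for this sketch, not kernel code. It also illustrates the point raised earlier in the thread: when the reservation wraps, the two entries are not adjacent in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cqe_sketch {
	uint64_t user_data;
	union {
		struct {
			int32_t  res;
			uint32_t flags;
		};
		uint64_t append_res;
	};
};

/* Hypothetical model of a CQ ring: free-running head/tail, power-of-two size. */
struct cq_model {
	uint32_t head;		/* advanced by the consumer (application) */
	uint32_t tail;		/* advanced by the producer (kernel) */
	uint32_t entries;
	struct cqe_sketch *cqes;
};

/*
 * Reserve two consecutive ring slots, or none at all.
 * Returns the first slot and stores the second in *second;
 * returns NULL (and sets *second to NULL) if fewer than two slots are free.
 */
static struct cqe_sketch *get_two_cqes(struct cq_model *cq,
				       struct cqe_sketch **second)
{
	uint32_t mask = cq->entries - 1;
	struct cqe_sketch *first;

	assert((cq->entries & mask) == 0);	/* ring size must be a power of two */

	if (cq->tail - cq->head + 2 > cq->entries) {
		*second = NULL;
		return NULL;
	}
	first = &cq->cqes[cq->tail & mask];
	*second = &cq->cqes[(cq->tail + 1) & mask];	/* may wrap to slot 0 */
	cq->tail += 2;
	return first;
}
```

With a 4-entry ring and the tail at index 3, the first CQE lands in slot 3 and the second in slot 0, so a consumer cannot treat the pair as one contiguous 32-byte record.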
This seems to work fine in the kernel.
But the application will see a few differences, such as:
- When it submits N appends and decides to wait for all completions,
  it needs to specify min_complete as 2*N (or at least 2N-1).
  For example, two appends will produce 4 completion events, and if the
  application decides to wait for both it must specify 4 (or 3).
  int io_uring_enter(unsigned int fd, unsigned int to_submit,
                     unsigned int min_complete, unsigned int flags,
                     sigset_t *sig);
- Completion-processing sequence for a mixed workload (a few reads + a few
  appends, on the same ring).
  Currently there is a one-to-one relationship: the application looks at N
  CQE entries and treats each as a distinct IO completion, so a simple for
  loop does the work.
  With the two-CQE scheme, the application has to pick apart, within a batch
  of completions, the ones for reads (one CQE) and the ones for appends
  (two CQEs), so the flow gets somewhat non-linear.
Perhaps this is not too bad, but I felt that it should be put here upfront.
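To make the two application-side differences concrete, here is a rough userspace sketch. Everything in it is an assumption made for illustration: the struct request bookkeeping, the use of user_data as an index into a request table, and the helper names min_complete_for() and reap(). It also assumes that the secondary CQE immediately follows the primary one in ring order, which matches the kernel snippet above but is not a stated guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cqe_sketch {
	uint64_t user_data;
	union {
		struct {
			int32_t  res;
			uint32_t flags;
		};
		uint64_t append_res;
	};
};

enum op_kind { OP_READ, OP_APPEND };	/* hypothetical app-side bookkeeping */

struct request {
	enum op_kind kind;
	int32_t  res;
	uint64_t append_offset;	/* valid for OP_APPEND once done is set */
	int      done;
};

/* CQEs to wait for so that every submitted request has fully completed. */
static unsigned int min_complete_for(unsigned int reads, unsigned int appends)
{
	return reads + 2 * appends;
}

/*
 * Walk up to `count` CQEs starting at ring index `head`.
 * user_data is assumed to be an index into reqs[].
 * Returns the number of CQEs consumed; an append whose secondary CQE
 * is not in this batch is left untouched for the next pass.
 */
static uint32_t reap(const struct cqe_sketch *cqes, uint32_t entries,
		     uint32_t head, uint32_t count, struct request *reqs)
{
	uint32_t mask = entries - 1;
	uint32_t i = 0;

	assert((entries & mask) == 0);	/* ring size must be a power of two */

	while (i < count) {
		const struct cqe_sketch *cqe = &cqes[(head + i) & mask];
		struct request *req = &reqs[cqe->user_data];
		uint32_t need = (req->kind == OP_APPEND) ? 2 : 1;

		if (count - i < need)
			break;
		req->res = cqe->res;
		if (req->kind == OP_APPEND) {
			/* secondary CQE: same user_data, 64-bit append location */
			const struct cqe_sketch *cqe2 =
				&cqes[(head + i + 1) & mask];
			req->append_offset = cqe2->append_res;
		}
		req->done = 1;
		i += need;
	}
	return i;
}
```

The loop is no longer a plain one-CQE-per-iteration walk: the step size depends on the request type, and the application needs its own record of which user_data values belong to appends.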
--
Kanchan Joshi