From: Andres Freund <[email protected]>
To: Pavel Begunkov <[email protected]>
Cc: Jens Axboe <[email protected]>, [email protected]
Subject: Re: What does IOSQE_IO_[HARD]LINK actually mean?
Date: Sat, 1 Feb 2020 04:02:29 -0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

Hi,

On 2020-02-01 14:30:06 +0300, Pavel Begunkov wrote:
> On 01/02/2020 12:18, Andres Freund wrote:
> > Hi,
> > 
> > Reading the manpage from liburing I read:
> >        IOSQE_IO_LINK
> >               When  this  flag is specified, it forms a link with the next SQE in the submission ring. That next SQE
> >               will not be started before this one completes.  This, in effect, forms a chain of SQEs, which  can  be
> >               arbitrarily  long. The tail of the chain is denoted by the first SQE that does not have this flag set.
> >               This flag has no effect on previous SQE submissions, nor does it impact SQEs that are outside  of  the
> >               chain  tail.  This  means  that multiple chains can be executing in parallel, or chains and individual
> >               SQEs. Only members inside the chain are serialized. Available since 5.3.
> > 
> >        IOSQE_IO_HARDLINK
> >               Like IOSQE_IO_LINK, but it doesn't sever regardless of the completion result.  Note that the link will
> >               still sever if we fail submitting the parent request, hard links are only resilient in the presence of
> >               completion results for requests that did submit correctly.  IOSQE_IO_HARDLINK  implies  IOSQE_IO_LINK.
> >               Available since 5.5.
> > 
> > I can make some sense out of that description of IOSQE_IO_LINK without
> > looking at kernel code. But I don't think it's possible to understand
> > what happens when an earlier chain member fails, and what denotes an
> > error.  IOSQE_IO_HARDLINK's description kind of implies that
> > IOSQE_IO_LINK will not start the next request if there was a failure,
> > but doesn't define failure either.
> > 
> 
> Right, after a "failure" occurs for an IOSQE_IO_LINK request, all subsequent
> requests in the link won't be executed, but completed with -ECANCELED. However,
> if IOSQE_IO_HARDLINK is set for the request, it won't sever/break the link and
> will continue to the next one.

I think something along those lines should be added to the manpage. I
also think "severing the link" isn't really a good description, because
the tail isn't separated off to run independently. If anything it's the
opposite: the rest of the chain is cancelled along with the failed request.
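
To make the behaviour described above concrete, here is a small toy model
of it in plain C. It is only a sketch of my reading of this thread, not the
kernel's implementation: the names (model_req, model_run_chain, MODEL_LINK,
MODEL_HARDLINK) are made up for illustration, the flag values are not the
real IOSQE_* bits, and "failure" is assumed to mean a negative result, which
is exactly the part that is not well defined today.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model flags; these are NOT the kernel's IOSQE_* values. */
enum { MODEL_LINK = 1, MODEL_HARDLINK = 2 };

struct model_req {
    int flags;   /* how this request is linked to the NEXT request */
    int result;  /* result the operation would produce if it ran */
};

/*
 * Walk one chain starting at reqs[0] and store the completion result of
 * each member in cqe_res[]. Returns the number of requests in the chain.
 * Assumption: a request "fails" when its result is negative.
 */
size_t model_run_chain(const struct model_req *reqs, size_t n, int *cqe_res)
{
    int cancelled = 0;

    for (size_t i = 0; i < n; i++) {
        if (cancelled) {
            /* An earlier soft-linked member failed: never executed. */
            cqe_res[i] = -ECANCELED;
        } else {
            cqe_res[i] = reqs[i].result;
            /* Only a soft link propagates the failure down the chain. */
            if (reqs[i].result < 0 && !(reqs[i].flags & MODEL_HARDLINK))
                cancelled = 1;
        }
        /* The first request without a link flag is the chain tail. */
        if (!(reqs[i].flags & (MODEL_LINK | MODEL_HARDLINK)))
            return i + 1;
    }
    return n;
}
```

In this model a chain of { LINK ok, LINK fails, tail } completes the tail
with -ECANCELED, while { LINK ok, HARDLINK fails, tail } still runs the tail.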


> > Looks like it's defined in a somewhat adhoc manner. For file read/write
> > subsequent requests are failed if they are a short read/write. But
> > e.g. for sendmsg that looks not to be the case.
> > 
> 
> As you said, it's defined rather sporadically. We should unify for it to make
> sense. I'd prefer to follow the read/write pattern.

I think one problem with that is that it's not necessarily useful to
insist on the length being the maximum allowed length. E.g. for a
recvmsg you'd likely not want to fail the request if you read less than
the buffer you provided, because that's just a normal occurrence. It could
e.g. be useful to just start the next recv (with a different buffer)
immediately.

I'm not even sure it's generally sensible for read either, as that
doesn't work well for EOF, non-file FDs, ... Perhaps there's just no
good solution though.
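
For illustration, the ad hoc definition discussed above could be written
down roughly like this. Again this is only a sketch with made-up names
(model_op, model_breaks_link), not kernel code. The read/write rule and the
sendmsg exception are what I described earlier in the thread; treating
recvmsg like sendmsg, and treating any negative result as a failure, are
assumptions on my part.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operation kinds for the model; not the IORING_OP_* values. */
enum model_op {
    MODEL_OP_READ,
    MODEL_OP_WRITE,
    MODEL_OP_SENDMSG,
    MODEL_OP_RECVMSG
};

/*
 * Returns true if a soft-linked request with this result would cause the
 * rest of its chain to be cancelled, under the assumptions stated above.
 */
bool model_breaks_link(enum model_op op, long res, size_t requested)
{
    if (res < 0)
        return true;  /* assumed: an error result always breaks the link */

    switch (op) {
    case MODEL_OP_READ:
    case MODEL_OP_WRITE:
        /* File read/write: a short transfer is treated as a failure. */
        return (size_t)res != requested;
    default:
        /* sendmsg (and, by assumption, recvmsg): short is not a failure. */
        return false;
    }
}
```

With this rule a 100 byte read into a 4096 byte buffer cancels the rest of
the chain, which is what makes EOF and non-file FDs awkward, whereas the
same short result from recvmsg lets the next linked recv start.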


> > Perhaps it'd make sense to reject use of IOSQE_IO_LINK outside ops where
> > it's meaningful?
> 
> If we disregard it for either the length-based operations or the remaining
> ones (or any combination of them), the feature won't be flexible enough to
> be useful, but in combination it lets us avoid many context switches.

I really don't want to make it less useful ;) - In fact I'm pretty
excited about having it. I haven't yet implemented / benchmarked that,
but I think for databases it is likely to be very good for achieving low
but consistent IO queue depths for background tasks like checkpointing,
readahead, writeback etc, while still having a low context switch
rate. Without something like IOSQE_IO_LINK it's considerably harder to
have continuous IO that doesn't impact higher priority IO like journal
flushes.

Andres Freund
