* What does IOSQE_IO_[HARD]LINK actually mean? @ 2020-02-01 9:18 Andres Freund 2020-02-01 11:30 ` Pavel Begunkov 2020-02-01 18:06 ` Jens Axboe 0 siblings, 2 replies; 6+ messages in thread From: Andres Freund @ 2020-02-01 9:18 UTC (permalink / raw) To: Jens Axboe, io-uring Hi, Reading the manpage from liburing I read: IOSQE_IO_LINK When this flag is specified, it forms a link with the next SQE in the submission ring. That next SQE will not be started before this one completes. This, in effect, forms a chain of SQEs, which can be arbitrarily long. The tail of the chain is denoted by the first SQE that does not have this flag set. This flag has no effect on previous SQE submissions, nor does it impact SQEs that are outside of the chain tail. This means that multiple chains can be executing in parallel, or chains and individual SQEs. Only members inside the chain are serialized. Available since 5.3. IOSQE_IO_HARDLINK Like IOSQE_IO_LINK, but it doesn't sever regardless of the completion result. Note that the link will still sever if we fail submitting the parent request, hard links are only resilient in the presence of completion results for requests that did submit correctly. IOSQE_IO_HARDLINK implies IOSQE_IO_LINK. Available since 5.5. I can make some sense out of that description of IOSQE_IO_LINK without looking at kernel code. But I don't think it's possible to understand what happens when an earlier chain member fails, and what denotes an error. IOSQE_IO_HARDLINK's description kind of implies that IOSQE_IO_LINK will not start the next request if there was a failure, but doesn't define failure either. Looks like it's defined in a somewhat adhoc manner. For file read/write subsequent requests are failed if they are a short read/write. But e.g. for sendmsg that looks not to be the case. Perhaps it'd make sense to reject use of IOSQE_IO_LINK outside ops where it's meaningful? Or maybe I'm just missing something. 
Greetings, Andres Freund ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: What does IOSQE_IO_[HARD]LINK actually mean? 2020-02-01 9:18 What does IOSQE_IO_[HARD]LINK actually mean? Andres Freund @ 2020-02-01 11:30 ` Pavel Begunkov 2020-02-01 12:02 ` Andres Freund 2020-02-01 18:06 ` Jens Axboe 1 sibling, 1 reply; 6+ messages in thread From: Pavel Begunkov @ 2020-02-01 11:30 UTC (permalink / raw) To: Andres Freund, Jens Axboe, io-uring [-- Attachment #1.1: Type: text/plain, Size: 2594 bytes --] On 01/02/2020 12:18, Andres Freund wrote: > Hi, > > Reading the manpage from liburing I read: > IOSQE_IO_LINK > When this flag is specified, it forms a link with the next SQE in the submission ring. That next SQE > will not be started before this one completes. This, in effect, forms a chain of SQEs, which can be > arbitrarily long. The tail of the chain is denoted by the first SQE that does not have this flag set. > This flag has no effect on previous SQE submissions, nor does it impact SQEs that are outside of the > chain tail. This means that multiple chains can be executing in parallel, or chains and individual > SQEs. Only members inside the chain are serialized. Available since 5.3. > > IOSQE_IO_HARDLINK > Like IOSQE_IO_LINK, but it doesn't sever regardless of the completion result. Note that the link will > still sever if we fail submitting the parent request, hard links are only resilient in the presence of > completion results for requests that did submit correctly. IOSQE_IO_HARDLINK implies IOSQE_IO_LINK. > Available since 5.5. > > I can make some sense out of that description of IOSQE_IO_LINK without > looking at kernel code. But I don't think it's possible to understand > what happens when an earlier chain member fails, and what denotes an > error. IOSQE_IO_HARDLINK's description kind of implies that > IOSQE_IO_LINK will not start the next request if there was a failure, > but doesn't define failure either. 
>

Right, after a "failure" occurs for an IOSQE_IO_LINK request, all subsequent
requests in the link won't be executed, but will be completed with -ECANCELED.
However, if IOSQE_IO_HARDLINK is set for the request, it won't sever/break the
link and will continue to the next one.

> Looks like it's defined in a somewhat adhoc manner. For file read/write
> subsequent requests are failed if they are a short read/write. But
> e.g. for sendmsg that looks not to be the case.
>

As you said, it's defined rather sporadically. We should unify it for it to make
sense. I'd prefer to follow the read/write pattern.

> Perhaps it'd make sense to reject use of IOSQE_IO_LINK outside ops where
> it's meaningful?

If we disregard it for either length-based operations or the rest (or
whatever combination), the feature won't be flexible enough to be useful,
but in combination it allows removing a lot of context switches.

-- 
Pavel Begunkov
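The behaviour described in this reply can be sketched as a small user-space model. This is not liburing or kernel code: `model_req`, `model_run_chain`, `MODEL_LINK` and `MODEL_HARDLINK` are hypothetical names made up for illustration, and "failure" is simplified to "negative result", which is an assumption (the thread notes that the real definition varies per opcode).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flags for this model only; not the kernel's IOSQE_* values. */
enum { MODEL_LINK = 1 << 0, MODEL_HARDLINK = 1 << 1 };

struct model_req {
    unsigned flags;   /* MODEL_LINK or MODEL_HARDLINK; 0 marks the chain tail */
    int would_return; /* result the request would produce if it actually ran */
    int result;       /* filled in: would_return, or -ECANCELED if never run */
};

/*
 * Walks one chain starting at reqs[0] and returns how many requests it
 * consumed. Assumption: a request "fails" when its result is negative.
 */
size_t model_run_chain(struct model_req *reqs, size_t n)
{
    int broken = 0;
    size_t i = 0;

    assert(reqs != NULL || n == 0);

    while (i < n) {
        struct model_req *r = &reqs[i++];

        /* Once the chain is broken, later members are not executed. */
        r->result = broken ? -ECANCELED : r->would_return;

        /* A failed normal link breaks the chain; a hard link does not. */
        if (r->result < 0 && !(r->flags & MODEL_HARDLINK))
            broken = 1;

        /* The first request without a link flag is the tail of the chain. */
        if (!(r->flags & (MODEL_LINK | MODEL_HARDLINK)))
            break;
    }
    return i;
}
```

In this model, a three-request chain whose first member carries `MODEL_LINK` and fails with `-EIO` ends with both later members at `-ECANCELED`; with `MODEL_HARDLINK` on the first member instead, the later members still run and keep their own results.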
* Re: What does IOSQE_IO_[HARD]LINK actually mean? 2020-02-01 11:30 ` Pavel Begunkov @ 2020-02-01 12:02 ` Andres Freund 2020-02-01 15:28 ` Pavel Begunkov 0 siblings, 1 reply; 6+ messages in thread From: Andres Freund @ 2020-02-01 12:02 UTC (permalink / raw) To: Pavel Begunkov; +Cc: Jens Axboe, io-uring Hi, On 2020-02-01 14:30:06 +0300, Pavel Begunkov wrote: > On 01/02/2020 12:18, Andres Freund wrote: > > Hi, > > > > Reading the manpage from liburing I read: > > IOSQE_IO_LINK > > When this flag is specified, it forms a link with the next SQE in the submission ring. That next SQE > > will not be started before this one completes. This, in effect, forms a chain of SQEs, which can be > > arbitrarily long. The tail of the chain is denoted by the first SQE that does not have this flag set. > > This flag has no effect on previous SQE submissions, nor does it impact SQEs that are outside of the > > chain tail. This means that multiple chains can be executing in parallel, or chains and individual > > SQEs. Only members inside the chain are serialized. Available since 5.3. > > > > IOSQE_IO_HARDLINK > > Like IOSQE_IO_LINK, but it doesn't sever regardless of the completion result. Note that the link will > > still sever if we fail submitting the parent request, hard links are only resilient in the presence of > > completion results for requests that did submit correctly. IOSQE_IO_HARDLINK implies IOSQE_IO_LINK. > > Available since 5.5. > > > > I can make some sense out of that description of IOSQE_IO_LINK without > > looking at kernel code. But I don't think it's possible to understand > > what happens when an earlier chain member fails, and what denotes an > > error. IOSQE_IO_HARDLINK's description kind of implies that > > IOSQE_IO_LINK will not start the next request if there was a failure, > > but doesn't define failure either. 
> >
>
> Right, after a "failure" occurs for an IOSQE_IO_LINK request, all subsequent
> requests in the link won't be executed, but will be completed with -ECANCELED.
> However, if IOSQE_IO_HARDLINK is set for the request, it won't sever/break the
> link and will continue to the next one.

I think something along those lines should be added to the manpage... I
think severing the link isn't really a good description, because it's
not like it's separating off the tail to be independent, or such. If
anything it's the opposite.

> > Looks like it's defined in a somewhat adhoc manner. For file read/write
> > subsequent requests are failed if they are a short read/write. But
> > e.g. for sendmsg that looks not to be the case.
> >
>
> As you said, it's defined rather sporadically. We should unify it for it to make
> sense. I'd prefer to follow the read/write pattern.

I think one problem with that is that it's not necessarily useful to
insist on the length being the maximum allowed length. E.g. for a
recvmsg you'd likely want to not fail the request if you read less than
what you provided for, because that's just a normal occurrence. It could
e.g. be useful to just start the next recv (with a different buffer)
immediately.

I'm not even sure it's generally sensible for read either, as that
doesn't work well for EOF, non-file FDs, ... Perhaps there's just no
good solution though.

> > Perhaps it'd make sense to reject use of IOSQE_IO_LINK outside ops where
> > it's meaningful?
>
> If we disregard it for either length-based operations or the rest (or
> whatever combination), the feature won't be flexible enough to be useful,
> but in combination it allows removing a lot of context switches.

I really don't want to make it less useful ;) - In fact I'm pretty
excited about having it. I haven't yet implemented / benchmarked that,
but I think for databases it is likely to be very good to achieve low
but consistent IO queue depths for background tasks like checkpointing,
readahead, writeback etc, while still having a low context switch
rate. Without something like IOSQE_IO_LINK it's considerably harder to
have continuous IO that doesn't impact higher priority IO like journal
flushes.

Andres Freund
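To make the per-opcode inconsistency discussed above concrete, here is a rough sketch of the rule as the thread describes it: a short file read/write fails the rest of a normal link, while a short sendmsg apparently does not. The names `model_op` and `model_breaks_link` are hypothetical, the treatment of negative results as failure for every opcode is an assumption, and none of this is taken from kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode labels for this sketch; not the IORING_OP_* values. */
enum model_op { MODEL_OP_READ, MODEL_OP_WRITE, MODEL_OP_SENDMSG };

/*
 * Returns true if, per the behaviour described in the thread, a completion
 * with result `res` for a request of `requested` bytes would fail the
 * remaining members of a normal (non-hard) link.
 */
bool model_breaks_link(enum model_op op, int res, int requested)
{
    assert(requested >= 0);

    /* Assumption: an error result breaks the link for every opcode. */
    if (res < 0)
        return true;

    switch (op) {
    case MODEL_OP_READ:
    case MODEL_OP_WRITE:
        /* File read/write: a short transfer counts as a failure. */
        return res != requested;
    default:
        /* sendmsg: a short transfer is reported as looking like success. */
        return false;
    }
}
```

Under this sketch, a read of 100 bytes against a 4096-byte request breaks the chain, which is exactly the EOF and non-file-FD case raised above as a problem, while the same short count on a sendmsg would not.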
* Re: What does IOSQE_IO_[HARD]LINK actually mean?
  2020-02-01 12:02 ` Andres Freund
@ 2020-02-01 15:28 ` Pavel Begunkov
  0 siblings, 0 replies; 6+ messages in thread
From: Pavel Begunkov @ 2020-02-01 15:28 UTC (permalink / raw)
  To: Andres Freund; +Cc: Jens Axboe, io-uring

On 01/02/2020 15:02, Andres Freund wrote:
>> Right, after a "failure" occurs for an IOSQE_IO_LINK request, all subsequent
>> requests in the link won't be executed, but will be completed with -ECANCELED.
>> However, if IOSQE_IO_HARDLINK is set for the request, it won't sever/break the
>> link and will continue to the next one.
>
> I think something along those lines should be added to the manpage... I
> think severing the link isn't really a good description, because it's
> not like it's separating off the tail to be independent, or such. If
> anything it's the opposite.
>
>
>>> Looks like it's defined in a somewhat adhoc manner. For file read/write
>>> subsequent requests are failed if they are a short read/write. But
>>> e.g. for sendmsg that looks not to be the case.
>>>
>>
>> As you said, it's defined rather sporadically. We should unify it for it to make
>> sense. I'd prefer to follow the read/write pattern.
>
> I think one problem with that is that it's not necessarily useful to
> insist on the length being the maximum allowed length. E.g. for a
> recvmsg you'd likely want to not fail the request if you read less than
> what you provided for, because that's just a normal occurrence. It could
> e.g. be useful to just start the next recv (with a different buffer)
> immediately.
>
> I'm not even sure it's generally sensible for read either, as that
> doesn't work well for EOF, non-file FDs, ... Perhaps there's just no
> good solution though.

People have already asked about this; you can find the discussion somewhere in
the GitHub issues for liburing. In short, there are a lot of different patterns,
and it's not viable to implement them all in the kernel.

There are thoughts, ideas and plans around using BPF to deal with that. I've
sent an LSF/MM/BPF topic proposal about exactly that.

>
>
>>> Perhaps it'd make sense to reject use of IOSQE_IO_LINK outside ops where
>>> it's meaningful?
>>
>> If we disregard it for either length-based operations or the rest (or
>> whatever combination), the feature won't be flexible enough to be useful,
>> but in combination it allows removing a lot of context switches.
>
> I really don't want to make it less useful ;) - In fact I'm pretty
> excited about having it. I haven't yet implemented / benchmarked that,
> but I think for databases it is likely to be very good to achieve low
> but consistent IO queue depths for background tasks like checkpointing,
> readahead, writeback etc, while still having a low context switch
> rate. Without something like IOSQE_IO_LINK it's considerably harder to
> have continuous IO that doesn't impact higher priority IO like journal
> flushes.
>
> Andres Freund
>

-- 
Pavel Begunkov
* Re: What does IOSQE_IO_[HARD]LINK actually mean? 2020-02-01 9:18 What does IOSQE_IO_[HARD]LINK actually mean? Andres Freund 2020-02-01 11:30 ` Pavel Begunkov @ 2020-02-01 18:06 ` Jens Axboe 2020-02-02 7:36 ` Andres Freund 1 sibling, 1 reply; 6+ messages in thread From: Jens Axboe @ 2020-02-01 18:06 UTC (permalink / raw) To: Andres Freund, io-uring On 2/1/20 2:18 AM, Andres Freund wrote: > Hi, > > Reading the manpage from liburing I read: > IOSQE_IO_LINK > When this flag is specified, it forms a link with the next SQE in the submission ring. That next SQE > will not be started before this one completes. This, in effect, forms a chain of SQEs, which can be > arbitrarily long. The tail of the chain is denoted by the first SQE that does not have this flag set. > This flag has no effect on previous SQE submissions, nor does it impact SQEs that are outside of the > chain tail. This means that multiple chains can be executing in parallel, or chains and individual > SQEs. Only members inside the chain are serialized. Available since 5.3. > > IOSQE_IO_HARDLINK > Like IOSQE_IO_LINK, but it doesn't sever regardless of the completion result. Note that the link will > still sever if we fail submitting the parent request, hard links are only resilient in the presence of > completion results for requests that did submit correctly. IOSQE_IO_HARDLINK implies IOSQE_IO_LINK. > Available since 5.5. > > I can make some sense out of that description of IOSQE_IO_LINK without > looking at kernel code. But I don't think it's possible to understand > what happens when an earlier chain member fails, and what denotes an > error. IOSQE_IO_HARDLINK's description kind of implies that > IOSQE_IO_LINK will not start the next request if there was a failure, > but doesn't define failure either. 
I won't touch on the rest since Pavel already did, but I did expand the explanation of when a normal link will sever, and how: https://git.kernel.dk/cgit/liburing/commit/?id=9416351377f04211f859667f39a58d2a223cbd21 LSFMM will have a session on BPF with io_uring, which we'll need to have full control of links outside of the basic use cases. -- Jens Axboe ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: What does IOSQE_IO_[HARD]LINK actually mean? 2020-02-01 18:06 ` Jens Axboe @ 2020-02-02 7:36 ` Andres Freund 0 siblings, 0 replies; 6+ messages in thread From: Andres Freund @ 2020-02-02 7:36 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring On 2020-02-01 11:06:28 -0700, Jens Axboe wrote: > I won't touch on the rest since Pavel already did, but I did expand the > explanation of when a normal link will sever, and how: Awesome. ^ permalink raw reply [flat|nested] 6+ messages in thread