* [FEATURE REQUEST] Specify a sqe won't generate a cqe
@ 2020-02-14  8:29 Carter Li 李通洲

From: Carter Li 李通洲
To: io-uring

To implement io_uring_wait_cqe_timeout, we introduce a magic number
called `LIBURING_UDATA_TIMEOUT`. The problem is twofold: we must make
sure that users never set sqe->user_data to LIBURING_UDATA_TIMEOUT,
and we introduce extra complexity to filter out TIMEOUT cqes.

Former discussion: https://github.com/axboe/liburing/issues/53

I'm suggesting a new SQE flag called IOSQE_IGNORE_CQE to solve this
problem.

An sqe tagged with the IOSQE_IGNORE_CQE flag won't generate a cqe on
completion, so IORING_OP_TIMEOUT can be filtered on the kernel side.

In addition, `IOSQE_IGNORE_CQE` can be used to save CQ space.

For example, in a `POLL_ADD(POLLIN)->READ/RECV` link chain, people
usually don't care what the result of `POLL_ADD` is (since it will
always be POLLIN), so `IOSQE_IGNORE_CQE` can be set on `POLL_ADD` to
save a lot of CQ space.

Besides POLL_ADD, people usually don't care about the results of
POLL_REMOVE/TIMEOUT_REMOVE/ASYNC_CANCEL/CLOSE. These operations can
also be tagged with IOSQE_IGNORE_CQE.

Thoughts?
* Re: [FEATURE REQUEST] Specify a sqe won't generate a cqe
@ 2020-02-14 10:34 ` Pavel Begunkov

From: Pavel Begunkov
To: Carter Li 李通洲, io-uring

On 2/14/2020 11:29 AM, Carter Li 李通洲 wrote:
> I'm suggesting a new SQE flag called IOSQE_IGNORE_CQE to solve this
> problem.
>
> An sqe tagged with the IOSQE_IGNORE_CQE flag won't generate a cqe on
> completion, so IORING_OP_TIMEOUT can be filtered on the kernel side.
> [...]
> Thoughts?

I like the idea! And that's one of my TODOs for the eBPF plans.
Let me list my use cases, so we can think about how to extend it a bit.

1. In case of a link failure, we need to reap all the -ECANCELED
completions, analyse them, and resubmit the rest. It's quite
inconvenient. We may want to have CQEs only for requests that were
not cancelled.

2. When a chain succeeds, in most cases you already know the results
of all the intermediate CQEs, but you still need to reap and match
them. I'd prefer to have only one CQE per link: either for the first
failed request or for the last request in the chain.

These two would shed much processing overhead from userspace.

3. If we generate requests by eBPF, even the notion of a per-request
event may break.
- eBPF creating new requests would also need to specify user_data, and
this may be problematic from the user's perspective.
- We may want to not generate CQEs automatically, but let eBPF do it.

--
Pavel Begunkov
* Re: [FEATURE REQUEST] Specify a sqe won't generate a cqe
@ 2020-02-14 11:27 ` Carter Li 李通洲

From: Carter Li 李通洲
To: Pavel Begunkov; +Cc: io-uring

> On Feb 14, 2020, at 18:34, Pavel Begunkov <[email protected]> wrote:
>
> 1. In case of a link failure, we need to reap all the -ECANCELED
> completions, analyse them, and resubmit the rest. It's quite
> inconvenient. We may want to have CQEs only for requests that were
> not cancelled.
>
> 2. When a chain succeeds, in most cases you already know the results
> of all the intermediate CQEs, but you still need to reap and match
> them. I'd prefer to have only one CQE per link: either for the first
> failed request or for the last request in the chain.
>
> These two would shed much processing overhead from userspace.

I couldn't agree more!

Another problem is that io_uring_enter will be woken for the
completion of every operation in a link, which results in unnecessary
context switches. When woken, users have nothing to do but issue
another io_uring_enter syscall to wait for completion of the entire
link chain.

> 3. If we generate requests by eBPF, even the notion of a per-request
> event may break.
> [...]
* Re: [FEATURE REQUEST] Specify a sqe won't generate a cqe
@ 2020-02-14 12:52 ` Pavel Begunkov

From: Pavel Begunkov
To: Carter Li 李通洲; +Cc: io-uring

On 2/14/2020 2:27 PM, Carter Li 李通洲 wrote:
> I couldn't agree more!
>
> Another problem is that io_uring_enter will be woken for the
> completion of every operation in a link, which results in unnecessary
> context switches. When woken, users have nothing to do but issue
> another io_uring_enter syscall to wait for completion of the entire
> link chain.

Good point. Sounds like I have one more thing to do :)
Would the behaviour described in (2) cover all your needs?

There is a nuisance with linked timeouts, but I think it's reasonable,
for a REQ->LINKED_TIMEOUT pair where the timeout didn't fire, to
notify only for REQ.

--
Pavel Begunkov
* Re: [FEATURE REQUEST] Specify a sqe won't generate a cqe
@ 2020-02-14 13:27 ` Carter Li 李通洲

From: Carter Li 李通洲
To: Pavel Begunkov; +Cc: io-uring

> On Feb 14, 2020, at 20:52, Pavel Begunkov <[email protected]> wrote:
>
> Good point. Sounds like I have one more thing to do :)
> Would the behaviour described in (2) cover all your needs?

(2) should cover most cases for me. For the cases it couldn't cover
(if any), I can still use normal sqes.

> There is a nuisance with linked timeouts, but I think it's reasonable,
> for a REQ->LINKED_TIMEOUT pair where the timeout didn't fire, to
> notify only for REQ.
* Re: [FEATURE REQUEST] Specify a sqe won't generate a cqe
@ 2020-02-14 14:16 ` Pavel Begunkov

From: Pavel Begunkov
To: Carter Li 李通洲; +Cc: io-uring

On 2/14/2020 4:27 PM, Carter Li 李通洲 wrote:
> (2) should cover most cases for me. For the cases it couldn't cover
> (if any), I can still use normal sqes.

Great! I need to give it some thought, what I may need for the
eBPF-steering stuff, but sounds like a plan.

--
Pavel Begunkov