* [PATCH 1/1] io_uring: fix leaks on IOPOLL and CQE_SKIP
From: Pavel Begunkov @ 2022-04-12 16:24 UTC
To: io-uring; +Cc: Jens Axboe, asml.silence

If all completed requests in io_do_iopoll() were marked with
REQ_F_CQE_SKIP, we'll not only skip CQE posting but also
io_free_batch_list() leaking memory and resources.

Move @nr_events increment before REQ_F_CQE_SKIP check. We'll potentially
return the value greater than the real one, but iopolling will deal with
it and the userspace will re-iopoll if needed. In anyway, I don't think
there are many use cases for REQ_F_CQE_SKIP + IOPOLL.

Fixes: 83a13a4181b0e ("io_uring: tweak iopoll CQE_SKIP event counting")
Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index cbd876c023b1..738ec10f038f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2872,11 +2872,10 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 		/* order with io_complete_rw_iopoll(), e.g. ->result updates */
 		if (!smp_load_acquire(&req->iopoll_completed))
 			break;
+		nr_events++;
 		if (unlikely(req->flags & REQ_F_CQE_SKIP))
 			continue;
-
 		__io_fill_cqe_req(req, req->cqe.res, io_put_kbuf(req, 0));
-		nr_events++;
 	}
 
 	if (unlikely(!nr_events))
-- 
2.35.1
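The effect of moving the increment can be illustrated with a small user-space model of the counting loop. This is only a sketch under stated assumptions: struct model_req, struct model_result, and model_iopoll() are hypothetical names invented for illustration, not kernel code, and the "freed" flag merely stands in for reaching io_free_batch_list() after the "if (unlikely(!nr_events))" check shown in the diff context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a polled request; not the kernel's io_kiocb. */
struct model_req {
	bool iopoll_completed;	/* the polled I/O has finished */
	bool cqe_skip;		/* models REQ_F_CQE_SKIP being set */
};

/* What one pass of the modelled loop produced. */
struct model_result {
	int nr_events;		/* value computed by the counting loop */
	int cqes_posted;	/* CQEs that would actually be filled */
	bool freed;		/* whether the free path would be reached */
};

/*
 * count_before_skip == false models the loop before the patch,
 * count_before_skip == true models the loop after the patch.
 */
static struct model_result model_iopoll(const struct model_req *reqs,
					size_t n, bool count_before_skip)
{
	struct model_result r = { 0, 0, false };

	assert(reqs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!reqs[i].iopoll_completed)
			break;
		if (count_before_skip)
			r.nr_events++;		/* patched placement */
		if (reqs[i].cqe_skip)
			continue;
		r.cqes_posted++;
		if (!count_before_skip)
			r.nr_events++;		/* original placement */
	}

	/* Models the early exit: with zero events nothing gets freed. */
	if (r.nr_events == 0)
		return r;

	r.freed = true;
	return r;
}
```

With two completed requests that both carry the skip flag, the pre-patch variant reports zero events and never reaches the free path, while the patched variant reports two events, posts no CQEs, and does reach it. That gap between nr_events and cqes_posted is what the eventfd discussion later in the thread is about.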
* Re: [PATCH 1/1] io_uring: fix leaks on IOPOLL and CQE_SKIP
From: Jens Axboe @ 2022-04-12 16:41 UTC
To: Pavel Begunkov, io-uring

On 4/12/22 10:24 AM, Pavel Begunkov wrote:
> If all completed requests in io_do_iopoll() were marked with
> REQ_F_CQE_SKIP, we'll not only skip CQE posting but also
> io_free_batch_list() leaking memory and resources.
>
> Move @nr_events increment before REQ_F_CQE_SKIP check. We'll potentially
> return the value greater than the real one, but iopolling will deal with
> it and the userspace will re-iopoll if needed. In anyway, I don't think
> there are many use cases for REQ_F_CQE_SKIP + IOPOLL.

Ah good catch - yes probably not much practical concern, as the lack of
ordering for file IO means that CQE_SKIP isn't really useful for that
scenario.

-- 
Jens Axboe
* Re: [PATCH 1/1] io_uring: fix leaks on IOPOLL and CQE_SKIP
From: Jens Axboe @ 2022-04-12 16:46 UTC
To: Pavel Begunkov, io-uring

On 4/12/22 10:41 AM, Jens Axboe wrote:
> Ah good catch - yes probably not much practical concern, as the lack of
> ordering for file IO means that CQE_SKIP isn't really useful for that
> scenario.

One potential snag is that, with the change, we're now doing
io_cqring_ev_posted_iopoll() even if we didn't post an event. Again
probably not a practical concern, but it is theoretically a violation
if an eventfd is used.

-- 
Jens Axboe
* Re: [PATCH 1/1] io_uring: fix leaks on IOPOLL and CQE_SKIP
From: Pavel Begunkov @ 2022-04-15 21:05 UTC
To: Jens Axboe, io-uring

On 4/12/22 17:46, Jens Axboe wrote:
> One potential snag is with the change we're now doing
> io_cqring_ev_posted_iopoll() even if didn't post an event. Again
> probably not a practical concern, but it is theoretically a violation
> if an eventfd is used.

Looks like this didn't get applied. Are you concerned about eventfd?
Is there any good reason why the userspace can't tolerate spurious
eventfd events? Because I don't think we should care about this case.

-- 
Pavel Begunkov
* Re: [PATCH 1/1] io_uring: fix leaks on IOPOLL and CQE_SKIP
From: Jens Axboe @ 2022-04-15 22:03 UTC
To: Pavel Begunkov, io-uring

On 4/15/22 3:05 PM, Pavel Begunkov wrote:
> Looks this didn't get applied. Are you concerned about eventfd?

Yep, was hoping to get a reply back, so just deferred it for now.

> Is there any good reason why the userspace can't tolerate spurious
> eventfd events? Because I don't think we should care this case

I always forget the details on that, but we've had cases like this in
the past where some applications assume that if they got N eventfd
events, then there are also N events in the ring. Which granted is a bit
odd, but it does also make some sense. Why would you have more eventfd
events posted than events?

So while I don't think it's a huge issue, and particularly because
IOPOLL and eventfd would be a nonsensical combo, it would still be nice
to generally make sure it's the case.

This isn't the only one though, so maybe we just apply this fix and do
a full check down the line. Can't see this one causing issues.

-- 
Jens Axboe
* Re: [PATCH 1/1] io_uring: fix leaks on IOPOLL and CQE_SKIP
From: Pavel Begunkov @ 2022-04-15 22:41 UTC
To: Jens Axboe, io-uring

On 4/15/22 23:03, Jens Axboe wrote:
> I always forget the details on that, but we've had cases like this in
> the past where some applications assume that if they got N eventfd
> events, then are are also N events in the ring. Which granted is a bit
> odd, but it does also make some sense. Why would you have more eventfd
> events posted than events?

For the same reason why it can get fewer eventfd events than there are
CQEs, as for me it's only a communication channel but not a
replacement for completion events.

Ok, we don't want to break old applications, but it's a new, most
probably not widely used feature, and we can say that the userspace
has to handle spurious eventfd.

> So while I don't think it's a huge issue, and particularly because
> IOPOLL and eventfd would be a nonsensical combo, it would still be nice
> to generally make sure it's the case.
>
> This isn't the only one though, so maybe we just apply this fix and do
> a full check down the line. Can't see this one making issues.

-- 
Pavel Begunkov
* Re: [PATCH 1/1] io_uring: fix leaks on IOPOLL and CQE_SKIP
From: Jens Axboe @ 2022-04-15 22:53 UTC
To: Pavel Begunkov, io-uring

On 4/15/22 4:41 PM, Pavel Begunkov wrote:
> For the same reason why it can get less eventfd events than there are
> CQEs, as for me it's only a communication channel but not a
> replacement for completion events.

That part is inherently racy in that we might get some CQEs while we
respond to the initial eventfd notifications. But I'm totally agreeing
with you, and it doesn't seem like a big deal to me.

> Ok, we don't want to break old applications, but it's a new most
> probably not widely used feature, and we can say that the userspace
> has to handle spurious eventfd.

If I were to guess, I'd say it's probably epoll + eventfd conversions.
But it should just be made explicit. Since events reaped and checked
happen differently anyway, it seems like a bad assumption to make that
eventfd notifications == events available.

-- 
Jens Axboe
* Re: [PATCH 1/1] io_uring: fix leaks on IOPOLL and CQE_SKIP
From: Jens Axboe @ 2022-04-15 23:51 UTC
To: Pavel Begunkov, io-uring

On 4/15/22 4:53 PM, Jens Axboe wrote:
> If I were to guess, I'd say it's probably epoll + eventfd conversions.
> But it should just be made explicit. Since events reaped and checked
> happen differently anyway, it seems like a bad assumption to make that
> eventfd notifications == events available.

The patch is against the 5.19 branch, but it might be a better idea
to do this for 5.18 as the 5.17 backport will then not need
assistance. Can you send it against io_uring-5.18?

-- 
Jens Axboe
* Re: [PATCH 1/1] io_uring: fix leaks on IOPOLL and CQE_SKIP
From: Pavel Begunkov @ 2022-04-16 8:36 UTC
To: Jens Axboe, io-uring

On 4/16/22 00:51, Jens Axboe wrote:
> The patch is against the 5.19 branch, but it might be a better idea
> to do this for 5.18 as the 5.17 backport will then not need
> assistance. Can you send it against io_uring-5.18?

sure

-- 
Pavel Begunkov
* Re: [PATCH 1/1] io_uring: fix leaks on IOPOLL and CQE_SKIP
From: Pavel Begunkov @ 2022-04-16 8:39 UTC
To: Jens Axboe, io-uring

On 4/15/22 23:53, Jens Axboe wrote:
> If I were to guess, I'd say it's probably epoll + eventfd conversions.
> But it should just be made explicit. Since events reaped and checked

Didn't get it, what should be made explicit? Do you mean documenting
that there might be spurious eventfd events or something else?

> happen differently anyway, it seems like a bad assumption to make that
> eventfd notifications == events available.

-- 
Pavel Begunkov
* Re: [PATCH 1/1] io_uring: fix leaks on IOPOLL and CQE_SKIP
From: Jens Axboe @ 2022-04-16 13:23 UTC
To: Pavel Begunkov, io-uring

On 4/16/22 2:39 AM, Pavel Begunkov wrote:
> Didn't get it, what should be made explicit? Do you mean documenting
> that there might be spurious eventfd events or something else?

Right, we basically have both cases:

- A batch of completions are done, silly to do more than one eventfd
  notification for that.

- Spurious notifications, like this example with polling and CQE_SKIP.
  This one means that we may post a notification, but there are no
  events to be found.

It just needs to be clear that an eventfd notification just means that
you can check for events, it doesn't tell you anything about the number
of events that may be available.

Spurious events should be avoided, if possible, and are worse than
batched ones imho. Getting an eventfd notification yet having no events
available is silly.

-- 
Jens Axboe
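The rule stated above (a notification only means "go check the ring") can be sketched from the application side. This is a minimal user-space model, not liburing or kernel code: struct model_cq, cq_ready(), and on_eventfd_wakeup() are hypothetical names, and the head/tail counters only imitate a completion ring. The handler deliberately ignores the eventfd counter value and drains whatever is present, so both the batched case and the spurious case are handled by the same path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical completion queue: consumer head and producer tail counters. */
struct model_cq {
	unsigned head;	/* next entry the application will consume */
	unsigned tail;	/* one past the last entry that was produced */
};

/* Number of completions currently waiting in the modelled ring. */
static unsigned cq_ready(const struct model_cq *cq)
{
	return cq->tail - cq->head;
}

/*
 * Called after the application woke up on the eventfd.
 * eventfd_count is the value read from the eventfd; it is intentionally
 * not used to decide how many completions to reap.
 */
static unsigned on_eventfd_wakeup(struct model_cq *cq,
				  unsigned long long eventfd_count)
{
	unsigned reaped = 0;

	assert(cq != NULL);
	(void)eventfd_count;	/* a hint to check the ring, not a CQE count */

	/* Drain everything that is available; zero entries is a valid result. */
	while (cq_ready(cq) > 0) {
		cq->head++;	/* consume one completion */
		reaped++;
	}
	return reaped;
}
```

In this model a single notification with five pending completions reaps all five, and a notification with an empty ring simply reaps none. An application written this way does not depend on eventfd notifications matching the number of events available.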
* Re: [PATCH 1/1] io_uring: fix leaks on IOPOLL and CQE_SKIP
From: Pavel Begunkov @ 2022-04-16 8:34 UTC
To: Jens Axboe, io-uring

On 4/15/22 23:41, Pavel Begunkov wrote:
> For the same reason why it can get less eventfd events than there are
> CQEs, as for me it's only a communication channel but not a

s/communication/notification/

> replacement for completion events.

-- 
Pavel Begunkov