public inbox for [email protected]
From: Xiaoguang Wang <[email protected]>
To: Jens Axboe <[email protected]>
Cc: [email protected], [email protected]
Subject: Re: [PATCH] io_uring: export cq overflow status to userspace
Date: Thu, 9 Jul 2020 08:52:53 +0800	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

hi,

> On 7/8/20 10:51 AM, Xiaoguang Wang wrote:
>> hi,
>>
>>> On 7/8/20 9:39 AM, Xiaoguang Wang wrote:
>>>> hi,
>>>>
>>>>> On 7/7/20 11:29 PM, Xiaoguang Wang wrote:
>>>>>> I modify above test program a bit:
>>>>>> #include <errno.h>
>>>>>> #include <stdio.h>
>>>>>> #include <unistd.h>
>>>>>> #include <stdlib.h>
>>>>>> #include <string.h>
>>>>>> #include <fcntl.h>
>>>>>> #include <assert.h>
>>>>>>
>>>>>> #include "liburing.h"
>>>>>>
>>>>>> static void test_cq_overflow(struct io_uring *ring)
>>>>>> {
>>>>>>             struct io_uring_cqe *cqe;
>>>>>>             struct io_uring_sqe *sqe;
>>>>>>             int issued = 0;
>>>>>>             int ret = 0;
>>>>>>             int i;
>>>>>>
>>>>>>             for (i = 0; i < 33; i++) {
>>>>>>                     sqe = io_uring_get_sqe(ring);
>>>>>>                     if (!sqe) {
>>>>>>                             fprintf(stderr, "get sqe failed\n");
>>>>>>                             break;
>>>>>>                     }
>>>>>>                     ret = io_uring_submit(ring);
>>>>>>                     if (ret <= 0) {
>>>>>>                             if (ret != -EBUSY)
>>>>>>                                     fprintf(stderr, "sqe submit failed: %d\n", ret);
>>>>>>                             break;
>>>>>>                     }
>>>>>>                     issued++;
>>>>>>             }
>>>>>>
>>>>>>             printf("issued requests: %d\n", issued);
>>>>>>
>>>>>>             while (issued) {
>>>>>>                     ret = io_uring_peek_cqe(ring, &cqe);
>>>>>>                     if (ret) {
>>>>>>                             if (ret != -EAGAIN) {
>>>>>>                                     fprintf(stderr, "peek completion failed: %s\n",
>>>>>>                                             strerror(ret));
>>>>>>                                     break;
>>>>>>                             }
>>>>>>                             printf("left requests: %d %d\n", issued, IO_URING_READ_ONCE(*ring->sq.kflags));
>>>>>>                             continue;
>>>>>>                     }
>>>>>>                     io_uring_cqe_seen(ring, cqe);
>>>>>>                     issued--;
>>>>>>                     printf("left requests: %d\n", issued);
>>>>>>             }
>>>>>> }
>>>>>>
>>>>>> int main(int argc, char *argv[])
>>>>>> {
>>>>>>             int ret;
>>>>>>             struct io_uring ring;
>>>>>>
>>>>>>             ret = io_uring_queue_init(16, &ring, 0);
>>>>>>             if (ret) {
>>>>>>                     fprintf(stderr, "ring setup failed: %d\n", ret);
>>>>>>                     return 1;
>>>>>>             }
>>>>>>
>>>>>>             test_cq_overflow(&ring);
>>>>>>             return 0;
>>>>>> }
>>>>>>
>>>>>> Though with your patches applied, we still cannot peek the last cqe.
>>>>>> This test program only issues 33 sqes, so it won't get an EBUSY error.
>>>>>
>>>>> How about we make this even simpler, then - make the
>>>>> IORING_SQ_CQ_OVERFLOW actually track the state, rather than when we fail
>>>>> on submission. The liburing change would be the same, the kernel side
>>>>> would then look like the below.
>>>>>
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index 4c9a494c9f9f..01981926cdf4 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -1342,6 +1342,7 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
>>>>>     	if (cqe) {
>>>>>     		clear_bit(0, &ctx->sq_check_overflow);
>>>>>     		clear_bit(0, &ctx->cq_check_overflow);
>>>>> +		ctx->rings->sq_flags &= ~IORING_SQ_CQ_OVERFLOW;
>>>>>     	}
>>>>>     	spin_unlock_irqrestore(&ctx->completion_lock, flags);
>>>>>     	io_cqring_ev_posted(ctx);
>>>>> @@ -1379,6 +1380,7 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
>>>>>     		if (list_empty(&ctx->cq_overflow_list)) {
>>>>>     			set_bit(0, &ctx->sq_check_overflow);
>>>>>     			set_bit(0, &ctx->cq_check_overflow);
>>>>> +			ctx->rings->sq_flags |= IORING_SQ_CQ_OVERFLOW;
>> Some callers to __io_cqring_fill_event() don't hold completion_lock, for example:
>> ==> io_iopoll_complete
>> ====> __io_cqring_fill_event()
>> So this patch may still not be safe when SQPOLL is enabled.
>> Do you prefer adding a new lock, or taking completion_lock here only when the cq ring has overflowed?
> 
> The polled side isn't IRQ driven, so it should be serialized separately. This works
> because we don't allow non-polled IO on a polled context, and vice versa. If not,
> we'd have bigger issues than just the flags modification.
> 
> So it should be fine as-is.
Thanks for the explanation. Previously I was worried about the below race:
==> io_uring_enter
==== > io_iopoll_check
======> io_iopoll_getevents
========> io_do_iopoll
==========> io_iopoll_complete
============> __io_cqring_fill_event, which will modify sq_flags

and

==> io_sq_thread
====> prepares to go to sleep, which also modifies sq_flags.

But indeed the above race won't happen, because if SQPOLL and IOPOLL are both
enabled, io_uring_enter relies on io_sq_thread to do the iopoll job:
		if (ctx->flags & IORING_SETUP_IOPOLL &&
		    !(ctx->flags & IORING_SETUP_SQPOLL)) {
			ret = io_iopoll_check(ctx, &nr_events, min_complete);
		} else {
			ret = io_cqring_wait(ctx, min_complete, sig, sigsz);
		}

Regards,
Xiaoguang Wang



  reply	other threads:[~2020-07-09  0:52 UTC|newest]

Thread overview: 18+ messages
2020-07-07 13:24 [PATCH] io_uring: export cq overflow status to userspace Xiaoguang Wang
2020-07-07 14:28 ` Jens Axboe
2020-07-07 16:21   ` Jens Axboe
2020-07-07 16:25     ` Pavel Begunkov
2020-07-07 16:30       ` Jens Axboe
2020-07-07 16:36     ` Xiaoguang Wang
2020-07-07 17:23       ` Jens Axboe
2020-07-08  3:25     ` Xiaoguang Wang
2020-07-08  3:46       ` Jens Axboe
2020-07-08  5:29         ` Xiaoguang Wang
2020-07-08 15:29           ` Jens Axboe
2020-07-08 15:39             ` Xiaoguang Wang
2020-07-08 15:41               ` Jens Axboe
2020-07-08 16:51                 ` Xiaoguang Wang
2020-07-08 21:33                   ` Jens Axboe
2020-07-09  0:52                     ` Xiaoguang Wang [this message]
2020-07-07 16:29   ` Xiaoguang Wang
2020-07-07 16:30     ` Jens Axboe
