public inbox for [email protected]
From: Xiaoguang Wang <[email protected]>
To: Jens Axboe <[email protected]>, [email protected]
Cc: [email protected]
Subject: Re: [PATCH] __io_uring_get_cqe: eliminate unnecessary io_uring_enter() syscalls
Date: Wed, 4 Mar 2020 21:27:05 +0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

hi,

> On 3/2/20 8:24 AM, Jens Axboe wrote:
>> On 3/2/20 7:05 AM, Jens Axboe wrote:
>>> On 3/1/20 9:18 PM, Xiaoguang Wang wrote:
>>>> When a user adopts a programming model such as submitting one sqe and
>>>> waiting for its completion event, __io_uring_get_cqe() results in many
>>>> unnecessary syscalls; see the test program below:
>>>>
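>>>>      #define _GNU_SOURCE /* needed for O_DIRECT */
>>>>      #include <fcntl.h>
>>>>      #include <stdio.h>
>>>>      #include <stdlib.h>
>>>>      #include <string.h>
>>>>      #include <unistd.h>
>>>>      #include <liburing.h>
>>>>
>>>>      int main(int argc, char *argv[])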
>>>>      {
>>>>              struct io_uring ring;
>>>>              int fd, ret;
>>>>              struct io_uring_sqe *sqe;
>>>>              struct io_uring_cqe *cqe;
>>>>              struct iovec iov;
>>>>              off_t offset, filesize = 0;
>>>>              void *buf;
>>>>
>>>>              if (argc < 2) {
>>>>                      printf("%s: file\n", argv[0]);
>>>>                      return 1;
>>>>              }
>>>>
>>>>              ret = io_uring_queue_init(4, &ring, 0);
>>>>              if (ret < 0) {
>>>>                      fprintf(stderr, "queue_init: %s\n", strerror(-ret));
>>>>                      return 1;
>>>>              }
>>>>
>>>>              fd = open(argv[1], O_RDONLY | O_DIRECT);
>>>>              if (fd < 0) {
>>>>                      perror("open");
>>>>                      return 1;
>>>>              }
>>>>
>>>>              if (posix_memalign(&buf, 4096, 4096))
>>>>                      return 1;
>>>>              iov.iov_base = buf;
>>>>              iov.iov_len = 4096;
>>>>
>>>>              offset = 0;
>>>>              do {
>>>>                      sqe = io_uring_get_sqe(&ring);
>>>>                      if (!sqe) {
>>>>                              printf("here\n");
>>>>                              break;
>>>>                      }
>>>>                      io_uring_prep_readv(sqe, fd, &iov, 1, offset);
>>>>
>>>>                      ret = io_uring_submit(&ring);
>>>>                      if (ret < 0) {
>>>>                              fprintf(stderr, "io_uring_submit: %s\n", strerror(-ret));
>>>>                              return 1;
>>>>                      }
>>>>
>>>>                      ret = io_uring_wait_cqe(&ring, &cqe);
>>>>                      if (ret < 0) {
>>>>                              fprintf(stderr, "io_uring_wait_cqe: %s\n", strerror(-ret));
>>>>                              return 1;
>>>>                      }
>>>>
>>>>                      if (cqe->res <= 0) {
>>>>                              if (cqe->res < 0) {
>>>>                                      fprintf(stderr, "got error: %d\n", cqe->res);
>>>>                                      ret = 1;
>>>>                              }
>>>>                              io_uring_cqe_seen(&ring, cqe);
>>>>                              break;
>>>>                      }
>>>>                      offset += cqe->res;
>>>>                      filesize += cqe->res;
>>>>                      io_uring_cqe_seen(&ring, cqe);
>>>>              } while (1);
>>>>
>>>>              printf("filesize: %ld\n", (long) filesize);
>>>>              free(buf);
>>>>              close(fd);
>>>>              io_uring_queue_exit(&ring);
>>>>              return ret;
>>>>      }
>>>>
>>>> dd if=/dev/zero of=testfile bs=4096 count=16
>>>> ./test testfile
>>>> and use bpftrace to count io_uring_enter syscalls. With the original code:
>>>> [lege@localhost ~]$ sudo bpftrace -e "tracepoint:syscalls:sys_enter_io_uring_enter {@c[tid] = count();}"
>>>> Attaching 1 probe...
>>>> @c[11184]: 49
>>>> The above test issues 49 syscalls, which is counterintuitive. Looking
>>>> into the code, this is because __io_uring_get_cqe() issues one extra
>>>> syscall: by the time its first io_uring_enter() call returns, one cqe
>>>> should already be ready, so there is no need to wait again.
>>>>
>>>> To fix this, set wait_nr to zero after the first syscall. With this
>>>> patch, bpftrace shows 33 io_uring_enter syscalls.
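To make the extra call concrete, here is a minimal, self-contained model of the loop shape described in the patch. It is a hypothetical sketch, not liburing's code: `struct model_ring`, `model_enter()` and `get_cqe_model()` are invented names, and "waiting" is modelled by completing one in-flight request. It compares keeping wait_nr across iterations with the patch's unconditional `wait_nr = 0` after the first enter call.

```c
#include <stdbool.h>

/* Hypothetical model of a submit-and-wait loop; NOT liburing's code.
 * It only counts how many io_uring_enter()-like calls one wait costs. */
struct model_ring {
    unsigned ready;    /* CQEs already posted to the completion ring */
    unsigned inflight; /* requests submitted but not yet completed */
    unsigned enters;   /* simulated io_uring_enter() calls so far */
};

/* Stand-in for an enter call that waits for wait_nr completions:
 * "blocks" by completing in-flight requests until wait_nr CQEs exist. */
static void model_enter(struct model_ring *r, unsigned wait_nr)
{
    r->enters++;
    while (r->ready < wait_nr && r->inflight > 0) {
        r->inflight--;
        r->ready++;
    }
}

/* Loop shape from the patch description: peek for a CQE, call enter while
 * wait_nr is non-zero, and stop once the peek saw a CQE.  With
 * clear_wait_nr set, wait_nr is zeroed after the first enter call.
 * Returns the number of enter calls made for this one wait. */
static unsigned get_cqe_model(struct model_ring *r, bool clear_wait_nr)
{
    unsigned wait_nr = 1;
    unsigned start = r->enters;
    bool have_cqe;

    if (r->ready == 0 && r->inflight == 0)
        return 0; /* nothing to wait for; avoids looping forever */

    do {
        have_cqe = r->ready > 0; /* peek */
        if (wait_nr)
            model_enter(r, wait_nr);
        if (clear_wait_nr)
            wait_nr = 0;
    } while (!have_cqe);

    return r->enters - start;
}
```

With one request in flight, `get_cqe_model(&r, false)` makes two enter calls and `get_cqe_model(&r, true)` makes one. That is consistent with the counts reported above (49 - 33 = 16 extra calls for the 16 blocks read), though the model ignores the error and short-submit cases raised in the reply below.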
>>>
>>> Thanks, that's a nice fix, we definitely don't want to be doing
>>> 50% more system calls than we have to...
>>
>> Actually, I don't think the fix is quite safe. For one, if we get an error
>> on the __io_uring_enter(), then we may not have waited for entries. Or if
>> we submitted fewer than we thought we would, we would not have waited
>> either. So we need to check for full success before deeming it safe to
>> clear wait_nr.
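The condition above can be written as a small decision function. This is a hypothetical sketch (`next_wait_nr()` is not a liburing function, and it is not the code from the commits linked in this thread); it assumes the enter call returns the number of SQEs it submitted, or a negative value on error, and it only drops wait_nr to zero after a fully successful call.

```c
/* Hypothetical helper, not part of liburing: returns the wait_nr to use
 * on the next loop iteration after an io_uring_enter()-style call.
 *   ret       - result of the call: SQEs submitted, or negative on error
 *   to_submit - how many SQEs the call was asked to submit
 *   wait_nr   - how many completions the call was asked to wait for */
static unsigned next_wait_nr(int ret, unsigned to_submit, unsigned wait_nr)
{
    if (ret < 0)
        return wait_nr; /* error: the kernel may not have waited */
    if ((unsigned) ret != to_submit)
        return wait_nr; /* short submit: the wait may have been skipped */
    return 0;           /* full success: the requested wait is done */
}
```

In a wait loop this would take the place of an unconditional `wait_nr = 0`, for example `wait_nr = next_wait_nr(ret, to_submit, wait_nr);` right after each enter call.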
> 
> Unrelated fix:
> 
> https://git.kernel.dk/cgit/liburing/commit/?id=0edcef5700fd558d2548532e0e5db26cb74d19ca
> 
> and then a fix for your patch on top:
> 
> https://git.kernel.dk/cgit/liburing/commit/?id=dc14e30a086082b6aebc3130948e2453e3bd3b2a
In this patch, it seems that you forgot to delete:
     if (wait_nr)
         wait_nr = 0;

With these two lines removed, my original test case still produces the same number
of io_uring_enter syscalls, so you can safely remove them.

Regards,
Xiaoguang Wang



> 
> Can you double check that your original test case still produces the
> same amount of system calls with the fix in place?
> 


Thread overview: 8+ messages
2020-03-02  4:18 [PATCH] __io_uring_get_cqe: eliminate unnecessary io_uring_enter() syscalls Xiaoguang Wang
2020-03-02 14:05 ` Jens Axboe
2020-03-02 15:24   ` Jens Axboe
2020-03-02 15:37     ` Jens Axboe
2020-03-03 13:11       ` Xiaoguang Wang
2020-03-03 14:35         ` Jens Axboe
2020-03-04 13:27       ` Xiaoguang Wang [this message]
2020-03-04 13:57         ` Jens Axboe
