From: Constantine Gavrilov <[email protected]>
To: Jens Axboe <[email protected]>
Cc: [email protected]
Subject: Re: Short sends returned in IORING
Date: Wed, 11 May 2022 22:30:00 +0300	[thread overview]
Message-ID: <CAAL3td1ctkObUzB_XQRL50AY852amsGxwWrZ9kyw6nADqve7pA@mail.gmail.com> (raw)
In-Reply-To: <CAAL3td2TwrQft6hsFa_dMae=6wwRTK+4W3pt5JwhtC8zUbvxCg@mail.gmail.com>

On Wed, May 11, 2022 at 6:33 PM Constantine Gavrilov
<[email protected]> wrote:
>
> On Wed, May 11, 2022 at 6:11 PM Constantine Gavrilov
> <[email protected]> wrote:
> >
> > On Wed, May 11, 2022 at 5:56 PM Constantine Gavrilov
> > <[email protected]> wrote:
> > >
> > > On Wed, May 4, 2022 at 6:55 PM Jens Axboe <[email protected]> wrote:
> > > >
> > > > On 5/4/22 9:28 AM, Jens Axboe wrote:
> > > > > On 5/4/22 9:21 AM, Constantine Gavrilov wrote:
> > > > >> On Wed, May 4, 2022 at 4:54 PM Jens Axboe <[email protected]> wrote:
> > > > >>>
> > > > >>> On 5/3/22 5:05 PM, Constantine Gavrilov wrote:
> > > > >>>> Jens:
> > > > >>>>
> > > > >>>> This is related to the previous thread "Fix MSG_WAITALL for
> > > > >>>> IORING_OP_RECV/RECVMSG".
> > > > >>>>
> > > > >>>> We have a similar issue with TCP socket sends. I see short sends
> > > > >>>> regardless of the method (I tried the write, writev, send, and
> > > > >>>> sendmsg opcodes, using MSG_WAITALL for send and sendmsg). It makes
> > > > >>>> no difference.
> > > > >>>>
> > > > >>>> Most of the time sends are not short, and I have never seen short
> > > > >>>> sends over loopback with my app. But on real network media, I do
> > > > >>>> see short sends.
> > > > >>>>
> > > > >>>> This is a real problem: because of it, a queue depth of > 1 cannot
> > > > >>>> be used on a TCP socket, which limits the benefit of IORING. When
> > > > >>>> we get a short send, the next send in the queue "corrupts" the
> > > > >>>> stream.
> > > > >>>>
> > > > >>>> Can a send be completed in full before its CQE is posted, unless
> > > > >>>> the socket is disconnected?
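> > > > >>>>
> > > > >>>> For illustration, this is roughly the pattern I mean. It is a
> > > > >>>> minimal liburing sketch, not code from my application; the socket
> > > > >>>> fd, buffers, and lengths are made up:
> > > > >>>>
> > > > >>>> /* queue two sends back to back on the same TCP socket */
> > > > >>>> struct io_uring ring;
> > > > >>>> io_uring_queue_init(8, &ring, 0);
> > > > >>>>
> > > > >>>> struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
> > > > >>>> io_uring_prep_send(sqe, sockfd, buf1, len1, MSG_WAITALL);
> > > > >>>> sqe->user_data = 1;
> > > > >>>>
> > > > >>>> sqe = io_uring_get_sqe(&ring);
> > > > >>>> io_uring_prep_send(sqe, sockfd, buf2, len2, MSG_WAITALL);
> > > > >>>> sqe->user_data = 2;
> > > > >>>>
> > > > >>>> io_uring_submit(&ring);
> > > > >>>>
> > > > >>>> /* If the CQE for user_data 1 comes back with res < len1, the
> > > > >>>>  * second send may already have gone out after the partial data,
> > > > >>>>  * and the byte stream on the wire is corrupted. */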
> > > > >>>
> > > > >>> I'm guessing that this happens because we get a task_work item queued
> > > > >>> after we've processed some of the send, but not all. What kernel are you
> > > > >>> using?
> > > > >>>
> > > > >>> This:
> > > > >>>
> > > > >>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.19/io_uring&id=4c3c09439c08b03d9503df0ca4c7619c5842892e
> > > > >>>
> > > > >>> is queued up for 5.19, would be worth trying.
> > > > >>>
> > > > >>> --
> > > > >>> Jens Axboe
> > > > >>>
> > > > >>
> > > > >> Jens:
> > > > >>
> > > > >> Thank you for your reply.
> > > > >>
> > > > >> The kernel is 5.17.4-200.fc35.x86_64. I have looked at the patch. With
> > > > >> the solution in place, I am wondering whether it will be possible to
> > > > >> use multiple uring send IOs on the same socket. I expect that Linux
> > > > >> TCP will serialize multiple send operations on the same socket. I am
> > > > >> not sure this happens with uring (meaning that the socket is blocked
> > > > >> from processing a new IO until the pending IO completes). Do I need
> > > > >> IOSQE_IO_DRAIN / IOSQE_IO_LINK for this to work? That would not be
> > > > >> optimal, because there are multiple different sockets in the same
> > > > >> uring. While I already have a workaround in the form of a "software"
> > > > >> queue for streaming data on TCP sockets, I would rather have the
> > > > >> kernel do "native" queueing in the sockets layer and have extra CPU
> > > > >> cycles available to the application.
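> > > > >>
> > > > >> If linking is the right way, I would expect something like the
> > > > >> following (a sketch only, assuming liburing; whether it keeps
> > > > >> ordering across a short send is exactly my question):
> > > > >>
> > > > >> struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
> > > > >> io_uring_prep_send(sqe, sockfd, buf1, len1, MSG_WAITALL);
> > > > >> /* IOSQE_IO_LINK: do not start the next SQE until this one completes */
> > > > >> io_uring_sqe_set_flags(sqe, IOSQE_IO_LINK);
> > > > >>
> > > > >> sqe = io_uring_get_sqe(&ring);
> > > > >> io_uring_prep_send(sqe, sockfd, buf2, len2, MSG_WAITALL);
> > > > >>
> > > > >> io_uring_submit(&ring);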
> > > > >
> > > > > The patch above will mess with ordering potentially. If the cause is as
> > > > > I suspect, task_work causing it to think it's signaled, then the better
> > > > > approach may indeed be to just flush that work and retry without
> > > > > re-queueing the current one. I can try a patch against 5.18 if you are
> > > > > willing and able to test?
> > > >
> > > > You can try something like this, if you run my for-5.19/io_uring branch.
> > > > I'd be curious to know if this solves the short send issue for you.
> > > >
> > > >
> > > > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > > > index f6b6db216478..b835e80be1fa 100644
> > > > --- a/fs/io_uring.c
> > > > +++ b/fs/io_uring.c
> > > > @@ -5684,6 +5684,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
> > > >         if (flags & MSG_WAITALL)
> > > >                 min_ret = iov_iter_count(&kmsg->msg.msg_iter);
> > > >
> > > > +retry:
> > > >         ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags);
> > > >
> > > >         if (ret < min_ret) {
> > > > @@ -5694,6 +5695,8 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
> > > >                 if (ret > 0 && io_net_retry(sock, flags)) {
> > > >                         sr->done_io += ret;
> > > >                         req->flags |= REQ_F_PARTIAL_IO;
> > > > +                       if (io_run_task_work())
> > > > +                               goto retry;
> > > >                         return io_setup_async_msg(req, kmsg);
> > > >                 }
> > > >                 req_set_fail(req);
> > > > @@ -5744,6 +5747,7 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
> > > >                 min_ret = iov_iter_count(&msg.msg_iter);
> > > >
> > > >         msg.msg_flags = flags;
> > > > +retry:
> > > >         ret = sock_sendmsg(sock, &msg);
> > > >         if (ret < min_ret) {
> > > >                 if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
> > > > @@ -5755,6 +5759,8 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
> > > >                         sr->buf += ret;
> > > >                         sr->done_io += ret;
> > > >                         req->flags |= REQ_F_PARTIAL_IO;
> > > > +                       if (io_run_task_work())
> > > > +                               goto retry;
> > > >                         return -EAGAIN;
> > > >                 }
> > > >                 req_set_fail(req);
> > > >
> > > > --
> > > > Jens Axboe
> > > >
> > >
> > > Jens:
> > >
> > > I was able to test the first change on the top of Linus kernel git (5.18.0-rc6).
> > >
> > > I do not get short sends anymore, but I get corruption in the sent
> > > packets (the corruption is detected by the receiver). It looks like
> > > the short sends handled by the patch intermix data from multiple send
> > > SQEs in the stream, so the ordering of multiple SQEs in the URING
> > > becomes broken.
> > >
> > > To test it, I had two implementations of the send functions:
> > > 1. Uses the SEND opcode and asserts on short sends. No asserts fired,
> > > but there is data corruption.
> > > 2. Uses a TCP send queue implementation (internally it uses the SEND
> > > and SENDMSG opcodes in the URING, with only one pending send at a
> > > time; the tail of a short send is resent until all data is sent,
> > > roughly as sketched below). This always works.
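> > >
> > > Roughly, the second implementation does the following (a simplified
> > > sketch of the resend loop; error handling is trimmed and the names
> > > are not from the real code):
> > >
> > > /* one send in flight at a time; resend the unsent tail on a short CQE */
> > > size_t off = 0;
> > > while (off < len) {
> > >         struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
> > >         io_uring_prep_send(sqe, sockfd, buf + off, len - off, MSG_WAITALL);
> > >         io_uring_submit(&ring);
> > >
> > >         struct io_uring_cqe *cqe;
> > >         io_uring_wait_cqe(&ring, &cqe);
> > >         int res = cqe->res;
> > >         io_uring_cqe_seen(&ring, cqe);
> > >
> > >         if (res < 0)
> > >                 break;      /* real error, give up on this socket */
> > >         off += res;         /* short send: loop and send the tail */
> > > }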
> > >
> > > I would like to test the second patch now. Is it on top of the first
> > > patch or by itself? Do I really need your tree for that? If yes, can
> > > you send me the git pull info, please?
> > >
> > > --
> > > ----------------------------------------
> > > Constantine Gavrilov
> > > Storage Architect
> > > Master Inventor
> > > Tel-Aviv IBM Storage Lab
> > > 1 Azrieli Center, Tel-Aviv
> > > ----------------------------------------
> >
> > Jens: for git branch, is it under
> > git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git?
> >
> > --
> > ----------------------------------------
> > Constantine Gavrilov
> > Storage Architect
> > Master Inventor
> > Tel-Aviv IBM Storage Lab
> > 1 Azrieli Center, Tel-Aviv
> > ----------------------------------------
>
> Jens: I checked out your branch; the first patch is already in, and I
> applied the second. I will build and test now.
>
> --
> ----------------------------------------
> Constantine Gavrilov
> Storage Architect
> Master Inventor
> Tel-Aviv IBM Storage Lab
> 1 Azrieli Center, Tel-Aviv
> ----------------------------------------

Jens: The second patch also does not resolve the corruption in the send
stream; sent packets are still intermixed.

-- 
----------------------------------------
Constantine Gavrilov
Storage Architect
Master Inventor
Tel-Aviv IBM Storage Lab
1 Azrieli Center, Tel-Aviv
----------------------------------------
