From: Jens Axboe <[email protected]>
To: Ming Lei <[email protected]>
Cc: Stefan Metzmacher <[email protected]>,
	[email protected], Pavel Begunkov <[email protected]>,
	David Ahern <[email protected]>
Subject: Re: IOSQE_IO_LINK vs. short send of SOCK_STREAM
Date: Fri, 13 Jan 2023 20:47:22 -0700
Message-ID: <[email protected]>
In-Reply-To: <Y8IPl3PsdAlAfvkq@T590>

On 1/13/23 7:12 PM, Ming Lei wrote:
> On Sat, Jan 14, 2023 at 08:27:37AM +0800, Ming Lei wrote:
>> On Fri, Jan 13, 2023 at 11:01:51AM -0700, Jens Axboe wrote:
>>> On 1/13/23 10:51 AM, Jens Axboe wrote:
>>>> On 1/13/23 3:12 AM, Ming Lei wrote:
>>>>> Hello,
>>>>>
>>>>> On Thu, Jan 12, 2023 at 08:35:36AM +0100, Stefan Metzmacher wrote:
>>>>>> Am 12.01.23 um 04:40 schrieb Jens Axboe:
>>>>>>> On 1/11/23 8:27 PM, Ming Lei wrote:
>>>>>>>> Hi Stefan and Jens,
>>>>>>>>
>>>>>>>> Thanks for the help.
>>>>>>>>
>>>>>>>> BTW, the issue was observed while I was writing ublk-nbd:
>>>>>>>>
>>>>>>>> https://github.com/ming1/ubdsrv/commits/nbd
>>>>>>>>
>>>>>>>> and it isn't complete yet (multiple send sqe chains are not serialized
>>>>>>>> yet); the issue is triggered when writing a big chunk of data to ublk-nbd.
>>>>>>>
>>>>>>> Gotcha
>>>>>>>
>>>>>>>> On Wed, Jan 11, 2023 at 05:32:00PM +0100, Stefan Metzmacher wrote:
>>>>>>>>> Hi Ming,
>>>>>>>>>
>>>>>>>>>> Per my understanding, a short send on SOCK_STREAM should terminate the
>>>>>>>>>> remainder of the SQE chain built by IOSQE_IO_LINK.
>>>>>>>>>>
>>>>>>>>>> But from my observation, this isn't the case when using io_sendmsg or
>>>>>>>>>> io_sendmsg_zc on a TCP socket: the remainder of the chain can still
>>>>>>>>>> be completed after a short send is seen. MSG_WAITALL is off.
>>>>>>>>>
>>>>>>>>> This is due to legacy reasons, you need to pass MSG_WAITALL explicitly
>>>>>>>>> in order to get a retry or an error on a short write...
>>>>>>>>> It should work for send, sendmsg, sendmsg_zc, recv and recvmsg.
>>>>>>>>
>>>>>>>> Turns out there is another application bug, in which a recv sqe may cut
>>>>>>>> into the send sqe chain.
>>>>>>>>
>>>>>>>> After that issue is fixed, if MSG_WAITALL is set, a short send can't be
>>>>>>>> observed any more. But if MSG_WAITALL isn't set, a short send can be
>>>>>>>> observed and the send io chain still won't be terminated.
>>>>>>>
>>>>>>> Right, if MSG_WAITALL is set, then the whole thing will be written. If
>>>>>>> we get a short send, it's retried appropriately. Unless an error occurs,
>>>>>>> it should send the whole thing.
>>>>>>>
>>>>>>>> So if MSG_WAITALL is set, will io_uring be responsible for retrying in case
>>>>>>>> of a short send, so that the application needn't take care of it?
>>>>>>
>>>>>> With new kernels yes, but the application should be prepared to have retry
>>>>>> logic in order to be compatible with older kernels.
>>>>>
>>>>> Now ublk-nbd is usable: mkfs/mount and fio start to work.
>>>>>
>>>>> But a short send can still be observed sometimes when sending an nbd write
>>>>> request, which is done by sendmsg(); the message includes two vectors
>>>>> (the 1st is the nbd_request, the other is the data to be written to disk).
>>>>>
>>>>> The short send is reported by a cqe in which cqe->res is always 28, which is
>>>>> the size of 'struct nbd_request' and also the length of the 1st io vec. No
>>>>> send cqe failure message is seen.
>>>>>
>>>>> And MSG_WAITALL is set for all ublk-nbd io actually.
>>>>>
>>>>> Steps to reproduce:
>>>>>
>>>>> 1) install liburing 2.0+
>>>>>
>>>>> 2) build ublk & reproduce the issue:
>>>>>
>>>>> - git clone https://github.com/ming1/ubdsrv.git -b nbd
>>>>>
>>>>> - cd ubdsrv
>>>>>
>>>>> - vim build_with_liburing_src && set LIBURING_DIR to your liburing dir
>>>>>
>>>>> - ./build_with_liburing_src && make -j4
>>>>>
>>>>> 3) run the nbd test
>>>>> - cd ubdsrv
>>>>> - make test T=nbd
>>>>>
>>>>> Sometimes the test hangs, and the following log can be observed
>>>>> in syslog:
>>>>>
>>>>> nbd_send_req_done: short send/receive tag 2 op 1 8000000000800002, len 524316 written 28 cqe flags 0
>>>>> ...
>>>>>
>>>>
>>>> I can reproduce this, but it's a SEND that ends up being triggered,
>>>> not a SENDMSG. Should the payload carrying op not be a SENDMSG? I'm
>>>> assuming two vecs for that one.
>>>
>>> Added some debug and it looks like the request was indeed set up
>>> and is using IORING_OP_SEND and that the 28 is what was requested.
>>> But the completion side seems to think it's a SENDMSG and we should've
>>> received more?
>>>
>>> I think this needs a bit of debugging on the userspace side first.
>>
>> Yeah, turns out it is indeed a userspace bug: IOSQE_IO_LINK was cleared
>> incorrectly, and now the issue can't be triggered with the following fix:
>>
>> https://github.com/ming1/ubdsrv/commit/175ffd14ae2f8fa562134edfd4ac949f8050c108
> 
> Figured it out, it is still a userspace issue.
> 
> For an nbd request sent to the server, the send cqe can arrive after the
> ublk io request has already been completed (that completion is triggered
> by the nbd reply from the server). If a new ublk io req is then submitted
> to the same slot, nbd_send_req_done() reads the new request's data length
> and op code instead of the old ones, and the warning is triggered.

Figured it was some kind of data reuse issue, as it is consistent with
that. Glad you got it figured out.
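
The slot reuse described above can be illustrated with a small sketch. It is not the fix that went into ubdsrv; it only shows one common way to make a late completion detectable, assuming the application can carry a 64-bit cookie per request (for example in the io_uring user_data field). The names io_slot, slot_submit() and is_short_send(), the generation counter and NR_SLOTS are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SLOTS 4              /* hypothetical queue depth */

/* Live per-tag state; overwritten whenever a new request reuses the tag. */
struct io_slot {
    uint32_t gen;               /* bumped on every reuse of this slot */
    uint32_t op;                /* op code of the request now in the slot */
    uint32_t send_len;          /* bytes the current request expects to send */
};

static struct io_slot slots[NR_SLOTS];

/*
 * Submission side: record the request in its slot and return a cookie
 * that packs the slot generation (high 32 bits) and the tag (low 32 bits).
 */
static uint64_t slot_submit(unsigned tag, uint32_t op, uint32_t send_len)
{
    struct io_slot *s;

    assert(tag < NR_SLOTS);
    s = &slots[tag];
    s->gen++;
    s->op = op;
    s->send_len = send_len;
    return ((uint64_t)s->gen << 32) | tag;
}

/*
 * Completion side: report a short send only if the cookie still belongs
 * to the request occupying the slot. A stale cookie means the slot now
 * describes a different request, so its length must not be compared.
 */
static bool is_short_send(uint64_t cookie, int res)
{
    unsigned tag = (unsigned)(cookie & 0xffffffffu);
    uint32_t gen = (uint32_t)(cookie >> 32);
    const struct io_slot *s;

    assert(tag < NR_SLOTS);
    s = &slots[tag];
    if (gen != s->gen)
        return false;           /* late completion of an older request */
    return res >= 0 && (uint32_t)res < s->send_len;
}
```

With the numbers from the log line quoted earlier: if a 28-byte send is submitted on tag 2 and the slot is reused for a 524316-byte request before the first send's completion is handled, the first cookie no longer matches the slot generation and the completion is ignored rather than reported as "len 524316 written 28".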

-- 
Jens Axboe
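
Earlier in the thread Stefan points out that applications should keep their own retry logic for kernels where MSG_WAITALL does not retry a short send. A minimal sketch of that logic for a two-vector message follows. It uses blocking sendmsg(2) rather than io_uring to stay self-contained; with io_uring the same iovec adjustment would be done before resubmitting the SQE. The helper names iov_advance() and send_all_vec() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Skip `done` bytes that have already been sent. Moves *piov to the
 * first vector that still holds unsent data, trims that vector in
 * place, and returns the number of vectors left.
 */
static int iov_advance(struct iovec **piov, int iovcnt, size_t done)
{
    struct iovec *iov = *piov;

    assert(iov != NULL && iovcnt >= 0);
    while (iovcnt > 0 && done >= iov->iov_len) {
        done -= iov->iov_len;   /* this vector went out completely */
        iov++;
        iovcnt--;
    }
    if (iovcnt > 0) {           /* partially sent vector: drop its front */
        iov->iov_base = (char *)iov->iov_base + done;
        iov->iov_len -= done;
    }
    *piov = iov;
    return iovcnt;
}

/*
 * Send every byte described by iov[0..iovcnt), retrying after short
 * sends. Returns the total number of bytes sent, or -1 with errno set.
 * Note that the caller's iovec array is modified.
 */
static ssize_t send_all_vec(int fd, struct iovec *iov, int iovcnt)
{
    size_t total = 0;

    while (iovcnt > 0) {
        struct msghdr msg = { .msg_iov = iov, .msg_iovlen = iovcnt };
        ssize_t n = sendmsg(fd, &msg, MSG_NOSIGNAL);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before sending: try again */
            return -1;
        }
        total += (size_t)n;
        iovcnt = iov_advance(&iov, iovcnt, (size_t)n);
    }
    return (ssize_t)total;
}
```

For the case reported in this thread, a result of 28 on a message made of a 28-byte header plus a payload would leave exactly one vector, the untouched payload, to be sent on the next attempt.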



Thread overview: 14+ messages
2023-01-11 15:26 IOSQE_IO_LINK vs. short send of SOCK_STREAM Ming Lei
2023-01-11 15:49 ` Jens Axboe
2023-01-11 16:32 ` Stefan Metzmacher
2023-01-11 16:36   ` Jens Axboe
2023-01-12  3:27   ` Ming Lei
2023-01-12  3:40     ` Jens Axboe
2023-01-12  7:35       ` Stefan Metzmacher
2023-01-13 10:12         ` Ming Lei
2023-01-13 17:51           ` Jens Axboe
2023-01-13 18:01             ` Jens Axboe
2023-01-14  0:27               ` Ming Lei
2023-01-14  1:39                 ` Ming Lei
2023-01-14  2:12                 ` Ming Lei
2023-01-14  3:47                   ` Jens Axboe [this message]
