From: Jens Axboe <[email protected]>
To: Constantine Gavrilov <[email protected]>,
[email protected]
Subject: Re: io_uring_enter() with opcode IORING_OP_RECV ignores MSG_WAITALL in msg_flags
Date: Wed, 23 Mar 2022 09:25:48 -0600 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAAL3td2kwj4Gf-q1zpVUpSgNKFKwXq0biuu7TF6um8ZAQaQo2Q@mail.gmail.com>
On 3/23/22 7:12 AM, Constantine Gavrilov wrote:
>> From: Jens Axboe <[email protected]>
>> Sent: Wednesday, March 23, 2022 14:19
>> To: Constantine Gavrilov <[email protected]>; [email protected] <[email protected]>
>> Cc: io-uring <[email protected]>
>> Subject: [EXTERNAL] Re: io_uring_enter() with opcode IORING_OP_RECV ignores MSG_WAITALL in msg_flags
>>
>> On 3/23/22 4:31 AM, Constantine Gavrilov wrote:
>>> I get partial receives on a TCP socket, even though I specify
>>> MSG_WAITALL with the IORING_OP_RECV opcode. Looking at the tcpdump in
>>> Wireshark, I see the entire reassembled packet (+4k), so it is not a
>>> disconnect. The MTU is smaller than 4k.
>>>
>>> From the mailing list history, it looks like this was discussed
>>> before, and it seems a fix was supposed to be in. Can someone clarify
>>> the expected behavior?
>>>
>>> I do not think recvmsg() has this issue.
>>
>> Do you have a test case? I added the io-uring list; that's the
>> appropriate forum for this kind of discussion.
>>
>> --
>> Jens Axboe
>
> Yes, I have a real test case. I cannot share it verbatim, but with a
> little effort I believe I can come up with a simple client/server
> program.
>
> It seems the issue should be directly visible from the implementation,
> but if it is not, I will provide sample code.
>
> Forgot to mention that the issue is seen on Fedora kernel version
> 5.16.12-200.fc35.x86_64.
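
For reference, the reproducer needn't be elaborate. Something along the
lines of the sketch below (liburing; the connected TCP socket setup and
most error handling are elided, and the helper name is illustrative)
shows the short completion when the peer sends more than one segment:

#include <liburing.h>
#include <sys/socket.h>
#include <stdio.h>

/* Issue a single IORING_OP_RECV with MSG_WAITALL and report how much of
 * 'len' the one completion actually carried. 'fd' is assumed to be a
 * connected TCP socket with more than an MTU's worth of data in flight.
 */
static int recv_waitall(int fd, void *buf, size_t len)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int ret;

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret < 0)
		return ret;

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_recv(sqe, fd, buf, len, MSG_WAITALL);
	io_uring_submit(&ring);

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (!ret) {
		printf("asked for %zu, completion carried %d\n", len, cqe->res);
		ret = cqe->res;
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return ret;
}

On the reported kernel, cqe->res comes back around one MTU's worth
rather than the full len, even though the remaining data is visible in
tcpdump.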
Can you try with the patch below? Neither recv nor recvmsg handles
MSG_WAITALL correctly as far as I can tell.
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 810d2bd90f4d..ee3848da885a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -612,6 +612,7 @@ struct io_sr_msg {
 	int				msg_flags;
 	int				bgid;
 	size_t				len;
+	size_t				done_io;
 };
 
 struct io_open {
@@ -782,6 +783,7 @@ enum {
 	REQ_F_SKIP_LINK_CQES_BIT,
 	REQ_F_SINGLE_POLL_BIT,
 	REQ_F_DOUBLE_POLL_BIT,
+	REQ_F_PARTIAL_IO_BIT,
 	/* keep async read/write and isreg together and in order */
 	REQ_F_SUPPORT_NOWAIT_BIT,
 	REQ_F_ISREG_BIT,
@@ -844,6 +846,8 @@ enum {
 	REQ_F_SINGLE_POLL	= BIT(REQ_F_SINGLE_POLL_BIT),
 	/* double poll may active */
 	REQ_F_DOUBLE_POLL	= BIT(REQ_F_DOUBLE_POLL_BIT),
+	/* request has already done partial IO */
+	REQ_F_PARTIAL_IO	= BIT(REQ_F_PARTIAL_IO_BIT),
 };
 
 struct async_poll {
@@ -1391,6 +1395,9 @@ static void io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
 	if (likely(!(req->flags & REQ_F_BUFFER_SELECTED)))
 		return;
 
+	/* don't recycle if we already did IO to this buffer */
+	if (req->flags & REQ_F_PARTIAL_IO)
+		return;
 	if (issue_flags & IO_URING_F_UNLOCKED)
 		mutex_lock(&ctx->uring_lock);
 
@@ -5431,12 +5438,14 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (req->ctx->compat)
 		sr->msg_flags |= MSG_CMSG_COMPAT;
 #endif
+	sr->done_io = 0;
 	return 0;
 }
 
 static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_async_msghdr iomsg, *kmsg;
+	struct io_sr_msg *sr = &req->sr_msg;
 	struct socket *sock;
 	struct io_buffer *kbuf;
 	unsigned flags;
@@ -5479,6 +5488,11 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 			return io_setup_async_msg(req, kmsg);
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
+		if (ret > 0 && flags & MSG_WAITALL) {
+			sr->done_io += ret;
+			req->flags |= REQ_F_PARTIAL_IO;
+			return io_setup_async_msg(req, kmsg);
+		}
 		req_set_fail(req);
 	} else if ((flags & MSG_WAITALL) && (kmsg->msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))) {
 		req_set_fail(req);
@@ -5488,6 +5502,10 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	if (kmsg->free_iov)
 		kfree(kmsg->free_iov);
 	req->flags &= ~REQ_F_NEED_CLEANUP;
+	if (ret >= 0)
+		ret += sr->done_io;
+	else if (sr->done_io)
+		ret = sr->done_io;
 	__io_req_complete(req, issue_flags, ret, io_put_kbuf(req, issue_flags));
 	return 0;
 }
@@ -5538,12 +5556,23 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 			return -EAGAIN;
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
+		if (ret > 0 && flags & MSG_WAITALL) {
+			sr->len -= ret;
+			sr->buf += ret;
+			sr->done_io += ret;
+			req->flags |= REQ_F_PARTIAL_IO;
+			return -EAGAIN;
+		}
 		req_set_fail(req);
 	} else if ((flags & MSG_WAITALL) && (msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))) {
 out_free:
 		req_set_fail(req);
 	}
 
+	if (ret >= 0)
+		ret += sr->done_io;
+	else if (sr->done_io)
+		ret = sr->done_io;
 	__io_req_complete(req, issue_flags, ret, io_put_kbuf(req, issue_flags));
 	return 0;
 }
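
The idea is that a short receive no longer completes the request: the
partial count is accumulated in done_io, REQ_F_PARTIAL_IO is set so the
selected buffer doesn't get recycled mid-transfer, and the request is
retried (-EAGAIN for recv, io_setup_async_msg() for recvmsg) until the
full length has been transferred. The final completion then folds
done_io back into the result. With the patch applied, a check like this
on the completion side (same illustrative names as the sketch above)
should no longer fire, barring a disconnect or a signal:

	/* with MSG_WAITALL, a clean completion shorter than 'len' would
	 * mean the bug is still there */
	if (cqe->res >= 0 && (size_t)cqe->res < len)
		fprintf(stderr, "short recv: %d of %zu\n", cqe->res, len);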
--
Jens Axboe