public inbox for [email protected]
 help / color / mirror / Atom feed
From: Jens Axboe <[email protected]>
To: Pavel Begunkov <[email protected]>,
	David Wei <[email protected]>,
	[email protected], [email protected]
Cc: Jakub Kicinski <[email protected]>, Paolo Abeni <[email protected]>,
	"David S. Miller" <[email protected]>,
	Eric Dumazet <[email protected]>,
	Jesper Dangaard Brouer <[email protected]>,
	David Ahern <[email protected]>,
	Mina Almasry <[email protected]>
Subject: Re: [RFC PATCH v3 15/20] io_uring: add io_recvzc request
Date: Wed, 20 Dec 2023 11:09:17 -0700	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 12/20/23 10:04 AM, Pavel Begunkov wrote:
> On 12/20/23 16:27, Jens Axboe wrote:
>> On 12/19/23 2:03 PM, David Wei wrote:
>>> diff --git a/io_uring/net.c b/io_uring/net.c
>>> index 454ba301ae6b..7a2aadf6962c 100644
>>> --- a/io_uring/net.c
>>> +++ b/io_uring/net.c
>>> @@ -637,7 +647,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
>>>       unsigned int cflags;
>>>         cflags = io_put_kbuf(req, issue_flags);
>>> -    if (msg->msg_inq && msg->msg_inq != -1)
>>> +    if (msg && msg->msg_inq && msg->msg_inq != -1)
>>>           cflags |= IORING_CQE_F_SOCK_NONEMPTY;
>>>         if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
>>> @@ -652,7 +662,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
>>>               io_recv_prep_retry(req);
>>>               /* Known not-empty or unknown state, retry */
>>>               if (cflags & IORING_CQE_F_SOCK_NONEMPTY ||
>>> -                msg->msg_inq == -1)
>>> +                (msg && msg->msg_inq == -1))
>>>                   return false;
>>>               if (issue_flags & IO_URING_F_MULTISHOT)
>>>                   *ret = IOU_ISSUE_SKIP_COMPLETE;
>>
>> These are a bit ugly, just pass in a dummy msg for this?
>>
>>> +int io_recvzc(struct io_kiocb *req, unsigned int issue_flags)
>>> +{
>>> +    struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc);
>>> +    struct socket *sock;
>>> +    unsigned flags;
>>> +    int ret, min_ret = 0;
>>> +    bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>>> +    struct io_zc_rx_ifq *ifq;
>>
>> Eg
>>     struct msghdr dummy_msg;
>>
>>     dummy_msg.msg_inq = -1;
>>
>> which will eat some stack, but probably not really an issue.
>>
>>
>>> +    if (issue_flags & IO_URING_F_UNLOCKED)
>>> +        return -EAGAIN;
>>
>> This seems odd, why? If we're called with IO_URING_F_UNLOCKED set, then
> 
> It's my addition, let me explain.
> 
> io_recvzc() -> io_zc_rx_recv() -> ... -> zc_rx_recv_frag()
> 
> This chain posts completions to a buffer completion queue, and
> we don't want extra locking to share it with io-wq threads. In
> some sense it's quite similar to the CQ locking, considering
> we restrict zc to DEFER_TASKRUN. And doesn't change anything
> anyway because multishot cannot post completions from io-wq
> and are executed from the poll callback in task work.
> 
>> it's from io-wq. And returning -EAGAIN there will not do anything to
> 
> It will. It's supposed to just requeue for polling (it's not
> IOPOLL to keep retrying -EAGAIN), just like multishots do.

It definitely needs a good comment, as it's highly non-obvious when
reading the code!

> Double checking the code, it can actually terminate the request,
> which doesn't make much difference for us because multishots
> should normally never end up in io-wq anyway, but I guess we
> can improve it a liitle bit.

Right, assumptions seems to be that -EAGAIN will lead to poll arm, which
seems a bit fragile.

> And it should also use IO_URING_F_IOWQ, forgot I split out
> it from *F_UNLOCK.

Yep, that'd be clearer.

>>> +static int zc_rx_recv_frag(struct io_zc_rx_ifq *ifq, const skb_frag_t *frag,
>>> +               int off, int len, unsigned sock_idx)
>>> +{
>>> +    off += skb_frag_off(frag);
>>> +
>>> +    if (likely(page_is_page_pool_iov(frag->bv_page))) {
>>> +        struct io_uring_rbuf_cqe *cqe;
>>> +        struct io_zc_rx_buf *buf;
>>> +        struct page_pool_iov *ppiov;
>>> +
>>> +        ppiov = page_to_page_pool_iov(frag->bv_page);
>>> +        if (ppiov->pp->p.memory_provider != PP_MP_IOU_ZCRX ||
>>> +            ppiov->pp->mp_priv != ifq)
>>> +            return -EFAULT;
>>> +
>>> +        cqe = io_zc_get_rbuf_cqe(ifq);
>>> +        if (!cqe)
>>> +            return -ENOBUFS;
>>> +
>>> +        buf = io_iov_to_buf(ppiov);
>>> +        io_zc_rx_get_buf_uref(buf);
>>> +
>>> +        cqe->region = 0;
>>> +        cqe->off = io_buf_pgid(ifq->pool, buf) * PAGE_SIZE + off;
>>> +        cqe->len = len;
>>> +        cqe->sock = sock_idx;
>>> +        cqe->flags = 0;
>>> +    } else {
>>> +        return -EOPNOTSUPP;
>>> +    }
>>> +
>>> +    return len;
>>> +}
>>
>> I think this would read a lot better as:
>>
>>     if (unlikely(!page_is_page_pool_iov(frag->bv_page)))
>>         return -EOPNOTSUPP;
> 
> That's a bit of oracle coding, this branch is implemented in
> a later patch.

Oracle coding?

Each patch stands separately, there's no reason not to make this one as
clean as it can be. And an error case with the main bits inline is a lot
nicer imho than two separate indented parts. For the latter addition
instead of the -EOPNOTSUPP, would probably be nice to have it in a
separate function. Probably ditto for the page pool case here now, would
make the later patch simpler too.

-- 
Jens Axboe


  reply	other threads:[~2023-12-20 18:09 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-19 21:03 [RFC PATCH v3 00/20] Zero copy Rx using io_uring David Wei
2023-12-19 21:03 ` [RFC PATCH v3 01/20] net: page_pool: add ppiov mangling helper David Wei
2023-12-19 23:22   ` Mina Almasry
2023-12-19 23:59     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 02/20] tcp: don't allow non-devmem originated ppiov David Wei
2023-12-19 23:24   ` Mina Almasry
2023-12-20  1:29     ` Pavel Begunkov
2024-01-02 16:11       ` Mina Almasry
2023-12-19 21:03 ` [RFC PATCH v3 03/20] net: page pool: rework ppiov life cycle David Wei
2023-12-19 23:35   ` Mina Almasry
2023-12-20  0:49     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 04/20] net: enable napi_pp_put_page for ppiov David Wei
2023-12-19 21:03 ` [RFC PATCH v3 05/20] net: page_pool: add ->scrub mem provider callback David Wei
2023-12-19 21:03 ` [RFC PATCH v3 06/20] io_uring: separate header for exported net bits David Wei
2023-12-20 16:01   ` Jens Axboe
2023-12-19 21:03 ` [RFC PATCH v3 07/20] io_uring: add interface queue David Wei
2023-12-20 16:13   ` Jens Axboe
2023-12-20 16:23     ` Pavel Begunkov
2023-12-21  1:44     ` David Wei
2023-12-21 17:57   ` Willem de Bruijn
2023-12-30 16:25     ` Pavel Begunkov
2023-12-31 22:25       ` Willem de Bruijn
2023-12-19 21:03 ` [RFC PATCH v3 08/20] io_uring: add mmap support for shared ifq ringbuffers David Wei
2023-12-20 16:13   ` Jens Axboe
2023-12-19 21:03 ` [RFC PATCH v3 09/20] netdev: add XDP_SETUP_ZC_RX command David Wei
2023-12-19 21:03 ` [RFC PATCH v3 10/20] io_uring: setup ZC for an Rx queue when registering an ifq David Wei
2023-12-20 16:06   ` Jens Axboe
2023-12-20 16:24     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 11/20] io_uring/zcrx: implement socket registration David Wei
2023-12-19 21:03 ` [RFC PATCH v3 12/20] io_uring: add ZC buf and pool David Wei
2023-12-19 21:03 ` [RFC PATCH v3 13/20] io_uring: implement pp memory provider for zc rx David Wei
2023-12-19 23:44   ` Mina Almasry
2023-12-20  0:39     ` Pavel Begunkov
2023-12-21 19:36   ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 14/20] net: page pool: add io_uring memory provider David Wei
2023-12-19 23:39   ` Mina Almasry
2023-12-20  0:04     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 15/20] io_uring: add io_recvzc request David Wei
2023-12-20 16:27   ` Jens Axboe
2023-12-20 17:04     ` Pavel Begunkov
2023-12-20 18:09       ` Jens Axboe [this message]
2023-12-21 18:59         ` Pavel Begunkov
2023-12-21 21:32           ` Jens Axboe
2023-12-30 21:15             ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 16/20] net: execute custom callback from napi David Wei
2023-12-19 21:03 ` [RFC PATCH v3 17/20] io_uring/zcrx: add copy fallback David Wei
2023-12-19 21:03 ` [RFC PATCH v3 18/20] veth: add support for io_uring zc rx David Wei
2023-12-19 21:03 ` [RFC PATCH v3 19/20] net: page pool: generalise ppiov dma address get David Wei
2023-12-21 19:51   ` Mina Almasry
2023-12-19 21:03 ` [RFC PATCH v3 20/20] bnxt: enable io_uring zc page pool David Wei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox