public inbox for [email protected]
 help / color / mirror / Atom feed
From: Pavel Begunkov <[email protected]>
To: Stefan Metzmacher <[email protected]>,
	[email protected], [email protected]
Subject: Re: [PATCH for-6.0 0/5] IORING_OP_SEND_ZC improvements
Date: Wed, 21 Sep 2022 13:58:43 +0100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 9/21/22 13:18, Stefan Metzmacher wrote:
> 
> Hi Pavel,
> 
>>>> If network sends anything it should return how many bytes
>>>> it queued for sending, otherwise there would be duplicated
>>>> packets / data on the other endpoint in userspace, and I
>>>> don't think any driver / lower layer would keep memory
>>>> after returning an error.
>>>
>>> As I'm also working on a socket driver for smbdirect,
>>> I already thought about how I could hook into
>>> IORING_OP_SEND[MSG]_ZC, and for sendmsg I'd have
>>> a loop sending individual fragments, which have a reference,
>>> but if I find a connection drop after the first one, I'd
>>> return ECONNRESET or EPIPE in order to get faster recovery
>>> instead of announcing a short write to the caller.
>>
>> I doesn't sound right for me, but neither I know samba to
>> really have an opinion. In any case, I see how it may be
>> more robust if we always try to push a notification cqe.
>> Will you send a patch?
> 
> You mean the IORING_OP_SEND_ZC should always
> issue a NOTIF cqe, one it passed the io_sendzc_prep stage?

Currently we add F_MORE iff cqe->res >= 0, but I want to
decouple them so users don't try to infer one from another.

In theory, it's already a subset of this, but it sounds
like a good idea to always emit notifications (when we can)
to stop users making assumptions about it. And should also
be cleaner.

>>> If we would take my 5/5 we could also have a different
>>> strategy to check decide if MORE/NOTIF is needed.
>>> If notif->cqe.res is still 0 and io_notif_flush drops
>>> the last reference we could go without MORE/NOTIF at all.
>>> In all other cases we'd either set MORE/NOTIF at the end
>>> of io_sendzc of in the fail hook.
>>
>> I had a similar optimisation, i.e. when io_notif_flush() in
>> the submission path is dropping the last ref, but killed it
>> as it was completely useless, I haven't hit this path even
>> once even with UDP, not to mention TCP.
> 
> If I remember correctly I hit it all the time on loopback,
> but I'd have to recheck.

Yeah, I meant real network in particular. I've seen it with
virtual interfaces, but for instance loopback is not interesting
as it doesn't support zerocopy in the first place. You was
probably testing with a modified skb_orphan_frags_rx().

>>>> In any case, I was looking on a bit different problem, but
>>>> it should look much cleaner using the same approach, see
>>>> branch [1], and patch [3] for sendzc in particular.
>>>>
>>>> [1] https://github.com/isilence/linux.git partial-fail
>>>> [2] https://github.com/isilence/linux/tree/io_uring/partial-fail
>>>> [3] https://github.com/isilence/linux/commit/acb4f9bf869e1c2542849e11d992a63d95f2b894
>>>
>>>      const struct io_op_def *def = &io_op_defs[req->opcode];
>>>
>>>      req_set_fail(req);
>>>      io_req_set_res(req, res, io_put_kbuf(req, IO_URING_F_UNLOCKED));
>>>      if (def->fail)
>>>          def->fail(req);
>>>      io_req_complete_post(req);
>>>
>>> Will loose req->cqe.flags, but the fail hook in general looks like a good idea.
>>
>> I just don't like those sporadic changes all across core io_uring
>> code also adding some overhead.
>>
>>> And don't we care about the other failure cases where req->cqe.flags gets overwritten?
>>
>> We don't usually carry them around ->issue handler boundaries,
>> e.g. directly do io_post_aux_cqe(res, IORING_CQE_F_MORE);
>>
>> IORING_CQE_F_BUFFER is a bit more trickier, but there is
>> special handling for this one and it wouldn't fit "set cflags
>> in advance" logic anyway.
>>
>> iow, ->fail callback sounds good enough for now, we'll change
>> it later if needed.
> 
> The fail hook should re-add the MORE flag?

Never set CQE_SKIP for notifications and add MORE flag in the
fail hook, sounds good.


> So I'll try to do the following changes:
> 
> 1. take your ->fail() patches
> 
> 2. once io_sendzc_prep() is over always trigger MORE/NOFIF cqes
>     (But the documentation should still make it clear that
>      userspace have to cope with just a single cqe (without MORE)
>      for both successs and failure, so that we can improve things later)

I've sent a liburing man patch, should be good enough.

> 3. Can I change the cqe.res of the NOTIF cqe to be 0xffffffff ?
>     That would indicate to userspace that we don't give any information
>     if zero copy was actually used or not. This would present someone
>     from relying on cqe.res == 0 and we can improve it by providing
>     more useful values in future.

I don't get the difference, someone just as easily may rely on
0xf..ff. What does it gives us? 0 usually suits better as default.

> Are you ok with that plan for 6.0?

It's late 1-2 are better to be 6.1 + stable. It doesn't break uapi,
so it's fine. It's not even strictly necessary to back port it
(but still a good idea to do it).

-- 
Pavel Begunkov

  reply	other threads:[~2022-09-21 13:01 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-16 21:36 [PATCH for-6.0 0/5] IORING_OP_SEND_ZC improvements Stefan Metzmacher
2022-09-16 21:36 ` [PATCH 1/5] io_uring/opdef: rename SENDZC_NOTIF to SEND_ZC Stefan Metzmacher
2022-09-17  9:17   ` Pavel Begunkov
2022-09-16 21:36 ` [PATCH 2/5] io_uring/core: move io_cqe->fd over from io_cqe->flags to io_cqe->res Stefan Metzmacher
2022-09-16 21:36 ` [PATCH 3/5] io_uring/core: keep req->cqe.flags on generic errors Stefan Metzmacher
2022-09-16 21:36 ` [PATCH 4/5] io_uring/net: let io_sendzc set IORING_CQE_F_MORE before sock_sendmsg() Stefan Metzmacher
2022-09-16 21:36 ` [PATCH 5/5] io_uring/notif: let userspace know how effective the zero copy usage was Stefan Metzmacher
2022-09-17  9:22   ` Pavel Begunkov
2022-09-17 10:24     ` Stefan Metzmacher
2022-09-21 12:04       ` Pavel Begunkov
2022-09-21 12:33         ` Stefan Metzmacher
2022-09-17  9:16 ` [PATCH for-6.0 0/5] IORING_OP_SEND_ZC improvements Pavel Begunkov
2022-09-17 10:44   ` Stefan Metzmacher
2022-09-21 11:39     ` Pavel Begunkov
2022-09-21 12:18       ` Stefan Metzmacher
2022-09-21 12:58         ` Pavel Begunkov [this message]
2022-09-18 22:49 ` (subset) " Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox