public inbox for [email protected]
From: Jens Axboe <[email protected]>
To: Fiona Ebner <[email protected]>,
	Shyam Prasad N <[email protected]>,
	Enzo Matsumiya <[email protected]>
Cc: [email protected], CIFS <[email protected]>,
	Thomas Lamprecht <[email protected]>
Subject: Re: Problematic interaction of io_uring and CIFS
Date: Tue, 4 Oct 2022 08:02:54 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 10/4/22 2:59 AM, Fiona Ebner wrote:
> On 26.08.22 at 10:21, Fiona Ebner wrote:
>> On 11.07.22 at 15:40, Fabian Ebner wrote:
>>> On 09.07.22 at 05:39, Shyam Prasad N wrote:
>>>> On Sat, Jul 9, 2022 at 9:00 AM Shyam Prasad N <[email protected]> wrote:
>>>>>
>>>>> On Fri, Jul 8, 2022 at 11:22 PM Enzo Matsumiya <[email protected]> wrote:
>>>>>>
>>>>>> On 07/08, Fabian Ebner wrote:
>>>>>>> (Re-sending without the log from the older kernel, because the mail hit
>>>>>>> the 100000 char limit with that)
>>>>>>>
>>>>>>> Hi,
>>>>>>> it seems that in kernels >= 5.15, io_uring and CIFS don't interact
>>>>>>> nicely sometimes, leading to IO errors. Unfortunately, my reproducer is
>>>>>>> a QEMU VM with a disk on CIFS (original report by one of our users [0]),
>>>>>>> but I can try to cook up something simpler if you want.
>>>>>>>
>>>>>>> Bisecting got me to 8ef12efe26c8 ("io_uring: run regular file
>>>>>>> completions from task_work") being the first bad commit.
>>>>>>>
>>
>> I finally got around to taking another look at this issue (still present
>> in 5.19.3) and I think I've finally figured out the root cause:
>>
>> After commit 8ef12efe26c8, for my reproducer, the write completion is
>> added to task_work with notify_method being TWA_SIGNAL and thus
>> TIF_NOTIFY_SIGNAL is set for the task.
>>
>> After that, if we end up in sk_stream_wait_memory() via sock_sendmsg(),
>> signal_pending(current) will evaluate to true and thus -EINTR is
>> returned all the way up to sock_sendmsg() in smb_send_kvec().
>>
>> Related: in __smb_send_rqst() there too is a signal_pending(current)
>> check leading to the -ERESTARTSYS return value.
>>
>> To verify that this is the cause, I applied the following hack (i.e.
>> excluding the TIF_NOTIFY_SIGNAL check) and wasn't able to trigger the
>> issue anymore:
>>
>>> diff --git a/net/core/stream.c b/net/core/stream.c
>>> index 06b36c730ce8..58e3825930bb 100644
>>> --- a/net/core/stream.c
>>> +++ b/net/core/stream.c
>>> @@ -134,7 +134,7 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
>>>                         goto do_error;
>>>                 if (!*timeo_p)
>>>                         goto do_eagain;
>>> -               if (signal_pending(current))
>>> +               if (task_sigpending(current))
>>>                         goto do_interrupted;
>>>                 sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
>>>                 if (sk_stream_memory_free(sk) && !vm_wait)
>>
>>
>> In __cifs_writev() we have
>>
>>>     /*
>>>      * If at least one write was successfully sent, then discard any rc
>>>      * value from the later writes. If the other write succeeds, then
>>>      * we'll end up returning whatever was written. If it fails, then
>>>      * we'll get a new rc value from that.
>>>      */
>>
>> so it can happen that collect_uncached_write_data() will (correctly)
>> report a short write when calling ctx->iocb->ki_complete().
>>
>> But QEMU's io_uring backend treats a short write as an -ENOSPC error,
>> which also is a bug? Or does the kernel give any guarantees in that
>> direction?
>>
>> Still, it doesn't seem ideal that the "interrupt" happens and in fact
>> __smb_send_rqst() tries to avoid it, but fails to do so, because of the
>> unexpected TIF_NOTIFY_SIGNAL:
>>>     /*
>>>      * We should not allow signals to interrupt the network send because
>>>      * any partial send will cause session reconnects thus increasing
>>>      * latency of system calls and overload a server with unnecessary
>>>      * requests.
>>>      */
>>>
>>>     sigfillset(&mask);
>>>     sigprocmask(SIG_BLOCK, &mask, &oldmask);
>>
>> Do you have any suggestions for how to proceed?
>>
> 
> Ping. The issue is still present in Linux 6.0. Does it make sense to
> also temporarily unset the task's TIF_NOTIFY_SIGNAL here or is that a
> bad idea?

You could try setting up the ring with IORING_SETUP_COOP_TASKRUN,
that'll avoid the TIF_NOTIFY_SIGNAL bits.
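
As a minimal sketch of what that could look like from userspace, assuming
the raw io_uring_setup(2) syscall rather than liburing: the helper name
setup_ring_coop() and its used_coop out-parameter are hypothetical and only
for illustration. The flag was added in 5.19, so the sketch falls back to a
plain ring when the kernel rejects it with EINVAL.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

/* UAPI headers older than 5.19 lack this flag; the value below matches
 * include/uapi/linux/io_uring.h. */
#ifndef IORING_SETUP_COOP_TASKRUN
#define IORING_SETUP_COOP_TASKRUN (1U << 8)
#endif

/*
 * Hypothetical helper (not part of any library): create a ring, preferring
 * IORING_SETUP_COOP_TASKRUN. Returns the ring fd, or -1 with errno set.
 * *used_coop is set to 1 only if the ring was created with the flag.
 */
int setup_ring_coop(unsigned int entries, struct io_uring_params *p,
		    int *used_coop)
{
	int fd;

	memset(p, 0, sizeof(*p));
	p->flags = IORING_SETUP_COOP_TASKRUN;
	fd = (int)syscall(__NR_io_uring_setup, entries, p);
	*used_coop = (fd >= 0);
	if (fd >= 0 || errno != EINVAL)
		return fd;

	/* Unknown setup flags are rejected with EINVAL: retry without it. */
	memset(p, 0, sizeof(*p));
	return (int)syscall(__NR_io_uring_setup, entries, p);
}
```

With liburing the equivalent should be passing the same flag as the third
argument of io_uring_queue_init(). Note that with this flag the task is no
longer interrupted to run completions, so the application has to enter the
kernel (for example by waiting for CQEs) to have them processed.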

-- 
Jens Axboe




Thread overview: 10+ messages
2022-07-08 12:05 Problematic interaction of io_uring and CIFS Fabian Ebner
2022-07-08 17:48 ` Enzo Matsumiya
2022-07-09  3:30   ` Shyam Prasad N
2022-07-09  3:39     ` Shyam Prasad N
2022-07-11 13:40       ` Fabian Ebner
2022-08-26  8:21         ` Fiona Ebner
2022-10-04  8:59           ` Fiona Ebner
2022-10-04 14:02             ` Jens Axboe [this message]
2022-10-05  8:20               ` Fiona Ebner
2022-07-08 18:12 ` Jens Axboe
