public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed
From: 姜智伟 <qq282012236@gmail.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	 akpm@linux-foundation.org, peterx@redhat.com,
	asml.silence@gmail.com,  linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	io-uring@vger.kernel.org
Subject: Re: [PATCH v2 1/2] io_uring: Add new functions to handle user fault scenarios
Date: Thu, 24 Apr 2025 22:08:50 +0800	[thread overview]
Message-ID: <CANHzP_uJft1FPJ0W++0Zp5rUjayaULEdpAQRn1VuuqDVq3DmJA@mail.gmail.com> (raw)
In-Reply-To: <cac3a5c9-e798-47f2-81ff-3c6003c6d8bb@kernel.dk>

Jens Axboe <axboe@kernel.dk> 于2025年4月24日周四 06:58写道:
>
> On 4/23/25 9:55 AM, Jens Axboe wrote:
> > Something like this, perhaps - it'll ensure that io-wq workers get a
> > chance to flush out pending work, which should prevent the looping. I've
> > attached a basic test case. It'll issue a write that will fault, and
> > then try and cancel that as a way to trigger the TIF_NOTIFY_SIGNAL based
> > looping.
>
> Something that may actually work - use TASK_UNINTERRUPTIBLE IFF
> signal_pending() is true AND the fault has already been tried once
> before. If that's the case, rather than just call schedule() with
> TASK_INTERRUPTIBLE, use TASK_UNINTERRUPTIBLE and schedule_timeout() with
> a suitable timeout length that prevents the annoying parts busy looping.
> I used HZ / 10.
>
> I don't see how to fix userfaultfd for this case, either using io_uring
> or normal write(2). Normal syscalls can pass back -ERESTARTSYS and get
> it retried, but there's no way to do that from inside fault handling. So
> I think we just have to be nicer about it.
>
> Andrew, as the userfaultfd maintainer, what do you think?
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index d80f94346199..1016268c7b51 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -334,15 +334,29 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
>         return ret;
>  }
>
> -static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags)
> +struct userfault_wait {
> +       unsigned int task_state;
> +       bool timeout;
> +};
> +
> +static struct userfault_wait userfaultfd_get_blocking_state(unsigned int flags)
>  {
> +       /*
> +        * If the fault has already been tried AND there's a signal pending
> +        * for this task, use TASK_UNINTERRUPTIBLE with a small timeout.
> +        * This prevents busy looping where schedule() otherwise does nothing
> +        * for TASK_INTERRUPTIBLE when the task has a signal pending.
> +        */
> +       if ((flags & FAULT_FLAG_TRIED) && signal_pending(current))
> +               return (struct userfault_wait) { TASK_UNINTERRUPTIBLE, true };
> +
>         if (flags & FAULT_FLAG_INTERRUPTIBLE)
> -               return TASK_INTERRUPTIBLE;
> +               return (struct userfault_wait) { TASK_INTERRUPTIBLE, false };
>
>         if (flags & FAULT_FLAG_KILLABLE)
> -               return TASK_KILLABLE;
> +               return (struct userfault_wait) { TASK_KILLABLE, false };
>
> -       return TASK_UNINTERRUPTIBLE;
> +       return (struct userfault_wait) { TASK_UNINTERRUPTIBLE, false };
>  }
>
>  /*
> @@ -368,7 +382,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>         struct userfaultfd_wait_queue uwq;
>         vm_fault_t ret = VM_FAULT_SIGBUS;
>         bool must_wait;
> -       unsigned int blocking_state;
> +       struct userfault_wait wait_mode;
>
>         /*
>          * We don't do userfault handling for the final child pid update
> @@ -466,7 +480,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>         uwq.ctx = ctx;
>         uwq.waken = false;
>
> -       blocking_state = userfaultfd_get_blocking_state(vmf->flags);
> +       wait_mode = userfaultfd_get_blocking_state(vmf->flags);
>
>          /*
>           * Take the vma lock now, in order to safely call
> @@ -488,7 +502,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>          * following the spin_unlock to happen before the list_add in
>          * __add_wait_queue.
>          */
> -       set_current_state(blocking_state);
> +       set_current_state(wait_mode.task_state);
>         spin_unlock_irq(&ctx->fault_pending_wqh.lock);
>
>         if (!is_vm_hugetlb_page(vma))
> @@ -501,7 +515,11 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
>
>         if (likely(must_wait && !READ_ONCE(ctx->released))) {
>                 wake_up_poll(&ctx->fd_wqh, EPOLLIN);
> -               schedule();
> +               /* See comment in userfaultfd_get_blocking_state() */
> +               if (!wait_mode.timeout)
> +                       schedule();
> +               else
> +                       schedule_timeout(HZ / 10);
>         }
>
>         __set_current_state(TASK_RUNNING);
>
> --
> Jens Axboe
I guess the previous io_work_fault patch might have already addressed the
issue sufficiently. The later patch that adds a timeout for userfaultfd might
not be necessary  wouldn’t returning after a timeout just cause the same
fault to repeat indefinitely again? Regardless of whether the thread is in
UN or IN state, the expected behavior should be to wait until the page is
filled or the uffd resource is released to be woken up,
which seems like the correct logic.

  reply	other threads:[~2025-04-24 14:09 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-22 16:29 [PATCH v2 0/2] Fix 100% CPU usage issue in IOU worker threads Zhiwei Jiang
2025-04-22 16:29 ` [PATCH v2 1/2] io_uring: Add new functions to handle user fault scenarios Zhiwei Jiang
2025-04-22 16:32   ` Jens Axboe
2025-04-22 17:04     ` 姜智伟
2025-04-22 17:33       ` Jens Axboe
2025-04-23  2:49         ` 姜智伟
2025-04-23  3:11           ` 姜智伟
2025-04-23  6:22             ` 姜智伟
2025-04-23 13:34           ` Jens Axboe
2025-04-23 14:29             ` 姜智伟
2025-04-23 15:10               ` Jens Axboe
2025-04-23 18:55                 ` Jens Axboe
2025-04-23 15:55             ` Jens Axboe
2025-04-23 16:07               ` 姜智伟
2025-04-23 16:17               ` Pavel Begunkov
2025-04-23 16:23                 ` Jens Axboe
2025-04-23 22:57               ` Jens Axboe
2025-04-24 14:08                 ` 姜智伟 [this message]
2025-04-24 14:13                   ` Jens Axboe
2025-04-24 14:45                     ` 姜智伟
2025-04-24 14:52                       ` Jens Axboe
2025-04-24 15:12                         ` 姜智伟
2025-04-24 15:21                           ` Jens Axboe
2025-04-24 15:51                             ` 姜智伟
2025-04-22 16:29 ` [PATCH v2 2/2] userfaultfd: Set the corresponding flag in IOU worker context Zhiwei Jiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANHzP_uJft1FPJ0W++0Zp5rUjayaULEdpAQRn1VuuqDVq3DmJA@mail.gmail.com \
    --to=qq282012236@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=io-uring@vger.kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterx@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox