From: Minchan Kim <[email protected]>
To: Jens Axboe <[email protected]>
Cc: Jann Horn <[email protected]>, io-uring <[email protected]>,
Andrew Morton <[email protected]>,
LKML <[email protected]>,
linux-mm <[email protected]>,
Linux API <[email protected]>,
Oleksandr Natalenko <[email protected]>,
Suren Baghdasaryan <[email protected]>,
Tim Murray <[email protected]>,
Daniel Colascione <[email protected]>,
Sandeep Patil <[email protected]>,
Sonny Rao <[email protected]>,
Brian Geffon <[email protected]>, Michal Hocko <[email protected]>,
Johannes Weiner <[email protected]>,
Shakeel Butt <[email protected]>,
John Dias <[email protected]>,
Joel Fernandes <[email protected]>,
[email protected],
Alexander Duyck <[email protected]>
Subject: Re: [PATCH v5 1/7] mm: pass task and mm to do_madvise
Date: Fri, 14 Feb 2020 10:45:14 -0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On Fri, Feb 14, 2020 at 11:22:08AM -0700, Jens Axboe wrote:
> On 2/14/20 10:25 AM, Jann Horn wrote:
> > +Jens and io-uring list
> >
> > On Fri, Feb 14, 2020 at 6:06 PM Minchan Kim <[email protected]> wrote:
> >> In upcoming patches, do_madvise will be called from external process
> >> context so we shouldn't assume "current" is always hinted process's
> >> task_struct.
> > [...]
> >> [1] http://lore.kernel.org/r/CAG48ez27=pwm5m_N_988xT1huO7g7h6arTQL44zev6TD-h-7Tg@mail.gmail.com
> > [...]
> >> diff --git a/fs/io_uring.c b/fs/io_uring.c
> > [...]
> >> @@ -2736,7 +2736,7 @@ static int io_madvise(struct io_kiocb *req, struct io_kiocb **nxt,
> >> if (force_nonblock)
> >> return -EAGAIN;
> >>
> >> - ret = do_madvise(ma->addr, ma->len, ma->advice);
> >> + ret = do_madvise(current, current->mm, ma->addr, ma->len, ma->advice);
> >> if (ret < 0)
> >> req_set_fail_links(req);
> >> io_cqring_add_event(req, ret);
> >
> > Jens, can you have a look at this change and the following patch
> > <https://lore.kernel.org/linux-mm/[email protected]/>
> > ("[PATCH v5 3/7] mm: check fatal signal pending of target process")?
> > Basically Minchan's patch tries to plumb through the identity of the
> > target task so that if that task gets killed in the middle of the
> > operation, the (potentially long-running and costly) madvise operation
> > can be cancelled. Just passing in "current" instead (which in this
> > case is the uring worker thread AFAIK) doesn't really break anything,
> > other than making the optimization not work, but I wonder whether this
> > couldn't be done more cleanly - maybe by passing in NULL to mean "we
> > don't know who the target task is", since I think we don't know that
> > here?
>
> Thanks for bringing this to my attention, patches that touch io_uring
> (or anything else) really should be CC'ed to the maintainer(s) of those
> areas...
Hi Jens, it was my mistake. Sorry for that.
>
> Yeah, the change above won't do the right thing for io_uring, in fact
> it'll always be the wrong task. So I'd second Jann's question, and ask
> if we really need the actual task, or if NULL could be used? For
> cancelation purposes, I'm guessing you want the task that's actually
> doing the operation, even if it's on behalf of someone else. That makes
> the interface a bit weird, as you'd assume the task/mm passed in would
> be related to the madvise itself, not just for cancelation.
>
> Would be nice with some clarification, so we can figure out an approach
> that would actually work.
MADV_(COLD|PAGEOUT) checks both the caller and the callee, and this part
is aimed at the callee (i.e., the target task). So io_madvise could pass
NULL when it cannot know who the target is, with a NULL check before
fatal_signal_pending. I will put the following check in [3/7]:
	if (private->target_task &&
	    fatal_signal_pending(private->target_task))
		return -EINTR;
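To make the intent of that check concrete, here is a minimal userspace
sketch in C. It is not the code from [3/7]: task_model,
walk_private_model, fatal_signal_pending_model and walk_step_model are
hypothetical stand-ins invented for illustration, and the only thing
taken from the thread is the shape of the condition (skip the
cancellation test when the target task is NULL).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's task_struct. */
struct task_model {
	bool fatal_signal;	/* models a pending fatal signal */
};

/* Hypothetical stand-in for the walk-private data referred to above. */
struct walk_private_model {
	struct task_model *target_task;	/* NULL: target unknown (io_madvise case) */
};

/* Models fatal_signal_pending(); the real helper takes a task_struct. */
static bool fatal_signal_pending_model(const struct task_model *t)
{
	return t->fatal_signal;
}

/*
 * One step of a long-running walk: give up with -EINTR only when a
 * target task is known and that task has been killed.
 */
static int walk_step_model(const struct walk_private_model *private)
{
	if (private->target_task &&
	    fatal_signal_pending_model(private->target_task))
		return -EINTR;
	return 0;
}
```

With a NULL target the step always continues, so a caller that cannot
name the target loses the early-exit optimization but is otherwise
unaffected.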
From d008a5a1049b03b3e0eeef7121faead2b6555f49 Mon Sep 17 00:00:00 2001
From: Minchan Kim <[email protected]>
Date: Fri, 14 Feb 2020 07:29:58 -0800
Subject: [PATCH] mm: pass task and mm to do_madvise
In upcoming patches, do_madvise will be called from external process
context, so we shouldn't assume "current" is always the hinted process's
task_struct. Furthermore, we can't access the mm_struct via task->mm
once it's verified by access_mm, which will be introduced in the next
patch[1]. So let's pass *current* and current->mm as arguments of
do_madvise; this shouldn't change existing behavior but prepares for
the next patch and makes review easier.
Note: io_madvise passes NULL as the target_task argument of do_madvise
because it can't know who the target is.
[1] http://lore.kernel.org/r/CAG48ez27=pwm5m_N_988xT1huO7g7h6arTQL44zev6TD-h-7Tg@mail.gmail.com
Cc: Jens Axboe <[email protected]>
Cc: Jann Horn <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
fs/io_uring.c | 2 +-
include/linux/mm.h | 3 ++-
mm/madvise.c | 34 +++++++++++++++++++---------------
3 files changed, 22 insertions(+), 17 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 63beda9bafc5..1c7e9cd6c8ce 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2736,7 +2736,7 @@ static int io_madvise(struct io_kiocb *req, struct io_kiocb **nxt,
if (force_nonblock)
return -EAGAIN;
- ret = do_madvise(ma->addr, ma->len, ma->advice);
+ ret = do_madvise(NULL, current->mm, ma->addr, ma->len, ma->advice);
if (ret < 0)
req_set_fail_links(req);
io_cqring_add_event(req, ret);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52269e56c514..beb9259f9ed1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2323,7 +2323,8 @@ extern int __do_munmap(struct mm_struct *, unsigned long, size_t,
struct list_head *uf, bool downgrade);
extern int do_munmap(struct mm_struct *, unsigned long, size_t,
struct list_head *uf);
-extern int do_madvise(unsigned long start, size_t len_in, int behavior);
+extern int do_madvise(struct task_struct *task, struct mm_struct *mm,
+ unsigned long start, size_t len_in, int behavior);
static inline unsigned long
do_mmap_pgoff(struct file *file, unsigned long addr,
diff --git a/mm/madvise.c b/mm/madvise.c
index 43b47d3fae02..f75c86b6c463 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -254,6 +254,7 @@ static long madvise_willneed(struct vm_area_struct *vma,
struct vm_area_struct **prev,
unsigned long start, unsigned long end)
{
+ struct mm_struct *mm = vma->vm_mm;
struct file *file = vma->vm_file;
loff_t offset;
@@ -288,12 +289,12 @@ static long madvise_willneed(struct vm_area_struct *vma,
*/
*prev = NULL; /* tell sys_madvise we drop mmap_sem */
get_file(file);
- up_read(&current->mm->mmap_sem);
+ up_read(&mm->mmap_sem);
offset = (loff_t)(start - vma->vm_start)
+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
vfs_fadvise(file, offset, end - start, POSIX_FADV_WILLNEED);
fput(file);
- down_read(&current->mm->mmap_sem);
+ down_read(&mm->mmap_sem);
return 0;
}
@@ -676,7 +677,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
if (nr_swap) {
if (current->mm == mm)
sync_mm_rss(mm);
-
add_mm_counter(mm, MM_SWAPENTS, nr_swap);
}
arch_leave_lazy_mmu_mode();
@@ -756,6 +756,8 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
unsigned long start, unsigned long end,
int behavior)
{
+ struct mm_struct *mm = vma->vm_mm;
+
*prev = vma;
if (!can_madv_lru_vma(vma))
return -EINVAL;
@@ -763,8 +765,8 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
if (!userfaultfd_remove(vma, start, end)) {
*prev = NULL; /* mmap_sem has been dropped, prev is stale */
- down_read(&current->mm->mmap_sem);
- vma = find_vma(current->mm, start);
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, start);
if (!vma)
return -ENOMEM;
if (start < vma->vm_start) {
@@ -818,6 +820,7 @@ static long madvise_remove(struct vm_area_struct *vma,
loff_t offset;
int error;
struct file *f;
+ struct mm_struct *mm = vma->vm_mm;
*prev = NULL; /* tell sys_madvise we drop mmap_sem */
@@ -845,13 +848,13 @@ static long madvise_remove(struct vm_area_struct *vma,
get_file(f);
if (userfaultfd_remove(vma, start, end)) {
/* mmap_sem was not released by userfaultfd_remove() */
- up_read(&current->mm->mmap_sem);
+ up_read(&mm->mmap_sem);
}
error = vfs_fallocate(f,
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
offset, end - start);
fput(f);
- down_read(&current->mm->mmap_sem);
+ down_read(&mm->mmap_sem);
return error;
}
@@ -1044,7 +1047,8 @@ madvise_behavior_valid(int behavior)
* -EBADF - map exists, but area maps something that isn't a file.
* -EAGAIN - a kernel resource was temporarily unavailable.
*/
-int do_madvise(unsigned long start, size_t len_in, int behavior)
+int do_madvise(struct task_struct *target_task, struct mm_struct *mm,
+ unsigned long start, size_t len_in, int behavior)
{
unsigned long end, tmp;
struct vm_area_struct *vma, *prev;
@@ -1082,10 +1086,10 @@ int do_madvise(unsigned long start, size_t len_in, int behavior)
write = madvise_need_mmap_write(behavior);
if (write) {
- if (down_write_killable(&current->mm->mmap_sem))
+ if (down_write_killable(&mm->mmap_sem))
return -EINTR;
} else {
- down_read(&current->mm->mmap_sem);
+ down_read(&mm->mmap_sem);
}
/*
@@ -1093,7 +1097,7 @@ int do_madvise(unsigned long start, size_t len_in, int behavior)
* ranges, just ignore them, but return -ENOMEM at the end.
* - different from the way of handling in mlock etc.
*/
- vma = find_vma_prev(current->mm, start, &prev);
+ vma = find_vma_prev(mm, start, &prev);
if (vma && start > vma->vm_start)
prev = vma;
@@ -1130,19 +1134,19 @@ int do_madvise(unsigned long start, size_t len_in, int behavior)
if (prev)
vma = prev->vm_next;
else /* madvise_remove dropped mmap_sem */
- vma = find_vma(current->mm, start);
+ vma = find_vma(mm, start);
}
out:
blk_finish_plug(&plug);
if (write)
- up_write(&current->mm->mmap_sem);
+ up_write(&mm->mmap_sem);
else
- up_read(&current->mm->mmap_sem);
+ up_read(&mm->mmap_sem);
return error;
}
SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
{
- return do_madvise(start, len_in, behavior);
+ return do_madvise(current, current->mm, start, len_in, behavior);
}
--
2.25.0.265.gbab2e86ba0-goog
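As a rough illustration of why the helpers above switch from
current->mm to vma->vm_mm, here is a small userspace model in C. It
assumes nothing beyond what the diff shows: mm_model, vma_model,
willneed_model and do_madvise_model are hypothetical names, and the
counters only imitate taking and releasing mmap_sem for reading.

```c
/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct mm_model {
	int read_locks;		/* models read holds on mmap_sem */
	int relock_count;	/* times a helper dropped and retook the lock */
};

struct vma_model {
	struct mm_model *vm_mm;	/* owning address space, like vma->vm_mm */
};

static void down_read_model(struct mm_model *mm) { mm->read_locks++; }
static void up_read_model(struct mm_model *mm) { mm->read_locks--; }

/*
 * Models a helper that drops the lock around blocking work. It takes
 * the mm from the vma, so it releases the same lock the caller took,
 * whichever process is running.
 */
static void willneed_model(struct vma_model *vma)
{
	struct mm_model *mm = vma->vm_mm;

	up_read_model(mm);
	/* blocking work would happen here */
	down_read_model(mm);
	mm->relock_count++;
}

/* Models do_madvise() with an explicit mm; returns the final hold count. */
static int do_madvise_model(struct mm_model *mm, struct vma_model *vma)
{
	down_read_model(mm);
	willneed_model(vma);
	up_read_model(mm);
	return mm->read_locks;
}
```

If the mm passed in belongs to another process, the helper still
operates on that same mm and the hold count ends balanced at zero; a
helper that reached for the caller's own mm instead would release a
lock that was never taken.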