* [PATCH 01/19] mm: Introduce vm_account
[not found] <cover.c238416f0e82377b449846dbb2459ae9d7030c8e.1675669136.git-series.apopple@nvidia.com>
@ 2023-02-06 7:47 ` Alistair Popple
2023-02-06 7:47 ` [PATCH 09/19] io_uring: convert to use vm_account Alistair Popple
1 sibling, 0 replies; 8+ messages in thread
From: Alistair Popple @ 2023-02-06 7:47 UTC (permalink / raw)
To: linux-mm, cgroups
Cc: linux-kernel, jgg, jhubbard, tjmercier, hannes, surenb, mkoutny,
daniel, Daniel P . Berrange, Alex Williamson, Alistair Popple,
linuxppc-dev, linux-fpga, linux-rdma, virtualization, kvm, netdev,
io-uring, bpf, rds-devel, linux-kselftest
Kernel drivers that pin pages should account these pages against
either user->locked_vm and/or mm->pinned_vm and fail the pinning if
RLIMIT_MEMLOCK is exceeded and CAP_IPC_LOCK isn't held.
Currently drivers open-code this accounting and use various methods to
update the atomic variables and check against the limits leading to
various bugs and inconsistencies. To fix this introduce a standard
interface for charging pinned and locked memory. As this involves
taking references on kernel objects such as mm_struct or user_struct
we introduce a new vm_account struct to hold these references. Several
helper functions are then introduced to grab references and check
limits.
As the way these limits are charged and enforced is visible to
userspace we need to be careful not to break existing applications by
charging to different counters. As a result the vm_account functions
support accounting to different counters as required.
A future change will extend this to also account against a cgroup for
pinned pages.
Signed-off-by: Alistair Popple <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
include/linux/vm_account.h | 56 +++++++++++++++++-
mm/util.c | 127 ++++++++++++++++++++++++++++++++++++++-
2 files changed, 183 insertions(+)
create mode 100644 include/linux/vm_account.h
diff --git a/include/linux/vm_account.h b/include/linux/vm_account.h
new file mode 100644
index 0000000..b4b2e90
--- /dev/null
+++ b/include/linux/vm_account.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_VM_ACCOUNT_H
+#define _LINUX_VM_ACCOUNT_H
+
+/**
+ * enum vm_account_flags - Determine how pinned/locked memory is accounted.
+ * @VM_ACCOUNT_TASK: Account pinned memory to mm->pinned_vm.
+ * @VM_ACCOUNT_BYPASS: Don't enforce rlimit on any charges.
+ * @VM_ACCOUNT_USER: Account locked memory to user->locked_vm.
+ *
+ * Determines which statistic pinned/locked memory is accounted
+ * against. All limits will be enforced against RLIMIT_MEMLOCK and the
+ * pins cgroup if CONFIG_CGROUP_PINS is enabled.
+ *
+ * New drivers should use VM_ACCOUNT_USER. VM_ACCOUNT_TASK is used by
+ * pre-existing drivers to maintain existing accounting against
+ * mm->pinned_mm rather than user->locked_mm.
+ *
+ * VM_ACCOUNT_BYPASS may also be specified to bypass rlimit
+ * checks. Typically this is used to cache CAP_IPC_LOCK from when a
+ * driver is first initialised. Note that this does not bypass cgroup
+ * limit checks.
+ */
+enum vm_account_flags {
+ VM_ACCOUNT_USER = 0,
+ VM_ACCOUNT_BYPASS = 1,
+ VM_ACCOUNT_TASK = 1,
+};
+
+struct vm_account {
+ struct task_struct *task;
+ struct mm_struct *mm;
+ struct user_struct *user;
+ enum vm_account_flags flags;
+};
+
+void vm_account_init(struct vm_account *vm_account, struct task_struct *task,
+ struct user_struct *user, enum vm_account_flags flags);
+
+/**
+ * vm_account_init_current - Initialise a new struct vm_account.
+ * @vm_account: pointer to uninitialised vm_account.
+ *
+ * Helper to initialise a vm_account for the common case of charging
+ * with VM_ACCOUNT_TASK against current.
+ */
+static inline void vm_account_init_current(struct vm_account *vm_account)
+{
+ vm_account_init(vm_account, current, NULL, VM_ACCOUNT_TASK);
+}
+
+void vm_account_release(struct vm_account *vm_account);
+int vm_account_pinned(struct vm_account *vm_account, unsigned long npages);
+void vm_unaccount_pinned(struct vm_account *vm_account, unsigned long npages);
+
+#endif /* _LINUX_VM_ACCOUNT_H */
diff --git a/mm/util.c b/mm/util.c
index b56c92f..d8c19f8 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -23,6 +23,7 @@
#include <linux/processor.h>
#include <linux/sizes.h>
#include <linux/compat.h>
+#include <linux/vm_account.h>
#include <linux/uaccess.h>
@@ -431,6 +432,132 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
#endif
/**
+ * vm_account_init - Initialise a new struct vm_account.
+ * @vm_account: pointer to uninitialised vm_account.
+ * @task: task to charge against.
+ * @user: user to charge against. Must be non-NULL for VM_ACCOUNT_USER.
+ * @flags: flags to use when charging to vm_account.
+ *
+ * Initialise a new uninitialised struct vm_account. Takes references
+ * on the task/mm/user/cgroup as required although callers must ensure
+ * any references passed in remain valid for the duration of this
+ * call.
+ */
+void vm_account_init(struct vm_account *vm_account, struct task_struct *task,
+ struct user_struct *user, enum vm_account_flags flags)
+{
+ vm_account->task = get_task_struct(task);
+
+ if (flags & VM_ACCOUNT_USER)
+ vm_account->user = get_uid(user);
+
+ mmgrab(task->mm);
+ vm_account->mm = task->mm;
+ vm_account->flags = flags;
+}
+EXPORT_SYMBOL_GPL(vm_account_init);
+
+/**
+ * vm_account_release - Initialise a new struct vm_account.
+ * @vm_account: pointer to initialised vm_account.
+ *
+ * Drop any object references obtained by vm_account_init(). The
+ * vm_account must not be used after calling this unless reinitialised
+ * with vm_account_init().
+ */
+void vm_account_release(struct vm_account *vm_account)
+{
+ put_task_struct(vm_account->task);
+ if (vm_account->flags & VM_ACCOUNT_USER)
+ free_uid(vm_account->user);
+
+ mmdrop(vm_account->mm);
+}
+EXPORT_SYMBOL_GPL(vm_account_release);
+
+/*
+ * Charge pages with an atomic compare and swap. Returns -ENOMEM on
+ * failure, 1 on success and 0 for retry.
+ */
+static int vm_account_cmpxchg(struct vm_account *vm_account,
+ unsigned long npages, unsigned long lock_limit)
+{
+ u64 cur_pages, new_pages;
+
+ if (vm_account->flags & VM_ACCOUNT_USER)
+ cur_pages = atomic_long_read(&vm_account->user->locked_vm);
+ else
+ cur_pages = atomic64_read(&vm_account->mm->pinned_vm);
+
+ new_pages = cur_pages + npages;
+ if (lock_limit != RLIM_INFINITY && new_pages > lock_limit)
+ return -ENOMEM;
+
+ if (vm_account->flags & VM_ACCOUNT_USER) {
+ return atomic_long_cmpxchg(&vm_account->user->locked_vm,
+ cur_pages, new_pages) == cur_pages;
+ } else {
+ return atomic64_cmpxchg(&vm_account->mm->pinned_vm,
+ cur_pages, new_pages) == cur_pages;
+ }
+}
+
+/**
+ * vm_account_pinned - Charge pinned or locked memory to the vm_account.
+ * @vm_account: pointer to an initialised vm_account.
+ * @npages: number of pages to charge.
+ *
+ * Return: 0 on success, -ENOMEM if a limit would be exceeded.
+ *
+ * Note: All pages must be explicitly uncharged with
+ * vm_unaccount_pinned() prior to releasing the vm_account with
+ * vm_account_release().
+ */
+int vm_account_pinned(struct vm_account *vm_account, unsigned long npages)
+{
+ unsigned long lock_limit = RLIM_INFINITY;
+ int ret;
+
+ if (!(vm_account->flags & VM_ACCOUNT_BYPASS) && !capable(CAP_IPC_LOCK))
+ lock_limit = task_rlimit(vm_account->task,
+ RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+ while (true) {
+ ret = vm_account_cmpxchg(vm_account, npages, lock_limit);
+ if (ret > 0)
+ break;
+ else if (ret < 0)
+ return ret;
+ }
+
+ /*
+ * Always add pinned pages to mm->pinned_vm even when we're
+ * not enforcing the limit against that.
+ */
+ if (vm_account->flags & VM_ACCOUNT_USER)
+ atomic64_add(npages, &vm_account->mm->pinned_vm);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vm_account_pinned);
+
+/**
+ * vm_unaccount_pinned - Uncharge pinned or locked memory to the vm_account.
+ * @vm_account: pointer to an initialised vm_account.
+ * @npages: number of pages to uncharge.
+ */
+void vm_unaccount_pinned(struct vm_account *vm_account, unsigned long npages)
+{
+ if (vm_account->flags & VM_ACCOUNT_USER) {
+ atomic_long_sub(npages, &vm_account->user->locked_vm);
+ atomic64_sub(npages, &vm_account->mm->pinned_vm);
+ } else {
+ atomic64_sub(npages, &vm_account->mm->pinned_vm);
+ }
+}
+EXPORT_SYMBOL_GPL(vm_unaccount_pinned);
+
+/**
* __account_locked_vm - account locked pages to an mm's locked_vm
* @mm: mm to account against
* @pages: number of pages to account
--
git-series 0.9.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 09/19] io_uring: convert to use vm_account
[not found] <cover.c238416f0e82377b449846dbb2459ae9d7030c8e.1675669136.git-series.apopple@nvidia.com>
2023-02-06 7:47 ` [PATCH 01/19] mm: Introduce vm_account Alistair Popple
@ 2023-02-06 7:47 ` Alistair Popple
2023-02-06 15:29 ` Jens Axboe
1 sibling, 1 reply; 8+ messages in thread
From: Alistair Popple @ 2023-02-06 7:47 UTC (permalink / raw)
To: linux-mm, cgroups
Cc: linux-kernel, jgg, jhubbard, tjmercier, hannes, surenb, mkoutny,
daniel, Daniel P . Berrange, Alex Williamson, Alistair Popple,
Jens Axboe, Pavel Begunkov, io-uring
Convert io_uring to use vm_account instead of directly charging pages
against the user/mm. Rather than charge pages to both user->locked_vm
and mm->pinned_vm this will only charge pages to user->locked_vm.
Signed-off-by: Alistair Popple <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
include/linux/io_uring_types.h | 4 ++--
io_uring/io_uring.c | 20 +++---------------
io_uring/notif.c | 4 ++--
io_uring/notif.h | 10 +++------
io_uring/rsrc.c | 38 +++--------------------------------
io_uring/rsrc.h | 9 +--------
6 files changed, 17 insertions(+), 68 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 128a67a..45ac75d 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -5,6 +5,7 @@
#include <linux/task_work.h>
#include <linux/bitmap.h>
#include <linux/llist.h>
+#include <linux/vm_account.h>
#include <uapi/linux/io_uring.h>
struct io_wq_work_node {
@@ -343,8 +344,7 @@ struct io_ring_ctx {
struct io_wq_hash *hash_map;
/* Only used for accounting purposes */
- struct user_struct *user;
- struct mm_struct *mm_account;
+ struct vm_account vm_account;
/* ctx exit and cancelation */
struct llist_head fallback_llist;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 0a4efad..912da4f 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2744,15 +2744,11 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
#endif
WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));
- if (ctx->mm_account) {
- mmdrop(ctx->mm_account);
- ctx->mm_account = NULL;
- }
+ vm_account_release(&ctx->vm_account);
io_mem_free(ctx->rings);
io_mem_free(ctx->sq_sqes);
percpu_ref_exit(&ctx->refs);
- free_uid(ctx->user);
io_req_caches_free(ctx);
if (ctx->hash_map)
io_wq_put_hash(ctx->hash_map);
@@ -3585,8 +3581,9 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
ctx->syscall_iopoll = 1;
ctx->compat = in_compat_syscall();
- if (!capable(CAP_IPC_LOCK))
- ctx->user = get_uid(current_user());
+ vm_account_init(&ctx->vm_account, current, current_user(),
+ VM_ACCOUNT_USER |
+ (capable(CAP_IPC_LOCK) ? VM_ACCOUNT_BYPASS : 0));
/*
* For SQPOLL, we just need a wakeup, always. For !SQPOLL, if
@@ -3619,15 +3616,6 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
goto err;
}
- /*
- * This is just grabbed for accounting purposes. When a process exits,
- * the mm is exited and dropped before the files, hence we need to hang
- * on to this mm purely for the purposes of being able to unaccount
- * memory (locked/pinned vm). It's not used for anything else.
- */
- mmgrab(current->mm);
- ctx->mm_account = current->mm;
-
ret = io_allocate_scq_urings(ctx, p);
if (ret)
goto err;
diff --git a/io_uring/notif.c b/io_uring/notif.c
index c4bb793..0f589fa 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -17,8 +17,8 @@ static void io_notif_complete_tw_ext(struct io_kiocb *notif, bool *locked)
if (nd->zc_report && (nd->zc_copied || !nd->zc_used))
notif->cqe.res |= IORING_NOTIF_USAGE_ZC_COPIED;
- if (nd->account_pages && ctx->user) {
- __io_unaccount_mem(ctx->user, nd->account_pages);
+ if (nd->account_pages) {
+ vm_unaccount_pinned(&ctx->vm_account, nd->account_pages);
nd->account_pages = 0;
}
io_req_task_complete(notif, locked);
diff --git a/io_uring/notif.h b/io_uring/notif.h
index c88c800..e2cb44a 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -43,11 +43,9 @@ static inline int io_notif_account_mem(struct io_kiocb *notif, unsigned len)
unsigned nr_pages = (len >> PAGE_SHIFT) + 2;
int ret;
- if (ctx->user) {
- ret = __io_account_mem(ctx->user, nr_pages);
- if (ret)
- return ret;
- nd->account_pages += nr_pages;
- }
+ ret = __io_account_mem(&ctx->vm_account, nr_pages);
+ if (ret)
+ return ret;
+ nd->account_pages += nr_pages;
return 0;
}
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 18de10c..aa44528 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -42,49 +42,19 @@ void io_rsrc_refs_drop(struct io_ring_ctx *ctx)
}
}
-int __io_account_mem(struct user_struct *user, unsigned long nr_pages)
+int __io_account_mem(struct vm_account *vm_account, unsigned long nr_pages)
{
- unsigned long page_limit, cur_pages, new_pages;
-
- if (!nr_pages)
- return 0;
-
- /* Don't allow more pages than we can safely lock */
- page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
- cur_pages = atomic_long_read(&user->locked_vm);
- do {
- new_pages = cur_pages + nr_pages;
- if (new_pages > page_limit)
- return -ENOMEM;
- } while (!atomic_long_try_cmpxchg(&user->locked_vm,
- &cur_pages, new_pages));
- return 0;
+ return vm_account_pinned(vm_account, nr_pages);
}
static void io_unaccount_mem(struct io_ring_ctx *ctx, unsigned long nr_pages)
{
- if (ctx->user)
- __io_unaccount_mem(ctx->user, nr_pages);
-
- if (ctx->mm_account)
- atomic64_sub(nr_pages, &ctx->mm_account->pinned_vm);
+ vm_unaccount_pinned(&ctx->vm_account, nr_pages);
}
static int io_account_mem(struct io_ring_ctx *ctx, unsigned long nr_pages)
{
- int ret;
-
- if (ctx->user) {
- ret = __io_account_mem(ctx->user, nr_pages);
- if (ret)
- return ret;
- }
-
- if (ctx->mm_account)
- atomic64_add(nr_pages, &ctx->mm_account->pinned_vm);
-
- return 0;
+ return vm_account_pinned(&ctx->vm_account, nr_pages);
}
static int io_copy_iov(struct io_ring_ctx *ctx, struct iovec *dst,
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 2b87436..d8833d0 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -167,12 +167,5 @@ static inline u64 *io_get_tag_slot(struct io_rsrc_data *data, unsigned int idx)
int io_files_update(struct io_kiocb *req, unsigned int issue_flags);
int io_files_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
-int __io_account_mem(struct user_struct *user, unsigned long nr_pages);
-
-static inline void __io_unaccount_mem(struct user_struct *user,
- unsigned long nr_pages)
-{
- atomic_long_sub(nr_pages, &user->locked_vm);
-}
-
+int __io_account_mem(struct vm_account *vm_account, unsigned long nr_pages);
#endif
--
git-series 0.9.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 09/19] io_uring: convert to use vm_account
2023-02-06 7:47 ` [PATCH 09/19] io_uring: convert to use vm_account Alistair Popple
@ 2023-02-06 15:29 ` Jens Axboe
2023-02-07 1:03 ` Alistair Popple
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2023-02-06 15:29 UTC (permalink / raw)
To: Alistair Popple, linux-mm, cgroups
Cc: linux-kernel, jgg, jhubbard, tjmercier, hannes, surenb, mkoutny,
daniel, Daniel P . Berrange, Alex Williamson, Pavel Begunkov,
io-uring
On 2/6/23 12:47 AM, Alistair Popple wrote:
> Convert io_uring to use vm_account instead of directly charging pages
> against the user/mm. Rather than charge pages to both user->locked_vm
> and mm->pinned_vm this will only charge pages to user->locked_vm.
Not sure how we're supposed to review this, when you just send us 9/19
and vm_account_release() is supposedly an earlier patch in this series.
Either CC the whole series, or at least the cover letter, core parts,
and the per-subsystem parts.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 09/19] io_uring: convert to use vm_account
2023-02-06 15:29 ` Jens Axboe
@ 2023-02-07 1:03 ` Alistair Popple
2023-02-07 14:28 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Alistair Popple @ 2023-02-07 1:03 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-mm, cgroups, linux-kernel, jgg, jhubbard, tjmercier, hannes,
surenb, mkoutny, daniel, Daniel P . Berrange, Alex Williamson,
Pavel Begunkov, io-uring
Jens Axboe <[email protected]> writes:
> On 2/6/23 12:47 AM, Alistair Popple wrote:
>> Convert io_uring to use vm_account instead of directly charging pages
>> against the user/mm. Rather than charge pages to both user->locked_vm
>> and mm->pinned_vm this will only charge pages to user->locked_vm.
>
> Not sure how we're supposed to review this, when you just send us 9/19
> and vm_account_release() is supposedly an earlier patch in this series.
>
> Either CC the whole series, or at least the cover letter, core parts,
> and the per-subsystem parts.
Ok, thanks. Will be sure to add everyone to the cover letter and patch
01 when I send the next version.
For reference the cover letter is here:
https://lore.kernel.org/linux-mm/cover.c238416f0e82377b449846dbb2459ae9d7030c8e.1675669136.git-series.apopple@nvidia.com/
And the core patch that introduces vm_account is here:
https://lore.kernel.org/linux-mm/e80b61561f97296a6c08faeebe281cb949333d1d.1675669136.git-series.apopple@nvidia.com/
No problem if you want to wait for the resend/next version before
taking another look though.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 09/19] io_uring: convert to use vm_account
2023-02-07 1:03 ` Alistair Popple
@ 2023-02-07 14:28 ` Jens Axboe
2023-02-07 14:55 ` Jason Gunthorpe
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2023-02-07 14:28 UTC (permalink / raw)
To: Alistair Popple
Cc: linux-mm, cgroups, linux-kernel, jgg, jhubbard, tjmercier, hannes,
surenb, mkoutny, daniel, Daniel P . Berrange, Alex Williamson,
Pavel Begunkov, io-uring
On 2/6/23 6:03?PM, Alistair Popple wrote:
>
> Jens Axboe <[email protected]> writes:
>
>> On 2/6/23 12:47?AM, Alistair Popple wrote:
>>> Convert io_uring to use vm_account instead of directly charging pages
>>> against the user/mm. Rather than charge pages to both user->locked_vm
>>> and mm->pinned_vm this will only charge pages to user->locked_vm.
>>
>> Not sure how we're supposed to review this, when you just send us 9/19
>> and vm_account_release() is supposedly an earlier patch in this series.
>>
>> Either CC the whole series, or at least the cover letter, core parts,
>> and the per-subsystem parts.
>
> Ok, thanks. Will be sure to add everyone to the cover letter and patch
> 01 when I send the next version.
>
> For reference the cover letter is here:
>
> https://lore.kernel.org/linux-mm/cover.c238416f0e82377b449846dbb2459ae9d7030c8e.1675669136.git-series.apopple@nvidia.com/
>
> And the core patch that introduces vm_account is here:
>
> https://lore.kernel.org/linux-mm/e80b61561f97296a6c08faeebe281cb949333d1d.1675669136.git-series.apopple@nvidia.com/
>
> No problem if you want to wait for the resend/next version before
> taking another look though.
Thanks, that helps. Like listed in the cover letter, I also have to
agree that this is badly named. It's way too generic, it needs to have a
name that tells you what it does. There's tons of accounting, you need
to be more specific.
Outside of that, we're now doubling the amount of memory associated with
tracking this. That isn't necessarily a showstopper, but it is not
ideal. I didn't take a look at the other conversions (again, because
they were not sent to me), but seems like the task_struct and flags
could just be passed in as they may very well be known to many/most
callers?
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 09/19] io_uring: convert to use vm_account
2023-02-07 14:28 ` Jens Axboe
@ 2023-02-07 14:55 ` Jason Gunthorpe
2023-02-07 17:05 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2023-02-07 14:55 UTC (permalink / raw)
To: Jens Axboe
Cc: Alistair Popple, linux-mm, cgroups, linux-kernel, jhubbard,
tjmercier, hannes, surenb, mkoutny, daniel, Daniel P . Berrange,
Alex Williamson, Pavel Begunkov, io-uring
On Tue, Feb 07, 2023 at 07:28:56AM -0700, Jens Axboe wrote:
> Outside of that, we're now doubling the amount of memory associated with
> tracking this. That isn't necessarily a showstopper, but it is not
> ideal. I didn't take a look at the other conversions (again, because
> they were not sent to me), but seems like the task_struct and flags
> could just be passed in as they may very well be known to many/most
> callers?
For places doing the mm accounting type it cannot use the task struct
as the underlying mm can be replaced and keep the task, IIRC.
We just had a bug in VFIO related to this..
If we could go back from the mm to the task (even a from a destroyed
mm though) that might work to reduce storage?
Jason
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 09/19] io_uring: convert to use vm_account
2023-02-07 14:55 ` Jason Gunthorpe
@ 2023-02-07 17:05 ` Jens Axboe
2023-02-13 11:30 ` Alistair Popple
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2023-02-07 17:05 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alistair Popple, linux-mm, cgroups, linux-kernel, jhubbard,
tjmercier, hannes, surenb, mkoutny, daniel, Daniel P . Berrange,
Alex Williamson, Pavel Begunkov, io-uring
On 2/7/23 7:55?AM, Jason Gunthorpe wrote:
> On Tue, Feb 07, 2023 at 07:28:56AM -0700, Jens Axboe wrote:
>
>> Outside of that, we're now doubling the amount of memory associated with
>> tracking this. That isn't necessarily a showstopper, but it is not
>> ideal. I didn't take a look at the other conversions (again, because
>> they were not sent to me), but seems like the task_struct and flags
>> could just be passed in as they may very well be known to many/most
>> callers?
>
> For places doing the mm accounting type it cannot use the task struct
> as the underlying mm can be replaced and keep the task, IIRC.
>
> We just had a bug in VFIO related to this..
>
> If we could go back from the mm to the task (even a from a destroyed
> mm though) that might work to reduce storage?
Then maybe just nest them:
struct small_one {
struct mm_struct *mm;
struct user_struct *user;
};
struct big_one {
struct small_one foo;
struct task_struct *task;
enum vm_account_flags flags;
};
and have the real helpers deal with small_one, and wrappers around that
taking big_one that just passes in the missing bits. Then users that
don't need the extra bits can just use the right API.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 09/19] io_uring: convert to use vm_account
2023-02-07 17:05 ` Jens Axboe
@ 2023-02-13 11:30 ` Alistair Popple
0 siblings, 0 replies; 8+ messages in thread
From: Alistair Popple @ 2023-02-13 11:30 UTC (permalink / raw)
To: Jens Axboe
Cc: Jason Gunthorpe, linux-mm, cgroups, linux-kernel, jhubbard,
tjmercier, hannes, surenb, mkoutny, daniel, Daniel P . Berrange,
Alex Williamson, Pavel Begunkov, io-uring
Jens Axboe <[email protected]> writes:
> On 2/7/23 7:55?AM, Jason Gunthorpe wrote:
>> On Tue, Feb 07, 2023 at 07:28:56AM -0700, Jens Axboe wrote:
>>
>>> Outside of that, we're now doubling the amount of memory associated with
>>> tracking this. That isn't necessarily a showstopper, but it is not
>>> ideal. I didn't take a look at the other conversions (again, because
>>> they were not sent to me), but seems like the task_struct and flags
>>> could just be passed in as they may very well be known to many/most
>>> callers?
>>
>> For places doing the mm accounting type it cannot use the task struct
>> as the underlying mm can be replaced and keep the task, IIRC.
>>
>> We just had a bug in VFIO related to this..
>>
>> If we could go back from the mm to the task (even a from a destroyed
>> mm though) that might work to reduce storage?
Yes, it's the going back from a destroyed mm that gets a bit murky. I
don't think it's a showstopper as we could probably keep track of that
when we destroy the mm but it seems like a fair amount of complexity to
save a smallish amount of memory.
However if we end up tacking this onto memcg instead then we could use
that to go back to the task and move any charges when the mm moves.
> Then maybe just nest them:
>
> struct small_one {
> struct mm_struct *mm;
> struct user_struct *user;
> };
>
> struct big_one {
> struct small_one foo;
> struct task_struct *task;
> enum vm_account_flags flags;
> };
>
> and have the real helpers deal with small_one, and wrappers around that
> taking big_one that just passes in the missing bits. Then users that
> don't need the extra bits can just use the right API.
Thanks for the suggestion, it should work noting that we will have to
add a struct pins_cgroup pointer to struct small_one. It may also help
with a similar problem I was having with one of the networking
conversions.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-02-13 11:32 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <cover.c238416f0e82377b449846dbb2459ae9d7030c8e.1675669136.git-series.apopple@nvidia.com>
2023-02-06 7:47 ` [PATCH 01/19] mm: Introduce vm_account Alistair Popple
2023-02-06 7:47 ` [PATCH 09/19] io_uring: convert to use vm_account Alistair Popple
2023-02-06 15:29 ` Jens Axboe
2023-02-07 1:03 ` Alistair Popple
2023-02-07 14:28 ` Jens Axboe
2023-02-07 14:55 ` Jason Gunthorpe
2023-02-07 17:05 ` Jens Axboe
2023-02-13 11:30 ` Alistair Popple
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox