public inbox for [email protected]
* Re: Linux 5.18-rc4
       [not found] <CAHk-=whmtHMzjaVUF9bS+7vE_rrRctcCTvsAeB8fuLYcyYLN-g@mail.gmail.com>
@ 2022-04-27 17:59 ` Ammar Faizi
  2022-04-27 18:31   ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Ammar Faizi @ 2022-04-27 17:59 UTC (permalink / raw)
  To: Linus Torvalds, Linux Kernel Mailing List
  Cc: Al Viro, Eric Biederman, Kees Cook, linux-fsdevel, linux-mm,
	linux-kernel, gwml

On 4/25/22 5:22 AM, Linus Torvalds wrote:
> Fairly slow and calm week - which makes me just suspect that the other
> shoe will drop at some point.
> 
> But maybe things are just going really well this release. It's bound
> to happen _occasionally_, after all.

+ fs/exec.c maintainers.

I have been testing Linux 5.18-rc4 on my laptop; it has been running
for 2 days. I got the following lockdep splat last night. I don't have
a reproducer. If you need more information, feel free to let me know.

[78140.503644] ======================================================
[78140.503646] WARNING: possible circular locking dependency detected
[78140.503648] 5.18.0-rc4-superb-owl-00006-gd615b5416f8a #12 Tainted: G        W
[78140.503650] ------------------------------------------------------
[78140.503651] preconv/111629 is trying to acquire lock:
[78140.503653] ffff88834d633248 (&ctx->lock){+.+.}-{2:2}, at: update_file_ctx+0x19/0xe0
[78140.503663]
                but task is already holding lock:
[78140.503664] ffff888103d80458 (&newf->file_lock){+.+.}-{2:2}, at: iterate_fd+0x34/0x150
[78140.503669]
                which lock already depends on the new lock.

[78140.503671]
                the existing dependency chain (in reverse order) is:
[78140.503672]
                -> #4 (&newf->file_lock){+.+.}-{2:2}:
[78140.503675]        _raw_spin_lock+0x2f/0x40
[78140.503679]        seq_show+0x72/0x280
[78140.503681]        seq_read_iter+0x125/0x3c0
[78140.503684]        seq_read+0xd0/0xe0
[78140.503686]        vfs_read+0xf5/0x2f0
[78140.503688]        ksys_read+0x58/0xb0
[78140.503690]        do_syscall_64+0x3d/0x90
[78140.503693]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[78140.503695]
                -> #3 (&p->alloc_lock){+.+.}-{2:2}:
[78140.503699]        _raw_spin_lock+0x2f/0x40
[78140.503700]        newseg+0x25b/0x360
[78140.503703]        ipcget+0x3fb/0x480
[78140.503705]        __x64_sys_shmget+0x48/0x50
[78140.503708]        do_syscall_64+0x3d/0x90
[78140.503710]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[78140.503713]
                -> #2 (&new->lock){+.+.}-{2:2}:
[78140.503716]        _raw_spin_lock+0x2f/0x40
[78140.503718]        ipc_addid+0xb3/0x700
[78140.503720]        newseg+0x238/0x360
[78140.503722]        ipcget+0x3fb/0x480
[78140.503724]        __x64_sys_shmget+0x48/0x50
[78140.503727]        do_syscall_64+0x3d/0x90
[78140.503729]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[78140.503731]
                -> #1 (lock#3){+.+.}-{2:2}:
[78140.503735]        local_lock_acquire+0x1d/0x70
[78140.503738]        __radix_tree_preload+0x38/0x150
[78140.503740]        idr_preload+0xa/0x40
[78140.503743]        aa_alloc_secid+0x15/0xb0
[78140.503745]        aa_label_alloc+0x6c/0x1b0
[78140.503747]        aa_label_merge+0x52/0x430
[78140.503750]        update_file_ctx+0x3f/0xe0
[78140.503752]        aa_file_perm+0x56e/0x5c0
[78140.503754]        common_file_perm+0x70/0xd0
[78140.503756]        security_mmap_file+0x4b/0xd0
[78140.503759]        vm_mmap_pgoff+0x50/0x150
[78140.503761]        elf_map+0x9f/0x120
[78140.503763]        load_elf_binary+0x521/0xc80
[78140.503767]        bprm_execve+0x39f/0x660
[78140.503769]        do_execveat_common+0x1d0/0x220
[78140.503771]        __x64_sys_execveat+0x3d/0x50
[78140.503773]        do_syscall_64+0x3d/0x90
[78140.503775]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[78140.503777]
                -> #0 (&ctx->lock){+.+.}-{2:2}:
[78140.503780]        __lock_acquire+0x1573/0x2ce0
[78140.503783]        lock_acquire+0xbd/0x190
[78140.503785]        _raw_spin_lock+0x2f/0x40
[78140.503787]        update_file_ctx+0x19/0xe0
[78140.503788]        aa_file_perm+0x56e/0x5c0
[78140.503790]        match_file+0x78/0x90
[78140.503792]        iterate_fd+0xae/0x150
[78140.503794]        aa_inherit_files+0xbe/0x170
[78140.503796]        apparmor_bprm_committing_creds+0x50/0x80
[78140.503798]        security_bprm_committing_creds+0x1d/0x30
[78140.503800]        begin_new_exec+0x3c5/0x450
[78140.503802]        load_elf_binary+0x269/0xc80
[78140.503804]        bprm_execve+0x39f/0x660
[78140.503806]        do_execveat_common+0x1d0/0x220
[78140.503808]        __x64_sys_execve+0x36/0x40
[78140.503809]        do_syscall_64+0x3d/0x90
[78140.503812]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[78140.503815]
                other info that might help us debug this:

[78140.503816] Chain exists of:
                  &ctx->lock --> &p->alloc_lock --> &newf->file_lock

[78140.503820]  Possible unsafe locking scenario:

[78140.503821]        CPU0                    CPU1
[78140.503823]        ----                    ----
[78140.503824]   lock(&newf->file_lock);
[78140.503826]                                lock(&p->alloc_lock);
[78140.503828]                                lock(&newf->file_lock);
[78140.503830]   lock(&ctx->lock);
[78140.503832]
                 *** DEADLOCK ***

[78140.503833] 3 locks held by preconv/111629:
[78140.503835]  #0: ffff888111b62550 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x39/0x660
[78140.503840]  #1: ffff888111b625e8 (&sig->exec_update_lock){++++}-{3:3}, at: exec_mmap+0x4e/0x250
[78140.503844]  #2: ffff888103d80458 (&newf->file_lock){+.+.}-{2:2}, at: iterate_fd+0x34/0x150
[78140.503849]
                stack backtrace:
[78140.503851] CPU: 3 PID: 111629 Comm: preconv Tainted: G        W         5.18.0-rc4-superb-owl-00006-gd615b5416f8a #12 6fd282a37da6f0e0172ecfa29689f3d250476a2b
[78140.503855] Hardware name: HP HP Laptop 14s-dq2xxx/87FD, BIOS F.15 09/15/2021
[78140.503856] Call Trace:
[78140.503858]  <TASK>
[78140.503860]  dump_stack_lvl+0x5a/0x74
[78140.503863]  check_noncircular+0xd3/0xe0
[78140.503866]  ? register_lock_class+0x35/0x2a0
[78140.503870]  __lock_acquire+0x1573/0x2ce0
[78140.503872]  ? prepend_path+0x375/0x410
[78140.503876]  ? d_absolute_path+0x48/0x80
[78140.503879]  ? aa_path_name+0x132/0x470
[78140.503883]  ? lock_is_held_type+0xd0/0x130
[78140.503886]  lock_acquire+0xbd/0x190
[78140.503888]  ? update_file_ctx+0x19/0xe0
[78140.503892]  _raw_spin_lock+0x2f/0x40
[78140.503894]  ? update_file_ctx+0x19/0xe0
[78140.503896]  update_file_ctx+0x19/0xe0
[78140.503899]  aa_file_perm+0x56e/0x5c0
[78140.503904]  ? aa_inherit_files+0x170/0x170
[78140.503906]  match_file+0x78/0x90
[78140.503909]  iterate_fd+0xae/0x150
[78140.503912]  aa_inherit_files+0xbe/0x170
[78140.503915]  apparmor_bprm_committing_creds+0x50/0x80
[78140.503918]  security_bprm_committing_creds+0x1d/0x30
[78140.503921]  begin_new_exec+0x3c5/0x450
[78140.503924]  load_elf_binary+0x269/0xc80
[78140.503928]  ? lock_release+0x1ee/0x260
[78140.503930]  ? bprm_execve+0x399/0x660
[78140.503933]  bprm_execve+0x39f/0x660
[78140.503936]  do_execveat_common+0x1d0/0x220
[78140.503940]  __x64_sys_execve+0x36/0x40
[78140.503942]  do_syscall_64+0x3d/0x90
[78140.503946]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[78140.503948] RIP: 0033:0x7f700a8ea33b
[78140.503954] Code: Unable to access opcode bytes at RIP 0x7f700a8ea311.
[78140.503955] RSP: 002b:00007fff315e7db8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[78140.503958] RAX: ffffffffffffffda RBX: 00007fff315e7dc0 RCX: 00007f700a8ea33b
[78140.503960] RDX: 000056419e9ea7e0 RSI: 000056419e9e9160 RDI: 00007fff315e7dc0
[78140.503962] RBP: 00007fff315e7f60 R08: 0000000000000008 R09: 0000000000000000
[78140.503964] R10: 0000000000000001 R11: 0000000000000246 R12: 000056419e9ea760
[78140.503965] R13: 000056419e9e9160 R14: 00007fff315e9eb4 R15: 00007fff315e9ebc
[78140.503971]  </TASK>

-- 
Ammar Faizi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Linux 5.18-rc4
  2022-04-27 17:59 ` Linux 5.18-rc4 Ammar Faizi
@ 2022-04-27 18:31   ` Linus Torvalds
  2022-06-06 15:19     ` Eric W. Biederman
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2022-04-27 18:31 UTC (permalink / raw)
  To: Ammar Faizi, John Johansen, James Morris, LSM List
  Cc: Linux Kernel Mailing List, Al Viro, Eric Biederman, Kees Cook,
	linux-fsdevel, Linux-MM, gwml

This looks like it might be AppArmor-related.

Adding AppArmor and security module people to the participants.

Sorry for top-posting and quoting the whole thing, but this is really
just bringing in more people to the discussion.

So on the exec path we have

  apparmor_bprm_committing_creds() ->
    aa_inherit_files() ->
      iterate_fd (takes files->file_lock) ->
        aa_file_perm ->
          update_file_ctx (takes aa_file_ctx->lock)

which gives us that file_lock -> ctx lock order. All within AppArmor.

And then we apparently _also_ have the reverse ctx lock -> file_lock
order by way of 'alloc_lock', which is the 'task_lock()' thing
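
For reference, task_lock() is literally just the task's alloc_lock
(from include/linux/sched/task.h, quoting from memory):

static inline void task_lock(struct task_struct *p)
{
	spin_lock(&p->alloc_lock);
}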

That one is a horror to decode and I didn't, but seems to go through
ipcget -> newseg..
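
In miniature, what lockdep is complaining about is the classic ABBA
pattern. A throwaway userspace sketch (stand-in pthread mutexes; the
real chain has more locks in the middle, but the shape is the same):

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;

/* exec path: file_lock -> ctx_lock (iterate_fd -> update_file_ctx) */
static void *exec_path(void *unused)
{
	pthread_mutex_lock(&file_lock);
	sleep(1);			/* widen the race window */
	pthread_mutex_lock(&ctx_lock);	/* wedges once both threads run */
	pthread_mutex_unlock(&ctx_lock);
	pthread_mutex_unlock(&file_lock);
	return NULL;
}

/* the other path ends up with the reverse order: ctx_lock -> file_lock */
static void *reverse_path(void *unused)
{
	pthread_mutex_lock(&ctx_lock);
	sleep(1);
	pthread_mutex_lock(&file_lock);
	pthread_mutex_unlock(&file_lock);
	pthread_mutex_unlock(&ctx_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, exec_path, NULL);
	pthread_create(&b, NULL, reverse_path, NULL);
	pthread_join(a, NULL);		/* never returns: each thread holds
					   the lock the other one wants */
	pthread_join(b, NULL);
	return 0;
}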

Anybody?

         Linus

On Wed, Apr 27, 2022 at 11:00 AM Ammar Faizi <[email protected]> wrote:
>
> On 4/25/22 5:22 AM, Linus Torvalds wrote:
> > Fairly slow and calm week - which makes me just suspect that the other
> > shoe will drop at some point.
> >
> > But maybe things are just going really well this release. It's bound
> > to happen _occasionally_, after all.
>
> + fs/exec.c maintainers.
>
> I have been testing Linux 5.18-rc4 on my laptop; it has been running
> for 2 days. I got the following lockdep splat last night. I don't have
> a reproducer. If you need more information, feel free to let me know.
>
> [78140.503644] ======================================================
> [78140.503646] WARNING: possible circular locking dependency detected
> [78140.503648] 5.18.0-rc4-superb-owl-00006-gd615b5416f8a #12 Tainted: G        W
> [78140.503650] ------------------------------------------------------
> [78140.503651] preconv/111629 is trying to acquire lock:
> [78140.503653] ffff88834d633248 (&ctx->lock){+.+.}-{2:2}, at: update_file_ctx+0x19/0xe0
> [78140.503663]
>                 but task is already holding lock:
> [78140.503664] ffff888103d80458 (&newf->file_lock){+.+.}-{2:2}, at: iterate_fd+0x34/0x150
> [78140.503669]
>                 which lock already depends on the new lock.
>
> [78140.503671]
>                 the existing dependency chain (in reverse order) is:
> [78140.503672]
>                 -> #4 (&newf->file_lock){+.+.}-{2:2}:
> [78140.503675]        _raw_spin_lock+0x2f/0x40
> [78140.503679]        seq_show+0x72/0x280
> [78140.503681]        seq_read_iter+0x125/0x3c0
> [78140.503684]        seq_read+0xd0/0xe0
> [78140.503686]        vfs_read+0xf5/0x2f0
> [78140.503688]        ksys_read+0x58/0xb0
> [78140.503690]        do_syscall_64+0x3d/0x90
> [78140.503693]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [78140.503695]
>                 -> #3 (&p->alloc_lock){+.+.}-{2:2}:
> [78140.503699]        _raw_spin_lock+0x2f/0x40
> [78140.503700]        newseg+0x25b/0x360
> [78140.503703]        ipcget+0x3fb/0x480
> [78140.503705]        __x64_sys_shmget+0x48/0x50
> [78140.503708]        do_syscall_64+0x3d/0x90
> [78140.503710]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [78140.503713]
>                 -> #2 (&new->lock){+.+.}-{2:2}:
> [78140.503716]        _raw_spin_lock+0x2f/0x40
> [78140.503718]        ipc_addid+0xb3/0x700
> [78140.503720]        newseg+0x238/0x360
> [78140.503722]        ipcget+0x3fb/0x480
> [78140.503724]        __x64_sys_shmget+0x48/0x50
> [78140.503727]        do_syscall_64+0x3d/0x90
> [78140.503729]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [78140.503731]
>                 -> #1 (lock#3){+.+.}-{2:2}:
> [78140.503735]        local_lock_acquire+0x1d/0x70
> [78140.503738]        __radix_tree_preload+0x38/0x150
> [78140.503740]        idr_preload+0xa/0x40
> [78140.503743]        aa_alloc_secid+0x15/0xb0
> [78140.503745]        aa_label_alloc+0x6c/0x1b0
> [78140.503747]        aa_label_merge+0x52/0x430
> [78140.503750]        update_file_ctx+0x3f/0xe0
> [78140.503752]        aa_file_perm+0x56e/0x5c0
> [78140.503754]        common_file_perm+0x70/0xd0
> [78140.503756]        security_mmap_file+0x4b/0xd0
> [78140.503759]        vm_mmap_pgoff+0x50/0x150
> [78140.503761]        elf_map+0x9f/0x120
> [78140.503763]        load_elf_binary+0x521/0xc80
> [78140.503767]        bprm_execve+0x39f/0x660
> [78140.503769]        do_execveat_common+0x1d0/0x220
> [78140.503771]        __x64_sys_execveat+0x3d/0x50
> [78140.503773]        do_syscall_64+0x3d/0x90
> [78140.503775]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [78140.503777]
>                 -> #0 (&ctx->lock){+.+.}-{2:2}:
> [78140.503780]        __lock_acquire+0x1573/0x2ce0
> [78140.503783]        lock_acquire+0xbd/0x190
> [78140.503785]        _raw_spin_lock+0x2f/0x40
> [78140.503787]        update_file_ctx+0x19/0xe0
> [78140.503788]        aa_file_perm+0x56e/0x5c0
> [78140.503790]        match_file+0x78/0x90
> [78140.503792]        iterate_fd+0xae/0x150
> [78140.503794]        aa_inherit_files+0xbe/0x170
> [78140.503796]        apparmor_bprm_committing_creds+0x50/0x80
> [78140.503798]        security_bprm_committing_creds+0x1d/0x30
> [78140.503800]        begin_new_exec+0x3c5/0x450
> [78140.503802]        load_elf_binary+0x269/0xc80
> [78140.503804]        bprm_execve+0x39f/0x660
> [78140.503806]        do_execveat_common+0x1d0/0x220
> [78140.503808]        __x64_sys_execve+0x36/0x40
> [78140.503809]        do_syscall_64+0x3d/0x90
> [78140.503812]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [78140.503815]
>                 other info that might help us debug this:
>
> [78140.503816] Chain exists of:
>                   &ctx->lock --> &p->alloc_lock --> &newf->file_lock
>
> [78140.503820]  Possible unsafe locking scenario:
>
> [78140.503821]        CPU0                    CPU1
> [78140.503823]        ----                    ----
> [78140.503824]   lock(&newf->file_lock);
> [78140.503826]                                lock(&p->alloc_lock);
> [78140.503828]                                lock(&newf->file_lock);
> [78140.503830]   lock(&ctx->lock);
> [78140.503832]
>                  *** DEADLOCK ***
>
> [78140.503833] 3 locks held by preconv/111629:
> [78140.503835]  #0: ffff888111b62550 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x39/0x660
> [78140.503840]  #1: ffff888111b625e8 (&sig->exec_update_lock){++++}-{3:3}, at: exec_mmap+0x4e/0x250
> [78140.503844]  #2: ffff888103d80458 (&newf->file_lock){+.+.}-{2:2}, at: iterate_fd+0x34/0x150
> [78140.503849]
>                 stack backtrace:
> [78140.503851] CPU: 3 PID: 111629 Comm: preconv Tainted: G        W         5.18.0-rc4-superb-owl-00006-gd615b5416f8a #12 6fd282a37da6f0e0172ecfa29689f3d250476a2b
> [78140.503855] Hardware name: HP HP Laptop 14s-dq2xxx/87FD, BIOS F.15 09/15/2021
> [78140.503856] Call Trace:
> [78140.503858]  <TASK>
> [78140.503860]  dump_stack_lvl+0x5a/0x74
> [78140.503863]  check_noncircular+0xd3/0xe0
> [78140.503866]  ? register_lock_class+0x35/0x2a0
> [78140.503870]  __lock_acquire+0x1573/0x2ce0
> [78140.503872]  ? prepend_path+0x375/0x410
> [78140.503876]  ? d_absolute_path+0x48/0x80
> [78140.503879]  ? aa_path_name+0x132/0x470
> [78140.503883]  ? lock_is_held_type+0xd0/0x130
> [78140.503886]  lock_acquire+0xbd/0x190
> [78140.503888]  ? update_file_ctx+0x19/0xe0
> [78140.503892]  _raw_spin_lock+0x2f/0x40
> [78140.503894]  ? update_file_ctx+0x19/0xe0
> [78140.503896]  update_file_ctx+0x19/0xe0
> [78140.503899]  aa_file_perm+0x56e/0x5c0
> [78140.503904]  ? aa_inherit_files+0x170/0x170
> [78140.503906]  match_file+0x78/0x90
> [78140.503909]  iterate_fd+0xae/0x150
> [78140.503912]  aa_inherit_files+0xbe/0x170
> [78140.503915]  apparmor_bprm_committing_creds+0x50/0x80
> [78140.503918]  security_bprm_committing_creds+0x1d/0x30
> [78140.503921]  begin_new_exec+0x3c5/0x450
> [78140.503924]  load_elf_binary+0x269/0xc80
> [78140.503928]  ? lock_release+0x1ee/0x260
> [78140.503930]  ? bprm_execve+0x399/0x660
> [78140.503933]  bprm_execve+0x39f/0x660
> [78140.503936]  do_execveat_common+0x1d0/0x220
> [78140.503940]  __x64_sys_execve+0x36/0x40
> [78140.503942]  do_syscall_64+0x3d/0x90
> [78140.503946]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [78140.503948] RIP: 0033:0x7f700a8ea33b
> [78140.503954] Code: Unable to access opcode bytes at RIP 0x7f700a8ea311.
> [78140.503955] RSP: 002b:00007fff315e7db8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
> [78140.503958] RAX: ffffffffffffffda RBX: 00007fff315e7dc0 RCX: 00007f700a8ea33b
> [78140.503960] RDX: 000056419e9ea7e0 RSI: 000056419e9e9160 RDI: 00007fff315e7dc0
> [78140.503962] RBP: 00007fff315e7f60 R08: 0000000000000008 R09: 0000000000000000
> [78140.503964] R10: 0000000000000001 R11: 0000000000000246 R12: 000056419e9ea760
> [78140.503965] R13: 000056419e9e9160 R14: 00007fff315e9eb4 R15: 00007fff315e9ebc
> [78140.503971]  </TASK>
>
> --
> Ammar Faizi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Linux 5.18-rc4
  2022-04-27 18:31   ` Linus Torvalds
@ 2022-06-06 15:19     ` Eric W. Biederman
  2022-06-06 18:28       ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Eric W. Biederman @ 2022-06-06 15:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ammar Faizi, John Johansen, James Morris, LSM List,
	Linux Kernel Mailing List, Al Viro, Kees Cook, linux-fsdevel

Linus Torvalds <[email protected]> writes:

> This looks like it might be AppArmor-related.
>
> Adding AppArmor and security module people to the participants.
>
> Sorry for top-posting and quoting the whole thing, but this is really
> just bringing in more people to the discussion.
>
> So on the exec path we have
>
>   apparmor_bprm_committing_creds() ->
>     aa_inherit_files() ->
>       iterate_fd (takes files->file_lock) ->
>         aa_file_perm ->
>           update_file_ctx (takes aa_file_ctx->lock)
>
> which gives us that file_lock -> ctx lock order. All within AppArmor.
>
> And then we apparently _also_ have the reverse ctx lock -> file_lock
> order by way of 'alloc_lock', which is the 'task_lock()' thing
>
> That one is a horror to decode and I didn't, but seems to go through
> ipcget -> newseg..
>
> Anybody?

Has anyone looked into this lock ordering issue?

Eric

>
> On Wed, Apr 27, 2022 at 11:00 AM Ammar Faizi <[email protected]> wrote:
>>
>> On 4/25/22 5:22 AM, Linus Torvalds wrote:
>> > Fairly slow and calm week - which makes me just suspect that the other
>> > shoe will drop at some point.
>> >
>> > But maybe things are just going really well this release. It's bound
>> > to happen _occasionally_, after all.
>>
>> + fs/exec.c maintainers.
>>
>> I have been testing Linux 5.18-rc4 on my laptop; it has been running
>> for 2 days. I got the following lockdep splat last night. I don't have
>> a reproducer. If you need more information, feel free to let me know.
>>
>> [78140.503644] ======================================================
>> [78140.503646] WARNING: possible circular locking dependency detected
>> [78140.503648] 5.18.0-rc4-superb-owl-00006-gd615b5416f8a #12 Tainted: G        W
>> [78140.503650] ------------------------------------------------------
>> [78140.503651] preconv/111629 is trying to acquire lock:
>> [78140.503653] ffff88834d633248 (&ctx->lock){+.+.}-{2:2}, at: update_file_ctx+0x19/0xe0
>> [78140.503663]
>>                 but task is already holding lock:
>> [78140.503664] ffff888103d80458 (&newf->file_lock){+.+.}-{2:2}, at: iterate_fd+0x34/0x150
>> [78140.503669]
>>                 which lock already depends on the new lock.
>>
>> [78140.503671]
>>                 the existing dependency chain (in reverse order) is:
>> [78140.503672]
>>                 -> #4 (&newf->file_lock){+.+.}-{2:2}:
>> [78140.503675]        _raw_spin_lock+0x2f/0x40
>> [78140.503679]        seq_show+0x72/0x280
>> [78140.503681]        seq_read_iter+0x125/0x3c0
>> [78140.503684]        seq_read+0xd0/0xe0
>> [78140.503686]        vfs_read+0xf5/0x2f0
>> [78140.503688]        ksys_read+0x58/0xb0
>> [78140.503690]        do_syscall_64+0x3d/0x90
>> [78140.503693]        entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [78140.503695]
>>                 -> #3 (&p->alloc_lock){+.+.}-{2:2}:
>> [78140.503699]        _raw_spin_lock+0x2f/0x40
>> [78140.503700]        newseg+0x25b/0x360
>> [78140.503703]        ipcget+0x3fb/0x480
>> [78140.503705]        __x64_sys_shmget+0x48/0x50
>> [78140.503708]        do_syscall_64+0x3d/0x90
>> [78140.503710]        entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [78140.503713]
>>                 -> #2 (&new->lock){+.+.}-{2:2}:
>> [78140.503716]        _raw_spin_lock+0x2f/0x40
>> [78140.503718]        ipc_addid+0xb3/0x700
>> [78140.503720]        newseg+0x238/0x360
>> [78140.503722]        ipcget+0x3fb/0x480
>> [78140.503724]        __x64_sys_shmget+0x48/0x50
>> [78140.503727]        do_syscall_64+0x3d/0x90
>> [78140.503729]        entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [78140.503731]
>>                 -> #1 (lock#3){+.+.}-{2:2}:
>> [78140.503735]        local_lock_acquire+0x1d/0x70
>> [78140.503738]        __radix_tree_preload+0x38/0x150
>> [78140.503740]        idr_preload+0xa/0x40
>> [78140.503743]        aa_alloc_secid+0x15/0xb0
>> [78140.503745]        aa_label_alloc+0x6c/0x1b0
>> [78140.503747]        aa_label_merge+0x52/0x430
>> [78140.503750]        update_file_ctx+0x3f/0xe0
>> [78140.503752]        aa_file_perm+0x56e/0x5c0
>> [78140.503754]        common_file_perm+0x70/0xd0
>> [78140.503756]        security_mmap_file+0x4b/0xd0
>> [78140.503759]        vm_mmap_pgoff+0x50/0x150
>> [78140.503761]        elf_map+0x9f/0x120
>> [78140.503763]        load_elf_binary+0x521/0xc80
>> [78140.503767]        bprm_execve+0x39f/0x660
>> [78140.503769]        do_execveat_common+0x1d0/0x220
>> [78140.503771]        __x64_sys_execveat+0x3d/0x50
>> [78140.503773]        do_syscall_64+0x3d/0x90
>> [78140.503775]        entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [78140.503777]
>>                 -> #0 (&ctx->lock){+.+.}-{2:2}:
>> [78140.503780]        __lock_acquire+0x1573/0x2ce0
>> [78140.503783]        lock_acquire+0xbd/0x190
>> [78140.503785]        _raw_spin_lock+0x2f/0x40
>> [78140.503787]        update_file_ctx+0x19/0xe0
>> [78140.503788]        aa_file_perm+0x56e/0x5c0
>> [78140.503790]        match_file+0x78/0x90
>> [78140.503792]        iterate_fd+0xae/0x150
>> [78140.503794]        aa_inherit_files+0xbe/0x170
>> [78140.503796]        apparmor_bprm_committing_creds+0x50/0x80
>> [78140.503798]        security_bprm_committing_creds+0x1d/0x30
>> [78140.503800]        begin_new_exec+0x3c5/0x450
>> [78140.503802]        load_elf_binary+0x269/0xc80
>> [78140.503804]        bprm_execve+0x39f/0x660
>> [78140.503806]        do_execveat_common+0x1d0/0x220
>> [78140.503808]        __x64_sys_execve+0x36/0x40
>> [78140.503809]        do_syscall_64+0x3d/0x90
>> [78140.503812]        entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [78140.503815]
>>                 other info that might help us debug this:
>>
>> [78140.503816] Chain exists of:
>>                   &ctx->lock --> &p->alloc_lock --> &newf->file_lock
>>
>> [78140.503820]  Possible unsafe locking scenario:
>>
>> [78140.503821]        CPU0                    CPU1
>> [78140.503823]        ----                    ----
>> [78140.503824]   lock(&newf->file_lock);
>> [78140.503826]                                lock(&p->alloc_lock);
>> [78140.503828]                                lock(&newf->file_lock);
>> [78140.503830]   lock(&ctx->lock);
>> [78140.503832]
>>                  *** DEADLOCK ***
>>
>> [78140.503833] 3 locks held by preconv/111629:
>> [78140.503835]  #0: ffff888111b62550 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x39/0x660
>> [78140.503840]  #1: ffff888111b625e8 (&sig->exec_update_lock){++++}-{3:3}, at: exec_mmap+0x4e/0x250
>> [78140.503844]  #2: ffff888103d80458 (&newf->file_lock){+.+.}-{2:2}, at: iterate_fd+0x34/0x150
>> [78140.503849]
>>                 stack backtrace:
>> [78140.503851] CPU: 3 PID: 111629 Comm: preconv Tainted: G        W         5.18.0-rc4-superb-owl-00006-gd615b5416f8a #12 6fd282a37da6f0e0172ecfa29689f3d250476a2b
>> [78140.503855] Hardware name: HP HP Laptop 14s-dq2xxx/87FD, BIOS F.15 09/15/2021
>> [78140.503856] Call Trace:
>> [78140.503858]  <TASK>
>> [78140.503860]  dump_stack_lvl+0x5a/0x74
>> [78140.503863]  check_noncircular+0xd3/0xe0
>> [78140.503866]  ? register_lock_class+0x35/0x2a0
>> [78140.503870]  __lock_acquire+0x1573/0x2ce0
>> [78140.503872]  ? prepend_path+0x375/0x410
>> [78140.503876]  ? d_absolute_path+0x48/0x80
>> [78140.503879]  ? aa_path_name+0x132/0x470
>> [78140.503883]  ? lock_is_held_type+0xd0/0x130
>> [78140.503886]  lock_acquire+0xbd/0x190
>> [78140.503888]  ? update_file_ctx+0x19/0xe0
>> [78140.503892]  _raw_spin_lock+0x2f/0x40
>> [78140.503894]  ? update_file_ctx+0x19/0xe0
>> [78140.503896]  update_file_ctx+0x19/0xe0
>> [78140.503899]  aa_file_perm+0x56e/0x5c0
>> [78140.503904]  ? aa_inherit_files+0x170/0x170
>> [78140.503906]  match_file+0x78/0x90
>> [78140.503909]  iterate_fd+0xae/0x150
>> [78140.503912]  aa_inherit_files+0xbe/0x170
>> [78140.503915]  apparmor_bprm_committing_creds+0x50/0x80
>> [78140.503918]  security_bprm_committing_creds+0x1d/0x30
>> [78140.503921]  begin_new_exec+0x3c5/0x450
>> [78140.503924]  load_elf_binary+0x269/0xc80
>> [78140.503928]  ? lock_release+0x1ee/0x260
>> [78140.503930]  ? bprm_execve+0x399/0x660
>> [78140.503933]  bprm_execve+0x39f/0x660
>> [78140.503936]  do_execveat_common+0x1d0/0x220
>> [78140.503940]  __x64_sys_execve+0x36/0x40
>> [78140.503942]  do_syscall_64+0x3d/0x90
>> [78140.503946]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [78140.503948] RIP: 0033:0x7f700a8ea33b
>> [78140.503954] Code: Unable to access opcode bytes at RIP 0x7f700a8ea311.
>> [78140.503955] RSP: 002b:00007fff315e7db8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
>> [78140.503958] RAX: ffffffffffffffda RBX: 00007fff315e7dc0 RCX: 00007f700a8ea33b
>> [78140.503960] RDX: 000056419e9ea7e0 RSI: 000056419e9e9160 RDI: 00007fff315e7dc0
>> [78140.503962] RBP: 00007fff315e7f60 R08: 0000000000000008 R09: 0000000000000000
>> [78140.503964] R10: 0000000000000001 R11: 0000000000000246 R12: 000056419e9ea760
>> [78140.503965] R13: 000056419e9e9160 R14: 00007fff315e9eb4 R15: 00007fff315e9ebc
>> [78140.503971]  </TASK>
>>
>> --
>> Ammar Faizi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Linux 5.18-rc4
  2022-06-06 15:19     ` Eric W. Biederman
@ 2022-06-06 18:28       ` Linus Torvalds
  2022-06-06 19:19         ` John Johansen
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2022-06-06 18:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ammar Faizi, John Johansen, James Morris, LSM List,
	Linux Kernel Mailing List, Al Viro, Kees Cook,
	<[email protected]>, Linux-MM, gwml

On Mon, Jun 6, 2022 at 8:19 AM Eric W. Biederman <[email protected]> wrote:
> Has anyone looked into this lock ordering issue?

The deadlock is

> >> [78140.503821]        CPU0                    CPU1
> >> [78140.503823]        ----                    ----
> >> [78140.503824]   lock(&newf->file_lock);
> >> [78140.503826]                                lock(&p->alloc_lock);
> >> [78140.503828]                                lock(&newf->file_lock);
> >> [78140.503830]   lock(&ctx->lock);

and the alloc_lock -> file_lock on CPU1 is trivial - it's seq_show()
in fs/proc/fd.c:

        task_lock(task);
        files = task->files;
        if (files) {
                unsigned int fd = proc_fd(m->private);

                spin_lock(&files->file_lock);

and that looks all normal.

But the other chains look painful.

I do see the IPC code doing ugly things, in particular I detest this code:

        task_lock(current);
        list_add(&shp->shm_clist, &current->sysvshm.shm_clist);
        task_unlock(current);

where it is using the task lock to protect the shm_clist list. Nasty.

And it's doing that inside the shm_ids.rwsem lock _and_ inside the
shp->shm_perm.lock.

So the IPC code has newseg() doing

   shmget ->
    ipcget():
     down_write(ids->rwsem) ->
       newseg():
         ipc_addid gets perm->lock
         task_lock(current)

so you have

  ids->rwsem -> perm->lock -> alloc_lock

there.

So now we have that

   ids->rwsem -> ipcperm->lock -> alloc_lock -> file_lock

when you put those sequences together.

But I didn't figure out what the security subsystem angle is and how
that then apparently mixes things up with execve.

Yes, newseg() is doing that

        error = security_shm_alloc(&shp->shm_perm);

while holding rwsem, but I can't see how that matters. From the
lockdep output, rwsem doesn't actually seem to be part of the whole
sequence.

It *looks* like we have

   apparmor ctx->lock -->
      radix_tree_preloads.lock -->
         ipcperm->lock

and apparently that's called under the file_lock somewhere, completing
the circle.
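
(For reference, the idr preload pattern both of them go through looks
roughly like this; idr_preload() takes the per-cpu preload lock, and
any spinlock taken inside the preload section gets chained against
that one lock class. some_lock/some_idr/ptr are just stand-ins:)

	idr_preload(GFP_KERNEL);	/* local_lock(&radix_tree_preloads.lock) */
	spin_lock(&some_lock);		/* records: preload lock -> some_lock */
	id = idr_alloc(&some_idr, ptr, 0, 0, GFP_NOWAIT);
	spin_unlock(&some_lock);
	idr_preload_end();		/* local_unlock() */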

I guess the execve component is that

  begin_new_exec ->
    security_bprm_committing_creds ->
      apparmor_bprm_committing_creds ->
        aa_inherit_files ->
          iterate_fd ->   *takes file_lock*
            match_file ->
              aa_file_perm ->
                update_file_ctx *takes ctx->lock*

so that's how you get file_lock -> ctx->lock.

So you have:

 SHMGET:
    ipcperm->lock -> alloc_lock
 /proc:
    alloc_lock -> file_lock
 apparmor_bprm_committing_creds:
    file_lock -> ctx->lock

and then all you need is ctx->lock -> ipcperm->lock but I didn't find that part.

I suspect that part is that both Apparmor and IPC use the idr local lock.

               Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Linux 5.18-rc4
  2022-06-06 18:28       ` Linus Torvalds
@ 2022-06-06 19:19         ` John Johansen
  2022-06-06 19:47           ` Linus Torvalds
  2022-06-06 20:23           ` Matthew Wilcox
  0 siblings, 2 replies; 11+ messages in thread
From: John Johansen @ 2022-06-06 19:19 UTC (permalink / raw)
  To: Linus Torvalds, Eric W. Biederman
  Cc: Ammar Faizi, James Morris, LSM List, Linux Kernel Mailing List,
	Al Viro, Kees Cook, linux-fsdevel, Linux-MM, gwml

On 6/6/22 11:28, Linus Torvalds wrote:
> On Mon, Jun 6, 2022 at 8:19 AM Eric W. Biederman <[email protected]> wrote:
>> Has anyone looked into this lock ordering issue?
> 
> The deadlock is
> 
>>>> [78140.503821]        CPU0                    CPU1
>>>> [78140.503823]        ----                    ----
>>>> [78140.503824]   lock(&newf->file_lock);
>>>> [78140.503826]                                lock(&p->alloc_lock);
>>>> [78140.503828]                                lock(&newf->file_lock);
>>>> [78140.503830]   lock(&ctx->lock);
> 
> and the alloc_lock -> file_lock on CPU1 is trivial - it's seq_show()
> in fs/proc/fd.c:
> 
>         task_lock(task);
>         files = task->files;
>         if (files) {
>                 unsigned int fd = proc_fd(m->private);
> 
>                 spin_lock(&files->file_lock);
> 
> and that looks all normal.
> 
> But the other chains look painful.
> 
> I do see the IPC code doing ugly things, in particular I detest this code:
> 
>         task_lock(current);
>         list_add(&shp->shm_clist, &current->sysvshm.shm_clist);
>         task_unlock(current);
> 
> where it is using the task lock to protect the shm_clist list. Nasty.
> 
> And it's doing that inside the shm_ids.rwsem lock _and_ inside the
> shp->shm_perm.lock.
> 
> So the IPC code has newseg() doing
> 
>    shmget ->
>     ipcget():
>      down_write(ids->rwsem) ->
>        newseg():
>          ipc_addid gets perm->lock
>          task_lock(current)
> 
> so you have
> 
>   ids->rwsem -> perm->lock -> alloc_lock
> 
> there.
> 
> So now we have that
> 
>    ids->rwsem -> ipcperm->lock -> alloc_lock -> file_lock
> 
> when you put those sequences together.
> 
> But I didn't figure out what the security subsystem angle is and how
> that then apparently mixes things up with execve.
> 
> Yes, newseg() is doing that
> 
>         error = security_shm_alloc(&shp->shm_perm);
> 
> while holding rwsem, but I can't see how that matters. From the
> lockdep output, rwsem doesn't actually seem to be part of the whole
> sequence.
> 
> It *looks* like we have
> 
>    apparmor ctx->lock -->
>       radix_tree_preloads.lock -->
>          ipcperm->lock
> 
> and apparently that's called under the file_lock somewhere, completing
> the circle.
> 
> I guess the execve component is that
> 
>   begin_new_exec ->
>     security_bprm_committing_creds ->
>       apparmor_bprm_committing_creds ->
>         aa_inherit_files ->
>           iterate_fd ->   *takes file_lock*
>             match_file ->
>               aa_file_perm ->
>                 update_file_ctx *takes ctx->lock*
> 
> so that's how you get file_lock -> ctx->lock.
> 
yes

> So you have:
> 
>  SHMGET:
>     ipcperm->lock -> alloc_lock
>  /proc:
>     alloc_lock -> file_lock
>  apparmor_bprm_committing_creds:
>     file_lock -> ctx->lock
> 
> and then all you need is ctx->lock -> ipcperm->lock but I didn't find that part.
> 
yeah, that is the part I got stuck on before being pulled away from this.

> I suspect that part is that both Apparmor and IPC use the idr local lock.
> 
bingo,

apparmor moved its secids allocation from a custom radix tree to idr in

  99cc45e48678 apparmor: Use an IDR to allocate apparmor secids

and ipc is using the idr for its id allocation as well
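
so the missing ctx->lock -> ipcperm->lock edge is the splat's chains
#1 and #2 glued together by that one preload lock class. Condensed
from memory (security/apparmor/secid.c and ipc/util.c):

	/* apparmor: aa_alloc_secid(), called under ctx->lock */
	idr_preload(gfp);			/* ctx->lock -> preload lock */
	spin_lock_irqsave(&secid_lock, flags);
	ret = idr_alloc(&aa_secids, label, AA_FIRST_SECID, 0, GFP_ATOMIC);
	spin_unlock_irqrestore(&secid_lock, flags);
	idr_preload_end();

	/* ipc: ipc_addid() */
	idr_preload(GFP_KERNEL);		/* takes the preload lock */
	spin_lock(&new->lock);			/* preload lock -> ipcperm->lock */
	...
	idr_preload_end();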

I can easily lift the secid() allocation out of the ctx->lock but that
would still leave it happening under the file_lock and not fix the problem.
I think the quick solution would be for apparmor to stop using idr, reverting
back at least temporarily to the custom radix tree.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Linux 5.18-rc4
  2022-06-06 19:19         ` John Johansen
@ 2022-06-06 19:47           ` Linus Torvalds
  2022-06-06 20:23           ` Matthew Wilcox
  1 sibling, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2022-06-06 19:47 UTC (permalink / raw)
  To: John Johansen, Thomas Gleixner
  Cc: Eric W. Biederman, Ammar Faizi, James Morris, LSM List,
	Linux Kernel Mailing List, Al Viro, Kees Cook, linux-fsdevel,
	Linux-MM, gwml

On Mon, Jun 6, 2022 at 12:19 PM John Johansen
<[email protected]> wrote:
>
> > I suspect that part is that both Apparmor and IPC use the idr local lock.
>
> bingo,
>
> apparmor moved its secids allocation from a custom radix tree to idr in
>
>   99cc45e48678 apparmor: Use an IDR to allocate apparmor secids
>
> and ipc is using the idr for its id allocation as well

The thing is, I'm not entirely convinced you can deadlock on a local lock.

A local lock is per-cpu, so one CPU holding that lock won't actually
block another CPU holding it. Even on RT, I think.
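
(The usage pattern, for reference: on !RT, local_lock() boils down to
preempt_disable(); on RT it becomes an actual per-cpu lock. A sketch
from memory:)

	static DEFINE_PER_CPU(local_lock_t, demo_lock) =
		INIT_LOCAL_LOCK(demo_lock);

	local_lock(&demo_lock);		/* !RT: preempt_disable() */
	/* ... touch this-cpu data ... */
	local_unlock(&demo_lock);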

I *think* local locks are useful for lockdep not because of any lock
chains they introduce, but because of how lockdep catches irq mis-use
(where they *can* deadlock).

But I may be entirely wrong, and maybe that lock chain through the
local lock actually does matter.

Let's bring in people who actually know what they are doing, rather
than my wild speculation. Thomas?

                    Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Linux 5.18-rc4
  2022-06-06 19:19         ` John Johansen
  2022-06-06 19:47           ` Linus Torvalds
@ 2022-06-06 20:23           ` Matthew Wilcox
  2022-06-06 21:00             ` John Johansen
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2022-06-06 20:23 UTC (permalink / raw)
  To: John Johansen
  Cc: Linus Torvalds, Eric W. Biederman, Ammar Faizi, James Morris,
	LSM List, Linux Kernel Mailing List, Al Viro, Kees Cook,
	linux-fsdevel, Linux-MM, gwml

On Mon, Jun 06, 2022 at 12:19:36PM -0700, John Johansen wrote:
> > I suspect that part is that both Apparmor and IPC use the idr local lock.
> > 
> bingo,
> 
> apparmor moved its secids allocation from a custom radix tree to idr in
> 
>   99cc45e48678 apparmor: Use an IDR to allocate apparmor secids
> 
> and ipc is using the idr for its id allocation as well
> 
> I can easily lift the secid() allocation out of the ctx->lock but that
> would still leave it happening under the file_lock and not fix the problem.
> I think the quick solution would be for apparmor to stop using idr, reverting
> back at least temporarily to the custom radix tree.

How about moving forward to the XArray that doesn't use that horrid
prealloc gunk?  Compile tested only.


diff --git a/security/apparmor/include/secid.h b/security/apparmor/include/secid.h
index 48ff1ddecad5..278dff5ecd1f 100644
--- a/security/apparmor/include/secid.h
+++ b/security/apparmor/include/secid.h
@@ -31,6 +31,4 @@ int aa_alloc_secid(struct aa_label *label, gfp_t gfp);
 void aa_free_secid(u32 secid);
 void aa_secid_update(u32 secid, struct aa_label *label);
 
-void aa_secids_init(void);
-
 #endif /* __AA_SECID_H */
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 900bc540656a..9dfb4e4631da 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -1857,8 +1857,6 @@ static int __init apparmor_init(void)
 {
 	int error;
 
-	aa_secids_init();
-
 	error = aa_setup_dfa_engine();
 	if (error) {
 		AA_ERROR("Unable to setup dfa engine\n");
diff --git a/security/apparmor/secid.c b/security/apparmor/secid.c
index ce545f99259e..3b08942db1f6 100644
--- a/security/apparmor/secid.c
+++ b/security/apparmor/secid.c
@@ -13,9 +13,9 @@
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/gfp.h>
-#include <linux/idr.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/xarray.h>
 
 #include "include/cred.h"
 #include "include/lib.h"
@@ -29,8 +29,7 @@
  */
 #define AA_FIRST_SECID 2
 
-static DEFINE_IDR(aa_secids);
-static DEFINE_SPINLOCK(secid_lock);
+static DEFINE_XARRAY_FLAGS(aa_secids, XA_FLAGS_LOCK_IRQ | XA_FLAGS_TRACK_FREE);
 
 /*
  * TODO: allow policy to reserve a secid range?
@@ -47,9 +46,9 @@ void aa_secid_update(u32 secid, struct aa_label *label)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&secid_lock, flags);
-	idr_replace(&aa_secids, label, secid);
-	spin_unlock_irqrestore(&secid_lock, flags);
+	xa_lock_irqsave(&aa_secids, flags);
+	__xa_store(&aa_secids, secid, label, 0);
+	xa_unlock_irqrestore(&aa_secids, flags);
 }
 
 /**
@@ -58,13 +57,7 @@ void aa_secid_update(u32 secid, struct aa_label *label)
  */
 struct aa_label *aa_secid_to_label(u32 secid)
 {
-	struct aa_label *label;
-
-	rcu_read_lock();
-	label = idr_find(&aa_secids, secid);
-	rcu_read_unlock();
-
-	return label;
+	return xa_load(&aa_secids, secid);
 }
 
 int apparmor_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
@@ -126,19 +119,16 @@ int aa_alloc_secid(struct aa_label *label, gfp_t gfp)
 	unsigned long flags;
 	int ret;
 
-	idr_preload(gfp);
-	spin_lock_irqsave(&secid_lock, flags);
-	ret = idr_alloc(&aa_secids, label, AA_FIRST_SECID, 0, GFP_ATOMIC);
-	spin_unlock_irqrestore(&secid_lock, flags);
-	idr_preload_end();
+	xa_lock_irqsave(&aa_secids, flags);
+	ret = __xa_alloc(&aa_secids, &label->secid, label,
+			XA_LIMIT(AA_FIRST_SECID, INT_MAX), gfp);
+	xa_unlock_irqrestore(&aa_secids, flags);
 
 	if (ret < 0) {
 		label->secid = AA_SECID_INVALID;
 		return ret;
 	}
 
-	AA_BUG(ret == AA_SECID_INVALID);
-	label->secid = ret;
 	return 0;
 }
 
@@ -150,12 +140,7 @@ void aa_free_secid(u32 secid)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&secid_lock, flags);
-	idr_remove(&aa_secids, secid);
-	spin_unlock_irqrestore(&secid_lock, flags);
-}
-
-void aa_secids_init(void)
-{
-	idr_init_base(&aa_secids, AA_FIRST_SECID);
+	xa_lock_irqsave(&aa_secids, flags);
+	__xa_erase(&aa_secids, secid);
+	xa_unlock_irqrestore(&aa_secids, flags);
 }

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Linux 5.18-rc4
  2022-06-06 20:23           ` Matthew Wilcox
@ 2022-06-06 21:00             ` John Johansen
  2022-06-13 22:48               ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: John Johansen @ 2022-06-06 21:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Eric W. Biederman, Ammar Faizi, James Morris,
	LSM List, Linux Kernel Mailing List, Al Viro, Kees Cook,
	linux-fsdevel, Linux-MM, gwml

On 6/6/22 13:23, Matthew Wilcox wrote:
> On Mon, Jun 06, 2022 at 12:19:36PM -0700, John Johansen wrote:
>>> I suspect that part is that both Apparmor and IPC use the idr local lock.
>>>
>> bingo,
>>
>> apparmor moved its secids allocation from a custom radix tree to idr in
>>
>>   99cc45e48678 apparmor: Use an IDR to allocate apparmor secids
>>
>> and ipc is using the idr for its id allocation as well
>>
>> I can easily lift the secid() allocation out of the ctx->lock but that
>> would still leave it happening under the file_lock and not fix the problem.
>> I think the quick solution would be for apparmor to stop using idr, reverting
>> back at least temporarily to the custom radix tree.
> 
> How about moving forward to the XArray that doesn't use that horrid
> prealloc gunk?  Compile tested only.
> 

I'm not very familiar with XArray but it does seem like a good fit. We do try
to keep the secid allocation dense, ideally no holes. Wrt the current locking
issue I want to hear what Thomas has to say. Regardless I am looking into
whether we should just switch to XArrays going forward.
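
From a quick look at the xarray API, __xa_alloc() with free tracking
hands back the lowest free id within the limit, so freed secids get
reused and the space should stay as dense as it is with the idr.
Untested sketch, mirroring the flags in your patch (some_label is just
a stand-in):

	static DEFINE_XARRAY_FLAGS(ids,
			XA_FLAGS_LOCK_IRQ | XA_FLAGS_TRACK_FREE);
	u32 id;
	int err;

	/* 0 on success; id gets the lowest free index in [2, INT_MAX] */
	err = xa_alloc_irq(&ids, &id, some_label, XA_LIMIT(2, INT_MAX),
			   GFP_KERNEL);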


> 
> diff --git a/security/apparmor/include/secid.h b/security/apparmor/include/secid.h
> index 48ff1ddecad5..278dff5ecd1f 100644
> --- a/security/apparmor/include/secid.h
> +++ b/security/apparmor/include/secid.h
> @@ -31,6 +31,4 @@ int aa_alloc_secid(struct aa_label *label, gfp_t gfp);
>  void aa_free_secid(u32 secid);
>  void aa_secid_update(u32 secid, struct aa_label *label);
>  
> -void aa_secids_init(void);
> -
>  #endif /* __AA_SECID_H */
> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
> index 900bc540656a..9dfb4e4631da 100644
> --- a/security/apparmor/lsm.c
> +++ b/security/apparmor/lsm.c
> @@ -1857,8 +1857,6 @@ static int __init apparmor_init(void)
>  {
>  	int error;
>  
> -	aa_secids_init();
> -
>  	error = aa_setup_dfa_engine();
>  	if (error) {
>  		AA_ERROR("Unable to setup dfa engine\n");
> diff --git a/security/apparmor/secid.c b/security/apparmor/secid.c
> index ce545f99259e..3b08942db1f6 100644
> --- a/security/apparmor/secid.c
> +++ b/security/apparmor/secid.c
> @@ -13,9 +13,9 @@
>  #include <linux/errno.h>
>  #include <linux/err.h>
>  #include <linux/gfp.h>
> -#include <linux/idr.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
> +#include <linux/xarray.h>
>  
>  #include "include/cred.h"
>  #include "include/lib.h"
> @@ -29,8 +29,7 @@
>   */
>  #define AA_FIRST_SECID 2
>  
> -static DEFINE_IDR(aa_secids);
> -static DEFINE_SPINLOCK(secid_lock);
> +static DEFINE_XARRAY_FLAGS(aa_secids, XA_FLAGS_LOCK_IRQ | XA_FLAGS_TRACK_FREE);
>  
>  /*
>   * TODO: allow policy to reserve a secid range?
> @@ -47,9 +46,9 @@ void aa_secid_update(u32 secid, struct aa_label *label)
>  {
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&secid_lock, flags);
> -	idr_replace(&aa_secids, label, secid);
> -	spin_unlock_irqrestore(&secid_lock, flags);
> +	xa_lock_irqsave(&aa_secids, flags);
> +	__xa_store(&aa_secids, secid, label, 0);
> +	xa_unlock_irqrestore(&aa_secids, flags);
>  }
>  
>  /**
> @@ -58,13 +57,7 @@ void aa_secid_update(u32 secid, struct aa_label *label)
>   */
>  struct aa_label *aa_secid_to_label(u32 secid)
>  {
> -	struct aa_label *label;
> -
> -	rcu_read_lock();
> -	label = idr_find(&aa_secids, secid);
> -	rcu_read_unlock();
> -
> -	return label;
> +	return xa_load(&aa_secids, secid);
>  }
>  
>  int apparmor_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
> @@ -126,19 +119,16 @@ int aa_alloc_secid(struct aa_label *label, gfp_t gfp)
>  	unsigned long flags;
>  	int ret;
>  
> -	idr_preload(gfp);
> -	spin_lock_irqsave(&secid_lock, flags);
> -	ret = idr_alloc(&aa_secids, label, AA_FIRST_SECID, 0, GFP_ATOMIC);
> -	spin_unlock_irqrestore(&secid_lock, flags);
> -	idr_preload_end();
> +	xa_lock_irqsave(&aa_secids, flags);
> +	ret = __xa_alloc(&aa_secids, &label->secid, label,
> +			XA_LIMIT(AA_FIRST_SECID, INT_MAX), gfp);
> +	xa_unlock_irqrestore(&aa_secids, flags);
>  
>  	if (ret < 0) {
>  		label->secid = AA_SECID_INVALID;
>  		return ret;
>  	}
>  
> -	AA_BUG(ret == AA_SECID_INVALID);
> -	label->secid = ret;
>  	return 0;
>  }
>  
> @@ -150,12 +140,7 @@ void aa_free_secid(u32 secid)
>  {
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&secid_lock, flags);
> -	idr_remove(&aa_secids, secid);
> -	spin_unlock_irqrestore(&secid_lock, flags);
> -}
> -
> -void aa_secids_init(void)
> -{
> -	idr_init_base(&aa_secids, AA_FIRST_SECID);
> +	xa_lock_irqsave(&aa_secids, flags);
> +	__xa_erase(&aa_secids, secid);
> +	xa_unlock_irqrestore(&aa_secids, flags);
>  }


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Linux 5.18-rc4
  2022-06-06 21:00             ` John Johansen
@ 2022-06-13 22:48               ` Matthew Wilcox
  2022-06-21 20:27                 ` John Johansen
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2022-06-13 22:48 UTC (permalink / raw)
  To: John Johansen
  Cc: Linus Torvalds, Eric W. Biederman, Ammar Faizi, James Morris,
	LSM List, Linux Kernel Mailing List, Al Viro, Kees Cook,
	linux-fsdevel, Linux-MM, gwml

On Mon, Jun 06, 2022 at 02:00:33PM -0700, John Johansen wrote:
> On 6/6/22 13:23, Matthew Wilcox wrote:
> > On Mon, Jun 06, 2022 at 12:19:36PM -0700, John Johansen wrote:
> >>> I suspect that part is that both Apparmor and IPC use the idr local lock.
> >>>
> >> bingo,
> >>
> >> apparmor moved its secids allocation from a custom radix tree to idr in
> >>
> >>   99cc45e48678 apparmor: Use an IDR to allocate apparmor secids
> >>
> >> and ipc is using the idr for its id allocation as well
> >>
> >> I can easily lift the secid() allocation out of the ctx->lock but that
> >> would still leave it happening under the file_lock and not fix the problem.
> >> I think the quick solution would be for apparmor to stop using idr, reverting
> >> back at least temporarily to the custom radix tree.
> > 
> > How about moving forward to the XArray that doesn't use that horrid
> > prealloc gunk?  Compile tested only.
> > 
> 
> I'm not very familiar with XArray but it does seem like a good fit. We do try
> to keep the secid allocation dense, ideally no holes. Wrt the current locking
> issue I want to hear what Thomas has to say. Regardless I am looking into
> whether we should just switch to XArrays going forward.

Nothing from Thomas ... shall we just go with this?  Do you want a
commit message, etc for the patch?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Linux 5.18-rc4
  2022-06-13 22:48               ` Matthew Wilcox
@ 2022-06-21 20:27                 ` John Johansen
  2022-07-13  9:37                   ` Ammar Faizi
  0 siblings, 1 reply; 11+ messages in thread
From: John Johansen @ 2022-06-21 20:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Eric W. Biederman, Ammar Faizi, James Morris,
	LSM List, Linux Kernel Mailing List, Al Viro, Kees Cook,
	linux-fsdevel, Linux-MM, gwml

On 6/13/22 15:48, Matthew Wilcox wrote:
> On Mon, Jun 06, 2022 at 02:00:33PM -0700, John Johansen wrote:
>> On 6/6/22 13:23, Matthew Wilcox wrote:
>>> On Mon, Jun 06, 2022 at 12:19:36PM -0700, John Johansen wrote:
>>>>> I suspect that part is that both Apparmor and IPC use the idr local lock.
>>>>>
>>>> bingo,
>>>>
>>>> apparmor moved its secids allocation from a custom radix tree to idr in
>>>>
>>>>   99cc45e48678 apparmor: Use an IDR to allocate apparmor secids
>>>>
>>>> and ipc is using the idr for its id allocation as well
>>>>
>>>> I can easily lift the secid() allocation out of the ctx->lock but that
>>>> would still leave it happening under the file_lock and not fix the problem.
>>>> I think the quick solution would be for apparmor to stop using idr, reverting
>>>> back at least temporarily to the custom radix tree.
>>>
>>> How about moving forward to the XArray that doesn't use that horrid
>>> prealloc gunk?  Compile tested only.
>>>
>>
>> I'm not very familiar with XArray but it does seem like a good fit. We do try
>> to keep the secid allocation dense, ideally no holes. Wrt the current locking
>> issue I want to hear what Thomas has to say. Regardless I am looking into
>> whether we should just switch to XArrays going forward.
> 
> Nothing from Thomas ... shall we just go with this?  Do you want a
> commit message, etc for the patch?

Hey Matthew,

I have done testing with this and the patch looks good. We will certainly
go this route going forward, so a commit message would be good. As for
fixing the issue in current kernels, I am not opposed to pulling this
back as a fix for

  99cc45e48678 apparmor: Use an IDR to allocate apparmor secids

but I would like some other people's opinions on doing that, because we
don't understand why we are getting the current lock splat; without that
understanding, it is a fix that avoids the issue rather than one we can
be sure fixes the actual issue.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Linux 5.18-rc4
  2022-06-21 20:27                 ` John Johansen
@ 2022-07-13  9:37                   ` Ammar Faizi
  0 siblings, 0 replies; 11+ messages in thread
From: Ammar Faizi @ 2022-07-13  9:37 UTC (permalink / raw)
  To: John Johansen, Matthew Wilcox
  Cc: Linus Torvalds, Eric W. Biederman, Ammar Faizi, Al Viro,
	James Morris, LSM List, Linux Kernel Mailing List, Kees Cook,
	linux-fsdevel, Linux-MM, GNU/Weeb Mailing List

On 6/22/22 3:27 AM, John Johansen wrote:
>    99cc45e48678 apparmor: Use an IDR to allocate apparmor secids
> 
> but I would like some other people's opinions on doing that, because we
> don't understand why we are getting the current lock splat; without that
> understanding, it is a fix that avoids the issue rather than one we can
> be sure fixes the actual issue.

Update from me:

I can't reproduce the splat on 5.19.0-rc4. I'm not sure whether it has
actually been fixed, though, since it happened randomly and I never
figured out what action provoked it. I am running Ubuntu 22.04 for my
daily desktop work.

I'll keep lockdep enabled and will send you an update if it triggers
again.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-07-13  9:37 UTC | newest]

Thread overview: 11+ messages
     [not found] <CAHk-=whmtHMzjaVUF9bS+7vE_rrRctcCTvsAeB8fuLYcyYLN-g@mail.gmail.com>
2022-04-27 17:59 ` Linux 5.18-rc4 Ammar Faizi
2022-04-27 18:31   ` Linus Torvalds
2022-06-06 15:19     ` Eric W. Biederman
2022-06-06 18:28       ` Linus Torvalds
2022-06-06 19:19         ` John Johansen
2022-06-06 19:47           ` Linus Torvalds
2022-06-06 20:23           ` Matthew Wilcox
2022-06-06 21:00             ` John Johansen
2022-06-13 22:48               ` Matthew Wilcox
2022-06-21 20:27                 ` John Johansen
2022-07-13  9:37                   ` Ammar Faizi
