* [PATCH v6 0/2] Two x86 fixes @ 2022-03-29 10:47 Ammar Faizi 2022-03-29 10:47 ` [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi 2022-03-29 10:47 ` [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi 0 siblings, 2 replies; 14+ messages in thread From: Ammar Faizi @ 2022-03-29 10:47 UTC (permalink / raw) To: Borislav Petkov, Thomas Gleixner Cc: Ammar Faizi, Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List, x86 Mailing List Hi, This is the v6 of two x86 fixes. 1) x86/delay: Fix the wrong Assembly constraint in delay_loop() function. 2) x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails. ## Changelog v6: - Remove unnecessary Cc tags. - Undo the stable mark for patch 1. - Update commit message, emphasize the danger when the compiler decides to inline the function. - Fix the Fixes tag sha1 in patch 1. - Change the helper function name to __threshold_remove_device(). v5: - Mark patch #1 for stable. - Commit message improvement for patch #1 and #2. - Fold in changes from Yazen and Alviro (for patch #2). v4: - Address comment from Greg, sha1 commit Fixes only needs to be 12 chars. - Add the author of the fixed commit to the CC list. v3: - Fold in changes from Alviro, the previous version is still leaking @bank[n]. v2: - Fix wrong copy/paste. Signed-off-by: Alviro Iskandar Setiawan <[email protected]> Signed-off-by: Yazen Ghannam <[email protected]> Signed-off-by: Ammar Faizi <[email protected]> --- Ammar Faizi (2): x86/delay: Fix the wrong asm constraint in `delay_loop()` x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails arch/x86/kernel/cpu/mce/amd.c | 32 +++++++++++++++++++------------- arch/x86/lib/delay.c | 4 ++-- 2 files changed, 21 insertions(+), 15 deletions(-) base-commit: 1930a6e739c4b4a654a69164dbe39e554d228915 -- Ammar Faizi ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` 2022-03-29 10:47 [PATCH v6 0/2] Two x86 fixes Ammar Faizi @ 2022-03-29 10:47 ` Ammar Faizi 2022-04-01 17:42 ` Dave Hansen ` (2 more replies) 2022-03-29 10:47 ` [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi 1 sibling, 3 replies; 14+ messages in thread From: Ammar Faizi @ 2022-03-29 10:47 UTC (permalink / raw) To: Borislav Petkov, Thomas Gleixner Cc: Ammar Faizi, Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List, x86 Mailing List, David Laight, Jiri Hladky The asm constraint does not reflect that the asm statement can modify the value of @loops. But the asm statement in delay_loop() does modify the @loops. Specifiying the wrong constraint may lead to undefined behavior, it may clobber random stuff (e.g. local variable, important temporary value in regs, etc.). This is especially dangerous when the compiler decides to inline the function and since it doesn't know that the value gets modified, it might decide to use it from a register directly without reloading it. Fix this by changing the constraint from "a" (as an input) to "+a" (as an input and output). Cc: David Laight <[email protected]> Cc: Jiri Hladky <[email protected]> Cc: Alviro Iskandar Setiawan <[email protected]> Fixes: e01b70ef3eb3 ("x86: fix bug in arch/i386/lib/delay.c file, delay_loop function") Signed-off-by: Ammar Faizi <[email protected]> --- v6: - Remove unnecessary Cc tags. - Update commit message, emphasize the danger when the compiler decides to inline the function. - Fix the Fixes tag sha1. - Remove stable Cc. --- arch/x86/lib/delay.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c index 65d15df6212d..0e65d00e2339 100644 --- a/arch/x86/lib/delay.c +++ b/arch/x86/lib/delay.c @@ -54,8 +54,8 @@ static void delay_loop(u64 __loops) " jnz 2b \n" "3: dec %0 \n" - : /* we don't need output */ - :"a" (loops) + : "+a" (loops) + : ); } -- Ammar Faizi ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` 2022-03-29 10:47 ` [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi @ 2022-04-01 17:42 ` Dave Hansen 2022-04-02 5:15 ` Ammar Faizi 2022-04-03 16:57 ` Thomas Gleixner 2022-04-05 20:50 ` [tip: x86/misc] x86/delay: Fix the wrong asm constraint in delay_loop() tip-bot2 for Ammar Faizi 2 siblings, 1 reply; 14+ messages in thread From: Dave Hansen @ 2022-04-01 17:42 UTC (permalink / raw) To: Ammar Faizi, Borislav Petkov, Thomas Gleixner Cc: Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List, x86 Mailing List, David Laight, Jiri Hladky On 3/29/22 03:47, Ammar Faizi wrote: > The asm constraint does not reflect that the asm statement can modify > the value of @loops. But the asm statement in delay_loop() does modify > the @loops. > > Specifiying the wrong constraint may lead to undefined behavior, it may > clobber random stuff (e.g. local variable, important temporary value in > regs, etc.). This is especially dangerous when the compiler decides to > inline the function and since it doesn't know that the value gets > modified, it might decide to use it from a register directly without > reloading it. > > Fix this by changing the constraint from "a" (as an input) to "+a" (as > an input and output). Was this found by inspection or was it causing real-world problems? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` 2022-04-01 17:42 ` Dave Hansen @ 2022-04-02 5:15 ` Ammar Faizi 0 siblings, 0 replies; 14+ messages in thread From: Ammar Faizi @ 2022-04-02 5:15 UTC (permalink / raw) To: Dave Hansen, Borislav Petkov, Thomas Gleixner Cc: Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List, x86 Mailing List, David Laight, Jiri Hladky On 4/2/22 12:42 AM, Dave Hansen wrote: > Was this found by inspection or was it causing real-world problems? It was found by inspection, no real-world problems found so far. -- Ammar Faizi ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` 2022-03-29 10:47 ` [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi 2022-04-01 17:42 ` Dave Hansen @ 2022-04-03 16:57 ` Thomas Gleixner 2022-04-03 17:11 ` Ammar Faizi 2022-04-03 17:14 ` Thomas Gleixner 2022-04-05 20:50 ` [tip: x86/misc] x86/delay: Fix the wrong asm constraint in delay_loop() tip-bot2 for Ammar Faizi 2 siblings, 2 replies; 14+ messages in thread From: Thomas Gleixner @ 2022-04-03 16:57 UTC (permalink / raw) To: Ammar Faizi, Borislav Petkov Cc: Ammar Faizi, Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List, x86 Mailing List, David Laight, Jiri Hladky On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote: > The asm constraint does not reflect that the asm statement can modify > the value of @loops. But the asm statement in delay_loop() does modify > the @loops. > > Specifiying the wrong constraint may lead to undefined behavior, it may > clobber random stuff (e.g. local variable, important temporary value in > regs, etc.). This is especially dangerous when the compiler decides to > inline the function and since it doesn't know that the value gets > modified, it might decide to use it from a register directly without > reloading it. > > Fix this by changing the constraint from "a" (as an input) to "+a" (as > an input and output). This analysis is plain wrong. The assembly code operates on a register and not on memory: asm volatile( " test %0,%0 \n" " jz 3f \n" " jmp 1f \n" ".align 16 \n" "1: jmp 2f \n" ".align 16 \n" "2: dec %0 \n" " jnz 2b \n" "3: dec %0 \n" : /* we don't need output */ ----> :"a" (loops) This tells the compiler to use [RE]AX and initialize it from the variable 'loops'. It's never written back because all '%0' in the above assembly are substituted with [RE]AX. This also tells the compiler that the inline assembly clobbers [RE]AX and that's all it needs to know. Nothing to fix here, whether the code is inlined or not. Thanks, tglx ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` 2022-04-03 16:57 ` Thomas Gleixner @ 2022-04-03 17:11 ` Ammar Faizi 2022-04-03 17:14 ` Thomas Gleixner 1 sibling, 0 replies; 14+ messages in thread From: Ammar Faizi @ 2022-04-03 17:11 UTC (permalink / raw) To: Thomas Gleixner, Borislav Petkov Cc: Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List, x86 Mailing List, David Laight, Jiri Hladky On 4/3/22 11:57 PM, Thomas Gleixner wrote: > On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote: >> The asm constraint does not reflect that the asm statement can modify >> the value of @loops. But the asm statement in delay_loop() does modify >> the @loops. >> >> Specifiying the wrong constraint may lead to undefined behavior, it may >> clobber random stuff (e.g. local variable, important temporary value in >> regs, etc.). This is especially dangerous when the compiler decides to >> inline the function and since it doesn't know that the value gets >> modified, it might decide to use it from a register directly without >> reloading it. >> >> Fix this by changing the constraint from "a" (as an input) to "+a" (as >> an input and output). > > This analysis is plain wrong. The assembly code operates on a register > and not on memory: > asm volatile( > " test %0,%0 \n" > " jz 3f \n" > " jmp 1f \n" > > ".align 16 \n" > "1: jmp 2f \n" > > ".align 16 \n" > "2: dec %0 \n" > " jnz 2b \n" > "3: dec %0 \n" > > : /* we don't need output */ > ----> :"a" (loops) > > This tells the compiler to use [RE]AX and initialize it from the > variable 'loops'. It's never written back because all '%0' in the above > assembly are substituted with [RE]AX. This also tells the compiler that > the inline assembly clobbers [RE]AX and that's all it needs to know. Hi Thomas, Thanks for taking a look. I doubt about your sentence "This also tells the compiler that the inline assembly clobbers [RE]AX". How come it tells the compiler that the inline ASM clobbers [RE]AX? That's an input constraint. Doesn't that mean it is read-only for the ASM statement? That means the compiler is allowed to assume [RE]AX doesn't change inside the ASM statement. Those `dec`s do really change the [RE]AX. Please review this again. Thanks! -- Ammar Faizi ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` 2022-04-03 16:57 ` Thomas Gleixner 2022-04-03 17:11 ` Ammar Faizi @ 2022-04-03 17:14 ` Thomas Gleixner 1 sibling, 0 replies; 14+ messages in thread From: Thomas Gleixner @ 2022-04-03 17:14 UTC (permalink / raw) To: Ammar Faizi, Borislav Petkov Cc: Ammar Faizi, Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List, x86 Mailing List, David Laight, Jiri Hladky On Sun, Apr 03 2022 at 18:57, Thomas Gleixner wrote: > On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote: >> The asm constraint does not reflect that the asm statement can modify >> the value of @loops. But the asm statement in delay_loop() does modify >> the @loops. >> >> Specifiying the wrong constraint may lead to undefined behavior, it may >> clobber random stuff (e.g. local variable, important temporary value in >> regs, etc.). This is especially dangerous when the compiler decides to >> inline the function and since it doesn't know that the value gets >> modified, it might decide to use it from a register directly without >> reloading it. Ignore me, I misread this part of the explanation. Thanks, tglx ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tip: x86/misc] x86/delay: Fix the wrong asm constraint in delay_loop() 2022-03-29 10:47 ` [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi 2022-04-01 17:42 ` Dave Hansen 2022-04-03 16:57 ` Thomas Gleixner @ 2022-04-05 20:50 ` tip-bot2 for Ammar Faizi 2 siblings, 0 replies; 14+ messages in thread From: tip-bot2 for Ammar Faizi @ 2022-04-05 20:50 UTC (permalink / raw) To: linux-tip-commits; +Cc: Ammar Faizi, Borislav Petkov, x86, linux-kernel The following commit has been merged into the x86/misc branch of tip: Commit-ID: b86eb74098a92afd789da02699b4b0dd3f73b889 Gitweb: https://git.kernel.org/tip/b86eb74098a92afd789da02699b4b0dd3f73b889 Author: Ammar Faizi <[email protected]> AuthorDate: Tue, 29 Mar 2022 17:47:04 +07:00 Committer: Borislav Petkov <[email protected]> CommitterDate: Tue, 05 Apr 2022 21:21:57 +02:00 x86/delay: Fix the wrong asm constraint in delay_loop() The asm constraint does not reflect the fact that the asm statement can modify the value of the local variable loops. Which it does. Specifying the wrong constraint may lead to undefined behavior, it may clobber random stuff (e.g. local variable, important temporary value in regs, etc.). This is especially dangerous when the compiler decides to inline the function and since it doesn't know that the value gets modified, it might decide to use it from a register directly without reloading it. Change the constraint to "+a" to denote that the first argument is an input and an output argument. [ bp: Fix typo, massage commit message. ] Fixes: e01b70ef3eb3 ("x86: fix bug in arch/i386/lib/delay.c file, delay_loop function") Signed-off-by: Ammar Faizi <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected] --- arch/x86/lib/delay.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c index 65d15df..0e65d00 100644 --- a/arch/x86/lib/delay.c +++ b/arch/x86/lib/delay.c @@ -54,8 +54,8 @@ static void delay_loop(u64 __loops) " jnz 2b \n" "3: dec %0 \n" - : /* we don't need output */ - :"a" (loops) + : "+a" (loops) + : ); } ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails 2022-03-29 10:47 [PATCH v6 0/2] Two x86 fixes Ammar Faizi 2022-03-29 10:47 ` [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi @ 2022-03-29 10:47 ` Ammar Faizi 2022-04-03 17:03 ` Thomas Gleixner 2022-04-05 20:56 ` [tip: ras/core] x86/MCE/AMD: Fix memory leak when threshold_create_bank() fails tip-bot2 for Ammar Faizi 1 sibling, 2 replies; 14+ messages in thread From: Ammar Faizi @ 2022-03-29 10:47 UTC (permalink / raw) To: Borislav Petkov, Thomas Gleixner Cc: Ammar Faizi, Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List, x86 Mailing List In mce_threshold_create_device(), if threshold_create_bank() fails, the @bp will be leaked, because the call to mce_threshold_remove_device() will not free the @bp. mce_threshold_remove_device() frees @threshold_banks. At that point, the @bp has not been written to @threshold_banks, @threshold_banks is NULL, so the call is just a nop. Fix this by extracting the cleanup part into a new static function __threshold_remove_device(), then call it from create/remove device functions. Also, eliminate the "goto out_err", just early return inside the loop if the creation fails. Cc: [email protected] # v5.8+ Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path") Co-developed-by: Alviro Iskandar Setiawan <[email protected]> Signed-off-by: Alviro Iskandar Setiawan <[email protected]> Co-developed-by: Yazen Ghannam <[email protected]> Signed-off-by: Yazen Ghannam <[email protected]> Signed-off-by: Ammar Faizi <[email protected]> --- v6: - Change the helper function name to __threshold_remove_device(). --- arch/x86/kernel/cpu/mce/amd.c | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index 1940d305db1c..d293ae088d6b 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -1294,10 +1294,23 @@ static void threshold_remove_bank(struct threshold_bank *bank) kfree(bank); } +static void __threshold_remove_device(struct threshold_bank **bp, + unsigned int numbanks) +{ + unsigned int bank; + + for (bank = 0; bank < numbanks; bank++) { + if (bp[bank]) { + threshold_remove_bank(bp[bank]); + bp[bank] = NULL; + } + } + kfree(bp); +} + int mce_threshold_remove_device(unsigned int cpu) { struct threshold_bank **bp = this_cpu_read(threshold_banks); - unsigned int bank, numbanks = this_cpu_read(mce_num_banks); if (!bp) return 0; @@ -1308,13 +1321,7 @@ int mce_threshold_remove_device(unsigned int cpu) */ this_cpu_write(threshold_banks, NULL); - for (bank = 0; bank < numbanks; bank++) { - if (bp[bank]) { - threshold_remove_bank(bp[bank]); - bp[bank] = NULL; - } - } - kfree(bp); + __threshold_remove_device(bp, this_cpu_read(mce_num_banks)); return 0; } @@ -1351,15 +1358,14 @@ int mce_threshold_create_device(unsigned int cpu) if (!(this_cpu_read(bank_map) & (1 << bank))) continue; err = threshold_create_bank(bp, cpu, bank); - if (err) - goto out_err; + if (err) { + __threshold_remove_device(bp, numbanks); + return err; + } } this_cpu_write(threshold_banks, bp); if (thresholding_irq_en) mce_threshold_vector = amd_threshold_interrupt; return 0; -out_err: - mce_threshold_remove_device(cpu); - return err; } -- Ammar Faizi ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails 2022-03-29 10:47 ` [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi @ 2022-04-03 17:03 ` Thomas Gleixner 2022-04-03 17:43 ` Ammar Faizi 2022-04-05 20:56 ` [tip: ras/core] x86/MCE/AMD: Fix memory leak when threshold_create_bank() fails tip-bot2 for Ammar Faizi 1 sibling, 1 reply; 14+ messages in thread From: Thomas Gleixner @ 2022-04-03 17:03 UTC (permalink / raw) To: Ammar Faizi, Borislav Petkov Cc: Ammar Faizi, Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List, x86 Mailing List On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote: > In mce_threshold_create_device(), if threshold_create_bank() fails, the > @bp will be leaked, because the call to mce_threshold_remove_device() > will not free the @bp. mce_threshold_remove_device() frees > @threshold_banks. At that point, the @bp has not been written to > @threshold_banks, @threshold_banks is NULL, so the call is just a nop. > > Fix this by extracting the cleanup part into a new static function > __threshold_remove_device(), then call it from create/remove device > functions. The way simpler fix is to move > } > this_cpu_write(threshold_banks, bp); before the loop. That's safe because the banks cannot yet be reached via an MCE as the vector is not yet enabled: > if (thresholding_irq_en) > mce_threshold_vector = amd_threshold_interrupt; Thanks, tglx ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails 2022-04-03 17:03 ` Thomas Gleixner @ 2022-04-03 17:43 ` Ammar Faizi 2022-04-03 17:45 ` Ammar Faizi 0 siblings, 1 reply; 14+ messages in thread From: Ammar Faizi @ 2022-04-03 17:43 UTC (permalink / raw) To: Thomas Gleixner, Borislav Petkov Cc: Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List, x86 Mailing List On 4/4/22 12:03 AM, Thomas Gleixner wrote: > On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote: > >> In mce_threshold_create_device(), if threshold_create_bank() fails, the >> @bp will be leaked, because the call to mce_threshold_remove_device() >> will not free the @bp. mce_threshold_remove_device() frees >> @threshold_banks. At that point, the @bp has not been written to >> @threshold_banks, @threshold_banks is NULL, so the call is just a nop. >> >> Fix this by extracting the cleanup part into a new static function >> __threshold_remove_device(), then call it from create/remove device >> functions. > > The way simpler fix is to move > >> } >> this_cpu_write(threshold_banks, bp); > > before the loop. That's safe because the banks cannot yet be reached via > an MCE as the vector is not yet enabled: > >> if (thresholding_irq_en) >> mce_threshold_vector = amd_threshold_interrupt; Thomas, I did like what you said (in the patch v4), but after Yazen and Borislav reviewed it, we got a conclusion that it's not safe. See [1] and [2] for the full message. [1]: https://lore.kernel.org/lkml/[email protected]/ [2]: https://lore.kernel.org/lkml/Yh+oyD%2F5M3TW5ZMM@yaz-ubuntu/ Yazen, Borislav, please take a deeper look on this again. I will send a v7 revision to really make it simpler by moving that "per-CPU var write" before the loop. Thanks! -- Ammar Faizi ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails 2022-04-03 17:43 ` Ammar Faizi @ 2022-04-03 17:45 ` Ammar Faizi 2022-04-03 18:46 ` Borislav Petkov 0 siblings, 1 reply; 14+ messages in thread From: Ammar Faizi @ 2022-04-03 17:45 UTC (permalink / raw) To: Thomas Gleixner, Borislav Petkov Cc: Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List, x86 Mailing List On 4/4/22 12:43 AM, Ammar Faizi wrote: > On 4/4/22 12:03 AM, Thomas Gleixner wrote: >> On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote: >> >>> In mce_threshold_create_device(), if threshold_create_bank() fails, the >>> @bp will be leaked, because the call to mce_threshold_remove_device() >>> will not free the @bp. mce_threshold_remove_device() frees >>> @threshold_banks. At that point, the @bp has not been written to >>> @threshold_banks, @threshold_banks is NULL, so the call is just a nop. >>> >>> Fix this by extracting the cleanup part into a new static function >>> __threshold_remove_device(), then call it from create/remove device >>> functions. >> >> The way simpler fix is to move >> >>> } >>> this_cpu_write(threshold_banks, bp); >> >> before the loop. That's safe because the banks cannot yet be reached via >> an MCE as the vector is not yet enabled: >>> if (thresholding_irq_en) >>> mce_threshold_vector = amd_threshold_interrupt; > Thomas, > > I did like what you said (in the patch v4), but after Yazen and Borislav > reviewed it, we got a conclusion that it's not safe. > > See [1] and [2] for the full message. > > [1]: https://lore.kernel.org/lkml/[email protected]/ > [2]: https://lore.kernel.org/lkml/Yh+oyD%2F5M3TW5ZMM@yaz-ubuntu/ > > Yazen, Borislav, please take a deeper look on this again. I will send > a v7 revision to really make it simpler by moving that "per-CPU var write" > before the loop. (only if it's really safe) -- Ammar Faizi ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails 2022-04-03 17:45 ` Ammar Faizi @ 2022-04-03 18:46 ` Borislav Petkov 0 siblings, 0 replies; 14+ messages in thread From: Borislav Petkov @ 2022-04-03 18:46 UTC (permalink / raw) To: Ammar Faizi Cc: Thomas Gleixner, Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List, x86 Mailing List On Mon, Apr 04, 2022 at 12:45:04AM +0700, Ammar Faizi wrote: > > Yazen, Borislav, please take a deeper look on this again. I will send > > a v7 revision to really make it simpler by moving that "per-CPU var write" > > before the loop. > > (only if it's really safe) Don't bother - I've discussed it with tglx offlist. The current solution remains. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tip: ras/core] x86/MCE/AMD: Fix memory leak when threshold_create_bank() fails 2022-03-29 10:47 ` [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi 2022-04-03 17:03 ` Thomas Gleixner @ 2022-04-05 20:56 ` tip-bot2 for Ammar Faizi 1 sibling, 0 replies; 14+ messages in thread From: tip-bot2 for Ammar Faizi @ 2022-04-05 20:56 UTC (permalink / raw) To: linux-tip-commits Cc: Alviro Iskandar Setiawan, Yazen Ghannam, Ammar Faizi, Borislav Petkov, stable, x86, linux-kernel The following commit has been merged into the ras/core branch of tip: Commit-ID: e5f28623ceb103e13fc3d7bd45edf9818b227fd0 Gitweb: https://git.kernel.org/tip/e5f28623ceb103e13fc3d7bd45edf9818b227fd0 Author: Ammar Faizi <[email protected]> AuthorDate: Tue, 29 Mar 2022 17:47:05 +07:00 Committer: Borislav Petkov <[email protected]> CommitterDate: Tue, 05 Apr 2022 21:24:37 +02:00 x86/MCE/AMD: Fix memory leak when threshold_create_bank() fails In mce_threshold_create_device(), if threshold_create_bank() fails, the previously allocated threshold banks array @bp will be leaked because the call to mce_threshold_remove_device() will not free it. This happens because mce_threshold_remove_device() fetches the pointer through the threshold_banks per-CPU variable but bp is written there only after the bank creation is successful, and not before, when threshold_create_bank() fails. Add a helper which unwinds all the bank creation work previously done and pass into it the previously allocated threshold banks array for freeing. [ bp: Massage. ] Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path") Co-developed-by: Alviro Iskandar Setiawan <[email protected]> Signed-off-by: Alviro Iskandar Setiawan <[email protected]> Co-developed-by: Yazen Ghannam <[email protected]> Signed-off-by: Yazen Ghannam <[email protected]> Signed-off-by: Ammar Faizi <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] --- arch/x86/kernel/cpu/mce/amd.c | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index 1940d30..1c87501 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -1294,10 +1294,23 @@ out_free: kfree(bank); } +static void __threshold_remove_device(struct threshold_bank **bp) +{ + unsigned int bank, numbanks = this_cpu_read(mce_num_banks); + + for (bank = 0; bank < numbanks; bank++) { + if (!bp[bank]) + continue; + + threshold_remove_bank(bp[bank]); + bp[bank] = NULL; + } + kfree(bp); +} + int mce_threshold_remove_device(unsigned int cpu) { struct threshold_bank **bp = this_cpu_read(threshold_banks); - unsigned int bank, numbanks = this_cpu_read(mce_num_banks); if (!bp) return 0; @@ -1308,13 +1321,7 @@ int mce_threshold_remove_device(unsigned int cpu) */ this_cpu_write(threshold_banks, NULL); - for (bank = 0; bank < numbanks; bank++) { - if (bp[bank]) { - threshold_remove_bank(bp[bank]); - bp[bank] = NULL; - } - } - kfree(bp); + __threshold_remove_device(bp); return 0; } @@ -1351,15 +1358,14 @@ int mce_threshold_create_device(unsigned int cpu) if (!(this_cpu_read(bank_map) & (1 << bank))) continue; err = threshold_create_bank(bp, cpu, bank); - if (err) - goto out_err; + if (err) { + __threshold_remove_device(bp); + return err; + } } this_cpu_write(threshold_banks, bp); if (thresholding_irq_en) mce_threshold_vector = amd_threshold_interrupt; return 0; -out_err: - mce_threshold_remove_device(cpu); - return err; } ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2022-04-05 20:56 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-03-29 10:47 [PATCH v6 0/2] Two x86 fixes Ammar Faizi 2022-03-29 10:47 ` [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi 2022-04-01 17:42 ` Dave Hansen 2022-04-02 5:15 ` Ammar Faizi 2022-04-03 16:57 ` Thomas Gleixner 2022-04-03 17:11 ` Ammar Faizi 2022-04-03 17:14 ` Thomas Gleixner 2022-04-05 20:50 ` [tip: x86/misc] x86/delay: Fix the wrong asm constraint in delay_loop() tip-bot2 for Ammar Faizi 2022-03-29 10:47 ` [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi 2022-04-03 17:03 ` Thomas Gleixner 2022-04-03 17:43 ` Ammar Faizi 2022-04-03 17:45 ` Ammar Faizi 2022-04-03 18:46 ` Borislav Petkov 2022-04-05 20:56 ` [tip: ras/core] x86/MCE/AMD: Fix memory leak when threshold_create_bank() fails tip-bot2 for Ammar Faizi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox