public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH v6 0/2] Two x86 fixes
@ 2022-03-29 10:47 Ammar Faizi
  2022-03-29 10:47 ` [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi
  2022-03-29 10:47 ` [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi
  0 siblings, 2 replies; 14+ messages in thread
From: Ammar Faizi @ 2022-03-29 10:47 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Ammar Faizi, Alviro Iskandar Setiawan, Dave Hansen,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel,
	GNU/Weeb Mailing List, x86 Mailing List

Hi,

This is the v6 of two x86 fixes.

1) x86/delay: Fix the wrong Assembly constraint in delay_loop() function.
2) x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails.

## Changelog

v6:
  - Remove unnecessary Cc tags.
  - Undo the stable mark for patch 1.
  - Update commit message, emphasize the danger when the compiler
    decides to inline the function.
  - Fix the Fixes tag sha1 in patch 1.
  - Change the helper function name to __threshold_remove_device().

v5:
  - Mark patch #1 for stable.
  - Commit message improvement for patch #1 and #2.
  - Fold in changes from Yazen and Alviro (for patch #2).

v4:
  - Address comment from Greg, sha1 commit Fixes only needs
    to be 12 chars.
  - Add the author of the fixed commit to the CC list.

v3:
  - Fold in changes from Alviro, the previous version is still
    leaking @bank[n].

v2:
  - Fix wrong copy/paste.

Signed-off-by: Alviro Iskandar Setiawan <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
Ammar Faizi (2):
  x86/delay: Fix the wrong asm constraint in `delay_loop()`
  x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails

 arch/x86/kernel/cpu/mce/amd.c | 32 +++++++++++++++++++-------------
 arch/x86/lib/delay.c          |  4 ++--
 2 files changed, 21 insertions(+), 15 deletions(-)


base-commit: 1930a6e739c4b4a654a69164dbe39e554d228915
-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`
  2022-03-29 10:47 [PATCH v6 0/2] Two x86 fixes Ammar Faizi
@ 2022-03-29 10:47 ` Ammar Faizi
  2022-04-01 17:42   ` Dave Hansen
                     ` (2 more replies)
  2022-03-29 10:47 ` [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi
  1 sibling, 3 replies; 14+ messages in thread
From: Ammar Faizi @ 2022-03-29 10:47 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Ammar Faizi, Alviro Iskandar Setiawan, Dave Hansen,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel,
	GNU/Weeb Mailing List, x86 Mailing List, David Laight,
	Jiri Hladky

The asm constraint does not reflect that the asm statement can modify
the value of @loops. But the asm statement in delay_loop() does modify
the @loops.

Specifiying the wrong constraint may lead to undefined behavior, it may
clobber random stuff (e.g. local variable, important temporary value in
regs, etc.). This is especially dangerous when the compiler decides to
inline the function and since it doesn't know that the value gets
modified, it might decide to use it from a register directly without
reloading it.

Fix this by changing the constraint from "a" (as an input) to "+a" (as
an input and output).

Cc: David Laight <[email protected]>
Cc: Jiri Hladky <[email protected]>
Cc: Alviro Iskandar Setiawan <[email protected]>
Fixes: e01b70ef3eb3 ("x86: fix bug in arch/i386/lib/delay.c file, delay_loop function")
Signed-off-by: Ammar Faizi <[email protected]>
---

   v6:
     - Remove unnecessary Cc tags.
     - Update commit message, emphasize the danger when the
       compiler decides to inline the function.
     - Fix the Fixes tag sha1.
     - Remove stable Cc.

---
 arch/x86/lib/delay.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 65d15df6212d..0e65d00e2339 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -54,8 +54,8 @@ static void delay_loop(u64 __loops)
 		"	jnz 2b		\n"
 		"3:	dec %0		\n"
 
-		: /* we don't need output */
-		:"a" (loops)
+		: "+a" (loops)
+		:
 	);
 }
 
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails
  2022-03-29 10:47 [PATCH v6 0/2] Two x86 fixes Ammar Faizi
  2022-03-29 10:47 ` [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi
@ 2022-03-29 10:47 ` Ammar Faizi
  2022-04-03 17:03   ` Thomas Gleixner
  2022-04-05 20:56   ` [tip: ras/core] x86/MCE/AMD: Fix memory leak when threshold_create_bank() fails tip-bot2 for Ammar Faizi
  1 sibling, 2 replies; 14+ messages in thread
From: Ammar Faizi @ 2022-03-29 10:47 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Ammar Faizi, Alviro Iskandar Setiawan, Dave Hansen,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel,
	GNU/Weeb Mailing List, x86 Mailing List

In mce_threshold_create_device(), if threshold_create_bank() fails, the
@bp will be leaked, because the call to mce_threshold_remove_device()
will not free the @bp. mce_threshold_remove_device() frees
@threshold_banks. At that point, the @bp has not been written to
@threshold_banks, @threshold_banks is NULL, so the call is just a nop.

Fix this by extracting the cleanup part into a new static function
__threshold_remove_device(), then call it from create/remove device
functions.

Also, eliminate the "goto out_err", just early return inside the loop
if the creation fails.

Cc: [email protected] # v5.8+
Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path")
Co-developed-by: Alviro Iskandar Setiawan <[email protected]>
Signed-off-by: Alviro Iskandar Setiawan <[email protected]>
Co-developed-by: Yazen Ghannam <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---

   v6:
     - Change the helper function name to __threshold_remove_device().

---
 arch/x86/kernel/cpu/mce/amd.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1940d305db1c..d293ae088d6b 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1294,10 +1294,23 @@ static void threshold_remove_bank(struct threshold_bank *bank)
 	kfree(bank);
 }
 
+static void __threshold_remove_device(struct threshold_bank **bp,
+				       unsigned int numbanks)
+{
+	unsigned int bank;
+
+	for (bank = 0; bank < numbanks; bank++) {
+		if (bp[bank]) {
+			threshold_remove_bank(bp[bank]);
+			bp[bank] = NULL;
+		}
+	}
+	kfree(bp);
+}
+
 int mce_threshold_remove_device(unsigned int cpu)
 {
 	struct threshold_bank **bp = this_cpu_read(threshold_banks);
-	unsigned int bank, numbanks = this_cpu_read(mce_num_banks);
 
 	if (!bp)
 		return 0;
@@ -1308,13 +1321,7 @@ int mce_threshold_remove_device(unsigned int cpu)
 	 */
 	this_cpu_write(threshold_banks, NULL);
 
-	for (bank = 0; bank < numbanks; bank++) {
-		if (bp[bank]) {
-			threshold_remove_bank(bp[bank]);
-			bp[bank] = NULL;
-		}
-	}
-	kfree(bp);
+	__threshold_remove_device(bp, this_cpu_read(mce_num_banks));
 	return 0;
 }
 
@@ -1351,15 +1358,14 @@ int mce_threshold_create_device(unsigned int cpu)
 		if (!(this_cpu_read(bank_map) & (1 << bank)))
 			continue;
 		err = threshold_create_bank(bp, cpu, bank);
-		if (err)
-			goto out_err;
+		if (err) {
+			__threshold_remove_device(bp, numbanks);
+			return err;
+		}
 	}
 	this_cpu_write(threshold_banks, bp);
 
 	if (thresholding_irq_en)
 		mce_threshold_vector = amd_threshold_interrupt;
 	return 0;
-out_err:
-	mce_threshold_remove_device(cpu);
-	return err;
 }
-- 
Ammar Faizi


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`
  2022-03-29 10:47 ` [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi
@ 2022-04-01 17:42   ` Dave Hansen
  2022-04-02  5:15     ` Ammar Faizi
  2022-04-03 16:57   ` Thomas Gleixner
  2022-04-05 20:50   ` [tip: x86/misc] x86/delay: Fix the wrong asm constraint in delay_loop() tip-bot2 for Ammar Faizi
  2 siblings, 1 reply; 14+ messages in thread
From: Dave Hansen @ 2022-04-01 17:42 UTC (permalink / raw)
  To: Ammar Faizi, Borislav Petkov, Thomas Gleixner
  Cc: Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List,
	Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List,
	x86 Mailing List, David Laight, Jiri Hladky

On 3/29/22 03:47, Ammar Faizi wrote:
> The asm constraint does not reflect that the asm statement can modify
> the value of @loops. But the asm statement in delay_loop() does modify
> the @loops.
> 
> Specifiying the wrong constraint may lead to undefined behavior, it may
> clobber random stuff (e.g. local variable, important temporary value in
> regs, etc.). This is especially dangerous when the compiler decides to
> inline the function and since it doesn't know that the value gets
> modified, it might decide to use it from a register directly without
> reloading it.
> 
> Fix this by changing the constraint from "a" (as an input) to "+a" (as
> an input and output).

Was this found by inspection or was it causing real-world problems?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`
  2022-04-01 17:42   ` Dave Hansen
@ 2022-04-02  5:15     ` Ammar Faizi
  0 siblings, 0 replies; 14+ messages in thread
From: Ammar Faizi @ 2022-04-02  5:15 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Thomas Gleixner
  Cc: Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List,
	Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List,
	x86 Mailing List, David Laight, Jiri Hladky

On 4/2/22 12:42 AM, Dave Hansen wrote:
> Was this found by inspection or was it causing real-world problems?

It was found by inspection, no real-world problems found so far.

-- 
Ammar Faizi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`
  2022-03-29 10:47 ` [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi
  2022-04-01 17:42   ` Dave Hansen
@ 2022-04-03 16:57   ` Thomas Gleixner
  2022-04-03 17:11     ` Ammar Faizi
  2022-04-03 17:14     ` Thomas Gleixner
  2022-04-05 20:50   ` [tip: x86/misc] x86/delay: Fix the wrong asm constraint in delay_loop() tip-bot2 for Ammar Faizi
  2 siblings, 2 replies; 14+ messages in thread
From: Thomas Gleixner @ 2022-04-03 16:57 UTC (permalink / raw)
  To: Ammar Faizi, Borislav Petkov
  Cc: Ammar Faizi, Alviro Iskandar Setiawan, Dave Hansen,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel,
	GNU/Weeb Mailing List, x86 Mailing List, David Laight,
	Jiri Hladky

On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote:
> The asm constraint does not reflect that the asm statement can modify
> the value of @loops. But the asm statement in delay_loop() does modify
> the @loops.
>
> Specifiying the wrong constraint may lead to undefined behavior, it may
> clobber random stuff (e.g. local variable, important temporary value in
> regs, etc.). This is especially dangerous when the compiler decides to
> inline the function and since it doesn't know that the value gets
> modified, it might decide to use it from a register directly without
> reloading it.
>
> Fix this by changing the constraint from "a" (as an input) to "+a" (as
> an input and output).

This analysis is plain wrong. The assembly code operates on a register
and not on memory:

	asm volatile(
		"	test %0,%0	\n"
		"	jz 3f		\n"
		"	jmp 1f		\n"

		".align 16		\n"
		"1:	jmp 2f		\n"

		".align 16		\n"
		"2:	dec %0		\n"
		"	jnz 2b		\n"
		"3:	dec %0		\n"

		: /* we don't need output */
---->		:"a" (loops)

This tells the compiler to use [RE]AX and initialize it from the
variable 'loops'. It's never written back because all '%0' in the above
assembly are substituted with [RE]AX. This also tells the compiler that
the inline assembly clobbers [RE]AX and that's all it needs to know.

Nothing to fix here, whether the code is inlined or not.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails
  2022-03-29 10:47 ` [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi
@ 2022-04-03 17:03   ` Thomas Gleixner
  2022-04-03 17:43     ` Ammar Faizi
  2022-04-05 20:56   ` [tip: ras/core] x86/MCE/AMD: Fix memory leak when threshold_create_bank() fails tip-bot2 for Ammar Faizi
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2022-04-03 17:03 UTC (permalink / raw)
  To: Ammar Faizi, Borislav Petkov
  Cc: Ammar Faizi, Alviro Iskandar Setiawan, Dave Hansen,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel,
	GNU/Weeb Mailing List, x86 Mailing List

On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote:

> In mce_threshold_create_device(), if threshold_create_bank() fails, the
> @bp will be leaked, because the call to mce_threshold_remove_device()
> will not free the @bp. mce_threshold_remove_device() frees
> @threshold_banks. At that point, the @bp has not been written to
> @threshold_banks, @threshold_banks is NULL, so the call is just a nop.
>
> Fix this by extracting the cleanup part into a new static function
> __threshold_remove_device(), then call it from create/remove device
> functions.

The way simpler fix is to move 

>  	}
>  	this_cpu_write(threshold_banks, bp);

before the loop. That's safe because the banks cannot yet be reached via
an MCE as the vector is not yet enabled:
  
>  	if (thresholding_irq_en)
>  		mce_threshold_vector = amd_threshold_interrupt;

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`
  2022-04-03 16:57   ` Thomas Gleixner
@ 2022-04-03 17:11     ` Ammar Faizi
  2022-04-03 17:14     ` Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: Ammar Faizi @ 2022-04-03 17:11 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov
  Cc: Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List,
	Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List,
	x86 Mailing List, David Laight, Jiri Hladky

On 4/3/22 11:57 PM, Thomas Gleixner wrote:
> On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote:
>> The asm constraint does not reflect that the asm statement can modify
>> the value of @loops. But the asm statement in delay_loop() does modify
>> the @loops.
>>
>> Specifiying the wrong constraint may lead to undefined behavior, it may
>> clobber random stuff (e.g. local variable, important temporary value in
>> regs, etc.). This is especially dangerous when the compiler decides to
>> inline the function and since it doesn't know that the value gets
>> modified, it might decide to use it from a register directly without
>> reloading it.
>>
>> Fix this by changing the constraint from "a" (as an input) to "+a" (as
>> an input and output).
> 
> This analysis is plain wrong. The assembly code operates on a register
> and not on memory:



> 	asm volatile(
> 		"	test %0,%0	\n"
> 		"	jz 3f		\n"
> 		"	jmp 1f		\n"
> 
> 		".align 16		\n"
> 		"1:	jmp 2f		\n"
> 
> 		".align 16		\n"
> 		"2:	dec %0		\n"
> 		"	jnz 2b		\n"
> 		"3:	dec %0		\n"
> 
> 		: /* we don't need output */
> ---->		:"a" (loops)
> 
> This tells the compiler to use [RE]AX and initialize it from the
> variable 'loops'. It's never written back because all '%0' in the above
> assembly are substituted with [RE]AX. This also tells the compiler that
> the inline assembly clobbers [RE]AX and that's all it needs to know.

Hi Thomas,

Thanks for taking a look. I doubt about your sentence "This also tells
the compiler that the inline assembly clobbers [RE]AX".

How come it tells the compiler that the inline ASM clobbers [RE]AX?

That's an input constraint. Doesn't that mean it is read-only for the
ASM statement? That means the compiler is allowed to assume [RE]AX doesn't
change inside the ASM statement.

Those `dec`s do really change the [RE]AX. Please review this again.

Thanks!

-- 
Ammar Faizi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()`
  2022-04-03 16:57   ` Thomas Gleixner
  2022-04-03 17:11     ` Ammar Faizi
@ 2022-04-03 17:14     ` Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2022-04-03 17:14 UTC (permalink / raw)
  To: Ammar Faizi, Borislav Petkov
  Cc: Ammar Faizi, Alviro Iskandar Setiawan, Dave Hansen,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel,
	GNU/Weeb Mailing List, x86 Mailing List, David Laight,
	Jiri Hladky

On Sun, Apr 03 2022 at 18:57, Thomas Gleixner wrote:
> On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote:
>> The asm constraint does not reflect that the asm statement can modify
>> the value of @loops. But the asm statement in delay_loop() does modify
>> the @loops.
>>
>> Specifiying the wrong constraint may lead to undefined behavior, it may
>> clobber random stuff (e.g. local variable, important temporary value in
>> regs, etc.). This is especially dangerous when the compiler decides to
>> inline the function and since it doesn't know that the value gets
>> modified, it might decide to use it from a register directly without
>> reloading it.

Ignore me, I misread this part of the explanation.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails
  2022-04-03 17:03   ` Thomas Gleixner
@ 2022-04-03 17:43     ` Ammar Faizi
  2022-04-03 17:45       ` Ammar Faizi
  0 siblings, 1 reply; 14+ messages in thread
From: Ammar Faizi @ 2022-04-03 17:43 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov
  Cc: Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List,
	Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List,
	x86 Mailing List

On 4/4/22 12:03 AM, Thomas Gleixner wrote:
> On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote:
> 
>> In mce_threshold_create_device(), if threshold_create_bank() fails, the
>> @bp will be leaked, because the call to mce_threshold_remove_device()
>> will not free the @bp. mce_threshold_remove_device() frees
>> @threshold_banks. At that point, the @bp has not been written to
>> @threshold_banks, @threshold_banks is NULL, so the call is just a nop.
>>
>> Fix this by extracting the cleanup part into a new static function
>> __threshold_remove_device(), then call it from create/remove device
>> functions.
> 
> The way simpler fix is to move
> 
>>   	}
>>   	this_cpu_write(threshold_banks, bp);
> 
> before the loop. That's safe because the banks cannot yet be reached via
> an MCE as the vector is not yet enabled:
>    
>>   	if (thresholding_irq_en)
>>   		mce_threshold_vector = amd_threshold_interrupt;
Thomas,

I did like what you said (in the patch v4), but after Yazen and Borislav
reviewed it, we got a conclusion that it's not safe.

See [1] and [2] for the full message.

[1]: https://lore.kernel.org/lkml/[email protected]/
[2]: https://lore.kernel.org/lkml/Yh+oyD%2F5M3TW5ZMM@yaz-ubuntu/

Yazen, Borislav, please take a deeper look on this again. I will send
a v7 revision to really make it simpler by moving that "per-CPU var write"
before the loop.

Thanks!

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails
  2022-04-03 17:43     ` Ammar Faizi
@ 2022-04-03 17:45       ` Ammar Faizi
  2022-04-03 18:46         ` Borislav Petkov
  0 siblings, 1 reply; 14+ messages in thread
From: Ammar Faizi @ 2022-04-03 17:45 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov
  Cc: Alviro Iskandar Setiawan, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Tony Luck, Yazen Ghannam, Linux Edac Mailing List,
	Linux Kernel Mailing List, Stable Kernel, GNU/Weeb Mailing List,
	x86 Mailing List

On 4/4/22 12:43 AM, Ammar Faizi wrote:
> On 4/4/22 12:03 AM, Thomas Gleixner wrote:
>> On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote:
>>
>>> In mce_threshold_create_device(), if threshold_create_bank() fails, the
>>> @bp will be leaked, because the call to mce_threshold_remove_device()
>>> will not free the @bp. mce_threshold_remove_device() frees
>>> @threshold_banks. At that point, the @bp has not been written to
>>> @threshold_banks, @threshold_banks is NULL, so the call is just a nop.
>>>
>>> Fix this by extracting the cleanup part into a new static function
>>> __threshold_remove_device(), then call it from create/remove device
>>> functions.
>>
>> The way simpler fix is to move
>>
>>>       }
>>>       this_cpu_write(threshold_banks, bp);
>>
>> before the loop. That's safe because the banks cannot yet be reached via
>> an MCE as the vector is not yet enabled:
>>>       if (thresholding_irq_en)
>>>           mce_threshold_vector = amd_threshold_interrupt;
> Thomas,
> 
> I did like what you said (in the patch v4), but after Yazen and Borislav
> reviewed it, we got a conclusion that it's not safe.
> 
> See [1] and [2] for the full message.
> 
> [1]: https://lore.kernel.org/lkml/[email protected]/
> [2]: https://lore.kernel.org/lkml/Yh+oyD%2F5M3TW5ZMM@yaz-ubuntu/
> 
> Yazen, Borislav, please take a deeper look on this again. I will send
> a v7 revision to really make it simpler by moving that "per-CPU var write"
> before the loop.

(only if it's really safe)

-- 
Ammar Faizi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails
  2022-04-03 17:45       ` Ammar Faizi
@ 2022-04-03 18:46         ` Borislav Petkov
  0 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2022-04-03 18:46 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Thomas Gleixner, Alviro Iskandar Setiawan, Dave Hansen,
	H. Peter Anvin, Ingo Molnar, Tony Luck, Yazen Ghannam,
	Linux Edac Mailing List, Linux Kernel Mailing List, Stable Kernel,
	GNU/Weeb Mailing List, x86 Mailing List

On Mon, Apr 04, 2022 at 12:45:04AM +0700, Ammar Faizi wrote:
> > Yazen, Borislav, please take a deeper look on this again. I will send
> > a v7 revision to really make it simpler by moving that "per-CPU var write"
> > before the loop.
> 
> (only if it's really safe)

Don't bother - I've discussed it with tglx offlist. The current solution
remains.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tip: x86/misc] x86/delay: Fix the wrong asm constraint in delay_loop()
  2022-03-29 10:47 ` [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi
  2022-04-01 17:42   ` Dave Hansen
  2022-04-03 16:57   ` Thomas Gleixner
@ 2022-04-05 20:50   ` tip-bot2 for Ammar Faizi
  2 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Ammar Faizi @ 2022-04-05 20:50 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ammar Faizi, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/misc branch of tip:

Commit-ID:     b86eb74098a92afd789da02699b4b0dd3f73b889
Gitweb:        https://git.kernel.org/tip/b86eb74098a92afd789da02699b4b0dd3f73b889
Author:        Ammar Faizi <[email protected]>
AuthorDate:    Tue, 29 Mar 2022 17:47:04 +07:00
Committer:     Borislav Petkov <[email protected]>
CommitterDate: Tue, 05 Apr 2022 21:21:57 +02:00

x86/delay: Fix the wrong asm constraint in delay_loop()

The asm constraint does not reflect the fact that the asm statement can
modify the value of the local variable loops. Which it does.

Specifying the wrong constraint may lead to undefined behavior, it may
clobber random stuff (e.g. local variable, important temporary value in
regs, etc.). This is especially dangerous when the compiler decides to
inline the function and since it doesn't know that the value gets
modified, it might decide to use it from a register directly without
reloading it.

Change the constraint to "+a" to denote that the first argument is an
input and an output argument.

  [ bp: Fix typo, massage commit message. ]

Fixes: e01b70ef3eb3 ("x86: fix bug in arch/i386/lib/delay.c file, delay_loop function")
Signed-off-by: Ammar Faizi <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
 arch/x86/lib/delay.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 65d15df..0e65d00 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -54,8 +54,8 @@ static void delay_loop(u64 __loops)
 		"	jnz 2b		\n"
 		"3:	dec %0		\n"
 
-		: /* we don't need output */
-		:"a" (loops)
+		: "+a" (loops)
+		:
 	);
 }
 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [tip: ras/core] x86/MCE/AMD: Fix memory leak when threshold_create_bank() fails
  2022-03-29 10:47 ` [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi
  2022-04-03 17:03   ` Thomas Gleixner
@ 2022-04-05 20:56   ` tip-bot2 for Ammar Faizi
  1 sibling, 0 replies; 14+ messages in thread
From: tip-bot2 for Ammar Faizi @ 2022-04-05 20:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Alviro Iskandar Setiawan, Yazen Ghannam, Ammar Faizi,
	Borislav Petkov, stable, x86, linux-kernel

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     e5f28623ceb103e13fc3d7bd45edf9818b227fd0
Gitweb:        https://git.kernel.org/tip/e5f28623ceb103e13fc3d7bd45edf9818b227fd0
Author:        Ammar Faizi <[email protected]>
AuthorDate:    Tue, 29 Mar 2022 17:47:05 +07:00
Committer:     Borislav Petkov <[email protected]>
CommitterDate: Tue, 05 Apr 2022 21:24:37 +02:00

x86/MCE/AMD: Fix memory leak when threshold_create_bank() fails

In mce_threshold_create_device(), if threshold_create_bank() fails, the
previously allocated threshold banks array @bp will be leaked because
the call to mce_threshold_remove_device() will not free it.

This happens because mce_threshold_remove_device() fetches the pointer
through the threshold_banks per-CPU variable but bp is written there
only after the bank creation is successful, and not before, when
threshold_create_bank() fails.

Add a helper which unwinds all the bank creation work previously done
and pass into it the previously allocated threshold banks array for
freeing.

  [ bp: Massage. ]

Fixes: 6458de97fc15 ("x86/mce/amd: Straighten CPU hotplug path")
Co-developed-by: Alviro Iskandar Setiawan <[email protected]>
Signed-off-by: Alviro Iskandar Setiawan <[email protected]>
Co-developed-by: Yazen Ghannam <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
 arch/x86/kernel/cpu/mce/amd.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1940d30..1c87501 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1294,10 +1294,23 @@ out_free:
 	kfree(bank);
 }
 
+static void __threshold_remove_device(struct threshold_bank **bp)
+{
+	unsigned int bank, numbanks = this_cpu_read(mce_num_banks);
+
+	for (bank = 0; bank < numbanks; bank++) {
+		if (!bp[bank])
+			continue;
+
+		threshold_remove_bank(bp[bank]);
+		bp[bank] = NULL;
+	}
+	kfree(bp);
+}
+
 int mce_threshold_remove_device(unsigned int cpu)
 {
 	struct threshold_bank **bp = this_cpu_read(threshold_banks);
-	unsigned int bank, numbanks = this_cpu_read(mce_num_banks);
 
 	if (!bp)
 		return 0;
@@ -1308,13 +1321,7 @@ int mce_threshold_remove_device(unsigned int cpu)
 	 */
 	this_cpu_write(threshold_banks, NULL);
 
-	for (bank = 0; bank < numbanks; bank++) {
-		if (bp[bank]) {
-			threshold_remove_bank(bp[bank]);
-			bp[bank] = NULL;
-		}
-	}
-	kfree(bp);
+	__threshold_remove_device(bp);
 	return 0;
 }
 
@@ -1351,15 +1358,14 @@ int mce_threshold_create_device(unsigned int cpu)
 		if (!(this_cpu_read(bank_map) & (1 << bank)))
 			continue;
 		err = threshold_create_bank(bp, cpu, bank);
-		if (err)
-			goto out_err;
+		if (err) {
+			__threshold_remove_device(bp);
+			return err;
+		}
 	}
 	this_cpu_write(threshold_banks, bp);
 
 	if (thresholding_irq_en)
 		mce_threshold_vector = amd_threshold_interrupt;
 	return 0;
-out_err:
-	mce_threshold_remove_device(cpu);
-	return err;
 }

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-04-05 20:56 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-03-29 10:47 [PATCH v6 0/2] Two x86 fixes Ammar Faizi
2022-03-29 10:47 ` [PATCH v6 1/2] x86/delay: Fix the wrong asm constraint in `delay_loop()` Ammar Faizi
2022-04-01 17:42   ` Dave Hansen
2022-04-02  5:15     ` Ammar Faizi
2022-04-03 16:57   ` Thomas Gleixner
2022-04-03 17:11     ` Ammar Faizi
2022-04-03 17:14     ` Thomas Gleixner
2022-04-05 20:50   ` [tip: x86/misc] x86/delay: Fix the wrong asm constraint in delay_loop() tip-bot2 for Ammar Faizi
2022-03-29 10:47 ` [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails Ammar Faizi
2022-04-03 17:03   ` Thomas Gleixner
2022-04-03 17:43     ` Ammar Faizi
2022-04-03 17:45       ` Ammar Faizi
2022-04-03 18:46         ` Borislav Petkov
2022-04-05 20:56   ` [tip: ras/core] x86/MCE/AMD: Fix memory leak when threshold_create_bank() fails tip-bot2 for Ammar Faizi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox