* [bug report] Kernel panic - not syncing: Fatal hardware error!
From: Changhui Zhong @ 2024-03-18 9:50 UTC
To: io-uring
Hello,
I hit a kernel panic after adding io_uring parameters to the kernel
command line and rebooting. Please help check.
repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
branch: master
commit HEAD: f6cef5f8c37f58a3bc95b3754c3ae98e086631ca
grubby --args='io_uring.enable=y' --update-kernel=/boot/vmlinuz-6.8.0+
grubby --args='sysctl.kernel.io_uring_disabled=0' --update-kernel=/boot/vmlinuz-6.8.0+
reboot
dmesg log:
Rebooting.
[ 320.186317] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
[ 320.186321] {1}[Hardware Error]: event severity: fatal
[ 320.186324] {1}[Hardware Error]: Error 0, type: fatal
[ 320.186325] {1}[Hardware Error]: section_type: PCIe error
[ 320.186326] {1}[Hardware Error]: port_type: 0, PCIe end point
[ 320.186327] {1}[Hardware Error]: version: 3.0
[ 320.186328] {1}[Hardware Error]: command: 0x0002, status: 0x0010
[ 320.186330] {1}[Hardware Error]: device_id: 0000:01:00.1
[ 320.186331] {1}[Hardware Error]: slot: 0
[ 320.186331] {1}[Hardware Error]: secondary_bus: 0x00
[ 320.186332] {1}[Hardware Error]: vendor_id: 0x14e4, device_id: 0x165f
[ 320.186333] {1}[Hardware Error]: class_code: 020000
[ 320.186333] {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00010000
[ 320.186335] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030
[ 320.186336] {1}[Hardware Error]: TLP Header: 40000001 0000030f 90028090 00000000
[ 320.186339] Kernel panic - not syncing: Fatal hardware error!
[ 320.186340] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0+ #1
[ 320.186343] Hardware name: Dell Inc. PowerEdge R640/0X45NX, BIOS 2.19.1 06/04/2023
[ 320.186343] Call Trace:
[ 320.186345] <NMI>
[ 320.186347] panic+0x32b/0x350
[ 320.186355] __ghes_panic+0x69/0x70
[ 320.186360] ghes_in_nmi_queue_one_entry.constprop.0+0x1d9/0x2b0
[ 320.186364] ghes_notify_nmi+0x59/0xd0
[ 320.186367] nmi_handle+0x5b/0x150
[ 320.186373] default_do_nmi+0x40/0x100
[ 320.186379] exc_nmi+0x100/0x180
[ 320.186382] end_repeat_nmi+0xf/0x53
[ 320.186386] RIP: 0010:intel_idle+0x59/0xa0
[ 320.186389] Code: d2 48 89 d1 65 48 8b 05 55 41 f3 77 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d ae 1f 43 00 b9 01 00 00 00 48 89 f0 0f 01 c9 <65> 48 8b 05 2f 41 f3 77 f0 80 60 02 df f0 83 44 24 fc 00 48 8b 00
[ 320.186391] RSP: 0018:ffffffff88c03e48 EFLAGS: 00000046
[ 320.186394] RAX: 0000000000000001 RBX: 0000000000000002 RCX: 0000000000000001
[ 320.186395] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff972cafa3ffa0
[ 320.186397] RBP: ffff972cafa3ffa0 R08: 0000000000000002 R09: 00000000fffffffd
[ 320.186398] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff890bbee0
[ 320.186399] R13: ffffffff890bbfc8 R14: 0000000000000002 R15: 0000000000000000
[ 320.186401] ? intel_idle+0x59/0xa0
[ 320.186404] ? intel_idle+0x59/0xa0
[ 320.186407] </NMI>
[ 320.186407] <TASK>
[ 320.186408] cpuidle_enter_state+0x7d/0x410
[ 320.186411] cpuidle_enter+0x29/0x40
[ 320.186415] cpuidle_idle_call+0xf8/0x160
[ 320.186421] do_idle+0x7a/0xe0
[ 320.186423] cpu_startup_entry+0x25/0x30
[ 320.186426] rest_init+0xcc/0xd0
[ 320.186429] start_kernel+0x325/0x400
[ 320.186433] x86_64_start_reservations+0x14/0x30
[ 320.186437] x86_64_start_kernel+0xed/0xf0
[ 320.186440] common_startup_64+0x13e/0x141
[ 320.186445] </TASK>
[ 320.194588] Kernel Offset: 0x6400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
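The three AER register values in the log above can be decoded by hand against the bit layout of the PCIe AER Uncorrectable Error Status, Mask and Severity registers. What follows is a minimal Python sketch, not part of the original report. The names AER_UNCOR_BITS and decode_aer_uncor are made up for illustration, and the labels are the abbreviations that lspci -vvv prints for the same bits.

```python
# Bit positions shared by the PCIe AER Uncorrectable Error Status, Mask and
# Severity registers. Names are the abbreviations lspci -vvv prints.
# AER_UNCOR_BITS and decode_aer_uncor are illustrative names, not a kernel API.
AER_UNCOR_BITS = {
    4: "DLP",         # Data Link Protocol Error
    5: "SDES",        # Surprise Down Error
    12: "TLP",        # Poisoned TLP
    13: "FCP",        # Flow Control Protocol Error
    14: "CmpltTO",    # Completion Timeout
    15: "CmpltAbrt",  # Completer Abort
    16: "UnxCmplt",   # Unexpected Completion
    17: "RxOF",       # Receiver Overflow
    18: "MalfTLP",    # Malformed TLP
    19: "ECRC",       # ECRC Error
    20: "UnsupReq",   # Unsupported Request
    21: "ACSViol",    # ACS Violation
}


def decode_aer_uncor(value):
    """Return the names of the set bits, lowest bit first.

    Bits that are not in the table are reported as 'bitN'.
    """
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(AER_UNCOR_BITS.get(bit, f"bit{bit}"))
    return names


# Values copied from the dmesg log above.
for label, value in [
    ("aer_uncor_status", 0x00100000),
    ("aer_uncor_mask", 0x00010000),
    ("aer_uncor_severity", 0x000EF030),
]:
    print(f"{label}: {' '.join(decode_aer_uncor(value))}")
```

For the values in this log the sketch reports UnsupReq for the status, UnxCmplt for the mask, and DLP SDES TLP FCP CmpltTO CmpltAbrt RxOF MalfTLP ECRC for the severity.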
# lspci -nn -s 01:00.1
01:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]
# lspci -vvv -s 01:00.1
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
DeviceName: NIC4
Subsystem: Broadcom Inc. and subsidiaries Device 4160
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 17
NUMA node: 0
Region 0: Memory at 92900000 (64-bit, prefetchable) [size=64K]
Region 2: Memory at 92910000 (64-bit, prefetchable) [size=64K]
Region 4: Memory at 92920000 (64-bit, prefetchable) [size=64K]
Expansion ROM at 90040000 [disabled] [size=256K]
Capabilities: [48] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data
Product Name: Broadcom NetXtreme Gigabit Ethernet
Read-only fields:
[PN] Part number: BCM95720
[MN] Manufacture ID: 1028
[V0] Vendor specific: FFV22.61.8
[V1] Vendor specific: DSV1028VPDR.VER1.0
[V2] Vendor specific: NPY2
[V3] Vendor specific: PMT1
[V4] Vendor specific: NMVBroadcom Corp
[V5] Vendor specific: DTINIC
[V6] Vendor specific: DCM3001008d454101008d45
[RV] Reserved: checksum good, 233 byte(s) reserved
End
Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [a0] MSI-X: Enable+ Count=17 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00001000
Capabilities: [ac] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd- ExtTag- PhantFunc- AuxPwr+ NoSnoop- FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x2, ASPM not supported
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (ok), Width x2 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr- BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 40000001 0000020f 90028090 00000000
Capabilities: [13c v1] Device Serial Number 00-00-e4-3d-1a-3c-8b-bb
Capabilities: [150 v1] Power Budgeting <?>
Capabilities: [160 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Kernel driver in use: tg3
Kernel modules: tg3
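The logged TLP header can also be split into its fields. The sketch below is an illustration rather than anything from the original report. It assumes the four words are printed in header order (DW0 first) and that the header uses the 3DW memory-request layout with a 32-bit address. The function name decode_mem_request_header is hypothetical.

```python
def decode_mem_request_header(dw0, dw1, dw2):
    """Decode a 3DW PCIe memory-request TLP header (32-bit address).

    Assumes dw0..dw2 are the first three header DWORDs in log order.
    """
    fmt = (dw0 >> 29) & 0x7        # 0b000 = 3DW no data, 0b010 = 3DW with data
    tlp_type = (dw0 >> 24) & 0x1F  # 0b00000 = memory request
    length_dw = dw0 & 0x3FF        # payload length in DWORDs
    if length_dw == 0:
        length_dw = 1024           # an all-zero length field encodes 1024 DW

    if tlp_type == 0 and fmt == 0b010:
        kind = "MWr"               # memory write
    elif tlp_type == 0 and fmt == 0b000:
        kind = "MRd"               # memory read
    else:
        kind = "other"             # layout assumption does not hold

    requester = (dw1 >> 16) & 0xFFFF   # bus[15:8] device[7:3] function[2:0]
    return {
        "kind": kind,
        "length_dw": length_dw,
        "requester": f"{requester >> 8:02x}:{(requester >> 3) & 0x1F:02x}.{requester & 0x7}",
        "tag": (dw1 >> 8) & 0xFF,
        "last_be": (dw1 >> 4) & 0xF,   # byte enables of the last DWORD
        "first_be": dw1 & 0xF,         # byte enables of the first DWORD
        "address": dw2 & 0xFFFFFFFC,   # low two bits are not address bits
    }


# "TLP Header" words copied from the dmesg log.
words = [int(w, 16) for w in "40000001 0000030f 90028090 00000000".split()]
fields = decode_mem_request_header(*words[:3])
for name, value in fields.items():
    shown = hex(value) if name == "address" else value
    print(f"{name}: {shown}")
```

Under those assumptions the dmesg header decodes as a one-DWORD memory write from requester 00:00.0 with tag 3 to address 0x90028090. The HeaderLog words in the lspci output decode the same way except for tag 2. That address lies outside the three memory regions lspci lists for 01:00.1, which is only an observation and not a diagnosis.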
Thanks,
* Re: [bug report] Kernel panic - not syncing: Fatal hardware error!
From: Jens Axboe @ 2024-03-19 2:20 UTC
To: Changhui Zhong, io-uring
On 3/18/24 3:50 AM, Changhui Zhong wrote:
> Hello,
>
> found a kernel panic issue after add io_uring parameters to kernel
> cmdline and then reboot,
> please help check,
>
> repo:https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> branch: master
> commit HEAD:f6cef5f8c37f58a3bc95b3754c3ae98e086631ca
>
> grubby --args='io_uring.enable=y' --update-kernel=/boot/vmlinuz-6.8.0+
> grubby --args='sysctl.kernel.io_uring_disabled=0' --update-kernel=/boot/vmlinuz-6.8.0+
> reboot
I'm pretty dubious about that; those parameters should have no bearing
on this panic at all. Does the same sha boot just fine multiple times
without the added parameters?
--
Jens Axboe
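When comparing boots with and without the added parameters, it helps to confirm what the running kernel was actually booted with. The following Python sketch is an illustration, not something posted in the thread. The helper name io_uring_args is hypothetical, and it does a plain whitespace split, so it does not handle quoted arguments that contain spaces.

```python
import os


def io_uring_args(cmdline):
    """Return the kernel command-line tokens that mention io_uring."""
    return [tok for tok in cmdline.split() if "io_uring" in tok]


# /proc/cmdline holds the command line of the running kernel on Linux.
if os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as f:
        found = io_uring_args(f.read())
    print("io_uring-related boot arguments:", found or "none")
```

An empty result on the comparison boot shows that neither of the two grubby-added arguments is in effect.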