public inbox for [email protected]
* [bug report] Kernel panic - not syncing: Fatal hardware error!
@ 2024-03-18  9:50 Changhui Zhong
From: Changhui Zhong @ 2024-03-18  9:50 UTC
  To: io-uring

Hello,

I found a kernel panic after adding io_uring parameters to the kernel
cmdline and then rebooting.
Please help check.

repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
branch: master
commit HEAD: f6cef5f8c37f58a3bc95b3754c3ae98e086631ca

grubby --args='io_uring.enable=y' --update-kernel=/boot/vmlinuz-6.8.0+
grubby --args='sysctl.kernel.io_uring_disabled=0' --update-kernel=/boot/vmlinuz-6.8.0+
reboot

dmesg log:
Rebooting.
[  320.186317] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
[  320.186321] {1}[Hardware Error]: event severity: fatal
[  320.186324] {1}[Hardware Error]:  Error 0, type: fatal
[  320.186325] {1}[Hardware Error]:   section_type: PCIe error
[  320.186326] {1}[Hardware Error]:   port_type: 0, PCIe end point
[  320.186327] {1}[Hardware Error]:   version: 3.0
[  320.186328] {1}[Hardware Error]:   command: 0x0002, status: 0x0010
[  320.186330] {1}[Hardware Error]:   device_id: 0000:01:00.1
[  320.186331] {1}[Hardware Error]:   slot: 0
[  320.186331] {1}[Hardware Error]:   secondary_bus: 0x00
[  320.186332] {1}[Hardware Error]:   vendor_id: 0x14e4, device_id: 0x165f
[  320.186333] {1}[Hardware Error]:   class_code: 020000
[  320.186334] {1}[Hardware Error]:   aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00010000
[  320.186335] {1}[Hardware Error]:   aer_uncor_severity: 0x000ef030
[  320.186336] {1}[Hardware Error]:   TLP Header: 40000001 0000030f 90028090 00000000
[  320.186339] Kernel panic - not syncing: Fatal hardware error!
[  320.186340] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0+ #1
[  320.186343] Hardware name: Dell Inc. PowerEdge R640/0X45NX, BIOS 2.19.1 06/04/2023
[  320.186343] Call Trace:
[  320.186345]  <NMI>
[  320.186347]  panic+0x32b/0x350
[  320.186355]  __ghes_panic+0x69/0x70
[  320.186360]  ghes_in_nmi_queue_one_entry.constprop.0+0x1d9/0x2b0
[  320.186364]  ghes_notify_nmi+0x59/0xd0
[  320.186367]  nmi_handle+0x5b/0x150
[  320.186373]  default_do_nmi+0x40/0x100
[  320.186379]  exc_nmi+0x100/0x180
[  320.186382]  end_repeat_nmi+0xf/0x53
[  320.186386] RIP: 0010:intel_idle+0x59/0xa0
[  320.186389] Code: d2 48 89 d1 65 48 8b 05 55 41 f3 77 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d ae 1f 43 00 b9 01 00 00 00 48 89 f0 0f 01 c9 <65> 48 8b 05 2f 41 f3 77 f0 80 60 02 df f0 83 44 24 fc 00 48 8b 00
[  320.186391] RSP: 0018:ffffffff88c03e48 EFLAGS: 00000046
[  320.186394] RAX: 0000000000000001 RBX: 0000000000000002 RCX: 0000000000000001
[  320.186395] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff972cafa3ffa0
[  320.186397] RBP: ffff972cafa3ffa0 R08: 0000000000000002 R09: 00000000fffffffd
[  320.186398] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff890bbee0
[  320.186399] R13: ffffffff890bbfc8 R14: 0000000000000002 R15: 0000000000000000
[  320.186401]  ? intel_idle+0x59/0xa0
[  320.186404]  ? intel_idle+0x59/0xa0
[  320.186407]  </NMI>
[  320.186407]  <TASK>
[  320.186408]  cpuidle_enter_state+0x7d/0x410
[  320.186411]  cpuidle_enter+0x29/0x40
[  320.186415]  cpuidle_idle_call+0xf8/0x160
[  320.186421]  do_idle+0x7a/0xe0
[  320.186423]  cpu_startup_entry+0x25/0x30
[  320.186426]  rest_init+0xcc/0xd0
[  320.186429]  start_kernel+0x325/0x400
[  320.186433]  x86_64_start_reservations+0x14/0x30
[  320.186437]  x86_64_start_kernel+0xed/0xf0
[  320.186440]  common_startup_64+0x13e/0x141
[  320.186445]  </TASK>
[  320.194588] Kernel Offset: 0x6400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
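
For readers decoding the AER registers in the log above, here is a small sketch (not part of the original report) that maps the aer_uncor_status, aer_uncor_mask and aer_uncor_severity values to the abbreviations lspci prints. The bit positions follow the PCIe AER Uncorrectable Error Status register layout; the names `UNCOR_BITS` and `decode_uncor` are only illustrative.

```python
# Bit positions in the PCIe AER Uncorrectable Error Status/Mask/Severity
# registers, labelled with the abbreviations lspci uses.
UNCOR_BITS = {
    4: "DLP",
    5: "SDES",
    12: "TLP",
    13: "FCP",
    14: "CmpltTO",
    15: "CmpltAbrt",
    16: "UnxCmplt",
    17: "RxOF",
    18: "MalfTLP",
    19: "ECRC",
    20: "UnsupReq",
    21: "ACSViol",
}


def decode_uncor(value):
    """Return the names of the bits set in an AER uncorrectable register."""
    return [name for bit, name in sorted(UNCOR_BITS.items()) if value & (1 << bit)]


# Values copied from the dmesg log above.
print("status:  ", decode_uncor(0x00100000))
print("mask:    ", decode_uncor(0x00010000))
print("severity:", decode_uncor(0x000EF030))
```

With the values from the log, the only status bit set is UnsupReq (Unsupported Request) and the only masked bit is UnxCmplt. UnsupReq is not among the bits marked fatal in the severity register, even though the firmware reported the event as fatal.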


# lspci -nn -s 01:00.1
01:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]

# lspci -vvv -s 01:00.1
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
        DeviceName: NIC4
        Subsystem: Broadcom Inc. and subsidiaries Device 4160
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin B routed to IRQ 17
        NUMA node: 0
        Region 0: Memory at 92900000 (64-bit, prefetchable) [size=64K]
        Region 2: Memory at 92910000 (64-bit, prefetchable) [size=64K]
        Region 4: Memory at 92920000 (64-bit, prefetchable) [size=64K]
        Expansion ROM at 90040000 [disabled] [size=256K]
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data
                Product Name: Broadcom NetXtreme Gigabit Ethernet
                Read-only fields:
                        [PN] Part number: BCM95720
                        [MN] Manufacture ID: 1028
                        [V0] Vendor specific: FFV22.61.8
                        [V1] Vendor specific: DSV1028VPDR.VER1.0
                        [V2] Vendor specific: NPY2
                        [V3] Vendor specific: PMT1
                        [V4] Vendor specific: NMVBroadcom Corp
                        [V5] Vendor specific: DTINIC
                        [V6] Vendor specific: DCM3001008d454101008d45
                        [RV] Reserved: checksum good, 233 byte(s) reserved
                End
        Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [a0] MSI-X: Enable+ Count=17 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00001000
        Capabilities: [ac] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
                DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr+ NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x2, ASPM not supported
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s (ok), Width x2 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 40000001 0000020f 90028090 00000000
        Capabilities: [13c v1] Device Serial Number 00-00-e4-3d-1a-3c-8b-bb
        Capabilities: [150 v1] Power Budgeting <?>
        Capabilities: [160 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Kernel driver in use: tg3
        Kernel modules: tg3
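
As a reading aid for the HeaderLog line in the lspci output above, the following sketch (not part of the original report) splits the first three header DWORDs into their fields. It assumes the header is a 3DW memory request with a 32-bit address, which is what the Fmt/Type bits of the first DWORD indicate here; the function name `decode_tlp_header` is hypothetical.

```python
def decode_tlp_header(dw0, dw1, dw2):
    """Split the first three DWORDs of a 3DW memory request TLP header."""
    return {
        "fmt": (dw0 >> 29) & 0x7,         # 0b010 = 3DW header, with data
        "type": (dw0 >> 24) & 0x1F,       # 0b00000 = memory request
        "length_dw": dw0 & 0x3FF,         # payload length in DWORDs
        "requester_id": (dw1 >> 16) & 0xFFFF,
        "tag": (dw1 >> 8) & 0xFF,
        "last_be": (dw1 >> 4) & 0xF,      # last DWORD byte enables
        "first_be": dw1 & 0xF,            # first DWORD byte enables
        "address": dw2 & 0xFFFFFFFC,      # low two bits are not address bits
    }


# HeaderLog value copied from the lspci output above.
hdr = decode_tlp_header(0x40000001, 0x0000020F, 0x90028090)
for field, value in hdr.items():
    print(f"{field}: {value:#x}")
```

For this HeaderLog the fields describe a one-DWORD memory write from requester ID 0x0000 to address 0x90028090. The TLP Header in the dmesg log differs only in the tag byte (0x03 rather than 0x02).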

Thanks,


* Re: [bug report] Kernel panic - not syncing: Fatal hardware error!
From: Jens Axboe @ 2024-03-19  2:20 UTC
  To: Changhui Zhong, io-uring

On 3/18/24 3:50 AM, Changhui Zhong wrote:
> Hello,
> 
> found a kernel panic issue after add io_uring parameters to kernel
> cmdline and then reboot,
> please help check,
> 
> repo:https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> branch: master
> commit HEAD:f6cef5f8c37f58a3bc95b3754c3ae98e086631ca
> 
> grubby --args='io_uring.enable=y' --update-kernel=/boot/vmlinuz-6.8.0+
> grubby --args='sysctl.kernel.io_uring_disabled=0'
> --update-kernel=/boot/vmlinuz-6.8.0+
> reboot

I'm pretty dubious about that; those parameters should have no bearing
or impact on this at all. Does the same sha boot just fine multiple
times without the added parameters?

-- 
Jens Axboe


