* Re: [GIT PULL v2] PCI changes for v6.17 [not found] ` <175408424863.4088284.13236765550439476565.pr-tracker-bot@kernel.org> @ 2025-08-07 3:34 ` Ammar Faizi 2025-08-07 3:51 ` Lukas Wunner 0 siblings, 1 reply; 18+ messages in thread From: Ammar Faizi @ 2025-08-07 3:34 UTC (permalink / raw) To: Bjorn Helgaas Cc: Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczyński, Armando Budianto, Alviro Iskandar Setiawan, gwml On Fri, 01 Aug 2025 21:37:28 +0000, pr-tracker-bot@kernel.org, wrote: > > The pull request you sent on Fri, 1 Aug 2025 09:22:54 -0500: > > > git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git tags/pci-v6.17-changes > > has been merged into torvalds/linux.git: > https://git.kernel.org/torvalds/c/0bd0a41a5120f78685a132834865b0a631b9026a Yesterday, I synced with Linus' tree, but couldn't boot. Crashed with this call trace: https://gist.githubusercontent.com/ammarfaizi2/3ba41f13517be4bae70cde869347d259/raw/0ac09b3e1d90d51c3fed14ca9f837f45d7730f0a/crash.jpg This morning, I synced with Linus' tree again, still the same result. I suspect it's related to pci. I'm still bisecting. I've successfully narrowed it down to this pci pull. 0bd0a41a5120 (refs/bisect/bad) Merge tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci db5f0c3e3e60 ring-buffer: Convert ring_buffer_write() to use guard(preempt_notrace) 12d518961586 tracing: Use __free(kfree) in trace.c to remove gotos debe57fbe12c tracing: Add guard() around locks and mutexes in trace.c 788fa4b47cdc tracing: Add guard(ring_buffer_nest) c89504a703fb tracing: Remove unneeded goto out logic 877d94c74e4c (refs/bisect/good-877d94c74e4c6665d2af55c0154363b43b947e60) Merge tag 'linux-watchdog-6.17-rc1' of git://www.linux-watchdog.org/linux-watchd Now, I am testing: 769ce531faa6 (HEAD) Merge branch 'pci/controller/msi-parent' I'll be back to it later. git bisect says: Bisecting: 65 revisions left to test after this (roughly 6 steps) -- Ammar Faizi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-07 3:34 ` [GIT PULL v2] PCI changes for v6.17 Ammar Faizi @ 2025-08-07 3:51 ` Lukas Wunner 2025-08-07 4:44 ` Nam Cao 2025-08-07 4:54 ` Ammar Faizi 0 siblings, 2 replies; 18+ messages in thread From: Lukas Wunner @ 2025-08-07 3:51 UTC (permalink / raw) To: Ammar Faizi Cc: Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml, Nam Cao On Thu, Aug 07, 2025 at 10:34:08AM +0700, Ammar Faizi wrote: > On Fri, 01 Aug 2025 21:37:28 +0000, pr-tracker-bot@kernel.org, wrote: > > > > The pull request you sent on Fri, 1 Aug 2025 09:22:54 -0500: > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git tags/pci-v6.17-changes > > > > has been merged into torvalds/linux.git: > > https://git.kernel.org/torvalds/c/0bd0a41a5120f78685a132834865b0a631b9026a > > Yesterday, I synced with Linus' tree, but couldn't boot. Crashed with > this call trace: > > https://gist.githubusercontent.com/ammarfaizi2/3ba41f13517be4bae70cde869347d259/raw/0ac09b3e1d90d51c3fed14ca9f837f45d7730f0a/crash.jpg > > This morning, I synced with Linus' tree again, still the same result. > > I suspect it's related to pci. I'm still bisecting. I've successfully > narrowed it down to this pci pull. > > 0bd0a41a5120 (refs/bisect/bad) Merge tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci > db5f0c3e3e60 ring-buffer: Convert ring_buffer_write() to use guard(preempt_notrace) > 12d518961586 tracing: Use __free(kfree) in trace.c to remove gotos > debe57fbe12c tracing: Add guard() around locks and mutexes in trace.c > 788fa4b47cdc tracing: Add guard(ring_buffer_nest) > c89504a703fb tracing: Remove unneeded goto out logic > 877d94c74e4c (refs/bisect/good-877d94c74e4c6665d2af55c0154363b43b947e60) Merge tag 'linux-watchdog-6.17-rc1' of git://www.linux-watchdog.org/linux-watchd > > Now, I am testing: > > 769ce531faa6 (HEAD) Merge branch 'pci/controller/msi-parent' > > I'll be back to it later. git bisect says: > > Bisecting: 65 revisions left to test after this (roughly 6 steps) Kenneth reports early-stage reboots caused by d7d8ab87e3e ("PCI: vmd: Switch to msi_create_parent_irq_domain()"): https://lore.kernel.org/all/dfa40e48-8840-4e61-9fda-25cdb3ad81c1@panix.com/ Perhaps you're witnessing the same issue? Thanks, Lukas ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-07 3:51 ` Lukas Wunner @ 2025-08-07 4:44 ` Nam Cao 2025-08-07 4:54 ` Ammar Faizi 1 sibling, 0 replies; 18+ messages in thread From: Nam Cao @ 2025-08-07 4:44 UTC (permalink / raw) To: Lukas Wunner, Ammar Faizi Cc: Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml Lukas Wunner <lukas@wunner.de> writes: > On Thu, Aug 07, 2025 at 10:34:08AM +0700, Ammar Faizi wrote: >> On Fri, 01 Aug 2025 21:37:28 +0000, pr-tracker-bot@kernel.org, wrote: >> > >> > The pull request you sent on Fri, 1 Aug 2025 09:22:54 -0500: >> > >> > > git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git tags/pci-v6.17-changes >> > >> > has been merged into torvalds/linux.git: >> > https://git.kernel.org/torvalds/c/0bd0a41a5120f78685a132834865b0a631b9026a >> >> Yesterday, I synced with Linus' tree, but couldn't boot. Crashed with >> this call trace: >> >> https://gist.githubusercontent.com/ammarfaizi2/3ba41f13517be4bae70cde869347d259/raw/0ac09b3e1d90d51c3fed14ca9f837f45d7730f0a/crash.jpg >> >> This morning, I synced with Linus' tree again, still the same result. >> >> I suspect it's related to pci. I'm still bisecting. I've successfully >> narrowed it down to this pci pull. >> >> 0bd0a41a5120 (refs/bisect/bad) Merge tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci >> db5f0c3e3e60 ring-buffer: Convert ring_buffer_write() to use guard(preempt_notrace) >> 12d518961586 tracing: Use __free(kfree) in trace.c to remove gotos >> debe57fbe12c tracing: Add guard() around locks and mutexes in trace.c >> 788fa4b47cdc tracing: Add guard(ring_buffer_nest) >> c89504a703fb tracing: Remove unneeded goto out logic >> 877d94c74e4c (refs/bisect/good-877d94c74e4c6665d2af55c0154363b43b947e60) Merge tag 'linux-watchdog-6.17-rc1' of git://www.linux-watchdog.org/linux-watchd >> >> Now, I am testing: >> >> 769ce531faa6 (HEAD) Merge branch 'pci/controller/msi-parent' >> >> I'll be back to it later. git bisect says: >> >> Bisecting: 65 revisions left to test after this (roughly 6 steps) > > Kenneth reports early-stage reboots caused by d7d8ab87e3e > ("PCI: vmd: Switch to msi_create_parent_irq_domain()"): > > https://lore.kernel.org/all/dfa40e48-8840-4e61-9fda-25cdb3ad81c1@panix.com/ > > Perhaps you're witnessing the same issue? Thanks for the Cc. The backtrace does look like something that the commit would cause. Let me stare at it. Nam ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-07 3:51 ` Lukas Wunner 2025-08-07 4:44 ` Nam Cao @ 2025-08-07 4:54 ` Ammar Faizi 2025-08-07 5:03 ` Nam Cao 1 sibling, 1 reply; 18+ messages in thread From: Ammar Faizi @ 2025-08-07 4:54 UTC (permalink / raw) To: Lukas Wunner Cc: Nam Cao, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml On Thu, Aug 07, 2025 at 05:51:57AM +0200, Lukas Wunner wrote: > Kenneth reports early-stage reboots caused by d7d8ab87e3e > ("PCI: vmd: Switch to msi_create_parent_irq_domain()"): > > https://lore.kernel.org/all/dfa40e48-8840-4e61-9fda-25cdb3ad81c1@panix.com/ > > Perhaps you're witnessing the same issue? Confirmed, reverting that commit works on my machine. I'll try to further diagnose it and report more details. -- Ammar Faizi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-07 4:54 ` Ammar Faizi @ 2025-08-07 5:03 ` Nam Cao 2025-08-07 5:13 ` Ammar Faizi 0 siblings, 1 reply; 18+ messages in thread From: Nam Cao @ 2025-08-07 5:03 UTC (permalink / raw) To: Ammar Faizi Cc: Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml On Thu, Aug 07, 2025 at 11:54:12AM +0700, Ammar Faizi wrote: > On Thu, Aug 07, 2025 at 05:51:57AM +0200, Lukas Wunner wrote: > > Kenneth reports early-stage reboots caused by d7d8ab87e3e > > ("PCI: vmd: Switch to msi_create_parent_irq_domain()"): > > > > https://lore.kernel.org/all/dfa40e48-8840-4e61-9fda-25cdb3ad81c1@panix.com/ > > > > Perhaps you're witnessing the same issue? > > Confirmed, reverting that commit works on my machine. I'll try to > further diagnose it and report more details. Does the diff below help? diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index 9bbb0ff4cc15..b679c7f28f51 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -280,10 +280,12 @@ static int vmd_msi_alloc(struct irq_domain *domain, unsigned int virq, static void vmd_msi_free(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs) { + struct irq_data *irq_data; struct vmd_irq *vmdirq; for (int i = 0; i < nr_irqs; ++i) { - vmdirq = irq_get_chip_data(virq + i); + irq_data = irq_domain_get_irq_data(domain, virq + i); + vmdirq = irq_data->chip_data; synchronize_srcu(&vmdirq->irq->srcu); ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-07 5:03 ` Nam Cao @ 2025-08-07 5:13 ` Ammar Faizi 2025-08-08 10:59 ` Ammar Faizi 0 siblings, 1 reply; 18+ messages in thread From: Ammar Faizi @ 2025-08-07 5:13 UTC (permalink / raw) To: Nam Cao Cc: Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml On Thu, Aug 07, 2025 at 07:03:50AM +0200, Nam Cao wrote: > Does the diff below help? > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > index 9bbb0ff4cc15..b679c7f28f51 100644 > --- a/drivers/pci/controller/vmd.c > +++ b/drivers/pci/controller/vmd.c > @@ -280,10 +280,12 @@ static int vmd_msi_alloc(struct irq_domain *domain, unsigned int virq, > static void vmd_msi_free(struct irq_domain *domain, unsigned int virq, > unsigned int nr_irqs) > { > + struct irq_data *irq_data; > struct vmd_irq *vmdirq; > > for (int i = 0; i < nr_irqs; ++i) { > - vmdirq = irq_get_chip_data(virq + i); > + irq_data = irq_domain_get_irq_data(domain, virq + i); > + vmdirq = irq_data->chip_data; > > synchronize_srcu(&vmdirq->irq->srcu); > Yes, it works. Tested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Thank you for the fix! -- Ammar Faizi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-07 5:13 ` Ammar Faizi @ 2025-08-08 10:59 ` Ammar Faizi 2025-08-08 11:19 ` Ammar Faizi 0 siblings, 1 reply; 18+ messages in thread From: Ammar Faizi @ 2025-08-08 10:59 UTC (permalink / raw) To: Nam Cao Cc: Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml On Thu, Aug 07, 2025 at 12:13:37PM +0700, Ammar Faizi wrote: > On Thu, Aug 07, 2025 at 07:03:50AM +0200, Nam Cao wrote: > > Does the diff below help? > > Yes, it works. So today, I synced with Linus' master branch again: 37816488247d ("Merge tag 'net-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") and applied your fix on top of it. I can boot, but I get this splat. Looking at the call trace, it seems it's still related to pci, but different issue. The call trace is also different from the previous one. Let me know if you have something for me to test. [ 1.017767] pci 10000:e0:1d.0: bridge window [mem 0x82000000-0x820fffff]: assigned [ 1.018519] pci 10000:e0:1d.0: bridge window [io size 0x1000]: can't assign; no space [ 1.019268] pci 10000:e0:1d.0: bridge window [io size 0x1000]: failed to assign [ 1.020026] pci 10000:e0:1d.0: bridge window [io size 0x1000]: can't assign; no space [ 1.020789] pci 10000:e0:1d.0: bridge window [io size 0x1000]: failed to assign [ 1.021539] pci 10000:e1:00.0: BAR 0 [mem 0x82000000-0x82003fff 64bit]: assigned [ 1.022317] pci 10000:e0:1d.0: PCI bridge to [bus e1] [ 1.023091] pci 10000:e0:1d.0: bridge window [mem 0x82000000-0x820fffff] [ 1.023885] pci 10000:e1:00.0: VMD: Default LTR value set by driver [ 1.024654] pci 10000:e1:00.0: can't override BIOS ASPM; OS doesn't have ASPM control [ 1.025442] pcieport 10000:e0:1d.0: can't derive routing for PCI INT A [ 1.026245] pcieport 10000:e0:1d.0: PCI INT A: no GSI [ 1.027058] ------------[ cut here ]------------ [ 1.027849] WARNING: CPU: 0 PID: 166 at drivers/pci/controller/vmd.c:309 vmd_init_dev_msi_info+0x36/0x40 [vmd] [ 1.028649] Modules linked in: hid_generic i2c_hid_acpi i2c_hid drm intel_lpss_pci i2c_i801 i2c_mux intel_lpss idma64 i2c_smbus vmd(+) hid video wmi pinctrl_tigerlake [ 1.029467] CPU: 0 UID: 0 PID: 166 Comm: systemd-udevd Not tainted 6.16.0-afh-home-2025-08-08-g6026508bdb9d #10 PREEMPT(full) fe08b908bb15b9ded6f7769c45f204194ebf7eef [ 1.030301] Hardware name: HP HP Laptop 14s-dq2xxx/87FD, BIOS F.21 03/21/2022 [ 1.031122] RIP: 0010:vmd_init_dev_msi_info+0x36/0x40 [vmd] [ 1.031953] Code: 48 89 cb e8 7c 49 4f e1 84 c0 74 1a 48 8b 53 20 48 c7 42 18 70 18 40 a0 48 8b 53 20 48 c7 42 20 e0 17 40 a0 5b c3 31 c0 5b c3 <0f> 0b 31 c0 c3 0f 1f 44 00 00 0f 1f 44 00 00 41 56 41 55 41 54 49 [ 1.032798] RSP: 0018:ffff888105a47860 EFLAGS: 00010297 [ 1.033642] RAX: ffff8881014d5d98 RBX: ffff8881014d5c00 RCX: ffff8881014d5d98 [ 1.034506] RDX: ffff888120a49400 RSI: ffff888120a49400 RDI: ffff88810132b0c8 [ 1.035359] RBP: 0000000000000001 R08: ffff888100d8de3a R09: 00000000ffffffff [ 1.036206] R10: 0000000000000004 R11: 0000000000000005 R12: ffff8881019b7e40 [ 1.036561] usb 1-2: new full-speed USB device number 3 using xhci_hcd [ 1.037058] R13: ffffffffa020c680 R14: ffff88810132b0c8 R15: ffff888120a49400 [ 1.038724] FS: 00007f25ede3a8c0(0000) GS:ffff8890f1a2d000(0000) knlGS:0000000000000000 [ 1.039559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.040400] CR2: 00007f25ee475ea0 CR3: 000000011dfd1003 CR4: 0000000000770ef0 [ 1.041265] PKRU: 55555554 [ 1.042121] Call Trace: [ 1.042971] <TASK> [ 1.043803] msi_create_device_irq_domain+0x1eb/0x290 [ 1.044645] __pci_enable_msi_range+0x106/0x300 [ 1.045488] pci_alloc_irq_vectors_affinity+0xc5/0x110 [ 1.046336] pcie_portdrv_probe+0x24e/0x610 [ 1.047177] ? kernfs_activate+0x48/0x60 [ 1.048015] local_pci_probe+0x3c/0x80 [ 1.048854] pci_device_probe+0xbc/0x1b0 [ 1.049693] really_probe+0xcd/0x380 [ 1.050529] ? driver_probe_device+0x90/0x90 [ 1.051353] __driver_probe_device+0x78/0x150 [ 1.052182] driver_probe_device+0x1f/0x90 [ 1.053000] __device_attach_driver+0x76/0xf0 [ 1.053812] bus_for_each_drv+0x69/0xa0 [ 1.054633] __device_attach+0xaa/0x1a0 [ 1.055448] pci_bus_add_device+0x4c/0x70 [ 1.056260] pci_bus_add_devices+0x2c/0x70 [ 1.057063] vmd_probe+0x81e/0xa20 [vmd 65bddf00234a3cddd21388091a077f038c9af2be] [ 1.057881] local_pci_probe+0x3c/0x80 [ 1.058682] pci_device_probe+0xbc/0x1b0 [ 1.059489] really_probe+0xcd/0x380 [ 1.060303] ? __device_attach_driver+0xf0/0xf0 [ 1.061112] __driver_probe_device+0x78/0x150 [ 1.061908] driver_probe_device+0x1f/0x90 [ 1.062711] __driver_attach+0xbf/0x1b0 [ 1.063500] bus_for_each_dev+0x64/0xa0 [ 1.064303] bus_add_driver+0x10a/0x230 [ 1.065100] driver_register+0x55/0xf0 [ 1.065892] ? vmd_drv_exit+0x9a0/0x9a0 [vmd 65bddf00234a3cddd21388091a077f038c9af2be] [ 1.066673] do_one_initcall+0x31/0x1e0 [ 1.067460] do_init_module+0x60/0x260 [ 1.068264] init_module_from_file+0x74/0x90 [ 1.069068] idempotent_init_module+0xed/0x2c0 [ 1.069853] __x64_sys_finit_module+0x65/0xd0 [ 1.070630] do_syscall_64+0x56/0x260 [ 1.071394] entry_SYSCALL_64_after_hwframe+0x29/0x31 [ 1.072152] RIP: 0033:0x7f25ee53288d [ 1.072918] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48 [ 1.073698] RSP: 002b:00007fff09cbd5b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 1.074477] RAX: ffffffffffffffda RBX: 000055d9edf36030 RCX: 00007f25ee53288d [ 1.075254] RDX: 0000000000000000 RSI: 00007f25ee6cc441 RDI: 0000000000000005 [ 1.076042] RBP: 0000000000020000 R08: 0000000000000000 R09: 00007fff09cbd6f0 [ 1.076827] R10: 0000000000000005 R11: 0000000000000246 R12: 00007f25ee6cc441 [ 1.077609] R13: 000055d9edf323b0 R14: 000055d9edf36120 R15: 000055d9edf35850 [ 1.078389] </TASK> [ 1.079166] ---[ end trace 0000000000000000 ]--- -- Ammar Faizi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-08 10:59 ` Ammar Faizi @ 2025-08-08 11:19 ` Ammar Faizi 2025-08-08 13:34 ` Ammar Faizi 2025-08-08 18:07 ` Nam Cao 0 siblings, 2 replies; 18+ messages in thread From: Ammar Faizi @ 2025-08-08 11:19 UTC (permalink / raw) To: Nam Cao Cc: Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml On Fri, Aug 08, 2025 at 05:59:23PM +0700, Ammar Faizi wrote: > On Thu, Aug 07, 2025 at 12:13:37PM +0700, Ammar Faizi wrote: > > On Thu, Aug 07, 2025 at 07:03:50AM +0200, Nam Cao wrote: > > > Does the diff below help? > > > > Yes, it works. > > So today, I synced with Linus' master branch again: > > 37816488247d ("Merge tag 'net-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") > > and applied your fix on top of it. > > I can boot, but I get this splat. Looking at the call trace, it seems > it's still related to pci, but different issue. The call trace is also > different from the previous one. It'll be a bit tricky to bisect this one. Because if I step back to a bad commit post: d7d8ab87e3e ("PCI: vmd: Switch to msi_create_parent_irq_domain()") I won't be able to boot to reach this new splat :/ I guess I need to apply the fix dirty for each bisection step. But I'll also need to make sure the current step has the d7d8ab87e3e commit anchestor before applying. -- Ammar Faizi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-08 11:19 ` Ammar Faizi @ 2025-08-08 13:34 ` Ammar Faizi 2025-08-08 18:07 ` Nam Cao 1 sibling, 0 replies; 18+ messages in thread From: Ammar Faizi @ 2025-08-08 13:34 UTC (permalink / raw) To: Nam Cao Cc: Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml On Fri, Aug 08, 2025 at 06:19:18PM +0700, Ammar Faizi wrote: > It'll be a bit tricky to bisect this one. Because if I step back to > a bad commit post: > > d7d8ab87e3e ("PCI: vmd: Switch to msi_create_parent_irq_domain()") > > I won't be able to boot to reach this new splat :/ > > I guess I need to apply the fix dirty for each bisection step. But I'll > also need to make sure the current step has the d7d8ab87e3e commit > anchestor before applying. Unfortunately, it turns out I don't have time to bisect it at the moment. But if you have something for me to test, I'll give it a try. I can continue bisecting it later, maybe after rc1 is out if the fix is still pending. For now, I revert the PCI merge entirely in my local tree and it resolves the issue. 0bd0a41a5120 ("Merge tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pc") git revert -m 1 -s 0bd0a41a5120; This is an interesting case for me because it is my first time reverting a merge commit :-) I'll probably need to revert that revert-commit later when the PCI fix actually arrives in mainline before syncing. Happy weekend! -- Ammar Faizi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-08 11:19 ` Ammar Faizi 2025-08-08 13:34 ` Ammar Faizi @ 2025-08-08 18:07 ` Nam Cao 2025-08-08 22:52 ` Ammar Faizi 1 sibling, 1 reply; 18+ messages in thread From: Nam Cao @ 2025-08-08 18:07 UTC (permalink / raw) To: Ammar Faizi Cc: Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml Ammar Faizi <ammarfaizi2@gnuweeb.org> writes: > On Fri, Aug 08, 2025 at 05:59:23PM +0700, Ammar Faizi wrote: >> On Thu, Aug 07, 2025 at 12:13:37PM +0700, Ammar Faizi wrote: >> > On Thu, Aug 07, 2025 at 07:03:50AM +0200, Nam Cao wrote: >> > > Does the diff below help? >> > >> > Yes, it works. >> >> So today, I synced with Linus' master branch again: >> >> 37816488247d ("Merge tag 'net-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") >> >> and applied your fix on top of it. >> >> I can boot, but I get this splat. Looking at the call trace, it seems >> it's still related to pci, but different issue. The call trace is also >> different from the previous one. > > It'll be a bit tricky to bisect this one. Because if I step back to > a bad commit post: > > d7d8ab87e3e ("PCI: vmd: Switch to msi_create_parent_irq_domain()") > > I won't be able to boot to reach this new splat :/ > > I guess I need to apply the fix dirty for each bisection step. But I'll > also need to make sure the current step has the d7d8ab87e3e commit > anchestor before applying. There is no point in bisecting before that commit, because the WARN_ON() is added by that commit, so you wouldn't see anything before that. The WARN_ONCE() tells us that some devices down the PCI tree are allocating MSI, but VMD supports MSI-X only. From the backtrace: msi_create_device_irq_domain+0x1eb/0x290 __pci_enable_msi_range+0x106/0x300 pci_alloc_irq_vectors_affinity+0xc5/0x110 pcie_portdrv_probe+0x24e/0x610 It seems MSI-X are allocated first, but fail for some reason. Then fallback to MSI, which triggers the WARN_ON(). So we need to figure out why MSI-X allocation fail. I may need to ask you to insert a bunch of printk() to help me pinpoint the problem. But let me stare at it first.. Nam ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-08 18:07 ` Nam Cao @ 2025-08-08 22:52 ` Ammar Faizi 2025-08-09 4:34 ` Nam Cao 0 siblings, 1 reply; 18+ messages in thread From: Ammar Faizi @ 2025-08-08 22:52 UTC (permalink / raw) To: Nam Cao Cc: Thomas Gleixner, Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml + Adding tglx as he's involved in reviewing that commit. Context: https://lore.kernel.org/linux-pci/aJXYhfc%2F6DfcqfqF@linux.gnuweeb.org/ On Fri, Aug 08, 2025 at 08:07:03PM +0200, Nam Cao wrote: > There is no point in bisecting before that commit, because the WARN_ON() > is added by that commit, so you wouldn't see anything before that. I didn't notice that. Good you pointed that out earlier. > The WARN_ONCE() tells us that some devices down the PCI tree are > allocating MSI, but VMD supports MSI-X only. > > From the backtrace: > msi_create_device_irq_domain+0x1eb/0x290 > __pci_enable_msi_range+0x106/0x300 > pci_alloc_irq_vectors_affinity+0xc5/0x110 > pcie_portdrv_probe+0x24e/0x610 > > It seems MSI-X are allocated first, but fail for some reason. Then > fallback to MSI, which triggers the WARN_ON(). > > So we need to figure out why MSI-X allocation fail. > > I may need to ask you to insert a bunch of printk() to help me pinpoint > the problem. But let me stare at it first.. I can do that. Send me a git diff. I'll test it and back with the dmesg output. -- Ammar Faizi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-08 22:52 ` Ammar Faizi @ 2025-08-09 4:34 ` Nam Cao 2025-08-09 13:28 ` Ammar Faizi 0 siblings, 1 reply; 18+ messages in thread From: Nam Cao @ 2025-08-09 4:34 UTC (permalink / raw) To: Ammar Faizi Cc: Thomas Gleixner, Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml, namcaov On Sat, Aug 09, 2025 at 07:52:30AM +0900, Ammar Faizi wrote: > I can do that. Send me a git diff. I'll test it and back with the dmesg > output. That would be very helpful, thanks! Please bear with me, this may take a few iterations. Let's first try the below. diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index 9bbb0ff4cc15..ce477c030990 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -304,11 +304,14 @@ static bool vmd_init_dev_msi_info(struct device *dev, struct irq_domain *domain, struct irq_domain *real_parent, struct msi_domain_info *info) { + pr_err("%s: bus_token=%d\n", __func__, (int)info->bus_token); if (WARN_ON_ONCE(info->bus_token != DOMAIN_BUS_PCI_DEVICE_MSIX)) return false; - if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info)) + if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info)) { + pr_err("%s:%d err=msi_lib_init\n", __func__, __LINE__); return false; + } info->chip->irq_enable = vmd_pci_msi_enable; info->chip->irq_disable = vmd_pci_msi_disable; diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c index 818d55fbad0d..e5f83036d76f 100644 --- a/drivers/pci/msi/api.c +++ b/drivers/pci/msi/api.c @@ -265,6 +265,7 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, } if (flags & PCI_IRQ_MSIX) { + pci_err(dev, "%s:%d msix\n", __func__, __LINE__); nvecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs, affd, flags); if (nvecs > 0) @@ -272,6 +273,7 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, } if (flags & PCI_IRQ_MSI) { + pci_err(dev, "%s:%d msi\n", __func__, __LINE__); nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd); if (nvecs > 0) return nvecs; @@ -279,6 +281,7 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, /* use INTx IRQ if allowed */ if (flags & PCI_IRQ_INTX) { + pci_err(dev, "%s:%d INTx\n", __func__, __LINE__); if (min_vecs == 1 && dev->irq) { /* * Invoke the affinity spreading logic to ensure that diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c index 0938ef7ebabf..6695508e9e53 100644 --- a/drivers/pci/msi/irqdomain.c +++ b/drivers/pci/msi/irqdomain.c @@ -13,8 +13,10 @@ int pci_msi_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) struct irq_domain *domain; domain = dev_get_msi_domain(&dev->dev); - if (domain && irq_domain_is_hierarchy(domain)) + if (domain && irq_domain_is_hierarchy(domain)) { + pci_err(dev, "%s:%d err=\n", __func__, __LINE__); return msi_domain_alloc_irqs_all_locked(&dev->dev, MSI_DEFAULT_DOMAIN, nvec); + } return pci_msi_legacy_setup_msi_irqs(dev, nvec, type); } @@ -355,6 +357,7 @@ bool pci_msi_domain_supports(struct pci_dev *pdev, unsigned int feature_mask, if (!domain || !irq_domain_is_hierarchy(domain)) { if (IS_ENABLED(CONFIG_PCI_MSI_ARCH_FALLBACKS)) return mode == ALLOW_LEGACY; + pci_err(pdev, "%s:%d err\n", __func__, __LINE__); return false; } @@ -364,6 +367,7 @@ bool pci_msi_domain_supports(struct pci_dev *pdev, unsigned int feature_mask, * msi_domain_info::flags is the authoritative source of * information. */ + pci_err(pdev, "%s:%d not msi parent\n", __func__, __LINE__); info = domain->host_data; supported = info->flags; } else { @@ -374,6 +378,7 @@ bool pci_msi_domain_supports(struct pci_dev *pdev, unsigned int feature_mask, * per device domain because the parent is never * expanding the PCI/MSI functionality. */ + pci_err(pdev, "%s:%d is msi parent\n", __func__, __LINE__); supported = domain->msi_parent_ops->supported_flags; } diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c index 34d664139f48..028281d51fd0 100644 --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -31,19 +31,25 @@ static int pci_msi_supported(struct pci_dev *dev, int nvec) struct pci_bus *bus; /* MSI must be globally enabled and supported by the device */ - if (!pci_msi_enable) + if (!pci_msi_enable) { + pci_err(dev, "%s:%d err=pci_msi_enable\n", __func__, __LINE__); return 0; + } - if (!dev || dev->no_msi) + if (!dev || dev->no_msi) { + pci_err(dev, "%s:%d err=no_msi\n", __func__, __LINE__); return 0; + } /* * You can't ask to have 0 or less MSIs configured. * a) it's stupid .. * b) the list manipulation code assumes nvec >= 1. */ - if (nvec < 1) + if (nvec < 1) { + pci_err(dev, "%s:%d err=nvec\n", __func__, __LINE__); return 0; + } /* * Any bridge which does NOT route MSI transactions from its @@ -60,8 +66,10 @@ static int pci_msi_supported(struct pci_dev *dev, int nvec) * at probe time. */ for (bus = dev->bus; bus; bus = bus->parent) - if (bus->bus_flags & PCI_BUS_FLAGS_NO_MSI) + if (bus->bus_flags & PCI_BUS_FLAGS_NO_MSI) { + pci_err(dev, "%s:%d bus_flags\n", __func__, __LINE__); return 0; + } return 1; } @@ -86,8 +94,10 @@ static int pcim_setup_msi_release(struct pci_dev *dev) return 0; ret = devm_add_action(&dev->dev, pcim_msi_release, dev); - if (ret) + if (ret) { + pci_err(dev, "%s:%d err=%d\n", __func__, __LINE__, ret); return ret; + } dev->is_msi_managed = true; return 0; @@ -671,17 +681,23 @@ static int __msix_setup_interrupts(struct pci_dev *__dev, struct msix_entry *ent struct pci_dev *dev __free(free_msi_irqs) = __dev; int ret = msix_setup_msi_descs(dev, entries, nvec, masks); - if (ret) + if (ret) { + pci_err(dev, "%s:%d err=%d\n", __func__, __LINE__, ret); return ret; + } ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSIX); - if (ret) + if (ret) { + pci_err(dev, "%s:%d err=%d\n", __func__, __LINE__, ret); return ret; + } /* Check if all MSI entries honor device restrictions */ ret = msi_verify_entries(dev); - if (ret) + if (ret) { + pci_err(dev, "%s:%d err=%d\n", __func__, __LINE__, ret); return ret; + } msix_update_entries(dev, entries); retain_and_null_ptr(dev); @@ -736,8 +752,10 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, } ret = msix_setup_interrupts(dev, entries, nvec, affd); - if (ret) + if (ret) { + pci_err(dev, "%s:%d err=setup irqs\n", __func__, __LINE__); goto out_disable; + } /* Disable INTX */ pci_intx_for_msi(dev, 0); @@ -793,30 +811,40 @@ int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int { int hwsize, rc, nvec = maxvec; - if (maxvec < minvec) + if (maxvec < minvec) { + pci_err(dev, "%s:%d err=%d\n", __func__, __LINE__, -ERANGE); return -ERANGE; + } if (dev->msi_enabled) { - pci_info(dev, "can't enable MSI-X (MSI already enabled)\n"); + pci_err(dev, "can't enable MSI-X (MSI already enabled)\n"); return -EINVAL; } - if (WARN_ON_ONCE(dev->msix_enabled)) + if (WARN_ON_ONCE(dev->msix_enabled)) { + pci_err(dev, "%s:%d err=%d\n", __func__, __LINE__, -EINVAL); return -EINVAL; + } /* Check MSI-X early on irq domain enabled architectures */ - if (!pci_msi_domain_supports(dev, MSI_FLAG_PCI_MSIX, ALLOW_LEGACY)) + if (!pci_msi_domain_supports(dev, MSI_FLAG_PCI_MSIX, ALLOW_LEGACY)) { + pci_err(dev, "%s:%d err=ENOTSUPP\n", __func__, __LINE__); return -ENOTSUPP; + } if (!pci_msi_supported(dev, nvec) || dev->current_state != PCI_D0) return -EINVAL; hwsize = pci_msix_vec_count(dev); - if (hwsize < 0) + if (hwsize < 0) { + pci_err(dev, "%s:%d err=hwsize\n", __func__, __LINE__); return hwsize; + } - if (!pci_msix_validate_entries(dev, entries, nvec)) + if (!pci_msix_validate_entries(dev, entries, nvec)) { + pci_err(dev, "%s:%d err=validate entries\n", __func__, __LINE__); return -EINVAL; + } if (hwsize < nvec) { /* Keep the IRQ virtual hackery working */ @@ -826,21 +854,27 @@ int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int nvec = hwsize; } - if (nvec < minvec) + if (nvec < minvec) { + pci_err(dev, "%s:%d err=ENOSPC\n", __func__, __LINE__); return -ENOSPC; + } rc = pci_setup_msi_context(dev); if (rc) return rc; - if (!pci_setup_msix_device_domain(dev, hwsize)) + if (!pci_setup_msix_device_domain(dev, hwsize)) { + pci_err(dev, "%s:%d err=%d\n", __func__, __LINE__, -ENODEV); return -ENODEV; + } for (;;) { if (affd) { nvec = irq_calc_affinity_vectors(minvec, nvec, affd); - if (nvec < minvec) + if (nvec < minvec) { + pci_err(dev, "%s:%d err=%d\n", __func__, __LINE__, -ENOSPC); return -ENOSPC; + } } rc = msix_capability_init(dev, entries, nvec, affd); diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c index d1b68c18444f..44d1c0cd79fa 100644 --- a/drivers/pci/pcie/portdrv.c +++ b/drivers/pci/pcie/portdrv.c @@ -190,6 +190,7 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask) goto intx_irq; /* Try to use MSI-X or MSI if supported */ + pr_err("%s:%d\n", __func__, __LINE__); if (pcie_port_enable_irq_vec(dev, irqs, mask) == 0) return 0; diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index dc473faadcc8..1771b6a24765 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -229,13 +229,17 @@ static struct irq_domain *__irq_domain_create(const struct irq_domain_info *info if (WARN_ON((info->size && info->direct_max) || (!IS_ENABLED(CONFIG_IRQ_DOMAIN_NOMAP) && info->direct_max) || - (info->direct_max && info->direct_max != info->hwirq_max))) + (info->direct_max && info->direct_max != info->hwirq_max))) { + pr_err("%s:%d\n err", __func__, __LINE__); return ERR_PTR(-EINVAL); + } domain = kzalloc_node(struct_size(domain, revmap, info->size), GFP_KERNEL, of_node_to_nid(to_of_node(info->fwnode))); - if (!domain) + if (!domain) { + pr_err("%s:%d\n err", __func__, __LINE__); return ERR_PTR(-ENOMEM); + } err = irq_domain_set_name(domain, info); if (err) { @@ -328,14 +332,18 @@ static struct irq_domain *__irq_domain_instantiate(const struct irq_domain_info if (info->dgc_info) { err = irq_domain_alloc_generic_chips(domain, info->dgc_info); - if (err) + if (err) { + pr_err("%s:%d\n err=%d", __func__, __LINE__, err); goto err_domain_free; + } } if (info->init) { err = info->init(domain); - if (err) + if (err) { + pr_err("%s:%d\n err=%d", __func__, __LINE__, err); goto err_domain_gc_remove; + } } __irq_domain_publish(domain); diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index 9b09ad3f9914..b86ed78be575 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -316,11 +316,14 @@ int msi_setup_device_data(struct device *dev) return 0; md = devres_alloc(msi_device_data_release, sizeof(*md), GFP_KERNEL); - if (!md) + if (!md) { + dev_err(dev, "%s:%d err=ENOMEM\n", __func__, __LINE__); return -ENOMEM; + } ret = msi_sysfs_create_group(dev); if (ret) { + dev_err(dev, "%s:%d err=%d\n", __func__, __LINE__, ret); devres_free(md); return ret; } @@ -1035,16 +1038,22 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid, const struct msi_parent_ops *pops; struct fwnode_handle *fwnode; - if (!irq_domain_is_msi_parent(parent)) + if (!irq_domain_is_msi_parent(parent)) { + dev_err(dev, "%s:%d err=not parent\n", __func__, __LINE__); return false; + } - if (domid >= MSI_MAX_DEVICE_IRQDOMAINS) + if (domid >= MSI_MAX_DEVICE_IRQDOMAINS) { + dev_err(dev, "%s:%d err=max domain\n", __func__, __LINE__); return false; + } struct msi_domain_template *bundle __free(kfree) = kmemdup(template, sizeof(*bundle), GFP_KERNEL); - if (!bundle) + if (!bundle) { + dev_err(dev, "%s:%d err=mem\n", __func__, __LINE__); return false; + } bundle->info.hwsize = hwsize; bundle->info.chip = &bundle->chip; @@ -1077,23 +1086,32 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid, if (!fwnode) return false; - if (msi_setup_device_data(dev)) + if (msi_setup_device_data(dev)) { + dev_err(dev, "%s:%d err=setup\n", __func__, __LINE__); return false; + } guard(msi_descs_lock)(dev); - if (WARN_ON_ONCE(msi_get_device_domain(dev, domid))) + if (WARN_ON_ONCE(msi_get_device_domain(dev, domid))) { + dev_err(dev, "%s:%d err=get domain\n", __func__, __LINE__); return false; + } - if (!pops->init_dev_msi_info(dev, parent, parent, &bundle->info)) + if (!pops->init_dev_msi_info(dev, parent, parent, &bundle->info)) { + dev_err(dev, "%s:%d err=init_msi_info\n", __func__, __LINE__); return false; + } domain = __msi_create_irq_domain(fwnode, &bundle->info, IRQ_DOMAIN_FLAG_MSI_DEVICE, parent); - if (!domain) + if (!domain) { + dev_err(dev, "%s:%d err=create domain\n", __func__, __LINE__); return false; + } dev->msi.data->__domains[domid].domain = domain; if (msi_domain_prepare_irqs(domain, dev, hwsize, &bundle->alloc_info)) { + pr_err("%s:%d\n err=prepare_irqs", __func__, __LINE__); dev->msi.data->__domains[domid].domain = NULL; irq_domain_remove(domain); return false; ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-09 4:34 ` Nam Cao @ 2025-08-09 13:28 ` Ammar Faizi 2025-08-09 14:49 ` Nam Cao 0 siblings, 1 reply; 18+ messages in thread From: Ammar Faizi @ 2025-08-09 13:28 UTC (permalink / raw) To: Nam Cao Cc: Thomas Gleixner, Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml, namcaov On Sat, Aug 09, 2025 at 06:34:29AM +0200, Nam Cao wrote: > On Sat, Aug 09, 2025 at 07:52:30AM +0900, Ammar Faizi wrote: > > I can do that. Send me a git diff. I'll test it and back with the dmesg > > output. > > That would be very helpful, thanks! > > Please bear with me, this may take a few iterations. > > Let's first try the below. I just got home from a family outing. A bit slow response. Here's the result: https://gist.github.com/ammarfaizi2/ef5f98123ed3868f8d64ed41662edd63#file-dmesg_pci_debug_001-txt-L853 -- Ammar Faizi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-09 13:28 ` Ammar Faizi @ 2025-08-09 14:49 ` Nam Cao 2025-08-09 15:15 ` Ammar Faizi 0 siblings, 1 reply; 18+ messages in thread From: Nam Cao @ 2025-08-09 14:49 UTC (permalink / raw) To: Ammar Faizi Cc: Thomas Gleixner, Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml, namcaov On Sat, Aug 09, 2025 at 08:28:39PM +0700, Ammar Faizi wrote: > On Sat, Aug 09, 2025 at 06:34:29AM +0200, Nam Cao wrote: > > On Sat, Aug 09, 2025 at 07:52:30AM +0900, Ammar Faizi wrote: > > > I can do that. Send me a git diff. I'll test it and back with the dmesg > > > output. > > > > That would be very helpful, thanks! > > > > Please bear with me, this may take a few iterations. > > > > Let's first try the below. > > I just got home from a family outing. A bit slow response. > > Here's the result: > > https://gist.github.com/ammarfaizi2/ef5f98123ed3868f8d64ed41662edd63#file-dmesg_pci_debug_001-txt-L853 Thanks! Here's the problem: [ 1.037223] pcieport 10000:e0:1d.0: __pci_enable_msix_range:840 err=hwsize The PCIe port driver enables interrupt, trying MSI-X first. However, the device does not support MSI-X, so it tries MSI instead, which triggers the WARN_ON() in VMD driver. What's strange is that, the VMD doc says: "Intel VMD only supports MSIx Interrupts from child devices and therefore the BIOS must enable PCIe Hot Plug and MSIx interrups" Is it lying to us? Can you please try: Revert d5c647b08ee0 ("PCI: vmd: Fix wrong kfree() in vmd_msi_free()") Revert d7d8ab87e3e7 ("PCI: vmd: Switch to msi_create_parent_irq_domain()") So that the driver is back to the original state before I touched it. And apply the diff below. This will tell us if my commit breaks the driver somehow, or VMD has been allowing MSI all this time. diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index d9b893bf4e45..e99d8cefb78d 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -255,9 +255,15 @@ static int vmd_msi_init(struct irq_domain *domain, struct msi_domain_info *info, msi_alloc_info_t *arg) { struct msi_desc *desc = arg->desc; + struct pci_dev *pci_dev = msi_desc_to_pci_dev(desc); struct vmd_dev *vmd = vmd_from_bus(msi_desc_to_pci_dev(desc)->bus); struct vmd_irq *vmdirq = kzalloc(sizeof(*vmdirq), GFP_KERNEL); + if (!pci_msix_vec_count(pci_dev)) + pr_err("But VMD only supports MSIx Interrupts from child devices!\n"); + else + pr_err("MSI-X, looking good...\n"); + dump_stack(); if (!vmdirq) return -ENOMEM; ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-09 14:49 ` Nam Cao @ 2025-08-09 15:15 ` Ammar Faizi 2025-08-09 15:32 ` Nam Cao 0 siblings, 1 reply; 18+ messages in thread From: Ammar Faizi @ 2025-08-09 15:15 UTC (permalink / raw) To: Nam Cao Cc: Thomas Gleixner, Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml, namcaov On Sat, Aug 09, 2025 at 04:49:27PM +0200, Nam Cao wrote: > Thanks! Here's the problem: > > [ 1.037223] pcieport 10000:e0:1d.0: __pci_enable_msix_range:840 err=hwsize > > The PCIe port driver enables interrupt, trying MSI-X first. However, the > device does not support MSI-X, so it tries MSI instead, which triggers > the WARN_ON() in VMD driver. > > What's strange is that, the VMD doc says: > > "Intel VMD only supports MSIx Interrupts from child devices and > therefore the BIOS must enable PCIe Hot Plug and MSIx interrups" > > Is it lying to us? > > Can you please try: > > Revert d5c647b08ee0 ("PCI: vmd: Fix wrong kfree() in vmd_msi_free()") > Revert d7d8ab87e3e7 ("PCI: vmd: Switch to msi_create_parent_irq_domain()") > > So that the driver is back to the original state before I touched it. > > And apply the diff below. This will tell us if my commit breaks the driver > somehow, or VMD has been allowing MSI all this time. Here's the result after reverting those two commits and applied the diff. https://gist.github.com/ammarfaizi2/03c7a9c0fec2a11f206931f1b7790709#file-dmesg_pci_debug_002-txt Let's see if this one is enough for you to diagnose the problem. Note that the previous printk() diff has to be discarded to avoid conflict in the reverts. If that's still needed, send me a fixed diff after clean reverts. -- Ammar Faizi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-09 15:15 ` Ammar Faizi @ 2025-08-09 15:32 ` Nam Cao 2025-08-09 16:03 ` Ammar Faizi 0 siblings, 1 reply; 18+ messages in thread From: Nam Cao @ 2025-08-09 15:32 UTC (permalink / raw) To: Ammar Faizi Cc: Thomas Gleixner, Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml, namcaov Ammar Faizi <ammarfaizi2@gnuweeb.org> writes: > Here's the result after reverting those two commits and applied the diff. > > https://gist.github.com/ammarfaizi2/03c7a9c0fec2a11f206931f1b7790709#file-dmesg_pci_debug_002-txt > > Let's see if this one is enough for you to diagnose the problem. Thanks, I think the problem is clear now. The diff I sent you has a mistake, it should be if (pci_msix_vec_count(pci_dev) < 0) not if (!pci_msix_vec_count(pci_dev)) So the log is wrong, it printed "MSI-X, looking good...". It should have printed the other one. But no need to re-run it, the backtrace is enough. MSI-X, looking good... <-------- wrong log CPU: 3 UID: 0 PID: 183 Comm: systemd-udevd Not tainted 6.16.0-afh2-dbg-2025-08-09-gb622ab28bcac #13 PREEMPT(full) 28137b57996795286f6544f071ec852674a057d4 Hardware name: HP HP Laptop 14s-dq2xxx/87FD, BIOS F.21 03/21/2022 Call Trace: <TASK> dump_stack_lvl vmd_msi_init msi_domain_alloc irq_domain_alloc_irqs_locked __irq_domain_alloc_irqs __msi_domain_alloc_irqs msi_domain_alloc_irqs_all_locked __msi_capability_init __pci_enable_msi_range pci_alloc_irq_vectors_affinity pcie_portdrv_probe So unlike what VMD doc says, it actually can have non-MSI-X children devices! Please discard the reverts and the diff I sent you, and try the diff below. I believe your machine will work now. diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index b679c7f28f51..1bd5bf4a6097 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -306,9 +306,6 @@ static bool vmd_init_dev_msi_info(struct device *dev, struct irq_domain *domain, struct irq_domain *real_parent, struct msi_domain_info *info) { - if (WARN_ON_ONCE(info->bus_token != DOMAIN_BUS_PCI_DEVICE_MSIX)) - return false; - if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info)) return false; ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-09 15:32 ` Nam Cao @ 2025-08-09 16:03 ` Ammar Faizi 2025-08-09 16:28 ` Nam Cao 0 siblings, 1 reply; 18+ messages in thread From: Ammar Faizi @ 2025-08-09 16:03 UTC (permalink / raw) To: Nam Cao Cc: Thomas Gleixner, Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml, namcaov On Sat, Aug 09, 2025 at 05:32:16PM +0200, Nam Cao wrote: > So unlike what VMD doc says, it actually can have non-MSI-X children devices! If that's the conclusion, then Intel VMD doc also needs fixing :/ > Please discard the reverts and the diff I sent you, and try the diff > below. I believe your machine will work now. Yes, I can confirm it's now clean. Just to verify both sides, here is the last result: https://gist.github.com/ammarfaizi2/72578d2b4cc385fbdb5faee69013d530 If that one fix is final, then: Tested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Thanks for the debugging work. It's probably too late to get the fix in mainline before rc1. But if it can go upstream sooner, that would be great. -- Ammar Faizi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [GIT PULL v2] PCI changes for v6.17 2025-08-09 16:03 ` Ammar Faizi @ 2025-08-09 16:28 ` Nam Cao 0 siblings, 0 replies; 18+ messages in thread From: Nam Cao @ 2025-08-09 16:28 UTC (permalink / raw) To: Ammar Faizi Cc: Thomas Gleixner, Lukas Wunner, Bjorn Helgaas, Linus Torvalds, Linux PCI Mailing List, Linux Kernel Mailing List, Rob Herring, Lorenzo Pieralisi, Manivannan Sadhasivam, Krzysztof Wilczynski, Armando Budianto, Alviro Iskandar Setiawan, gwml, namcaov Ammar Faizi <ammarfaizi2@gnuweeb.org> writes: > On Sat, Aug 09, 2025 at 05:32:16PM +0200, Nam Cao wrote: >> So unlike what VMD doc says, it actually can have non-MSI-X children devices! > > If that's the conclusion, then Intel VMD doc also needs fixing :/ Other possibilities are problem with your BIOS (likely), or problem with Linux's PCI enumeration (unlikely). But without hardware, I cannot investigate this further. Now that we know my commit didn't make the driver any worse, I am done here. >> Please discard the reverts and the diff I sent you, and try the diff >> below. I believe your machine will work now. > > Yes, I can confirm it's now clean. Just to verify both sides, here is > the last result: > > https://gist.github.com/ammarfaizi2/72578d2b4cc385fbdb5faee69013d530 > > If that one fix is final, then: > > Tested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> > > Thanks for the debugging work. Thanks for running tests, it would be impossible to figure out otherwise. > It's probably too late to get the fix in mainline before rc1. But if it > can go upstream sooner, that would be great. I don't think PCI maintainers are available at the moment, so I will send the patch next Monday. Time to enjoy my weekends.. Nam ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2025-08-09 16:28 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20250801142254.GA3496192@bhelgaas> [not found] ` <175408424863.4088284.13236765550439476565.pr-tracker-bot@kernel.org> 2025-08-07 3:34 ` [GIT PULL v2] PCI changes for v6.17 Ammar Faizi 2025-08-07 3:51 ` Lukas Wunner 2025-08-07 4:44 ` Nam Cao 2025-08-07 4:54 ` Ammar Faizi 2025-08-07 5:03 ` Nam Cao 2025-08-07 5:13 ` Ammar Faizi 2025-08-08 10:59 ` Ammar Faizi 2025-08-08 11:19 ` Ammar Faizi 2025-08-08 13:34 ` Ammar Faizi 2025-08-08 18:07 ` Nam Cao 2025-08-08 22:52 ` Ammar Faizi 2025-08-09 4:34 ` Nam Cao 2025-08-09 13:28 ` Ammar Faizi 2025-08-09 14:49 ` Nam Cao 2025-08-09 15:15 ` Ammar Faizi 2025-08-09 15:32 ` Nam Cao 2025-08-09 16:03 ` Ammar Faizi 2025-08-09 16:28 ` Nam Cao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox