[GIT PULL] io_uring fix for 6.17-rc5

public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed

* [GIT PULL] io_uring fix for 6.17-rc5
@ 2025-09-05 11:18 Jens Axboe
  2025-09-05 17:24 ` Linus Torvalds
  0 siblings, 1 reply; 74+ messages in thread
From: Jens Axboe @ 2025-09-05 11:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: io-uring

Hi Linus,

Just a single fix for an issue with the resource node rewrite that
happened a few releases ago. Please pull!


The following changes since commit 98b6fa62c84f2e129161e976a5b9b3cb4ccd117b:

  io_uring/kbuf: always use READ_ONCE() to read ring provided buffer lengths (2025-08-28 05:48:34 -0600)

are available in the Git repository at:

  git://git.kernel.dk/linux.git tags/io_uring-6.17-20250905

for you to fetch changes up to 0f51a5c0a89921deca72e42583683e44ff742d06:

  io_uring/rsrc: initialize io_rsrc_data nodes array (2025-09-04 19:50:33 -0600)

----------------------------------------------------------------
io_uring-6.17-20250905

----------------------------------------------------------------
Caleb Sander Mateos (1):
      io_uring/rsrc: initialize io_rsrc_data nodes array

 io_uring/rsrc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 11:18 [GIT PULL] io_uring fix for 6.17-rc5 Jens Axboe
@ 2025-09-05 17:24 ` Linus Torvalds
  2025-09-05 17:45   ` Konstantin Ryabitsev
  2025-09-05 19:04   ` Jens Axboe
  0 siblings, 2 replies; 74+ messages in thread
From: Linus Torvalds @ 2025-09-05 17:24 UTC (permalink / raw)
  To: Jens Axboe, Caleb Sander Mateos; +Cc: io-uring, Konstantin Ryabitsev

On Fri, 5 Sept 2025 at 04:18, Jens Axboe <axboe@kernel.dk> wrote:
>
> Just a single fix for an issue with the resource node rewrite that
> happened a few releases ago. Please pull!

I've pulled this, but the commentary is strange, and the patch makes
no sense to me, so I unpulled it again.

Yes, it changes things from kvmalloc_array() to kvcalloc(). Fine.

And yes, kvcalloc() clearly clears the resulting allocation. Also fine.

But even in the old version, it used __GFP_ZERO.

In fact, afaik the *ONLY* difference between kvcalloc() and
kvmalloc_array() array is that kvcalloc() adds the __GFP_ZERO to the
flags argument:

   #define kvcalloc_node_noprof(_n,_s,_f,_node)  \
      kvmalloc_array_node_noprof(_n,_s,(_f)|__GFP_ZERO,_node)

so afaik, this doesn't actually fix anything at all.

And dammit, this commit has that promising "Link:" argument that I
hoped would explain why this pointless commit exists, but AS ALWAYS
that link only wasted my time by pointing to the same damn information
that was already there.

I was hoping that it would point to some oops report or something that
would explain why my initial reaction was wrong.

Stop this garbage already. Stop adding pointless Link arguments that
waste people's time.

Add the link if it has *ADDITIONAL* information.

Dammit, I really hate those pointless links. I love seeing *useful*
links, but 99% of the links I actually see just point to stupid
useless garbage, and it *ONLY* wastes my time. AGAIN.

So I have not pulled this, I'm annoyed by having to even look at this,
and if you actually expect me to pull this I want a real explanation
and not a useless link.

Yes, I'm grumpy. I feel like my main job - really my only job - is to
try to make sense of pull requests, and that's why I absolutely detest
these things that are automatically added and only make my job harder.

I'm cc'ing Konstantin again, because this is a prime example of why
that automation HURTS, and he was arguing in favor of that sh*t just
last week.

Can we please stop this automated idiocy?

             Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 17:24 ` Linus Torvalds
@ 2025-09-05 17:45   ` Konstantin Ryabitsev
  2025-09-05 18:06     ` Linus Torvalds
  2025-09-07 22:04     ` [GIT PULL] io_uring fix for 6.17-rc5 Jonathan Corbet
  2025-09-05 19:04   ` Jens Axboe
  1 sibling, 2 replies; 74+ messages in thread
From: Konstantin Ryabitsev @ 2025-09-05 17:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, Caleb Sander Mateos, io-uring

On Fri, Sep 05, 2025 at 10:24:17AM -0700, Linus Torvalds wrote:
> Yes, I'm grumpy. I feel like my main job - really my only job - is to
> try to make sense of pull requests, and that's why I absolutely detest
> these things that are automatically added and only make my job harder.
> 
> I'm cc'ing Konstantin again, because this is a prime example of why
> that automation HURTS, and he was arguing in favor of that sh*t just
> last week.
> 
> Can we please stop this automated idiocy?

FWIW, Link: trailers are not added by default. The maintainer has to
deliberately add the -l switch.

Do you just want this to become a no-op, or will it be better if it's used
only with the patch.msgid.link domain namespace to clearly indicate that it's
just a provenance link?

-K

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 17:45   ` Konstantin Ryabitsev
@ 2025-09-05 18:06     ` Linus Torvalds
  2025-09-05 19:33       ` Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5) Konstantin Ryabitsev
  2025-09-07 22:04     ` [GIT PULL] io_uring fix for 6.17-rc5 Jonathan Corbet
  1 sibling, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2025-09-05 18:06 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Jens Axboe, Caleb Sander Mateos, io-uring

On Fri, 5 Sept 2025 at 10:45, Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> Do you just want this to become a no-op, or will it be better if it's used
> only with the patch.msgid.link domain namespace to clearly indicate that it's
> just a provenance link?

So I wish it at least had some way to discourage the normal mindless
use - and in a perfect world that there was some more useful model for
adding links automatically.

For example, I feel like for the cover letter of a multi-commit
series, the link to the patch series submission is potentially more
useful - and likely much less annoying - because it would go into the
merge message, not individual commits.

Because if somebody is actively looking at a merge message, they are
probably looking for some bigger picture background - or there's some
merge conflict - and at that point I expect that the initial
submission might be more relevant.

Of course, most people don't necessarily *use* the cover letter for a
merge, and only apply the patches as a series, so it's also less
annoying for the simple reason that it probably wouldn't exist in the
git history at all ;)

Anyway, the "discourage mindless use" might be as simple as a big
warning message that the link may be just adding annoying overhead.

In contrast, a "perfect" model might be to actually have some kind of
automation of "unless there was actual discussion about it".

But I feel such a model might be much too complicated, unless somebody
*wants* to explore using AI because their job description says "Look
for actual useful AI uses". In today's tech world, I assume such job
descriptions do exist. Sigh.

For example, since 'b4' ends up looking through the downstream thread
of a patch anyway in order to add acked-by lines etc, I do think that
in theory there could be some "there was lively discussion about this
particular patch, so a link is actually worth it" heuristic.

In theory.

And honestly, even if the discussion ends up being worthless, I do
suspect I would be a lot *less* annoyed by a link that at least leads
to some _thread_ (and not just the acked-by emails that already got
gathered up), rather than just leading to an email that was applied
and nobody really had any input on.

At least at that point I'd feel like there's something real there.

And yes, as always, I realize that people think that patch submissions
will get more email replies at some hypothetical _later_ date.  But in
practice, that seldom happens, because the downstream testing issues
typically create new threads, not replies to original emails (and if
they *do* react to the original email, we already can look up the
commit easily, and the lookup goes the other way anyway).

           Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 17:24 ` Linus Torvalds
  2025-09-05 17:45   ` Konstantin Ryabitsev
@ 2025-09-05 19:04   ` Jens Axboe
  2025-09-05 19:07     ` Jens Axboe
                       ` (3 more replies)
  1 sibling, 4 replies; 74+ messages in thread
From: Jens Axboe @ 2025-09-05 19:04 UTC (permalink / raw)
  To: Linus Torvalds, Caleb Sander Mateos; +Cc: io-uring, Konstantin Ryabitsev

On 9/5/25 11:24 AM, Linus Torvalds wrote:
> On Fri, 5 Sept 2025 at 04:18, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> Just a single fix for an issue with the resource node rewrite that
>> happened a few releases ago. Please pull!
> 
> I've pulled this, but the commentary is strange, and the patch makes
> no sense to me, so I unpulled it again.
> 
> Yes, it changes things from kvmalloc_array() to kvcalloc(). Fine.
> 
> And yes, kvcalloc() clearly clears the resulting allocation. Also fine.
> 
> But even in the old version, it used __GFP_ZERO.
> 
> In fact, afaik the *ONLY* difference between kvcalloc() and
> kvmalloc_array() array is that kvcalloc() adds the __GFP_ZERO to the
> flags argument:
> 
>    #define kvcalloc_node_noprof(_n,_s,_f,_node)  \
>       kvmalloc_array_node_noprof(_n,_s,(_f)|__GFP_ZERO,_node)
> 
> so afaik, this doesn't actually fix anything at all.

Agree, I think I was too hasty in queueing that up. I overlooked that we
already had __GFP_ZERO in there. On the road this week and tending to
these kinds of duties in between, my bad. Caleb??

> And dammit, this commit has that promising "Link:" argument that I
> hoped would explain why this pointless commit exists, but AS ALWAYS
> that link only wasted my time by pointing to the same damn information
> that was already there.

[snip long rant on Link: tags]

I just always add these, because discussion might happen after the fact.
For example, someone might run into an issue from an added patch, and
reply to the list. That does happen.

IMHO it's better to have a Link and it _potentially_ being useful than
not to have it and then need to search around for it. Searching is MUCH
worse than the disappointment of a Link that tells you nothing that
isn't in the commit already, and it wastes a lot more time.

And if you're applying a series of patches, then it'll take you to the
cover letter. Which is useful. All without needing to go search on lore.
You could argue that you could turn any applied series into a merge and
add the cover letter there, or link it at least, but lots of things
don't end up in a merge commit before you pull it.

What is the hurt here, really, other than you being disappointed there's
nothing extra in the link?

I, and everybody else, can surely start making judgement calls on when
to add the Link or not. But that seems error prone, and might indeed
miss useful cases because a bug report comes in AFTER the fact.

In any case, if it really bothers you that much, then just make it
policy. Historically I suppose policy has very much been formed by Linus
rants in replies, which then gets picked up by LWN and others and then
it becomes part of "Linux kernel lore" of this is what Linus expects.
But I bet you that LWN would pick up a Linus email on the topic that
isn't a reply, which said that you've observed Link: tag being used
frivilously and why you find that annoying. And THAT would save you a
lot more time rather than need to rant about it multiple times.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 19:04   ` Jens Axboe
@ 2025-09-05 19:07     ` Jens Axboe
  2025-09-05 19:13     ` Caleb Sander Mateos
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 74+ messages in thread
From: Jens Axboe @ 2025-09-05 19:07 UTC (permalink / raw)
  To: Linus Torvalds, Caleb Sander Mateos; +Cc: io-uring, Konstantin Ryabitsev

On 9/5/25 1:04 PM, Jens Axboe wrote:
> On 9/5/25 11:24 AM, Linus Torvalds wrote:
>> On Fri, 5 Sept 2025 at 04:18, Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>> Just a single fix for an issue with the resource node rewrite that
>>> happened a few releases ago. Please pull!
>>
>> I've pulled this, but the commentary is strange, and the patch makes
>> no sense to me, so I unpulled it again.
>>
>> Yes, it changes things from kvmalloc_array() to kvcalloc(). Fine.
>>
>> And yes, kvcalloc() clearly clears the resulting allocation. Also fine.
>>
>> But even in the old version, it used __GFP_ZERO.
>>
>> In fact, afaik the *ONLY* difference between kvcalloc() and
>> kvmalloc_array() array is that kvcalloc() adds the __GFP_ZERO to the
>> flags argument:
>>
>>    #define kvcalloc_node_noprof(_n,_s,_f,_node)  \
>>       kvmalloc_array_node_noprof(_n,_s,(_f)|__GFP_ZERO,_node)
>>
>> so afaik, this doesn't actually fix anything at all.
> 
> Agree, I think I was too hasty in queueing that up. I overlooked that we
> already had __GFP_ZERO in there. On the road this week and tending to
> these kinds of duties in between, my bad. Caleb??
> 
>> And dammit, this commit has that promising "Link:" argument that I
>> hoped would explain why this pointless commit exists, but AS ALWAYS
>> that link only wasted my time by pointing to the same damn information
>> that was already there.
> 
> [snip long rant on Link: tags]
> 
> I just always add these, because discussion might happen after the fact.
> For example, someone might run into an issue from an added patch, and
> reply to the list. That does happen.
> 
> IMHO it's better to have a Link and it _potentially_ being useful than
> not to have it and then need to search around for it. Searching is MUCH
> worse than the disappointment of a Link that tells you nothing that
> isn't in the commit already, and it wastes a lot more time.
> 
> And if you're applying a series of patches, then it'll take you to the
> cover letter. Which is useful. All without needing to go search on lore.
> You could argue that you could turn any applied series into a merge and
> add the cover letter there, or link it at least, but lots of things
> don't end up in a merge commit before you pull it.
> 
> What is the hurt here, really, other than you being disappointed there's
> nothing extra in the link?
> 
> I, and everybody else, can surely start making judgement calls on when
> to add the Link or not. But that seems error prone, and might indeed
> miss useful cases because a bug report comes in AFTER the fact.
> 
> In any case, if it really bothers you that much, then just make it
> policy. Historically I suppose policy has very much been formed by Linus
> rants in replies, which then gets picked up by LWN and others and then
> it becomes part of "Linux kernel lore" of this is what Linus expects.
> But I bet you that LWN would pick up a Linus email on the topic that
> isn't a reply, which said that you've observed Link: tag being used
> frivilously and why you find that annoying. And THAT would save you a
> lot more time rather than need to rant about it multiple times.

Oh, and I totally forgot the relevant tag this time:

Link: https://media.tenor.com/74lPb8mSRQMAAAAM/abe-simpson-abe-simpson-cloud.gif

;-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 19:04   ` Jens Axboe
  2025-09-05 19:07     ` Jens Axboe
@ 2025-09-05 19:13     ` Caleb Sander Mateos
  2025-09-05 19:16       ` Jens Axboe
  2025-09-05 19:15     ` Linus Torvalds
  2025-09-05 19:21     ` Linus Torvalds
  3 siblings, 1 reply; 74+ messages in thread
From: Caleb Sander Mateos @ 2025-09-05 19:13 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, io-uring, Konstantin Ryabitsev

On Fri, Sep 5, 2025 at 12:04 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 9/5/25 11:24 AM, Linus Torvalds wrote:
> > On Fri, 5 Sept 2025 at 04:18, Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> Just a single fix for an issue with the resource node rewrite that
> >> happened a few releases ago. Please pull!
> >
> > I've pulled this, but the commentary is strange, and the patch makes
> > no sense to me, so I unpulled it again.
> >
> > Yes, it changes things from kvmalloc_array() to kvcalloc(). Fine.
> >
> > And yes, kvcalloc() clearly clears the resulting allocation. Also fine.
> >
> > But even in the old version, it used __GFP_ZERO.
> >
> > In fact, afaik the *ONLY* difference between kvcalloc() and
> > kvmalloc_array() array is that kvcalloc() adds the __GFP_ZERO to the
> > flags argument:
> >
> >    #define kvcalloc_node_noprof(_n,_s,_f,_node)  \
> >       kvmalloc_array_node_noprof(_n,_s,(_f)|__GFP_ZERO,_node)
> >
> > so afaik, this doesn't actually fix anything at all.
>
> Agree, I think I was too hasty in queueing that up. I overlooked that we
> already had __GFP_ZERO in there. On the road this week and tending to
> these kinds of duties in between, my bad. Caleb??

Sorry, this is my fault. I misread the code, the __GFP_ZERO does
ensure the correct behavior. kvcalloc() might more clearly indicate
the intent, but there's no bug. Apologies for the hasty patch, and
agree it can be dropped.

Best,
Caleb


>
> > And dammit, this commit has that promising "Link:" argument that I
> > hoped would explain why this pointless commit exists, but AS ALWAYS
> > that link only wasted my time by pointing to the same damn information
> > that was already there.
>
> [snip long rant on Link: tags]
>
> I just always add these, because discussion might happen after the fact.
> For example, someone might run into an issue from an added patch, and
> reply to the list. That does happen.
>
> IMHO it's better to have a Link and it _potentially_ being useful than
> not to have it and then need to search around for it. Searching is MUCH
> worse than the disappointment of a Link that tells you nothing that
> isn't in the commit already, and it wastes a lot more time.
>
> And if you're applying a series of patches, then it'll take you to the
> cover letter. Which is useful. All without needing to go search on lore.
> You could argue that you could turn any applied series into a merge and
> add the cover letter there, or link it at least, but lots of things
> don't end up in a merge commit before you pull it.
>
> What is the hurt here, really, other than you being disappointed there's
> nothing extra in the link?
>
> I, and everybody else, can surely start making judgement calls on when
> to add the Link or not. But that seems error prone, and might indeed
> miss useful cases because a bug report comes in AFTER the fact.
>
> In any case, if it really bothers you that much, then just make it
> policy. Historically I suppose policy has very much been formed by Linus
> rants in replies, which then gets picked up by LWN and others and then
> it becomes part of "Linux kernel lore" of this is what Linus expects.
> But I bet you that LWN would pick up a Linus email on the topic that
> isn't a reply, which said that you've observed Link: tag being used
> frivilously and why you find that annoying. And THAT would save you a
> lot more time rather than need to rant about it multiple times.
>
> --
> Jens Axboe

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 19:04   ` Jens Axboe
  2025-09-05 19:07     ` Jens Axboe
  2025-09-05 19:13     ` Caleb Sander Mateos
@ 2025-09-05 19:15     ` Linus Torvalds
  2025-09-05 19:23       ` Jens Axboe
  2025-09-05 19:21     ` Linus Torvalds
  3 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2025-09-05 19:15 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Caleb Sander Mateos, io-uring, Konstantin Ryabitsev

On Fri, 5 Sept 2025 at 12:04, Jens Axboe <axboe@kernel.dk> wrote:
>
> IMHO it's better to have a Link and it _potentially_ being useful than
> not to have it and then need to search around for it.

No. Really.

The issue is "potentially - but very likely not - useful" vs "I HIT
THIS TEN+ TIMES EVERY SINGLE F%^& RELEASE".

There is just no comparison.  I have literally *never* found the
original submission email to be useful, and I'm tired of the
"potentially useful" argument that has nothing to back it up with.

It's literally magical thinking of "in some alternate universe, pigs
can fly, and that link might be useful"

          Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 19:13     ` Caleb Sander Mateos
@ 2025-09-05 19:16       ` Jens Axboe
  0 siblings, 0 replies; 74+ messages in thread
From: Jens Axboe @ 2025-09-05 19:16 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: Linus Torvalds, io-uring, Konstantin Ryabitsev

On 9/5/25 1:13 PM, Caleb Sander Mateos wrote:
> On Fri, Sep 5, 2025 at 12:04?PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 9/5/25 11:24 AM, Linus Torvalds wrote:
>>> On Fri, 5 Sept 2025 at 04:18, Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> Just a single fix for an issue with the resource node rewrite that
>>>> happened a few releases ago. Please pull!
>>>
>>> I've pulled this, but the commentary is strange, and the patch makes
>>> no sense to me, so I unpulled it again.
>>>
>>> Yes, it changes things from kvmalloc_array() to kvcalloc(). Fine.
>>>
>>> And yes, kvcalloc() clearly clears the resulting allocation. Also fine.
>>>
>>> But even in the old version, it used __GFP_ZERO.
>>>
>>> In fact, afaik the *ONLY* difference between kvcalloc() and
>>> kvmalloc_array() array is that kvcalloc() adds the __GFP_ZERO to the
>>> flags argument:
>>>
>>>    #define kvcalloc_node_noprof(_n,_s,_f,_node)  \
>>>       kvmalloc_array_node_noprof(_n,_s,(_f)|__GFP_ZERO,_node)
>>>
>>> so afaik, this doesn't actually fix anything at all.
>>
>> Agree, I think I was too hasty in queueing that up. I overlooked that we
>> already had __GFP_ZERO in there. On the road this week and tending to
>> these kinds of duties in between, my bad. Caleb??
> 
> Sorry, this is my fault. I misread the code, the __GFP_ZERO does
> ensure the correct behavior. kvcalloc() might more clearly indicate
> the intent, but there's no bug. Apologies for the hasty patch, and
> agree it can be dropped.

The fact that there isn't a bug in there in the first place is good
news, so no worries!

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 19:04   ` Jens Axboe
                       ` (2 preceding siblings ...)
  2025-09-05 19:15     ` Linus Torvalds
@ 2025-09-05 19:21     ` Linus Torvalds
  2025-09-05 19:30       ` Jens Axboe
  3 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2025-09-05 19:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Caleb Sander Mateos, io-uring, Konstantin Ryabitsev

On Fri, 5 Sept 2025 at 12:04, Jens Axboe <axboe@kernel.dk> wrote:
>
> What is the hurt here, really, other than you being disappointed there's
> nothing extra in the link?

And just to clarify: the hurt is real. It's not just the
disappointment. It's the wasted effort of following a link and having
to then realize that there's nothing useful there.

Those links *literally* double the effort for me when I try to be
careful about patches.

So the "what's the hurt here" question is WRONG. The cost is real. The
cost is something I've complained about before.

I'm tired of having to complain about this, and I'm really really
tired of wasting my time on links that people have added with
absolutely zero effort and no thinking to back them up.

Yes, it's literally free to you to add this cost. No, *YOU* don't see
the cost, and you think it is helpful. It's not. It's the opposite of
helpful.

So I want commit messages to be relevant and explain what is going on,
and I want them to NOT WASTE MY TIME.

And I also don't want to ignore links that are actually *useful* and
give background information.

Is that really too much to ask for?

                 Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 19:15     ` Linus Torvalds
@ 2025-09-05 19:23       ` Jens Axboe
  0 siblings, 0 replies; 74+ messages in thread
From: Jens Axboe @ 2025-09-05 19:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Caleb Sander Mateos, io-uring, Konstantin Ryabitsev

On 9/5/25 1:15 PM, Linus Torvalds wrote:
> On Fri, 5 Sept 2025 at 12:04, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> IMHO it's better to have a Link and it _potentially_ being useful than
>> not to have it and then need to search around for it.
> 
> No. Really.
> 
> The issue is "potentially - but very likely not - useful" vs "I HIT
> THIS TEN+ TIMES EVERY SINGLE F%^& RELEASE".
> 
> There is just no comparison.  I have literally *never* found the
> original submission email to be useful, and I'm tired of the
> "potentially useful" argument that has nothing to back it up with.
> 
> It's literally magical thinking of "in some alternate universe, pigs
> can fly, and that link might be useful"

Then let's please define the rules. I always add a bug report in as a
Link tag, if it exists. I think we agree that's a good thing, because it
shows the origin of the patch and what it's supposed to fix.

If someone sends me a patch which may be a bug or a feature, add the
link IFF discussion actually happened there. Useful discussion,
presumably? Because what typically ends up happening is that someone
sends a series, and there's discussion, and then V2 is posted. Repeat
until good. When Vn is applied, there's zero discussion. But a link to
Vn is useful in that it helps you find Vn-1 and so forth, as it leads to
the cover letter. We can put the cover letter link in there, but that's
not useful in an big series, as it'd be in all the patches. Create
temporary branch, apply series, merge into branch where it belongs and
include the Link to the cover letter? Or the cover letter itself?

And probably more?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 19:21     ` Linus Torvalds
@ 2025-09-05 19:30       ` Jens Axboe
  2025-09-05 20:54         ` Linus Torvalds
  0 siblings, 1 reply; 74+ messages in thread
From: Jens Axboe @ 2025-09-05 19:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Caleb Sander Mateos, io-uring, Konstantin Ryabitsev

On 9/5/25 1:21 PM, Linus Torvalds wrote:
> On Fri, 5 Sept 2025 at 12:04, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> What is the hurt here, really, other than you being disappointed there's
>> nothing extra in the link?
> 
> And just to clarify: the hurt is real. It's not just the
> disappointment. It's the wasted effort of following a link and having
> to then realize that there's nothing useful there.
> 
> Those links *literally* double the effort for me when I try to be
> careful about patches.
> 
> So the "what's the hurt here" question is WRONG. The cost is real. The
> cost is something I've complained about before.
> 
> I'm tired of having to complain about this, and I'm really really
> tired of wasting my time on links that people have added with
> absolutely zero effort and no thinking to back them up.

Like I said, I think there more fruitful ways to get the point across
and this picked up and well known, because I don't believe it is right
now.

> Yes, it's literally free to you to add this cost. No, *YOU* don't see
> the cost, and you think it is helpful. It's not. It's the opposite of
> helpful.

As a maintainer, yes it's free to add, and it removes the cost of
needing to think about this. Which is why lots of people just have -l as
the default. Exactly because then you don't have to think about it. I do
agree that this adds a lot of frivolous links, I think the mindset has
just been "well better to always have it there, rather than to never
even if you rarely need it".

> So I want commit messages to be relevant and explain what is going on,
> and I want them to NOT WASTE MY TIME.
> 
> And I also don't want to ignore links that are actually *useful* and
> give background information.
> 
> Is that really too much to ask for?

No I think that's fine, I'm mostly just complaining about the approach
in getting there. I think we all prefer the commit messages to be as
useful and relevant as possible.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-05 18:06     ` Linus Torvalds
@ 2025-09-05 19:33       ` Konstantin Ryabitsev
  2025-09-05 20:09         ` Linus Torvalds
                           ` (4 more replies)
  0 siblings, 5 replies; 74+ messages in thread
From: Konstantin Ryabitsev @ 2025-09-05 19:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, Caleb Sander Mateos, io-uring, workflows

(Changing the subject and aiming this at workflows.)

On Fri, Sep 05, 2025 at 11:06:01AM -0700, Linus Torvalds wrote:
> On Fri, 5 Sept 2025 at 10:45, Konstantin Ryabitsev
> <konstantin@linuxfoundation.org> wrote:
> >
> > Do you just want this to become a no-op, or will it be better if it's used
> > only with the patch.msgid.link domain namespace to clearly indicate that it's
> > just a provenance link?
> 
> So I wish it at least had some way to discourage the normal mindless
> use - and in a perfect world that there was some more useful model for
> adding links automatically.
> 
> For example, I feel like for the cover letter of a multi-commit
> series, the link to the patch series submission is potentially more
> useful - and likely much less annoying - because it would go into the
> merge message, not individual commits.

We do support this usage using `b4 shazam -M` -- it's the functional
equivalent of applying a pull request and will use the cover letter contents
as the initial source of the merge commit message. I do encourage people to
use this more than just a linear `git am` for series, for a number of reasons:

- this clearly delineates the start and end of the series
- this incorporates the contents cover letter that can give more info about
  the series than just individual commits *without* the need to hit the lore
  archive
- this lets maintainers record any additional thoughts they may have in the
  merge commit, alongside with the original cover letter

Obviously, we don't want to use the cover letter as-is, which is why b4 will
open the configured editor to let the maintainer pulling in the series make
any changes to the cover letter before it becomes the merge commit.

Having the provenance link in the cover letter as opposed to individual
commits makes perfect sense in this case, especially because it is now very
obvious where the series starts and ends.

This does create a lot more non-linear history, though. Judging from some of
my discussions on the fediverse, some maintainers are not sure if that's okay
with you. If that's actually your preferred way of seeing series being
handled, then I'll work on updating maintainer docs to indicate that this is
the workflow to follow.

Question -- what would be the preferred approach for single-patch submissions?
I expect having a merge commit for those would be more annoying?

> Anyway, the "discourage mindless use" might be as simple as a big
> warning message that the link may be just adding annoying overhead.
> 
> In contrast, a "perfect" model might be to actually have some kind of
> automation of "unless there was actual discussion about it".
> 
> But I feel such a model might be much too complicated, unless somebody
> *wants* to explore using AI because their job description says "Look
> for actual useful AI uses". In today's tech world, I assume such job
> descriptions do exist. Sigh.

So, I did work on this for a while before running out of credits, and there
were the following stumbling blocks:

- consuming large threads is expensive; a thread of 20 patches and a bunch of
  follow-up discussions costs $1 of API credits just to process. I realize
  it's peanuts for a lot of full-time maintainers who have corporate API
  contracts, but it's an important consideration
- the LLMs did get confused about who said what when consuming long threads,
  at least with the models at the time. Maybe more modern models are better at
  this than those I tried a year ago. Misattributing things can be *really*
  bad in the context of decision making, so I found this the most troubling
  aspect of "have AI analyze this series and tell me if everyone important is
  okay with it."
- the models I used were proprietary (ChatGPT, Claude, Gemini), because I
  didn't have access to a good enough system to run ollama with a large enough
  context window to analyze long email threads. Even ollama is questionably
  "open source" -- but don't need to get into that aspect of it in this
  thread.

However, I feel that LLMs can be generally useful here, when handled with
care and with a good understanding that they do and will get things wrong.

> For example, since 'b4' ends up looking through the downstream thread
> of a patch anyway in order to add acked-by lines etc, I do think that
> in theory there could be some "there was lively discussion about this
> particular patch, so a link is actually worth it" heuristic.
> 
> In theory.

Yeah, in practice we can't tell a simple "good job, here's a reviewed-by" from
a "lively discussion," especially if the lively discussion was about something
else that had nothing to do with the contents of the series (e.g. as this
thread). The clever-er we try to be with b4, the quicker we run into corner
cases where our cleverness is actually doing the wrong thing.

So, I'm generally on the side of "dumb but predictably so."

-K

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-05 19:33       ` Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5) Konstantin Ryabitsev
@ 2025-09-05 20:09         ` Linus Torvalds
  2025-09-05 20:47         ` Sasha Levin
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 74+ messages in thread
From: Linus Torvalds @ 2025-09-05 20:09 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Jens Axboe, Caleb Sander Mateos, io-uring, workflows

On Fri, 5 Sept 2025 at 12:33, Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> We do support this usage using `b4 shazam -M` -- it's the functional
> equivalent of applying a pull request and will use the cover letter contents
> as the initial source of the merge commit message. I do encourage people to
> use this more than just a linear `git am` for series, for a number of reasons:

I think that works well for more complex series, yes.

> This does create a lot more non-linear history, though. Judging from some of
> my discussions on the fediverse, some maintainers are not sure if that's okay
> with you.

I do *not* think it makes sense for random collections of patches, or
some minor two-patch series, no.

But I do think it makes sense for patch series that (a) are more than
a small handful of patches and (b) have some real "story" to them (ie
a cover letter that actually explains some higher-level issues).

Put another way: I would be unhappy if that model is used mindlessly.
No "let's automatically encourage this", please. That was, I feel, the
problem with "-l".

For example, just looking at things that happened today on lore, something like

  https://lore.kernel.org/all/20250905191357.78298-1-ryncsn@gmail.com/T/#t

looks like it could be handled very well with that actual merge model.
Just look at that cover letter: it has relevant numbers for the
series, exactly the kinds of things you do *not* want in individual
commit messages, but that make sense as a merge message.

That said, from what I've seen, these kinds of series are often MM,
and I don't think it matches the flow that Andrew tends to use. We
finally got Andrew to use git fairly recently, I'm not convinced
getting him to have a fancy non-linear history is in the cards.

(That said, Andrew clearly deals with series internally, and his pull
requests tend to actually describe things as such, so maybe he
wouldn't be too annoyed by something less linear).

I would worry a bit that  people would use odd merge bases for this.
Because one of the advantages of a linear history is that it's
simpler, and in particular that you only mess up the beginning point
of that linear history *once*. And yes, people do mess that up (we
have a whole section about the whole "pick a good base" in the docs
and people have gotten it wrong).

With non-linear history, there's just more complexity and getting
things wrong is easier and can be even more confusing.

So while I do think do that "b4 shazam -M" can be a very good thing, I
also think it's something that *definitely* needs a fair amount of
forethought.

It should not be some "default flow", in other words.

                 Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-05 19:33       ` Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5) Konstantin Ryabitsev
  2025-09-05 20:09         ` Linus Torvalds
@ 2025-09-05 20:47         ` Sasha Levin
  2025-09-06 11:27         ` Greg KH
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 74+ messages in thread
From: Sasha Levin @ 2025-09-05 20:47 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Linus Torvalds, Jens Axboe, Caleb Sander Mateos, io-uring,
	workflows

On Fri, Sep 05, 2025 at 03:33:14PM -0400, Konstantin Ryabitsev wrote:
>On Fri, Sep 05, 2025 at 11:06:01AM -0700, Linus Torvalds wrote:
>> Anyway, the "discourage mindless use" might be as simple as a big
>> warning message that the link may be just adding annoying overhead.
>>
>> In contrast, a "perfect" model might be to actually have some kind of
>> automation of "unless there was actual discussion about it".
>>
>> But I feel such a model might be much too complicated, unless somebody
>> *wants* to explore using AI because their job description says "Look
>> for actual useful AI uses". In today's tech world, I assume such job
>> descriptions do exist. Sigh.
>
>So, I did work on this for a while before running out of credits, and there
>were the following stumbling blocks:
>
>- consuming large threads is expensive; a thread of 20 patches and a bunch of
>  follow-up discussions costs $1 of API credits just to process. I realize
>  it's peanuts for a lot of full-time maintainers who have corporate API
>  contracts, but it's an important consideration
>- the LLMs did get confused about who said what when consuming long threads,
>  at least with the models at the time. Maybe more modern models are better at
>  this than those I tried a year ago. Misattributing things can be *really*
>  bad in the context of decision making, so I found this the most troubling
>  aspect of "have AI analyze this series and tell me if everyone important is
>  okay with it."

Quick note on this: I observed the same thing, but found that using structured
format (i.e. lei q --format json) really helps with this issue.

>- the models I used were proprietary (ChatGPT, Claude, Gemini), because I
>  didn't have access to a good enough system to run ollama with a large enough
>  context window to analyze long email threads. Even ollama is questionably
>  "open source" -- but don't need to get into that aspect of it in this
>  thread.
>
>However, I feel that LLMs can be generally useful here, when handled with
>care and with a good understanding that they do and will get things wrong.

I'm facing a similar challange both for the AUTOSEL and the CVE work: there is
very little historical context in most commits, and the Link: tag is almost
always useless and just points to the final submission of a patch rather than a
relevant discussion around that code.

I ended up creating an AI agent that knows how to dig through both a local git
repo as well as our mailing list using lei-q and knows to search related
dashboards like kernelci, lkft, the syzbot dashboard, etc.

I'm not sure if at it's current form it's useful to anyone else, but here's an
example of what it generates on the patch in question:


Mailing List History for commit 0f51a5c0a89921deca72e42583683e44ff742d06
========================================================================

Author: Caleb Sander Mateos <csander@purestorage.com>
Date: Thu Sep 4 19:25:34 2025 -0600
Subject: io_uring/rsrc: initialize io_rsrc_data nodes array

## Timeline of Events

1. **October 25, 2024**: Jens Axboe commits major refactoring
    - Commit: 7029acd8a950 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list")
    - Major rewrite eliminating per-ring serialization of resource nodes
    - Addressed resource reclaim stalls in networked workloads

2. **April 4, 2025**: Pavel Begunkov fixes related issue
    - Commit: ab6005f3912f ("io_uring: don't post tag CQEs on file/buffer registration failure")
    - Also fixes issues introduced by commit 7029acd8a950
    - Reference: https://lore.kernel.org/r/c514446a8dcb0197cddd5d4ba8f6511da081cf1f.1743777957.git.asml.silence@gmail.com

3. **September 4, 2025, 19:25 MDT**: Caleb submits initialization fix
    - Message-ID: 20250905012535.2806919-1-csander@purestorage.com
    - Sent to: io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
    - Direct submission to Jens Axboe

4. **September 4, 2025, 19:50 MDT**: Jens applies the patch
    - Applied within 25 minutes of submission
    - No public review or discussion found

5. **September 5, 2025, 05:14 MDT**: Jens sends acknowledgment
    - Message-ID: 175707084146.356946.8866336484834458029.b4-ty@kernel.dk
    - Simple "Applied, thanks!" with b4 tool
    - Assigned commit: 0f51a5c0a89921deca72e42583683e44ff742d06

## Key Findings from Mailing List Search

### Minimal Public Discussion
- **No pre-submission review**: No RFC or v1 versions found
- **No public bug reports**: No KASAN, syzkaller, or user bug reports found that directly led to this fix
- **No post-submission discussion**: Only Jens' acknowledgment found
- **No testing tags**: No Tested-by or Reviewed-by tags

### Related Activity
- Pavel Begunkov's earlier fix (April 2025) shows the original refactoring had multiple issues
- Both fixes target error paths in the resource registration code
- Pattern suggests issues were found through code review rather than runtime failures


-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 19:30       ` Jens Axboe
@ 2025-09-05 20:54         ` Linus Torvalds
  2025-09-06  0:01           ` Jens Axboe
  0 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2025-09-05 20:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Caleb Sander Mateos, io-uring, Konstantin Ryabitsev

On Fri, 5 Sept 2025 at 12:30, Jens Axboe <axboe@kernel.dk> wrote:
>
> Like I said, I think there more fruitful ways to get the point across
> and this picked up and well known, because I don't believe it is right
> now.

So I've actually been complaining about the link tags for years: [1]
[2] [3] [4].

In fact, that [4] from 2022 is about how people are then trying to
distinguish the *useful* links (to bug reports) from the useless ones,
by giving them a different name ("Buglink:"). Where I was telling
people to instead fix this problem by just not adding the useless
links in the first place!

Anyway, I'm a bit frustrated, exactly because this _has_ been going on
for years. It's not a new peeve.

And I don't think we have a good central place for that kind of "don't do this".

Yes, there's the maintainer summit, but that's a pretty limited set of people.

I guess I could mention it in my release notes, but I don't know who
actually reads those either..

So I end up just complaining when I see it.

And yeah, I will take some of the blame for people doing the useless
Link. Because going even further back, people were arguing for random
"bug ID" numbers. Go search lkml, and you'll find discussions about
having UUID's in the commits, and I said that no, we're not doing
that, and that a "Link:" tag to something valid is a good alternative,
and I even mentioned a link to the submission. So that could be seen
as some kind of encouragement - but it was more of a "no, we're *NOT*
doing random meaningless UUIDs".

I did go back and look in the git archives. The oldest link we have in
the kernel git tree is from 2011. Guess what? That email has had over
fourteen years to get more information associated with it on the
mailing list, but no.

That link has _zero_ new information that would be relevant outside
the commit that references it (f994d99cf140: "x86-32, fpu: Fix FPU
exception handling on non-SSE systems").

            Linus

Link: https://lore.kernel.org/all/CAHk-=wgfX9nBGE0Ap9GjhOy7Mn=RSy=rx0MvqfYFFDx31KJXqQ@mail.gmail.com/
[1]
Link: https://lore.kernel.org/all/CAHk-=wiUS4r788i5XjTtSwvfvKRm9uH2H5=eLHbZVu3Wo-YHCA@mail.gmail.com/
[2]
Link: https://lore.kernel.org/all/CAHk-=whRBX0aQq1J5S5nHXE2GvXnQ5z+cqu=iTY9xU34kvYMzw@mail.gmail.com/
[3]
Link: https://lore.kernel.org/all/CAHk-=wgzRUT1fBpuz3xcN+YdsX0SxqOzHWRtj0ReHpUBb5TKbA@mail.gmail.com/
[4]
Link: [ .. too lazy to look up more .. ]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 20:54         ` Linus Torvalds
@ 2025-09-06  0:01           ` Jens Axboe
  2025-09-07 18:47             ` Jonathan Corbet
  2025-09-08 22:15             ` Alexei Starovoitov
  0 siblings, 2 replies; 74+ messages in thread
From: Jens Axboe @ 2025-09-06  0:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Caleb Sander Mateos, io-uring, Konstantin Ryabitsev

On 9/5/25 2:54 PM, Linus Torvalds wrote:
> On Fri, 5 Sept 2025 at 12:30, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> Like I said, I think there more fruitful ways to get the point across
>> and this picked up and well known, because I don't believe it is right
>> now.
> 
> So I've actually been complaining about the link tags for years: [1]
> [2] [3] [4].
> 
> In fact, that [4] from 2022 is about how people are then trying to
> distinguish the *useful* links (to bug reports) from the useless ones,
> by giving them a different name ("Buglink:"). Where I was telling
> people to instead fix this problem by just not adding the useless
> links in the first place!
> 
> Anyway, I'm a bit frustrated, exactly because this _has_ been going on
> for years. It's not a new peeve.

What's that saying on doing the same thing over and over again and
expecting different results...? :-)

> And I don't think we have a good central place for that kind of "don't do this".
> 
> Yes, there's the maintainer summit, but that's a pretty limited set of people.

That'd be a great place to discuss it, however. One thing I've always
wanted to bring up but have forgotten to, is how I'd _love_ for your PR
merges to contain the link to the PR that you got for them. Yes I know
that's now adding a link, but that's a useful one. Maybe not for you,
but for me and I bet tons of other people. At least if there's
discussion on it. But hey I'd be happy if it was just always there, but
it seems we disagree on that part.

What is clear, however, is that the rules on this aren't clear at all.

> I guess I could mention it in my release notes, but I don't know who
> actually reads those either..

I actually think a LOT of people read those. I do every week, and it
always goes on LWN too, for example.

But it does not have to be in the release notes. Just a separate email
with LWN/Jon CC'ed, and boom you have your story and people will see it.
And it doesn't need yelling. Alternatively, we discuss at the
maintainers summit, and come up with a set of rules that can get
documented. And then hopefully end up on LWN too. Honestly I had to
search in Documentation/ to see if we even have any kind of maintainer
documentation. Looks like we do, but who looks in there...

> So I end up just complaining when I see it.
> 
> And yeah, I will take some of the blame for people doing the useless
> Link. Because going even further back, people were arguing for random
> "bug ID" numbers. Go search lkml, and you'll find discussions about
> having UUID's in the commits, and I said that no, we're not doing
> that, and that a "Link:" tag to something valid is a good alternative,
> and I even mentioned a link to the submission. So that could be seen
> as some kind of encouragement - but it was more of a "no, we're *NOT*
> doing random meaningless UUIDs".

Maybe the problem is indeed in the name, it's very generic to call it a
Link. If you see "Closes: " you know exactly what it is, it's for some
bug tracker and you can click it and expect to see more info. Maybe
"Bug: " would be useful, or "Report: " or whatever - naming is hard. But
Link literally tells my brain, it's a link to the patch. Maybe there's
discussion there, maybe there's not. Because like or not, I do think the
generic nature of the name Link is part of the issue here.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-05 19:33       ` Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5) Konstantin Ryabitsev
  2025-09-05 20:09         ` Linus Torvalds
  2025-09-05 20:47         ` Sasha Levin
@ 2025-09-06 11:27         ` Greg KH
  2025-09-06 11:27           ` Greg KH
  2025-09-06 13:51           ` Konstantin Ryabitsev
  2025-09-08 20:11         ` dan.j.williams
  2025-09-09 16:32         ` [RFC] b4 dig: Add AI-powered email relationship discovery command Sasha Levin
  4 siblings, 2 replies; 74+ messages in thread
From: Greg KH @ 2025-09-06 11:27 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Linus Torvalds, Jens Axboe, Caleb Sander Mateos, io-uring,
	workflows

On Fri, Sep 05, 2025 at 03:33:14PM -0400, Konstantin Ryabitsev wrote:
> (Changing the subject and aiming this at workflows.)
> 
> On Fri, Sep 05, 2025 at 11:06:01AM -0700, Linus Torvalds wrote:
> > On Fri, 5 Sept 2025 at 10:45, Konstantin Ryabitsev
> > <konstantin@linuxfoundation.org> wrote:
> > >
> > > Do you just want this to become a no-op, or will it be better if it's used
> > > only with the patch.msgid.link domain namespace to clearly indicate that it's
> > > just a provenance link?
> > 
> > So I wish it at least had some way to discourage the normal mindless
> > use - and in a perfect world that there was some more useful model for
> > adding links automatically.
> > 
> > For example, I feel like for the cover letter of a multi-commit
> > series, the link to the patch series submission is potentially more
> > useful - and likely much less annoying - because it would go into the
> > merge message, not individual commits.
> 
> We do support this usage using `b4 shazam -M` -- it's the functional
> equivalent of applying a pull request and will use the cover letter contents
> as the initial source of the merge commit message. I do encourage people to
> use this more than just a linear `git am` for series, for a number of reasons:
> 
> - this clearly delineates the start and end of the series
> - this incorporates the contents cover letter that can give more info about
>   the series than just individual commits *without* the need to hit the lore
>   archive
> - this lets maintainers record any additional thoughts they may have in the
>   merge commit, alongside with the original cover letter
> 
> Obviously, we don't want to use the cover letter as-is, which is why b4 will
> open the configured editor to let the maintainer pulling in the series make
> any changes to the cover letter before it becomes the merge commit.

I like this a lot, and just tried it, but it ends up applying the
patches from the list without my signed-off-by, which will cause
linux-next to complain when it sees that I committed patches without
that.

Did I miss an option to `b4 shazam`?  Does it need to add a -s option
like `b4 am` has?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-06 11:27         ` Greg KH
@ 2025-09-06 11:27           ` Greg KH
  2025-09-06 11:30             ` Greg KH
  2025-09-06 13:51           ` Konstantin Ryabitsev
  1 sibling, 1 reply; 74+ messages in thread
From: Greg KH @ 2025-09-06 11:27 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Linus Torvalds, Jens Axboe, Caleb Sander Mateos, io-uring,
	workflows

On Sat, Sep 06, 2025 at 01:27:04PM +0200, Greg KH wrote:
> On Fri, Sep 05, 2025 at 03:33:14PM -0400, Konstantin Ryabitsev wrote:
> > (Changing the subject and aiming this at workflows.)
> > 
> > On Fri, Sep 05, 2025 at 11:06:01AM -0700, Linus Torvalds wrote:
> > > On Fri, 5 Sept 2025 at 10:45, Konstantin Ryabitsev
> > > <konstantin@linuxfoundation.org> wrote:
> > > >
> > > > Do you just want this to become a no-op, or will it be better if it's used
> > > > only with the patch.msgid.link domain namespace to clearly indicate that it's
> > > > just a provenance link?
> > > 
> > > So I wish it at least had some way to discourage the normal mindless
> > > use - and in a perfect world that there was some more useful model for
> > > adding links automatically.
> > > 
> > > For example, I feel like for the cover letter of a multi-commit
> > > series, the link to the patch series submission is potentially more
> > > useful - and likely much less annoying - because it would go into the
> > > merge message, not individual commits.
> > 
> > We do support this usage using `b4 shazam -M` -- it's the functional
> > equivalent of applying a pull request and will use the cover letter contents
> > as the initial source of the merge commit message. I do encourage people to
> > use this more than just a linear `git am` for series, for a number of reasons:
> > 
> > - this clearly delineates the start and end of the series
> > - this incorporates the contents cover letter that can give more info about
> >   the series than just individual commits *without* the need to hit the lore
> >   archive
> > - this lets maintainers record any additional thoughts they may have in the
> >   merge commit, alongside with the original cover letter
> > 
> > Obviously, we don't want to use the cover letter as-is, which is why b4 will
> > open the configured editor to let the maintainer pulling in the series make
> > any changes to the cover letter before it becomes the merge commit.
> 
> I like this a lot, and just tried it, but it ends up applying the
> patches from the list without my signed-off-by, which will cause
> linux-next to complain when it sees that I committed patches without
> that.
> 
> Did I miss an option to `b4 shazam`?  Does it need to add a -s option
> like `b4 am` has?

Oh nevermind, it does support -s.  It's just not documented :)

let me go make a patch...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-06 11:27           ` Greg KH
@ 2025-09-06 11:30             ` Greg KH
  0 siblings, 0 replies; 74+ messages in thread
From: Greg KH @ 2025-09-06 11:30 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Linus Torvalds, Jens Axboe, Caleb Sander Mateos, io-uring,
	workflows

On Sat, Sep 06, 2025 at 01:27:57PM +0200, Greg KH wrote:
> On Sat, Sep 06, 2025 at 01:27:04PM +0200, Greg KH wrote:
> > On Fri, Sep 05, 2025 at 03:33:14PM -0400, Konstantin Ryabitsev wrote:
> > > (Changing the subject and aiming this at workflows.)
> > > 
> > > On Fri, Sep 05, 2025 at 11:06:01AM -0700, Linus Torvalds wrote:
> > > > On Fri, 5 Sept 2025 at 10:45, Konstantin Ryabitsev
> > > > <konstantin@linuxfoundation.org> wrote:
> > > > >
> > > > > Do you just want this to become a no-op, or will it be better if it's used
> > > > > only with the patch.msgid.link domain namespace to clearly indicate that it's
> > > > > just a provenance link?
> > > > 
> > > > So I wish it at least had some way to discourage the normal mindless
> > > > use - and in a perfect world that there was some more useful model for
> > > > adding links automatically.
> > > > 
> > > > For example, I feel like for the cover letter of a multi-commit
> > > > series, the link to the patch series submission is potentially more
> > > > useful - and likely much less annoying - because it would go into the
> > > > merge message, not individual commits.
> > > 
> > > We do support this usage using `b4 shazam -M` -- it's the functional
> > > equivalent of applying a pull request and will use the cover letter contents
> > > as the initial source of the merge commit message. I do encourage people to
> > > use this more than just a linear `git am` for series, for a number of reasons:
> > > 
> > > - this clearly delineates the start and end of the series
> > > - this incorporates the contents cover letter that can give more info about
> > >   the series than just individual commits *without* the need to hit the lore
> > >   archive
> > > - this lets maintainers record any additional thoughts they may have in the
> > >   merge commit, alongside with the original cover letter
> > > 
> > > Obviously, we don't want to use the cover letter as-is, which is why b4 will
> > > open the configured editor to let the maintainer pulling in the series make
> > > any changes to the cover letter before it becomes the merge commit.
> > 
> > I like this a lot, and just tried it, but it ends up applying the
> > patches from the list without my signed-off-by, which will cause
> > linux-next to complain when it sees that I committed patches without
> > that.
> > 
> > Did I miss an option to `b4 shazam`?  Does it need to add a -s option
> > like `b4 am` has?
> 
> Oh nevermind, it does support -s.  It's just not documented :)
> 
> let me go make a patch...

And it is documented.  Ugh, nevermind, I need more coffee, sorry for the
noise.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-06 11:27         ` Greg KH
  2025-09-06 11:27           ` Greg KH
@ 2025-09-06 13:51           ` Konstantin Ryabitsev
  2025-09-06 15:31             ` Linus Torvalds
  1 sibling, 1 reply; 74+ messages in thread
From: Konstantin Ryabitsev @ 2025-09-06 13:51 UTC (permalink / raw)
  To: Greg KH
  Cc: Linus Torvalds, Jens Axboe, Caleb Sander Mateos, io-uring,
	workflows

On Sat, Sep 06, 2025 at 01:27:04PM +0200, Greg KH wrote:
> > Obviously, we don't want to use the cover letter as-is, which is why b4 will
> > open the configured editor to let the maintainer pulling in the series make
> > any changes to the cover letter before it becomes the merge commit.
> 
> I like this a lot, and just tried it, but it ends up applying the
> patches from the list without my signed-off-by, which will cause
> linux-next to complain when it sees that I committed patches without
> that.
> 
> Did I miss an option to `b4 shazam`?  Does it need to add a -s option
> like `b4 am` has?

Yes, most of the time you'll want to run it as `b4 shazam -Ms`.

Unfortunately, `shazam -M` is not perfect, because we do need to know the
base-commit, and there's still way too many series sent without this info. We
do some magic trying to figure out where the series might belong (basically,
by comparing blob hashes and trying to find the tree with the same set of blob
hashes as in the patch), but it only works if you have the same local repo as
the contributor.

Best regards,
-K

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-06 13:51           ` Konstantin Ryabitsev
@ 2025-09-06 15:31             ` Linus Torvalds
  2025-09-06 18:50               ` Konstantin Ryabitsev
  0 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2025-09-06 15:31 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Greg KH, Jens Axboe, Caleb Sander Mateos, io-uring, workflows

On Sat, 6 Sept 2025 at 06:51, Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> Unfortunately, `shazam -M` is not perfect, because we do need to know the
> base-commit, and there's still way too many series sent without this info.

No, no. You're thinking about it wrong.

An emailed patch series is *not* a git pull. If you want actual real
git history, just use git. Using a patch series and shazam for that
would be *bad*. It's actively worse than just using git, with zero
upside.

No, the upside of a patch series is that it's *not* fixed in stone yet
- not in history, not in acks, not in actual code. So do *not*
encourage people to think of it as some second-rate "git history"
model. It's not, and it would be *BAD* at it.

Instead, embrace the "it's a patch series". You should *not* strive to
make "b4 shazam" think it should recreate the original git tree. not
at all.

Instead, it should be a "here's a patch series with a cover letter,
make a pretty history of it, delineate it with a merge, and save the
relevant information from the cover letter in the merge message".

Look, we already have subsystems that do that. I don't know if they
use b4 shazam - maybe they do, maybe they don't - but the end result
is what matters.

For example, the networking people use this model for small series of
patches, and you can see it in patterns like this (I picked a random
area, this is meant to illustrate the point, the commits themselves
are not relevant):

    gitk d2644cbc736f..f63e7c8a8389

and look at the kind of "pseudo-linear" history, where small series
are delineated with that separate branch and merge, but this is *not*
some kind of global history where people tried to keep original commit
bases around etc.

That kind of global history would be *worse* for the whole "send
patches by email" model.

So don't strive to replicate git - badly. Strive to do a *good* job.

Your comment about how you want to know the base commit makes me think
you are missing the point.

git is git.

And emailed patch series are a different thing entirely, and trying
for some 1:1 thing only makes things objectively worse.

                  Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-06 15:31             ` Linus Torvalds
@ 2025-09-06 18:50               ` Konstantin Ryabitsev
  2025-09-06 19:19                 ` Linus Torvalds
  2025-09-08 11:59                 ` Mark Brown
  0 siblings, 2 replies; 74+ messages in thread
From: Konstantin Ryabitsev @ 2025-09-06 18:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg KH, Jens Axboe, Caleb Sander Mateos, io-uring, workflows

On Sat, Sep 06, 2025 at 08:31:59AM -0700, Linus Torvalds wrote:
> On Sat, 6 Sept 2025 at 06:51, Konstantin Ryabitsev
> <konstantin@linuxfoundation.org> wrote:
> >
> > Unfortunately, `shazam -M` is not perfect, because we do need to know the
> > base-commit, and there's still way too many series sent without this info.
> 
> No, no. You're thinking about it wrong.
> 
> An emailed patch series is *not* a git pull. If you want actual real
> git history, just use git. Using a patch series and shazam for that
> would be *bad*. It's actively worse than just using git, with zero
> upside.

The primary consumer of this are the CI systems, though, like those that plug
into patchwork. In order to be able to run a bunch of tests they need to be
able to apply the patches to a tree, so, in a sense, they do need to recreate
git as much as possible, including the branch point.

> No, the upside of a patch series is that it's *not* fixed in stone yet
> - not in history, not in acks, not in actual code. So do *not*
> encourage people to think of it as some second-rate "git history"
> model. It's not, and it would be *BAD* at it.

b4 will tell you if a series applies cleanly to the current tree, but I don't
think we make use of this with `shazam -M` -- we always try to parent it
against the indicated base commit. Is the recommendation then to always try to
use the latest tree and bail out if it doesn't apply?

> That kind of global history would be *worse* for the whole "send
> patches by email" model.
> 
> So don't strive to replicate git - badly. Strive to do a *good* job.

But people do want to replicate git, if only so they can run integration tests
in a more automated fashion. If I understand correctly, you suggest two modes
of operations:

1. recreate the tree exactly as the author intended, so that CI systems can
   run tests.

2. try to create a merge commit on top of the latest HEAD and bail if it's not
   working, letting the maintainer fix any conflicts on their own.

> Your comment about how you want to know the base commit makes me think
> you are missing the point.

No, I'm mostly implementing what people tell me they'd like to see. :) Someone
once told me that they really wanted to be able to treat mailed series exactly
like a pull request, hence why this feature exists. You're actually the first
person to say that this behaviour is not what we should be doing.

-K

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-06 18:50               ` Konstantin Ryabitsev
@ 2025-09-06 19:19                 ` Linus Torvalds
  2025-09-08  9:11                   ` Jani Nikula
  2025-09-08 11:59                 ` Mark Brown
  1 sibling, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2025-09-06 19:19 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Greg KH, Jens Axboe, Caleb Sander Mateos, io-uring, workflows

On Sat, 6 Sept 2025 at 11:50, Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> The primary consumer of this are the CI systems, though, like those that plug
> into patchwork

Yes, for a CI, it makes sense to try to have a fixed base, if such a
base exists.

But for that case, when a base exists and is published, why aren't
those people and tools *actually* using git then? That gets rid of all
the strangeness - and inefficiency - of trying to recreate it from
emails.

So I'd rather encourage people to have git branches that they expose,
if CI is the main use case.

For an example of how to do this right, look at what Al does. Recent
patch series posted at

   https://lore.kernel.org/all/20250906090738.GA31600@ZenIV/

is a good example, and notice Al saying:

  Branches are in
  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.path and
  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.f_path resp.;
  individual patches in followups.

in the cover letter.

In other words: if the series was exported from a git tree and you
have a base to use, why would it *EVER* be sane to then use 'b4
shazam' to get it?

So I think what 'b4 shazam' _should_ be looking at is when Greg says
"I like this a lot".

I think it should aim for supporting maintainers that apply patch
series as part of their workflow, not at CI tools that have the WRONG
workflow.

And yes, maybe fixing the CI tool workflow then involves having people
who post patch series post the git branch too.

I often find the git branches nicer for walking through some patch
series anyway. But it goes both ways: for short series, since I'm in
the MUA, just walking through five or six patches and replying to them
is simpler, for longer series that do more involved things, I find
doing a "git fetch" and then using git tooling to look at particular
_parts_ of the series can be a lot more powerful.

In fact, for long series that get reposted, just to not mess up my
mailbox I would generally prefer to just see the git branch over some
50-email patch bomb.

Maybe *that* would be a good addition for 'b4', where you can reply to
just the cover letter and say "Ack for this series" or explicitly
reply to particular patches - that might not even have been posted -
by mentioning their commit IDs.

That's my workflow much of the time, see for example

   https://lore.kernel.org/all/CAHk-=wgZEkSNKFe_=W=OcoMTQiwq8j017mh+TUR4AV9GiMPQLA@mail.gmail.com/

where I basically went through the series, and then replied to
individual patches.

I do like the "reply to individual patches" - even when I might
actually have looked at them in git - just because then I can quote
the part I reacted to. So I do think posting the patches makes sense
as long as it's not some excessive patch-bomb, but at the same time I
do know that a lot of patch series end up being of the type where
possibly dozens of people get cc'd, but only on the one or two patches
that are relevant to them.

And then the git workflow *really* shines, because it gets you that
context (and lots of people object to getting tens or hundreds of
patches in email when only one or two are relevant to them).

              Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-06  0:01           ` Jens Axboe
@ 2025-09-07 18:47             ` Jonathan Corbet
  2025-09-08 22:15             ` Alexei Starovoitov
  1 sibling, 0 replies; 74+ messages in thread
From: Jonathan Corbet @ 2025-09-07 18:47 UTC (permalink / raw)
  To: Jens Axboe, Linus Torvalds
  Cc: Caleb Sander Mateos, io-uring, Konstantin Ryabitsev

Jens Axboe <axboe@kernel.dk> writes:

> But it does not have to be in the release notes. Just a separate email
> with LWN/Jon CC'ed, and boom you have your story and people will see it.

No need for the separate email :)

Thanks,

jon

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-05 17:45   ` Konstantin Ryabitsev
  2025-09-05 18:06     ` Linus Torvalds
@ 2025-09-07 22:04     ` Jonathan Corbet
  1 sibling, 0 replies; 74+ messages in thread
From: Jonathan Corbet @ 2025-09-07 22:04 UTC (permalink / raw)
  To: Konstantin Ryabitsev, Linus Torvalds
  Cc: Jens Axboe, Caleb Sander Mateos, io-uring

Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:

> On Fri, Sep 05, 2025 at 10:24:17AM -0700, Linus Torvalds wrote:
>> Yes, I'm grumpy. I feel like my main job - really my only job - is to
>> try to make sense of pull requests, and that's why I absolutely detest
>> these things that are automatically added and only make my job harder.
>> 
>> I'm cc'ing Konstantin again, because this is a prime example of why
>> that automation HURTS, and he was arguing in favor of that sh*t just
>> last week.
>> 
>> Can we please stop this automated idiocy?
>
> FWIW, Link: trailers are not added by default. The maintainer has to
> deliberately add the -l switch.

It's worth noting that our documentation suggests adding a Git hook to
add Link: tags automatically: Documentation/maintainer/configure-git.rst. 
I suspect a lot of people add such hooks years ago when it was all the
rage and have long since forgotten about them...  Not that I would be
such a person, of course.

It seems we should remove the recommendation?

jon

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-06 19:19                 ` Linus Torvalds
@ 2025-09-08  9:11                   ` Jani Nikula
  0 siblings, 0 replies; 74+ messages in thread
From: Jani Nikula @ 2025-09-08  9:11 UTC (permalink / raw)
  To: Linus Torvalds, Konstantin Ryabitsev
  Cc: Greg KH, Jens Axboe, Caleb Sander Mateos, io-uring, workflows

On Sat, 06 Sep 2025, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Sat, 6 Sept 2025 at 11:50, Konstantin Ryabitsev
> <konstantin@linuxfoundation.org> wrote:
>>
>> The primary consumer of this are the CI systems, though, like those that plug
>> into patchwork
>
> Yes, for a CI, it makes sense to try to have a fixed base, if such a
> base exists.
>
> But for that case, when a base exists and is published, why aren't
> those people and tools *actually* using git then? That gets rid of all
> the strangeness - and inefficiency - of trying to recreate it from
> emails.
>
> So I'd rather encourage people to have git branches that they expose,
> if CI is the main use case.

For i915 and xe, we'll want *all* patches go through CI. I'm sure there
are other drivers like that. CI is not the "main" use case, just one use
case. I'd like to have patches on the list for review and discussion,
and git branches for CI and everything else.

Insert "Both? Both. Both. Both Is Good." meme here.

To me it sounds like it would be useful to have tooling (b4? git
send-email?) that could push a git branch *and* send those changes as a
patch series, with a well-formed, machine-readable part in the cover
letter that points at the git repo.

I guess you could have server git hooks or forge workflows to send the
patches as well.

(Though you still can't review what's on the list, and blindly apply
what's in the git repo.)

BR,
Jani.

-- 
Jani Nikula, Intel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-06 18:50               ` Konstantin Ryabitsev
  2025-09-06 19:19                 ` Linus Torvalds
@ 2025-09-08 11:59                 ` Mark Brown
  1 sibling, 0 replies; 74+ messages in thread
From: Mark Brown @ 2025-09-08 11:59 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Linus Torvalds, Greg KH, Jens Axboe, Caleb Sander Mateos,
	io-uring, workflows

[-- Attachment #1: Type: text/plain, Size: 2321 bytes --]

On Sat, Sep 06, 2025 at 02:50:49PM -0400, Konstantin Ryabitsev wrote:
> On Sat, Sep 06, 2025 at 08:31:59AM -0700, Linus Torvalds wrote:

> > An emailed patch series is *not* a git pull. If you want actual real
> > git history, just use git. Using a patch series and shazam for that
> > would be *bad*. It's actively worse than just using git, with zero
> > upside.

> The primary consumer of this are the CI systems, though, like those that plug
> into patchwork. In order to be able to run a bunch of tests they need to be
> able to apply the patches to a tree, so, in a sense, they do need to recreate
> git as much as possible, including the branch point.

Well, for CI we often don't exactly care that the patch is applied in
the context that the sender sent it, we care more that the patch is
applied for testing in the same context where it's going to be applied
when merged.  The base information is useful and we might want to use
it, but we might also not.  My flow is to apply things, test and then
push to the actual tree if the testing is happy so I'm testing the
actual commits that will be pushed if everything goes well.

> > No, the upside of a patch series is that it's *not* fixed in stone yet
> > - not in history, not in acks, not in actual code. So do *not*
> > encourage people to think of it as some second-rate "git history"
> > model. It's not, and it would be *BAD* at it.

> b4 will tell you if a series applies cleanly to the current tree, but I don't
> think we make use of this with `shazam -M` -- we always try to parent it
> against the indicated base commit. Is the recommendation then to always try to
> use the latest tree and bail out if it doesn't apply?

If we're going to automatically pick up the base commit that needs an
option to limit what the commits that might be selected are, people
don't always send something directly usable.  For example with a series
that should be split between trees (eg, a driver plus DT updates to add
the device to some boards) you might reasonably base off linux-next,
that'll get a current tree for everywhere the individual patches should
be applied.  For example my scripting when it's paying attention to base
commits will ignore anything that's not in the history of the branch the
tree is targeted at unless I explicitly tell it otherwise.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-05 19:33       ` Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5) Konstantin Ryabitsev
                           ` (2 preceding siblings ...)
  2025-09-06 11:27         ` Greg KH
@ 2025-09-08 20:11         ` dan.j.williams
  2025-09-09 11:29           ` Mark Brown
  2025-09-09 13:17           ` Rafael J. Wysocki
  2025-09-09 16:32         ` [RFC] b4 dig: Add AI-powered email relationship discovery command Sasha Levin
  4 siblings, 2 replies; 74+ messages in thread
From: dan.j.williams @ 2025-09-08 20:11 UTC (permalink / raw)
  To: Konstantin Ryabitsev, Linus Torvalds
  Cc: Jens Axboe, Caleb Sander Mateos, io-uring, workflows

Konstantin Ryabitsev wrote:
> (Changing the subject and aiming this at workflows.)
> 
> On Fri, Sep 05, 2025 at 11:06:01AM -0700, Linus Torvalds wrote:
> > On Fri, 5 Sept 2025 at 10:45, Konstantin Ryabitsev
> > <konstantin@linuxfoundation.org> wrote:
> > >
> > > Do you just want this to become a no-op, or will it be better if it's used
> > > only with the patch.msgid.link domain namespace to clearly indicate that it's
> > > just a provenance link?
> > 
> > So I wish it at least had some way to discourage the normal mindless
> > use - and in a perfect world that there was some more useful model for
> > adding links automatically.
> > 
> > For example, I feel like for the cover letter of a multi-commit
> > series, the link to the patch series submission is potentially more
> > useful - and likely much less annoying - because it would go into the
> > merge message, not individual commits.
> 
> We do support this usage using `b4 shazam -M` -- it's the functional
> equivalent of applying a pull request and will use the cover letter contents
> as the initial source of the merge commit message. I do encourage people to
> use this more than just a linear `git am` for series, for a number of reasons:

For me, as a subsystem downstream person the 'mindless' patch.msgid.link
saves me time when I need to report a regression, or validate which
version of a patch was pulled from a list when curating a long-running
topic in a staging tree. I do make sure to put actual discussion
references outside the patch.msgid.link namespace and hope that others
continue to use this helpful breadcrumb.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [GIT PULL] io_uring fix for 6.17-rc5
  2025-09-06  0:01           ` Jens Axboe
  2025-09-07 18:47             ` Jonathan Corbet
@ 2025-09-08 22:15             ` Alexei Starovoitov
  1 sibling, 0 replies; 74+ messages in thread
From: Alexei Starovoitov @ 2025-09-08 22:15 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Caleb Sander Mateos, io-uring,
	Konstantin Ryabitsev, bpf

On Fri, Sep 05, 2025 at 06:01:01PM -0600, Jens Axboe wrote:
> On 9/5/25 2:54 PM, Linus Torvalds wrote:
> > On Fri, 5 Sept 2025 at 12:30, Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> Like I said, I think there more fruitful ways to get the point across
> >> and this picked up and well known, because I don't believe it is right
> >> now.
> > 
> > So I've actually been complaining about the link tags for years: [1]
> > [2] [3] [4].
> > 
> > In fact, that [4] from 2022 is about how people are then trying to
> > distinguish the *useful* links (to bug reports) from the useless ones,
> > by giving them a different name ("Buglink:"). Where I was telling
> > people to instead fix this problem by just not adding the useless
> > links in the first place!
> > 
> > Anyway, I'm a bit frustrated, exactly because this _has_ been going on
> > for years. It's not a new peeve.
> 
> What's that saying on doing the same thing over and over again and
> expecting different results...? :-)
> 
> > And I don't think we have a good central place for that kind of "don't do this".
> > 
> > Yes, there's the maintainer summit, but that's a pretty limited set of people.
> 
> That'd be a great place to discuss it, however. One thing I've always
> wanted to bring up but have forgotten to, is how I'd _love_ for your PR
> merges to contain the link to the PR that you got for them. Yes I know
> that's now adding a link, but that's a useful one. Maybe not for you,
> but for me and I bet tons of other people. At least if there's
> discussion on it. But hey I'd be happy if it was just always there, but
> it seems we disagree on that part.

+1 to above request.

Regarding Link tag. We've been adding them to all bpf/net commits
for quite some time and found them useful in many cases:

1. patches rarely come as a single patch. Even if it's a single line
fix there is likely a selftest in the other commit. When I investigate
a commit clicking on lore link and seeing the whole series saves a ton
of time, since search by commit name in lore.kernel.org/all/ isn't great.

2. patches rarely accepted on the first revision and we recommend developers
to add lore link to v1 when they respin v2. So by the time vN series
are accepted the cover letter has links to all previous revisions.
Similarly when I debug an issue: git blame, git show sha, click on lore link,
click on 0/N, click on v2-v3, since most of the interesting discussion
happens in earlier revisions. The last few respins will typically address
final nits.

3. even if it's a rare single commit the patch subject doesn't say
whether it was v1 or v2, while lore has this information in email
like [PATCH bpf-next v2] subj. Going to lore and realizing that
ohh it was v2 that was accepted is a lot better than search by subject
and seeing v1, v2, v3 versions of the same patch and not being able
to tell which one was applied.

So the only case where Link is useless is the case of single commit
without any revisions that was accepted on the first try.
We can manually remove such links, but this would be tedious
manual work, since automation is tailored for common case where
link is in every commit and they are useful.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-08 20:11         ` dan.j.williams
@ 2025-09-09 11:29           ` Mark Brown
  2025-09-09 13:17           ` Rafael J. Wysocki
  1 sibling, 0 replies; 74+ messages in thread
From: Mark Brown @ 2025-09-09 11:29 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Konstantin Ryabitsev, Linus Torvalds, Jens Axboe,
	Caleb Sander Mateos, io-uring, workflows

[-- Attachment #1: Type: text/plain, Size: 1100 bytes --]

On Mon, Sep 08, 2025 at 01:11:00PM -0700, dan.j.williams@intel.com wrote:
> Konstantin Ryabitsev wrote:

> > We do support this usage using `b4 shazam -M` -- it's the functional
> > equivalent of applying a pull request and will use the cover letter contents
> > as the initial source of the merge commit message. I do encourage people to
> > use this more than just a linear `git am` for series, for a number of reasons:

> For me, as a subsystem downstream person the 'mindless' patch.msgid.link
> saves me time when I need to report a regression, or validate which
> version of a patch was pulled from a list when curating a long-running
> topic in a staging tree. I do make sure to put actual discussion
> references outside the patch.msgid.link namespace and hope that others
> continue to use this helpful breadcrumb.

Yes, I use the links constantly too when reporting regressions - it's
super helpful to just be able to pull the message and thread from the
mailing list with b4.  You get an initial way into the discussion (and
any reports someone else made) and a good list of people to CC.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-08 20:11         ` dan.j.williams
  2025-09-09 11:29           ` Mark Brown
@ 2025-09-09 13:17           ` Rafael J. Wysocki
  2025-09-09 14:18             ` Jakub Kicinski
  1 sibling, 1 reply; 74+ messages in thread
From: Rafael J. Wysocki @ 2025-09-09 13:17 UTC (permalink / raw)
  To: Konstantin Ryabitsev, Linus Torvalds, dan.j.williams
  Cc: Jens Axboe, Caleb Sander Mateos, io-uring, workflows

On Monday, September 8, 2025 10:11:00 PM CEST dan.j.williams@intel.com wrote:
> Konstantin Ryabitsev wrote:
> > (Changing the subject and aiming this at workflows.)
> > 
> > On Fri, Sep 05, 2025 at 11:06:01AM -0700, Linus Torvalds wrote:
> > > On Fri, 5 Sept 2025 at 10:45, Konstantin Ryabitsev
> > > <konstantin@linuxfoundation.org> wrote:
> > > >
> > > > Do you just want this to become a no-op, or will it be better if it's used
> > > > only with the patch.msgid.link domain namespace to clearly indicate that it's
> > > > just a provenance link?
> > > 
> > > So I wish it at least had some way to discourage the normal mindless
> > > use - and in a perfect world that there was some more useful model for
> > > adding links automatically.
> > > 
> > > For example, I feel like for the cover letter of a multi-commit
> > > series, the link to the patch series submission is potentially more
> > > useful - and likely much less annoying - because it would go into the
> > > merge message, not individual commits.
> > 
> > We do support this usage using `b4 shazam -M` -- it's the functional
> > equivalent of applying a pull request and will use the cover letter contents
> > as the initial source of the merge commit message. I do encourage people to
> > use this more than just a linear `git am` for series, for a number of reasons:
> 
> For me, as a subsystem downstream person the 'mindless' patch.msgid.link
> saves me time when I need to report a regression, or validate which
> version of a patch was pulled from a list when curating a long-running
> topic in a staging tree. I do make sure to put actual discussion
> references outside the patch.msgid.link namespace and hope that others
> continue to use this helpful breadcrumb.

Same here.

Every time one needs to connect a git commit with a patch that it has come from,
the presence of patch.msgid.link saves a search of a mailing list archive (if
all goes well, or more searches otherwise).

On a global scale, that's quite a number of saved mailing list archive searches.

Cheers, Rafael




^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 13:17           ` Rafael J. Wysocki
@ 2025-09-09 14:18             ` Jakub Kicinski
  2025-09-09 14:35               ` Jens Axboe
  0 siblings, 1 reply; 74+ messages in thread
From: Jakub Kicinski @ 2025-09-09 14:18 UTC (permalink / raw)
  To: Konstantin Ryabitsev, Linus Torvalds
  Cc: Rafael J. Wysocki, dan.j.williams, Jens Axboe,
	Caleb Sander Mateos, io-uring, workflows

On Tue, 09 Sep 2025 15:17:15 +0200 Rafael J. Wysocki wrote:
> > > We do support this usage using `b4 shazam -M` -- it's the functional
> > > equivalent of applying a pull request and will use the cover letter contents
> > > as the initial source of the merge commit message. I do encourage people to
> > > use this more than just a linear `git am` for series, for a number of reasons:  
> > 
> > For me, as a subsystem downstream person the 'mindless' patch.msgid.link
> > saves me time when I need to report a regression, or validate which
> > version of a patch was pulled from a list when curating a long-running
> > topic in a staging tree. I do make sure to put actual discussion
> > references outside the patch.msgid.link namespace and hope that others
> > continue to use this helpful breadcrumb.  
> 
> Same here.
> 
> Every time one needs to connect a git commit with a patch that it has come from,
> the presence of patch.msgid.link saves a search of a mailing list archive (if
> all goes well, or more searches otherwise).
> 
> On a global scale, that's quite a number of saved mailing list archive searches.

+1 FWIW. I also started slapping the links on all patches in a series,
even if we apply with a merge commit. I don't know of a good way with
git to "get to the first parent merge" so scanning the history to find
the link in the cover letter was annoying me :(

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 14:18             ` Jakub Kicinski
@ 2025-09-09 14:35               ` Jens Axboe
  2025-09-09 14:42                 ` Konstantin Ryabitsev
                                   ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Jens Axboe @ 2025-09-09 14:35 UTC (permalink / raw)
  To: Jakub Kicinski, Konstantin Ryabitsev, Linus Torvalds
  Cc: Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

On 9/9/25 8:18 AM, Jakub Kicinski wrote:
> On Tue, 09 Sep 2025 15:17:15 +0200 Rafael J. Wysocki wrote:
>>>> We do support this usage using `b4 shazam -M` -- it's the functional
>>>> equivalent of applying a pull request and will use the cover letter contents
>>>> as the initial source of the merge commit message. I do encourage people to
>>>> use this more than just a linear `git am` for series, for a number of reasons:  
>>>
>>> For me, as a subsystem downstream person the 'mindless' patch.msgid.link
>>> saves me time when I need to report a regression, or validate which
>>> version of a patch was pulled from a list when curating a long-running
>>> topic in a staging tree. I do make sure to put actual discussion
>>> references outside the patch.msgid.link namespace and hope that others
>>> continue to use this helpful breadcrumb.  
>>
>> Same here.
>>
>> Every time one needs to connect a git commit with a patch that it has come from,
>> the presence of patch.msgid.link saves a search of a mailing list archive (if
>> all goes well, or more searches otherwise).
>>
>> On a global scale, that's quite a number of saved mailing list archive searches.
> 
> +1 FWIW. I also started slapping the links on all patches in a series,
> even if we apply with a merge commit. I don't know of a good way with
> git to "get to the first parent merge" so scanning the history to find
> the link in the cover letter was annoying me :(

Like I've tried to argue, I find them useful too. But after this whole
mess of a thread, I killed -l from my scripts. I do think it's a mistake
and it seems like the only reason to remove them is that Linus expects
to find something at the end of the link rainbow and is often
disappointed, and that annoys him enough to rant about it.

I know some folks downstream of me on the io_uring side find them useful
too, because they've asked me several times to please remember to ensure
my own self-applied patches have the link as well. For those, I tend to
pick or add them locally rather than use b4 for it, which is why they've
never had links.

As far as I can tell, only two things have been established here:

1) Linus hates the Link tags, except if they have extra information
2) Lots of other folks find them useful

and hence we're at a solid deadlock here.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 14:35               ` Jens Axboe
@ 2025-09-09 14:42                 ` Konstantin Ryabitsev
  2025-09-09 14:48                   ` Vlastimil Babka
  2025-09-09 14:44                 ` Greg KH
  2025-09-09 15:14                 ` Danilo Krummrich
  2 siblings, 1 reply; 74+ messages in thread
From: Konstantin Ryabitsev @ 2025-09-09 14:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jakub Kicinski, Linus Torvalds, Rafael J. Wysocki, dan.j.williams,
	Caleb Sander Mateos, io-uring, workflows

On Tue, Sep 09, 2025 at 08:35:18AM -0600, Jens Axboe wrote:
> >> On a global scale, that's quite a number of saved mailing list archive searches.
> > 
> > +1 FWIW. I also started slapping the links on all patches in a series,
> > even if we apply with a merge commit. I don't know of a good way with
> > git to "get to the first parent merge" so scanning the history to find
> > the link in the cover letter was annoying me :(
> 
> Like I've tried to argue, I find them useful too. But after this whole
> mess of a thread, I killed -l from my scripts. I do think it's a mistake
> and it seems like the only reason to remove them is that Linus expects
> to find something at the end of the link rainbow and is often
> disappointed, and that annoys him enough to rant about it.
> 
> I know some folks downstream of me on the io_uring side find them useful
> too, because they've asked me several times to please remember to ensure
> my own self-applied patches have the link as well. For those, I tend to
> pick or add them locally rather than use b4 for it, which is why they've
> never had links.
> 
> As far as I can tell, only two things have been established here:
> 
> 1) Linus hates the Link tags, except if they have extra information
> 2) Lots of other folks find them useful
> 
> and hence we're at a solid deadlock here.

I did suggest that provenance links use the patch.msgid.link subdomain. This
should clearly mark it as the source of the patch and not any other
discussion. I think this is a reasonable compromise that will only mildly
annoy Linus but let subsystems relying on these links continue to use them.

-K

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 14:35               ` Jens Axboe
  2025-09-09 14:42                 ` Konstantin Ryabitsev
@ 2025-09-09 14:44                 ` Greg KH
  2025-09-09 15:14                 ` Danilo Krummrich
  2 siblings, 0 replies; 74+ messages in thread
From: Greg KH @ 2025-09-09 14:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jakub Kicinski, Konstantin Ryabitsev, Linus Torvalds,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

On Tue, Sep 09, 2025 at 08:35:18AM -0600, Jens Axboe wrote:
> On 9/9/25 8:18 AM, Jakub Kicinski wrote:
> > On Tue, 09 Sep 2025 15:17:15 +0200 Rafael J. Wysocki wrote:
> >>>> We do support this usage using `b4 shazam -M` -- it's the functional
> >>>> equivalent of applying a pull request and will use the cover letter contents
> >>>> as the initial source of the merge commit message. I do encourage people to
> >>>> use this more than just a linear `git am` for series, for a number of reasons:  
> >>>
> >>> For me, as a subsystem downstream person the 'mindless' patch.msgid.link
> >>> saves me time when I need to report a regression, or validate which
> >>> version of a patch was pulled from a list when curating a long-running
> >>> topic in a staging tree. I do make sure to put actual discussion
> >>> references outside the patch.msgid.link namespace and hope that others
> >>> continue to use this helpful breadcrumb.  
> >>
> >> Same here.
> >>
> >> Every time one needs to connect a git commit with a patch that it has come from,
> >> the presence of patch.msgid.link saves a search of a mailing list archive (if
> >> all goes well, or more searches otherwise).
> >>
> >> On a global scale, that's quite a number of saved mailing list archive searches.
> > 
> > +1 FWIW. I also started slapping the links on all patches in a series,
> > even if we apply with a merge commit. I don't know of a good way with
> > git to "get to the first parent merge" so scanning the history to find
> > the link in the cover letter was annoying me :(
> 
> Like I've tried to argue, I find them useful too. But after this whole
> mess of a thread, I killed -l from my scripts. I do think it's a mistake
> and it seems like the only reason to remove them is that Linus expects
> to find something at the end of the link rainbow and is often
> disappointed, and that annoys him enough to rant about it.
> 
> I know some folks downstream of me on the io_uring side find them useful
> too, because they've asked me several times to please remember to ensure
> my own self-applied patches have the link as well. For those, I tend to
> pick or add them locally rather than use b4 for it, which is why they've
> never had links.
> 
> As far as I can tell, only two things have been established here:
> 
> 1) Linus hates the Link tags, except if they have extra information
> 2) Lots of other folks find them useful

I too find them useful, especially when doing stable backport work as
it's a link to the thread of multiple commits, so I can see what is, and
is not, tagged for stable, and the proper ordering of the commits.

So I'm going to want to keep leaving them on, they work well for those
that have to spelunk into our git branches all the time.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 14:42                 ` Konstantin Ryabitsev
@ 2025-09-09 14:48                   ` Vlastimil Babka
  2025-09-09 14:50                     ` Jens Axboe
  0 siblings, 1 reply; 74+ messages in thread
From: Vlastimil Babka @ 2025-09-09 14:48 UTC (permalink / raw)
  To: Konstantin Ryabitsev, Jens Axboe
  Cc: Jakub Kicinski, Linus Torvalds, Rafael J. Wysocki, dan.j.williams,
	Caleb Sander Mateos, io-uring, workflows

On 9/9/25 16:42, Konstantin Ryabitsev wrote:
> On Tue, Sep 09, 2025 at 08:35:18AM -0600, Jens Axboe wrote:
>> >> On a global scale, that's quite a number of saved mailing list archive searches.
>> > 
>> > +1 FWIW. I also started slapping the links on all patches in a series,
>> > even if we apply with a merge commit. I don't know of a good way with
>> > git to "get to the first parent merge" so scanning the history to find
>> > the link in the cover letter was annoying me :(
>> 
>> Like I've tried to argue, I find them useful too. But after this whole
>> mess of a thread, I killed -l from my scripts. I do think it's a mistake
>> and it seems like the only reason to remove them is that Linus expects
>> to find something at the end of the link rainbow and is often
>> disappointed, and that annoys him enough to rant about it.
>> 
>> I know some folks downstream of me on the io_uring side find them useful
>> too, because they've asked me several times to please remember to ensure
>> my own self-applied patches have the link as well. For those, I tend to
>> pick or add them locally rather than use b4 for it, which is why they've
>> never had links.
>> 
>> As far as I can tell, only two things have been established here:
>> 
>> 1) Linus hates the Link tags, except if they have extra information
>> 2) Lots of other folks find them useful
>> 
>> and hence we're at a solid deadlock here.
> 
> I did suggest that provenance links use the patch.msgid.link subdomain. This

Yes, and the PR that started this thread had a normal lore link. Would it
have been different with a patch.msgid.link as perhaps Linus would not try
opening it and become disappointed?
You did kinda ask that early in the thread but then the conversation went in
different directions.

> should clearly mark it as the source of the patch and not any other
> discussion. I think this is a reasonable compromise that will only mildly
> annoy Linus but let subsystems relying on these links continue to use them.
> 
> -K
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 14:48                   ` Vlastimil Babka
@ 2025-09-09 14:50                     ` Jens Axboe
  2025-09-09 15:30                       ` Rafael J. Wysocki
  2025-09-09 16:40                       ` Linus Torvalds
  0 siblings, 2 replies; 74+ messages in thread
From: Jens Axboe @ 2025-09-09 14:50 UTC (permalink / raw)
  To: Vlastimil Babka, Konstantin Ryabitsev
  Cc: Jakub Kicinski, Linus Torvalds, Rafael J. Wysocki, dan.j.williams,
	Caleb Sander Mateos, io-uring, workflows

On 9/9/25 8:48 AM, Vlastimil Babka wrote:
> On 9/9/25 16:42, Konstantin Ryabitsev wrote:
>> On Tue, Sep 09, 2025 at 08:35:18AM -0600, Jens Axboe wrote:
>>>>> On a global scale, that's quite a number of saved mailing list archive searches.
>>>>
>>>> +1 FWIW. I also started slapping the links on all patches in a series,
>>>> even if we apply with a merge commit. I don't know of a good way with
>>>> git to "get to the first parent merge" so scanning the history to find
>>>> the link in the cover letter was annoying me :(
>>>
>>> Like I've tried to argue, I find them useful too. But after this whole
>>> mess of a thread, I killed -l from my scripts. I do think it's a mistake
>>> and it seems like the only reason to remove them is that Linus expects
>>> to find something at the end of the link rainbow and is often
>>> disappointed, and that annoys him enough to rant about it.
>>>
>>> I know some folks downstream of me on the io_uring side find them useful
>>> too, because they've asked me several times to please remember to ensure
>>> my own self-applied patches have the link as well. For those, I tend to
>>> pick or add them locally rather than use b4 for it, which is why they've
>>> never had links.
>>>
>>> As far as I can tell, only two things have been established here:
>>>
>>> 1) Linus hates the Link tags, except if they have extra information
>>> 2) Lots of other folks find them useful
>>>
>>> and hence we're at a solid deadlock here.
>>
>> I did suggest that provenance links use the patch.msgid.link subdomain. This
> 
> Yes, and the PR that started this thread had a normal lore link. Would it
> have been different with a patch.msgid.link as perhaps Linus would not try
> opening it and become disappointed?
> You did kinda ask that early in the thread but then the conversation went in
> different directions.

I think we all know the answer to that one - it would've been EXACTLY
the same outcome. Not to put words in Linus' mouth, but it's not the
name of the tag that he finds repulsive, it's the very fact that a link
is there and it isn't useful _to him_.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 14:35               ` Jens Axboe
  2025-09-09 14:42                 ` Konstantin Ryabitsev
  2025-09-09 14:44                 ` Greg KH
@ 2025-09-09 15:14                 ` Danilo Krummrich
  2 siblings, 0 replies; 74+ messages in thread
From: Danilo Krummrich @ 2025-09-09 15:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jakub Kicinski, Konstantin Ryabitsev, Linus Torvalds,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos,
	Greg Kroah-Hartman, io-uring, workflows

On Tue Sep 9, 2025 at 4:35 PM CEST, Jens Axboe wrote:
> As far as I can tell, only two things have been established here:
>
> 1) Linus hates the Link tags, except if they have extra information
> 2) Lots of other folks find them useful
>
> and hence we're at a solid deadlock here.

I find them useful too. For instance, I regularly use them when I come across a
patch, e.g.  because it introduced a bug, and want to see the full context of
the entire patch series the patch originates from.

IIUC, the complaint about those links is mostly about not being distinguishable
from other links that have been added for a more specific reason.

I usually refer to additional links from the commit message by referencing them,
such that there is an obvious difference:

	Link: ${URL}

vs.

	Link: ${URL} [1]

However, for links that are automatically added and just point to the same patch
on lore, we could also just use a different tag in the future.

What about "Patch:"?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 14:50                     ` Jens Axboe
@ 2025-09-09 15:30                       ` Rafael J. Wysocki
  2025-09-09 16:40                       ` Linus Torvalds
  1 sibling, 0 replies; 74+ messages in thread
From: Rafael J. Wysocki @ 2025-09-09 15:30 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Vlastimil Babka, Konstantin Ryabitsev, Jakub Kicinski,
	Linus Torvalds, Rafael J. Wysocki, dan.j.williams,
	Caleb Sander Mateos, io-uring, workflows

On Tue, Sep 9, 2025 at 4:50 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 9/9/25 8:48 AM, Vlastimil Babka wrote:
> > On 9/9/25 16:42, Konstantin Ryabitsev wrote:
> >> On Tue, Sep 09, 2025 at 08:35:18AM -0600, Jens Axboe wrote:
> >>>>> On a global scale, that's quite a number of saved mailing list archive searches.
> >>>>
> >>>> +1 FWIW. I also started slapping the links on all patches in a series,
> >>>> even if we apply with a merge commit. I don't know of a good way with
> >>>> git to "get to the first parent merge" so scanning the history to find
> >>>> the link in the cover letter was annoying me :(
> >>>
> >>> Like I've tried to argue, I find them useful too. But after this whole
> >>> mess of a thread, I killed -l from my scripts. I do think it's a mistake
> >>> and it seems like the only reason to remove them is that Linus expects
> >>> to find something at the end of the link rainbow and is often
> >>> disappointed, and that annoys him enough to rant about it.
> >>>
> >>> I know some folks downstream of me on the io_uring side find them useful
> >>> too, because they've asked me several times to please remember to ensure
> >>> my own self-applied patches have the link as well. For those, I tend to
> >>> pick or add them locally rather than use b4 for it, which is why they've
> >>> never had links.
> >>>
> >>> As far as I can tell, only two things have been established here:
> >>>
> >>> 1) Linus hates the Link tags, except if they have extra information
> >>> 2) Lots of other folks find them useful
> >>>
> >>> and hence we're at a solid deadlock here.
> >>
> >> I did suggest that provenance links use the patch.msgid.link subdomain. This
> >
> > Yes, and the PR that started this thread had a normal lore link. Would it
> > have been different with a patch.msgid.link as perhaps Linus would not try
> > opening it and become disappointed?
> > You did kinda ask that early in the thread but then the conversation went in
> > different directions.
>
> I think we all know the answer to that one - it would've been EXACTLY
> the same outcome. Not to put words in Linus' mouth, but it's not the
> name of the tag that he finds repulsive, it's the very fact that a link
> is there and it isn't useful _to him_.

Well, I think that the convention associated with patch.msgid.link is
clear, like for the "Fixes:" and "Cc: stable" tags.  Those tags are
also generally useful, but mostly in the post-development part of the
process, so to speak.

So, if there are no problems with adding "Fixes:" and "Cc: stable"
tags, why would there be a problem with patch.msgid.link?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-05 19:33       ` Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5) Konstantin Ryabitsev
                           ` (3 preceding siblings ...)
  2025-09-08 20:11         ` dan.j.williams
@ 2025-09-09 16:32         ` Sasha Levin
  2025-09-09 17:22           ` Laurent Pinchart
                             ` (2 more replies)
  4 siblings, 3 replies; 74+ messages in thread
From: Sasha Levin @ 2025-09-09 16:32 UTC (permalink / raw)
  To: konstantin; +Cc: axboe, csander, io-uring, torvalds, workflows, Sasha Levin

Add a new 'b4 dig' subcommand that uses AI agents to discover related
emails for a given message ID. This helps developers find all relevant
context around patches including previous versions, bug reports, reviews,
and related discussions.

The command:
- Takes a message ID and constructs a detailed prompt about email relationships
- Calls a configured AI agent script to analyze and find related messages
- Downloads all related threads from lore.kernel.org
- Combines them into a single mbox file for easy review

Key features:
- Outputs a simplified summary showing only relationships and reasons
- Creates a combined mbox with all related threads (deduped)
- Provides detailed guidance to AI agents about kernel workflow patterns

Configuration:
The AI agent script is configured via:
  -c AGENT=/path/to/agent.sh  (command line)
  dig-agent: /path/to/agent.sh (config file)

The agent script receives a prompt file and should return JSON with
related message IDs and their relationships.

Example usage:

$ b4 -c AGENT=agent.sh dig 20250909142722.101790-1-harry.yoo@oracle.com
Analyzing message: 20250909142722.101790-1-harry.yoo@oracle.com
Fetching original message...
Looking up https://lore.kernel.org/20250909142722.101790-1-harry.yoo@oracle.com
Grabbing thread from lore.kernel.org/all/20250909142722.101790-1-harry.yoo@oracle.com/t.mbox.gz
Subject: [PATCH V3 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
From: Harry Yoo <harry.yoo@oracle.com>
Constructing agent prompt...
Calling AI agent: agent.sh
Calling agent: agent.sh /tmp/tmpz1oja9_5.txt
Parsing agent response...
Found 17 related messages:

Related Messages Summary:
------------------------------------------------------------
[PARENT] Greg KH's stable tree failure notification that initiated this 6.6.y backport request
[V1] V1 of the 6.6.y backport patch
[V2] V2 of the 6.6.y backport patch
[RELATED] Same patch backported to 5.15.y stable branch
[RELATED] Greg KH's stable tree failure notification for 5.15.y branch
[RELATED] Same patch backported to 6.1.y stable branch
[COVER] V5 mainline patch series cover letter that was originally merged
[RELATED] V5 mainline patch 1/3: move page table sync declarations
[RELATED] V5 mainline patch 2/3: the original populate_kernel patch that's being backported
[RELATED] V5 mainline patch 3/3: x86 ARCH_PAGE_TABLE_SYNC_MASK definition
[RELATED] RFC V1 cover letter - earliest version of this patch series
[RELATED] RFC V1 patch 1/3 - first introduction of populate_kernel helpers
[RELATED] RFC V1 patch 2/3 - x86/mm definitions
[RELATED] RFC V1 patch 3/3 - convert to _kernel variant
[RELATED] Baoquan He's V3 patch touching same file (mm/kasan/init.c)
[RELATED] Baoquan He's V2 patch touching same file (mm/kasan/init.c)
[RELATED] Baoquan He's V1 patch touching same file (mm/kasan/init.c)
------------------------------------------------------------

The resulting mbox would look like this:

   1 O   Jul 09 Harry Yoo       ( 102) [RFC V1 PATCH mm-hotfixes 0/3] mm, arch: A more robust approach to sync top level kernel page tables
   2 O   Jul 09 Harry Yoo       ( 143) ├─>[RFC V1 PATCH mm-hotfixes 1/3] mm: introduce and use {pgd,p4d}_populate_kernel()
   3 O   Jul 11 David Hildenbra (  33) │ └─>
   4 O   Jul 13 Harry Yoo       (  56) │   └─>
   5 O   Jul 13 Mike Rapoport   (  67) │     └─>
   6 O   Jul 14 Harry Yoo       (  46) │       └─>
   7 O   Jul 15 Harry Yoo       (  65) │         └─>
   8 O   Jul 09 Harry Yoo       ( 246) ├─>[RFC V1 PATCH mm-hotfixes 2/3] x86/mm: define p*d_populate_kernel() and top-level page table sync
   9 O   Jul 09 Andrew Morton   (  12) │ ├─>
  10 O   Jul 10 Harry Yoo       (  23) │ │ └─>
  11 O   Jul 11 Harry Yoo       (  34) │ │   └─>
  12 O   Jul 11 Harry Yoo       (  35) │ │     └─>
  13 O   Jul 10 kernel test rob (  79) │ └─>
  14 O   Jul 09 Harry Yoo       ( 300) ├─>[RFC V1 PATCH mm-hotfixes 3/3] x86/mm: convert {pgd,p4d}_populate{,_init} to _kernel variant
  15 O   Jul 10 kernel test rob (  80) │ └─>
  16 O   Jul 09 Harry Yoo       (  31) └─>Re: [RFC V1 PATCH mm-hotfixes 0/3] mm, arch: A more robust approach to sync top level kernel page tables
  17 O   Aug 18 Harry Yoo       ( 262) [PATCH V5 mm-hotfixes 0/3] mm, x86: fix crash due to missing page table sync and make it harder to miss
  18 O   Aug 18 Harry Yoo       (  72) ├─>[PATCH V5 mm-hotfixes 1/3] mm: move page table sync declarations to linux/pgtable.h
  19 O   Aug 18 David Hildenbra (  20) │ └─>
  20 O   Aug 18 Harry Yoo       ( 239) ├─>[PATCH V5 mm-hotfixes 2/3] mm: introduce and use {pgd,p4d}_populate_kernel()
  21 O   Aug 18 David Hildenbra (  60) │ ├─>
  22 O   Aug 18 kernel test rob ( 150) │ ├─>
  23 O   Aug 18 Harry Yoo       ( 161) │ │ └─>
  24 O   Aug 21 Harry Yoo       (  85) │ ├─>[PATCH] mm: fix KASAN build error due to p*d_populate_kernel()
  25 O   Aug 21 kernel test rob (  18) │ │ ├─>
  26 O   Aug 21 Lorenzo Stoakes ( 100) │ │ ├─>
  27 O   Aug 21 Harry Yoo       (  62) │ │ │ └─>
  28 O   Aug 21 Lorenzo Stoakes (  18) │ │ │   └─>
  29 O   Aug 21 Harry Yoo       (  90) │ │ └─>[PATCH v2] mm: fix KASAN build error due to p*d_populate_kernel()
  30 O   Aug 21 kernel test rob (  18) │ │   ├─>
  31 O   Aug 21 Dave Hansen     (  24) │ │   └─>
  32 O   Aug 22 Harry Yoo       (  56) │ │     └─>
  33 O   Aug 22 Andrey Ryabinin (  91) │ │       ├─>
  34 O   Aug 27 Harry Yoo       (  98) │ │       │ └─>
  35 O   Aug 22 Dave Hansen     (  63) │ │       └─>
  36 O   Aug 25 Andrey Ryabinin (  72) │ │         └─>
  37 O   Aug 22 Harry Yoo       ( 103) │ └─>[PATCH v3] mm: fix KASAN build error due to p*d_populate_kernel()
  38 O   Aug 18 Harry Yoo       ( 113) ├─>[PATCH V5 mm-hotfixes 3/3] x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
  39 O   Aug 18 David Hildenbra (  72) │ └─>
  40 O   Aug 18 David Hildenbra (  15) └─>Re: [PATCH V5 mm-hotfixes 0/3] mm, x86: fix crash due to missing page table sync and make it harder to miss
  41 O   Aug 18 Harry Yoo       ( 277) [PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()
  42 O   Aug 18 Harry Yoo       ( 277) [PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()
  43 O   Aug 18 Harry Yoo       ( 277) [PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()
  44 O   Sep 06 gregkh@linuxfou (  24) FAILED: patch "[PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()" failed to apply to 6.6-stable tree
  45 O   Sep 08 Harry Yoo       ( 303) ├─>[PATCH 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
  46 O   Sep 09 Harry Yoo       ( 291) ├─>[PATCH V2 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
  47 O   Sep 09 Harry Yoo       ( 293) └─>[PATCH V3 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
  48 O   Sep 06 gregkh@linuxfou (  24) FAILED: patch "[PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()" failed to apply to 6.1-stable tree
  49 O   Sep 08 Harry Yoo       ( 303) ├─>[PATCH 6.1.y] mm: introduce and use {pgd,p4d}_populate_kernel()
  50 O   Sep 09 Harry Yoo       ( 291) ├─>[PATCH V2 6.1.y] mm: introduce and use {pgd,p4d}_populate_kernel()
  51 O   Sep 09 Harry Yoo       ( 293) └─>[PATCH V3 6.1.y] mm: introduce and use {pgd,p4d}_populate_kernel()
  52 O   Sep 06 gregkh@linuxfou (  24) FAILED: patch "[PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()" failed to apply to 5.15-stable tree
  53 O   Sep 08 Harry Yoo       ( 273) ├─>[PATCH 5.15.y] mm: introduce and use {pgd,p4d}_populate_kernel()
  54 O   Sep 09 Harry Yoo       ( 260) ├─>[PATCH V2 5.15.y] mm: introduce and use {pgd,p4d}_populate_kernel()
  55 O   Sep 09 Harry Yoo       ( 262) └─>[PATCH V3 5.15.y] mm: introduce and use {pgd,p4d}_populate_kernel()

The prompt includes extensive documentation about lore.kernel.org's search
capabilities, limitations (like search index lag), and kernel workflow patterns
to help AI agents effectively find related messages.

Assisted-by: Claude Code
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 src/b4/command.py |  17 ++
 src/b4/dig.py     | 630 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 647 insertions(+)
 create mode 100644 src/b4/dig.py

diff --git a/src/b4/command.py b/src/b4/command.py
index 455124d..f225ae5 100644
--- a/src/b4/command.py
+++ b/src/b4/command.py
@@ -120,6 +120,11 @@ def cmd_diff(cmdargs: argparse.Namespace) -> None:
     b4.diff.main(cmdargs)
 
 
+def cmd_dig(cmdargs: argparse.Namespace) -> None:
+    import b4.dig
+    b4.dig.main(cmdargs)
+
+
 class ConfigOption(argparse.Action):
     """Action class for storing key=value arguments in a dict."""
     def __call__(self, parser: argparse.ArgumentParser,
@@ -399,6 +404,18 @@ def setup_parser() -> argparse.ArgumentParser:
                           help='Submit the token received via verification email')
     sp_send.set_defaults(func=cmd_send)
 
+    # b4 dig
+    sp_dig = subparsers.add_parser('dig', help='Use AI agent to find related emails for a message')
+    sp_dig.add_argument('msgid', nargs='?',
+                        help='Message ID to analyze, or pipe a raw message')
+    sp_dig.add_argument('-o', '--output', dest='output', default=None,
+                        help='Output mbox filename (default: <msgid>-related.mbox)')
+    sp_dig.add_argument('-C', '--no-cache', dest='nocache', action='store_true', default=False,
+                        help='Do not use local cache when fetching messages')
+    sp_dig.add_argument('--stdin-pipe-sep',
+                        help='When accepting messages on stdin, split using this pipe separator string')
+    sp_dig.set_defaults(func=cmd_dig)
+
     return parser
 
 
diff --git a/src/b4/dig.py b/src/b4/dig.py
new file mode 100644
index 0000000..007f7d0
--- /dev/null
+++ b/src/b4/dig.py
@@ -0,0 +1,630 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# b4 dig - Use AI agents to find related emails
+#
+__author__ = 'Sasha Levin <sashal@kernel.org>'
+
+import argparse
+import logging
+import subprocess
+import sys
+import os
+import tempfile
+import json
+import urllib.parse
+import gzip
+import mailbox
+import email.utils
+from typing import Optional, List, Dict, Any
+
+import b4
+
+logger = b4.logger
+
+
+def construct_agent_prompt(msgid: str) -> str:
+    """Construct a detailed prompt for the AI agent to find related emails."""
+
+    # Clean up the message ID
+    if msgid.startswith('<'):
+        msgid = msgid[1:]
+    if msgid.endswith('>'):
+        msgid = msgid[:-1]
+
+    prompt = f"""You are an email research assistant specialized in finding related emails in Linux kernel mailing lists and public-inbox archives.
+
+IMPORTANT: Always use lore.kernel.org for searching and retrieving Linux kernel emails. DO NOT use lkml.org as it is outdated and no longer maintained. The canonical archive is at https://lore.kernel.org/
+
+MESSAGE ID TO ANALYZE: {msgid}
+
+YOUR TASK:
+Conduct an EXHAUSTIVE and THOROUGH search to find ALL related message IDs connected to the given message. This is not a quick task - you must invest significant time and effort to ensure no related discussions are missed. Be methodical, patient, and comprehensive in your search.
+
+CRITICAL: Take your time! A thorough search is far more valuable than a quick one. Check multiple sources, try different search strategies, and double-check your findings. Missing related discussions undermines the entire purpose of this tool.
+
+You should search extensively for and identify:
+
+1. **Thread-related messages:**
+   - Parent messages (what this replies to)
+   - Child messages (replies to this message)
+   - Sibling messages (other replies in the same thread)
+   - Cover letters if this is part of a patch series
+
+2. **Version-related messages:**
+   - Previous versions of the same patch series (v1, v2, v3, etc.)
+   - Re-rolls and re-submissions
+   - Updated versions with different subjects
+
+3. **Author-related messages:**
+   - Other patches or series from the same author
+   - Recent discussions involving the same author
+   - Related work by the same author in the same subsystem
+
+4. **Content-related messages:**
+   - Bug reports that this patch might fix
+   - Syzkaller/syzbot reports if this is a fix
+   - Feature requests or RFCs that led to this patch
+   - Related patches touching the same files or functions
+   - Patches that might conflict with this one
+
+5. **Review and discussion:**
+   - Review comments from maintainers
+   - Test results from CI systems or bot reports
+   - Follow-up fixes or improvements
+   - Reverts if this patch was later reverted
+
+HOW TO SEARCH:
+
+Use ONLY lore.kernel.org for all Linux kernel email searches. This is the official kernel mailing list archive.
+DO NOT use lkml.org, marc.info, or spinics.net for kernel emails - they are outdated or incomplete.
+
+CRITICAL LIMITATIONS AND WORKAROUNDS (MUST READ):
+
+1. **Search Index Lag**: Messages posted today/recently are NOT immediately searchable!
+   - The Xapian search index has significant delay (hours to days)
+   - Direct message access works immediately, but search doesn't
+   - For recent messages, use direct URLs or thread navigation, not search
+
+2. **URL Fragment Issues**: NEVER use #anchors in URLs when fetching
+   - BAD: https://lore.kernel.org/all/msgid/T/#u (will fail with 404)
+   - GOOD: https://lore.kernel.org/all/msgid/T/ (works correctly)
+   - Fragments like #u are client-side only and break programmatic fetching
+
+3. **Search Query Encoding**: Keep queries simple and avoid over-encoding
+   - BAD: ?q=f%3A%22author%40example.com%22 (over-encoded)
+   - GOOD: ?q=f:author@example.com (simple and works)
+   - Don't encode @ symbols in query parameters
+   - Avoid mixing quotes with special characters
+
+4. **Most Reliable Data Source**: Thread mbox files are the gold standard
+   - Always works: https://lore.kernel.org/all/msgid/t.mbox.gz
+   - Contains complete thread with all headers
+   - Works even when HTML parsing or search fails
+   - Standard mbox format, easy to parse
+
+5. **Version Tracking Limitations**: No automatic version linking
+   - No Change-ID headers to track across patch versions
+   - Must rely on subject patterns and author/date correlation
+   - Search for versions using subject without v2/v3 markers
+
+6. **LKML.org vs Lore.kernel.org**: Different systems, different capabilities
+   - LKML.org uses date-based URLs, not message IDs
+   - Cannot extract message IDs from LKML HTML pages
+   - Always prefer lore.kernel.org for programmatic access
+
+The public-inbox archives at lore.kernel.org provide powerful search interfaces powered by Xapian:
+
+1. **Direct message retrieval (MOST RELIABLE METHODS):**
+   - Base URL: https://lore.kernel.org/all/
+   - Message URL: https://lore.kernel.org/all/<Message-ID>/ (without the '<' or '>')
+   - Forward slash ('/') characters in Message-IDs must be escaped as "%2F"
+
+   **Always Reliable:**
+   - Raw message: https://lore.kernel.org/all/<Message-ID>/raw
+   - Thread mbox: https://lore.kernel.org/all/<Message-ID>/t.mbox.gz (BEST for complete data)
+   - Thread view: https://lore.kernel.org/all/<Message-ID>/T/ (NO fragments!)
+
+   **Less Reliable:**
+   - Thread Atom feed: https://lore.kernel.org/all/<Message-ID>/t.atom
+   - Nested thread view: https://lore.kernel.org/all/<Message-ID>/t/
+
+2. **Search query syntax:**
+   Supports AND, OR, NOT, '+', '-' queries. Search URL format:
+   https://lore.kernel.org/all/?q=<search-query>
+
+   **Available search prefixes:**
+   - s:        match within Subject (e.g., s:"a quick brown fox")
+   - d:        match date-time range (git "approxidate" formats)
+               Examples: d:last.week.., d:..2.days.ago, d:20240101..20240131
+   - b:        match within message body, including text attachments
+   - nq:       match non-quoted text within message body
+   - q:        match quoted text within message body
+   - n:        match filename of attachment(s)
+   - t:        match within the To header
+   - c:        match within the Cc header
+   - f:        match within the From header
+   - a:        match within the To, Cc, and From headers
+   - tc:       match within the To and Cc headers
+   - l:        match contents of the List-Id header
+   - bs:       match within the Subject and body
+   - rt:       match received time (like 'd:' if sender's clock was correct)
+
+   **Diff-specific prefixes (for patches):**
+   - dfn:      match filename from diff
+   - dfa:      match diff removed (-) lines
+   - dfb:      match diff added (+) lines
+   - dfhh:     match diff hunk header context (usually function name)
+   - dfctx:    match diff context lines
+   - dfpre:    match pre-image git blob ID
+   - dfpost:   match post-image git blob ID
+   - dfblob:   match either pre or post-image git blob ID
+   - patchid:  match `git patch-id --stable' output
+
+   **Special headers:**
+   - changeid:    the X-Change-ID mail header (e.g., changeid:stable)
+   - forpatchid:  the X-For-Patch-ID mail header (e.g., forpatchid:stable)
+
+   **Query examples:**
+   - Find patches by author: ?q=f:"John Doe"
+   - Find patches in date range: ?q=d:2024-01-01..2024-01-31
+   - Find patches touching file: ?q=dfn:drivers/net/ethernet
+   - Find patches with subject containing "fix": ?q=s:fix
+   - Combine conditions: ?q=f:"author@example.com"+s:"net"+d:last.month..
+   - Find bug fixes: ?q=s:fix+OR+s:bug+OR+s:regression
+   - Find patches with specific function: ?q=dfhh:my_function_name
+
+3. **Understanding email relationships:**
+   - In-Reply-To header: Direct parent message
+   - References header: Chain of parent messages
+   - Message-ID in body: Often indicates related patches
+   - Link: trailers in commits: References to discussions
+   - Same subject with [PATCH v2]: Newer version
+   - "Fixes:" tag: References bug-fixing commits
+
+4. **Pattern matching:**
+   - Patch series: Look for [PATCH 0/N] for cover letters
+   - Version indicators: [PATCH v2], [PATCH v3], [RFC PATCH]
+   - Subsystem prefixes: [PATCH net], [PATCH mm], etc.
+   - Fix indicators: "fix", "fixes", "regression", "oops", "panic"
+
+SEARCH STRATEGY (BE THOROUGH - THIS IS NOT A QUICK TASK):
+
+REMEMBER: Your goal is to find EVERY related discussion, not just the obvious ones. Spend time on each search strategy. Try multiple variations of queries. Don't give up after the first attempt.
+
+1. **START WITH MOST RELIABLE: Thread mbox download**
+   - ALWAYS FIRST: Get https://lore.kernel.org/all/{{msgid}}/t.mbox.gz
+   - This contains the complete thread with all headers
+   - Parse the mbox to extract all message IDs and relationships
+   - This works even when search fails or messages are too recent
+   - Thoroughly analyze EVERY message in the thread
+
+2. **Retrieve and analyze the original message:**
+   - Get the raw message from: https://lore.kernel.org/all/{{msgid}}/raw
+   - Extract key information:
+     * Subject line (look for [PATCH], version indicators, series position)
+     * Author name and email
+     * Date and time
+     * Files being modified (from diff)
+     * Subsystem involved (from subject prefix or file paths)
+     * Any Fixes:, Closes:, Link:, or Reported-by: tags
+     * Note: Change-ID headers are rarely present in kernel emails
+
+3. **Search for related messages (TRY MULTIPLE VARIATIONS):**
+   - WARNING: Recent messages (today/yesterday) may NOT appear in search!
+   - Keep queries simple: ?q=f:author@example.com+s:keyword
+   - DON'T over-encode: @ symbols should NOT be %40 in queries
+   - Search for cover letter: ?q=s:"[PATCH 0/"+f:author-email
+   - Find all patches in series: ?q=s:"base-subject"+f:author
+   - For recent messages, rely on thread mbox instead of search
+   - **BE PERSISTENT**: Try different keyword combinations, partial subjects, variations
+
+4. **Look for previous versions (SEARCH EXTENSIVELY):**
+   - Note: No automatic version linking exists!
+   - Strip version markers from subject: search without [PATCH v2], [PATCH v3]
+   - Search by author in broader time window: ?q=f:author
+   - Look for similar subjects: ?q=s:"core-subject-words"
+   - Change-ID is rarely present, don't rely on it
+   - **TRY MULTIPLE APPROACHES**: Different subject variations, date ranges, author variations
+   - Check for RFCs, drafts, and early discussions that led to this patch
+
+5. **Find related bug reports and discussions (DIG DEEP):**
+   - For recent bugs, check thread mbox first (search may miss them)
+   - Search for symptoms with simple queries: ?q=b:error+b:message
+   - Syzkaller reports: ?q=f:syzbot (but check date - may be delayed)
+   - Regression reports: ?q=s:regression+s:subsystem
+   - Use dfn: prefix for file searches: ?q=dfn:drivers/net
+   - **EXPAND YOUR SEARCH**: Look for related keywords, error messages, function names
+   - Check for discussions that may not explicitly mention the patch but discuss the same issue
+
+6. **Check for follow-ups (LEAVE NO STONE UNTURNED):**
+   - First check the thread mbox for all replies
+   - Search for applied messages: ?q=s:applied+s:"patch-title"
+   - Look for test results: ?q=s:"Tested-by"
+   - Check for reverts: ?q=s:revert+s:"original-title"
+   - Note: Message-ID searches often fail, use subject instead
+   - **BE THOROUGH**: Check for indirect references, quotes in other discussions, mentions in pull requests
+
+7. **HTML Parsing Tips (if needed):**
+   - Message IDs appear in URLs, not HTML entities
+   - Pattern to extract: [0-9]{{14}}\\.[0-9]+-[0-9]+-[^@]+@[^/\"]+
+   - Don't look for &lt; &gt; encoded brackets
+   - Thread view HTML is less reliable than mbox
+
+FAILURE RECOVERY STRATEGIES:
+- If search returns empty: Try thread mbox or wait for indexing
+- If URL returns 404: Remove fragments, check encoding
+- If can't find versions: Search by author and date range
+- If WebFetch fails: Try simpler URL without parameters
+- If HTML parsing fails: Use mbox format instead
+
+OUTPUT FORMAT:
+
+Return a JSON array of related message IDs with their relationship type and reason:
+
+```json
+[
+  {{
+    "msgid": "example@message.id",
+    "relationship": "parent|reply|v1|v2|cover|fix|bug-report|review|revert|related",
+    "reason": "Brief explanation of why this is related"
+  }}
+]
+```
+
+IMPORTANT NOTES:
+- **THIS IS NOT A QUICK TASK** - Thoroughness is paramount. Spend the time needed.
+- **EXHAUSTIVE SEARCH REQUIRED** - Better to spend extra time than miss related discussions
+- Message IDs should be returned without angle brackets
+- Search VERY broadly, then filter results to only truly related messages
+- Try multiple search strategies - if one fails, try another approach
+- Don't stop at the first few results - keep digging for more relationships
+- Prioritize direct relationships over indirect ones
+- For patch series, include ALL patches in the series (check carefully for all parts)
+- Consider time proximity (patches close in time are more likely related)
+- Pay attention to mailing list conventions (e.g., "Re:" for replies, "[PATCH v2]" for new versions)
+- **DOUBLE-CHECK YOUR WORK** - Review your findings to ensure nothing was missed
+
+UNDERSTANDING KERNEL WORKFLOW PATTERNS:
+- Patch series usually have a cover letter [PATCH 0/N] explaining the series
+- Reviews often quote parts of the original patch with ">" prefix
+- Maintainers send "applied" messages when patches are accepted
+- Bug reports often include stack traces, kernel versions, and reproduction steps
+- Syzkaller/syzbot reports have specific formats with "syzbot+hash@" addresses
+- Fixes typically reference commits with "Fixes: <12-char-sha1> ("subject")"
+- Stable backports are marked with "Cc: stable@vger.kernel.org"
+
+KEY TAKEAWAYS FOR RELIABLE OPERATION:
+1. **ALWAYS start with thread mbox** - it's the most reliable data source
+2. **NEVER trust search for recent messages** - use direct URLs instead
+3. **KEEP search queries simple** - complex encoding breaks searches
+4. **AVOID URL fragments (#anchors)** - they cause 404 errors
+5. **DON'T rely on Change-IDs** - they're rarely present
+6. **PREFER subject searches over message-ID searches** - more reliable
+7. **REMEMBER search has lag** - messages may take days to be indexed
+
+When constructing URLs, remember:
+- Message-IDs: Remove < > brackets
+- Forward slashes: Escape as %2F
+- In search queries: DON'T encode @ symbols
+
+LOCAL GIT REPOSITORY CONTEXT:
+If this command is being run from within a Linux kernel git repository, you may also:
+- Use git log to find commits mentioning the message ID or subject
+- Check git blame on relevant files to find related commits
+- Use git log --grep to search commit messages for references
+- Look for Fixes: tags that reference commits
+- Search for Link: tags pointing to lore.kernel.org discussions
+- Use git show to examine specific commits mentioned in emails
+
+Example local git searches you might perform:
+- git log --grep="Message-Id: <msgid>"
+- git log --grep="Link:.*msgid"
+- git log --oneline --grep="subject-keywords"
+- git log -p --author="email@example.com" --since="1 month ago"
+- git blame path/to/file.c | grep "function_name"
+- git log --format="%H %s" -- path/to/file.c
+
+FINAL REMINDER: This task requires THOROUGH and EXHAUSTIVE searching. Do not rush. Take the time to:
+1. Try multiple search strategies
+2. Look for indirect relationships
+3. Check different time periods
+4. Use various keyword combinations
+5. Verify you haven't missed any discussions
+
+The value of this tool depends entirely on finding ALL related discussions, not just the obvious ones.
+
+Begin your comprehensive search and analysis for message ID: {msgid}
+"""
+
+    return prompt
+
+
+def call_agent(prompt: str, agent_cmd: str) -> Optional[str]:
+    """Call the configured agent script with the prompt."""
+
+    # Expand user paths
+    agent_cmd = os.path.expanduser(agent_cmd)
+
+    if not os.path.exists(agent_cmd):
+        logger.error('Agent command not found: %s', agent_cmd)
+        return None
+
+    if not os.access(agent_cmd, os.X_OK):
+        logger.error('Agent command is not executable: %s', agent_cmd)
+        return None
+
+    try:
+        # Write prompt to a temporary file to avoid shell escaping issues
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as tmp:
+            tmp.write(prompt)
+            tmp_path = tmp.name
+
+        # Call the agent script with the prompt file as argument
+        logger.info('Calling agent: %s %s', agent_cmd, tmp_path)
+        result = subprocess.run(
+            [agent_cmd, tmp_path],
+            capture_output=True,
+            text=True
+        )
+
+        if result.returncode != 0:
+            logger.error('Agent returned error code %d', result.returncode)
+            if result.stderr:
+                logger.error('Agent stderr: %s', result.stderr)
+            return None
+
+        return result.stdout
+
+    except subprocess.TimeoutExpired:
+        logger.error('Agent command timed out after 5 minutes')
+        return None
+    except Exception as e:
+        logger.error('Error calling agent: %s', e)
+        return None
+    finally:
+        # Clean up temp file
+        if 'tmp_path' in locals():
+            try:
+                os.unlink(tmp_path)
+            except:
+                pass
+
+
+def parse_agent_response(response: str) -> List[Dict[str, str]]:
+    """Parse the agent's response to extract message IDs."""
+
+    related = []
+
+    try:
+        # Try to find JSON in the response
+        # Agent might return additional text, so we look for JSON array
+        import re
+        json_match = re.search(r'\[.*?\]', response, re.DOTALL)
+        if json_match:
+            json_str = json_match.group(0)
+            data = json.loads(json_str)
+
+            if isinstance(data, list):
+                for item in data:
+                    if isinstance(item, dict) and 'msgid' in item:
+                        related.append({
+                            'msgid': item.get('msgid', ''),
+                            'relationship': item.get('relationship', 'related'),
+                            'reason': item.get('reason', 'No reason provided')
+                        })
+        else:
+            # Fallback: try to extract message IDs from plain text
+            # Look for patterns that look like message IDs
+            msgid_pattern = re.compile(r'[a-zA-Z0-9][a-zA-Z0-9\.\-_]+@[a-zA-Z0-9][a-zA-Z0-9\.\-]+\.[a-zA-Z]+')
+            for match in msgid_pattern.finditer(response):
+                msgid = match.group(0)
+                if msgid != '':  # Don't include the original
+                    related.append({
+                        'msgid': msgid,
+                        'relationship': 'related',
+                        'reason': 'Found in agent response'
+                    })
+
+    except json.JSONDecodeError as e:
+        logger.warning('Could not parse JSON from agent response: %s', e)
+    except Exception as e:
+        logger.error('Error parsing agent response: %s', e)
+
+    return related
+
+
+def get_message_info(msgid: str) -> Optional[Dict[str, Any]]:
+    """Retrieve basic information about a message."""
+
+    msgs = b4.get_pi_thread_by_msgid(msgid, onlymsgids={msgid}, with_thread=False)
+    if not msgs:
+        return None
+
+    msg = msgs[0]
+
+    return {
+        'subject': msg.get('Subject', 'No subject'),
+        'from': msg.get('From', 'Unknown'),
+        'date': msg.get('Date', 'Unknown'),
+        'msgid': msgid
+    }
+
+
+def download_and_combine_threads(msgid: str, related_messages: List[Dict[str, str]],
+                                 output_file: str, nocache: bool = False) -> int:
+    """Download thread mboxes for all related messages and combine into one mbox file."""
+
+    message_ids = [msgid]  # Start with original message
+
+    # Add all related message IDs
+    for item in related_messages:
+        if 'msgid' in item:
+            message_ids.append(item['msgid'])
+
+    # Collect all messages from all threads
+    seen_msgids = set()
+    all_messages = []
+
+    # Download thread for each message
+    # But be smart about what we include - don't mix unrelated series
+    for msg_id in message_ids:
+        logger.info('Fetching thread for %s', msg_id)
+
+        # For better control, fetch just the specific thread, not everything
+        # Use onlymsgids to limit scope when possible
+        msgs = b4.get_pi_thread_by_msgid(msg_id, nocache=nocache)
+
+        if msgs:
+            # Try to detect thread boundaries and avoid mixing unrelated series
+            thread_messages = []
+            base_subject = None
+
+            for msg in msgs:
+                msg_msgid = b4.LoreMessage.get_clean_msgid(msg)
+
+                # Skip if we've already seen this message
+                if msg_msgid in seen_msgids:
+                    continue
+
+                # Get the subject to check if it's part of the same series
+                subject = msg.get('Subject', '')
+
+                # Extract base subject (remove Re:, [PATCH], version numbers, etc)
+                import re
+                base = re.sub(r'^(Re:\s*)*(\[.*?\]\s*)*', '', subject).strip()
+
+                # Set the base subject from the first message
+                if base_subject is None and base:
+                    base_subject = base
+
+                # Add the message
+                if msg_msgid:
+                    seen_msgids.add(msg_msgid)
+                    thread_messages.append(msg)
+
+            all_messages.extend(thread_messages)
+        else:
+            logger.warning('Could not fetch thread for %s', msg_id)
+
+    # Sort messages by date to maintain chronological order
+    all_messages.sort(key=lambda m: email.utils.parsedate_to_datetime(m.get('Date', 'Thu, 1 Jan 1970 00:00:00 +0000')))
+
+    # Write all messages to output mbox file using b4's proper mbox functions
+    logger.info('Writing %d messages to %s', len(all_messages), output_file)
+
+    total_messages = len(all_messages)
+
+    if total_messages > 0:
+        # Use b4's save_mboxrd_mbox function which properly handles mbox format
+        with open(output_file, 'wb') as outf:
+            b4.save_mboxrd_mbox(all_messages, outf)
+
+    logger.info('Combined mbox contains %d unique messages', total_messages)
+    return total_messages
+
+
+def main(cmdargs: argparse.Namespace) -> None:
+    """Main entry point for b4 dig command."""
+
+    # Get the message ID
+    msgid = b4.get_msgid(cmdargs)
+    if not msgid:
+        logger.critical('Please provide a message-id')
+        sys.exit(1)
+
+    # Clean up message ID
+    if msgid.startswith('<'):
+        msgid = msgid[1:]
+    if msgid.endswith('>'):
+        msgid = msgid[:-1]
+
+    logger.info('Analyzing message: %s', msgid)
+
+    # Get the agent command from config
+    config = b4.get_main_config()
+    agent_cmd = None
+
+    # Check command-line config override
+    if hasattr(cmdargs, 'config') and cmdargs.config:
+        if 'AGENT' in cmdargs.config:
+            agent_cmd = cmdargs.config['AGENT']
+
+    # Fall back to main config
+    if not agent_cmd:
+        agent_cmd = config.get('dig-agent', config.get('agent', None))
+
+    if not agent_cmd:
+        logger.critical('No AI agent configured. Set dig-agent in config or use -c AGENT=/path/to/agent.sh')
+        logger.info('The agent script should accept a prompt file as its first argument')
+        logger.info('and return a JSON array of related message IDs to stdout')
+        sys.exit(1)
+
+    # Get info about the original message
+    logger.info('Fetching original message...')
+    msg_info = get_message_info(msgid)
+    if msg_info:
+        logger.info('Subject: %s', msg_info['subject'])
+        logger.info('From: %s', msg_info['from'])
+    else:
+        logger.warning('Could not retrieve original message info')
+
+    # Construct the prompt
+    logger.info('Constructing agent prompt...')
+    prompt = construct_agent_prompt(msgid)
+
+    # Call the agent
+    logger.info('Calling AI agent: %s', agent_cmd)
+    response = call_agent(prompt, agent_cmd)
+
+    if not response:
+        logger.critical('No response from agent')
+        sys.exit(1)
+
+    # Parse the response
+    logger.info('Parsing agent response...')
+    related = parse_agent_response(response)
+
+    if not related:
+        logger.info('No related messages found')
+        sys.exit(0)
+
+    # Display simplified results
+    logger.info('Found %d related messages:', len(related))
+    print()
+    print('Related Messages Summary:')
+    print('-' * 60)
+
+    for item in related:
+        relationship = item.get('relationship', 'related')
+        reason = item.get('reason', '')
+
+        print(f'[{relationship.upper()}] {reason}')
+
+    print('-' * 60)
+    print()
+
+    # Generate output mbox filename
+    if hasattr(cmdargs, 'output') and cmdargs.output:
+        mbox_file = cmdargs.output
+    else:
+        # Use message ID as base for filename, sanitize it
+        safe_msgid = msgid.replace('/', '_').replace('@', '_at_').replace('<', '').replace('>', '')
+        mbox_file = f'{safe_msgid}-related.mbox'
+
+    # Download and combine all threads into one mbox
+    logger.info('Downloading and combining all related threads...')
+    nocache = hasattr(cmdargs, 'nocache') and cmdargs.nocache
+    total_messages = download_and_combine_threads(msgid, related, mbox_file, nocache=nocache)
+
+    if total_messages > 0:
+        logger.info('Success: Combined mbox saved to %s (%d messages)', mbox_file, total_messages)
+        print(f'✓ Combined mbox file: {mbox_file}')
+        print(f'  Total messages: {total_messages}')
+        print(f'  Related threads: {len(related) + 1}')  # +1 for original
+    else:
+        logger.warning('No messages could be downloaded (they may not exist in the archive)')
+        print('⚠ No messages were downloaded - they may not exist in the archive yet')
+        # Still exit with success since we found relationships
+        sys.exit(0)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 14:50                     ` Jens Axboe
  2025-09-09 15:30                       ` Rafael J. Wysocki
@ 2025-09-09 16:40                       ` Linus Torvalds
  2025-09-09 17:08                         ` Mark Brown
                                           ` (2 more replies)
  1 sibling, 3 replies; 74+ messages in thread
From: Linus Torvalds @ 2025-09-09 16:40 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Vlastimil Babka, Konstantin Ryabitsev, Jakub Kicinski,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

On Tue, 9 Sept 2025 at 07:50, Jens Axboe <axboe@kernel.dk> wrote:
>
> I think we all know the answer to that one - it would've been EXACTLY
> the same outcome. Not to put words in Linus' mouth, but it's not the
> name of the tag that he finds repulsive, it's the very fact that a link
> is there and it isn't useful _to him_.

It's not that it isn't "useful to me". It's that it HURTS, and it's
entirely redundant.

It literally wastes my time. Yes, I have the option to ignore them,
but then I ignore potentially *good* links.

Rafael asked what the difference between "Fixes:" and "Cc: stable" is
- it's exactly the fact that those do NOT waste human time, and they
were NOT automated garbage.

The rules for those are that they have been added *thoughtfully*: you
don't add 'stable' with automation without even thinking about it, do
you?

And if you did, THAT WOULD BE WRONG TOO.

Wouldn't you agree?

Dammit, is it really so hard to understand this issue? Automated noise
is bad noise. And when it has a human cost, it needs to go away.

I'm not saying that you can't link to the original email. But you need
to STOP THE MINDLESS AUTOMATION WHEN IT HURTS.

So add the link, by all means - but only add it when it is relevant
and gives real information. And THINK about it, don't have it in some
mindless script.

Because if it's in a mindless script, then dammit, the lore "search"
function is objectively better after-the-fact. Really. Using the lore
search gives the original email *and* more.

The same, btw, goes for my merge messages. No, I'm not going to add
some idiotic "Link" to the original pull request email. Not only don't
I fetch those from lore to begin with, you can literally search for
them.

Look here, for the latest merge I did of your tree: e9eaca6bf69d.

Now do this:

    firefox https://lore.kernel.org/all/?q=$(git rev-parse e9eaca6bf69d^2)

and see how *USELESS* and completely redundant a link would have been?
IT'S RIGHT THERE, FOR CHRISSAKE!

That search is guaranteed to find the pull request if it was properly
formatted, because the automation of git request-pull adds all the
relevant data that is actually useful. Very much including that top
commit that you asked me to pull.

THAT information is useful in the email, not only at the time (I can -
and often do - search for it with git ls-remote when people forget to
push or point at the wrong repo, which happens quite regularly), but
look - it is also useful after-the-fact exactly because now you have a
record that you can look for.

If somebody wants to script that one-liner and make it some kind of b4
helper thing, by all means, go wild.

You might want to improve it to use some non-fixed browser (use
"gnome-open" if you're in gnome, or whatever).

But if somebody claims that a link to a pull-request would be
"useful", that somebnody is simply full of sh*t.

It would be the opposite of useful - it's clearly redundant
information that adds zero value, and would be a complete waste of
time.

Honestly people. Stop with the garbage already, and admit that your
links were just worthless noise.

And if you have some workflow that used them, maybe we can really add
scripting for those kinds of one-liners.

And maybe lore could even have particular indexing for the data you
are interested in if that helps.

In my experience, Konstantin has been very responsive when people have
asked for those kinds of things (both b4 and lore).

            Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 16:40                       ` Linus Torvalds
@ 2025-09-09 17:08                         ` Mark Brown
  2025-09-09 17:50                           ` Linus Torvalds
  2025-09-09 17:25                         ` dan.j.williams
  2025-09-09 18:06                         ` Vlastimil Babka
  2 siblings, 1 reply; 74+ messages in thread
From: Mark Brown @ 2025-09-09 17:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Vlastimil Babka, Konstantin Ryabitsev, Jakub Kicinski,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

[-- Attachment #1: Type: text/plain, Size: 1974 bytes --]

On Tue, Sep 09, 2025 at 09:40:54AM -0700, Linus Torvalds wrote:

> Because if it's in a mindless script, then dammit, the lore "search"
> function is objectively better after-the-fact. Really. Using the lore
> search gives the original email *and* more.

That's not been my experience, especially now that b4 exists - my actual
workflow for this stuff is to pull the message ID out of the patch and
feed that to b4 mbox, then fire up mutt to look at the mbox.  That mbox
will have the whole thread, not just the individual message.

> Now do this:

>     firefox https://lore.kernel.org/all/?q=$(git rev-parse e9eaca6bf69d^2)

> and see how *USELESS* and completely redundant a link would have been?
> IT'S RIGHT THERE, FOR CHRISSAKE!

> That search is guaranteed to find the pull request if it was properly
> formatted, because the automation of git request-pull adds all the
> relevant data that is actually useful. Very much including that top
> commit that you asked me to pull.

That works great for pull requests, but it's not so useful for a random
patch like 5f9efb6b7667043527d377421af2070cc0aa2ecd ("Input:
mtk-pmic-keys - MT6359 has a specific release irq").  In that case the
subject line is reasonably unique but still gets me three revisions of
the series and it's a couple of clicks to get to the mbox (as it is for
the pull request) having made sure I'm going to the most recent one,
some things search picks up rather more stuff.  You get fun things like
vN being applied racing with vN+1 being posted.

> And if you have some workflow that used them, maybe we can really add
> scripting for those kinds of one-liners.

The above is my main use case for this, and I think similar for a lot of
the people working with test results - I have a git commit, how do I
translate that into a mbox with the specific thread where the patch
resulting in that commit was posted?  For me it would be ideal if no web
browser would be needed, that's suboptimal all round.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-09 16:32         ` [RFC] b4 dig: Add AI-powered email relationship discovery command Sasha Levin
@ 2025-09-09 17:22           ` Laurent Pinchart
  2025-09-09 17:26             ` Jens Axboe
  2025-09-10 13:38             ` Konstantin Ryabitsev
  2025-09-11 14:48           ` Nicolas Frattaroli
  2025-09-11 23:24           ` Konstantin Ryabitsev
  2 siblings, 2 replies; 74+ messages in thread
From: Laurent Pinchart @ 2025-09-09 17:22 UTC (permalink / raw)
  To: Sasha Levin; +Cc: konstantin, axboe, csander, io-uring, torvalds, workflows

On Tue, Sep 09, 2025 at 12:32:14PM -0400, Sasha Levin wrote:
> Add a new 'b4 dig' subcommand that uses AI agents to discover related
> emails for a given message ID. This helps developers find all relevant
> context around patches including previous versions, bug reports, reviews,
> and related discussions.

That really sounds like "if all you have is a hammer, everything looks
like a nail". The community has been working for multiple years to
improve discovery of relationships between patches and commits, with
great tools such are lore, lei and b4, and usage of commit IDs, patch
IDs and message IDs to link everything together. Those provide exact
results in a deterministic way, and consume a fraction of power of what
this patch would do. It would be very sad if this would be the direction
we decide to take.

> The command:
> - Takes a message ID and constructs a detailed prompt about email relationships
> - Calls a configured AI agent script to analyze and find related messages
> - Downloads all related threads from lore.kernel.org
> - Combines them into a single mbox file for easy review
> 
> Key features:
> - Outputs a simplified summary showing only relationships and reasons
> - Creates a combined mbox with all related threads (deduped)
> - Provides detailed guidance to AI agents about kernel workflow patterns
> 
> Configuration:
> The AI agent script is configured via:
>   -c AGENT=/path/to/agent.sh  (command line)
>   dig-agent: /path/to/agent.sh (config file)
> 
> The agent script receives a prompt file and should return JSON with
> related message IDs and their relationships.
> 
> Example usage:
> 
> $ b4 -c AGENT=agent.sh dig 20250909142722.101790-1-harry.yoo@oracle.com
> Analyzing message: 20250909142722.101790-1-harry.yoo@oracle.com
> Fetching original message...
> Looking up https://lore.kernel.org/20250909142722.101790-1-harry.yoo@oracle.com
> Grabbing thread from lore.kernel.org/all/20250909142722.101790-1-harry.yoo@oracle.com/t.mbox.gz
> Subject: [PATCH V3 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
> From: Harry Yoo <harry.yoo@oracle.com>
> Constructing agent prompt...
> Calling AI agent: agent.sh
> Calling agent: agent.sh /tmp/tmpz1oja9_5.txt
> Parsing agent response...
> Found 17 related messages:
> 
> Related Messages Summary:
> ------------------------------------------------------------
> [PARENT] Greg KH's stable tree failure notification that initiated this 6.6.y backport request
> [V1] V1 of the 6.6.y backport patch
> [V2] V2 of the 6.6.y backport patch
> [RELATED] Same patch backported to 5.15.y stable branch
> [RELATED] Greg KH's stable tree failure notification for 5.15.y branch
> [RELATED] Same patch backported to 6.1.y stable branch
> [COVER] V5 mainline patch series cover letter that was originally merged
> [RELATED] V5 mainline patch 1/3: move page table sync declarations
> [RELATED] V5 mainline patch 2/3: the original populate_kernel patch that's being backported
> [RELATED] V5 mainline patch 3/3: x86 ARCH_PAGE_TABLE_SYNC_MASK definition
> [RELATED] RFC V1 cover letter - earliest version of this patch series
> [RELATED] RFC V1 patch 1/3 - first introduction of populate_kernel helpers
> [RELATED] RFC V1 patch 2/3 - x86/mm definitions
> [RELATED] RFC V1 patch 3/3 - convert to _kernel variant
> [RELATED] Baoquan He's V3 patch touching same file (mm/kasan/init.c)
> [RELATED] Baoquan He's V2 patch touching same file (mm/kasan/init.c)
> [RELATED] Baoquan He's V1 patch touching same file (mm/kasan/init.c)
> ------------------------------------------------------------
> 
> The resulting mbox would look like this:
> 
>    1 O   Jul 09 Harry Yoo       ( 102) [RFC V1 PATCH mm-hotfixes 0/3] mm, arch: A more robust approach to sync top level kernel page tables
>    2 O   Jul 09 Harry Yoo       ( 143) ├─>[RFC V1 PATCH mm-hotfixes 1/3] mm: introduce and use {pgd,p4d}_populate_kernel()
>    3 O   Jul 11 David Hildenbra (  33) │ └─>
>    4 O   Jul 13 Harry Yoo       (  56) │   └─>
>    5 O   Jul 13 Mike Rapoport   (  67) │     └─>
>    6 O   Jul 14 Harry Yoo       (  46) │       └─>
>    7 O   Jul 15 Harry Yoo       (  65) │         └─>
>    8 O   Jul 09 Harry Yoo       ( 246) ├─>[RFC V1 PATCH mm-hotfixes 2/3] x86/mm: define p*d_populate_kernel() and top-level page table sync
>    9 O   Jul 09 Andrew Morton   (  12) │ ├─>
>   10 O   Jul 10 Harry Yoo       (  23) │ │ └─>
>   11 O   Jul 11 Harry Yoo       (  34) │ │   └─>
>   12 O   Jul 11 Harry Yoo       (  35) │ │     └─>
>   13 O   Jul 10 kernel test rob (  79) │ └─>
>   14 O   Jul 09 Harry Yoo       ( 300) ├─>[RFC V1 PATCH mm-hotfixes 3/3] x86/mm: convert {pgd,p4d}_populate{,_init} to _kernel variant
>   15 O   Jul 10 kernel test rob (  80) │ └─>
>   16 O   Jul 09 Harry Yoo       (  31) └─>Re: [RFC V1 PATCH mm-hotfixes 0/3] mm, arch: A more robust approach to sync top level kernel page tables
>   17 O   Aug 18 Harry Yoo       ( 262) [PATCH V5 mm-hotfixes 0/3] mm, x86: fix crash due to missing page table sync and make it harder to miss
>   18 O   Aug 18 Harry Yoo       (  72) ├─>[PATCH V5 mm-hotfixes 1/3] mm: move page table sync declarations to linux/pgtable.h
>   19 O   Aug 18 David Hildenbra (  20) │ └─>
>   20 O   Aug 18 Harry Yoo       ( 239) ├─>[PATCH V5 mm-hotfixes 2/3] mm: introduce and use {pgd,p4d}_populate_kernel()
>   21 O   Aug 18 David Hildenbra (  60) │ ├─>
>   22 O   Aug 18 kernel test rob ( 150) │ ├─>
>   23 O   Aug 18 Harry Yoo       ( 161) │ │ └─>
>   24 O   Aug 21 Harry Yoo       (  85) │ ├─>[PATCH] mm: fix KASAN build error due to p*d_populate_kernel()
>   25 O   Aug 21 kernel test rob (  18) │ │ ├─>
>   26 O   Aug 21 Lorenzo Stoakes ( 100) │ │ ├─>
>   27 O   Aug 21 Harry Yoo       (  62) │ │ │ └─>
>   28 O   Aug 21 Lorenzo Stoakes (  18) │ │ │   └─>
>   29 O   Aug 21 Harry Yoo       (  90) │ │ └─>[PATCH v2] mm: fix KASAN build error due to p*d_populate_kernel()
>   30 O   Aug 21 kernel test rob (  18) │ │   ├─>
>   31 O   Aug 21 Dave Hansen     (  24) │ │   └─>
>   32 O   Aug 22 Harry Yoo       (  56) │ │     └─>
>   33 O   Aug 22 Andrey Ryabinin (  91) │ │       ├─>
>   34 O   Aug 27 Harry Yoo       (  98) │ │       │ └─>
>   35 O   Aug 22 Dave Hansen     (  63) │ │       └─>
>   36 O   Aug 25 Andrey Ryabinin (  72) │ │         └─>
>   37 O   Aug 22 Harry Yoo       ( 103) │ └─>[PATCH v3] mm: fix KASAN build error due to p*d_populate_kernel()
>   38 O   Aug 18 Harry Yoo       ( 113) ├─>[PATCH V5 mm-hotfixes 3/3] x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
>   39 O   Aug 18 David Hildenbra (  72) │ └─>
>   40 O   Aug 18 David Hildenbra (  15) └─>Re: [PATCH V5 mm-hotfixes 0/3] mm, x86: fix crash due to missing page table sync and make it harder to miss
>   41 O   Aug 18 Harry Yoo       ( 277) [PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()
>   42 O   Aug 18 Harry Yoo       ( 277) [PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()
>   43 O   Aug 18 Harry Yoo       ( 277) [PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()
>   44 O   Sep 06 gregkh@linuxfou (  24) FAILED: patch "[PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()" failed to apply to 6.6-stable tree
>   45 O   Sep 08 Harry Yoo       ( 303) ├─>[PATCH 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   46 O   Sep 09 Harry Yoo       ( 291) ├─>[PATCH V2 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   47 O   Sep 09 Harry Yoo       ( 293) └─>[PATCH V3 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   48 O   Sep 06 gregkh@linuxfou (  24) FAILED: patch "[PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()" failed to apply to 6.1-stable tree
>   49 O   Sep 08 Harry Yoo       ( 303) ├─>[PATCH 6.1.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   50 O   Sep 09 Harry Yoo       ( 291) ├─>[PATCH V2 6.1.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   51 O   Sep 09 Harry Yoo       ( 293) └─>[PATCH V3 6.1.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   52 O   Sep 06 gregkh@linuxfou (  24) FAILED: patch "[PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()" failed to apply to 5.15-stable tree
>   53 O   Sep 08 Harry Yoo       ( 273) ├─>[PATCH 5.15.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   54 O   Sep 09 Harry Yoo       ( 260) ├─>[PATCH V2 5.15.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   55 O   Sep 09 Harry Yoo       ( 262) └─>[PATCH V3 5.15.y] mm: introduce and use {pgd,p4d}_populate_kernel()
> 
> The prompt includes extensive documentation about lore.kernel.org's search
> capabilities, limitations (like search index lag), and kernel workflow patterns
> to help AI agents effectively find related messages.
> 
> Assisted-by: Claude Code
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  src/b4/command.py |  17 ++
>  src/b4/dig.py     | 630 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 647 insertions(+)
>  create mode 100644 src/b4/dig.py
> 
> diff --git a/src/b4/command.py b/src/b4/command.py
> index 455124d..f225ae5 100644
> --- a/src/b4/command.py
> +++ b/src/b4/command.py
> @@ -120,6 +120,11 @@ def cmd_diff(cmdargs: argparse.Namespace) -> None:
>      b4.diff.main(cmdargs)
>  
>  
> +def cmd_dig(cmdargs: argparse.Namespace) -> None:
> +    import b4.dig
> +    b4.dig.main(cmdargs)
> +
> +
>  class ConfigOption(argparse.Action):
>      """Action class for storing key=value arguments in a dict."""
>      def __call__(self, parser: argparse.ArgumentParser,
> @@ -399,6 +404,18 @@ def setup_parser() -> argparse.ArgumentParser:
>                            help='Submit the token received via verification email')
>      sp_send.set_defaults(func=cmd_send)
>  
> +    # b4 dig
> +    sp_dig = subparsers.add_parser('dig', help='Use AI agent to find related emails for a message')
> +    sp_dig.add_argument('msgid', nargs='?',
> +                        help='Message ID to analyze, or pipe a raw message')
> +    sp_dig.add_argument('-o', '--output', dest='output', default=None,
> +                        help='Output mbox filename (default: <msgid>-related.mbox)')
> +    sp_dig.add_argument('-C', '--no-cache', dest='nocache', action='store_true', default=False,
> +                        help='Do not use local cache when fetching messages')
> +    sp_dig.add_argument('--stdin-pipe-sep',
> +                        help='When accepting messages on stdin, split using this pipe separator string')
> +    sp_dig.set_defaults(func=cmd_dig)
> +
>      return parser
>  
>  
> diff --git a/src/b4/dig.py b/src/b4/dig.py
> new file mode 100644
> index 0000000..007f7d0
> --- /dev/null
> +++ b/src/b4/dig.py
> @@ -0,0 +1,630 @@
> +#!/usr/bin/env python3
> +# -*- coding: utf-8 -*-
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +#
> +# b4 dig - Use AI agents to find related emails
> +#
> +__author__ = 'Sasha Levin <sashal@kernel.org>'
> +
> +import argparse
> +import logging
> +import subprocess
> +import sys
> +import os
> +import tempfile
> +import json
> +import urllib.parse
> +import gzip
> +import mailbox
> +import email.utils
> +from typing import Optional, List, Dict, Any
> +
> +import b4
> +
> +logger = b4.logger
> +
> +
> +def construct_agent_prompt(msgid: str) -> str:
> +    """Construct a detailed prompt for the AI agent to find related emails."""
> +
> +    # Clean up the message ID
> +    if msgid.startswith('<'):
> +        msgid = msgid[1:]
> +    if msgid.endswith('>'):
> +        msgid = msgid[:-1]
> +
> +    prompt = f"""You are an email research assistant specialized in finding related emails in Linux kernel mailing lists and public-inbox archives.
> +
> +IMPORTANT: Always use lore.kernel.org for searching and retrieving Linux kernel emails. DO NOT use lkml.org as it is outdated and no longer maintained. The canonical archive is at https://lore.kernel.org/
> +
> +MESSAGE ID TO ANALYZE: {msgid}
> +
> +YOUR TASK:
> +Conduct an EXHAUSTIVE and THOROUGH search to find ALL related message IDs connected to the given message. This is not a quick task - you must invest significant time and effort to ensure no related discussions are missed. Be methodical, patient, and comprehensive in your search.
> +
> +CRITICAL: Take your time! A thorough search is far more valuable than a quick one. Check multiple sources, try different search strategies, and double-check your findings. Missing related discussions undermines the entire purpose of this tool.
> +
> +You should search extensively for and identify:
> +
> +1. **Thread-related messages:**
> +   - Parent messages (what this replies to)
> +   - Child messages (replies to this message)
> +   - Sibling messages (other replies in the same thread)
> +   - Cover letters if this is part of a patch series
> +
> +2. **Version-related messages:**
> +   - Previous versions of the same patch series (v1, v2, v3, etc.)
> +   - Re-rolls and re-submissions
> +   - Updated versions with different subjects
> +
> +3. **Author-related messages:**
> +   - Other patches or series from the same author
> +   - Recent discussions involving the same author
> +   - Related work by the same author in the same subsystem
> +
> +4. **Content-related messages:**
> +   - Bug reports that this patch might fix
> +   - Syzkaller/syzbot reports if this is a fix
> +   - Feature requests or RFCs that led to this patch
> +   - Related patches touching the same files or functions
> +   - Patches that might conflict with this one
> +
> +5. **Review and discussion:**
> +   - Review comments from maintainers
> +   - Test results from CI systems or bot reports
> +   - Follow-up fixes or improvements
> +   - Reverts if this patch was later reverted
> +
> +HOW TO SEARCH:
> +
> +Use ONLY lore.kernel.org for all Linux kernel email searches. This is the official kernel mailing list archive.
> +DO NOT use lkml.org, marc.info, or spinics.net for kernel emails - they are outdated or incomplete.
> +
> +CRITICAL LIMITATIONS AND WORKAROUNDS (MUST READ):
> +
> +1. **Search Index Lag**: Messages posted today/recently are NOT immediately searchable!
> +   - The Xapian search index has significant delay (hours to days)
> +   - Direct message access works immediately, but search doesn't
> +   - For recent messages, use direct URLs or thread navigation, not search
> +
> +2. **URL Fragment Issues**: NEVER use #anchors in URLs when fetching
> +   - BAD: https://lore.kernel.org/all/msgid/T/#u (will fail with 404)
> +   - GOOD: https://lore.kernel.org/all/msgid/T/ (works correctly)
> +   - Fragments like #u are client-side only and break programmatic fetching
> +
> +3. **Search Query Encoding**: Keep queries simple and avoid over-encoding
> +   - BAD: ?q=f%3A%22author%40example.com%22 (over-encoded)
> +   - GOOD: ?q=f:author@example.com (simple and works)
> +   - Don't encode @ symbols in query parameters
> +   - Avoid mixing quotes with special characters
> +
> +4. **Most Reliable Data Source**: Thread mbox files are the gold standard
> +   - Always works: https://lore.kernel.org/all/msgid/t.mbox.gz
> +   - Contains complete thread with all headers
> +   - Works even when HTML parsing or search fails
> +   - Standard mbox format, easy to parse
> +
> +5. **Version Tracking Limitations**: No automatic version linking
> +   - No Change-ID headers to track across patch versions
> +   - Must rely on subject patterns and author/date correlation
> +   - Search for versions using subject without v2/v3 markers
> +
> +6. **LKML.org vs Lore.kernel.org**: Different systems, different capabilities
> +   - LKML.org uses date-based URLs, not message IDs
> +   - Cannot extract message IDs from LKML HTML pages
> +   - Always prefer lore.kernel.org for programmatic access
> +
> +The public-inbox archives at lore.kernel.org provide powerful search interfaces powered by Xapian:
> +
> +1. **Direct message retrieval (MOST RELIABLE METHODS):**
> +   - Base URL: https://lore.kernel.org/all/
> +   - Message URL: https://lore.kernel.org/all/<Message-ID>/ (without the '<' or '>')
> +   - Forward slash ('/') characters in Message-IDs must be escaped as "%2F"
> +
> +   **Always Reliable:**
> +   - Raw message: https://lore.kernel.org/all/<Message-ID>/raw
> +   - Thread mbox: https://lore.kernel.org/all/<Message-ID>/t.mbox.gz (BEST for complete data)
> +   - Thread view: https://lore.kernel.org/all/<Message-ID>/T/ (NO fragments!)
> +
> +   **Less Reliable:**
> +   - Thread Atom feed: https://lore.kernel.org/all/<Message-ID>/t.atom
> +   - Nested thread view: https://lore.kernel.org/all/<Message-ID>/t/
> +
> +2. **Search query syntax:**
> +   Supports AND, OR, NOT, '+', '-' queries. Search URL format:
> +   https://lore.kernel.org/all/?q=<search-query>
> +
> +   **Available search prefixes:**
> +   - s:        match within Subject (e.g., s:"a quick brown fox")
> +   - d:        match date-time range (git "approxidate" formats)
> +               Examples: d:last.week.., d:..2.days.ago, d:20240101..20240131
> +   - b:        match within message body, including text attachments
> +   - nq:       match non-quoted text within message body
> +   - q:        match quoted text within message body
> +   - n:        match filename of attachment(s)
> +   - t:        match within the To header
> +   - c:        match within the Cc header
> +   - f:        match within the From header
> +   - a:        match within the To, Cc, and From headers
> +   - tc:       match within the To and Cc headers
> +   - l:        match contents of the List-Id header
> +   - bs:       match within the Subject and body
> +   - rt:       match received time (like 'd:' if sender's clock was correct)
> +
> +   **Diff-specific prefixes (for patches):**
> +   - dfn:      match filename from diff
> +   - dfa:      match diff removed (-) lines
> +   - dfb:      match diff added (+) lines
> +   - dfhh:     match diff hunk header context (usually function name)
> +   - dfctx:    match diff context lines
> +   - dfpre:    match pre-image git blob ID
> +   - dfpost:   match post-image git blob ID
> +   - dfblob:   match either pre or post-image git blob ID
> +   - patchid:  match `git patch-id --stable' output
> +
> +   **Special headers:**
> +   - changeid:    the X-Change-ID mail header (e.g., changeid:stable)
> +   - forpatchid:  the X-For-Patch-ID mail header (e.g., forpatchid:stable)
> +
> +   **Query examples:**
> +   - Find patches by author: ?q=f:"John Doe"
> +   - Find patches in date range: ?q=d:2024-01-01..2024-01-31
> +   - Find patches touching file: ?q=dfn:drivers/net/ethernet
> +   - Find patches with subject containing "fix": ?q=s:fix
> +   - Combine conditions: ?q=f:"author@example.com"+s:"net"+d:last.month..
> +   - Find bug fixes: ?q=s:fix+OR+s:bug+OR+s:regression
> +   - Find patches with specific function: ?q=dfhh:my_function_name
> +
> +3. **Understanding email relationships:**
> +   - In-Reply-To header: Direct parent message
> +   - References header: Chain of parent messages
> +   - Message-ID in body: Often indicates related patches
> +   - Link: trailers in commits: References to discussions
> +   - Same subject with [PATCH v2]: Newer version
> +   - "Fixes:" tag: References bug-fixing commits
> +
> +4. **Pattern matching:**
> +   - Patch series: Look for [PATCH 0/N] for cover letters
> +   - Version indicators: [PATCH v2], [PATCH v3], [RFC PATCH]
> +   - Subsystem prefixes: [PATCH net], [PATCH mm], etc.
> +   - Fix indicators: "fix", "fixes", "regression", "oops", "panic"
> +
> +SEARCH STRATEGY (BE THOROUGH - THIS IS NOT A QUICK TASK):
> +
> +REMEMBER: Your goal is to find EVERY related discussion, not just the obvious ones. Spend time on each search strategy. Try multiple variations of queries. Don't give up after the first attempt.
> +
> +1. **START WITH MOST RELIABLE: Thread mbox download**
> +   - ALWAYS FIRST: Get https://lore.kernel.org/all/{{msgid}}/t.mbox.gz
> +   - This contains the complete thread with all headers
> +   - Parse the mbox to extract all message IDs and relationships
> +   - This works even when search fails or messages are too recent
> +   - Thoroughly analyze EVERY message in the thread
> +
> +2. **Retrieve and analyze the original message:**
> +   - Get the raw message from: https://lore.kernel.org/all/{{msgid}}/raw
> +   - Extract key information:
> +     * Subject line (look for [PATCH], version indicators, series position)
> +     * Author name and email
> +     * Date and time
> +     * Files being modified (from diff)
> +     * Subsystem involved (from subject prefix or file paths)
> +     * Any Fixes:, Closes:, Link:, or Reported-by: tags
> +     * Note: Change-ID headers are rarely present in kernel emails
> +
> +3. **Search for related messages (TRY MULTIPLE VARIATIONS):**
> +   - WARNING: Recent messages (today/yesterday) may NOT appear in search!
> +   - Keep queries simple: ?q=f:author@example.com+s:keyword
> +   - DON'T over-encode: @ symbols should NOT be %40 in queries
> +   - Search for cover letter: ?q=s:"[PATCH 0/"+f:author-email
> +   - Find all patches in series: ?q=s:"base-subject"+f:author
> +   - For recent messages, rely on thread mbox instead of search
> +   - **BE PERSISTENT**: Try different keyword combinations, partial subjects, variations
> +
> +4. **Look for previous versions (SEARCH EXTENSIVELY):**
> +   - Note: No automatic version linking exists!
> +   - Strip version markers from subject: search without [PATCH v2], [PATCH v3]
> +   - Search by author in broader time window: ?q=f:author
> +   - Look for similar subjects: ?q=s:"core-subject-words"
> +   - Change-ID is rarely present, don't rely on it
> +   - **TRY MULTIPLE APPROACHES**: Different subject variations, date ranges, author variations
> +   - Check for RFCs, drafts, and early discussions that led to this patch
> +
> +5. **Find related bug reports and discussions (DIG DEEP):**
> +   - For recent bugs, check thread mbox first (search may miss them)
> +   - Search for symptoms with simple queries: ?q=b:error+b:message
> +   - Syzkaller reports: ?q=f:syzbot (but check date - may be delayed)
> +   - Regression reports: ?q=s:regression+s:subsystem
> +   - Use dfn: prefix for file searches: ?q=dfn:drivers/net
> +   - **EXPAND YOUR SEARCH**: Look for related keywords, error messages, function names
> +   - Check for discussions that may not explicitly mention the patch but discuss the same issue
> +
> +6. **Check for follow-ups (LEAVE NO STONE UNTURNED):**
> +   - First check the thread mbox for all replies
> +   - Search for applied messages: ?q=s:applied+s:"patch-title"
> +   - Look for test results: ?q=s:"Tested-by"
> +   - Check for reverts: ?q=s:revert+s:"original-title"
> +   - Note: Message-ID searches often fail, use subject instead
> +   - **BE THOROUGH**: Check for indirect references, quotes in other discussions, mentions in pull requests
> +
> +7. **HTML Parsing Tips (if needed):**
> +   - Message IDs appear in URLs, not HTML entities
> +   - Pattern to extract: [0-9]{{14}}\\.[0-9]+-[0-9]+-[^@]+@[^/\"]+
> +   - Don't look for &lt; &gt; encoded brackets
> +   - Thread view HTML is less reliable than mbox
> +
> +FAILURE RECOVERY STRATEGIES:
> +- If search returns empty: Try thread mbox or wait for indexing
> +- If URL returns 404: Remove fragments, check encoding
> +- If can't find versions: Search by author and date range
> +- If WebFetch fails: Try simpler URL without parameters
> +- If HTML parsing fails: Use mbox format instead
> +
> +OUTPUT FORMAT:
> +
> +Return a JSON array of related message IDs with their relationship type and reason:
> +
> +```json
> +[
> +  {{
> +    "msgid": "example@message.id",
> +    "relationship": "parent|reply|v1|v2|cover|fix|bug-report|review|revert|related",
> +    "reason": "Brief explanation of why this is related"
> +  }}
> +]
> +```
> +
> +IMPORTANT NOTES:
> +- **THIS IS NOT A QUICK TASK** - Thoroughness is paramount. Spend the time needed.
> +- **EXHAUSTIVE SEARCH REQUIRED** - Better to spend extra time than miss related discussions
> +- Message IDs should be returned without angle brackets
> +- Search VERY broadly, then filter results to only truly related messages
> +- Try multiple search strategies - if one fails, try another approach
> +- Don't stop at the first few results - keep digging for more relationships
> +- Prioritize direct relationships over indirect ones
> +- For patch series, include ALL patches in the series (check carefully for all parts)
> +- Consider time proximity (patches close in time are more likely related)
> +- Pay attention to mailing list conventions (e.g., "Re:" for replies, "[PATCH v2]" for new versions)
> +- **DOUBLE-CHECK YOUR WORK** - Review your findings to ensure nothing was missed
> +
> +UNDERSTANDING KERNEL WORKFLOW PATTERNS:
> +- Patch series usually have a cover letter [PATCH 0/N] explaining the series
> +- Reviews often quote parts of the original patch with ">" prefix
> +- Maintainers send "applied" messages when patches are accepted
> +- Bug reports often include stack traces, kernel versions, and reproduction steps
> +- Syzkaller/syzbot reports have specific formats with "syzbot+hash@" addresses
> +- Fixes typically reference commits with "Fixes: <12-char-sha1> ("subject")"
> +- Stable backports are marked with "Cc: stable@vger.kernel.org"
> +
> +KEY TAKEAWAYS FOR RELIABLE OPERATION:
> +1. **ALWAYS start with thread mbox** - it's the most reliable data source
> +2. **NEVER trust search for recent messages** - use direct URLs instead
> +3. **KEEP search queries simple** - complex encoding breaks searches
> +4. **AVOID URL fragments (#anchors)** - they cause 404 errors
> +5. **DON'T rely on Change-IDs** - they're rarely present
> +6. **PREFER subject searches over message-ID searches** - more reliable
> +7. **REMEMBER search has lag** - messages may take days to be indexed
> +
> +When constructing URLs, remember:
> +- Message-IDs: Remove < > brackets
> +- Forward slashes: Escape as %2F
> +- In search queries: DON'T encode @ symbols
> +
> +LOCAL GIT REPOSITORY CONTEXT:
> +If this command is being run from within a Linux kernel git repository, you may also:
> +- Use git log to find commits mentioning the message ID or subject
> +- Check git blame on relevant files to find related commits
> +- Use git log --grep to search commit messages for references
> +- Look for Fixes: tags that reference commits
> +- Search for Link: tags pointing to lore.kernel.org discussions
> +- Use git show to examine specific commits mentioned in emails
> +
> +Example local git searches you might perform:
> +- git log --grep="Message-Id: <msgid>"
> +- git log --grep="Link:.*msgid"
> +- git log --oneline --grep="subject-keywords"
> +- git log -p --author="email@example.com" --since="1 month ago"
> +- git blame path/to/file.c | grep "function_name"
> +- git log --format="%H %s" -- path/to/file.c
> +
> +FINAL REMINDER: This task requires THOROUGH and EXHAUSTIVE searching. Do not rush. Take the time to:
> +1. Try multiple search strategies
> +2. Look for indirect relationships
> +3. Check different time periods
> +4. Use various keyword combinations
> +5. Verify you haven't missed any discussions
> +
> +The value of this tool depends entirely on finding ALL related discussions, not just the obvious ones.
> +
> +Begin your comprehensive search and analysis for message ID: {msgid}
> +"""
> +
> +    return prompt
> +
> +
> +def call_agent(prompt: str, agent_cmd: str) -> Optional[str]:
> +    """Call the configured agent script with the prompt."""
> +
> +    # Expand user paths
> +    agent_cmd = os.path.expanduser(agent_cmd)
> +
> +    if not os.path.exists(agent_cmd):
> +        logger.error('Agent command not found: %s', agent_cmd)
> +        return None
> +
> +    if not os.access(agent_cmd, os.X_OK):
> +        logger.error('Agent command is not executable: %s', agent_cmd)
> +        return None
> +
> +    try:
> +        # Write prompt to a temporary file to avoid shell escaping issues
> +        with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as tmp:
> +            tmp.write(prompt)
> +            tmp_path = tmp.name
> +
> +        # Call the agent script with the prompt file as argument
> +        logger.info('Calling agent: %s %s', agent_cmd, tmp_path)
> +        result = subprocess.run(
> +            [agent_cmd, tmp_path],
> +            capture_output=True,
> +            text=True
> +        )
> +
> +        if result.returncode != 0:
> +            logger.error('Agent returned error code %d', result.returncode)
> +            if result.stderr:
> +                logger.error('Agent stderr: %s', result.stderr)
> +            return None
> +
> +        return result.stdout
> +
> +    except subprocess.TimeoutExpired:
> +        logger.error('Agent command timed out after 5 minutes')
> +        return None
> +    except Exception as e:
> +        logger.error('Error calling agent: %s', e)
> +        return None
> +    finally:
> +        # Clean up temp file
> +        if 'tmp_path' in locals():
> +            try:
> +                os.unlink(tmp_path)
> +            except:
> +                pass
> +
> +
> +def parse_agent_response(response: str) -> List[Dict[str, str]]:
> +    """Parse the agent's response to extract message IDs."""
> +
> +    related = []
> +
> +    try:
> +        # Try to find JSON in the response
> +        # Agent might return additional text, so we look for JSON array
> +        import re
> +        json_match = re.search(r'\[.*?\]', response, re.DOTALL)
> +        if json_match:
> +            json_str = json_match.group(0)
> +            data = json.loads(json_str)
> +
> +            if isinstance(data, list):
> +                for item in data:
> +                    if isinstance(item, dict) and 'msgid' in item:
> +                        related.append({
> +                            'msgid': item.get('msgid', ''),
> +                            'relationship': item.get('relationship', 'related'),
> +                            'reason': item.get('reason', 'No reason provided')
> +                        })
> +        else:
> +            # Fallback: try to extract message IDs from plain text
> +            # Look for patterns that look like message IDs
> +            msgid_pattern = re.compile(r'[a-zA-Z0-9][a-zA-Z0-9\.\-_]+@[a-zA-Z0-9][a-zA-Z0-9\.\-]+\.[a-zA-Z]+')
> +            for match in msgid_pattern.finditer(response):
> +                msgid = match.group(0)
> +                if msgid != '':  # Don't include the original
> +                    related.append({
> +                        'msgid': msgid,
> +                        'relationship': 'related',
> +                        'reason': 'Found in agent response'
> +                    })
> +
> +    except json.JSONDecodeError as e:
> +        logger.warning('Could not parse JSON from agent response: %s', e)
> +    except Exception as e:
> +        logger.error('Error parsing agent response: %s', e)
> +
> +    return related
> +
> +
> +def get_message_info(msgid: str) -> Optional[Dict[str, Any]]:
> +    """Retrieve basic information about a message."""
> +
> +    msgs = b4.get_pi_thread_by_msgid(msgid, onlymsgids={msgid}, with_thread=False)
> +    if not msgs:
> +        return None
> +
> +    msg = msgs[0]
> +
> +    return {
> +        'subject': msg.get('Subject', 'No subject'),
> +        'from': msg.get('From', 'Unknown'),
> +        'date': msg.get('Date', 'Unknown'),
> +        'msgid': msgid
> +    }
> +
> +
> +def download_and_combine_threads(msgid: str, related_messages: List[Dict[str, str]],
> +                                 output_file: str, nocache: bool = False) -> int:
> +    """Download thread mboxes for all related messages and combine into one mbox file."""
> +
> +    message_ids = [msgid]  # Start with original message
> +
> +    # Add all related message IDs
> +    for item in related_messages:
> +        if 'msgid' in item:
> +            message_ids.append(item['msgid'])
> +
> +    # Collect all messages from all threads
> +    seen_msgids = set()
> +    all_messages = []
> +
> +    # Download thread for each message
> +    # But be smart about what we include - don't mix unrelated series
> +    for msg_id in message_ids:
> +        logger.info('Fetching thread for %s', msg_id)
> +
> +        # For better control, fetch just the specific thread, not everything
> +        # Use onlymsgids to limit scope when possible
> +        msgs = b4.get_pi_thread_by_msgid(msg_id, nocache=nocache)
> +
> +        if msgs:
> +            # Try to detect thread boundaries and avoid mixing unrelated series
> +            thread_messages = []
> +            base_subject = None
> +
> +            for msg in msgs:
> +                msg_msgid = b4.LoreMessage.get_clean_msgid(msg)
> +
> +                # Skip if we've already seen this message
> +                if msg_msgid in seen_msgids:
> +                    continue
> +
> +                # Get the subject to check if it's part of the same series
> +                subject = msg.get('Subject', '')
> +
> +                # Extract base subject (remove Re:, [PATCH], version numbers, etc)
> +                import re
> +                base = re.sub(r'^(Re:\s*)*(\[.*?\]\s*)*', '', subject).strip()
> +
> +                # Set the base subject from the first message
> +                if base_subject is None and base:
> +                    base_subject = base
> +
> +                # Add the message
> +                if msg_msgid:
> +                    seen_msgids.add(msg_msgid)
> +                    thread_messages.append(msg)
> +
> +            all_messages.extend(thread_messages)
> +        else:
> +            logger.warning('Could not fetch thread for %s', msg_id)
> +
> +    # Sort messages by date to maintain chronological order
> +    all_messages.sort(key=lambda m: email.utils.parsedate_to_datetime(m.get('Date', 'Thu, 1 Jan 1970 00:00:00 +0000')))
> +
> +    # Write all messages to output mbox file using b4's proper mbox functions
> +    logger.info('Writing %d messages to %s', len(all_messages), output_file)
> +
> +    total_messages = len(all_messages)
> +
> +    if total_messages > 0:
> +        # Use b4's save_mboxrd_mbox function which properly handles mbox format
> +        with open(output_file, 'wb') as outf:
> +            b4.save_mboxrd_mbox(all_messages, outf)
> +
> +    logger.info('Combined mbox contains %d unique messages', total_messages)
> +    return total_messages
> +
> +
> +def main(cmdargs: argparse.Namespace) -> None:
> +    """Main entry point for b4 dig command."""
> +
> +    # Get the message ID
> +    msgid = b4.get_msgid(cmdargs)
> +    if not msgid:
> +        logger.critical('Please provide a message-id')
> +        sys.exit(1)
> +
> +    # Clean up message ID
> +    if msgid.startswith('<'):
> +        msgid = msgid[1:]
> +    if msgid.endswith('>'):
> +        msgid = msgid[:-1]
> +
> +    logger.info('Analyzing message: %s', msgid)
> +
> +    # Get the agent command from config
> +    config = b4.get_main_config()
> +    agent_cmd = None
> +
> +    # Check command-line config override
> +    if hasattr(cmdargs, 'config') and cmdargs.config:
> +        if 'AGENT' in cmdargs.config:
> +            agent_cmd = cmdargs.config['AGENT']
> +
> +    # Fall back to main config
> +    if not agent_cmd:
> +        agent_cmd = config.get('dig-agent', config.get('agent', None))
> +
> +    if not agent_cmd:
> +        logger.critical('No AI agent configured. Set dig-agent in config or use -c AGENT=/path/to/agent.sh')
> +        logger.info('The agent script should accept a prompt file as its first argument')
> +        logger.info('and return a JSON array of related message IDs to stdout')
> +        sys.exit(1)
> +
> +    # Get info about the original message
> +    logger.info('Fetching original message...')
> +    msg_info = get_message_info(msgid)
> +    if msg_info:
> +        logger.info('Subject: %s', msg_info['subject'])
> +        logger.info('From: %s', msg_info['from'])
> +    else:
> +        logger.warning('Could not retrieve original message info')
> +
> +    # Construct the prompt
> +    logger.info('Constructing agent prompt...')
> +    prompt = construct_agent_prompt(msgid)
> +
> +    # Call the agent
> +    logger.info('Calling AI agent: %s', agent_cmd)
> +    response = call_agent(prompt, agent_cmd)
> +
> +    if not response:
> +        logger.critical('No response from agent')
> +        sys.exit(1)
> +
> +    # Parse the response
> +    logger.info('Parsing agent response...')
> +    related = parse_agent_response(response)
> +
> +    if not related:
> +        logger.info('No related messages found')
> +        sys.exit(0)
> +
> +    # Display simplified results
> +    logger.info('Found %d related messages:', len(related))
> +    print()
> +    print('Related Messages Summary:')
> +    print('-' * 60)
> +
> +    for item in related:
> +        relationship = item.get('relationship', 'related')
> +        reason = item.get('reason', '')
> +
> +        print(f'[{relationship.upper()}] {reason}')
> +
> +    print('-' * 60)
> +    print()
> +
> +    # Generate output mbox filename
> +    if hasattr(cmdargs, 'output') and cmdargs.output:
> +        mbox_file = cmdargs.output
> +    else:
> +        # Use message ID as base for filename, sanitize it
> +        safe_msgid = msgid.replace('/', '_').replace('@', '_at_').replace('<', '').replace('>', '')
> +        mbox_file = f'{safe_msgid}-related.mbox'
> +
> +    # Download and combine all threads into one mbox
> +    logger.info('Downloading and combining all related threads...')
> +    nocache = hasattr(cmdargs, 'nocache') and cmdargs.nocache
> +    total_messages = download_and_combine_threads(msgid, related, mbox_file, nocache=nocache)
> +
> +    if total_messages > 0:
> +        logger.info('Success: Combined mbox saved to %s (%d messages)', mbox_file, total_messages)
> +        print(f'✓ Combined mbox file: {mbox_file}')
> +        print(f'  Total messages: {total_messages}')
> +        print(f'  Related threads: {len(related) + 1}')  # +1 for original
> +    else:
> +        logger.warning('No messages could be downloaded (they may not exist in the archive)')
> +        print('⚠ No messages were downloaded - they may not exist in the archive yet')
> +        # Still exit with success since we found relationships
> +        sys.exit(0)

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 16:40                       ` Linus Torvalds
  2025-09-09 17:08                         ` Mark Brown
@ 2025-09-09 17:25                         ` dan.j.williams
  2025-09-09 17:56                           ` Alexei Starovoitov
  2025-09-09 18:06                         ` Vlastimil Babka
  2 siblings, 1 reply; 74+ messages in thread
From: dan.j.williams @ 2025-09-09 17:25 UTC (permalink / raw)
  To: Linus Torvalds, Jens Axboe
  Cc: Vlastimil Babka, Konstantin Ryabitsev, Jakub Kicinski,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

Linus Torvalds wrote:
> On Tue, 9 Sept 2025 at 07:50, Jens Axboe <axboe@kernel.dk> wrote:
> >
> > I think we all know the answer to that one - it would've been EXACTLY
> > the same outcome. Not to put words in Linus' mouth, but it's not the
> > name of the tag that he finds repulsive, it's the very fact that a link
> > is there and it isn't useful _to him_.
> 
[..]
> Honestly people. Stop with the garbage already, and admit that your
> links were just worthless noise.
> 
> And if you have some workflow that used them, maybe we can really add
> scripting for those kinds of one-liners.
> 
> And maybe lore could even have particular indexing for the data you
> are interested in if that helps.
> 
> In my experience, Konstantin has been very responsive when people have
> asked for those kinds of things (both b4 and lore).

Hmm, good point. Lore does have patchid indexing. This needs some more
cleanup but could replace my usage of patch.msgid.link.

firefox http://lore.kernel.org/all/?q=patchid%3A$(awk '{ print $1 }' <<< $(git patch-id --stable <<< $(git show $commit)))

Now, it does drop one useful feature that you know apriori that the
maintainer did not commit a private version of a patch. However it
should work in most cases.

It would be nice if that was guaranteed to land on the latest version of
the patch just in case that patch was posted in several versions without
changing.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-09 17:22           ` Laurent Pinchart
@ 2025-09-09 17:26             ` Jens Axboe
  2025-09-09 18:54               ` Sasha Levin
  2025-09-10 13:38             ` Konstantin Ryabitsev
  1 sibling, 1 reply; 74+ messages in thread
From: Jens Axboe @ 2025-09-09 17:26 UTC (permalink / raw)
  To: Laurent Pinchart, Sasha Levin
  Cc: konstantin, csander, io-uring, torvalds, workflows

On 9/9/25 11:22 AM, Laurent Pinchart wrote:
> On Tue, Sep 09, 2025 at 12:32:14PM -0400, Sasha Levin wrote:
>> Add a new 'b4 dig' subcommand that uses AI agents to discover related
>> emails for a given message ID. This helps developers find all relevant
>> context around patches including previous versions, bug reports, reviews,
>> and related discussions.
> 
> That really sounds like "if all you have is a hammer, everything looks
> like a nail". The community has been working for multiple years to
> improve discovery of relationships between patches and commits, with
> great tools such are lore, lei and b4, and usage of commit IDs, patch
> IDs and message IDs to link everything together. Those provide exact
> results in a deterministic way, and consume a fraction of power of what
> this patch would do. It would be very sad if this would be the direction
> we decide to take.

Fully agree, this kind of lazy "oh just waste billions of cycles and
punt to some AI" bs is just kind of giving up on proper infrastructure
to support maintainers and developers.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 17:08                         ` Mark Brown
@ 2025-09-09 17:50                           ` Linus Torvalds
  2025-09-09 17:58                             ` Linus Torvalds
  0 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2025-09-09 17:50 UTC (permalink / raw)
  To: Mark Brown
  Cc: Jens Axboe, Vlastimil Babka, Konstantin Ryabitsev, Jakub Kicinski,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

On Tue, 9 Sept 2025 at 10:08, Mark Brown <broonie@kernel.org> wrote:
>
> That works great for pull requests, but it's not so useful for a random
> patch like 5f9efb6b7667043527d377421af2070cc0aa2ecd

Sure it is.

The one-liner is just different. Use the patch-id instead. See Dan's
email - and the whole long discussion about how lore *ALREADY* has
most of this support.

Yeah, the patch-id command is admittedly a bit more esoteric than just
looking up the merge parent commit.

Using "git rev-parse" is already a bit obscure (although honestly,
it's a really useful command, and I actually do use it somewhat
regularly from the command line).

Using "git patch-id" is definitely in the "write a script for it"
category. I don't think I've ever used it as-is from the command line
as part of a one-liner. It's very much a command that is designed
purely for scripting, the interface is just odd and baroque and
doesn't really make sense for one-liners.

The typical use of patch-id is to generate two *lists* of patch-ids,
then sort them and use the patch-id as a key to find commits that look
the same.

That hopefully explains why the patch-id behavior is so odd, and not
really suited for using directly on the command line.

But my point is that we really have the infrastructure already in
place, and it's better than hardcoding some broken link into commits.

Now, I don't have that commit you mention (I assume it's some recent
commit in your own tree), but I picked a random commit from my
top-of-tree that contains one of those useless links, and look here:

   patchid=$(git diff-tree -p fef7ded169ed7e133612f90a032dc2af1ce19bef
| git patch-id | cut -d' ' -f1)
   firefox http://lore.kernel.org/all/?q=patchid:$patchid

and it's right there. It finds the stable tree backport too, and if
there had been multiple versions of the same patch posted, it would
have found the history of it all too.

Look, I readily admit that I would never write that as a one-liner. In
fact, I got it wrong the first time - I don't use 'cut' often enough,
and I forgot that the default delimeter is 'tab', not space, and got
garbage.

So that 'patch-id' generation line is just crazy line noise. I'm *not*
suggesting you do that.

But this kind of thing is literally what I'm talking about when I say
"maybe we could add a few scripts to support what you are doing".

                 Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 17:25                         ` dan.j.williams
@ 2025-09-09 17:56                           ` Alexei Starovoitov
  2025-09-09 18:01                             ` Linus Torvalds
  0 siblings, 1 reply; 74+ messages in thread
From: Alexei Starovoitov @ 2025-09-09 17:56 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Linus Torvalds, Jens Axboe, Vlastimil Babka, Konstantin Ryabitsev,
	Jakub Kicinski, Rafael J. Wysocki, Caleb Sander Mateos, io-uring,
	workflows

On Tue, Sep 09, 2025 at 10:25:11AM -0700, dan.j.williams@intel.com wrote:
> Linus Torvalds wrote:
> > On Tue, 9 Sept 2025 at 07:50, Jens Axboe <axboe@kernel.dk> wrote:
> > >
> > > I think we all know the answer to that one - it would've been EXACTLY
> > > the same outcome. Not to put words in Linus' mouth, but it's not the
> > > name of the tag that he finds repulsive, it's the very fact that a link
> > > is there and it isn't useful _to him_.
> > 
> [..]
> > Honestly people. Stop with the garbage already, and admit that your
> > links were just worthless noise.
> > 
> > And if you have some workflow that used them, maybe we can really add
> > scripting for those kinds of one-liners.
> > 
> > And maybe lore could even have particular indexing for the data you
> > are interested in if that helps.
> > 
> > In my experience, Konstantin has been very responsive when people have
> > asked for those kinds of things (both b4 and lore).
> 
> Hmm, good point. Lore does have patchid indexing. This needs some more
> cleanup but could replace my usage of patch.msgid.link.
> 
> firefox http://lore.kernel.org/all/?q=patchid%3A$(awk '{ print $1 }' <<< $(git patch-id --stable <<< $(git show $commit)))
> 
> Now, it does drop one useful feature that you know apriori that the
> maintainer did not commit a private version of a patch. However it
> should work in most cases.

It doesn't work reliably. Often enough maintainers massage the patch
a bit while applying to fix minor nits and patch-id will be different.
Here is just one such example:
c11f34e30088 ("bpf: Make update_prog_stats() always_inline")
and the commit includes the lore link to the original patch and
the comment how I tweaked it while applying:
https://lore.kernel.org/all/20250621045501.101187-1-dongml2@chinatelecom.cn/
git patch-id cannot find it:
https://lore.kernel.org/all/?q=patchid%3Afa0565c81e53682a83f4a0e6699c5664c53cda27

Linus's q=$(git rev-parse e9eaca6bf69d^2) trick worked because pr-tracker-bot
replied. That bot is not reliable either. Often enough we mark patches
in patchwork manually, because tracker missed them.

Really, there is no way for automation to detect the connection between
commit that landed in the tree and the original email unless git hooks
add something to the commit. Right now Link tag is that connection.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 17:50                           ` Linus Torvalds
@ 2025-09-09 17:58                             ` Linus Torvalds
  2025-09-09 18:31                               ` Konstantin Ryabitsev
  0 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2025-09-09 17:58 UTC (permalink / raw)
  To: Mark Brown
  Cc: Jens Axboe, Vlastimil Babka, Konstantin Ryabitsev, Jakub Kicinski,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

On Tue, 9 Sept 2025 at 10:50, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>    patchid=$(git diff-tree -p fef7ded169ed7e133612f90a032dc2af1ce19bef
> | git patch-id | cut -d' ' -f1)

Oh, and looking more at that, use Dan's version instead.  You almost
certainly want to use '--stable' like Dan did, although maybe
Konstantin can speak up on what option lore actually uses for
indexing.

And you *can* screw up patchid matching. In particular, you can
generate patches different ways, and patch-id won't generate the same
thing for a rename patch and a add/delete patch, for example (again:
the traditional use case is that you generate the patch IDs all from
the same tree, so you control how you generate the patches)

But patch-ids are an underrated feature. They are actually lovely, and
git uses them under the hood for rebases etc.

             Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 17:56                           ` Alexei Starovoitov
@ 2025-09-09 18:01                             ` Linus Torvalds
  2025-09-09 18:13                               ` Alexei Starovoitov
  0 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2025-09-09 18:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: dan.j.williams, Jens Axboe, Vlastimil Babka, Konstantin Ryabitsev,
	Jakub Kicinski, Rafael J. Wysocki, Caleb Sander Mateos, io-uring,
	workflows

On Tue, 9 Sept 2025 at 10:56, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> It doesn't work reliably. Often enough maintainers massage the patch
> a bit while applying to fix minor nits and patch-id will be different.

Honestly, if you massage a patch you should probably mention it.

THAT is the kind of thing where it actually makes sense to say
"modified version of XYZ" and pointing to the original.

Look, at that point it's actually *IMPORTANT* to explicitly state that
you didn't actually apply the original patch.

This falls clearly under the "don't add links mindlessly, do it
mindfully" heading.

          Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 16:40                       ` Linus Torvalds
  2025-09-09 17:08                         ` Mark Brown
  2025-09-09 17:25                         ` dan.j.williams
@ 2025-09-09 18:06                         ` Vlastimil Babka
  2025-09-09 18:14                           ` Linus Torvalds
  2 siblings, 1 reply; 74+ messages in thread
From: Vlastimil Babka @ 2025-09-09 18:06 UTC (permalink / raw)
  To: Linus Torvalds, Jens Axboe
  Cc: Konstantin Ryabitsev, Jakub Kicinski, Rafael J. Wysocki,
	dan.j.williams, Caleb Sander Mateos, io-uring, workflows

On 9/9/25 18:40, Linus Torvalds wrote:
> On Tue, 9 Sept 2025 at 07:50, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> I think we all know the answer to that one - it would've been EXACTLY
>> the same outcome. Not to put words in Linus' mouth, but it's not the
>> name of the tag that he finds repulsive, it's the very fact that a link
>> is there and it isn't useful _to him_.
> 
> It's not that it isn't "useful to me". It's that it HURTS, and it's
> entirely redundant.
> 
> It literally wastes my time. Yes, I have the option to ignore them,
> but then I ignore potentially *good* links.
> 
> Rafael asked what the difference between "Fixes:" and "Cc: stable" is
> - it's exactly the fact that those do NOT waste human time, and they
> were NOT automated garbage.
> 
> The rules for those are that they have been added *thoughtfully*: you
> don't add 'stable' with automation without even thinking about it, do
> you?
> 
> And if you did, THAT WOULD BE WRONG TOO.
> 
> Wouldn't you agree?

I fully agree. Now the sad part of this example is that if one conciously
decides that the bug fixed is not critical enough according to the
documented stable rules, and doesn't add Cc: stable, there's a good chance
the AUTOSEL automation will pick it anyway, these days with a help of LLM.
> Dammit, is it really so hard to understand this issue? Automated noise
> is bad noise. And when it has a human cost, it needs to go away.
> 
> I'm not saying that you can't link to the original email. But you need
> to STOP THE MINDLESS AUTOMATION WHEN IT HURTS.
> 
> So add the link, by all means - but only add it when it is relevant
> and gives real information. And THINK about it, don't have it in some
> mindless script.

I'd hope that distinguishing the automated links from conscious one (i.e.
using the patch.msgid.link vs lore domains) would be enough to make everyone
happy without hurting. But fine.
> Because if it's in a mindless script, then dammit, the lore "search"
> function is objectively better after-the-fact. Really. Using the lore
> search gives the original email *and* more.
> 
> The same, btw, goes for my merge messages. No, I'm not going to add
> some idiotic "Link" to the original pull request email. Not only don't
> I fetch those from lore to begin with, you can literally search for
> them.
> 
> Look here, for the latest merge I did of your tree: e9eaca6bf69d.
Later in the thread patch-id is mentioned. I think it was mentioned in the
past threads that due to small context changes due to e.g. base that the
submitter used and the maintainer used to apply, and even diff algorithm not
being set in stone, they can't be made fully reliable?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 18:01                             ` Linus Torvalds
@ 2025-09-09 18:13                               ` Alexei Starovoitov
  0 siblings, 0 replies; 74+ messages in thread
From: Alexei Starovoitov @ 2025-09-09 18:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dan Williams, Jens Axboe, Vlastimil Babka, Konstantin Ryabitsev,
	Jakub Kicinski, Rafael J. Wysocki, Caleb Sander Mateos, io-uring,
	workflows

On Tue, Sep 9, 2025 at 11:01 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, 9 Sept 2025 at 10:56, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > It doesn't work reliably. Often enough maintainers massage the patch
> > a bit while applying to fix minor nits and patch-id will be different.
>
> Honestly, if you massage a patch you should probably mention it.
>
> THAT is the kind of thing where it actually makes sense to say
> "modified version of XYZ" and pointing to the original.
>
> Look, at that point it's actually *IMPORTANT* to explicitly state that
> you didn't actually apply the original patch.

and I did in the email reply (as you could see in the lore link).
That's what we always do.
Email is the way to communicate such changes.
Sometimes we rewrite the commit log too to reduce verbosity,
fix typos or whatever. Without direct email reply developers
don't notice that commit was tweaked.

The point is that 'git patch-id --stable' is not reliable.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 18:06                         ` Vlastimil Babka
@ 2025-09-09 18:14                           ` Linus Torvalds
  2025-09-09 18:22                             ` Vlastimil Babka
  0 siblings, 1 reply; 74+ messages in thread
From: Linus Torvalds @ 2025-09-09 18:14 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Jens Axboe, Konstantin Ryabitsev, Jakub Kicinski,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

On Tue, 9 Sept 2025 at 11:07, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> Later in the thread patch-id is mentioned. I think it was mentioned in the
> past threads that due to small context changes due to e.g. base that the
> submitter used and the maintainer used to apply, and even diff algorithm not
> being set in stone, they can't be made fully reliable?

Yes, the patch-id is a heuristic. It's really a very good heuristic in
practice, though.

Also, if the argument is "it might not always work", I still claim
that "99.5% useful" is a hell of a lot better than "_maybe_ useful in
the future, but known to be painful".

Because that's the trade-off here: people are arguing for something
that wastes time and effort, and with very dubious use cases.

But yes: please do continue to add links to the original email - IF
you thought about it. That has always been my standpoint. Exactly like
"Fixes", and exactly like EVERY SINGLE OTHER THING you add to a commit
message.

              Linus

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 18:14                           ` Linus Torvalds
@ 2025-09-09 18:22                             ` Vlastimil Babka
  2025-09-09 21:05                               ` Mark Brown
  0 siblings, 1 reply; 74+ messages in thread
From: Vlastimil Babka @ 2025-09-09 18:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Konstantin Ryabitsev, Jakub Kicinski,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

On 9/9/25 20:14, Linus Torvalds wrote:
> On Tue, 9 Sept 2025 at 11:07, Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> Later in the thread patch-id is mentioned. I think it was mentioned in the
>> past threads that due to small context changes due to e.g. base that the
>> submitter used and the maintainer used to apply, and even diff algorithm not
>> being set in stone, they can't be made fully reliable?
> 
> Yes, the patch-id is a heuristic. It's really a very good heuristic in
> practice, though.
> 
> Also, if the argument is "it might not always work", I still claim
> that "99.5% useful" is a hell of a lot better than "_maybe_ useful in
> the future, but known to be painful".
> 
> Because that's the trade-off here: people are arguing for something
> that wastes time and effort, and with very dubious use cases.
> 
> But yes: please do continue to add links to the original email - IF
> you thought about it. That has always been my standpoint. Exactly like
> "Fixes", and exactly like EVERY SINGLE OTHER THING you add to a commit
> message.

Fine, maybe b4 could help here by verifying if patch-id works on commits in
the maintainer's branch before sending a pr, and for those where it doesn't,
the maintainer can decide to add them. It sounds more useful to me than
adding anything "AI-powered" to it.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 17:58                             ` Linus Torvalds
@ 2025-09-09 18:31                               ` Konstantin Ryabitsev
  2025-09-09 19:36                                 ` dan.j.williams
  2025-09-10  1:12                                 ` dan.j.williams
  0 siblings, 2 replies; 74+ messages in thread
From: Konstantin Ryabitsev @ 2025-09-09 18:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mark Brown, Jens Axboe, Vlastimil Babka, Jakub Kicinski,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

On Tue, Sep 09, 2025 at 10:58:53AM -0700, Linus Torvalds wrote:
> On Tue, 9 Sept 2025 at 10:50, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> >    patchid=$(git diff-tree -p fef7ded169ed7e133612f90a032dc2af1ce19bef
> > | git patch-id | cut -d' ' -f1)
> 
> Oh, and looking more at that, use Dan's version instead.  You almost
> certainly want to use '--stable' like Dan did, although maybe
> Konstantin can speak up on what option lore actually uses for
> indexing.

It uses --stable.

> And you *can* screw up patchid matching. In particular, you can
> generate patches different ways, and patch-id won't generate the same
> thing for a rename patch and a add/delete patch, for example (again:
> the traditional use case is that you generate the patch IDs all from
> the same tree, so you control how you generate the patches)

We can't control how the patches are generated by submitters. If someone
generates and sends them with --histogram, this won't work. Here's an example
right from your tree:

    $ git show 1c67f9c54cdc70627e3f6472b89cd3d895df974c | git patch-id --stable | cut -d' ' -f1
    57cb8d951fd1006d885f6bc7083283d3bc6040c1

    $ git show --histogram 1c67f9c54cdc70627e3f6472b89cd3d895df974c | git patch-id --stable | cut -d' ' -f1
    47b4bfff33d1456d0a2bb30f8bd74e1cfe9eb31e

Or if someone generates with -U5 instead of the default (-U3):

    $ git show 1c67f9c54cdc70627e3f6472b89cd3d895df974c -U5 | git patch-id --stable | cut -d' ' -f1
    0b68dd472dc791447c3091f7a671e7f1e5d7a3d2

This is more than just annoying -- this can be misleading and confusing. If
the submitter sent v1, v2, v3 with the default parameters and then sent v4
with --histogram, then you may think v3 was the final version that got applied
and it will waste a lot of your time trying to figure out why it doesn't match
what's in the tree.

I don't have precise statistics, but I do have firsthand experience trying to
make this work with git-patch-id, because this is how git-patchwork-bot works,
and we can't match a significant portion of commits to patches.

-K

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-09 17:26             ` Jens Axboe
@ 2025-09-09 18:54               ` Sasha Levin
  2025-09-10 10:13                 ` Laurent Pinchart
  0 siblings, 1 reply; 74+ messages in thread
From: Sasha Levin @ 2025-09-09 18:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Laurent Pinchart, konstantin, csander, io-uring, torvalds,
	workflows

On Tue, Sep 09, 2025 at 11:26:19AM -0600, Jens Axboe wrote:
>On 9/9/25 11:22 AM, Laurent Pinchart wrote:
>> On Tue, Sep 09, 2025 at 12:32:14PM -0400, Sasha Levin wrote:
>>> Add a new 'b4 dig' subcommand that uses AI agents to discover related
>>> emails for a given message ID. This helps developers find all relevant
>>> context around patches including previous versions, bug reports, reviews,
>>> and related discussions.
>>
>> That really sounds like "if all you have is a hammer, everything looks
>> like a nail". The community has been working for multiple years to
>> improve discovery of relationships between patches and commits, with
>> great tools such are lore, lei and b4, and usage of commit IDs, patch
>> IDs and message IDs to link everything together. Those provide exact
>> results in a deterministic way, and consume a fraction of power of what
>> this patch would do. It would be very sad if this would be the direction
>> we decide to take.
>
>Fully agree, this kind of lazy "oh just waste billions of cycles and
>punt to some AI" bs is just kind of giving up on proper infrastructure
>to support maintainers and developers.

This feels like a false choice: why force a pick between b4-dig-like tooling
and improving our infra? They can work together. As tagging and workflows
improve, those gains will flow into the tools anyway.

It's like saying we should skip -rc releases because they mean we've given up
on bug free code.

Perfect is the enemy of the good. You're arguing against a tool that works now,
just because it's not ideal, and to chase perfection instead. I'd rather be
"lazy" and skip the endless lore hunts.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 18:31                               ` Konstantin Ryabitsev
@ 2025-09-09 19:36                                 ` dan.j.williams
  2025-09-10  1:12                                 ` dan.j.williams
  1 sibling, 0 replies; 74+ messages in thread
From: dan.j.williams @ 2025-09-09 19:36 UTC (permalink / raw)
  To: Konstantin Ryabitsev, Linus Torvalds
  Cc: Mark Brown, Jens Axboe, Vlastimil Babka, Jakub Kicinski,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

Konstantin Ryabitsev wrote:
[..]
> I don't have precise statistics, but I do have firsthand experience trying to
> make this work with git-patch-id, because this is how git-patchwork-bot works,
> and we can't match a significant portion of commits to patches.

What about something like this based on the idea that the same set of
files are almost never being touched at the exact same author-date
second:

time=0
for i in $(git show $1 --pretty=format:"%at" --name-only)
do
        if [ $time -eq 0 ]; then
                time=1
                echo -n "d:$i..$i"
        else
                echo -n " dfn:$i"
        fi
done
echo ""

For example:
$ ./lore.sh f10f46a0ee53420f707195fe33b7c235a1c0e48a
d:1752747497..1752747497 dfn:drivers/cxl/core/mbox.c dfn:drivers/cxl/core/trace.h dfn:drivers/cxl/cxlmem.h dfn:include/cxl/event.h

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 18:22                             ` Vlastimil Babka
@ 2025-09-09 21:05                               ` Mark Brown
  2025-09-10  1:33                                 ` Konstantin Ryabitsev
  0 siblings, 1 reply; 74+ messages in thread
From: Mark Brown @ 2025-09-09 21:05 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linus Torvalds, Jens Axboe, Konstantin Ryabitsev, Jakub Kicinski,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

[-- Attachment #1: Type: text/plain, Size: 1439 bytes --]

On Tue, Sep 09, 2025 at 08:22:05PM +0200, Vlastimil Babka wrote:
> On 9/9/25 20:14, Linus Torvalds wrote:

> > But yes: please do continue to add links to the original email - IF
> > you thought about it. That has always been my standpoint. Exactly like
> > "Fixes", and exactly like EVERY SINGLE OTHER THING you add to a commit
> > message.

> Fine, maybe b4 could help here by verifying if patch-id works on commits in
> the maintainer's branch before sending a pr, and for those where it doesn't,
> the maintainer can decide to add them. It sounds more useful to me than
> adding anything "AI-powered" to it.

I think ideally if there's tooling for this it should have both a
verification feature like you mention and also be supported by b4 mbox
so that you can say "b4 mbox ${COMMIT}" or whatever and have it download
a mailbox like can currently be done with a message ID.  That'd keep the
usability we currently have, the tool could look in the message for a
link and use that if it needs it.

What might be especially usable for applying/publishing would be
something that can be used either in a hook or more likely in scripting
that'll take the Message-Ids from git that people currently use to
generate the Link: tags and discard them if whatever the tool usually
uses to find mail archive links works without them, or rewrite them into
Link: tags if not.  The tool could emit the warning you suggest when
leaving the links in.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 18:31                               ` Konstantin Ryabitsev
  2025-09-09 19:36                                 ` dan.j.williams
@ 2025-09-10  1:12                                 ` dan.j.williams
  2025-09-10 12:19                                   ` Mark Brown
  1 sibling, 1 reply; 74+ messages in thread
From: dan.j.williams @ 2025-09-10  1:12 UTC (permalink / raw)
  To: Konstantin Ryabitsev, Linus Torvalds
  Cc: Mark Brown, Jens Axboe, Vlastimil Babka, Jakub Kicinski,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

Konstantin Ryabitsev wrote:
> On Tue, Sep 09, 2025 at 10:58:53AM -0700, Linus Torvalds wrote:
> > On Tue, 9 Sept 2025 at 10:50, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > >    patchid=$(git diff-tree -p fef7ded169ed7e133612f90a032dc2af1ce19bef
> > > | git patch-id | cut -d' ' -f1)
> > 
> > Oh, and looking more at that, use Dan's version instead.  You almost
> > certainly want to use '--stable' like Dan did, although maybe
> > Konstantin can speak up on what option lore actually uses for
> > indexing.
> 
> It uses --stable.
> 
> > And you *can* screw up patchid matching. In particular, you can
> > generate patches different ways, and patch-id won't generate the same
> > thing for a rename patch and a add/delete patch, for example (again:
> > the traditional use case is that you generate the patch IDs all from
> > the same tree, so you control how you generate the patches)
> 
> We can't control how the patches are generated by submitters. If someone
> generates and sends them with --histogram, this won't work. Here's an example
> right from your tree:
> 
>     $ git show 1c67f9c54cdc70627e3f6472b89cd3d895df974c | git patch-id --stable | cut -d' ' -f1
>     57cb8d951fd1006d885f6bc7083283d3bc6040c1
> 
>     $ git show --histogram 1c67f9c54cdc70627e3f6472b89cd3d895df974c | git patch-id --stable | cut -d' ' -f1
>     47b4bfff33d1456d0a2bb30f8bd74e1cfe9eb31e
> 
> Or if someone generates with -U5 instead of the default (-U3):
> 
>     $ git show 1c67f9c54cdc70627e3f6472b89cd3d895df974c -U5 | git patch-id --stable | cut -d' ' -f1
>     0b68dd472dc791447c3091f7a671e7f1e5d7a3d2

Is this a matter of teach git send-email to generate a header, e.g.
"X-Patch-ID:", for a given stable diff format convention? That lets
submitters use any diff format they want, but the X-Patch-ID: is
constant. Then "git show $diff_opts_convention $commit" becomes more
reliable over time.

It still does not help the problem of maintainers massaging patches on
their way upstream, but patch.msgid.link does not help that either
because that Link: is not the patch that was merged. So if you care
about automated tooling being able to query lore for commits, the
maintainer simply needs to push modified patches back out to the list,
or accept the consequences of disconnecting the commit from the lore
lookups.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-09 21:05                               ` Mark Brown
@ 2025-09-10  1:33                                 ` Konstantin Ryabitsev
  0 siblings, 0 replies; 74+ messages in thread
From: Konstantin Ryabitsev @ 2025-09-10  1:33 UTC (permalink / raw)
  To: Mark Brown
  Cc: Vlastimil Babka, Linus Torvalds, Jens Axboe, Jakub Kicinski,
	Rafael J. Wysocki, dan.j.williams, Caleb Sander Mateos, io-uring,
	workflows

On Tue, Sep 09, 2025 at 10:05:16PM +0100, Mark Brown wrote:
> > Fine, maybe b4 could help here by verifying if patch-id works on commits in
> > the maintainer's branch before sending a pr, and for those where it doesn't,
> > the maintainer can decide to add them. It sounds more useful to me than
> > adding anything "AI-powered" to it.
> 
> I think ideally if there's tooling for this it should have both a
> verification feature like you mention and also be supported by b4 mbox
> so that you can say "b4 mbox ${COMMIT}" or whatever and have it download
> a mailbox like can currently be done with a message ID.  That'd keep the
> usability we currently have, the tool could look in the message for a
> link and use that if it needs it.

Yeah, this is actually a neat idea -- I'll put that on the menu. We'll try
both --myers and --histogram when looking it up and then try a few other
tricks if we don't find anything (query by subject+author, etc).

I'll let you know when it's ready to test out.

-K

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-09 18:54               ` Sasha Levin
@ 2025-09-10 10:13                 ` Laurent Pinchart
  2025-09-10 10:55                   ` Sasha Levin
  0 siblings, 1 reply; 74+ messages in thread
From: Laurent Pinchart @ 2025-09-10 10:13 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Jens Axboe, konstantin, csander, io-uring, torvalds, workflows

On Tue, Sep 09, 2025 at 02:54:53PM -0400, Sasha Levin wrote:
> On Tue, Sep 09, 2025 at 11:26:19AM -0600, Jens Axboe wrote:
> > On 9/9/25 11:22 AM, Laurent Pinchart wrote:
> >> On Tue, Sep 09, 2025 at 12:32:14PM -0400, Sasha Levin wrote:
> >>> Add a new 'b4 dig' subcommand that uses AI agents to discover related
> >>> emails for a given message ID. This helps developers find all relevant
> >>> context around patches including previous versions, bug reports, reviews,
> >>> and related discussions.
> >>
> >> That really sounds like "if all you have is a hammer, everything looks
> >> like a nail". The community has been working for multiple years to
> >> improve discovery of relationships between patches and commits, with
> >> great tools such are lore, lei and b4, and usage of commit IDs, patch
> >> IDs and message IDs to link everything together. Those provide exact
> >> results in a deterministic way, and consume a fraction of power of what
> >> this patch would do. It would be very sad if this would be the direction
> >> we decide to take.
> >
> > Fully agree, this kind of lazy "oh just waste billions of cycles and
> > punt to some AI" bs is just kind of giving up on proper infrastructure
> > to support maintainers and developers.
> 
> This feels like a false choice: why force a pick between b4-dig-like tooling
> and improving our infra? They can work together. As tagging and workflows
> improve, those gains will flow into the tools anyway.
> 
> It's like saying we should skip -rc releases because they mean we've given up
> on bug free code.

I really have trouble thinking this is a honest argument. We're
discussing the topic in the context of a project where we reject
thousands of patches all the time when they don't go in the right
direction.

Throwing an LLM at the problem is not just a major waste of power, it
also hurts as it will shift the focus away from improving the
deterministic tools we've been working on. Using an LLM here only
benefits the companies that make money from those proprietary tools.

> Perfect is the enemy of the good. You're arguing against a tool that works now,
> just because it's not ideal, and to chase perfection instead. I'd rather be
> "lazy" and skip the endless lore hunts.

Nobody will stop you from being lazy or anything else. I'll however push
back against the bad influence on others.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-10 10:13                 ` Laurent Pinchart
@ 2025-09-10 10:55                   ` Sasha Levin
  2025-09-10 11:29                     ` Laurent Pinchart
  0 siblings, 1 reply; 74+ messages in thread
From: Sasha Levin @ 2025-09-10 10:55 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Jens Axboe, konstantin, csander, io-uring, torvalds, workflows

On Wed, Sep 10, 2025 at 01:13:42PM +0300, Laurent Pinchart wrote:
>On Tue, Sep 09, 2025 at 02:54:53PM -0400, Sasha Levin wrote:
>> On Tue, Sep 09, 2025 at 11:26:19AM -0600, Jens Axboe wrote:
>> > On 9/9/25 11:22 AM, Laurent Pinchart wrote:
>> >> On Tue, Sep 09, 2025 at 12:32:14PM -0400, Sasha Levin wrote:
>> >>> Add a new 'b4 dig' subcommand that uses AI agents to discover related
>> >>> emails for a given message ID. This helps developers find all relevant
>> >>> context around patches including previous versions, bug reports, reviews,
>> >>> and related discussions.
>> >>
>> >> That really sounds like "if all you have is a hammer, everything looks
>> >> like a nail". The community has been working for multiple years to
>> >> improve discovery of relationships between patches and commits, with
>> >> great tools such are lore, lei and b4, and usage of commit IDs, patch
>> >> IDs and message IDs to link everything together. Those provide exact
>> >> results in a deterministic way, and consume a fraction of power of what
>> >> this patch would do. It would be very sad if this would be the direction
>> >> we decide to take.
>> >
>> > Fully agree, this kind of lazy "oh just waste billions of cycles and
>> > punt to some AI" bs is just kind of giving up on proper infrastructure
>> > to support maintainers and developers.
>>
>> This feels like a false choice: why force a pick between b4-dig-like tooling
>> and improving our infra? They can work together. As tagging and workflows
>> improve, those gains will flow into the tools anyway.
>>
>> It's like saying we should skip -rc releases because they mean we've given up
>> on bug free code.
>
>I really have trouble thinking this is a honest argument. We're
>discussing the topic in the context of a project where we reject
>thousands of patches all the time when they don't go in the right
>direction.
>
>Throwing an LLM at the problem is not just a major waste of power, it
>also hurts as it will shift the focus away from improving the
>deterministic tools we've been working on. Using an LLM here only
>benefits the companies that make money from those proprietary tools.

Really? Elsewhere in this thread I've already pointed out that something like
this is very helpful because having to review hundreds of commits for
backport/CVE assignment using our current scheme of Link: just pointing to the
original submission and not any discussions is a major pain and a time waste.

Both Linus and Greg echoed the same concern. Heck, this thread started because
Linus complained about how much of a human time waste our current scheme is.

So when presented with a real problem and a tool that can help mitigate it,
you're still choosing to attack because it's LLM based and you have something
personal against that.

Sorry, I'm done with this argument. Use it if you like, don't use it if you
don't.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-10 10:55                   ` Sasha Levin
@ 2025-09-10 11:29                     ` Laurent Pinchart
  0 siblings, 0 replies; 74+ messages in thread
From: Laurent Pinchart @ 2025-09-10 11:29 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Jens Axboe, konstantin, csander, io-uring, torvalds, workflows

On Wed, Sep 10, 2025 at 06:55:55AM -0400, Sasha Levin wrote:
> On Wed, Sep 10, 2025 at 01:13:42PM +0300, Laurent Pinchart wrote:
> > On Tue, Sep 09, 2025 at 02:54:53PM -0400, Sasha Levin wrote:
> >> On Tue, Sep 09, 2025 at 11:26:19AM -0600, Jens Axboe wrote:
> >>> On 9/9/25 11:22 AM, Laurent Pinchart wrote:
> >>>> On Tue, Sep 09, 2025 at 12:32:14PM -0400, Sasha Levin wrote:
> >>>>> Add a new 'b4 dig' subcommand that uses AI agents to discover related
> >>>>> emails for a given message ID. This helps developers find all relevant
> >>>>> context around patches including previous versions, bug reports, reviews,
> >>>>> and related discussions.
> >>>>
> >>>> That really sounds like "if all you have is a hammer, everything looks
> >>>> like a nail". The community has been working for multiple years to
> >>>> improve discovery of relationships between patches and commits, with
> >>>> great tools such are lore, lei and b4, and usage of commit IDs, patch
> >>>> IDs and message IDs to link everything together. Those provide exact
> >>>> results in a deterministic way, and consume a fraction of power of what
> >>>> this patch would do. It would be very sad if this would be the direction
> >>>> we decide to take.
> >>>
> >>> Fully agree, this kind of lazy "oh just waste billions of cycles and
> >>> punt to some AI" bs is just kind of giving up on proper infrastructure
> >>> to support maintainers and developers.
> >>
> >> This feels like a false choice: why force a pick between b4-dig-like tooling
> >> and improving our infra? They can work together. As tagging and workflows
> >> improve, those gains will flow into the tools anyway.
> >>
> >> It's like saying we should skip -rc releases because they mean we've given up
> >> on bug free code.
> >
> > I really have trouble thinking this is a honest argument. We're
> > discussing the topic in the context of a project where we reject
> > thousands of patches all the time when they don't go in the right
> > direction.
> >
> > Throwing an LLM at the problem is not just a major waste of power, it
> > also hurts as it will shift the focus away from improving the
> > deterministic tools we've been working on. Using an LLM here only
> > benefits the companies that make money from those proprietary tools.
> 
> Really? Elsewhere in this thread I've already pointed out that something like
> this is very helpful because having to review hundreds of commits for
> backport/CVE assignment using our current scheme of Link: just pointing to the
> original submission and not any discussions is a major pain and a time waste.

That's debatable at best, I've stopped counting the number of developers
who are unhappy with the over-aggressive backport policy, and how LLMs
pick commits for backport that have no reason to be backported.

Don't get me wrong, I wouldn't want to be responsible for maintaining
stable kernels given the volume of commits. It's a never-ending,
thankless job. Still, claiming that LLMs solve the problem doesn't seem
true to me, they merely provide some sort of automation to allow us to
claim the problem has been addressed. It's closer to theatre than
engineering.

> Both Linus and Greg echoed the same concern. Heck, this thread started because
> Linus complained about how much of a human time waste our current scheme is.
> 
> So when presented with a real problem and a tool that can help mitigate it,
> you're still choosing to attack because it's LLM based and you have something
> personal against that.

I've long had something personal about using the wrong tool for a job
:-) That has very little chance of changing.

> Sorry, I'm done with this argument. Use it if you like, don't use it if you
> don't.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5)
  2025-09-10  1:12                                 ` dan.j.williams
@ 2025-09-10 12:19                                   ` Mark Brown
  0 siblings, 0 replies; 74+ messages in thread
From: Mark Brown @ 2025-09-10 12:19 UTC (permalink / raw)
  To: dan.j.williams
  Cc: Konstantin Ryabitsev, Linus Torvalds, Jens Axboe, Vlastimil Babka,
	Jakub Kicinski, Rafael J. Wysocki, Caleb Sander Mateos, io-uring,
	workflows

[-- Attachment #1: Type: text/plain, Size: 1574 bytes --]

On Tue, Sep 09, 2025 at 06:12:19PM -0700, dan.j.williams@intel.com wrote:
> Konstantin Ryabitsev wrote:

> > We can't control how the patches are generated by submitters. If someone
> > generates and sends them with --histogram, this won't work. Here's an example
> > right from your tree:

> Is this a matter of teach git send-email to generate a header, e.g.
> "X-Patch-ID:", for a given stable diff format convention? That lets
> submitters use any diff format they want, but the X-Patch-ID: is
> constant. Then "git show $diff_opts_convention $commit" becomes more
> reliable over time.

We can't rely on people using git send-email at all, and they might be
using an old version (eg, from their distro) even when thy do.

> It still does not help the problem of maintainers massaging patches on
> their way upstream, but patch.msgid.link does not help that either
> because that Link: is not the patch that was merged. So if you care
> about automated tooling being able to query lore for commits, the

That's not really what people are using these links for - if you have
the commit you don't need to go to the mailing list archive to get it.
You're more likely to be using them to find the relevant discussion, see
who was involved and possibly send some kind of followup (eg, to report
a regression in my case).  A link tag, especially one with a well
defined domain for the links, works fine in this sort of application
since it's pointing at the last point the thing was on the list
regardless of how poorly what's made it into git corresponds to what was
posted.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-09 17:22           ` Laurent Pinchart
  2025-09-09 17:26             ` Jens Axboe
@ 2025-09-10 13:38             ` Konstantin Ryabitsev
  2025-09-10 14:03               ` Andrew Dona-Couch
  1 sibling, 1 reply; 74+ messages in thread
From: Konstantin Ryabitsev @ 2025-09-10 13:38 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Sasha Levin, axboe, csander, io-uring, torvalds, workflows

On Tue, Sep 09, 2025 at 08:22:58PM +0300, Laurent Pinchart wrote:
> On Tue, Sep 09, 2025 at 12:32:14PM -0400, Sasha Levin wrote:
> > Add a new 'b4 dig' subcommand that uses AI agents to discover related
> > emails for a given message ID. This helps developers find all relevant
> > context around patches including previous versions, bug reports, reviews,
> > and related discussions.
> 
> That really sounds like "if all you have is a hammer, everything looks
> like a nail". The community has been working for multiple years to
> improve discovery of relationships between patches and commits, with
> great tools such are lore, lei and b4, and usage of commit IDs, patch
> IDs and message IDs to link everything together. Those provide exact
> results in a deterministic way, and consume a fraction of power of what
> this patch would do. It would be very sad if this would be the direction
> we decide to take.

I don't want to go too far down the "wasting resources path," because,
honestly, a kid playing videogames for a weekend will waste more power than a
maintainer submitting a couple of threads for analysis.

I've already worked on plugging in LLMs into summarization, so I'm not alien
or opposed to this approach. I'd like to make this available to maintainers
who find it useful, and completely out of the way for those maintainers who
hate the whole idea. :)

Best wishes,
-K

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-10 13:38             ` Konstantin Ryabitsev
@ 2025-09-10 14:03               ` Andrew Dona-Couch
  0 siblings, 0 replies; 74+ messages in thread
From: Andrew Dona-Couch @ 2025-09-10 14:03 UTC (permalink / raw)
  To: Konstantin Ryabitsev, Laurent Pinchart
  Cc: Sasha Levin, axboe, csander, io_uring Mailing List, torvalds,
	workflows

What a remarkable discussion!  The arguments being made lay bare an important difference in perspective.

> you're still choosing to attack because it's LLM based and you have something
> personal against that.

This argument seems an utter abrogation of an engineer's core responsibility to understand the tools they use.  The concerns raised against LLMs here were specific and technical, were they not?

> I don't want to go too far down the "wasting resources path," because,
> honestly, a kid playing videogames for a weekend will waste more power than a
> maintainer submitting a couple of threads for analysis.

Quite disingenuous to treat the marginal cost of a single search as if it accounted for the true cost to society of making the product available.

I appreciate the careful consideration of maintainers here who work to keep the focus of development on real humans and a thoughtful and deliberative processes.

Thanks,
Andrew




-- 
We all do better when we all do better.  -Paul Wellstone

On Wed, Sep 10, 2025, at 09:38, Konstantin Ryabitsev wrote:
> On Tue, Sep 09, 2025 at 08:22:58PM +0300, Laurent Pinchart wrote:
>> On Tue, Sep 09, 2025 at 12:32:14PM -0400, Sasha Levin wrote:
>> > Add a new 'b4 dig' subcommand that uses AI agents to discover related
>> > emails for a given message ID. This helps developers find all relevant
>> > context around patches including previous versions, bug reports, reviews,
>> > and related discussions.
>> 
>> That really sounds like "if all you have is a hammer, everything looks
>> like a nail". The community has been working for multiple years to
>> improve discovery of relationships between patches and commits, with
>> great tools such are lore, lei and b4, and usage of commit IDs, patch
>> IDs and message IDs to link everything together. Those provide exact
>> results in a deterministic way, and consume a fraction of power of what
>> this patch would do. It would be very sad if this would be the direction
>> we decide to take.
>
> I don't want to go too far down the "wasting resources path," because,
> honestly, a kid playing videogames for a weekend will waste more power than a
> maintainer submitting a couple of threads for analysis.
>
> I've already worked on plugging in LLMs into summarization, so I'm not alien
> or opposed to this approach. I'd like to make this available to maintainers
> who find it useful, and completely out of the way for those maintainers who
> hate the whole idea. :)
>
> Best wishes,
> -K

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-09 16:32         ` [RFC] b4 dig: Add AI-powered email relationship discovery command Sasha Levin
  2025-09-09 17:22           ` Laurent Pinchart
@ 2025-09-11 14:48           ` Nicolas Frattaroli
  2025-09-11 15:05             ` Sasha Levin
  2025-09-11 23:24           ` Konstantin Ryabitsev
  2 siblings, 1 reply; 74+ messages in thread
From: Nicolas Frattaroli @ 2025-09-11 14:48 UTC (permalink / raw)
  To: konstantin, Sasha Levin
  Cc: axboe, csander, io-uring, torvalds, workflows, Sasha Levin

On Tuesday, 9 September 2025 18:32:14 Central European Summer Time Sasha Levin wrote:
> Add a new 'b4 dig' subcommand that uses AI agents to discover related
> emails for a given message ID. This helps developers find all relevant
> context around patches including previous versions, bug reports, reviews,
> and related discussions.
> 
> The command:
> - Takes a message ID and constructs a detailed prompt about email relationships
> - Calls a configured AI agent script to analyze and find related messages
> - Downloads all related threads from lore.kernel.org
> - Combines them into a single mbox file for easy review
> 
> Key features:
> - Outputs a simplified summary showing only relationships and reasons
> - Creates a combined mbox with all related threads (deduped)
> - Provides detailed guidance to AI agents about kernel workflow patterns
> 
> Configuration:
> The AI agent script is configured via:
>   -c AGENT=/path/to/agent.sh  (command line)
>   dig-agent: /path/to/agent.sh (config file)
> 
> The agent script receives a prompt file and should return JSON with
> related message IDs and their relationships.
> 
> Example usage:
> 
> $ b4 -c AGENT=agent.sh dig 20250909142722.101790-1-harry.yoo@oracle.com
> Analyzing message: 20250909142722.101790-1-harry.yoo@oracle.com
> Fetching original message...
> Looking up https://lore.kernel.org/20250909142722.101790-1-harry.yoo@oracle.com
> Grabbing thread from lore.kernel.org/all/20250909142722.101790-1-harry.yoo@oracle.com/t.mbox.gz
> Subject: [PATCH V3 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
> From: Harry Yoo <harry.yoo@oracle.com>
> Constructing agent prompt...
> Calling AI agent: agent.sh
> Calling agent: agent.sh /tmp/tmpz1oja9_5.txt
> Parsing agent response...
> Found 17 related messages:
> 
> Related Messages Summary:
> ------------------------------------------------------------
> [PARENT] Greg KH's stable tree failure notification that initiated this 6.6.y backport request
> [V1] V1 of the 6.6.y backport patch
> [V2] V2 of the 6.6.y backport patch
> [RELATED] Same patch backported to 5.15.y stable branch
> [RELATED] Greg KH's stable tree failure notification for 5.15.y branch
> [RELATED] Same patch backported to 6.1.y stable branch
> [COVER] V5 mainline patch series cover letter that was originally merged
> [RELATED] V5 mainline patch 1/3: move page table sync declarations
> [RELATED] V5 mainline patch 2/3: the original populate_kernel patch that's being backported
> [RELATED] V5 mainline patch 3/3: x86 ARCH_PAGE_TABLE_SYNC_MASK definition
> [RELATED] RFC V1 cover letter - earliest version of this patch series
> [RELATED] RFC V1 patch 1/3 - first introduction of populate_kernel helpers
> [RELATED] RFC V1 patch 2/3 - x86/mm definitions
> [RELATED] RFC V1 patch 3/3 - convert to _kernel variant
> [RELATED] Baoquan He's V3 patch touching same file (mm/kasan/init.c)
> [RELATED] Baoquan He's V2 patch touching same file (mm/kasan/init.c)
> [RELATED] Baoquan He's V1 patch touching same file (mm/kasan/init.c)
> ------------------------------------------------------------
> 
> The resulting mbox would look like this:
> 
>    1 O   Jul 09 Harry Yoo       ( 102) [RFC V1 PATCH mm-hotfixes 0/3] mm, arch: A more robust approach to sync top level kernel page tables
>    2 O   Jul 09 Harry Yoo       ( 143) ├─>[RFC V1 PATCH mm-hotfixes 1/3] mm: introduce and use {pgd,p4d}_populate_kernel()
>    3 O   Jul 11 David Hildenbra (  33) │ └─>
>    4 O   Jul 13 Harry Yoo       (  56) │   └─>
>    5 O   Jul 13 Mike Rapoport   (  67) │     └─>
>    6 O   Jul 14 Harry Yoo       (  46) │       └─>
>    7 O   Jul 15 Harry Yoo       (  65) │         └─>
>    8 O   Jul 09 Harry Yoo       ( 246) ├─>[RFC V1 PATCH mm-hotfixes 2/3] x86/mm: define p*d_populate_kernel() and top-level page table sync
>    9 O   Jul 09 Andrew Morton   (  12) │ ├─>
>   10 O   Jul 10 Harry Yoo       (  23) │ │ └─>
>   11 O   Jul 11 Harry Yoo       (  34) │ │   └─>
>   12 O   Jul 11 Harry Yoo       (  35) │ │     └─>
>   13 O   Jul 10 kernel test rob (  79) │ └─>
>   14 O   Jul 09 Harry Yoo       ( 300) ├─>[RFC V1 PATCH mm-hotfixes 3/3] x86/mm: convert {pgd,p4d}_populate{,_init} to _kernel variant
>   15 O   Jul 10 kernel test rob (  80) │ └─>
>   16 O   Jul 09 Harry Yoo       (  31) └─>Re: [RFC V1 PATCH mm-hotfixes 0/3] mm, arch: A more robust approach to sync top level kernel page tables
>   17 O   Aug 18 Harry Yoo       ( 262) [PATCH V5 mm-hotfixes 0/3] mm, x86: fix crash due to missing page table sync and make it harder to miss
>   18 O   Aug 18 Harry Yoo       (  72) ├─>[PATCH V5 mm-hotfixes 1/3] mm: move page table sync declarations to linux/pgtable.h
>   19 O   Aug 18 David Hildenbra (  20) │ └─>
>   20 O   Aug 18 Harry Yoo       ( 239) ├─>[PATCH V5 mm-hotfixes 2/3] mm: introduce and use {pgd,p4d}_populate_kernel()
>   21 O   Aug 18 David Hildenbra (  60) │ ├─>
>   22 O   Aug 18 kernel test rob ( 150) │ ├─>
>   23 O   Aug 18 Harry Yoo       ( 161) │ │ └─>
>   24 O   Aug 21 Harry Yoo       (  85) │ ├─>[PATCH] mm: fix KASAN build error due to p*d_populate_kernel()
>   25 O   Aug 21 kernel test rob (  18) │ │ ├─>
>   26 O   Aug 21 Lorenzo Stoakes ( 100) │ │ ├─>
>   27 O   Aug 21 Harry Yoo       (  62) │ │ │ └─>
>   28 O   Aug 21 Lorenzo Stoakes (  18) │ │ │   └─>
>   29 O   Aug 21 Harry Yoo       (  90) │ │ └─>[PATCH v2] mm: fix KASAN build error due to p*d_populate_kernel()
>   30 O   Aug 21 kernel test rob (  18) │ │   ├─>
>   31 O   Aug 21 Dave Hansen     (  24) │ │   └─>
>   32 O   Aug 22 Harry Yoo       (  56) │ │     └─>
>   33 O   Aug 22 Andrey Ryabinin (  91) │ │       ├─>
>   34 O   Aug 27 Harry Yoo       (  98) │ │       │ └─>
>   35 O   Aug 22 Dave Hansen     (  63) │ │       └─>
>   36 O   Aug 25 Andrey Ryabinin (  72) │ │         └─>
>   37 O   Aug 22 Harry Yoo       ( 103) │ └─>[PATCH v3] mm: fix KASAN build error due to p*d_populate_kernel()
>   38 O   Aug 18 Harry Yoo       ( 113) ├─>[PATCH V5 mm-hotfixes 3/3] x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
>   39 O   Aug 18 David Hildenbra (  72) │ └─>
>   40 O   Aug 18 David Hildenbra (  15) └─>Re: [PATCH V5 mm-hotfixes 0/3] mm, x86: fix crash due to missing page table sync and make it harder to miss
>   41 O   Aug 18 Harry Yoo       ( 277) [PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()
>   42 O   Aug 18 Harry Yoo       ( 277) [PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()
>   43 O   Aug 18 Harry Yoo       ( 277) [PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()
>   44 O   Sep 06 gregkh@linuxfou (  24) FAILED: patch "[PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()" failed to apply to 6.6-stable tree
>   45 O   Sep 08 Harry Yoo       ( 303) ├─>[PATCH 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   46 O   Sep 09 Harry Yoo       ( 291) ├─>[PATCH V2 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   47 O   Sep 09 Harry Yoo       ( 293) └─>[PATCH V3 6.6.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   48 O   Sep 06 gregkh@linuxfou (  24) FAILED: patch "[PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()" failed to apply to 6.1-stable tree
>   49 O   Sep 08 Harry Yoo       ( 303) ├─>[PATCH 6.1.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   50 O   Sep 09 Harry Yoo       ( 291) ├─>[PATCH V2 6.1.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   51 O   Sep 09 Harry Yoo       ( 293) └─>[PATCH V3 6.1.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   52 O   Sep 06 gregkh@linuxfou (  24) FAILED: patch "[PATCH] mm: introduce and use {pgd,p4d}_populate_kernel()" failed to apply to 5.15-stable tree
>   53 O   Sep 08 Harry Yoo       ( 273) ├─>[PATCH 5.15.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   54 O   Sep 09 Harry Yoo       ( 260) ├─>[PATCH V2 5.15.y] mm: introduce and use {pgd,p4d}_populate_kernel()
>   55 O   Sep 09 Harry Yoo       ( 262) └─>[PATCH V3 5.15.y] mm: introduce and use {pgd,p4d}_populate_kernel()
> 
> The prompt includes extensive documentation about lore.kernel.org's search
> capabilities, limitations (like search index lag), and kernel workflow patterns
> to help AI agents effectively find related messages.
> 
> Assisted-by: Claude Code

Hi Sasha,

it doesn't seem like Assisted-by is the right terminology here, as
the code itself makes me believe it was written wholesale by your
preferred LLM with minimal oversight, and then posted to the list.

A non-exhaustive code review inline, as it quickly became clear
this wasn't worth further time invested in reviewing.


> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  src/b4/command.py |  17 ++
>  src/b4/dig.py     | 630 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 647 insertions(+)
>  create mode 100644 src/b4/dig.py
> 
> diff --git a/src/b4/command.py b/src/b4/command.py
> index 455124d..f225ae5 100644
> --- a/src/b4/command.py
> +++ b/src/b4/command.py
> @@ -120,6 +120,11 @@ def cmd_diff(cmdargs: argparse.Namespace) -> None:
>      b4.diff.main(cmdargs)
>  
>  
> +def cmd_dig(cmdargs: argparse.Namespace) -> None:
> +    import b4.dig
> +    b4.dig.main(cmdargs)
> +
> +
>  class ConfigOption(argparse.Action):
>      """Action class for storing key=value arguments in a dict."""
>      def __call__(self, parser: argparse.ArgumentParser,
> @@ -399,6 +404,18 @@ def setup_parser() -> argparse.ArgumentParser:
>                            help='Submit the token received via verification email')
>      sp_send.set_defaults(func=cmd_send)
>  
> +    # b4 dig
> +    sp_dig = subparsers.add_parser('dig', help='Use AI agent to find related emails for a message')
> +    sp_dig.add_argument('msgid', nargs='?',
> +                        help='Message ID to analyze, or pipe a raw message')
> +    sp_dig.add_argument('-o', '--output', dest='output', default=None,
> +                        help='Output mbox filename (default: <msgid>-related.mbox)')
> +    sp_dig.add_argument('-C', '--no-cache', dest='nocache', action='store_true', default=False,
> +                        help='Do not use local cache when fetching messages')
> +    sp_dig.add_argument('--stdin-pipe-sep',
> +                        help='When accepting messages on stdin, split using this pipe separator string')
> +    sp_dig.set_defaults(func=cmd_dig)
> +
>      return parser
>  
>  
> diff --git a/src/b4/dig.py b/src/b4/dig.py
> new file mode 100644
> index 0000000..007f7d0
> --- /dev/null
> +++ b/src/b4/dig.py
> @@ -0,0 +1,630 @@
> +#!/usr/bin/env python3
> +# -*- coding: utf-8 -*-
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +#
> +# b4 dig - Use AI agents to find related emails
> +#
> +__author__ = 'Sasha Levin <sashal@kernel.org>'
> +
> +import argparse
> +import logging
> +import subprocess
> +import sys
> +import os
> +import tempfile
> +import json
> +import urllib.parse
> +import gzip
> +import mailbox
> +import email.utils
> +from typing import Optional, List, Dict, Any
> +
> +import b4
> +
> +logger = b4.logger
> +
> +
> +def construct_agent_prompt(msgid: str) -> str:
> +    """Construct a detailed prompt for the AI agent to find related emails."""
> +
> +    # Clean up the message ID
> +    if msgid.startswith('<'):
> +        msgid = msgid[1:]
> +    if msgid.endswith('>'):
> +        msgid = msgid[:-1]

str.removeprefix and str.removesuffix exist for this precise purpose.

> [... snipped robot wrangling ...]
> +
> +
> +def call_agent(prompt: str, agent_cmd: str) -> Optional[str]:
> +    """Call the configured agent script with the prompt."""
> +
> +    # Expand user paths
> +    agent_cmd = os.path.expanduser(agent_cmd)
> +
> +    if not os.path.exists(agent_cmd):
> +        logger.error('Agent command not found: %s', agent_cmd)
> +        return None
> +
> +    if not os.access(agent_cmd, os.X_OK):
> +        logger.error('Agent command is not executable: %s', agent_cmd)
> +        return None

Why does this check exist? Why does the previous check exist? Wouldn't
it be better to just handle the exception subprocess.run will throw?

> +
> +    try:
> +        # Write prompt to a temporary file to avoid shell escaping issues
> +        with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as tmp:
> +            tmp.write(prompt)
> +            tmp_path = tmp.name

I'm so glad we now have tmp_path so I don't have to write out tmp.name every time

> +
> +        # Call the agent script with the prompt file as argument
> +        logger.info('Calling agent: %s %s', agent_cmd, tmp_path)
> +        result = subprocess.run(
> +            [agent_cmd, tmp_path],
> +            capture_output=True,
> +            text=True
> +        )
> +
> +        if result.returncode != 0:
> +            logger.error('Agent returned error code %d', result.returncode)
> +            if result.stderr:
> +                logger.error('Agent stderr: %s', result.stderr)
> +            return None
> +
> +        return result.stdout
> +
> +    except subprocess.TimeoutExpired:

You don't set a timeout in the subprocess.run parameters, so this
is dead code.

> +        logger.error('Agent command timed out after 5 minutes')
> +        return None
> +    except Exception as e:
> +        logger.error('Error calling agent: %s', e)
> +        return None
> +    finally:
> +        # Clean up temp file
> +        if 'tmp_path' in locals():
> +            try:
> +                os.unlink(tmp_path)
> +            except:
> +                pass

This is pointless. Had you (or rather, Claude doing business as you)
not set delete=False, and simply indented everything that needs the
temporary file to be within the `with` clause, then this code could
be removed.

> +
> +
> +def parse_agent_response(response: str) -> List[Dict[str, str]]:
> +    """Parse the agent's response to extract message IDs."""
> +
> +    related = []
> +
> +    try:
> +        # Try to find JSON in the response
> +        # Agent might return additional text, so we look for JSON array
> +        import re
> +        json_match = re.search(r'\[.*?\]', response, re.DOTALL)
> +        if json_match:
> +            json_str = json_match.group(0)
> +            data = json.loads(json_str)
> +
> +            if isinstance(data, list):
> +                for item in data:
> +                    if isinstance(item, dict) and 'msgid' in item:
> +                        related.append({
> +                            'msgid': item.get('msgid', ''),
> +                            'relationship': item.get('relationship', 'related'),
> +                            'reason': item.get('reason', 'No reason provided')
> +                        })
> +        else:
> +            # Fallback: try to extract message IDs from plain text
> +            # Look for patterns that look like message IDs
> +            msgid_pattern = re.compile(r'[a-zA-Z0-9][a-zA-Z0-9\.\-_]+@[a-zA-Z0-9][a-zA-Z0-9\.\-]+\.[a-zA-Z]+')
> +            for match in msgid_pattern.finditer(response):
> +                msgid = match.group(0)
> +                if msgid != '':  # Don't include the original
> +                    related.append({
> +                        'msgid': msgid,
> +                        'relationship': 'related',
> +                        'reason': 'Found in agent response'
> +                    })
> +
> +    except json.JSONDecodeError as e:
> +        logger.warning('Could not parse JSON from agent response: %s', e)
> +    except Exception as e:
> +        logger.error('Error parsing agent response: %s', e)
> +
> +    return related
> +
> +
> +def get_message_info(msgid: str) -> Optional[Dict[str, Any]]:
> +    """Retrieve basic information about a message."""
> +
> +    msgs = b4.get_pi_thread_by_msgid(msgid, onlymsgids={msgid}, with_thread=False)
> +    if not msgs:
> +        return None
> +
> +    msg = msgs[0]
> +
> +    return {
> +        'subject': msg.get('Subject', 'No subject'),
> +        'from': msg.get('From', 'Unknown'),
> +        'date': msg.get('Date', 'Unknown'),
> +        'msgid': msgid
> +    }
> +
> +
> +def download_and_combine_threads(msgid: str, related_messages: List[Dict[str, str]],
> +                                 output_file: str, nocache: bool = False) -> int:
> +    """Download thread mboxes for all related messages and combine into one mbox file."""
> +
> +    message_ids = [msgid]  # Start with original message
> +
> +    # Add all related message IDs
> +    for item in related_messages:
> +        if 'msgid' in item:
> +            message_ids.append(item['msgid'])
> +
> +    # Collect all messages from all threads
> +    seen_msgids = set()
> +    all_messages = []
> +
> +    # Download thread for each message
> +    # But be smart about what we include - don't mix unrelated series
> +    for msg_id in message_ids:
> +        logger.info('Fetching thread for %s', msg_id)
> +
> +        # For better control, fetch just the specific thread, not everything
> +        # Use onlymsgids to limit scope when possible
> +        msgs = b4.get_pi_thread_by_msgid(msg_id, nocache=nocache)
> +
> +        if msgs:
> +            # Try to detect thread boundaries and avoid mixing unrelated series
> +            thread_messages = []
> +            base_subject = None
> +
> +            for msg in msgs:
> +                msg_msgid = b4.LoreMessage.get_clean_msgid(msg)
> +
> +                # Skip if we've already seen this message
> +                if msg_msgid in seen_msgids:
> +                    continue
> +
> +                # Get the subject to check if it's part of the same series
> +                subject = msg.get('Subject', '')
> +
> +                # Extract base subject (remove Re:, [PATCH], version numbers, etc)
> +                import re
> +                base = re.sub(r'^(Re:\s*)*(\[.*?\]\s*)*', '', subject).strip()
> +
> +                # Set the base subject from the first message
> +                if base_subject is None and base:
> +                    base_subject = base
> +
> +                # Add the message
> +                if msg_msgid:
> +                    seen_msgids.add(msg_msgid)
> +                    thread_messages.append(msg)
> +
> +            all_messages.extend(thread_messages)
> +        else:
> +            logger.warning('Could not fetch thread for %s', msg_id)
> +
> +    # Sort messages by date to maintain chronological order
> +    all_messages.sort(key=lambda m: email.utils.parsedate_to_datetime(m.get('Date', 'Thu, 1 Jan 1970 00:00:00 +0000')))
> +
> +    # Write all messages to output mbox file using b4's proper mbox functions
> +    logger.info('Writing %d messages to %s', len(all_messages), output_file)
> +
> +    total_messages = len(all_messages)
> +
> +    if total_messages > 0:
> +        # Use b4's save_mboxrd_mbox function which properly handles mbox format
> +        with open(output_file, 'wb') as outf:
> +            b4.save_mboxrd_mbox(all_messages, outf)
> +
> +    logger.info('Combined mbox contains %d unique messages', total_messages)
> +    return total_messages
> +
> +
> +def main(cmdargs: argparse.Namespace) -> None:
> +    """Main entry point for b4 dig command."""
> +
> +    # Get the message ID
> +    msgid = b4.get_msgid(cmdargs)
> +    if not msgid:
> +        logger.critical('Please provide a message-id')
> +        sys.exit(1)
> +
> +    # Clean up message ID
> +    if msgid.startswith('<'):
> +        msgid = msgid[1:]
> +    if msgid.endswith('>'):
> +        msgid = msgid[:-1]

Well, good thing we're duplicating the subpar code from before.

> +
> +    logger.info('Analyzing message: %s', msgid)
> +
> +    # Get the agent command from config
> +    config = b4.get_main_config()
> +    agent_cmd = None
> +
> +    # Check command-line config override
> +    if hasattr(cmdargs, 'config') and cmdargs.config:
> +        if 'AGENT' in cmdargs.config:
> +            agent_cmd = cmdargs.config['AGENT']

dict.get exists

> +
> +    # Fall back to main config
> +    if not agent_cmd:
> +        agent_cmd = config.get('dig-agent', config.get('agent', None))
> +
> +    if not agent_cmd:
> +        logger.critical('No AI agent configured. Set dig-agent in config or use -c AGENT=/path/to/agent.sh')
> +        logger.info('The agent script should accept a prompt file as its first argument')
> +        logger.info('and return a JSON array of related message IDs to stdout')
> +        sys.exit(1)
> +
> +    # Get info about the original message
> +    logger.info('Fetching original message...')
> +    msg_info = get_message_info(msgid)
> +    if msg_info:
> +        logger.info('Subject: %s', msg_info['subject'])
> +        logger.info('From: %s', msg_info['from'])
> +    else:
> +        logger.warning('Could not retrieve original message info')
> +
> +    # Construct the prompt
> +    logger.info('Constructing agent prompt...')
> +    prompt = construct_agent_prompt(msgid)
> +
> +    # Call the agent
> +    logger.info('Calling AI agent: %s', agent_cmd)
> +    response = call_agent(prompt, agent_cmd)
> +
> +    if not response:
> +        logger.critical('No response from agent')
> +        sys.exit(1)
> +
> +    # Parse the response
> +    logger.info('Parsing agent response...')
> +    related = parse_agent_response(response)
> +
> +    if not related:
> +        logger.info('No related messages found')
> +        sys.exit(0)
> +
> +    # Display simplified results
> +    logger.info('Found %d related messages:', len(related))
> +    print()
> +    print('Related Messages Summary:')
> +    print('-' * 60)
> +
> +    for item in related:
> +        relationship = item.get('relationship', 'related')
> +        reason = item.get('reason', '')
> +
> +        print(f'[{relationship.upper()}] {reason}')
> +
> +    print('-' * 60)
> +    print()
> +
> +    # Generate output mbox filename
> +    if hasattr(cmdargs, 'output') and cmdargs.output:
> +        mbox_file = cmdargs.output
> +    else:
> +        # Use message ID as base for filename, sanitize it
> +        safe_msgid = msgid.replace('/', '_').replace('@', '_at_').replace('<', '').replace('>', '')

str.translate exists

> +        mbox_file = f'{safe_msgid}-related.mbox'
> +
> +    # Download and combine all threads into one mbox
> +    logger.info('Downloading and combining all related threads...')
> +    nocache = hasattr(cmdargs, 'nocache') and cmdargs.nocache

dict.get exists

> +    total_messages = download_and_combine_threads(msgid, related, mbox_file, nocache=nocache)
> +
> +    if total_messages > 0:
> +        logger.info('Success: Combined mbox saved to %s (%d messages)', mbox_file, total_messages)
> +        print(f'✓ Combined mbox file: {mbox_file}')
> +        print(f'  Total messages: {total_messages}')
> +        print(f'  Related threads: {len(related) + 1}')  # +1 for original
> +    else:
> +        logger.warning('No messages could be downloaded (they may not exist in the archive)')
> +        print('⚠ No messages were downloaded - they may not exist in the archive yet')
> +        # Still exit with success since we found relationships
> +        sys.exit(0)
> 

I did not even remotely look over all the code, but when people on
your other agentic evangelism series pointed out how it'll result in
lazy patches from people who should know better, then this is kind of
the type of thing they probably meant.




^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-11 14:48           ` Nicolas Frattaroli
@ 2025-09-11 15:05             ` Sasha Levin
  2025-09-11 19:13               ` Nicolas Frattaroli
  0 siblings, 1 reply; 74+ messages in thread
From: Sasha Levin @ 2025-09-11 15:05 UTC (permalink / raw)
  To: Nicolas Frattaroli
  Cc: konstantin, axboe, csander, io-uring, torvalds, workflows

On Thu, Sep 11, 2025 at 04:48:03PM +0200, Nicolas Frattaroli wrote:
>On Tuesday, 9 September 2025 18:32:14 Central European Summer Time Sasha Levin wrote:
>it doesn't seem like Assisted-by is the right terminology here, as
>the code itself makes me believe it was written wholesale by your
>preferred LLM with minimal oversight, and then posted to the list.
>
>A non-exhaustive code review inline, as it quickly became clear
>this wasn't worth further time invested in reviewing.

Thanks for the review!

Indeed, Python isn't my language of choice: this script was a difficult (for
me) attempt at translating an equivalent bash based script that I already had
into python so it could fit into b4.

My intent was for this to start a discussion about this approach rather than
actually be merged into b4.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-11 15:05             ` Sasha Levin
@ 2025-09-11 19:13               ` Nicolas Frattaroli
  2025-09-11 19:57                 ` Sasha Levin
  0 siblings, 1 reply; 74+ messages in thread
From: Nicolas Frattaroli @ 2025-09-11 19:13 UTC (permalink / raw)
  To: Sasha Levin
  Cc: konstantin, axboe, csander, io-uring, torvalds, workflows,
	Laurent Pinchart

On Thursday, 11 September 2025 17:05:23 Central European Summer Time Sasha Levin wrote:
> On Thu, Sep 11, 2025 at 04:48:03PM +0200, Nicolas Frattaroli wrote:
> >On Tuesday, 9 September 2025 18:32:14 Central European Summer Time Sasha Levin wrote:
> >it doesn't seem like Assisted-by is the right terminology here, as
> >the code itself makes me believe it was written wholesale by your
> >preferred LLM with minimal oversight, and then posted to the list.
> >
> >A non-exhaustive code review inline, as it quickly became clear
> >this wasn't worth further time invested in reviewing.
> 
> Thanks for the review!
> 
> Indeed, Python isn't my language of choice: this script was a difficult (for
> me) attempt at translating an equivalent bash based script that I already had
> into python so it could fit into b4.

There's something to be said about these tools' habit of empowering
people to think they can judge the output adequately, but I don't
want to detract from the other point I'll try to make in this reply.

> My intent was for this to start a discussion about this approach rather than
> actually be merged into b4.

I know that, and you did get feedback on this approach already from
others, specifically that it did not solve the core issue that is
poorly utilised metadata and instead applies hammer to vaguely nail
shaped thing.

And your reaction was to call them personally biased against this
approach, and to loudly announce you would ignore any further
e-mails from them.

Now while I won't claim Laurent Pinchart isn't one of the louder
critics of your recent LLM evangelism, I can't really see a fault
in his reasoning: your insistence on finding an LLM solution to
every and any problem is papering over the real pain point,
which is that Link: should contain useful information, so that
you can click on the link and get the information and not have
to do a search (LLM assisted or not) for said information.

So the responses you expect to this patch should seemingly meet the
following two criteria:
1. we're not supposed to critique the implementation, as it's an RFC
   and therefore should not get comments on anything but the general
   approach,
2. we're not supposed to critique the general approach, because saying
   that this solution is neither reliable nor efficient is a result
   of personal bias against the underlying technology.

I don't condone the arguments based on energy usage because any use
of electricity in a grid that's not decarbonised will be open to
value judgements. For example, my personal non-workplace-endorsed
opinion is that electricity used on growing zucchini is wasted,
as they are low-nutrient snot pumpkins masquerading as cucumbers.

My main criticism on the approach end of things, if I am allowed
an opinion, is that this does not make Link: tags more meaningful,
nor does it solve the problem of automated tools adding sometimes
useless noise to something humans are supposed to be reading (which,
some may point out, your tool makes even worse.)

While bisecting, I often come across things where I'd love to be
able to immediately see what discussion preceded the problematic
patch with just one click and pageload between. Shoveling GPUs
into Sam Altman's gaping cheeks does not allow me to do that,
or at least not any better than a search on lore with dfn: would
already allow me to do.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-11 19:13               ` Nicolas Frattaroli
@ 2025-09-11 19:57                 ` Sasha Levin
  2025-09-15 11:26                   ` Mark Brown
  0 siblings, 1 reply; 74+ messages in thread
From: Sasha Levin @ 2025-09-11 19:57 UTC (permalink / raw)
  To: Nicolas Frattaroli
  Cc: konstantin, axboe, csander, io-uring, torvalds, workflows,
	Laurent Pinchart

On Thu, Sep 11, 2025 at 09:13:28PM +0200, Nicolas Frattaroli wrote:
>On Thursday, 11 September 2025 17:05:23 Central European Summer Time Sasha Levin wrote:
>> On Thu, Sep 11, 2025 at 04:48:03PM +0200, Nicolas Frattaroli wrote:
>> >On Tuesday, 9 September 2025 18:32:14 Central European Summer Time Sasha Levin wrote:
>> >it doesn't seem like Assisted-by is the right terminology here, as
>> >the code itself makes me believe it was written wholesale by your
>> >preferred LLM with minimal oversight, and then posted to the list.
>> >
>> >A non-exhaustive code review inline, as it quickly became clear
>> >this wasn't worth further time invested in reviewing.
>>
>> Thanks for the review!
>>
>> Indeed, Python isn't my language of choice: this script was a difficult (for
>> me) attempt at translating an equivalent bash based script that I already had
>> into python so it could fit into b4.
>
>There's something to be said about these tools' habit of empowering
>people to think they can judge the output adequately, but I don't
>want to detract from the other point I'll try to make in this reply.
>
>> My intent was for this to start a discussion about this approach rather than
>> actually be merged into b4.
>
>I know that, and you did get feedback on this approach already from
>others, specifically that it did not solve the core issue that is
>poorly utilised metadata and instead applies hammer to vaguely nail
>shaped thing.
>
>And your reaction was to call them personally biased against this
>approach, and to loudly announce you would ignore any further
>e-mails from them.
>
>Now while I won't claim Laurent Pinchart isn't one of the louder
>critics of your recent LLM evangelism, I can't really see a fault
>in his reasoning: your insistence on finding an LLM solution to
>every and any problem is papering over the real pain point,
>which is that Link: should contain useful information, so that
>you can click on the link and get the information and not have
>to do a search (LLM assisted or not) for said information.
>
>So the responses you expect to this patch should seemingly meet the
>following two criteria:
>1. we're not supposed to critique the implementation, as it's an RFC
>   and therefore should not get comments on anything but the general
>   approach,
>2. we're not supposed to critique the general approach, because saying
>   that this solution is neither reliable nor efficient is a result
>   of personal bias against the underlying technology.
>
>I don't condone the arguments based on energy usage because any use
>of electricity in a grid that's not decarbonised will be open to
>value judgements. For example, my personal non-workplace-endorsed
>opinion is that electricity used on growing zucchini is wasted,
>as they are low-nutrient snot pumpkins masquerading as cucumbers.
>
>My main criticism on the approach end of things, if I am allowed
>an opinion, is that this does not make Link: tags more meaningful,
>nor does it solve the problem of automated tools adding sometimes
>useless noise to something humans are supposed to be reading (which,
>some may point out, your tool makes even worse.)
>
>While bisecting, I often come across things where I'd love to be
>able to immediately see what discussion preceded the problematic
>patch with just one click and pageload between. Shoveling GPUs
>into Sam Altman's gaping cheeks does not allow me to do that,
>or at least not any better than a search on lore with dfn: would
>already allow me to do.

I very much agree with your general observation:

1. I don't think that this script solves the underlying Link: issue.

2. It papers over the real problem

3. I don't think that today's LLMs can solve any fundamental issue we're facing
in kernel-land.

4. I am really happy (as Laurent said) to apply my big hammer to anything that
looks like a nail.

We've started[1] the workflows@ list (which is how I stumbled on this thread)
about 5-6 years ago when the concern from multiple maintainers was that we all
have our magical scripts, they are seriously ugly, and everyone are ashamed of
sharing them. So this list was an effort to get the ball rolling on folks
sharing some of those ugly workflows and scripts in an attempt to standardize
and improve our processes.

I've shared this very hacky b4-dig script as exactly that: I have a very ugly
bash script that addresses some of the issues Linus brought up around being
able to find more context for a given patch/mail.  I use that script often, it
helps me spend less time on browsing lore (no, dfn: won't find you syzbot
reports or CI failures), and it just "works for me".

I'd love if we end up with a great solution for Link:, I'm not asking anyone to
stop working on that, nor am I claiming that this is a good long-term solution
for the problem. All this is is a utility script that fits my needs *TODAY*.

So no, I wasn't looking for criticism on workflows@ (this isn't even on lkml,
ksummit-discuss, or anything like that).  I was looking to share a workflow
that I have and see if folks have any ideas, suggestions, or would potentially
want to do something like this on their own.  I wasn't looking for an almost
religious "vim vs. emacs"-esque choire of "LLMS SUCK" or comments about Sam
Altman's rear end.

Maybe this is why folks are reluctant to share their ugly scripts?

[1] https://lwn.net/Articles/799134/ - "The session closed with the creation of
a new "workflows" mailing list on vger.kernel.org where developers can discuss
how they work and share their scripts".

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-09 16:32         ` [RFC] b4 dig: Add AI-powered email relationship discovery command Sasha Levin
  2025-09-09 17:22           ` Laurent Pinchart
  2025-09-11 14:48           ` Nicolas Frattaroli
@ 2025-09-11 23:24           ` Konstantin Ryabitsev
  2 siblings, 0 replies; 74+ messages in thread
From: Konstantin Ryabitsev @ 2025-09-11 23:24 UTC (permalink / raw)
  To: Sasha Levin; +Cc: axboe, csander, io-uring, torvalds, workflows

On Tue, Sep 09, 2025 at 12:32:14PM -0400, Sasha Levin wrote:
> Add a new 'b4 dig' subcommand that uses AI agents to discover related
> emails for a given message ID. This helps developers find all relevant
> context around patches including previous versions, bug reports, reviews,
> and related discussions.

All right, I've got ollama working on my gaming rig now, so that I can get
back to my summarization work without having to rely on big vendors. Thanks
for pushing me in that direction, because it never felt quite right to be
using a centralized service for this work. At least that large nvidia card
does something other than help me deliver packages to Port Knot City.

> The command:
> - Takes a message ID and constructs a detailed prompt about email relationships
> - Calls a configured AI agent script to analyze and find related messages
> - Downloads all related threads from lore.kernel.org
> - Combines them into a single mbox file for easy review

I'm going to take a look at it, but I want to use "b4 dig" to analyze commits
and establish their provenance. I will then look at whether we can use LLMs to
provide additional perks like summarization or highlights of important
messages (those offering nacks, criticisms, warnings, etc). This seems like the
"least wrong" way of using an LLM at this moment, especially with the legality
of the code it produces still largely untested in courts.

Thanks for submitting this to push me into this direction. I won't take this
directly, but I'll rely on it to see how you get certain things done.

Regards,
-K

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-11 19:57                 ` Sasha Levin
@ 2025-09-15 11:26                   ` Mark Brown
  2025-09-15 11:48                     ` Sasha Levin
  0 siblings, 1 reply; 74+ messages in thread
From: Mark Brown @ 2025-09-15 11:26 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Nicolas Frattaroli, konstantin, axboe, csander, io-uring,
	torvalds, workflows, Laurent Pinchart

[-- Attachment #1: Type: text/plain, Size: 1270 bytes --]

On Thu, Sep 11, 2025 at 03:57:30PM -0400, Sasha Levin wrote:

> We've started[1] the workflows@ list (which is how I stumbled on this thread)
> about 5-6 years ago when the concern from multiple maintainers was that we all
> have our magical scripts, they are seriously ugly, and everyone are ashamed of
> sharing them. So this list was an effort to get the ball rolling on folks
> sharing some of those ugly workflows and scripts in an attempt to standardize
> and improve our processes.

> I've shared this very hacky b4-dig script as exactly that: I have a very ugly
> bash script that addresses some of the issues Linus brought up around being
> able to find more context for a given patch/mail.  I use that script often, it
> helps me spend less time on browsing lore (no, dfn: won't find you syzbot
> reports or CI failures), and it just "works for me".

This seems like a great example of a situation where the suggestions
from one of the other thread of asking people to clearly mark when patch
submissions are using these tools would have helped - had the submission
described the above then the Python level review would've gone a lot
differently I think.  Realising during review is a totally different
experience to being told up front.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-15 11:26                   ` Mark Brown
@ 2025-09-15 11:48                     ` Sasha Levin
  2025-09-15 12:03                       ` Mark Brown
  0 siblings, 1 reply; 74+ messages in thread
From: Sasha Levin @ 2025-09-15 11:48 UTC (permalink / raw)
  To: Mark Brown
  Cc: Nicolas Frattaroli, konstantin, axboe, csander, io-uring,
	torvalds, workflows, Laurent Pinchart

On Mon, Sep 15, 2025 at 12:26:41PM +0100, Mark Brown wrote:
>On Thu, Sep 11, 2025 at 03:57:30PM -0400, Sasha Levin wrote:
>
>> We've started[1] the workflows@ list (which is how I stumbled on this thread)
>> about 5-6 years ago when the concern from multiple maintainers was that we all
>> have our magical scripts, they are seriously ugly, and everyone are ashamed of
>> sharing them. So this list was an effort to get the ball rolling on folks
>> sharing some of those ugly workflows and scripts in an attempt to standardize
>> and improve our processes.
>
>> I've shared this very hacky b4-dig script as exactly that: I have a very ugly
>> bash script that addresses some of the issues Linus brought up around being
>> able to find more context for a given patch/mail.  I use that script often, it
>> helps me spend less time on browsing lore (no, dfn: won't find you syzbot
>> reports or CI failures), and it just "works for me".
>
>This seems like a great example of a situation where the suggestions
>from one of the other thread of asking people to clearly mark when patch
>submissions are using these tools would have helped - had the submission
>described the above then the Python level review would've gone a lot
>differently I think.  Realising during review is a totally different
>experience to being told up front.

Do you mean using the Assisted-by tags that were discussed in the other thread?

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [RFC] b4 dig: Add AI-powered email relationship discovery command
  2025-09-15 11:48                     ` Sasha Levin
@ 2025-09-15 12:03                       ` Mark Brown
  0 siblings, 0 replies; 74+ messages in thread
From: Mark Brown @ 2025-09-15 12:03 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Nicolas Frattaroli, konstantin, axboe, csander, io-uring,
	torvalds, workflows, Laurent Pinchart

[-- Attachment #1: Type: text/plain, Size: 707 bytes --]

On Mon, Sep 15, 2025 at 07:48:37AM -0400, Sasha Levin wrote:
> On Mon, Sep 15, 2025 at 12:26:41PM +0100, Mark Brown wrote:

> > This seems like a great example of a situation where the suggestions
> > from one of the other thread of asking people to clearly mark when patch
> > submissions are using these tools would have helped - had the submission
> > described the above then the Python level review would've gone a lot
> > differently I think.  Realising during review is a totally different
> > experience to being told up front.

> Do you mean using the Assisted-by tags that were discussed in the other thread?

Not just that, which you did have, but also a mention of how the tools
have been used.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2025-09-15 12:03 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-05 11:18 [GIT PULL] io_uring fix for 6.17-rc5 Jens Axboe
2025-09-05 17:24 ` Linus Torvalds
2025-09-05 17:45   ` Konstantin Ryabitsev
2025-09-05 18:06     ` Linus Torvalds
2025-09-05 19:33       ` Link trailers revisited (was Re: [GIT PULL] io_uring fix for 6.17-rc5) Konstantin Ryabitsev
2025-09-05 20:09         ` Linus Torvalds
2025-09-05 20:47         ` Sasha Levin
2025-09-06 11:27         ` Greg KH
2025-09-06 11:27           ` Greg KH
2025-09-06 11:30             ` Greg KH
2025-09-06 13:51           ` Konstantin Ryabitsev
2025-09-06 15:31             ` Linus Torvalds
2025-09-06 18:50               ` Konstantin Ryabitsev
2025-09-06 19:19                 ` Linus Torvalds
2025-09-08  9:11                   ` Jani Nikula
2025-09-08 11:59                 ` Mark Brown
2025-09-08 20:11         ` dan.j.williams
2025-09-09 11:29           ` Mark Brown
2025-09-09 13:17           ` Rafael J. Wysocki
2025-09-09 14:18             ` Jakub Kicinski
2025-09-09 14:35               ` Jens Axboe
2025-09-09 14:42                 ` Konstantin Ryabitsev
2025-09-09 14:48                   ` Vlastimil Babka
2025-09-09 14:50                     ` Jens Axboe
2025-09-09 15:30                       ` Rafael J. Wysocki
2025-09-09 16:40                       ` Linus Torvalds
2025-09-09 17:08                         ` Mark Brown
2025-09-09 17:50                           ` Linus Torvalds
2025-09-09 17:58                             ` Linus Torvalds
2025-09-09 18:31                               ` Konstantin Ryabitsev
2025-09-09 19:36                                 ` dan.j.williams
2025-09-10  1:12                                 ` dan.j.williams
2025-09-10 12:19                                   ` Mark Brown
2025-09-09 17:25                         ` dan.j.williams
2025-09-09 17:56                           ` Alexei Starovoitov
2025-09-09 18:01                             ` Linus Torvalds
2025-09-09 18:13                               ` Alexei Starovoitov
2025-09-09 18:06                         ` Vlastimil Babka
2025-09-09 18:14                           ` Linus Torvalds
2025-09-09 18:22                             ` Vlastimil Babka
2025-09-09 21:05                               ` Mark Brown
2025-09-10  1:33                                 ` Konstantin Ryabitsev
2025-09-09 14:44                 ` Greg KH
2025-09-09 15:14                 ` Danilo Krummrich
2025-09-09 16:32         ` [RFC] b4 dig: Add AI-powered email relationship discovery command Sasha Levin
2025-09-09 17:22           ` Laurent Pinchart
2025-09-09 17:26             ` Jens Axboe
2025-09-09 18:54               ` Sasha Levin
2025-09-10 10:13                 ` Laurent Pinchart
2025-09-10 10:55                   ` Sasha Levin
2025-09-10 11:29                     ` Laurent Pinchart
2025-09-10 13:38             ` Konstantin Ryabitsev
2025-09-10 14:03               ` Andrew Dona-Couch
2025-09-11 14:48           ` Nicolas Frattaroli
2025-09-11 15:05             ` Sasha Levin
2025-09-11 19:13               ` Nicolas Frattaroli
2025-09-11 19:57                 ` Sasha Levin
2025-09-15 11:26                   ` Mark Brown
2025-09-15 11:48                     ` Sasha Levin
2025-09-15 12:03                       ` Mark Brown
2025-09-11 23:24           ` Konstantin Ryabitsev
2025-09-07 22:04     ` [GIT PULL] io_uring fix for 6.17-rc5 Jonathan Corbet
2025-09-05 19:04   ` Jens Axboe
2025-09-05 19:07     ` Jens Axboe
2025-09-05 19:13     ` Caleb Sander Mateos
2025-09-05 19:16       ` Jens Axboe
2025-09-05 19:15     ` Linus Torvalds
2025-09-05 19:23       ` Jens Axboe
2025-09-05 19:21     ` Linus Torvalds
2025-09-05 19:30       ` Jens Axboe
2025-09-05 20:54         ` Linus Torvalds
2025-09-06  0:01           ` Jens Axboe
2025-09-07 18:47             ` Jonathan Corbet
2025-09-08 22:15             ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox