* Re: [RFC] sched,livepatch: call klp_try_switch_task in __cond_resched [not found] ` <1bd15361edfd4db9fc9271d35e7bbe5edad1b87a.camel@fb.com> @ 2022-05-10 18:42 ` Josh Poimboeuf [not found] ` <47440502-930F-4CBD-B859-3AC9BBFF8FC6@fb.com> 0 siblings, 1 reply; 2+ messages in thread From: Josh Poimboeuf @ 2022-05-10 18:42 UTC (permalink / raw) To: Rik van Riel Cc: song@kernel.org, joe.lawrence@redhat.com, Song Liu, peterz@infradead.org, mingo@redhat.com, vincent.guittot@linaro.org, pmladek@suse.com, live-patching@vger.kernel.org, Kernel Team, linux-kernel@vger.kernel.org, jpoimboe@redhat.com On Tue, May 10, 2022 at 06:07:00PM +0000, Rik van Riel wrote: > On Tue, 2022-05-10 at 09:52 -0700, Josh Poimboeuf wrote: > > On Tue, May 10, 2022 at 04:07:42PM +0000, Rik van Riel wrote: > > > > > > > Now I wonder if we could just hook up a preempt notifier > > > for kernel live patches. All the distro kernels already > > > need the preempt notifier for KVM, anyway... > > > > > > > I wouldn't be opposed to that, but how does it solve this problem? > > If > > as Peter said cond_resched() can be a NOP, then preemption would have > > to > > be from an interrupt, in which case frame pointers aren't reliable. > > > The systems where we are seeing problems do not, as far > as I know, throw softlockup errors, so the kworker > threads that fail to transition to the new KLP version > are sleeping and getting scheduled out at times. Are they sleeping due to an explicit call to cond_resched()? > A KLP transition preempt notifier would help those > kernel threads transition to the new KLP version at > any time they reschedule. ... unless cond_resched() is a no-op due to CONFIG_PREEMPT? > How much it will help is hard to predict, but I should > be able to get results from a fairly large sample size > of systems within a few weeks :) As Peter said, keep in mind that we will need to fix other cases beyond Facebook, i.e., CONFIG_PREEMPT combined with non-x86 arches which don't have ORC so they can't reliably unwind from an IRQ. -- Josh ^ permalink raw reply [flat|nested] 2+ messages in thread
[parent not found: <47440502-930F-4CBD-B859-3AC9BBFF8FC6@fb.com>]
* Re: [RFC] sched,livepatch: call klp_try_switch_task in __cond_resched [not found] ` <47440502-930F-4CBD-B859-3AC9BBFF8FC6@fb.com> @ 2022-05-10 23:04 ` Josh Poimboeuf 0 siblings, 0 replies; 2+ messages in thread From: Josh Poimboeuf @ 2022-05-10 23:04 UTC (permalink / raw) To: Song Liu Cc: Rik van Riel, song@kernel.org, joe.lawrence@redhat.com, peterz@infradead.org, mingo@redhat.com, vincent.guittot@linaro.org, pmladek@suse.com, live-patching@vger.kernel.org, Kernel Team, linux-kernel@vger.kernel.org, jpoimboe@redhat.com On Tue, May 10, 2022 at 07:45:49PM +0000, Song Liu wrote: > >> A KLP transition preempt notifier would help those > >> kernel threads transition to the new KLP version at > >> any time they reschedule. > > > > ... unless cond_resched() is a no-op due to CONFIG_PREEMPT? > > Based on my understanding (and a few other folks we chatted with), > a kernel thread can legally run for extended time, as long as it > calls cond_resched() at a reasonable frequency. Therefore, I > think we should be able to patch such thread easily, unless it > calls cond_resched() with being-patched function in the stack, > of course. But again, with CONFIG_PREEMPT, that doesn't work. > OTOH, Petr's mindset of allowing many minutes for the patch > transition is new to me. I need to think more about it. > Josh, what’s you opinion on this? IIUC, kpatch is designed to > only wait up to 60 seconds (no option to overwrite the time). I wouldn't be necessarily opposed to changing the kpatch timeout to something bigger, or eliminating it altogether in favor of a WARN() after x minutes. > >> How much it will help is hard to predict, but I should > >> be able to get results from a fairly large sample size > >> of systems within a few weeks :) > > > > As Peter said, keep in mind that we will need to fix other cases beyond > > Facebook, i.e., CONFIG_PREEMPT combined with non-x86 arches which don't > > have ORC so they can't reliably unwind from an IRQ. > > I think livepatch transition may fail in different cases, and we > don't need to address all of them in one shoot. Fixing some cases > is an improvement as long as we don't slow down other cases. I > understand that adding tiny overhead to __cond_resched() may end > up as a visible regression. But maybe adding it to > preempt_schedule_common() is light enough? > > Did I miss/misunderstand something? If it's a real bug, we should fix it everywhere, not just for Facebook. Otherwise CONFIG_PREEMPT and/or non-x86 arches become second-class citizens. -- Josh ^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2022-05-10 23:04 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20220507174628.2086373-1-song@kernel.org>
[not found] ` <YnkuFrm1YR46OFx/@alley>
[not found] ` <9C7DF147-5112-42E7-9F7C-7159EFDFB766@fb.com>
[not found] ` <YnoawYtoCSvrK7lb@alley>
[not found] ` <3a9bfb4a52b715bd8739d8834409c9549ec7f22f.camel@fb.com>
[not found] ` <YnqIcw+dYsWz/w7g@alley>
[not found] ` <6bf85ff908377508a5f5bcc7c4e75d598b96f388.camel@fb.com>
[not found] ` <20220510165244.ikfh64ertnvodxb4@treble>
[not found] ` <1bd15361edfd4db9fc9271d35e7bbe5edad1b87a.camel@fb.com>
2022-05-10 18:42 ` [RFC] sched,livepatch: call klp_try_switch_task in __cond_resched Josh Poimboeuf
[not found] ` <47440502-930F-4CBD-B859-3AC9BBFF8FC6@fb.com>
2022-05-10 23:04 ` Josh Poimboeuf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox