* [PATCH] tracepoints: Update static_call before tp_funcs when adding a tracepoint
@ 2021-07-23  2:33 Steven Rostedt
  2021-07-26 15:46 ` Mathieu Desnoyers
  0 siblings, 1 reply; 8+ messages in thread

From: Steven Rostedt @ 2021-07-23  2:33 UTC (permalink / raw)
To: LKML, Linux Trace Devel
Cc: Mathieu Desnoyers, Ingo Molnar, Andrew Morton, Stefan Metzmacher,
    io-uring, Peter Zijlstra

From: "Steven Rostedt (VMware)" <[email protected]>

Because of the significant overhead that retpolines impose on indirect
calls, the tracepoint code was updated to use the new "static_calls",
which can modify the running code to call a function directly instead of
going through an indirect caller, and this function can be changed at
runtime.

In the tracepoint code that calls all the registered callbacks attached
to a tracepoint, the following is done:

	it_func_ptr = rcu_dereference_raw((&__tracepoint_##name)->funcs);
	if (it_func_ptr) {
		__data = (it_func_ptr)->data;
		static_call(tp_func_##name)(__data, args);
	}

If there's just a single callback, the static_call is updated to call
that callback directly. Once another callback is added, the static
caller is updated to call the iterator, which simply loops over all the
funcs in the array and calls each of the callbacks like the old method
of indirect calling.

The issue was discovered as a race between updating the funcs array and
updating the static_call: the funcs array was updated first, and then
the static_call was updated. This is not an issue as long as the first
element in the old array is the same as the first element in the new
array. But that assumption is incorrect, because callbacks also have a
priority field, and if a callback is added that has a higher priority
than the callback in the old array, it will become the first callback in
the new array. This means that it is possible to call the old callback
with the new callback's data element, which can cause a kernel panic.
	static_call = callback1()
	funcs[] = {callback1,data1};

	callback2 has higher priority than callback1

	CPU 1				CPU 2
	-----				-----

   new_funcs = {callback2,data2},
               {callback1,data1}

   rcu_assign_pointer(tp->funcs, new_funcs);

   /*
    * Now tp->funcs has the new array
    * but the static_call still calls callback1
    */

				it_func_ptr = tp->funcs  [ new_funcs ]
				data = it_func_ptr->data [ data2 ]
				static_call(callback1, data);

				/* Now callback1 is called with
				 * callback2's data */

				[ KERNEL PANIC ]

   update_static_call(iterator);

To prevent this from happening, always switch the static_call to the
iterator before assigning tp->funcs to the new array. The iterator will
always properly match each callback with its data.

To trigger this bug:

In one terminal:

  while :; do hackbench 50; done

In another terminal:

  echo 1 > /sys/kernel/tracing/events/sched/sched_waking/enable
  while :; do
	echo 1 > /sys/kernel/tracing/set_event_pid;
	sleep 0.5
	echo 0 > /sys/kernel/tracing/set_event_pid;
	sleep 0.5
  done

And it doesn't take long to crash. This is because set_event_pid adds a
callback to the sched_waking tracepoint with a high priority, which will
be called before the sched_waking trace event callback is called.

Note, the removal down to a single callback updates the array first,
before changing the static_call to the single callback, which is the
proper order, as the first element in the array is the same as what the
static_call is being changed to.

Link: https://lore.kernel.org/io-uring/[email protected]/
Cc: [email protected]
Fixes: d25e37d89dd2f ("tracepoint: Optimize using static_call()")
Reported-by: Stefan Metzmacher <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
---
 kernel/tracepoint.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 976bf8ce8039..fc32821f8240 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -299,8 +299,8 @@ static int tracepoint_add_func(struct tracepoint *tp,
 	 * a pointer to it.  This array is referenced by __DO_TRACE from
 	 * include/linux/tracepoint.h using rcu_dereference_sched().
 	 */
-	rcu_assign_pointer(tp->funcs, tp_funcs);
 	tracepoint_update_call(tp, tp_funcs, false);
+	rcu_assign_pointer(tp->funcs, tp_funcs);
 	static_key_enable(&tp->key);
 	release_probes(old);
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread
* Re: [PATCH] tracepoints: Update static_call before tp_funcs when adding a tracepoint
  2021-07-23  2:33 [PATCH] tracepoints: Update static_call before tp_funcs when adding a tracepoint Steven Rostedt
@ 2021-07-26 15:46 ` Mathieu Desnoyers
  2021-07-26 16:56   ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread

From: Mathieu Desnoyers @ 2021-07-26 15:46 UTC (permalink / raw)
To: rostedt
Cc: linux-kernel, linux-trace-devel, Ingo Molnar, Andrew Morton,
    Stefan Metzmacher, io-uring, Peter Zijlstra

----- On Jul 22, 2021, at 10:33 PM, rostedt [email protected] wrote:

> From: "Steven Rostedt (VMware)" <[email protected]>
>
> Because of the significant overhead that retpolines pose on indirect
> calls, the tracepoint code was updated to use the new "static_calls" that
> can modify the running code to directly call a function instead of using
> an indirect caller, and this function can be changed at runtime.
>
> In the tracepoint code that calls all the registered callbacks that are
> attached to a tracepoint, the following is done:
>
> 	it_func_ptr = rcu_dereference_raw((&__tracepoint_##name)->funcs);
> 	if (it_func_ptr) {
> 		__data = (it_func_ptr)->data;
> 		static_call(tp_func_##name)(__data, args);
> 	}
>
> If there's just a single callback, the static_call is updated to just call
> that callback directly. Once another handler is added, then the static
> caller is updated to call the iterator, that simply loops over all the
> funcs in the array and calls each of the callbacks like the old method
> using indirect calling.
>
> The issue was discovered with a race between updating the funcs array and
> updating the static_call. The funcs array was updated first and then the
> static_call was updated. This is not an issue as long as the first element
> in the old array is the same as the first element in the new array. But
> that assumption is incorrect, because callbacks also have a priority
> field, and if there's a callback added that has a higher priority than the
> callback on the old array, then it will become the first callback in the
> new array. This means that it is possible to call the old callback with
> the new callback data element, which can cause a kernel panic.
> [...]

Looking into the various transitions, I suspect the issue runs much
deeper than this.

The sequence of transitions (number of probes) I'm considering is:

0->1
1->2
2->1
1->0
0->1
1->2

I come to three conclusions:

Where we have:

tracepoint_remove_func()

	tracepoint_update_call(tp, tp_funcs,
			       tp_funcs[0].func != old[0].func);

We should be comparing .data rather than .func, because the same
callback can be registered twice with different data, and what we care
about here is that the data of array element 0 is unchanged to skip the
rcu sync.

My second conclusion is that it's odd that transition 1->0 leaves the
prior function call in place even after it's been removed. When we go
back to 0->1, that function call may still be called even though the
function is not there anymore. And there is no RCU synchronization on
these transitions, so those are all possible scenarios.

My third conclusion is that we'd need synchronize RCU whenever
tp_funcs[0].data changes for transitions 1->2, 2->1, and 1->2 because
the priorities don't guarantee that the first callback stays in the
first position, and we also need to rcu sync unconditionally on
transition 1->0. We currently only have sync RCU on transition from 2->1
when tp_funcs[0].func changes, which is bogus in many ways.

Basically, transitions from the iterator to a specific function should
be handled with care (making sure the tp_funcs array is updated and
rcu-sync is done), except in the specific case where the prior tp->funcs
was NULL, which skips the function call. And unless there is a rcu-sync
between the state transitions, we need to consider all prior states as
additional original states as well. Therefore, in a 1->0->1 transition
sequence, it's very much possible that the old function ends up
observing the new callback's data unless we add some rcu sync in
between.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: [PATCH] tracepoints: Update static_call before tp_funcs when adding a tracepoint
  2021-07-26 15:46 ` Mathieu Desnoyers
@ 2021-07-26 16:56   ` Steven Rostedt
  2021-07-26 17:39     ` Mathieu Desnoyers
  0 siblings, 1 reply; 8+ messages in thread

From: Steven Rostedt @ 2021-07-26 16:56 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: linux-kernel, linux-trace-devel, Ingo Molnar, Andrew Morton,
    Stefan Metzmacher, io-uring, Peter Zijlstra

On Mon, 26 Jul 2021 11:46:41 -0400 (EDT)
Mathieu Desnoyers <[email protected]> wrote:

> [...]
>
> Looking into the various transitions, I suspect the issue runs much
> deeper than this.
>
> The sequence of transitions (number of probes) I'm considering is:
>
> 0->1
> 1->2
> 2->1
> 1->0
> 0->1
> 1->2
>
> I come to three conclusions:
>
> Where we have:
>
> tracepoint_remove_func()
>
> 	tracepoint_update_call(tp, tp_funcs,
> 			       tp_funcs[0].func != old[0].func);
>
> We should be comparing .data rather than .func, because the same
> callback can be registered twice with different data, and what we care
> about here is that the data of array element 0 is unchanged to skip
> the rcu sync.

I guess we could do that, as you are right, we are worried about passing
the wrong data to the wrong function. If the function is the same, at
least it won't crash the kernel, as the function can handle that data.
But it could miss the callback that is staying while calling the one
that is going instead. Unlikely to happen, but in theory it is enough to
fix.

> My second conclusion is that it's odd that transition 1->0 leaves the
> prior function call in place even after it's been removed. When we go
> back to 0->1, that function call may still be called even though the
> function is not there anymore. And there is no RCU synchronization on
> these transitions, so those are all possible scenarios.

How so? When doing this transition we have:

	tracepoint_update_call(tp, tp_funcs, false);
	rcu_assign_pointer(tp->funcs, tp_funcs);
	static_key_enable(&tp->key);

Where that tracepoint_update_call() will reinstall the iterator, and
that's a full memory barrier. It even sends IPIs to all other CPUs to
make sure all CPUs are synchronized before continuing.

By the time we get to static_key_enable(), there will not be any CPUs
that see the old function. And the process of updating a static_key also
does the same kind of synchronization.

> My third conclusion is that we'd need synchronize RCU whenever
> tp_funcs[0].data changes for transitions 1->2, 2->1, and 1->2 because
> the priorities don't guarantee that the first callback stays in the
> first position, and we also need to rcu sync unconditionally on
> transition 1->0. We currently only have sync RCU on transition from
> 2->1 when tp_funcs[0].func changes, which is bogus in many ways.

Going from 1 to 2, there's no issue. We switch to the iterator, which is
the old method anyway. It looks directly at the array and matches the
data with the func for each element of that array, and the data read
initially (before calling the iterator) is ignored.

> Basically, transitions from the iterator to a specific function should
> be handled with care (making sure the tp_funcs array is updated and
> rcu-sync is done), except in the specific case where the prior
> tp->funcs was NULL, which skips the function call. And unless there is
> a rcu-sync between the state transitions, we need to consider all
> prior states as additional original states as well. Therefore, in a
> 1->0->1 transition sequence, it's very much possible that the old
> function ends up observing the new callback's data unless we add some
> rcu sync in between.

I disagree with the last part, as I explained above.

But I do agree that comparing data is probably the better check.

-- Steve

> Thoughts ?
>
> Thanks,
>
> Mathieu
* Re: [PATCH] tracepoints: Update static_call before tp_funcs when adding a tracepoint
  2021-07-26 16:56   ` Steven Rostedt
@ 2021-07-26 17:39     ` Mathieu Desnoyers
  2021-07-26 18:49       ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread

From: Mathieu Desnoyers @ 2021-07-26 17:39 UTC (permalink / raw)
To: rostedt
Cc: linux-kernel, linux-trace-devel, Ingo Molnar, Andrew Morton,
    Stefan Metzmacher, io-uring, Peter Zijlstra, paulmck

----- On Jul 26, 2021, at 12:56 PM, rostedt [email protected] wrote:

> On Mon, 26 Jul 2021 11:46:41 -0400 (EDT)
> Mathieu Desnoyers <[email protected]> wrote:
[...]
>
>> My second conclusion is that it's odd that transition 1->0 leaves the
>> prior function call in place even after it's been removed. When we go
>> back to 0->1, that function call may still be called even though the
>> function is not there anymore. And there is no RCU synchronization on
>> these transitions, so those are all possible scenarios.
>
> How so? When doing this transition we have:
>
> 	tracepoint_update_call(tp, tp_funcs, false);
> 	rcu_assign_pointer(tp->funcs, tp_funcs);
> 	static_key_enable(&tp->key);
>
> Where that tracepoint_update_call() will reinstall the iterator, and
> that's a full memory barrier. It even sends IPIs to all other CPUs to
> make sure all CPUs are synchronized before continuing.
>
> By the time we get to static_key_enable(), there will not be any CPUs
> that see the old function. And the process of updating a static_key
> also does the same kind of synchronization.

Actually, my explanation was inaccurate. The issue is that the _new_
callback may see the _old_ data.

Considering __DO_TRACE_CALL:

	do {								\
		struct tracepoint_func *it_func_ptr;			\
		void *__data;						\
		it_func_ptr =						\
			rcu_dereference_raw((&__tracepoint_##name)->funcs); \
		if (it_func_ptr) {					\
			__data = (it_func_ptr)->data;			\

 ----> [ delayed here on one CPU (e.g. vcpu preempted by the host) ]

			static_call(tp_func_##name)(__data, args);	\
		}							\
	} while (0)

It has loaded the tp->funcs of the old callback (so it will try to use
the old data).

AFAIU, none of the synchronization mechanisms you refer to here (memory
barrier, IPIs..) will change the fact that this CPU may still be delayed
across the entire 1->0->1 transition sequence, and may end up calling
the new callback with the old data. Unless an explicit RCU-sync is done.

>> My third conclusion is that we'd need synchronize RCU whenever
>> tp_funcs[0].data changes for transitions 1->2, 2->1, and 1->2 because
>> the priorities don't guarantee that the first callback stays in the
>> first position, and we also need to rcu sync unconditionally on
>> transition 1->0. We currently only have sync RCU on transition from
>> 2->1 when tp_funcs[0].func changes, which is bogus in many ways.
>
> Going from 1 to 2, there's no issue. We switch to the iterator, which
> is the old method anyway. It looks directly at the array and matches
> the data with the func for each element of that array, and the data
> read initially (before calling the iterator) is ignored.

This relies on ordering guarantees between RCU assign/dereference and
static_call updates/call. It may well be the case, but I'm asking
anyway.

Are we guaranteed of the following ordering ?

CPU A                            CPU B

static_call_update()
                                 y = rcu_dereference(x)
rcu_assign_pointer(x, ...)
                                 do_static_call(y)

That load of "x" should never happen after the CPU fetches the new
static call instruction.

Also, I suspect that transition 2->1 needs an unconditional rcu-sync
because you may have a sequence of 3->2->1 (or 1->2->1) where the
element 0 data is unchanged between 2->1, but was changed from 3->2 (or
from 1->2), which may be observed by the static call.

Thanks,

Mathieu

>
>> Basically, transitions from the iterator to a specific function
>> should be handled with care (making sure the tp_funcs array is
>> updated and rcu-sync is done), except in the specific case where the
>> prior tp->funcs was NULL, which skips the function call. And unless
>> there is a rcu-sync between the state transitions, we need to
>> consider all prior states as additional original states as well.
>> Therefore, in a 1->0->1 transition sequence, it's very much possible
>> that the old function ends up observing the new callback's data
>> unless we add some rcu sync in between.
>
> I disagree with the last part, as I explained above.
>
> But I do agree that comparing data is probably the better check.
>
> -- Steve

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: [PATCH] tracepoints: Update static_call before tp_funcs when adding a tracepoint
  2021-07-26 17:39     ` Mathieu Desnoyers
@ 2021-07-26 18:49       ` Steven Rostedt
  2021-07-26 19:12         ` Mathieu Desnoyers
  2021-07-27 11:44         ` Peter Zijlstra
  0 siblings, 2 replies; 8+ messages in thread

From: Steven Rostedt @ 2021-07-26 18:49 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: linux-kernel, linux-trace-devel, Ingo Molnar, Andrew Morton,
    Stefan Metzmacher, io-uring, Peter Zijlstra, paulmck

On Mon, 26 Jul 2021 13:39:18 -0400 (EDT)
Mathieu Desnoyers <[email protected]> wrote:

> ----- On Jul 26, 2021, at 12:56 PM, rostedt [email protected] wrote:
>
> > On Mon, 26 Jul 2021 11:46:41 -0400 (EDT)
> > Mathieu Desnoyers <[email protected]> wrote:
> [...]
>
> Actually, my explanation was inaccurate. The issue is that the _new_
> callback may see the _old_ data.
>
> Considering __DO_TRACE_CALL:
>
> 	do {								\
> 		struct tracepoint_func *it_func_ptr;			\
> 		void *__data;						\
> 		it_func_ptr =						\
> 			rcu_dereference_raw((&__tracepoint_##name)->funcs); \
> 		if (it_func_ptr) {					\
> 			__data = (it_func_ptr)->data;			\
>
> ----> [ delayed here on one CPU (e.g. vcpu preempted by the host) ]
>
> 			static_call(tp_func_##name)(__data, args);	\
> 		}							\
> 	} while (0)
>
> It has loaded the tp->funcs of the old callback (so it will try to use
> the old data).
>
> AFAIU, none of the synchronization mechanisms you refer to here
> (memory barrier, IPIs..) will change the fact that this CPU may still
> be delayed across the entire 1->0->1 transition sequence, and may end
> up calling the new callback with the old data. Unless an explicit
> RCU-sync is done.

OK. I see the issue you are saying. And this came from my assumption
that the tracepoint code did a synchronization when unregistering the
last callback. But of course it won't, because that would cause a lot of
back-to-back synchronizations when a large number of tracepoints are
unregistered at once.

And doing it for all 0->1 or 1->0, or even a 1->0->1, can be costly.

One way to handle this is, when going from 1->0, to set off a worker
that will do the synchronization asynchronously, and if a 0->1 comes in,
have that block until the synchronization is complete. This should work,
and not have too much of an overhead.

If one 1->0 starts the synchronization, and one or more 1->0 transitions
happen, it will be recorded, and the worker will do another
synchronization, to make sure all 1->0 transitions have gone through a
full synchronization before a 0->1 can happen.

If a 0->1 comes in while a synchronization is happening, it will note
the current "number" for the synchronizations (if another one is queued,
it will wait for one more) before it can begin. As locks will be held
while waiting for synchronizations to finish, we don't need to worry
about another 1->0 coming in while a 0->1 is waiting.

Shouldn't be too hard to implement.

	static unsigned long sync_number;
	static unsigned long last_sync_number;

Going from 1->0:

	/* tracepoint_mutex held */
	mutex_lock(&sync_mutex);
	sync_number++;
	mutex_unlock(&sync_mutex);
	queue_worker(sync_worker);

sync_worker():

 again:
	run = false;
	mutex_lock(&sync_mutex);
	if (last_sync_number != sync_number) {
		last_sync_number = sync_number;
		run = true;
	}
	mutex_unlock(&sync_mutex);
	if (!run)
		return;
	tracepoint_synchronization_unregister();
	wake_up(waiters);
	goto again;

Going from 0->1:

 again:
	mutex_lock(&sync_mutex);
	if (last_sync_number != sync_number) {
		prepare_to_wait(waiters);
		block = true;
	}
	mutex_unlock(&sync_mutex);
	/* tracepoint_mutex held, sync_number will not increase */
	if (block) {
		schedule();
		/* in case we were woken up by an exiting worker */
		goto again;
	}

> >> My third conclusion is that we'd need synchronize RCU whenever
> >> tp_funcs[0].data changes for transitions 1->2, 2->1, and 1->2
> >> because the priorities don't guarantee that the first callback
> >> stays in the first position, and we also need to rcu sync
> >> unconditionally on transition 1->0. We currently only have sync RCU
> >> on transition from 2->1 when tp_funcs[0].func changes, which is
> >> bogus in many ways.
> >
> > Going from 1 to 2, there's no issue. We switch to the iterator,
> > which is the old method anyway. It looks directly at the array and
> > matches the data with the func for each element of that array, and
> > the data read initially (before calling the iterator) is ignored.
>
> This relies on ordering guarantees between RCU assign/dereference and
> static_call updates/call. It may well be the case, but I'm asking
> anyway.
>
> Are we guaranteed of the following ordering ?
>
> CPU A                            CPU B
>
> static_call_update()

The static_call_update() triggers an IPI on all CPUs that performs a
full memory barrier. That is, nothing on any CPU will cross the
static_call_update().

>                                  y = rcu_dereference(x)
> rcu_assign_pointer(x, ...)
>                                  do_static_call(y)
>
> That load of "x" should never happen after the CPU fetches the new
> static call instruction.

The 'y' will always be the new static call (which is the iterator in
this case), and it doesn't matter which x it read, because the iterator
will read the array just like it was done without static calls.

> Also, I suspect that transition 2->1 needs an unconditional rcu-sync
> because you may have a sequence of 3->2->1 (or 1->2->1) where the
> element 0 data is unchanged between 2->1, but was changed from 3->2
> (or from 1->2), which may be observed by the static call.

I'll agree that we need to add synchronization between 1->0->1, but you
have not convinced me on this second part.

-- Steve
* Re: [PATCH] tracepoints: Update static_call before tp_funcs when adding a tracepoint
  2021-07-26 18:49       ` Steven Rostedt
@ 2021-07-26 19:12         ` Mathieu Desnoyers
  0 siblings, 0 replies; 8+ messages in thread

From: Mathieu Desnoyers @ 2021-07-26 19:12 UTC (permalink / raw)
To: rostedt
Cc: linux-kernel, linux-trace-devel, Ingo Molnar, Andrew Morton,
    Stefan Metzmacher, io-uring, Peter Zijlstra, paulmck

----- On Jul 26, 2021, at 2:49 PM, rostedt [email protected] wrote:

> On Mon, 26 Jul 2021 13:39:18 -0400 (EDT)
> Mathieu Desnoyers <[email protected]> wrote:
>
>> ----- On Jul 26, 2021, at 12:56 PM, rostedt [email protected] wrote:
>>
>> > On Mon, 26 Jul 2021 11:46:41 -0400 (EDT)
>> > Mathieu Desnoyers <[email protected]> wrote:
>> [...]
> [...]
>>
>> AFAIU, none of the synchronization mechanisms you refer to here
>> (memory barrier, IPIs..) will change the fact that this CPU may still
>> be delayed across the entire 1->0->1 transition sequence, and may end
>> up calling the new callback with the old data. Unless an explicit
>> RCU-sync is done.
>
> OK. I see the issue you are saying. And this came from my assumption
> that the tracepoint code did a synchronization when unregistering the
> last callback. But of course it won't, because that would cause a lot
> of back-to-back synchronizations when a large number of tracepoints
> are unregistered at once.

Something along the lines of the worker approach you propose should
work.

>> >> My third conclusion is that we'd need synchronize RCU whenever
>> >> tp_funcs[0].data changes for transitions 1->2, 2->1, and 1->2
>> >> because the priorities don't guarantee that the first callback
>> >> stays in the first position, and we also need to rcu sync
>> >> unconditionally on transition 1->0. We currently only have sync
>> >> RCU on transition from 2->1 when tp_funcs[0].func changes, which
>> >> is bogus in many ways.
>> >
>> > Going from 1 to 2, there's no issue. We switch to the iterator,
>> > which is the old method anyway. It looks directly at the array and
>> > matches the data with the func for each element of that array, and
>> > the data read initially (before calling the iterator) is ignored.
>>
>> This relies on ordering guarantees between RCU assign/dereference and
>> static_call updates/call. It may well be the case, but I'm asking
>> anyway.
>>
>> Are we guaranteed of the following ordering ?
>>
>> CPU A                            CPU B
>>
>> static_call_update()
>
> The static_call_update() triggers an IPI on all CPUs that performs a
> full memory barrier.
>
> That is, nothing on any CPU will cross the static_call_update().
>
>>                                  y = rcu_dereference(x)
>> rcu_assign_pointer(x, ...)
>>                                  do_static_call(y)
>>
>> That load of "x" should never happen after the CPU fetches the new
>> static call instruction.
>
> The 'y' will always be the new static call (which is the iterator in
> this case), and it doesn't matter which x it read, because the
> iterator will read the array just like it was done without static
> calls.

Here by "y" I mean the pointer loaded by rcu_dereference, which is then
passed to the call. I do not mean the call per se.

I suspect there is something like a control dependency between loading
"x" through rcu_dereference and passing the loaded pointer as an
argument to the static call (and the instruction fetch of the static
call). But I don't recall reading any documentation of this specific
ordering.

>> Also, I suspect that transition 2->1 needs an unconditional rcu-sync
>> because you may have a sequence of 3->2->1 (or 1->2->1) where the
>> element 0 data is unchanged between 2->1, but was changed from 3->2
>> (or from 1->2), which may be observed by the static call.
>
> I'll agree that we need to add synchronization between 1->0->1, but
> you have not convinced me on this second part.

In addition to 1->0->1, there are actually 2 other scenarios here:

Transition 1->2, to which the ordering question between RCU and static
call is related.

Transition 2->1 would need an unconditional rcu-sync, because
transitions 3->2->1 and 1->2->1 can lead the static call to observe
wrong data if the rcu_dereference happens when there are e.g. 3
callbacks registered, and the static call to the function (single
callback) is then made with data from the 3-callback array.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: [PATCH] tracepoints: Update static_call before tp_funcs when adding a tracepoint
  2021-07-26 18:49       ` Steven Rostedt
  2021-07-26 19:12         ` Mathieu Desnoyers
@ 2021-07-27 11:44         ` Peter Zijlstra
  2021-07-27 13:46           ` Mathieu Desnoyers
  1 sibling, 1 reply; 8+ messages in thread

From: Peter Zijlstra @ 2021-07-27 11:44 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mathieu Desnoyers, linux-kernel, linux-trace-devel, Ingo Molnar,
    Andrew Morton, Stefan Metzmacher, io-uring, paulmck

On Mon, Jul 26, 2021 at 02:49:03PM -0400, Steven Rostedt wrote:
> OK. I see the issue you are saying. And this came from my assumption
> that the tracepoint code did a synchronization when unregistering the
> last callback. But of course it won't, because that would cause a lot
> of back-to-back synchronizations when a large number of tracepoints
> are unregistered at once.
>
> And doing it for all 0->1 or 1->0, or even a 1->0->1, can be costly.
>
> One way to handle this is, when going from 1->0, to set off a worker
> that will do the synchronization asynchronously, and if a 0->1 comes
> in, have that block until the synchronization is complete. This should
> work, and not have too much of an overhead.
>
> If one 1->0 starts the synchronization, and one or more 1->0
> transitions happen, it will be recorded, and the worker will do
> another synchronization, to make sure all 1->0 transitions have gone
> through a full synchronization before a 0->1 can happen.
>
> If a 0->1 comes in while a synchronization is happening, it will note
> the current "number" for the synchronizations (if another one is
> queued, it will wait for one more) before it can begin. As locks will
> be held while waiting for synchronizations to finish, we don't need to
> worry about another 1->0 coming in while a 0->1 is waiting.

Wouldn't get_state_synchronize_rcu() and cond_synchronize_rcu() get you
what you need?
* Re: [PATCH] tracepoints: Update static_call before tp_funcs when adding a tracepoint
  2021-07-27 11:44         ` Peter Zijlstra
@ 2021-07-27 13:46           ` Mathieu Desnoyers
  0 siblings, 0 replies; 8+ messages in thread

From: Mathieu Desnoyers @ 2021-07-27 13:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: rostedt, linux-kernel, linux-trace-devel, Ingo Molnar,
    Andrew Morton, Stefan Metzmacher, io-uring, paulmck

----- On Jul 27, 2021, at 7:44 AM, Peter Zijlstra [email protected] wrote:

> On Mon, Jul 26, 2021 at 02:49:03PM -0400, Steven Rostedt wrote:
>> OK. I see the issue you are saying. And this came from my assumption
>> that the tracepoint code did a synchronization when unregistering the
>> last callback. But of course it won't, because that would cause a lot
>> of back-to-back synchronizations when a large number of tracepoints
>> are unregistered at once.
>>
>> And doing it for all 0->1 or 1->0, or even a 1->0->1, can be costly.
>>
>> One way to handle this is, when going from 1->0, to set off a worker
>> that will do the synchronization asynchronously, and if a 0->1 comes
>> in, have that block until the synchronization is complete. This
>> should work, and not have too much of an overhead.
>>
>> If one 1->0 starts the synchronization, and one or more 1->0
>> transitions happen, it will be recorded, and the worker will do
>> another synchronization, to make sure all 1->0 transitions have gone
>> through a full synchronization before a 0->1 can happen.
>>
>> If a 0->1 comes in while a synchronization is happening, it will note
>> the current "number" for the synchronizations (if another one is
>> queued, it will wait for one more) before it can begin. As locks will
>> be held while waiting for synchronizations to finish, we don't need
>> to worry about another 1->0 coming in while a 0->1 is waiting.
>
> Wouldn't get_state_synchronize_rcu() and cond_synchronize_rcu() get
> you what you need?

Indeed, snapshotting the state and conditionally waiting for a grace
period if none happened since the snapshot appears to be the intent
here. Using get_state+cond_sync should allow us to do this without any
additional worker thread.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com