On 12/17/21 1:11 PM, Jens Axboe wrote: > On 12/17/21 12:45 PM, Linus Torvalds wrote: >> On Fri, Dec 17, 2021 at 9:00 AM Jens Axboe wrote: >>> >>> Just a single fix, fixing an issue with the worker creation change that >>> was merged last week. >> >> Hmm. I've pulled, but looking at the result, this is a classic no-no. >> >> You can't just randomly drop and re-take a lock and sat it's "safe". >> >> Because I don't think it's necessarily safe at all. >> >> When you drop the wqe->lock in the middle of io_wqe_dec_running to >> create a new worker, it means - for example - that "io_worker_exit()" >> can now run immediately on the new worker as far as I can tell. >> >> So one io_worker_exit() m,ay literally race with another one, where >> both are inside that io_wqe_dec_running() at the same time. And then >> they both end up doing >> >> worker->flags = 0; >> current->flags &= ~PF_IO_WORKER; >> >> afterwards in the caller, and not necessarily in the original order. >> And then they'll both possible do >> >> kfree_rcu(worker, rcu); >> >> which sounds like a disaster. > > The worker itself calls io_worker_exit(), so it cannot happen from > within io_wqe_dec_running for the existing one. And that's really all > we care about. The new worker can come and go and we don't really > care about it, we know we're within another worker. > > That said, I totally do agree that this pattern is not a great one > and should be avoided if at all possible. This one should be solvable by > passing back a "do the cancel" information from > io_queue_worker_create(), but that also gets a bit ugly in terms of > having three return types essentially... > > I'll have a think about how to do this in a saner fashion that's more > obviously correct. Something like this gets rid of it, but I'm not a huge fan of patch 1. We could also make it an enum return, but that also gets a bit weird imho. -- Jens Axboe