Hello.

On Tue, Mar 14, 2023 at 10:07:40AM +0000, Daniel Dao <dqminh@cloudflare.com> wrote:
> IMO this violated the principle of cpuset and can be confusing for end users.
> I think I prefer Waiman's suggestion of allowing an implicit move to cpuset
> when enabling cpuset with subtree_control but not explicit moves such as when
> setting cpuset.cpus or writing the pids into cgroup.procs. It's easier to reason
> about and make the failure mode more explicit.
> 
> What do you think ?

I think cpuset should top IO worker's affinity (like sched_setaffinity(2)).
Thus:
- modifying cpuset.cpus	                update task's affinity, for sure
- implicit migration (enabling cpuset)  update task's affinity, effective nop
- explicit migration (meh)              update task's affinity, ¯\_(ツ)_/¯

My understanding of PF_NO_SETAFFINITY is that's for kernel threads that
do work that's functionally needed on a given CPU and thus they cannot
be migrated [1]. As said previously for io_uring workers, affinity is
for performance only.

Hence, I'd also suggest on top of 01e68ce08a30 ("io_uring/io-wq: stop
setting PF_NO_SETAFFINITY on io-wq workers"):

--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -233,7 +233,6 @@ static int io_sq_thread(void *data)
                set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
        else
                set_cpus_allowed_ptr(current, cpu_online_mask);
-       current->flags |= PF_NO_SETAFFINITY;

        mutex_lock(&sqd->lock);
        while (1) {

Afterall, io_uring_setup(2) already mentions:
> When cgroup setting cpuset.cpus changes (typically in container
> environment), the bounded cpu set may be changed as well.

HTH,
Michal

[1] Ideally, those should always remain in the root cpuset cgroup.