>>>> I just think we need to make sure the ground rules are sane. I'm going >>>> to write a few test cases to make sure we do the right thing. >>>> >>> >> Ok, let me try to state some rules to discuss: > >> 1. REQ -> LINK_TIMEOUT >> is a valid use case > > Yes > >> 2. timeout is set at the moment of starting execution of operation. >> e.g. REQ1, REQ2|DRAIN -> LINK_TIMEOUT >> >> Timer is set at the moment, when everything is drained and we >> sending REQ. i.e. after completion of REQ1 > > Right, the timeout is prepped before REQ2 is started, armed when it is > started (if not already done). The prep + arm split is important to > ensure that a short timeout doesn't even find REQ2. I've got (and seen the patch) for prep + arm split. Could a submission block/take a long time? If so, it's probably not what user would want. e.g. WRITE -> LINK_TIMEOUT (1s) - submit write (blocking, takes 2s) - and only after this 2s have a chance to arm the timeout. > >> 3. REQ1 -> LINK_TIMEOUT1 -> REQ2 -> LINK_TIMEOUT2 >> >> is valid, and LINK_TIMEOUT2 will be set, at the moment of >> start of REQ2's execution. It also mean, that if >> LINK_TIMEOUT1 fires, it will cancel REQ1, and REQ2 >> with LINK_TIMEOUT2 (with proper return values) > > That's not valid with the patches I sent. It could be, but we'd need to > fix that bit. > It should almost work, if we move linked timeout init/arm code into __io_submit_sqe(). There is also a problem, which it'll solve: If a request is deferred, it will skip timeout initialisation, because io_req_defer() happens before __io_queue_sqe(). io_wq_submit_work() won't initialise/arm the timeout as well, as it use __io_submit_sqe() directly. So - rule 2. doesn't work - free_req() calls io_link_cancel_timeout() for an non-inititialised timeout The case I keep in mind is: read file -> SEND (+LINK_TIMEOUT) -> RECV (+LINK_TIMEOUT) -> write file ... We don't care how long file read/write would take, but would want to limit execution time for network operations. >> 4. REQ1, LINK_TIMEOUT >> is invalid, fail it > > Correct > >> 5. LINK_TIMEOUT1 -> LINK_TIMEOUT2 >> Fail first, link-fail (aka cancelled) for the second one > > Correct > >> 6. REQ1 -> LINK_TIMEOUT1 -> LINK_TIMEOUT2 >> execute REQ1+LINK_TIMEOUT1, and then fail LINK_TIMEOUT2 as >> invalid. Also, LINK_TIMEOUT2 could be just cancelled >> (e.g. if fail_links for REQ1) > > Given case 5, why would this one be legal? > This one is different if we conceptually consider REQ + following LINK_TIMEOUT as a single operation with timeout. If so, it can be said that (REQ1 -> LINK_TIMEOUT1) is valid pair, but LINK_TIMEOUT2 is a following single link timeout, that's more like in 4. 7. If we decide to not implement 3., what's about the case below? REQ1 -> REQ2 -> LINK_TIMEOUT -- Pavel Begunkov