* [RFC] a new way to achieve asynchronous IO
From: Hao Xu @ 2022-06-20 12:01 UTC
To: io-uring; +Cc: Jens Axboe, Pavel Begunkov

Hi,
I've some thoughts on the way of doing async IO. The current model is
(given we are using SQPOLL mode):

the sqthread does:
(a) Issue a request with the nowait/nonblock flag.
(b) If it would block, return -EAGAIN.
(c) The io_uring layer captures this -EAGAIN and wakes up/creates
    an io-worker to execute the request synchronously.
(d) Try to issue other requests with the above steps again.

This implementation has two downsides:
(1) we have to find all the block points in the IO stack manually and
    change them to be "nowait/nonblock friendly".
(2) when we raise another io-worker to do the request, we submit the
    request from the very beginning. This is a little bit inefficient.

While I think we can actually do it in a reverse way
(given we are using SQPOLL mode):

sqthread1 does:
(a) Issue a request in the synchronous way.
(b) If it is blocked/scheduled soon, raise another sqthread2.
(c) sqthread2 tries to issue other requests in the same way.

This solves problem (1), and may solve (2).
For (1), we just do the sqthread wake-up at the beginning of schedule(),
just like what the io-workers and system workqueue workers do. No need
to find all the block points.
For (2), we continue the blocked request from where it was blocked once
the resource is available.

What we need to take care of is making sure there is only one task
submitting the requests.

To achieve this, we can maintain a pool of sqthreads just like the iowq.

I've done a very simple/ugly POC to demonstrate this:

https://github.com/HowHsu/linux/commit/183be142493b5a816b58bd95ae4f0926227b587b

I also wrote a simple test for it, which submits two sqes, one
read(pipe) and one nop request. The first one will block since there is
no data in the pipe. Then a new sqthread is created/woken up to submit
the second one, and then some data is written to the pipe (by an
unrelated user thread); soon the first sqthread is woken up and
continues the request.

If the idea sounds like it has no fatal issue, I'll turn the POC into
real patches.
Any comments are welcome!
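A minimal liburing sketch of the pipe-read + nop test described above
(this is not the actual POC test program, so its structure and details
are assumptions; note that on a mainline kernel the blocked read is
punted to io-wq rather than blocking the sqthread, so this only shows
the submission pattern):

/* hedged sketch: two sqes under SQPOLL, one read(pipe), one nop */
#include <liburing.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_params p;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int pipefd[2];
	char buf[16];

	memset(&p, 0, sizeof(p));
	p.flags = IORING_SETUP_SQPOLL;	/* submission is done by the sqthread */
	if (io_uring_queue_init_params(8, &ring, &p) < 0 || pipe(pipefd) < 0)
		return 1;

	/* sqe 1: read from an empty pipe, which blocks until data arrives */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, pipefd[0], buf, sizeof(buf), 0);
	sqe->user_data = 1;

	/* sqe 2: a nop, which should complete while the read is blocked */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_nop(sqe);
	sqe->user_data = 2;

	io_uring_submit(&ring);

	/* the "unrelated user thread" writing to the pipe, done inline here */
	sleep(1);
	write(pipefd[1], "x", 1);

	for (int i = 0; i < 2; i++) {
		io_uring_wait_cqe(&ring, &cqe);
		printf("user_data=%llu res=%d\n",
		       (unsigned long long)cqe->user_data, cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}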
* Re: [RFC] a new way to achieve asynchronous IO
From: Hao Xu @ 2022-06-20 12:03 UTC
To: io-uring; +Cc: Jens Axboe, Pavel Begunkov

On 6/20/22 20:01, Hao Xu wrote:
[...]
> (2) when we raise another io-worker to do the request, we submit the
> request from the very beginning. This isn't a little bit inefficient.
                                        ^is
[...]
* Re: [RFC] a new way to achieve asynchronous IO
From: Jens Axboe @ 2022-06-20 13:41 UTC
To: Hao Xu, io-uring; +Cc: Pavel Begunkov, dvernet

On 6/20/22 6:01 AM, Hao Xu wrote:
> Hi,
> I've some thoughts on the way of doing async IO. The current model is
> (given we are using SQPOLL mode):
[...]
> If the idea sounds like it has no fatal issue, I'll turn the POC into
> real patches.
> Any comments are welcome!

One thing I've always wanted to try out is kind of similar to this, but
a superset of it. Basically io-wq isn't an explicit offload mechanism,
it just happens automatically if the issue blocks. This applies to both
SQPOLL and non-SQPOLL.

This takes a page out of the old syslet/threadlet that Ingo Molnar did
way back in the day [1], but it never really went anywhere. But the
pass-on-block primitive would apply very nicely to io_uring.

The way it'd work is that any issue, SQPOLL or not, would just assume
that it won't block. If it doesn't block, great, we can complete it
inline. If it does block, an io-wq thread is grabbed and the context
moved there. The io-wq thread takes over the blocking, and the original
issue returns in some fashion that allows us to know it went implicitly
async.

This may be a bit more involved than what you suggest here, which in
nature is similar in that we just hope for the best and deal with the
outcome if we did end up blocking.

Do you have any numbers from your approach?
[1] https://lore.kernel.org/all/[email protected]/T/

-- 
Jens Axboe
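For reference, the "wake someone up at the beginning of schedule()"
notification that both mails lean on is the hook io-wq and workqueue
workers already use. Below is a simplified paraphrase of it; the exact
code in kernel/sched/core.c differs across kernel versions, and the
sqthread extension described in the trailing comment is hypothetical:

/*
 * Simplified paraphrase of sched_submit_work() in kernel/sched/core.c
 * (exact shape varies across kernel versions): right before a task
 * blocks, schedule() lets io-wq and workqueue workers wake a sibling
 * so that forward progress is kept.
 */
static inline void sched_submit_work(struct task_struct *tsk)
{
	unsigned int task_flags = tsk->flags;

	if (task_is_running(tsk))
		return;

	if (task_flags & (PF_WQ_WORKER | PF_IO_WORKER)) {
		if (task_flags & PF_WQ_WORKER)
			wq_worker_sleeping(tsk);
		else
			io_wq_worker_sleeping(tsk); /* io-wq wakes another worker */
	}

	/*
	 * The RFC reuses this spot for SQPOLL threads: a hypothetical
	 * PF_SQ_THREAD flag plus an io_sq_thread_sleeping() helper
	 * (neither exists upstream) would wake an idle sqthread from the
	 * pool here, so submission continues while this task blocks.
	 */
}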
* Re: [RFC] a new way to achieve asynchronous IO
From: Hao Xu @ 2022-06-21 3:38 UTC
To: Jens Axboe, io-uring; +Cc: Pavel Begunkov, dvernet

On 6/20/22 21:41, Jens Axboe wrote:
[...]
> One thing I've always wanted to try out is kind of similar to this, but
> a superset of it. Basically io-wq isn't an explicit offload mechanism,
> it just happens automatically if the issue blocks. This applies to both
> SQPOLL and non-SQPOLL.
>
> This takes a page out of the old syslet/threadlet that Ingo Molnar did
> way back in the day [1], but it never really went anywhere. But the
> pass-on-block primitive would apply very nicely to io_uring.
>
> The way it'd work is that any issue, SQPOLL or not, would just assume
> that it won't block. If it doesn't block, great, we can complete it
> inline. If it does block, an io-wq thread is grabbed and the context
> moved there. The io-wq thread takes over the blocking, and the original
> issue returns in some fashion that allows us to know it went implicitly
> async.

This sounds pretty good. I once thought about this but couldn't clearly
figure out how to return to the desired place in the original context.

> This may be a bit more involved than what you suggest here, which in
> nature is similar in that we just hope for the best and deal with the
> outcome if we did end up blocking.
>
> Do you have any numbers from your approach?

Currently no, the POC is only there to prove the idea works. It
shouldn't be hard to modify it to get some numbers.

> [1] https://lore.kernel.org/all/[email protected]/T/

I'll take a look and see what I can do.
* Re: [RFC] a new way to achieve asynchronous IO
From: Hao Xu @ 2022-06-23 13:31 UTC
To: Jens Axboe, io-uring; +Cc: Pavel Begunkov, dvernet

On 6/20/22 21:41, Jens Axboe wrote:
[...]
> One thing I've always wanted to try out is kind of similar to this, but
> a superset of it. Basically io-wq isn't an explicit offload mechanism,
> it just happens automatically if the issue blocks. This applies to both
> SQPOLL and non-SQPOLL.
>
> This takes a page out of the old syslet/threadlet that Ingo Molnar did
> way back in the day [1], but it never really went anywhere. But the
> pass-on-block primitive would apply very nicely to io_uring.

I've read part of the syslet/threadlet patchset, and it seems to have
something that I need. My first idea about the new iowq offload was just
like syslet: if blocked, trigger a new worker, deliver the context to
it, and then update the current context so that we return to the place
of sqe submission. But I just didn't know how to do it.

By the way, may I ask why syslet/threadlet was not merged into the
mainline? The mail thread is very long and I haven't gotten a chance to
read all of it.

For the approach I posted, I found it is actually not tied to SQPOLL.
The original context just wakes up a worker in the pool to do the
submission, and if one blocks, another one wakes up to do the
submission. It is definitely easier to implement than something like
syslet (context delivery) since the new worker naturally goes to the
place of submission, thus no context delivery is needed. But a downside
is that every time we call io_uring_enter to submit a batch of sqes,
there is a wakeup at the beginning.

I'll try to see if I can implement a context delivery version.
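To make the "pool of sqthreads, only one submitter at a time" shape
described above concrete, here is illustrative pseudocode only: every
identifier in it is hypothetical (none of these helpers exist upstream),
and the POC linked earlier may structure things differently:

/* illustrative pseudocode, not real kernel code */
static int io_sq_pool_thread(void *data)
{
	struct io_sq_pool *pool = data;		/* hypothetical structure */

	for (;;) {
		/* sleep until promoted to be the single active submitter */
		wait_for_submitter_role(pool);

		while (sqes_pending(pool->ctx)) {
			/*
			 * Issue synchronously.  If this blocks, the hook at
			 * the top of schedule() promotes an idle pool thread
			 * to submitter, so only one task submits at a time.
			 */
			issue_sqe_sync(pool->ctx);
		}

		/*
		 * A thread that blocked and later resumed finishes its
		 * request above and then drops back into the idle pool here.
		 */
		give_up_submitter_role(pool);
	}
	return 0;
}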
* Re: [RFC] a new way to achieve asynchronous IO
From: Jens Axboe @ 2022-06-23 14:08 UTC
To: Hao Xu, io-uring; +Cc: Pavel Begunkov, dvernet

On 6/23/22 7:31 AM, Hao Xu wrote:
[...]
> I've read part of the syslet/threadlet patchset, and it seems to have
> something that I need. My first idea about the new iowq offload was just
> like syslet: if blocked, trigger a new worker, deliver the context to
> it, and then update the current context so that we return to the place
> of sqe submission. But I just didn't know how to do it.

Exactly, what you mentioned was very close to what I had considered in
the past, and what the syslet/threadlet attempted to do. Except it flips
it upside down a bit, which I do think is probably the saner way to do
it, rather than have the original block and fork a new one.

> By the way, may I ask why syslet/threadlet was not merged into the
> mainline? The mail thread is very long and I haven't gotten a chance to
> read all of it.

Not quite sure, it's been a long time. IMHO it was a good idea looking
for the right interface, which we now have. So the time may be ripe to
do something like this, finally.

> For the approach I posted, I found it is actually not tied to SQPOLL.
> The original context just wakes up a worker in the pool to do the
> submission, and if one blocks, another one wakes up to do the
> submission. It is definitely easier to implement than something like
> syslet (context delivery) since the new worker naturally goes to the
> place of submission, thus no context delivery is needed. But a downside
> is that every time we call io_uring_enter to submit a batch of sqes,
> there is a wakeup at the beginning.
>
> I'll try to see if I can implement a context delivery version.

Sounds good, thanks.

-- 
Jens Axboe
* Re: [RFC] a new way to achieve asynchronous IO
From: Hao Xu @ 2022-06-27 7:11 UTC
To: Jens Axboe, io-uring; +Cc: Pavel Begunkov, dvernet

On 6/23/22 22:08, Jens Axboe wrote:
[...]
>> By the way, may I ask why syslet/threadlet was not merged into the
>> mainline? The mail thread is very long and I haven't gotten a chance to
>> read all of it.
>
> Not quite sure, it's been a long time. IMHO it was a good idea looking
> for the right interface, which we now have. So the time may be ripe to
> do something like this, finally.

I've been blocked by an issue: if we deliver the context from task a to
task b, we may have no way to wake it up, because when the resource
which blocks a is released by another task like c, c wakes up a, not b.

If we want to make it work, we have to deliver the struct task_struct as
well. That means the original task uses a new task_struct and the new
task uses the old one. And in the meanwhile we have to maintain the pid,
parent task, etc. (since we swap the task_struct, the pid and other
stuff also change).

Any thoughts?
* Re: [RFC] a new way to achieve asynchronous IO
From: Hao Xu @ 2022-06-28 13:33 UTC
To: Jens Axboe, io-uring; +Cc: Pavel Begunkov, dvernet

On 6/27/22 15:11, Hao Xu wrote:
[...]
> I've been blocked by an issue: if we deliver the context from task a to
> task b, we may have no way to wake it up, because when the resource
> which blocks a is released by another task like c, c wakes up a, not b.
[...]
> Any thoughts?

Just ignore this, it seems I misunderstood something.
* Re: [RFC] a new way to achieve asynchronous IO
From: Hao Xu @ 2022-07-12 7:11 UTC
To: Jens Axboe, io-uring; +Cc: Pavel Begunkov, dvernet

On 6/23/22 22:08, Jens Axboe wrote:
[...]
[1]
> Exactly, what you mentioned was very close to what I had considered in
> the past, and what the syslet/threadlet attempted to do. Except it flips
> it upside down a bit, which I do think is probably the saner way to do
> it, rather than have the original block and fork a new one.

Hi Jens,
I found that what syslet does is also block the original task and let
the new one return to userspace. If we want to achieve something like
[1], we have to consider one problem.

Code at the block point usually looks like:

    put the task into a wait table
    schedule
    move the task out of the wait table

and code at the wakeup point usually looks like:

    look through the wait table
    wake up a/all proper tasks

The problem is: if we deliver the blocked context to a worker, how can
we wake it up, since the task in the wait table is the original one?
What do you think of this?

[...]
>> I'll try to see if I can implement a context delivery version.

Based on what I said above, I'm still implementing the old version. I
think the code is useful and can be leveraged for the context delivery
version if the latter becomes viable someday.

Regards,
Hao
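The block/wakeup pattern sketched above corresponds to the standard
kernel wait-queue idiom; a condensed sketch of it (simplified -- real
users recheck the condition in a loop) shows why the wakeup targets
exactly the task that slept:

/*
 * Condensed wait-queue sketch: the wait entry records "current", so
 * wake_up() wakes exactly the task that called prepare_to_wait() --
 * not a worker the blocked context might have been handed off to.
 */
#include <linux/wait.h>
#include <linux/sched.h>

static DECLARE_WAIT_QUEUE_HEAD(demo_wq);
static bool demo_cond;

/* block point, run by task "a" */
static void demo_wait(void)
{
	DEFINE_WAIT(wait);	/* wait entry points at current, i.e. "a" */

	prepare_to_wait(&demo_wq, &wait, TASK_INTERRUPTIBLE);
	if (!demo_cond)
		schedule();	/* only this task sleeps on this entry */
	finish_wait(&demo_wq, &wait);
}

/* wakeup point, run by some other task "c" */
static void demo_wake(void)
{
	demo_cond = true;
	wake_up(&demo_wq);	/* walks the wait table and wakes "a" */
}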
Thread overview: 9+ messages
  2022-06-20 12:01 [RFC] a new way to achieve asynchronous IO  Hao Xu
  2022-06-20 12:03 ` Hao Xu
  2022-06-20 13:41 ` Jens Axboe
  2022-06-21  3:38   ` Hao Xu
  2022-06-23 13:31   ` Hao Xu
  2022-06-23 14:08     ` Jens Axboe
  2022-06-27  7:11       ` Hao Xu
  2022-06-28 13:33         ` Hao Xu
  2022-07-12  7:11       ` Hao Xu