* [PATCH 0/2] Close a hole where IOCB_NOWAIT reads could sleep @ 2021-07-11 15:09 Matthew Wilcox (Oracle) 2021-07-11 15:09 ` [PATCH 1/2] mm/readahead: Add gfp_flags to ractl Matthew Wilcox (Oracle) ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Matthew Wilcox (Oracle) @ 2021-07-11 15:09 UTC (permalink / raw) To: Jens Axboe Cc: Matthew Wilcox (Oracle), io-uring, linux-mm, linux-fsdevel, linux-aio I noticed a theoretical case where an IOCB_NOWAIT read could sleep: filemap_get_pages filemap_get_read_batch page_cache_sync_readahead page_cache_sync_ra ondemand_readahead do_page_cache_ra page_cache_ra_unbounded gfp_t gfp_mask = readahead_gfp_mask(mapping); memalloc_nofs_save() __page_cache_alloc(gfp_mask); We're in a nofs context, so we're not going to start new IO, but we might wait for writeback to complete. We generally don't want to sleep for IO, particularly not for IO that isn't related to us. Jens, can you run this through your test rig and see if it makes any practical difference? Matthew Wilcox (Oracle) (2): mm/readahead: Add gfp_flags to ractl mm/filemap: Prevent waiting for memory for NOWAIT reads include/linux/pagemap.h | 3 +++ mm/filemap.c | 31 +++++++++++++++++++------------ mm/readahead.c | 16 ++++++++-------- 3 files changed, 30 insertions(+), 20 deletions(-) -- 2.30.2 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] mm/readahead: Add gfp_flags to ractl 2021-07-11 15:09 [PATCH 0/2] Close a hole where IOCB_NOWAIT reads could sleep Matthew Wilcox (Oracle) @ 2021-07-11 15:09 ` Matthew Wilcox (Oracle) 2021-07-12 11:34 ` Christoph Hellwig 2021-07-11 15:09 ` [PATCH 2/2] mm/filemap: Prevent waiting for memory for NOWAIT reads Matthew Wilcox (Oracle) 2021-07-12 1:44 ` [PATCH 0/2] Close a hole where IOCB_NOWAIT reads could sleep Jens Axboe 2 siblings, 1 reply; 8+ messages in thread From: Matthew Wilcox (Oracle) @ 2021-07-11 15:09 UTC (permalink / raw) To: Jens Axboe Cc: Matthew Wilcox (Oracle), io-uring, linux-mm, linux-fsdevel, linux-aio It is currently possible for an I/O request that specifies IOCB_NOWAIT to sleep waiting for I/O to complete in order to allocate pages for readahead. In order to fix that, we need the caller to be able to specify the GFP flags to use for memory allocation in the rest of the readahead path. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> --- include/linux/pagemap.h | 3 +++ mm/readahead.c | 16 ++++++++-------- 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index ed02aa522263..00abb2b2f158 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -833,11 +833,13 @@ static inline int add_to_page_cache(struct page *page, * May be NULL if invoked internally by the filesystem. * @mapping: Readahead this filesystem object. * @ra: File readahead state. May be NULL. + * @gfp_flags: Memory allocation flags to use. */ struct readahead_control { struct file *file; struct address_space *mapping; struct file_ra_state *ra; + gfp_t gfp_flags; /* private: use the readahead_* accessors instead */ pgoff_t _index; unsigned int _nr_pages; @@ -849,6 +851,7 @@ struct readahead_control { .file = f, \ .mapping = m, \ .ra = r, \ + .gfp_flags = readahead_gfp_mask(m), \ ._index = i, \ } diff --git a/mm/readahead.c b/mm/readahead.c index d589f147f4c2..58937d1fe3f7 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -177,7 +177,6 @@ void page_cache_ra_unbounded(struct readahead_control *ractl, struct address_space *mapping = ractl->mapping; unsigned long index = readahead_index(ractl); LIST_HEAD(page_pool); - gfp_t gfp_mask = readahead_gfp_mask(mapping); unsigned long i; /* @@ -212,14 +211,14 @@ void page_cache_ra_unbounded(struct readahead_control *ractl, continue; } - page = __page_cache_alloc(gfp_mask); + page = __page_cache_alloc(ractl->gfp_flags); if (!page) break; if (mapping->a_ops->readpages) { page->index = index + i; list_add(&page->lru, &page_pool); } else if (add_to_page_cache_lru(page, mapping, index + i, - gfp_mask) < 0) { + ractl->gfp_flags) < 0) { put_page(page); read_pages(ractl, &page_pool, true); i = ractl->_index + ractl->_nr_pages - index - 1; @@ -663,7 +662,6 @@ void readahead_expand(struct readahead_control *ractl, struct address_space *mapping = ractl->mapping; struct file_ra_state *ra = ractl->ra; pgoff_t new_index, new_nr_pages; - gfp_t gfp_mask = readahead_gfp_mask(mapping); new_index = new_start / PAGE_SIZE; @@ -675,10 +673,11 @@ void readahead_expand(struct readahead_control *ractl, if (page && !xa_is_value(page)) return; /* Page apparently present */ - page = __page_cache_alloc(gfp_mask); + page = __page_cache_alloc(ractl->gfp_flags); if (!page) return; - if (add_to_page_cache_lru(page, mapping, index, gfp_mask) < 0) { + if (add_to_page_cache_lru(page, mapping, index, + ractl->gfp_flags) < 0) { put_page(page); return; } @@ -698,10 +697,11 @@ void readahead_expand(struct readahead_control *ractl, if (page && !xa_is_value(page)) return; /* Page apparently present */ - page = __page_cache_alloc(gfp_mask); + page = __page_cache_alloc(ractl->gfp_flags); if (!page) return; - if (add_to_page_cache_lru(page, mapping, index, gfp_mask) < 0) { + if (add_to_page_cache_lru(page, mapping, index, + ractl->gfp_flags) < 0) { put_page(page); return; } -- 2.30.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] mm/readahead: Add gfp_flags to ractl 2021-07-11 15:09 ` [PATCH 1/2] mm/readahead: Add gfp_flags to ractl Matthew Wilcox (Oracle) @ 2021-07-12 11:34 ` Christoph Hellwig 2021-07-12 14:37 ` Matthew Wilcox 0 siblings, 1 reply; 8+ messages in thread From: Christoph Hellwig @ 2021-07-12 11:34 UTC (permalink / raw) To: Matthew Wilcox (Oracle) Cc: Jens Axboe, io-uring, linux-mm, linux-fsdevel, linux-aio On Sun, Jul 11, 2021 at 04:09:26PM +0100, Matthew Wilcox (Oracle) wrote: > It is currently possible for an I/O request that specifies IOCB_NOWAIT > to sleep waiting for I/O to complete in order to allocate pages for > readahead. In order to fix that, we need the caller to be able to > specify the GFP flags to use for memory allocation in the rest of the > readahead path. The file systems also need to respect it for their bio or private data allocation. And be able to cope with failure, which they currently don't have to for sleeping bio allocations. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] mm/readahead: Add gfp_flags to ractl 2021-07-12 11:34 ` Christoph Hellwig @ 2021-07-12 14:37 ` Matthew Wilcox 0 siblings, 0 replies; 8+ messages in thread From: Matthew Wilcox @ 2021-07-12 14:37 UTC (permalink / raw) To: Christoph Hellwig Cc: Jens Axboe, io-uring, linux-mm, linux-fsdevel, linux-aio On Mon, Jul 12, 2021 at 12:34:23PM +0100, Christoph Hellwig wrote: > On Sun, Jul 11, 2021 at 04:09:26PM +0100, Matthew Wilcox (Oracle) wrote: > > It is currently possible for an I/O request that specifies IOCB_NOWAIT > > to sleep waiting for I/O to complete in order to allocate pages for > > readahead. In order to fix that, we need the caller to be able to > > specify the GFP flags to use for memory allocation in the rest of the > > readahead path. > > The file systems also need to respect it for their bio or private > data allocation. And be able to cope with failure, which they currently > don't have to for sleeping bio allocations. Yes, they should. This patch doesn't make that problem worse than it is today, and gets the desired GFP flags down to the file systems, which is needed for the full fix. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] mm/filemap: Prevent waiting for memory for NOWAIT reads 2021-07-11 15:09 [PATCH 0/2] Close a hole where IOCB_NOWAIT reads could sleep Matthew Wilcox (Oracle) 2021-07-11 15:09 ` [PATCH 1/2] mm/readahead: Add gfp_flags to ractl Matthew Wilcox (Oracle) @ 2021-07-11 15:09 ` Matthew Wilcox (Oracle) 2021-07-12 1:44 ` [PATCH 0/2] Close a hole where IOCB_NOWAIT reads could sleep Jens Axboe 2 siblings, 0 replies; 8+ messages in thread From: Matthew Wilcox (Oracle) @ 2021-07-11 15:09 UTC (permalink / raw) To: Jens Axboe Cc: Matthew Wilcox (Oracle), io-uring, linux-mm, linux-fsdevel, linux-aio Readahead memory allocations won't block for much today, as they're already marked as NOFS and NORETRY, but they can still sleep, and they shouldn't if the read is marked as IOCB_NOWAIT. Clearing the DIRECT_RECLAIM flag will prevent sleeping. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> --- mm/filemap.c | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index d1458ecf2f51..2be27b686518 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2435,21 +2435,27 @@ static int filemap_create_page(struct file *file, static int filemap_readahead(struct kiocb *iocb, struct file *file, struct address_space *mapping, struct page *page, - pgoff_t last_index) + pgoff_t index, pgoff_t last_index) { + DEFINE_READAHEAD(ractl, file, &file->f_ra, mapping, index); + if (iocb->ki_flags & IOCB_NOIO) return -EAGAIN; - page_cache_async_readahead(mapping, &file->f_ra, file, page, - page->index, last_index - page->index); + if (iocb->ki_flags & IOCB_NOWAIT) + ractl.gfp_flags &= ~__GFP_DIRECT_RECLAIM; + + if (page) + page_cache_async_ra(&ractl, page, last_index - index); + else + page_cache_sync_ra(&ractl, last_index - index); return 0; } static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter, struct pagevec *pvec) { - struct file *filp = iocb->ki_filp; - struct address_space *mapping = filp->f_mapping; - struct file_ra_state *ra = &filp->f_ra; + struct file *file = iocb->ki_filp; + struct address_space *mapping = file->f_mapping; pgoff_t index = iocb->ki_pos >> PAGE_SHIFT; pgoff_t last_index; struct page *page; @@ -2462,16 +2468,16 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter, filemap_get_read_batch(mapping, index, last_index, pvec); if (!pagevec_count(pvec)) { - if (iocb->ki_flags & IOCB_NOIO) - return -EAGAIN; - page_cache_sync_readahead(mapping, ra, filp, index, - last_index - index); + err = filemap_readahead(iocb, file, mapping, NULL, index, + last_index); + if (err) + return err; filemap_get_read_batch(mapping, index, last_index, pvec); } if (!pagevec_count(pvec)) { if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ)) return -EAGAIN; - err = filemap_create_page(filp, mapping, + err = filemap_create_page(file, mapping, iocb->ki_pos >> PAGE_SHIFT, pvec); if (err == AOP_TRUNCATED_PAGE) goto retry; @@ -2480,7 +2486,8 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter, page = pvec->pages[pagevec_count(pvec) - 1]; if (PageReadahead(page)) { - err = filemap_readahead(iocb, filp, mapping, page, last_index); + err = filemap_readahead(iocb, file, mapping, page, page->index, + last_index); if (err) goto err; } -- 2.30.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] Close a hole where IOCB_NOWAIT reads could sleep 2021-07-11 15:09 [PATCH 0/2] Close a hole where IOCB_NOWAIT reads could sleep Matthew Wilcox (Oracle) 2021-07-11 15:09 ` [PATCH 1/2] mm/readahead: Add gfp_flags to ractl Matthew Wilcox (Oracle) 2021-07-11 15:09 ` [PATCH 2/2] mm/filemap: Prevent waiting for memory for NOWAIT reads Matthew Wilcox (Oracle) @ 2021-07-12 1:44 ` Jens Axboe 2021-07-24 18:22 ` Matthew Wilcox 2 siblings, 1 reply; 8+ messages in thread From: Jens Axboe @ 2021-07-12 1:44 UTC (permalink / raw) To: Matthew Wilcox (Oracle); +Cc: io-uring, linux-mm, linux-fsdevel, linux-aio On 7/11/21 9:09 AM, Matthew Wilcox (Oracle) wrote: > I noticed a theoretical case where an IOCB_NOWAIT read could sleep: > > filemap_get_pages > filemap_get_read_batch > page_cache_sync_readahead > page_cache_sync_ra > ondemand_readahead > do_page_cache_ra > page_cache_ra_unbounded > gfp_t gfp_mask = readahead_gfp_mask(mapping); > memalloc_nofs_save() > __page_cache_alloc(gfp_mask); > > We're in a nofs context, so we're not going to start new IO, but we might > wait for writeback to complete. We generally don't want to sleep for IO, > particularly not for IO that isn't related to us. > > Jens, can you run this through your test rig and see if it makes any > practical difference? You bet, I'll see if I can trigger this condition and verify we're no longer blocking on writeback. Thanks for hacking this up. -- Jens Axboe ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] Close a hole where IOCB_NOWAIT reads could sleep 2021-07-12 1:44 ` [PATCH 0/2] Close a hole where IOCB_NOWAIT reads could sleep Jens Axboe @ 2021-07-24 18:22 ` Matthew Wilcox 2021-07-24 19:46 ` Jens Axboe 0 siblings, 1 reply; 8+ messages in thread From: Matthew Wilcox @ 2021-07-24 18:22 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, linux-mm, linux-fsdevel, linux-aio On Sun, Jul 11, 2021 at 07:44:07PM -0600, Jens Axboe wrote: > On 7/11/21 9:09 AM, Matthew Wilcox (Oracle) wrote: > > I noticed a theoretical case where an IOCB_NOWAIT read could sleep: > > > > filemap_get_pages > > filemap_get_read_batch > > page_cache_sync_readahead > > page_cache_sync_ra > > ondemand_readahead > > do_page_cache_ra > > page_cache_ra_unbounded > > gfp_t gfp_mask = readahead_gfp_mask(mapping); > > memalloc_nofs_save() > > __page_cache_alloc(gfp_mask); > > > > We're in a nofs context, so we're not going to start new IO, but we might > > wait for writeback to complete. We generally don't want to sleep for IO, > > particularly not for IO that isn't related to us. > > > > Jens, can you run this through your test rig and see if it makes any > > practical difference? > > You bet, I'll see if I can trigger this condition and verify we're no > longer blocking on writeback. Thanks for hacking this up. Did you have any success yet? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] Close a hole where IOCB_NOWAIT reads could sleep 2021-07-24 18:22 ` Matthew Wilcox @ 2021-07-24 19:46 ` Jens Axboe 0 siblings, 0 replies; 8+ messages in thread From: Jens Axboe @ 2021-07-24 19:46 UTC (permalink / raw) To: Matthew Wilcox; +Cc: io-uring, linux-mm, linux-fsdevel, linux-aio On 7/24/21 12:22 PM, Matthew Wilcox wrote: > On Sun, Jul 11, 2021 at 07:44:07PM -0600, Jens Axboe wrote: >> On 7/11/21 9:09 AM, Matthew Wilcox (Oracle) wrote: >>> I noticed a theoretical case where an IOCB_NOWAIT read could sleep: >>> >>> filemap_get_pages >>> filemap_get_read_batch >>> page_cache_sync_readahead >>> page_cache_sync_ra >>> ondemand_readahead >>> do_page_cache_ra >>> page_cache_ra_unbounded >>> gfp_t gfp_mask = readahead_gfp_mask(mapping); >>> memalloc_nofs_save() >>> __page_cache_alloc(gfp_mask); >>> >>> We're in a nofs context, so we're not going to start new IO, but we might >>> wait for writeback to complete. We generally don't want to sleep for IO, >>> particularly not for IO that isn't related to us. >>> >>> Jens, can you run this through your test rig and see if it makes any >>> practical difference? >> >> You bet, I'll see if I can trigger this condition and verify we're no >> longer blocking on writeback. Thanks for hacking this up. > > Did you have any success yet? Sorry forgot to report back - I did run some testing last week, and didn't manage to make it hit the blocking condition. Did various read/write mix on the same file, made sure there was memory pressure, etc. I'll give it another go, please let me know if you have an idea on how to make this easier to hit... I know it's one of those things that you hit all the time in certain workloads (hence why I would love to see it get fixed), but I just didn't manage to provoke it when I tried. -- Jens Axboe ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-07-24 19:46 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-07-11 15:09 [PATCH 0/2] Close a hole where IOCB_NOWAIT reads could sleep Matthew Wilcox (Oracle) 2021-07-11 15:09 ` [PATCH 1/2] mm/readahead: Add gfp_flags to ractl Matthew Wilcox (Oracle) 2021-07-12 11:34 ` Christoph Hellwig 2021-07-12 14:37 ` Matthew Wilcox 2021-07-11 15:09 ` [PATCH 2/2] mm/filemap: Prevent waiting for memory for NOWAIT reads Matthew Wilcox (Oracle) 2021-07-12 1:44 ` [PATCH 0/2] Close a hole where IOCB_NOWAIT reads could sleep Jens Axboe 2021-07-24 18:22 ` Matthew Wilcox 2021-07-24 19:46 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox