* [PATCH 0/2] Remove kiocb ki_complete @ 2020-07-08 22:26 Matthew Wilcox (Oracle) 2020-07-08 22:26 ` [PATCH 1/2] fs: Abstract calling the kiocb completion function Matthew Wilcox (Oracle) ` (3 more replies) 0 siblings, 4 replies; 21+ messages in thread From: Matthew Wilcox (Oracle) @ 2020-07-08 22:26 UTC (permalink / raw) To: Jens Axboe Cc: Matthew Wilcox (Oracle), linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez Save a pointer in the kiocb by using a few bits in ki_flags to index a table of completion functions. Matthew Wilcox (Oracle) (2): fs: Abstract calling the kiocb completion function fs: Remove kiocb->ki_complete crypto/af_alg.c | 2 +- drivers/block/loop.c | 14 +++++++-- drivers/nvme/target/core.c | 10 +++++- drivers/nvme/target/io-cmd-file.c | 10 +++--- drivers/nvme/target/nvmet.h | 2 ++ drivers/target/target_core_file.c | 20 ++++++++++-- drivers/usb/gadget/function/f_fs.c | 2 +- drivers/usb/gadget/legacy/inode.c | 4 +-- fs/aio.c | 50 ++++++++++++++++-------------- fs/block_dev.c | 2 +- fs/ceph/file.c | 2 +- fs/cifs/file.c | 8 ++--- fs/direct-io.c | 2 +- fs/fuse/file.c | 2 +- fs/io_uring.c | 14 ++++++--- fs/iomap/direct-io.c | 3 +- fs/nfs/direct.c | 2 +- fs/ocfs2/file.c | 7 +++-- fs/overlayfs/file.c | 17 +++++++--- fs/read_write.c | 36 +++++++++++++++++++++ include/linux/fs.h | 23 ++++++++++++-- 21 files changed, 168 insertions(+), 64 deletions(-) -- 2.27.0 ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/2] fs: Abstract calling the kiocb completion function 2020-07-08 22:26 [PATCH 0/2] Remove kiocb ki_complete Matthew Wilcox (Oracle) @ 2020-07-08 22:26 ` Matthew Wilcox (Oracle) 2020-07-08 22:37 ` Jens Axboe 2020-07-08 22:26 ` [PATCH 2/2] fs: Remove kiocb->ki_complete Matthew Wilcox (Oracle) ` (2 subsequent siblings) 3 siblings, 1 reply; 21+ messages in thread From: Matthew Wilcox (Oracle) @ 2020-07-08 22:26 UTC (permalink / raw) To: Jens Axboe Cc: Matthew Wilcox (Oracle), linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez Introduce complete_kiocb() to abstract the calls to ->ki_complete(). Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> --- crypto/af_alg.c | 2 +- drivers/block/loop.c | 2 +- drivers/usb/gadget/function/f_fs.c | 2 +- drivers/usb/gadget/legacy/inode.c | 4 ++-- fs/aio.c | 2 +- fs/block_dev.c | 2 +- fs/ceph/file.c | 2 +- fs/cifs/file.c | 4 ++-- fs/direct-io.c | 2 +- fs/fuse/file.c | 2 +- fs/io_uring.c | 2 +- fs/iomap/direct-io.c | 3 +-- fs/nfs/direct.c | 2 +- fs/overlayfs/file.c | 2 +- fs/read_write.c | 6 ++++++ include/linux/fs.h | 2 ++ 16 files changed, 24 insertions(+), 17 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index b1cd3535c525..590dbbcd0e9f 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -1045,7 +1045,7 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err) af_alg_free_resources(areq); sock_put(sk); - iocb->ki_complete(iocb, err ? err : (int)resultlen, 0); + complete_kiocb(iocb, err ? err : (int)resultlen, 0); } EXPORT_SYMBOL_GPL(af_alg_async_cb); diff --git a/drivers/block/loop.c b/drivers/block/loop.c index a943207705dd..f7a76e82c88c 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -591,7 +591,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, kthread_associate_blkcg(NULL); if (ret != -EIOCBQUEUED) - cmd->iocb.ki_complete(&cmd->iocb, ret, 0); + complete_kiocb(&cmd->iocb, ret, 0); return 0; } diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c index 490d353d5fde..6b0c128fdb7e 100644 --- a/drivers/usb/gadget/function/f_fs.c +++ b/drivers/usb/gadget/function/f_fs.c @@ -829,7 +829,7 @@ static void ffs_user_copy_worker(struct work_struct *work) kthread_unuse_mm(io_data->mm); } - io_data->kiocb->ki_complete(io_data->kiocb, ret, ret); + complete_kiocb(io_data->kiocb, ret, ret); if (io_data->ffs->ffs_eventfd && !kiocb_has_eventfd) eventfd_signal(io_data->ffs->ffs_eventfd, 1); diff --git a/drivers/usb/gadget/legacy/inode.c b/drivers/usb/gadget/legacy/inode.c index 9ee0bfe7bcda..d16ce03c872a 100644 --- a/drivers/usb/gadget/legacy/inode.c +++ b/drivers/usb/gadget/legacy/inode.c @@ -469,7 +469,7 @@ static void ep_user_copy_worker(struct work_struct *work) ret = -EFAULT; /* completing the iocb can drop the ctx and mm, don't touch mm after */ - iocb->ki_complete(iocb, ret, ret); + complete_kiocb(iocb, ret, ret); kfree(priv->buf); kfree(priv->to_free); @@ -498,7 +498,7 @@ static void ep_aio_complete(struct usb_ep *ep, struct usb_request *req) iocb->private = NULL; /* aio_complete() reports bytes-transferred _and_ faults */ - iocb->ki_complete(iocb, req->actual ? req->actual : req->status, + complete_kiocb(iocb, req->actual ? req->actual : req->status, req->status); } else { /* ep_copy_to_user() won't report both; we hide some faults */ diff --git a/fs/aio.c b/fs/aio.c index 91e7cc4a9f17..ca3b123d83f7 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1513,7 +1513,7 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret) ret = -EINTR; /*FALLTHRU*/ default: - req->ki_complete(req, ret, 0); + complete_kiocb(req, ret, 0); } } diff --git a/fs/block_dev.c b/fs/block_dev.c index 182ddf82938f..76c55cd14ad1 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -304,7 +304,7 @@ static void blkdev_bio_end_io(struct bio *bio) ret = blk_status_to_errno(dio->bio.bi_status); } - dio->iocb->ki_complete(iocb, ret, 0); + complete_kiocb(iocb, ret, 0); if (dio->multi_bio) bio_put(&dio->bio); } else { diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 160644ddaeed..4e379c385202 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1040,7 +1040,7 @@ static void ceph_aio_complete(struct inode *inode, ceph_put_cap_refs(ci, (aio_req->write ? CEPH_CAP_FILE_WR : CEPH_CAP_FILE_RD)); - aio_req->iocb->ki_complete(aio_req->iocb, ret, 0); + complete_kiocb(aio_req->iocb, ret, 0); ceph_free_cap_flush(aio_req->prealloc_cf); kfree(aio_req); diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 9b0f8f33f832..cbf36a8a23aa 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -3116,7 +3116,7 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx) mutex_unlock(&ctx->aio_mutex); if (ctx->iocb && ctx->iocb->ki_complete) - ctx->iocb->ki_complete(ctx->iocb, ctx->rc, 0); + complete_kiocb(ctx->iocb, ctx->rc, 0); else complete(&ctx->done); } @@ -3849,7 +3849,7 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) mutex_unlock(&ctx->aio_mutex); if (ctx->iocb && ctx->iocb->ki_complete) - ctx->iocb->ki_complete(ctx->iocb, ctx->rc, 0); + complete_kiocb(ctx->iocb, ctx->rc, 0); else complete(&ctx->done); } diff --git a/fs/direct-io.c b/fs/direct-io.c index 183299892465..b17dceaeebe5 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -308,7 +308,7 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, unsigned int flags) if (ret > 0 && dio->op == REQ_OP_WRITE) ret = generic_write_sync(dio->iocb, ret); - dio->iocb->ki_complete(dio->iocb, ret, 0); + complete_kiocb(dio->iocb, ret, 0); } kmem_cache_free(dio_cache, dio); diff --git a/fs/fuse/file.c b/fs/fuse/file.c index e573b0cd2737..7eac77d950fb 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -655,7 +655,7 @@ static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos) spin_unlock(&fi->lock); } - io->iocb->ki_complete(io->iocb, res, 0); + complete_kiocb(io->iocb, res, 0); } kref_put(&io->refcnt, fuse_io_release); diff --git a/fs/io_uring.c b/fs/io_uring.c index 6e3169834bf7..f06915fcb6b6 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2469,7 +2469,7 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret) ret = -EINTR; /* fall through */ default: - kiocb->ki_complete(kiocb, ret, 0); + complete_kiocb(kiocb, ret, 0); } } diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index ec7b78e6feca..3b7edd9c3c11 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -133,9 +133,8 @@ static ssize_t iomap_dio_complete(struct iomap_dio *dio) static void iomap_dio_complete_work(struct work_struct *work) { struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work); - struct kiocb *iocb = dio->iocb; - iocb->ki_complete(iocb, iomap_dio_complete(dio), 0); + complete_kiocb(dio->iocb, iomap_dio_complete(dio), 0); } /* diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 3d113cf8908a..498a58b989cf 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -273,7 +273,7 @@ static void nfs_direct_complete(struct nfs_direct_req *dreq) res = (long) dreq->count; WARN_ON_ONCE(dreq->count < 0); } - dreq->iocb->ki_complete(dreq->iocb, res, 0); + complete_kiocb(dreq->iocb, res, 0); } complete(&dreq->completion); diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 01820e654a21..78e7439fc4e2 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -275,7 +275,7 @@ static void ovl_aio_rw_complete(struct kiocb *iocb, long res, long res2) struct kiocb *orig_iocb = aio_req->orig_iocb; ovl_aio_cleanup_handler(aio_req); - orig_iocb->ki_complete(orig_iocb, res, res2); + complete_kiocb(orig_iocb, res, res2); } static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter) diff --git a/fs/read_write.c b/fs/read_write.c index bbfa9b12b15e..89151de19f77 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -363,6 +363,12 @@ SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high, } #endif +void complete_kiocb(struct kiocb *iocb, long ret, long ret2) +{ + iocb->ki_complete(iocb, ret, ret2); +} +EXPORT_SYMBOL(complete_kiocb); + int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t count) { struct inode *inode; diff --git a/include/linux/fs.h b/include/linux/fs.h index da90323b9f92..846135aa328d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -346,6 +346,8 @@ static inline bool is_sync_kiocb(struct kiocb *kiocb) return kiocb->ki_complete == NULL; } +void complete_kiocb(struct kiocb *kiocb, long ret, long ret2); + /* * "descriptor" for what we're up to with a read. * This allows us to use the same read code yet -- 2.27.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] fs: Abstract calling the kiocb completion function 2020-07-08 22:26 ` [PATCH 1/2] fs: Abstract calling the kiocb completion function Matthew Wilcox (Oracle) @ 2020-07-08 22:37 ` Jens Axboe 2020-07-08 22:40 ` Matthew Wilcox 0 siblings, 1 reply; 21+ messages in thread From: Jens Axboe @ 2020-07-08 22:37 UTC (permalink / raw) To: Matthew Wilcox (Oracle) Cc: linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On 7/8/20 4:26 PM, Matthew Wilcox (Oracle) wrote: > diff --git a/crypto/af_alg.c b/crypto/af_alg.c > index b1cd3535c525..590dbbcd0e9f 100644 > --- a/crypto/af_alg.c > +++ b/crypto/af_alg.c > @@ -1045,7 +1045,7 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err) > af_alg_free_resources(areq); > sock_put(sk); > > - iocb->ki_complete(iocb, err ? err : (int)resultlen, 0); > + complete_kiocb(iocb, err ? err : (int)resultlen, 0); I'd prefer having it called kiocb_complete(), seems more in line with what you'd expect in terms of naming for an exported interface. -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] fs: Abstract calling the kiocb completion function 2020-07-08 22:37 ` Jens Axboe @ 2020-07-08 22:40 ` Matthew Wilcox 2020-07-08 22:50 ` Jens Axboe 0 siblings, 1 reply; 21+ messages in thread From: Matthew Wilcox @ 2020-07-08 22:40 UTC (permalink / raw) To: Jens Axboe Cc: linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On Wed, Jul 08, 2020 at 04:37:21PM -0600, Jens Axboe wrote: > On 7/8/20 4:26 PM, Matthew Wilcox (Oracle) wrote: > > diff --git a/crypto/af_alg.c b/crypto/af_alg.c > > index b1cd3535c525..590dbbcd0e9f 100644 > > --- a/crypto/af_alg.c > > +++ b/crypto/af_alg.c > > @@ -1045,7 +1045,7 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err) > > af_alg_free_resources(areq); > > sock_put(sk); > > > > - iocb->ki_complete(iocb, err ? err : (int)resultlen, 0); > > + complete_kiocb(iocb, err ? err : (int)resultlen, 0); > > I'd prefer having it called kiocb_complete(), seems more in line with > what you'd expect in terms of naming for an exported interface. Happy to make that change. It seemed like you preferred the opposite way round with is_sync_kiocb() and init_sync_kiocb() already existing. Should I switch register_kiocb_completion and unregister_kiocb_completion to kiocb_completion_register or kiocb_register_completion? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] fs: Abstract calling the kiocb completion function 2020-07-08 22:40 ` Matthew Wilcox @ 2020-07-08 22:50 ` Jens Axboe 0 siblings, 0 replies; 21+ messages in thread From: Jens Axboe @ 2020-07-08 22:50 UTC (permalink / raw) To: Matthew Wilcox Cc: linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On 7/8/20 4:40 PM, Matthew Wilcox wrote: > On Wed, Jul 08, 2020 at 04:37:21PM -0600, Jens Axboe wrote: >> On 7/8/20 4:26 PM, Matthew Wilcox (Oracle) wrote: >>> diff --git a/crypto/af_alg.c b/crypto/af_alg.c >>> index b1cd3535c525..590dbbcd0e9f 100644 >>> --- a/crypto/af_alg.c >>> +++ b/crypto/af_alg.c >>> @@ -1045,7 +1045,7 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err) >>> af_alg_free_resources(areq); >>> sock_put(sk); >>> >>> - iocb->ki_complete(iocb, err ? err : (int)resultlen, 0); >>> + complete_kiocb(iocb, err ? err : (int)resultlen, 0); >> >> I'd prefer having it called kiocb_complete(), seems more in line with >> what you'd expect in terms of naming for an exported interface. > > Happy to make that change. It seemed like you preferred the opposite > way round with is_sync_kiocb() and init_sync_kiocb() already existing. > > Should I switch register_kiocb_completion and unregister_kiocb_completion > to kiocb_completion_register or kiocb_register_completion? I prefer the latter here, as per the other email. But as long as kiocb_ is the prefix, I don't really care that much. The latter is how you'd say it to, while the former sounds a bit yoda'ish. -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 2/2] fs: Remove kiocb->ki_complete 2020-07-08 22:26 [PATCH 0/2] Remove kiocb ki_complete Matthew Wilcox (Oracle) 2020-07-08 22:26 ` [PATCH 1/2] fs: Abstract calling the kiocb completion function Matthew Wilcox (Oracle) @ 2020-07-08 22:26 ` Matthew Wilcox (Oracle) 2020-07-08 22:38 ` Jens Axboe ` (2 more replies) 2020-07-08 22:33 ` [PATCH 0/2] Remove kiocb ki_complete Jens Axboe 2020-07-09 10:17 ` Christoph Hellwig 3 siblings, 3 replies; 21+ messages in thread From: Matthew Wilcox (Oracle) @ 2020-07-08 22:26 UTC (permalink / raw) To: Jens Axboe Cc: Matthew Wilcox (Oracle), linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez Use a few bits of ki_flags to indicate which completion function to call. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> --- drivers/block/loop.c | 12 ++++++-- drivers/nvme/target/core.c | 10 ++++++- drivers/nvme/target/io-cmd-file.c | 10 +++---- drivers/nvme/target/nvmet.h | 2 ++ drivers/target/target_core_file.c | 20 +++++++++++-- fs/aio.c | 48 +++++++++++++++++-------------- fs/cifs/file.c | 4 +-- fs/io_uring.c | 12 ++++++-- fs/ocfs2/file.c | 7 +++-- fs/overlayfs/file.c | 15 ++++++++-- fs/read_write.c | 32 ++++++++++++++++++++- include/linux/fs.h | 21 ++++++++++++-- 12 files changed, 145 insertions(+), 48 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index f7a76e82c88c..6bd6e55f3e17 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -513,6 +513,7 @@ static void lo_rw_aio_do_completion(struct loop_cmd *cmd) blk_mq_complete_request(rq); } +static int lo_rw_aio_complete_id; static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2) { struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb); @@ -576,8 +577,8 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, cmd->iocb.ki_pos = pos; cmd->iocb.ki_filp = file; - cmd->iocb.ki_complete = lo_rw_aio_complete; cmd->iocb.ki_flags = IOCB_DIRECT; + kiocb_set_completion(&cmd->iocb, lo_rw_aio_complete_id); cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0); if (cmd->css) kthread_associate_blkcg(cmd->css); @@ -2362,10 +2363,14 @@ static int __init loop_init(void) range = 1UL << MINORBITS; } - err = misc_register(&loop_misc); + err = register_kiocb_completion(lo_rw_aio_complete); if (err < 0) goto err_out; + lo_rw_aio_complete_id = err; + err = misc_register(&loop_misc); + if (err < 0) + goto kiocb_out; if (register_blkdev(LOOP_MAJOR, "loop")) { err = -EIO; @@ -2386,6 +2391,8 @@ static int __init loop_init(void) misc_out: misc_deregister(&loop_misc); +kiocb_out: + unregister_kiocb_completion(lo_rw_aio_complete_id); err_out: return err; } @@ -2413,6 +2420,7 @@ static void __exit loop_exit(void) unregister_blkdev(LOOP_MAJOR, "loop"); misc_deregister(&loop_misc); + unregister_kiocb_completion(lo_rw_aio_complete_id); mutex_unlock(&loop_ctl_mutex); } diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 6816507fba58..8b622641c667 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -1563,11 +1563,16 @@ static int __init nvmet_init(void) nvmet_ana_group_enabled[NVMET_DEFAULT_ANA_GRPID] = 1; + error = register_kiocb_completion(nvmet_file_io_done); + if (error) + goto out; + nvmet_file_io_done_id = error; + buffered_io_wq = alloc_workqueue("nvmet-buffered-io-wq", WQ_MEM_RECLAIM, 0); if (!buffered_io_wq) { error = -ENOMEM; - goto out; + goto out_kiocb; } error = nvmet_init_discovery(); @@ -1583,6 +1588,8 @@ static int __init nvmet_init(void) nvmet_exit_discovery(); out_free_work_queue: destroy_workqueue(buffered_io_wq); +out_kiocb: + unregister_kiocb_completion(nvmet_file_io_done_id); out: return error; } @@ -1593,6 +1600,7 @@ static void __exit nvmet_exit(void) nvmet_exit_discovery(); ida_destroy(&cntlid_ida); destroy_workqueue(buffered_io_wq); + unregister_kiocb_completion(nvmet_file_io_done_id); BUILD_BUG_ON(sizeof(struct nvmf_disc_rsp_page_entry) != 1024); BUILD_BUG_ON(sizeof(struct nvmf_disc_rsp_page_hdr) != 1024); diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c index 0abbefd9925e..5884039e28e1 100644 --- a/drivers/nvme/target/io-cmd-file.c +++ b/drivers/nvme/target/io-cmd-file.c @@ -123,7 +123,8 @@ static ssize_t nvmet_file_submit_bvec(struct nvmet_req *req, loff_t pos, return call_iter(iocb, &iter); } -static void nvmet_file_io_done(struct kiocb *iocb, long ret, long ret2) +int nvmet_file_io_done_id; +void nvmet_file_io_done(struct kiocb *iocb, long ret, long ret2) { struct nvmet_req *req = container_of(iocb, struct nvmet_req, f.iocb); u16 status = NVME_SC_SUCCESS; @@ -192,12 +193,9 @@ static bool nvmet_file_execute_io(struct nvmet_req *req, int ki_flags) goto complete; } - /* - * A NULL ki_complete ask for synchronous execution, which we want - * for the IOCB_NOWAIT case. - */ + /* No completion means synchronous execution */ if (!(ki_flags & IOCB_NOWAIT)) - req->f.iocb.ki_complete = nvmet_file_io_done; + kiocb_set_completion(&req->f.iocb, nvmet_file_io_done_id); ret = nvmet_file_submit_bvec(req, pos, bv_cnt, total_len, ki_flags); diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 809691291e73..d42c8b3bcdb5 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -340,6 +340,8 @@ struct nvmet_req { }; extern struct workqueue_struct *buffered_io_wq; +extern int nvmet_file_io_done_id; +extern void nvmet_file_io_done(struct kiocb *, long, long); static inline void nvmet_set_result(struct nvmet_req *req, u32 result) { diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c index 7143d03f0e02..acae6a159e91 100644 --- a/drivers/target/target_core_file.c +++ b/drivers/target/target_core_file.c @@ -243,6 +243,7 @@ struct target_core_file_cmd { struct kiocb iocb; }; +static int cmd_rw_aio_complete_id; static void cmd_rw_aio_complete(struct kiocb *iocb, long ret, long ret2) { struct target_core_file_cmd *cmd; @@ -296,8 +297,8 @@ fd_execute_rw_aio(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, aio_cmd->len = len; aio_cmd->iocb.ki_pos = cmd->t_task_lba * dev->dev_attrib.block_size; aio_cmd->iocb.ki_filp = file; - aio_cmd->iocb.ki_complete = cmd_rw_aio_complete; aio_cmd->iocb.ki_flags = IOCB_DIRECT; + kiocb_set_completion(&aio_cmd->iocb, cmd_rw_aio_complete_id); if (is_write && (cmd->se_cmd_flags & SCF_FUA)) aio_cmd->iocb.ki_flags |= IOCB_DSYNC; @@ -945,12 +946,27 @@ static const struct target_backend_ops fileio_ops = { static int __init fileio_module_init(void) { - return transport_backend_register(&fileio_ops); + int err; + + err = register_kiocb_completion(cmd_rw_aio_complete); + if (err < 0) + return err; + cmd_rw_aio_complete_id = err; + + err = transport_backend_register(&fileio_ops); + if (err) + goto out_kiocb; + return 0; + +out_kiocb: + unregister_kiocb_completion(cmd_rw_aio_complete_id); + return err; } static void __exit fileio_module_exit(void) { target_backend_unregister(&fileio_ops); + unregister_kiocb_completion(cmd_rw_aio_complete_id); } MODULE_DESCRIPTION("TCM FILEIO subsystem plugin"); diff --git a/fs/aio.c b/fs/aio.c index ca3b123d83f7..135f278fffd9 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -258,27 +258,6 @@ static int aio_init_fs_context(struct fs_context *fc) return 0; } -/* aio_setup - * Creates the slab caches used by the aio routines, panic on - * failure as this is done early during the boot sequence. - */ -static int __init aio_setup(void) -{ - static struct file_system_type aio_fs = { - .name = "aio", - .init_fs_context = aio_init_fs_context, - .kill_sb = kill_anon_super, - }; - aio_mnt = kern_mount(&aio_fs); - if (IS_ERR(aio_mnt)) - panic("Failed to create aio fs mount."); - - kiocb_cachep = KMEM_CACHE(aio_kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC); - kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC); - return 0; -} -__initcall(aio_setup); - static void put_aio_ring_file(struct kioctx *ctx) { struct file *aio_ring_file = ctx->aio_ring_file; @@ -1418,6 +1397,7 @@ static void aio_remove_iocb(struct aio_kiocb *iocb) spin_unlock_irqrestore(&ctx->ctx_lock, flags); } +static int aio_complete_rw_id; static void aio_complete_rw(struct kiocb *kiocb, long res, long res2) { struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw); @@ -1446,10 +1426,10 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) { int ret; - req->ki_complete = aio_complete_rw; req->private = NULL; req->ki_pos = iocb->aio_offset; req->ki_flags = iocb_flags(req->ki_filp); + kiocb_set_completion(req, aio_complete_rw_id); if (iocb->aio_flags & IOCB_FLAG_RESFD) req->ki_flags |= IOCB_EVENTFD; req->ki_hint = ki_hint_validate(file_write_hint(req->ki_filp)); @@ -2276,3 +2256,27 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64, return ret; } #endif + +/* + * Creates the slab caches used by the aio routines, panic on + * failure as this is done early during the boot sequence. + */ +static int __init aio_setup(void) +{ + static struct file_system_type aio_fs = { + .name = "aio", + .init_fs_context = aio_init_fs_context, + .kill_sb = kill_anon_super, + }; + aio_mnt = kern_mount(&aio_fs); + if (IS_ERR(aio_mnt)) + panic("Failed to create aio fs mount."); + + kiocb_cachep = KMEM_CACHE(aio_kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC); + kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC); + aio_complete_rw_id = register_kiocb_completion(aio_complete_rw); + BUG_ON(aio_complete_rw_id < 0); + return 0; +} +__initcall(aio_setup); + diff --git a/fs/cifs/file.c b/fs/cifs/file.c index cbf36a8a23aa..b4cfd262a1a4 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -3115,7 +3115,7 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx) mutex_unlock(&ctx->aio_mutex); - if (ctx->iocb && ctx->iocb->ki_complete) + if (ctx->iocb && !is_sync_kiocb(ctx->iocb)) complete_kiocb(ctx->iocb, ctx->rc, 0); else complete(&ctx->done); @@ -3848,7 +3848,7 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) mutex_unlock(&ctx->aio_mutex); - if (ctx->iocb && ctx->iocb->ki_complete) + if (ctx->iocb && !is_sync_kiocb(ctx->iocb)) complete_kiocb(ctx->iocb, ctx->rc, 0); else complete(&ctx->done); diff --git a/fs/io_uring.c b/fs/io_uring.c index f06915fcb6b6..e0f68aa78596 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2237,6 +2237,7 @@ static void __io_complete_rw(struct io_kiocb *req, long res, long res2, io_complete_rw_common(&req->rw.kiocb, res, cs); } +static int io_complete_rw_id; static void io_complete_rw(struct kiocb *kiocb, long res, long res2) { struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb); @@ -2244,6 +2245,7 @@ static void io_complete_rw(struct kiocb *kiocb, long res, long res2) __io_complete_rw(req, res, res2, NULL); } +static int io_complete_rw_iopoll_id; static void io_complete_rw_iopoll(struct kiocb *kiocb, long res, long res2) { struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb); @@ -2437,13 +2439,13 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe, return -EOPNOTSUPP; kiocb->ki_flags |= IOCB_HIPRI; - kiocb->ki_complete = io_complete_rw_iopoll; + kiocb_set_completion(kiocb, io_complete_rw_iopoll_id); req->iopoll_completed = 0; io_get_req_task(req); } else { if (kiocb->ki_flags & IOCB_HIPRI) return -EINVAL; - kiocb->ki_complete = io_complete_rw; + kiocb_set_completion(kiocb, io_complete_rw_id); } req->rw.addr = READ_ONCE(sqe->addr); @@ -2480,7 +2482,7 @@ static void kiocb_done(struct kiocb *kiocb, ssize_t ret, if (req->flags & REQ_F_CUR_POS) req->file->f_pos = kiocb->ki_pos; - if (ret >= 0 && kiocb->ki_complete == io_complete_rw) + if (ret >= 0 && kiocb_completion_id(kiocb) == io_complete_rw_id) __io_complete_rw(req, ret, 0, cs); else io_rw_done(kiocb, ret); @@ -8596,6 +8598,10 @@ static int __init io_uring_init(void) BUILD_BUG_ON(ARRAY_SIZE(io_op_defs) != IORING_OP_LAST); BUILD_BUG_ON(__REQ_F_LAST_BIT >= 8 * sizeof(int)); req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC); + io_complete_rw_id = register_kiocb_completion(io_complete_rw); + io_complete_rw_iopoll_id = + register_kiocb_completion(io_complete_rw_iopoll); + BUG_ON(io_complete_rw_id < 0 || io_complete_rw_iopoll_id < 0); return 0; }; __initcall(io_uring_init); diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 85979e2214b3..abcd5257ca34 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -2284,7 +2284,7 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb, struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); int full_coherency = !(osb->s_mount_opt & OCFS2_MOUNT_COHERENCY_BUFFERED); - void *saved_ki_complete = NULL; + int saved_ki_complete = 0; int append_write = ((iocb->ki_pos + count) >= i_size_read(inode) ? 1 : 0); int direct_io = iocb->ki_flags & IOCB_DIRECT ? 1 : 0; @@ -2368,7 +2368,8 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb, /* * Make it a sync io if it's an unaligned aio. */ - saved_ki_complete = xchg(&iocb->ki_complete, NULL); + saved_ki_complete = kiocb_completion_id(iocb); + kiocb_set_completion(iocb, 0); } /* communicate with ocfs2_dio_end_io */ @@ -2416,7 +2417,7 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb, out: if (saved_ki_complete) - xchg(&iocb->ki_complete, saved_ki_complete); + kiocb_set_completion(iocb, saved_ki_complete); if (rw_level != -1) ocfs2_rw_unlock(inode, rw_level); diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 78e7439fc4e2..5951c4180bc9 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -278,6 +278,8 @@ static void ovl_aio_rw_complete(struct kiocb *iocb, long res, long res2) complete_kiocb(orig_iocb, res, res2); } +static int ovl_aio_rw_complete_id; + static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter) { struct file *file = iocb->ki_filp; @@ -308,7 +310,7 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter) real.flags = 0; aio_req->orig_iocb = iocb; kiocb_clone(&aio_req->iocb, iocb, real.file); - aio_req->iocb.ki_complete = ovl_aio_rw_complete; + kiocb_set_completion(&aio_req->iocb, ovl_aio_rw_complete_id); ret = vfs_iocb_iter_read(real.file, &aio_req->iocb, iter); if (ret != -EIOCBQUEUED) ovl_aio_cleanup_handler(aio_req); @@ -368,7 +370,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) real.flags = 0; aio_req->orig_iocb = iocb; kiocb_clone(&aio_req->iocb, iocb, real.file); - aio_req->iocb.ki_complete = ovl_aio_rw_complete; + kiocb_set_completion(&aio_req->iocb, ovl_aio_rw_complete_id); ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter); if (ret != -EIOCBQUEUED) ovl_aio_cleanup_handler(aio_req); @@ -792,16 +794,23 @@ const struct file_operations ovl_file_operations = { int __init ovl_aio_request_cache_init(void) { + ovl_aio_rw_complete_id = register_kiocb_completion(ovl_aio_rw_complete); + if (ovl_aio_rw_complete_id < 0) + return ovl_aio_rw_complete_id; + ovl_aio_request_cachep = kmem_cache_create("ovl_aio_req", sizeof(struct ovl_aio_req), 0, SLAB_HWCACHE_ALIGN, NULL); - if (!ovl_aio_request_cachep) + if (!ovl_aio_request_cachep) { + unregister_kiocb_completion(ovl_aio_rw_complete_id); return -ENOMEM; + } return 0; } void ovl_aio_request_cache_destroy(void) { + unregister_kiocb_completion(ovl_aio_rw_complete_id); kmem_cache_destroy(ovl_aio_request_cachep); } diff --git a/fs/read_write.c b/fs/read_write.c index 89151de19f77..0163cefb9bf1 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -363,9 +363,39 @@ SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high, } #endif +#define IOCB_CB_MAX ((1 << _IOCB_COMPLETION_BITS) - 1) + +typedef void ki_cmpl(struct kiocb *, long ret, long ret2); +static ki_cmpl *ki_cmpls[IOCB_CB_MAX]; + +int register_kiocb_completion(ki_cmpl cb) +{ + int i; + + for (i = 0; i < IOCB_CB_MAX; i++) { + if (ki_cmpls[i]) + continue; + ki_cmpls[i] = cb; + return i + 1; + } + + pr_err("Increase _IOCB_COMPLETION_BITS\n"); + return -EBUSY; +} +EXPORT_SYMBOL(register_kiocb_completion); + +void unregister_kiocb_completion(int id) +{ + ki_cmpls[id - 1] = NULL; +} +EXPORT_SYMBOL(unregister_kiocb_completion); + void complete_kiocb(struct kiocb *iocb, long ret, long ret2) { - iocb->ki_complete(iocb, ret, ret2); + unsigned int id = kiocb_completion_id(iocb); + + if (id > 0) + ki_cmpls[id - 1](iocb, ret, ret2); } EXPORT_SYMBOL(complete_kiocb); diff --git a/include/linux/fs.h b/include/linux/fs.h index 846135aa328d..fa6f98714994 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -320,6 +320,9 @@ enum rw_hint { #define IOCB_NOWAIT (1 << 7) /* iocb->ki_waitq is valid */ #define IOCB_WAITQ (1 << 8) +#define _IOCB_COMPLETION_BITS 4 +#define _IOCB_COMPLETION_SHIFT (32 - _IOCB_COMPLETION_BITS) +#define IOCB_COMPLETION_FNS (~0 << _IOCB_COMPLETION_SHIFT) struct kiocb { struct file *ki_filp; @@ -328,9 +331,8 @@ struct kiocb { randomized_struct_fields_start loff_t ki_pos; - void (*ki_complete)(struct kiocb *iocb, long ret, long ret2); void *private; - int ki_flags; + unsigned int ki_flags; u16 ki_hint; u16 ki_ioprio; /* See linux/ioprio.h */ union { @@ -341,12 +343,25 @@ struct kiocb { randomized_struct_fields_end }; +static inline int kiocb_completion_id(struct kiocb *kiocb) +{ + return kiocb->ki_flags >> _IOCB_COMPLETION_SHIFT; +} + +static inline void kiocb_set_completion(struct kiocb *kiocb, int id) +{ + kiocb->ki_flags = (kiocb->ki_flags & (~IOCB_COMPLETION_FNS)) | + (id << _IOCB_COMPLETION_SHIFT); +} + static inline bool is_sync_kiocb(struct kiocb *kiocb) { - return kiocb->ki_complete == NULL; + return kiocb_completion_id(kiocb) == 0; } void complete_kiocb(struct kiocb *kiocb, long ret, long ret2); +int register_kiocb_completion(void (*)(struct kiocb *, long, long)); +void unregister_kiocb_completion(int id); /* * "descriptor" for what we're up to with a read. -- 2.27.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 2/2] fs: Remove kiocb->ki_complete 2020-07-08 22:26 ` [PATCH 2/2] fs: Remove kiocb->ki_complete Matthew Wilcox (Oracle) @ 2020-07-08 22:38 ` Jens Axboe 2020-07-09 3:25 ` Jens Axboe 2020-07-09 5:23 ` kernel test robot 2 siblings, 0 replies; 21+ messages in thread From: Jens Axboe @ 2020-07-08 22:38 UTC (permalink / raw) To: Matthew Wilcox (Oracle) Cc: linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On 7/8/20 4:26 PM, Matthew Wilcox (Oracle) wrote: > @@ -341,12 +343,25 @@ struct kiocb { > randomized_struct_fields_end > }; > > +static inline int kiocb_completion_id(struct kiocb *kiocb) > +{ > + return kiocb->ki_flags >> _IOCB_COMPLETION_SHIFT; > +} > + > +static inline void kiocb_set_completion(struct kiocb *kiocb, int id) > +{ > + kiocb->ki_flags = (kiocb->ki_flags & (~IOCB_COMPLETION_FNS)) | > + (id << _IOCB_COMPLETION_SHIFT); > +} > + > static inline bool is_sync_kiocb(struct kiocb *kiocb) > { > - return kiocb->ki_complete == NULL; > + return kiocb_completion_id(kiocb) == 0; > } > > void complete_kiocb(struct kiocb *kiocb, long ret, long ret2); > +int register_kiocb_completion(void (*)(struct kiocb *, long, long)); > +void unregister_kiocb_completion(int id); Same here, you seem to mix and match whether you prefix with kiocb or not. Why not make them all kiocb_*? kiocb_register_completion() and so forth. -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/2] fs: Remove kiocb->ki_complete 2020-07-08 22:26 ` [PATCH 2/2] fs: Remove kiocb->ki_complete Matthew Wilcox (Oracle) 2020-07-08 22:38 ` Jens Axboe @ 2020-07-09 3:25 ` Jens Axboe 2020-07-09 5:23 ` kernel test robot 2 siblings, 0 replies; 21+ messages in thread From: Jens Axboe @ 2020-07-09 3:25 UTC (permalink / raw) To: Matthew Wilcox (Oracle) Cc: linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On 7/8/20 4:26 PM, Matthew Wilcox (Oracle) wrote: > +void unregister_kiocb_completion(int id) > +{ > + ki_cmpls[id - 1] = NULL; > +} > +EXPORT_SYMBOL(unregister_kiocb_completion); This should have a limit check (<= 0 || > max). > void complete_kiocb(struct kiocb *iocb, long ret, long ret2) > { > - iocb->ki_complete(iocb, ret, ret2); > + unsigned int id = kiocb_completion_id(iocb); > + > + if (id > 0) > + ki_cmpls[id - 1](iocb, ret, ret2); > } I'd make id == 0 be a dummy funciton to avoid this branch. -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/2] fs: Remove kiocb->ki_complete 2020-07-08 22:26 ` [PATCH 2/2] fs: Remove kiocb->ki_complete Matthew Wilcox (Oracle) 2020-07-08 22:38 ` Jens Axboe 2020-07-09 3:25 ` Jens Axboe @ 2020-07-09 5:23 ` kernel test robot 2 siblings, 0 replies; 21+ messages in thread From: kernel test robot @ 2020-07-09 5:23 UTC (permalink / raw) To: Matthew Wilcox (Oracle), Jens Axboe Cc: kbuild-all, clang-built-linux, Matthew Wilcox (Oracle), linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez [-- Attachment #1: Type: text/plain, Size: 2510 bytes --] Hi "Matthew, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on block/for-next] [cannot apply to balbi-usb/testing/next cifs/for-next miklos-vfs/overlayfs-next linus/master v5.8-rc4 next-20200708] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Matthew-Wilcox-Oracle/Remove-kiocb-ki_complete/20200709-062758 base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next config: s390-randconfig-r013-20200708 (attached as .config) compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 02946de3802d3bc65bc9f2eb9b8d4969b5a7add8) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install s390 cross compiling tool for clang build # apt-get install binutils-s390x-linux-gnu # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <[email protected]> All warnings (new ones prefixed by >>): In file included from arch/s390/purgatory/purgatory.c:10: In file included from include/linux/kexec.h:18: In file included from include/linux/crash_core.h:6: In file included from include/linux/elfcore.h:9: In file included from arch/s390/include/asm/elf.h:132: In file included from include/linux/compat.h:17: >> include/linux/fs.h:353:41: warning: shifting a negative signed value is undefined [-Wshift-negative-value] kiocb->ki_flags = (kiocb->ki_flags & (~IOCB_COMPLETION_FNS)) | ^~~~~~~~~~~~~~~~~~~ include/linux/fs.h:325:33: note: expanded from macro 'IOCB_COMPLETION_FNS' #define IOCB_COMPLETION_FNS (~0 << _IOCB_COMPLETION_SHIFT) ~~ ^ 1 warning generated. vim +353 include/linux/fs.h 350 351 static inline void kiocb_set_completion(struct kiocb *kiocb, int id) 352 { > 353 kiocb->ki_flags = (kiocb->ki_flags & (~IOCB_COMPLETION_FNS)) | 354 (id << _IOCB_COMPLETION_SHIFT); 355 } 356 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/[email protected] [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 29080 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/2] Remove kiocb ki_complete 2020-07-08 22:26 [PATCH 0/2] Remove kiocb ki_complete Matthew Wilcox (Oracle) 2020-07-08 22:26 ` [PATCH 1/2] fs: Abstract calling the kiocb completion function Matthew Wilcox (Oracle) 2020-07-08 22:26 ` [PATCH 2/2] fs: Remove kiocb->ki_complete Matthew Wilcox (Oracle) @ 2020-07-08 22:33 ` Jens Axboe 2020-07-09 10:17 ` Christoph Hellwig 3 siblings, 0 replies; 21+ messages in thread From: Jens Axboe @ 2020-07-08 22:33 UTC (permalink / raw) To: Matthew Wilcox (Oracle) Cc: linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On 7/8/20 4:26 PM, Matthew Wilcox (Oracle) wrote: > Save a pointer in the kiocb by using a few bits in ki_flags to index > a table of completion functions. I ran polled and regular IO testing through io_uring, which exercises both completions that we have in there, and it works just fine for me. Didn't test anything beyond that, outside of ensuring the kernel also booted ;-) -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/2] Remove kiocb ki_complete 2020-07-08 22:26 [PATCH 0/2] Remove kiocb ki_complete Matthew Wilcox (Oracle) ` (2 preceding siblings ...) 2020-07-08 22:33 ` [PATCH 0/2] Remove kiocb ki_complete Jens Axboe @ 2020-07-09 10:17 ` Christoph Hellwig 2020-07-09 11:10 ` Matthew Wilcox 3 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2020-07-09 10:17 UTC (permalink / raw) To: Matthew Wilcox (Oracle) Cc: Jens Axboe, linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez I really don't like this series at all. If saves a single pointer but introduces a complicated machinery that just doesn't follow any natural flow. And there doesn't seem to be any good reason for it to start with. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/2] Remove kiocb ki_complete 2020-07-09 10:17 ` Christoph Hellwig @ 2020-07-09 11:10 ` Matthew Wilcox 2020-07-09 13:26 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: Matthew Wilcox @ 2020-07-09 11:10 UTC (permalink / raw) To: Christoph Hellwig Cc: Jens Axboe, linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: > I really don't like this series at all. If saves a single pointer > but introduces a complicated machinery that just doesn't follow any > natural flow. And there doesn't seem to be any good reason for it to > start with. Jens doesn't want the kiocb to grow beyond a single cacheline, and we want the ability to set the loff_t in userspace for an appending write, so the plan was to replace the ki_complete member in kiocb with an loff_t __user *ki_posp. I don't think it's worth worrying about growing kiocb, personally, but this seemed like the easiest way to make room for a new pointer. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/2] Remove kiocb ki_complete 2020-07-09 11:10 ` Matthew Wilcox @ 2020-07-09 13:26 ` Christoph Hellwig 2020-07-09 13:32 ` Matthew Wilcox ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: Christoph Hellwig @ 2020-07-09 13:26 UTC (permalink / raw) To: Matthew Wilcox Cc: Christoph Hellwig, Jens Axboe, linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: > On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: > > I really don't like this series at all. If saves a single pointer > > but introduces a complicated machinery that just doesn't follow any > > natural flow. And there doesn't seem to be any good reason for it to > > start with. > > Jens doesn't want the kiocb to grow beyond a single cacheline, and we > want the ability to set the loff_t in userspace for an appending write, > so the plan was to replace the ki_complete member in kiocb with an > loff_t __user *ki_posp. > > I don't think it's worth worrying about growing kiocb, personally, > but this seemed like the easiest way to make room for a new pointer. The user offset pointer has absolutely no business in the the kiocb itself - it is a io_uring concept which needs to go into the io_kiocb, which has 14 bytes left in the last cache line in my build. It would fit in very well there right next to the result and user pointer. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/2] Remove kiocb ki_complete 2020-07-09 13:26 ` Christoph Hellwig @ 2020-07-09 13:32 ` Matthew Wilcox 2020-07-09 13:53 ` Jens Axboe 2020-07-09 13:37 ` Pavel Begunkov 2020-07-09 13:55 ` Jens Axboe 2 siblings, 1 reply; 21+ messages in thread From: Matthew Wilcox @ 2020-07-09 13:32 UTC (permalink / raw) To: Christoph Hellwig Cc: Jens Axboe, linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On Thu, Jul 09, 2020 at 02:26:11PM +0100, Christoph Hellwig wrote: > On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: > > On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: > > > I really don't like this series at all. If saves a single pointer > > > but introduces a complicated machinery that just doesn't follow any > > > natural flow. And there doesn't seem to be any good reason for it to > > > start with. > > > > Jens doesn't want the kiocb to grow beyond a single cacheline, and we > > want the ability to set the loff_t in userspace for an appending write, > > so the plan was to replace the ki_complete member in kiocb with an > > loff_t __user *ki_posp. > > > > I don't think it's worth worrying about growing kiocb, personally, > > but this seemed like the easiest way to make room for a new pointer. > > The user offset pointer has absolutely no business in the the kiocb > itself - it is a io_uring concept which needs to go into the io_kiocb, > which has 14 bytes left in the last cache line in my build. It would > fit in very well there right next to the result and user pointer. I agree. Jens doesn't. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/2] Remove kiocb ki_complete 2020-07-09 13:32 ` Matthew Wilcox @ 2020-07-09 13:53 ` Jens Axboe 0 siblings, 0 replies; 21+ messages in thread From: Jens Axboe @ 2020-07-09 13:53 UTC (permalink / raw) To: Matthew Wilcox, Christoph Hellwig Cc: linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On 7/9/20 7:32 AM, Matthew Wilcox wrote: > On Thu, Jul 09, 2020 at 02:26:11PM +0100, Christoph Hellwig wrote: >> On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: >>> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: >>>> I really don't like this series at all. If saves a single pointer >>>> but introduces a complicated machinery that just doesn't follow any >>>> natural flow. And there doesn't seem to be any good reason for it to >>>> start with. >>> >>> Jens doesn't want the kiocb to grow beyond a single cacheline, and we >>> want the ability to set the loff_t in userspace for an appending write, >>> so the plan was to replace the ki_complete member in kiocb with an >>> loff_t __user *ki_posp. >>> >>> I don't think it's worth worrying about growing kiocb, personally, >>> but this seemed like the easiest way to make room for a new pointer. >> >> The user offset pointer has absolutely no business in the the kiocb >> itself - it is a io_uring concept which needs to go into the io_kiocb, >> which has 14 bytes left in the last cache line in my build. It would >> fit in very well there right next to the result and user pointer. > > I agree. Jens doesn't. Stop putting words in my mouth, especially when they are totally untrue. I was opposed to growing struct io_rw in io_uring, which is where the extra append variable belonds, beyond a cacheline. You mentioned you could probably shave some bits out of struct kiocb, which is how this completion handling business came about. If kiocb was shrunk, then io_rw has room for the needed variable. At no point have I said that whatever we need to shove in there for io_uring should be in the kiocb, that would not make any sense. I'm just opposed to growing the per-op data field in io_kiocb beyond a cacheline. And that's especially true for something like append writes, which I don't consider super interesting. -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/2] Remove kiocb ki_complete 2020-07-09 13:26 ` Christoph Hellwig 2020-07-09 13:32 ` Matthew Wilcox @ 2020-07-09 13:37 ` Pavel Begunkov 2020-07-09 13:43 ` Matthew Wilcox 2020-07-09 13:55 ` Jens Axboe 2 siblings, 1 reply; 21+ messages in thread From: Pavel Begunkov @ 2020-07-09 13:37 UTC (permalink / raw) To: Christoph Hellwig, Matthew Wilcox Cc: Jens Axboe, linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On 09/07/2020 16:26, Christoph Hellwig wrote: > On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: >> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: >>> I really don't like this series at all. If saves a single pointer >>> but introduces a complicated machinery that just doesn't follow any >>> natural flow. And there doesn't seem to be any good reason for it to >>> start with. >> >> Jens doesn't want the kiocb to grow beyond a single cacheline, and we >> want the ability to set the loff_t in userspace for an appending write, >> so the plan was to replace the ki_complete member in kiocb with an >> loff_t __user *ki_posp. >> >> I don't think it's worth worrying about growing kiocb, personally, >> but this seemed like the easiest way to make room for a new pointer. > > The user offset pointer has absolutely no business in the the kiocb > itself - it is a io_uring concept which needs to go into the io_kiocb, > which has 14 bytes left in the last cache line in my build. It would > fit in very well there right next to the result and user pointer. After getting a valid offset, io_uring shouldn't do anything but complete the request. And as io_kiocb implicitly contains a CQE entry, not sure we need @append_offset in the first place. Kanchan, could you take a look if you can hide it in req->cflags? -- Pavel Begunkov ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/2] Remove kiocb ki_complete 2020-07-09 13:37 ` Pavel Begunkov @ 2020-07-09 13:43 ` Matthew Wilcox 2020-07-09 13:49 ` Pavel Begunkov 0 siblings, 1 reply; 21+ messages in thread From: Matthew Wilcox @ 2020-07-09 13:43 UTC (permalink / raw) To: Pavel Begunkov Cc: Christoph Hellwig, Jens Axboe, linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On Thu, Jul 09, 2020 at 04:37:59PM +0300, Pavel Begunkov wrote: > On 09/07/2020 16:26, Christoph Hellwig wrote: > > On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: > >> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: > >>> I really don't like this series at all. If saves a single pointer > >>> but introduces a complicated machinery that just doesn't follow any > >>> natural flow. And there doesn't seem to be any good reason for it to > >>> start with. > >> > >> Jens doesn't want the kiocb to grow beyond a single cacheline, and we > >> want the ability to set the loff_t in userspace for an appending write, > >> so the plan was to replace the ki_complete member in kiocb with an > >> loff_t __user *ki_posp. > >> > >> I don't think it's worth worrying about growing kiocb, personally, > >> but this seemed like the easiest way to make room for a new pointer. > > > > The user offset pointer has absolutely no business in the the kiocb > > itself - it is a io_uring concept which needs to go into the io_kiocb, > > which has 14 bytes left in the last cache line in my build. It would > > fit in very well there right next to the result and user pointer. > > After getting a valid offset, io_uring shouldn't do anything but > complete the request. And as io_kiocb implicitly contains a CQE entry, > not sure we need @append_offset in the first place. > > Kanchan, could you take a look if you can hide it in req->cflags? No, that's not what cflags are for. And besides, there's only 32 bits there. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/2] Remove kiocb ki_complete 2020-07-09 13:43 ` Matthew Wilcox @ 2020-07-09 13:49 ` Pavel Begunkov 2020-07-09 13:53 ` Matthew Wilcox 0 siblings, 1 reply; 21+ messages in thread From: Pavel Begunkov @ 2020-07-09 13:49 UTC (permalink / raw) To: Matthew Wilcox Cc: Christoph Hellwig, Jens Axboe, linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On 09/07/2020 16:43, Matthew Wilcox wrote: > On Thu, Jul 09, 2020 at 04:37:59PM +0300, Pavel Begunkov wrote: >> On 09/07/2020 16:26, Christoph Hellwig wrote: >>> On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: >>>> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: >>>>> I really don't like this series at all. If saves a single pointer >>>>> but introduces a complicated machinery that just doesn't follow any >>>>> natural flow. And there doesn't seem to be any good reason for it to >>>>> start with. >>>> >>>> Jens doesn't want the kiocb to grow beyond a single cacheline, and we >>>> want the ability to set the loff_t in userspace for an appending write, >>>> so the plan was to replace the ki_complete member in kiocb with an >>>> loff_t __user *ki_posp. >>>> >>>> I don't think it's worth worrying about growing kiocb, personally, >>>> but this seemed like the easiest way to make room for a new pointer. >>> >>> The user offset pointer has absolutely no business in the the kiocb >>> itself - it is a io_uring concept which needs to go into the io_kiocb, >>> which has 14 bytes left in the last cache line in my build. It would >>> fit in very well there right next to the result and user pointer. >> >> After getting a valid offset, io_uring shouldn't do anything but >> complete the request. And as io_kiocb implicitly contains a CQE entry, >> not sure we need @append_offset in the first place. >> >> Kanchan, could you take a look if you can hide it in req->cflags? > > No, that's not what cflags are for. And besides, there's only 32 bits > there. It's there to temporarily store cqe->cflags, if a request can't completed right away. And req->{result,user_data,cflags} are basically an CQE inside io_kiocb. So, it is there exactly for that reason, and whatever way it's going to be encoded in an CQE, io_kiocb can fit it. That was my point. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/2] Remove kiocb ki_complete 2020-07-09 13:49 ` Pavel Begunkov @ 2020-07-09 13:53 ` Matthew Wilcox 2020-07-09 13:59 ` Pavel Begunkov 0 siblings, 1 reply; 21+ messages in thread From: Matthew Wilcox @ 2020-07-09 13:53 UTC (permalink / raw) To: Pavel Begunkov Cc: Christoph Hellwig, Jens Axboe, linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On Thu, Jul 09, 2020 at 04:49:51PM +0300, Pavel Begunkov wrote: > On 09/07/2020 16:43, Matthew Wilcox wrote: > > On Thu, Jul 09, 2020 at 04:37:59PM +0300, Pavel Begunkov wrote: > >> Kanchan, could you take a look if you can hide it in req->cflags? > > > > No, that's not what cflags are for. And besides, there's only 32 bits > > there. > > It's there to temporarily store cqe->cflags, if a request can't completed > right away. And req->{result,user_data,cflags} are basically an CQE inside > io_kiocb. > > So, it is there exactly for that reason, and whatever way it's going to be > encoded in an CQE, io_kiocb can fit it. That was my point. But it's not going to be encoded in the CQE. Perhaps you should go back to the older thread and read the arguments there. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/2] Remove kiocb ki_complete 2020-07-09 13:53 ` Matthew Wilcox @ 2020-07-09 13:59 ` Pavel Begunkov 0 siblings, 0 replies; 21+ messages in thread From: Pavel Begunkov @ 2020-07-09 13:59 UTC (permalink / raw) To: Matthew Wilcox Cc: Christoph Hellwig, Jens Axboe, linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On 09/07/2020 16:53, Matthew Wilcox wrote: > On Thu, Jul 09, 2020 at 04:49:51PM +0300, Pavel Begunkov wrote: >> On 09/07/2020 16:43, Matthew Wilcox wrote: >>> On Thu, Jul 09, 2020 at 04:37:59PM +0300, Pavel Begunkov wrote: >>>> Kanchan, could you take a look if you can hide it in req->cflags? >>> >>> No, that's not what cflags are for. And besides, there's only 32 bits >>> there. >> >> It's there to temporarily store cqe->cflags, if a request can't completed >> right away. And req->{result,user_data,cflags} are basically an CQE inside >> io_kiocb. >> >> So, it is there exactly for that reason, and whatever way it's going to be >> encoded in an CQE, io_kiocb can fit it. That was my point. > > But it's not going to be encoded in the CQE. Perhaps you should go back to > the older thread and read the arguments there. Ok, if the thread is stopped on the version with indirection. I just looked the last sent patch. If so, we can also store offset before adding an CQE and getting by with local vars only. Just a thought. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/2] Remove kiocb ki_complete 2020-07-09 13:26 ` Christoph Hellwig 2020-07-09 13:32 ` Matthew Wilcox 2020-07-09 13:37 ` Pavel Begunkov @ 2020-07-09 13:55 ` Jens Axboe 2 siblings, 0 replies; 21+ messages in thread From: Jens Axboe @ 2020-07-09 13:55 UTC (permalink / raw) To: Christoph Hellwig, Matthew Wilcox Cc: linux-block, linux-fsdevel, linux-aio, io-uring, linux-kernel, Kanchan Joshi, Javier Gonzalez On 7/9/20 7:26 AM, Christoph Hellwig wrote: > On Thu, Jul 09, 2020 at 12:10:36PM +0100, Matthew Wilcox wrote: >> On Thu, Jul 09, 2020 at 11:17:05AM +0100, Christoph Hellwig wrote: >>> I really don't like this series at all. If saves a single pointer >>> but introduces a complicated machinery that just doesn't follow any >>> natural flow. And there doesn't seem to be any good reason for it to >>> start with. >> >> Jens doesn't want the kiocb to grow beyond a single cacheline, and we >> want the ability to set the loff_t in userspace for an appending write, >> so the plan was to replace the ki_complete member in kiocb with an >> loff_t __user *ki_posp. >> >> I don't think it's worth worrying about growing kiocb, personally, >> but this seemed like the easiest way to make room for a new pointer. > > The user offset pointer has absolutely no business in the the kiocb > itself - it is a io_uring concept which needs to go into the io_kiocb, Nobody disagrees on that. > which has 14 bytes left in the last cache line in my build. It would > fit in very well there right next to the result and user pointer. Per-op data should not spill into the io_kiocb itself. And I absolutely hate arguments like "oh there's still 14 bytes in there", because then there's 6, then there's none, and now we're going into the next cacheline. io_kiocb is already too fat, it should be getting slimmer, not bigger. And the append write stuff is not nearly interesting enough to a) grow io_kiocb, b) warrant a special case for op private data in the io_kiocb itself. -- Jens Axboe ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2020-07-09 14:01 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-07-08 22:26 [PATCH 0/2] Remove kiocb ki_complete Matthew Wilcox (Oracle) 2020-07-08 22:26 ` [PATCH 1/2] fs: Abstract calling the kiocb completion function Matthew Wilcox (Oracle) 2020-07-08 22:37 ` Jens Axboe 2020-07-08 22:40 ` Matthew Wilcox 2020-07-08 22:50 ` Jens Axboe 2020-07-08 22:26 ` [PATCH 2/2] fs: Remove kiocb->ki_complete Matthew Wilcox (Oracle) 2020-07-08 22:38 ` Jens Axboe 2020-07-09 3:25 ` Jens Axboe 2020-07-09 5:23 ` kernel test robot 2020-07-08 22:33 ` [PATCH 0/2] Remove kiocb ki_complete Jens Axboe 2020-07-09 10:17 ` Christoph Hellwig 2020-07-09 11:10 ` Matthew Wilcox 2020-07-09 13:26 ` Christoph Hellwig 2020-07-09 13:32 ` Matthew Wilcox 2020-07-09 13:53 ` Jens Axboe 2020-07-09 13:37 ` Pavel Begunkov 2020-07-09 13:43 ` Matthew Wilcox 2020-07-09 13:49 ` Pavel Begunkov 2020-07-09 13:53 ` Matthew Wilcox 2020-07-09 13:59 ` Pavel Begunkov 2020-07-09 13:55 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox