From: Stefan Metzmacher <[email protected]>
To: Jens Axboe <[email protected]>
Cc: Samba Technical <[email protected]>,
Linus Torvalds <[email protected]>,
io-uring <[email protected]>
Subject: Re: Problems replacing epoll with io_uring in tevent
Date: Wed, 28 Dec 2022 17:19:52 +0100 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
Hi Jens,
any change to get some feedback on these?
https://lore.kernel.org/io-uring/[email protected]/
and
https://lore.kernel.org/io-uring/[email protected]/
Thanks in advance!
metze
Am 27.10.22 um 21:25 schrieb Stefan Metzmacher:
> Hi Jens,
>
>>>> I'm currently trying to prototype for an IORING_POLL_CANCEL_ON_CLOSE
>>>> flag that can be passed to POLL_ADD. With that we'll register
>>>> the request in &req->file->f_uring_poll (similar to the file->f_ep list for epoll)
>>>> Then we only get a real reference to the file during the call to
>>>> vfs_poll() otherwise we drop the fget/fput reference and rely on
>>>> an io_uring_poll_release_file() (similar to eventpoll_release_file())
>>>> to cancel our registered poll request.
>>>
>>> Yes, this is a bit tricky as we hold the file ref across the operation. I'd
>>> be interested in seeing your approach to this, and also how it would
>>> interact with registered files...
>>
>> Here's my current patch:
>> https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=b9cccfac515739fc279c6eec87ce655a96f94685
>> It compiles, but I haven't tested it yet. And I'm not sure if the locking is done correctly...
>
> It doesn't deadlock nor blow up immediately :-)
> And it does fix the problem I had.
>
> So what do you think about that patch?
> Am I doing stupid things there?
>
> These points might be changed:
> - returning -EBADF instead of -ECANCELED
> might be better and allow the caller to avoid
> retrying.
> - I guess we could use a single linked list, but
> I'm mostly used to how struct list_head works.
> And I want something that works first.
> - We may find a better name than IORING_POLL_CANCEL_ON_CLOSE
> - struct io_poll is completely full, as well as io_kiocb->flags
> (maybe we should move flags to 64 bit?),
> so we need to use some other generic struct io_kiocb space,
> which might also be good in order make it possible to keep io_poll_add()
> and io_arm_poll_handler() in common.
> But we may have the new field a bit differently. Note that
> struct io_kiocb (without this patch) still has 32 free bytes before
> 4 64 byte cachelines are filled. With my patch 24 bytes are left...
> - In struct file it might be possible to share a reference list with
> with the epoll code, where each element can indicate it epoll
> or io_uring is used.
>
> I'm pasting it below in order to make it easier to get comments...
>
> metze
>
> fs/file_table.c | 3 ++
> include/linux/fs.h | 1 +
> include/linux/io_uring.h | 12 +++++
> include/linux/io_uring_types.h | 4 ++
> include/uapi/linux/io_uring.h | 1 +
> io_uring/opdef.c | 1 +
> io_uring/poll.c | 100 ++++++++++++++++++++++++++++++++++++++++-
> io_uring/poll.h | 1 +
> 8 files changed, 122 insertions(+), 1 deletion(-)
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index dd88701e54a9..cad408e9c0f5 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -16,6 +16,7 @@
> #include <linux/security.h>
> #include <linux/cred.h>
> #include <linux/eventpoll.h>
> +#include <linux/io_uring.h>
> #include <linux/rcupdate.h>
> #include <linux/mount.h>
> #include <linux/capability.h>
> @@ -147,6 +148,7 @@ static struct file *__alloc_file(int flags, const struct cred *cred)
> }
>
> atomic_long_set(&f->f_count, 1);
> + INIT_LIST_HEAD(&f->f_uring_poll);
> rwlock_init(&f->f_owner.lock);
> spin_lock_init(&f->f_lock);
> mutex_init(&f->f_pos_lock);
> @@ -309,6 +311,7 @@ static void __fput(struct file *file)
> * in the file cleanup chain.
> */
> eventpoll_release(file);
> + io_uring_poll_release(file);
> locks_remove_file(file);
>
> ima_file_free(file);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e654435f1651..7f99efa7a1dc 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -972,6 +972,7 @@ struct file {
> /* Used by fs/eventpoll.c to link all the hooks to this file */
> struct hlist_head *f_ep;
> #endif /* #ifdef CONFIG_EPOLL */
> + struct list_head f_uring_poll;
> struct address_space *f_mapping;
> errseq_t f_wb_err;
> errseq_t f_sb_err; /* for syncfs */
> diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
> index 43bc8a2edccf..c931ea92c29a 100644
> --- a/include/linux/io_uring.h
> +++ b/include/linux/io_uring.h
> @@ -61,6 +61,15 @@ static inline void io_uring_free(struct task_struct *tsk)
> if (tsk->io_uring)
> __io_uring_free(tsk);
> }
> +
> +void io_uring_poll_release_file(struct file *file);
> +static inline void io_uring_poll_release(struct file *file)
> +{
> + if (likely(list_empty_careful(&file->f_uring_poll)))
> + return;
> +
> + io_uring_poll_release_file(file);
> +}
> #else
> static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
> struct iov_iter *iter, void *ioucmd)
> @@ -92,6 +101,9 @@ static inline const char *io_uring_get_opcode(u8 opcode)
> {
> return "";
> }
> +static inline void io_uring_poll_release(struct file *file)
> +{
> +}
> #endif
>
> #endif
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index f5b687a787a3..2373e01c57e7 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -547,8 +547,12 @@ struct io_kiocb {
> union {
> /* used by request caches, completion batching and iopoll */
> struct io_wq_work_node comp_list;
> + struct {
> /* cache ->apoll->events */
> __poll_t apoll_events;
> + u8 poll_cancel_on_close:1;
> + struct list_head f_uring_poll_entry;
> + };
> };
> atomic_t refs;
> atomic_t poll_refs;
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index a2ce8ba7abb5..fe311667cb8c 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -276,6 +276,7 @@ enum io_uring_op {
> #define IORING_POLL_UPDATE_EVENTS (1U << 1)
> #define IORING_POLL_UPDATE_USER_DATA (1U << 2)
> #define IORING_POLL_ADD_LEVEL (1U << 3)
> +#define IORING_POLL_CANCEL_ON_CLOSE (1U << 4)
>
> /*
> * ASYNC_CANCEL flags.
> diff --git a/io_uring/opdef.c b/io_uring/opdef.c
> index 34b08c87ffa5..540ee55961a3 100644
> --- a/io_uring/opdef.c
> +++ b/io_uring/opdef.c
> @@ -131,6 +131,7 @@ const struct io_op_def io_op_defs[] = {
> .name = "POLL_ADD",
> .prep = io_poll_add_prep,
> .issue = io_poll_add,
> + .cleanup = io_poll_cleanup,
> },
> [IORING_OP_POLL_REMOVE] = {
> .audit_skip = 1,
> diff --git a/io_uring/poll.c b/io_uring/poll.c
> index 0d9f49c575e0..d4ccf2f2e815 100644
> --- a/io_uring/poll.c
> +++ b/io_uring/poll.c
> @@ -163,6 +163,19 @@ static inline void io_poll_remove_entry(struct io_poll *poll)
>
> static void io_poll_remove_entries(struct io_kiocb *req)
> {
> + if (!list_empty_careful(&req->f_uring_poll_entry)) {
> + spin_lock(&req->file->f_lock);
> + list_del_init_careful(&req->f_uring_poll_entry);
> + /*
> + * upgrade to a full reference again,
> + * it will be released in the common
> + * cleanup code via io_put_file().
> + */
> + if (!(req->flags & REQ_F_FIXED_FILE))
> + WARN_ON_ONCE(!get_file_rcu(req->file));
> + spin_unlock(&req->file->f_lock);
> + }
> +
> /*
> * Nothing to do if neither of those flags are set. Avoid dipping
> * into the poll/apoll/double cachelines if we can.
> @@ -199,6 +212,54 @@ enum {
> IOU_POLL_REMOVE_POLL_USE_RES = 2,
> };
>
> +static inline struct file *io_poll_get_additional_file_ref(struct io_kiocb *req,
> + unsigned issue_flags)
> +{
> + if (!(req->poll_cancel_on_close))
> + return NULL;
> +
> + if (unlikely(!req->file))
> + return NULL;
> +
> + req->flags |= REQ_F_NEED_CLEANUP;
> +
> + if (list_empty_careful(&req->f_uring_poll_entry)) {
> + /*
> + * This first time we need to add ourself to the
> + * file->f_uring_poll.
> + */
> + spin_lock(&req->file->f_lock);
> + list_add_tail(&req->f_uring_poll_entry, &req->file->f_uring_poll);
> + spin_unlock(&req->file->f_lock);
> + if (!(req->flags & REQ_F_FIXED_FILE)) {
> + /*
> + * If it's not a fixed file,
> + * we can allow the caller to drop the existing
> + * reference.
> + */
> + return req->file;
> + }
> + /*
> + * For fixed files we grab an additional reference
> + */
> + }
> +
> + io_ring_submit_lock(req->ctx, issue_flags);
> + if (unlikely(!req->file)) {
> + io_ring_submit_unlock(req->ctx, issue_flags);
> + return NULL;
> + }
> + rcu_read_lock();
> + if (unlikely(!get_file_rcu(req->file))) {
> + req->file = NULL;
> + req->cqe.fd = -1;
> + io_poll_mark_cancelled(req);
> + }
> + rcu_read_unlock();
> + io_ring_submit_unlock(req->ctx, issue_flags);
> + return req->file;
> +}
> +
> /*
> * All poll tw should go through this. Checks for poll events, manages
> * references, does rewait, etc.
> @@ -230,7 +291,12 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
> /* the mask was stashed in __io_poll_execute */
> if (!req->cqe.res) {
> struct poll_table_struct pt = { ._key = req->apoll_events };
> + unsigned issue_flags = (!*locked) ? IO_URING_F_UNLOCKED : 0;
> + struct file *file_to_put = io_poll_get_additional_file_ref(req, issue_flags);
> + if (unlikely(!req->file))
> + return -ECANCELED;
> req->cqe.res = vfs_poll(req->file, &pt) & req->apoll_events;
> + io_put_file(file_to_put);
> }
>
> if ((unlikely(!req->cqe.res)))
> @@ -499,6 +565,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
> unsigned issue_flags)
> {
> struct io_ring_ctx *ctx = req->ctx;
> + struct file *file_to_put;
> int v;
>
> INIT_HLIST_NODE(&req->hash_node);
> @@ -506,6 +573,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
> io_init_poll_iocb(poll, mask, io_poll_wake);
> poll->file = req->file;
> req->apoll_events = poll->events;
> + INIT_LIST_HEAD(&req->f_uring_poll_entry);
>
> ipt->pt._key = mask;
> ipt->req = req;
> @@ -529,7 +597,11 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
> if (issue_flags & IO_URING_F_UNLOCKED)
> req->flags &= ~REQ_F_HASH_LOCKED;
>
> + file_to_put = io_poll_get_additional_file_ref(req, issue_flags);
> + if (unlikely(!req->file))
> + return -ECANCELED;
> mask = vfs_poll(req->file, &ipt->pt) & poll->events;
> + io_put_file(file_to_put);
>
> if (unlikely(ipt->error || !ipt->nr_entries)) {
> io_poll_remove_entries(req);
> @@ -857,11 +929,17 @@ int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> if (sqe->buf_index || sqe->off || sqe->addr)
> return -EINVAL;
> flags = READ_ONCE(sqe->len);
> - if (flags & ~IORING_POLL_ADD_MULTI)
> + if (flags & ~(IORING_POLL_ADD_MULTI|IORING_POLL_CANCEL_ON_CLOSE))
> return -EINVAL;
> if ((flags & IORING_POLL_ADD_MULTI) && (req->flags & REQ_F_CQE_SKIP))
> return -EINVAL;
>
> + if (flags & IORING_POLL_CANCEL_ON_CLOSE) {
> + req->poll_cancel_on_close = 1;
> + } else {
> + req->poll_cancel_on_close = 0;
> + }
> +
> poll->events = io_poll_parse_events(sqe, flags);
> return 0;
> }
> @@ -963,3 +1041,23 @@ void io_apoll_cache_free(struct io_cache_entry *entry)
> {
> kfree(container_of(entry, struct async_poll, cache));
> }
> +
> +void io_uring_poll_release_file(struct file *file)
> +{
> + struct io_kiocb *req, *next;
> +
> + list_for_each_entry_safe(req, next, &file->f_uring_poll, f_uring_poll_entry) {
> + io_ring_submit_lock(req->ctx, IO_URING_F_UNLOCKED);
> + io_poll_mark_cancelled(req);
> + list_del_init_careful(&req->f_uring_poll_entry);
> + io_poll_remove_entries(req);
> + req->file = NULL;
> + io_poll_execute(req, 0);
> + io_ring_submit_unlock(req->ctx, IO_URING_F_UNLOCKED);
> + }
> +}
> +
> +void io_poll_cleanup(struct io_kiocb *req)
> +{
> + io_poll_remove_entries(req);
> +}
> diff --git a/io_uring/poll.h b/io_uring/poll.h
> index ef25c26fdaf8..43e6b877f1bc 100644
> --- a/io_uring/poll.h
> +++ b/io_uring/poll.h
> @@ -27,6 +27,7 @@ struct async_poll {
>
> int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
> int io_poll_add(struct io_kiocb *req, unsigned int issue_flags);
> +void io_poll_cleanup(struct io_kiocb *req);
>
> int io_poll_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
> int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags);
>
next prev parent reply other threads:[~2022-12-28 16:22 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-18 14:42 Problems replacing epoll with io_uring in tevent Stefan Metzmacher
2022-10-26 16:00 ` Stefan Metzmacher
2022-10-26 17:08 ` Jens Axboe
2022-10-26 17:41 ` Pavel Begunkov
2022-10-27 8:18 ` Stefan Metzmacher
2022-10-27 8:05 ` Stefan Metzmacher
2022-10-27 19:25 ` Stefan Metzmacher
2022-12-28 16:19 ` Stefan Metzmacher [this message]
2023-01-18 15:56 ` Jens Axboe
2023-02-01 20:29 ` Stefan Metzmacher
2022-10-27 8:51 ` Stefan Metzmacher
2022-10-27 12:12 ` Jens Axboe
2022-10-27 18:35 ` Stefan Metzmacher
2022-10-27 19:54 ` Stefan Metzmacher
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox