* [PATCHSET next 0/6] Misc cleanups / optimizations
From: Jens Axboe @ 2024-02-06 16:22 UTC
To: io-uring

Hi,

Nothing major in here:

- Expand io_kiocb flags to 64-bits, so we can use two more bits for
  caching cancelation sequence and pollable state.

- Misc cleanups

-- 
Jens Axboe

* [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits
From: Jens Axboe @ 2024-02-06 16:22 UTC
To: io-uring; +Cc: Jens Axboe

We're out of space here, and none of the flags are easily reclaimable.
Bump it to 64-bits and re-arrange the struct a bit to avoid gaps.

Add a specific bitwise type for the request flags, io_req_flags_t.
This will help catch violations of casting this value to a smaller type
on 32-bit archs, like unsigned int.

No functional changes intended in this patch.

Signed-off-by: Jens Axboe <[email protected]>
---
 include/linux/io_uring_types.h  | 87 ++++++++++++++++++---------------
 include/trace/events/io_uring.h | 14 +++---
 io_uring/filetable.h            |  2 +-
 io_uring/io_uring.c             |  9 ++--
 4 files changed, 60 insertions(+), 52 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 854ad67a5f70..5ac18b05d4ee 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -428,7 +428,7 @@ struct io_tw_state {
 	bool locked;
 };
 
-enum {
+enum io_req_flags {
 	REQ_F_FIXED_FILE_BIT = IOSQE_FIXED_FILE_BIT,
 	REQ_F_IO_DRAIN_BIT = IOSQE_IO_DRAIN_BIT,
 	REQ_F_LINK_BIT = IOSQE_IO_LINK_BIT,
@@ -468,70 +468,73 @@ enum {
 	__REQ_F_LAST_BIT,
 };
 
+typedef enum io_req_flags __bitwise io_req_flags_t;
+#define IO_REQ_FLAG(bitno) ((__force io_req_flags_t) BIT_ULL((bitno)))
+
 enum {
 	/* ctx owns file */
-	REQ_F_FIXED_FILE = BIT(REQ_F_FIXED_FILE_BIT),
+	REQ_F_FIXED_FILE = IO_REQ_FLAG(REQ_F_FIXED_FILE_BIT),
 	/* drain existing IO first */
-	REQ_F_IO_DRAIN = BIT(REQ_F_IO_DRAIN_BIT),
+	REQ_F_IO_DRAIN = IO_REQ_FLAG(REQ_F_IO_DRAIN_BIT),
 	/* linked sqes */
-	REQ_F_LINK = BIT(REQ_F_LINK_BIT),
+	REQ_F_LINK = IO_REQ_FLAG(REQ_F_LINK_BIT),
 	/* doesn't sever on completion < 0 */
-	REQ_F_HARDLINK = BIT(REQ_F_HARDLINK_BIT),
+	REQ_F_HARDLINK = IO_REQ_FLAG(REQ_F_HARDLINK_BIT),
 	/* IOSQE_ASYNC */
-	REQ_F_FORCE_ASYNC = BIT(REQ_F_FORCE_ASYNC_BIT),
+	REQ_F_FORCE_ASYNC = IO_REQ_FLAG(REQ_F_FORCE_ASYNC_BIT),
 	/* IOSQE_BUFFER_SELECT */
-	REQ_F_BUFFER_SELECT = BIT(REQ_F_BUFFER_SELECT_BIT),
+	REQ_F_BUFFER_SELECT = IO_REQ_FLAG(REQ_F_BUFFER_SELECT_BIT),
 	/* IOSQE_CQE_SKIP_SUCCESS */
-	REQ_F_CQE_SKIP = BIT(REQ_F_CQE_SKIP_BIT),
+	REQ_F_CQE_SKIP = IO_REQ_FLAG(REQ_F_CQE_SKIP_BIT),
 
 	/* fail rest of links */
-	REQ_F_FAIL = BIT(REQ_F_FAIL_BIT),
+	REQ_F_FAIL = IO_REQ_FLAG(REQ_F_FAIL_BIT),
 	/* on inflight list, should be cancelled and waited on exit reliably */
-	REQ_F_INFLIGHT = BIT(REQ_F_INFLIGHT_BIT),
+	REQ_F_INFLIGHT = IO_REQ_FLAG(REQ_F_INFLIGHT_BIT),
 	/* read/write uses file position */
-	REQ_F_CUR_POS = BIT(REQ_F_CUR_POS_BIT),
+	REQ_F_CUR_POS = IO_REQ_FLAG(REQ_F_CUR_POS_BIT),
 	/* must not punt to workers */
-	REQ_F_NOWAIT = BIT(REQ_F_NOWAIT_BIT),
+	REQ_F_NOWAIT = IO_REQ_FLAG(REQ_F_NOWAIT_BIT),
 	/* has or had linked timeout */
-	REQ_F_LINK_TIMEOUT = BIT(REQ_F_LINK_TIMEOUT_BIT),
+	REQ_F_LINK_TIMEOUT = IO_REQ_FLAG(REQ_F_LINK_TIMEOUT_BIT),
 	/* needs cleanup */
-	REQ_F_NEED_CLEANUP = BIT(REQ_F_NEED_CLEANUP_BIT),
+	REQ_F_NEED_CLEANUP = IO_REQ_FLAG(REQ_F_NEED_CLEANUP_BIT),
 	/* already went through poll handler */
-	REQ_F_POLLED = BIT(REQ_F_POLLED_BIT),
+	REQ_F_POLLED = IO_REQ_FLAG(REQ_F_POLLED_BIT),
 	/* buffer already selected */
-	REQ_F_BUFFER_SELECTED = BIT(REQ_F_BUFFER_SELECTED_BIT),
+	REQ_F_BUFFER_SELECTED = IO_REQ_FLAG(REQ_F_BUFFER_SELECTED_BIT),
 	/* buffer selected from ring, needs commit */
-	REQ_F_BUFFER_RING = BIT(REQ_F_BUFFER_RING_BIT),
+	REQ_F_BUFFER_RING = IO_REQ_FLAG(REQ_F_BUFFER_RING_BIT),
 	/* caller should reissue async */
-	REQ_F_REISSUE = BIT(REQ_F_REISSUE_BIT),
+	REQ_F_REISSUE = IO_REQ_FLAG(REQ_F_REISSUE_BIT),
 	/* supports async reads/writes */
-	REQ_F_SUPPORT_NOWAIT = BIT(REQ_F_SUPPORT_NOWAIT_BIT),
+	REQ_F_SUPPORT_NOWAIT = IO_REQ_FLAG(REQ_F_SUPPORT_NOWAIT_BIT),
 	/* regular file */
-	REQ_F_ISREG = BIT(REQ_F_ISREG_BIT),
+	REQ_F_ISREG = IO_REQ_FLAG(REQ_F_ISREG_BIT),
 	/* has creds assigned */
-	REQ_F_CREDS = BIT(REQ_F_CREDS_BIT),
+	REQ_F_CREDS = IO_REQ_FLAG(REQ_F_CREDS_BIT),
 	/* skip refcounting if not set */
-	REQ_F_REFCOUNT = BIT(REQ_F_REFCOUNT_BIT),
+	REQ_F_REFCOUNT = IO_REQ_FLAG(REQ_F_REFCOUNT_BIT),
 	/* there is a linked timeout that has to be armed */
-	REQ_F_ARM_LTIMEOUT = BIT(REQ_F_ARM_LTIMEOUT_BIT),
+	REQ_F_ARM_LTIMEOUT = IO_REQ_FLAG(REQ_F_ARM_LTIMEOUT_BIT),
 	/* ->async_data allocated */
-	REQ_F_ASYNC_DATA = BIT(REQ_F_ASYNC_DATA_BIT),
+	REQ_F_ASYNC_DATA = IO_REQ_FLAG(REQ_F_ASYNC_DATA_BIT),
 	/* don't post CQEs while failing linked requests */
-	REQ_F_SKIP_LINK_CQES = BIT(REQ_F_SKIP_LINK_CQES_BIT),
+	REQ_F_SKIP_LINK_CQES = IO_REQ_FLAG(REQ_F_SKIP_LINK_CQES_BIT),
 	/* single poll may be active */
-	REQ_F_SINGLE_POLL = BIT(REQ_F_SINGLE_POLL_BIT),
+	REQ_F_SINGLE_POLL = IO_REQ_FLAG(REQ_F_SINGLE_POLL_BIT),
 	/* double poll may active */
-	REQ_F_DOUBLE_POLL = BIT(REQ_F_DOUBLE_POLL_BIT),
+	REQ_F_DOUBLE_POLL = IO_REQ_FLAG(REQ_F_DOUBLE_POLL_BIT),
 	/* request has already done partial IO */
-	REQ_F_PARTIAL_IO = BIT(REQ_F_PARTIAL_IO_BIT),
+	REQ_F_PARTIAL_IO = IO_REQ_FLAG(REQ_F_PARTIAL_IO_BIT),
 	/* fast poll multishot mode */
-	REQ_F_APOLL_MULTISHOT = BIT(REQ_F_APOLL_MULTISHOT_BIT),
+	REQ_F_APOLL_MULTISHOT = IO_REQ_FLAG(REQ_F_APOLL_MULTISHOT_BIT),
 	/* recvmsg special flag, clear EPOLLIN */
-	REQ_F_CLEAR_POLLIN = BIT(REQ_F_CLEAR_POLLIN_BIT),
+	REQ_F_CLEAR_POLLIN = IO_REQ_FLAG(REQ_F_CLEAR_POLLIN_BIT),
 	/* hashed into ->cancel_hash_locked, protected by ->uring_lock */
-	REQ_F_HASH_LOCKED = BIT(REQ_F_HASH_LOCKED_BIT),
+	REQ_F_HASH_LOCKED = IO_REQ_FLAG(REQ_F_HASH_LOCKED_BIT),
 	/* don't use lazy poll wake for this request */
-	REQ_F_POLL_NO_LAZY = BIT(REQ_F_POLL_NO_LAZY_BIT),
+	REQ_F_POLL_NO_LAZY = IO_REQ_FLAG(REQ_F_POLL_NO_LAZY_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts);
@@ -592,15 +595,14 @@ struct io_kiocb {
 	 * and after selection it points to the buffer ID itself.
 	 */
 	u16 buf_index;
-	unsigned int flags;
 
-	struct io_cqe cqe;
+	atomic_t refs;
+
+	io_req_flags_t flags;
 
 	struct io_ring_ctx *ctx;
 	struct task_struct *task;
 
-	struct io_rsrc_node *rsrc_node;
-
 	union {
 		/* store used ubuf, so we can prevent reloading */
 		struct io_mapped_ubuf *imu;
@@ -615,18 +617,23 @@ struct io_kiocb {
 		struct io_buffer_list *buf_list;
 	};
 
+	/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
+	struct hlist_node hash_node;
+
 	union {
 		/* used by request caches, completion batching and iopoll */
 		struct io_wq_work_node comp_list;
 		/* cache ->apoll->events */
 		__poll_t apoll_events;
 	};
-	atomic_t refs;
-	atomic_t poll_refs;
+
+	struct io_rsrc_node *rsrc_node;
+
+	struct io_cqe cqe;
+
 	struct io_task_work io_task_work;
+	atomic_t poll_refs;
 	unsigned nr_tw;
-	/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
-	struct hlist_node hash_node;
 	/* internal polling, see IORING_FEAT_FAST_POLL */
 	struct async_poll *apoll;
 	/* opcode allocated if it needs to store data for async defer */
diff --git a/include/trace/events/io_uring.h b/include/trace/events/io_uring.h
index 69454f1f98b0..3d7704a52b73 100644
--- a/include/trace/events/io_uring.h
+++ b/include/trace/events/io_uring.h
@@ -148,7 +148,7 @@ TRACE_EVENT(io_uring_queue_async_work,
 		__field( void *, req )
 		__field( u64, user_data )
 		__field( u8, opcode )
-		__field( unsigned int, flags )
+		__field( io_req_flags_t, flags )
 		__field( struct io_wq_work *, work )
 		__field( int, rw )
 
@@ -167,10 +167,10 @@ TRACE_EVENT(io_uring_queue_async_work,
 		__assign_str(op_str, io_uring_get_opcode(req->opcode));
 	),
 
-	TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%x, %s queue, work %p",
+	TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%lx, %s queue, work %p",
 		__entry->ctx, __entry->req, __entry->user_data,
-		__get_str(op_str),
-		__entry->flags, __entry->rw ? "hashed" : "normal", __entry->work)
+		__get_str(op_str), (long) __entry->flags,
+		__entry->rw ? "hashed" : "normal", __entry->work)
 );
 
 /**
@@ -378,7 +378,7 @@ TRACE_EVENT(io_uring_submit_req,
 		__field( void *, req )
 		__field( unsigned long long, user_data )
 		__field( u8, opcode )
-		__field( u32, flags )
+		__field( io_req_flags_t, flags )
 		__field( bool, sq_thread )
 
 		__string( op_str, io_uring_get_opcode(req->opcode) )
@@ -395,10 +395,10 @@ TRACE_EVENT(io_uring_submit_req,
 		__assign_str(op_str, io_uring_get_opcode(req->opcode));
 	),
 
-	TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%x, "
+	TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%lx, "
 		  "sq_thread %d", __entry->ctx, __entry->req,
 		  __entry->user_data, __get_str(op_str),
-		  __entry->flags, __entry->sq_thread)
+		  (long) __entry->flags, __entry->sq_thread)
 );
 
 /*
diff --git a/io_uring/filetable.h b/io_uring/filetable.h
index b47adf170c31..b2435c4dca1f 100644
--- a/io_uring/filetable.h
+++ b/io_uring/filetable.h
@@ -17,7 +17,7 @@ int io_fixed_fd_remove(struct io_ring_ctx *ctx, unsigned int offset);
 int io_register_file_alloc_range(struct io_ring_ctx *ctx,
 				 struct io_uring_file_index_range __user *arg);
 
-unsigned int io_file_get_flags(struct file *file);
+io_req_flags_t io_file_get_flags(struct file *file);
 
 static inline void io_file_bitmap_clear(struct io_file_table *table, int bit)
 {
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cd9a137ad6ce..360a7ee41d3a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1768,9 +1768,9 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
 	}
 }
 
-unsigned int io_file_get_flags(struct file *file)
+io_req_flags_t io_file_get_flags(struct file *file)
 {
-	unsigned int res = 0;
+	io_req_flags_t res = 0;
 
 	if (S_ISREG(file_inode(file)->i_mode))
 		res |= REQ_F_ISREG;
@@ -2171,7 +2171,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	/* req is partially pre-initialised, see io_preinit_req() */
 	req->opcode = opcode = READ_ONCE(sqe->opcode);
 	/* same numerical values with corresponding REQ_F_*, safe to copy */
-	req->flags = sqe_flags = READ_ONCE(sqe->flags);
+	sqe_flags = READ_ONCE(sqe->flags);
+	req->flags = (io_req_flags_t) sqe_flags;
 	req->cqe.user_data = READ_ONCE(sqe->user_data);
 	req->file = NULL;
 	req->rsrc_node = NULL;
@@ -4153,7 +4154,7 @@ static int __init io_uring_init(void)
 	BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8));
 	BUILD_BUG_ON((SQE_VALID_FLAGS | SQE_COMMON_FLAGS) != SQE_VALID_FLAGS);
 
-	BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
+	BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(u64));
 
 	BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32));
 
-- 
2.43.0

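The switch from BIT() to BIT_ULL() in the patch above matters because the kernel's BIT() yields an unsigned long, which is only 32 bits wide on 32-bit architectures. A minimal userspace sketch of the effect follows. It is not kernel code: `BIT_ULL` and `IO_REQ_FLAG` are re-declared locally as stand-ins, the sparse-only `__bitwise`/`__force` annotations are dropped, and `DEMO_HIGH_BIT`, `demo_flag()` and the two `store_in_*` helpers are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel definitions used in the patch. */
#define BIT_ULL(nr) (1ULL << (nr))
typedef uint64_t io_req_flags_t;
#define IO_REQ_FLAG(bitno) ((io_req_flags_t) BIT_ULL((bitno)))

/* Hypothetical bit number above 31, the kind the wider field makes room for. */
#define DEMO_HIGH_BIT 33

/* Build a single flag; shifting by 64 or more would be undefined behaviour. */
io_req_flags_t demo_flag(unsigned int bitno)
{
	assert(bitno < 64);
	return IO_REQ_FLAG(bitno);
}

/* Round-trip through a 32-bit field, as the old 'unsigned int flags' would. */
io_req_flags_t store_in_u32(io_req_flags_t flags)
{
	uint32_t narrow = (uint32_t) flags;	/* bits 32..63 are dropped here */
	return narrow;
}

/* Round-trip through a 64-bit field, as the new io_req_flags_t does. */
io_req_flags_t store_in_u64(io_req_flags_t flags)
{
	uint64_t wide = flags;
	return wide;
}
```

With `f = demo_flag(0) | demo_flag(DEMO_HIGH_BIT)`, `store_in_u32(f)` keeps only bit 0 while `store_in_u64(f)` returns `f` unchanged. Silent narrowing of this kind is what the dedicated bitwise type is meant to let sparse flag.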
* Re: [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits
From: Jens Axboe @ 2024-02-06 22:58 UTC
To: io-uring

On 2/6/24 9:22 AM, Jens Axboe wrote:
> We're out of space here, and none of the flags are easily reclaimable.
> Bump it to 64-bits and re-arrange the struct a bit to avoid gaps.
>
> Add a specific bitwise type for the request flags, io_req_flags_t.
> This will help catch violations of casting this value to a smaller type
> on 32-bit archs, like unsigned int.
>
> No functional changes intended in this patch.

Looks like I didn't run this through testing after making the 32-bit
change, my bad. Here's v2 of it, no changes to the rest.

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 854ad67a5f70..5aef24ef467b 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -468,70 +468,73 @@ enum {
 	__REQ_F_LAST_BIT,
 };
 
+typedef u64 __bitwise io_req_flags_t;
+#define IO_REQ_FLAG(bitno) ((__force io_req_flags_t) BIT_ULL((bitno)))
+
 enum {
 	/* ctx owns file */
-	REQ_F_FIXED_FILE = BIT(REQ_F_FIXED_FILE_BIT),
+	REQ_F_FIXED_FILE = IO_REQ_FLAG(REQ_F_FIXED_FILE_BIT),
 	/* drain existing IO first */
-	REQ_F_IO_DRAIN = BIT(REQ_F_IO_DRAIN_BIT),
+	REQ_F_IO_DRAIN = IO_REQ_FLAG(REQ_F_IO_DRAIN_BIT),
 	/* linked sqes */
-	REQ_F_LINK = BIT(REQ_F_LINK_BIT),
+	REQ_F_LINK = IO_REQ_FLAG(REQ_F_LINK_BIT),
 	/* doesn't sever on completion < 0 */
-	REQ_F_HARDLINK = BIT(REQ_F_HARDLINK_BIT),
+	REQ_F_HARDLINK = IO_REQ_FLAG(REQ_F_HARDLINK_BIT),
 	/* IOSQE_ASYNC */
-	REQ_F_FORCE_ASYNC = BIT(REQ_F_FORCE_ASYNC_BIT),
+	REQ_F_FORCE_ASYNC = IO_REQ_FLAG(REQ_F_FORCE_ASYNC_BIT),
 	/* IOSQE_BUFFER_SELECT */
-	REQ_F_BUFFER_SELECT = BIT(REQ_F_BUFFER_SELECT_BIT),
+	REQ_F_BUFFER_SELECT = IO_REQ_FLAG(REQ_F_BUFFER_SELECT_BIT),
 	/* IOSQE_CQE_SKIP_SUCCESS */
-	REQ_F_CQE_SKIP = BIT(REQ_F_CQE_SKIP_BIT),
+	REQ_F_CQE_SKIP = IO_REQ_FLAG(REQ_F_CQE_SKIP_BIT),
 
 	/* fail rest of links */
-	REQ_F_FAIL = BIT(REQ_F_FAIL_BIT),
+	REQ_F_FAIL = IO_REQ_FLAG(REQ_F_FAIL_BIT),
 	/* on inflight list, should be cancelled and waited on exit reliably */
-	REQ_F_INFLIGHT = BIT(REQ_F_INFLIGHT_BIT),
+	REQ_F_INFLIGHT = IO_REQ_FLAG(REQ_F_INFLIGHT_BIT),
 	/* read/write uses file position */
-	REQ_F_CUR_POS = BIT(REQ_F_CUR_POS_BIT),
+	REQ_F_CUR_POS = IO_REQ_FLAG(REQ_F_CUR_POS_BIT),
 	/* must not punt to workers */
-	REQ_F_NOWAIT = BIT(REQ_F_NOWAIT_BIT),
+	REQ_F_NOWAIT = IO_REQ_FLAG(REQ_F_NOWAIT_BIT),
 	/* has or had linked timeout */
-	REQ_F_LINK_TIMEOUT = BIT(REQ_F_LINK_TIMEOUT_BIT),
+	REQ_F_LINK_TIMEOUT = IO_REQ_FLAG(REQ_F_LINK_TIMEOUT_BIT),
 	/* needs cleanup */
-	REQ_F_NEED_CLEANUP = BIT(REQ_F_NEED_CLEANUP_BIT),
+	REQ_F_NEED_CLEANUP = IO_REQ_FLAG(REQ_F_NEED_CLEANUP_BIT),
 	/* already went through poll handler */
-	REQ_F_POLLED = BIT(REQ_F_POLLED_BIT),
+	REQ_F_POLLED = IO_REQ_FLAG(REQ_F_POLLED_BIT),
 	/* buffer already selected */
-	REQ_F_BUFFER_SELECTED = BIT(REQ_F_BUFFER_SELECTED_BIT),
+	REQ_F_BUFFER_SELECTED = IO_REQ_FLAG(REQ_F_BUFFER_SELECTED_BIT),
 	/* buffer selected from ring, needs commit */
-	REQ_F_BUFFER_RING = BIT(REQ_F_BUFFER_RING_BIT),
+	REQ_F_BUFFER_RING = IO_REQ_FLAG(REQ_F_BUFFER_RING_BIT),
 	/* caller should reissue async */
-	REQ_F_REISSUE = BIT(REQ_F_REISSUE_BIT),
+	REQ_F_REISSUE = IO_REQ_FLAG(REQ_F_REISSUE_BIT),
 	/* supports async reads/writes */
-	REQ_F_SUPPORT_NOWAIT = BIT(REQ_F_SUPPORT_NOWAIT_BIT),
+	REQ_F_SUPPORT_NOWAIT = IO_REQ_FLAG(REQ_F_SUPPORT_NOWAIT_BIT),
 	/* regular file */
-	REQ_F_ISREG = BIT(REQ_F_ISREG_BIT),
+	REQ_F_ISREG = IO_REQ_FLAG(REQ_F_ISREG_BIT),
 	/* has creds assigned */
-	REQ_F_CREDS = BIT(REQ_F_CREDS_BIT),
+	REQ_F_CREDS = IO_REQ_FLAG(REQ_F_CREDS_BIT),
 	/* skip refcounting if not set */
-	REQ_F_REFCOUNT = BIT(REQ_F_REFCOUNT_BIT),
+	REQ_F_REFCOUNT = IO_REQ_FLAG(REQ_F_REFCOUNT_BIT),
 	/* there is a linked timeout that has to be armed */
-	REQ_F_ARM_LTIMEOUT = BIT(REQ_F_ARM_LTIMEOUT_BIT),
+	REQ_F_ARM_LTIMEOUT = IO_REQ_FLAG(REQ_F_ARM_LTIMEOUT_BIT),
 	/* ->async_data allocated */
-	REQ_F_ASYNC_DATA = BIT(REQ_F_ASYNC_DATA_BIT),
+	REQ_F_ASYNC_DATA = IO_REQ_FLAG(REQ_F_ASYNC_DATA_BIT),
 	/* don't post CQEs while failing linked requests */
-	REQ_F_SKIP_LINK_CQES = BIT(REQ_F_SKIP_LINK_CQES_BIT),
+	REQ_F_SKIP_LINK_CQES = IO_REQ_FLAG(REQ_F_SKIP_LINK_CQES_BIT),
 	/* single poll may be active */
-	REQ_F_SINGLE_POLL = BIT(REQ_F_SINGLE_POLL_BIT),
+	REQ_F_SINGLE_POLL = IO_REQ_FLAG(REQ_F_SINGLE_POLL_BIT),
 	/* double poll may active */
-	REQ_F_DOUBLE_POLL = BIT(REQ_F_DOUBLE_POLL_BIT),
+	REQ_F_DOUBLE_POLL = IO_REQ_FLAG(REQ_F_DOUBLE_POLL_BIT),
 	/* request has already done partial IO */
-	REQ_F_PARTIAL_IO = BIT(REQ_F_PARTIAL_IO_BIT),
+	REQ_F_PARTIAL_IO = IO_REQ_FLAG(REQ_F_PARTIAL_IO_BIT),
 	/* fast poll multishot mode */
-	REQ_F_APOLL_MULTISHOT = BIT(REQ_F_APOLL_MULTISHOT_BIT),
+	REQ_F_APOLL_MULTISHOT = IO_REQ_FLAG(REQ_F_APOLL_MULTISHOT_BIT),
 	/* recvmsg special flag, clear EPOLLIN */
-	REQ_F_CLEAR_POLLIN = BIT(REQ_F_CLEAR_POLLIN_BIT),
+	REQ_F_CLEAR_POLLIN = IO_REQ_FLAG(REQ_F_CLEAR_POLLIN_BIT),
 	/* hashed into ->cancel_hash_locked, protected by ->uring_lock */
-	REQ_F_HASH_LOCKED = BIT(REQ_F_HASH_LOCKED_BIT),
+	REQ_F_HASH_LOCKED = IO_REQ_FLAG(REQ_F_HASH_LOCKED_BIT),
 	/* don't use lazy poll wake for this request */
-	REQ_F_POLL_NO_LAZY = BIT(REQ_F_POLL_NO_LAZY_BIT),
+	REQ_F_POLL_NO_LAZY = IO_REQ_FLAG(REQ_F_POLL_NO_LAZY_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts);
@@ -592,15 +595,14 @@ struct io_kiocb {
 	 * and after selection it points to the buffer ID itself.
 	 */
 	u16 buf_index;
-	unsigned int flags;
 
-	struct io_cqe cqe;
+	atomic_t refs;
+
+	io_req_flags_t flags;
 
 	struct io_ring_ctx *ctx;
 	struct task_struct *task;
 
-	struct io_rsrc_node *rsrc_node;
-
 	union {
 		/* store used ubuf, so we can prevent reloading */
 		struct io_mapped_ubuf *imu;
@@ -615,18 +617,23 @@ struct io_kiocb {
 		struct io_buffer_list *buf_list;
 	};
 
+	/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
+	struct hlist_node hash_node;
+
 	union {
 		/* used by request caches, completion batching and iopoll */
 		struct io_wq_work_node comp_list;
 		/* cache ->apoll->events */
 		__poll_t apoll_events;
 	};
-	atomic_t refs;
-	atomic_t poll_refs;
+
+	struct io_rsrc_node *rsrc_node;
+
+	struct io_cqe cqe;
+
 	struct io_task_work io_task_work;
+	atomic_t poll_refs;
 	unsigned nr_tw;
-	/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
-	struct hlist_node hash_node;
 	/* internal polling, see IORING_FEAT_FAST_POLL */
 	struct async_poll *apoll;
 	/* opcode allocated if it needs to store data for async defer */
diff --git a/include/trace/events/io_uring.h b/include/trace/events/io_uring.h
index 69454f1f98b0..3d7704a52b73 100644
--- a/include/trace/events/io_uring.h
+++ b/include/trace/events/io_uring.h
@@ -148,7 +148,7 @@ TRACE_EVENT(io_uring_queue_async_work,
 		__field( void *, req )
 		__field( u64, user_data )
 		__field( u8, opcode )
-		__field( unsigned int, flags )
+		__field( io_req_flags_t, flags )
 		__field( struct io_wq_work *, work )
 		__field( int, rw )
 
@@ -167,10 +167,10 @@ TRACE_EVENT(io_uring_queue_async_work,
 		__assign_str(op_str, io_uring_get_opcode(req->opcode));
 	),
 
-	TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%x, %s queue, work %p",
+	TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%lx, %s queue, work %p",
 		__entry->ctx, __entry->req, __entry->user_data,
-		__get_str(op_str),
-		__entry->flags, __entry->rw ? "hashed" : "normal", __entry->work)
+		__get_str(op_str), (long) __entry->flags,
+		__entry->rw ? "hashed" : "normal", __entry->work)
 );
 
 /**
@@ -378,7 +378,7 @@ TRACE_EVENT(io_uring_submit_req,
 		__field( void *, req )
 		__field( unsigned long long, user_data )
 		__field( u8, opcode )
-		__field( u32, flags )
+		__field( io_req_flags_t, flags )
 		__field( bool, sq_thread )
 
 		__string( op_str, io_uring_get_opcode(req->opcode) )
@@ -395,10 +395,10 @@ TRACE_EVENT(io_uring_submit_req,
 		__assign_str(op_str, io_uring_get_opcode(req->opcode));
 	),
 
-	TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%x, "
+	TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%lx, "
 		  "sq_thread %d", __entry->ctx, __entry->req,
 		  __entry->user_data, __get_str(op_str),
-		  __entry->flags, __entry->sq_thread)
+		  (long) __entry->flags, __entry->sq_thread)
 );
 
 /*
diff --git a/io_uring/filetable.h b/io_uring/filetable.h
index b47adf170c31..b2435c4dca1f 100644
--- a/io_uring/filetable.h
+++ b/io_uring/filetable.h
@@ -17,7 +17,7 @@ int io_fixed_fd_remove(struct io_ring_ctx *ctx, unsigned int offset);
 int io_register_file_alloc_range(struct io_ring_ctx *ctx,
 				 struct io_uring_file_index_range __user *arg);
 
-unsigned int io_file_get_flags(struct file *file);
+io_req_flags_t io_file_get_flags(struct file *file);
 
 static inline void io_file_bitmap_clear(struct io_file_table *table, int bit)
 {
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cd9a137ad6ce..360a7ee41d3a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1768,9 +1768,9 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
 	}
 }
 
-unsigned int io_file_get_flags(struct file *file)
+io_req_flags_t io_file_get_flags(struct file *file)
 {
-	unsigned int res = 0;
+	io_req_flags_t res = 0;
 
 	if (S_ISREG(file_inode(file)->i_mode))
 		res |= REQ_F_ISREG;
@@ -2171,7 +2171,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	/* req is partially pre-initialised, see io_preinit_req() */
 	req->opcode = opcode = READ_ONCE(sqe->opcode);
 	/* same numerical values with corresponding REQ_F_*, safe to copy */
-	req->flags = sqe_flags = READ_ONCE(sqe->flags);
+	sqe_flags = READ_ONCE(sqe->flags);
+	req->flags = (io_req_flags_t) sqe_flags;
 	req->cqe.user_data = READ_ONCE(sqe->user_data);
 	req->file = NULL;
 	req->rsrc_node = NULL;
@@ -4153,7 +4154,7 @@ static int __init io_uring_init(void)
 	BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8));
 	BUILD_BUG_ON((SQE_VALID_FLAGS | SQE_COMMON_FLAGS) != SQE_VALID_FLAGS);
 
-	BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
+	BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(u64));
 
 	BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32));
 
-- 
Jens Axboe

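The only change in v2 is that io_req_flags_t is now built on u64 rather than on `enum io_req_flags`. That enum holds only small bit numbers, so on common ABIs it is no wider than an int and a typedef on top of it could not carry 64 flag bits. The userspace sketch below illustrates that reasoning; every `demo_` name is a hypothetical stand-in, not a kernel identifier. It also shows one portable way to print a 64-bit flag word. That part is an observation rather than something raised in this thread: the `(long)` cast used with `0x%lx` in the trace events would show only the low 32 bits wherever long is 32 bits wide.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit numbers only, like enum io_req_flags: every enumerator is a small value. */
enum demo_req_flags {
	DEMO_FIXED_FILE_BIT,
	DEMO_ISREG_BIT = 19,
	DEMO_LAST_BIT = 34,	/* hypothetical: more than 32 flags in use */
};

/* v2 approach: the flag word is an explicit 64-bit integer type. */
typedef uint64_t demo_req_flags_t;

/* Compile-time check in the spirit of the BUILD_BUG_ON() in io_uring_init(). */
_Static_assert(DEMO_LAST_BIT <= 8 * sizeof(demo_req_flags_t),
	       "flag bits must fit in the flag word");

/* Would a type with the enum's own size have room for every flag bit? */
int demo_enum_can_hold_all_bits(void)
{
	return 8 * sizeof(enum demo_req_flags) >= DEMO_LAST_BIT;
}

/* Format the full 64-bit value without casting through long. */
int demo_format_flags(char *buf, size_t len, demo_req_flags_t flags)
{
	assert(buf != NULL);
	return snprintf(buf, len, "flags 0x%" PRIx64, flags);
}
```

On typical GCC and Clang targets `demo_enum_can_hold_all_bits()` returns 0 because the enum is four bytes wide, although the C standard leaves the exact size to the implementation.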
* Re: [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits 2024-02-06 16:22 ` [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits Jens Axboe 2024-02-06 22:58 ` Jens Axboe @ 2024-02-07 0:43 ` Pavel Begunkov 2024-02-07 2:18 ` Jens Axboe 1 sibling, 1 reply; 18+ messages in thread From: Pavel Begunkov @ 2024-02-07 0:43 UTC (permalink / raw) To: Jens Axboe, io-uring On 2/6/24 16:22, Jens Axboe wrote: > We're out of space here, and none of the flags are easily reclaimable. > Bump it to 64-bits and re-arrange the struct a bit to avoid gaps. > > Add a specific bitwise type for the request flags, io_request_flags_t. > This will help catch violations of casting this value to a smaller type > on 32-bit archs, like unsigned int. > > No functional changes intended in this patch. > > Signed-off-by: Jens Axboe <[email protected]> > --- > include/linux/io_uring_types.h | 87 ++++++++++++++++++--------------- > include/trace/events/io_uring.h | 14 +++--- > io_uring/filetable.h | 2 +- > io_uring/io_uring.c | 9 ++-- > 4 files changed, 60 insertions(+), 52 deletions(-) > > diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h > index 854ad67a5f70..5ac18b05d4ee 100644 > --- a/include/linux/io_uring_types.h > +++ b/include/linux/io_uring_types.h > @@ -428,7 +428,7 @@ struct io_tw_state { > bool locked; > }; > > -enum { > +enum io_req_flags { > REQ_F_FIXED_FILE_BIT = IOSQE_FIXED_FILE_BIT, > REQ_F_IO_DRAIN_BIT = IOSQE_IO_DRAIN_BIT, > REQ_F_LINK_BIT = IOSQE_IO_LINK_BIT, > @@ -468,70 +468,73 @@ enum { > __REQ_F_LAST_BIT, > }; > > +typedef enum io_req_flags __bitwise io_req_flags_t; > +#define IO_REQ_FLAG(bitno) ((__force io_req_flags_t) BIT_ULL((bitno))) > + > enum { > /* ctx owns file */ > - REQ_F_FIXED_FILE = BIT(REQ_F_FIXED_FILE_BIT), > + REQ_F_FIXED_FILE = IO_REQ_FLAG(REQ_F_FIXED_FILE_BIT), > /* drain existing IO first */ > - REQ_F_IO_DRAIN = BIT(REQ_F_IO_DRAIN_BIT), > + REQ_F_IO_DRAIN = IO_REQ_FLAG(REQ_F_IO_DRAIN_BIT), > /* linked sqes 
*/ > - REQ_F_LINK = BIT(REQ_F_LINK_BIT), > + REQ_F_LINK = IO_REQ_FLAG(REQ_F_LINK_BIT), > /* doesn't sever on completion < 0 */ > - REQ_F_HARDLINK = BIT(REQ_F_HARDLINK_BIT), > + REQ_F_HARDLINK = IO_REQ_FLAG(REQ_F_HARDLINK_BIT), > /* IOSQE_ASYNC */ > - REQ_F_FORCE_ASYNC = BIT(REQ_F_FORCE_ASYNC_BIT), > + REQ_F_FORCE_ASYNC = IO_REQ_FLAG(REQ_F_FORCE_ASYNC_BIT), > /* IOSQE_BUFFER_SELECT */ > - REQ_F_BUFFER_SELECT = BIT(REQ_F_BUFFER_SELECT_BIT), > + REQ_F_BUFFER_SELECT = IO_REQ_FLAG(REQ_F_BUFFER_SELECT_BIT), > /* IOSQE_CQE_SKIP_SUCCESS */ > - REQ_F_CQE_SKIP = BIT(REQ_F_CQE_SKIP_BIT), > + REQ_F_CQE_SKIP = IO_REQ_FLAG(REQ_F_CQE_SKIP_BIT), > > /* fail rest of links */ > - REQ_F_FAIL = BIT(REQ_F_FAIL_BIT), > + REQ_F_FAIL = IO_REQ_FLAG(REQ_F_FAIL_BIT), > /* on inflight list, should be cancelled and waited on exit reliably */ > - REQ_F_INFLIGHT = BIT(REQ_F_INFLIGHT_BIT), > + REQ_F_INFLIGHT = IO_REQ_FLAG(REQ_F_INFLIGHT_BIT), > /* read/write uses file position */ > - REQ_F_CUR_POS = BIT(REQ_F_CUR_POS_BIT), > + REQ_F_CUR_POS = IO_REQ_FLAG(REQ_F_CUR_POS_BIT), > /* must not punt to workers */ > - REQ_F_NOWAIT = BIT(REQ_F_NOWAIT_BIT), > + REQ_F_NOWAIT = IO_REQ_FLAG(REQ_F_NOWAIT_BIT), > /* has or had linked timeout */ > - REQ_F_LINK_TIMEOUT = BIT(REQ_F_LINK_TIMEOUT_BIT), > + REQ_F_LINK_TIMEOUT = IO_REQ_FLAG(REQ_F_LINK_TIMEOUT_BIT), > /* needs cleanup */ > - REQ_F_NEED_CLEANUP = BIT(REQ_F_NEED_CLEANUP_BIT), > + REQ_F_NEED_CLEANUP = IO_REQ_FLAG(REQ_F_NEED_CLEANUP_BIT), > /* already went through poll handler */ > - REQ_F_POLLED = BIT(REQ_F_POLLED_BIT), > + REQ_F_POLLED = IO_REQ_FLAG(REQ_F_POLLED_BIT), > /* buffer already selected */ > - REQ_F_BUFFER_SELECTED = BIT(REQ_F_BUFFER_SELECTED_BIT), > + REQ_F_BUFFER_SELECTED = IO_REQ_FLAG(REQ_F_BUFFER_SELECTED_BIT), > /* buffer selected from ring, needs commit */ > - REQ_F_BUFFER_RING = BIT(REQ_F_BUFFER_RING_BIT), > + REQ_F_BUFFER_RING = IO_REQ_FLAG(REQ_F_BUFFER_RING_BIT), > /* caller should reissue async */ > - REQ_F_REISSUE = 
BIT(REQ_F_REISSUE_BIT), > + REQ_F_REISSUE = IO_REQ_FLAG(REQ_F_REISSUE_BIT), > /* supports async reads/writes */ > - REQ_F_SUPPORT_NOWAIT = BIT(REQ_F_SUPPORT_NOWAIT_BIT), > + REQ_F_SUPPORT_NOWAIT = IO_REQ_FLAG(REQ_F_SUPPORT_NOWAIT_BIT), > /* regular file */ > - REQ_F_ISREG = BIT(REQ_F_ISREG_BIT), > + REQ_F_ISREG = IO_REQ_FLAG(REQ_F_ISREG_BIT), > /* has creds assigned */ > - REQ_F_CREDS = BIT(REQ_F_CREDS_BIT), > + REQ_F_CREDS = IO_REQ_FLAG(REQ_F_CREDS_BIT), > /* skip refcounting if not set */ > - REQ_F_REFCOUNT = BIT(REQ_F_REFCOUNT_BIT), > + REQ_F_REFCOUNT = IO_REQ_FLAG(REQ_F_REFCOUNT_BIT), > /* there is a linked timeout that has to be armed */ > - REQ_F_ARM_LTIMEOUT = BIT(REQ_F_ARM_LTIMEOUT_BIT), > + REQ_F_ARM_LTIMEOUT = IO_REQ_FLAG(REQ_F_ARM_LTIMEOUT_BIT), > /* ->async_data allocated */ > - REQ_F_ASYNC_DATA = BIT(REQ_F_ASYNC_DATA_BIT), > + REQ_F_ASYNC_DATA = IO_REQ_FLAG(REQ_F_ASYNC_DATA_BIT), > /* don't post CQEs while failing linked requests */ > - REQ_F_SKIP_LINK_CQES = BIT(REQ_F_SKIP_LINK_CQES_BIT), > + REQ_F_SKIP_LINK_CQES = IO_REQ_FLAG(REQ_F_SKIP_LINK_CQES_BIT), > /* single poll may be active */ > - REQ_F_SINGLE_POLL = BIT(REQ_F_SINGLE_POLL_BIT), > + REQ_F_SINGLE_POLL = IO_REQ_FLAG(REQ_F_SINGLE_POLL_BIT), > /* double poll may active */ > - REQ_F_DOUBLE_POLL = BIT(REQ_F_DOUBLE_POLL_BIT), > + REQ_F_DOUBLE_POLL = IO_REQ_FLAG(REQ_F_DOUBLE_POLL_BIT), > /* request has already done partial IO */ > - REQ_F_PARTIAL_IO = BIT(REQ_F_PARTIAL_IO_BIT), > + REQ_F_PARTIAL_IO = IO_REQ_FLAG(REQ_F_PARTIAL_IO_BIT), > /* fast poll multishot mode */ > - REQ_F_APOLL_MULTISHOT = BIT(REQ_F_APOLL_MULTISHOT_BIT), > + REQ_F_APOLL_MULTISHOT = IO_REQ_FLAG(REQ_F_APOLL_MULTISHOT_BIT), > /* recvmsg special flag, clear EPOLLIN */ > - REQ_F_CLEAR_POLLIN = BIT(REQ_F_CLEAR_POLLIN_BIT), > + REQ_F_CLEAR_POLLIN = IO_REQ_FLAG(REQ_F_CLEAR_POLLIN_BIT), > /* hashed into ->cancel_hash_locked, protected by ->uring_lock */ > - REQ_F_HASH_LOCKED = BIT(REQ_F_HASH_LOCKED_BIT), > + REQ_F_HASH_LOCKED = 
IO_REQ_FLAG(REQ_F_HASH_LOCKED_BIT), > /* don't use lazy poll wake for this request */ > - REQ_F_POLL_NO_LAZY = BIT(REQ_F_POLL_NO_LAZY_BIT), > + REQ_F_POLL_NO_LAZY = IO_REQ_FLAG(REQ_F_POLL_NO_LAZY_BIT), > }; > > typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); > @@ -592,15 +595,14 @@ struct io_kiocb { > * and after selection it points to the buffer ID itself. > */ > u16 buf_index; > - unsigned int flags; > > - struct io_cqe cqe; With the current layout the min number of lines we touch per request is 2 (including the op specific 64B), that's includes setting up cqe at init and using it for completing. Moving cqe down makes it 3. > + atomic_t refs; We're pulling it refs, which is not touched at all in the hot path. Even if there's a hole I'd argue it's better to leave it at the end. > + > + io_req_flags_t flags; > > struct io_ring_ctx *ctx; > struct task_struct *task; > > - struct io_rsrc_node *rsrc_node; It's used in hot paths, registered buffers/files, would be unfortunate to move it to the next line. > - > union { > /* store used ubuf, so we can prevent reloading */ > struct io_mapped_ubuf *imu; > @@ -615,18 +617,23 @@ struct io_kiocb { > struct io_buffer_list *buf_list; > }; > > + /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */ > + struct hlist_node hash_node; > + And we're pulling hash_node into the hottest line, which is used only when we arm a poll and remove poll. So, it's mostly for networking, sends wouldn't use it much, and multishots wouldn't normally touch it. As for ideas how to find space: 1) iopoll_completed completed can be converted to flags2 2) REQ_F_{SINGLE,DOUBLE}_POLL is a weird duplication. Can probably be combined into one flag, or removed at all. Again, sends are usually not so poll heavy and the hot path for recv is multishot. 
3) we can probably move req->task down and replace it with get_task() { if (req->ctx->flags & DEFER_TASKRUN) task = ctx->submitter_task; else task = req->task; } The most common user of it -- task_work_add -- already checks the flag and has separate paths, and init/completion paths can be optimised. > union { > /* used by request caches, completion batching and iopoll */ > struct io_wq_work_node comp_list; > /* cache ->apoll->events */ > __poll_t apoll_events; > }; > - atomic_t refs; > - atomic_t poll_refs; > + > + struct io_rsrc_node *rsrc_node; > + > + struct io_cqe cqe; > + > struct io_task_work io_task_work; > + atomic_t poll_refs; > unsigned nr_tw; > - /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */ > - struct hlist_node hash_node; > /* internal polling, see IORING_FEAT_FAST_POLL */ > struct async_poll *apoll; > /* opcode allocated if it needs to store data for async defer */ > diff --git a/include/trace/events/io_uring.h b/include/trace/events/io_uring.h > index 69454f1f98b0..3d7704a52b73 100644 > --- a/include/trace/events/io_uring.h > +++ b/include/trace/events/io_uring.h > @@ -148,7 +148,7 @@ TRACE_EVENT(io_uring_queue_async_work, > __field( void *, req ) > __field( u64, user_data ) > __field( u8, opcode ) > - __field( unsigned int, flags ) > + __field( io_req_flags_t, flags ) > __field( struct io_wq_work *, work ) > __field( int, rw ) > > @@ -167,10 +167,10 @@ TRACE_EVENT(io_uring_queue_async_work, > __assign_str(op_str, io_uring_get_opcode(req->opcode)); > ), > > - TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%x, %s queue, work %p", > + TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%lx, %s queue, work %p", > __entry->ctx, __entry->req, __entry->user_data, > - __get_str(op_str), > - __entry->flags, __entry->rw ? "hashed" : "normal", __entry->work) > + __get_str(op_str), (long) __entry->flags, > + __entry->rw ? 
"hashed" : "normal", __entry->work) > ); > > /** > @@ -378,7 +378,7 @@ TRACE_EVENT(io_uring_submit_req, > __field( void *, req ) > __field( unsigned long long, user_data ) > __field( u8, opcode ) > - __field( u32, flags ) > + __field( io_req_flags_t, flags ) > __field( bool, sq_thread ) > > __string( op_str, io_uring_get_opcode(req->opcode) ) > @@ -395,10 +395,10 @@ TRACE_EVENT(io_uring_submit_req, > __assign_str(op_str, io_uring_get_opcode(req->opcode)); > ), > > - TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%x, " > + TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%lx, " > "sq_thread %d", __entry->ctx, __entry->req, > __entry->user_data, __get_str(op_str), > - __entry->flags, __entry->sq_thread) > + (long) __entry->flags, __entry->sq_thread) > ); > > /* > diff --git a/io_uring/filetable.h b/io_uring/filetable.h > index b47adf170c31..b2435c4dca1f 100644 > --- a/io_uring/filetable.h > +++ b/io_uring/filetable.h > @@ -17,7 +17,7 @@ int io_fixed_fd_remove(struct io_ring_ctx *ctx, unsigned int offset); > int io_register_file_alloc_range(struct io_ring_ctx *ctx, > struct io_uring_file_index_range __user *arg); > > -unsigned int io_file_get_flags(struct file *file); > +io_req_flags_t io_file_get_flags(struct file *file); > > static inline void io_file_bitmap_clear(struct io_file_table *table, int bit) > { > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > index cd9a137ad6ce..360a7ee41d3a 100644 > --- a/io_uring/io_uring.c > +++ b/io_uring/io_uring.c > @@ -1768,9 +1768,9 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags) > } > } > > -unsigned int io_file_get_flags(struct file *file) > +io_req_flags_t io_file_get_flags(struct file *file) > { > - unsigned int res = 0; > + io_req_flags_t res = 0; > > if (S_ISREG(file_inode(file)->i_mode)) > res |= REQ_F_ISREG; > @@ -2171,7 +2171,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, > /* req is partially pre-initialised, see 
io_preinit_req() */ > req->opcode = opcode = READ_ONCE(sqe->opcode); > /* same numerical values with corresponding REQ_F_*, safe to copy */ > - req->flags = sqe_flags = READ_ONCE(sqe->flags); > + sqe_flags = READ_ONCE(sqe->flags); > + req->flags = (io_req_flags_t) sqe_flags; > req->cqe.user_data = READ_ONCE(sqe->user_data); > req->file = NULL; > req->rsrc_node = NULL; > @@ -4153,7 +4154,7 @@ static int __init io_uring_init(void) > BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8)); > BUILD_BUG_ON((SQE_VALID_FLAGS | SQE_COMMON_FLAGS) != SQE_VALID_FLAGS); > > - BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int)); > + BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(u64)); > > BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32)); > -- Pavel Begunkov ^ permalink raw reply [flat|nested] 18+ messages in thread
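A minimal userspace sketch of the truncation hazard that motivates the dedicated 64-bit flags type in this patch. All names here (req_flags_t, REQ_FLAG, FLAG_LOW_BIT, FLAG_HIGH_BIT, truncate_to_32) are hypothetical stand-ins, not kernel identifiers, and plain C has no equivalent of sparse's __bitwise checking, so the sketch only shows the runtime effect of a narrowing assignment.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for io_req_flags_t: a 64-bit flags word. */
typedef uint64_t req_flags_t;

/* Stand-in for IO_REQ_FLAG(): build a 64-bit mask from a bit number. */
#define REQ_FLAG(bitno) ((req_flags_t)1 << (bitno))

/* Hypothetical bit numbers: one below and one above the 32-bit boundary. */
enum { FLAG_LOW_BIT = 3, FLAG_HIGH_BIT = 33 };

/*
 * Round-trip the flags through a 32-bit variable, as a leftover
 * "unsigned int res" would do. Any bit >= 32 is silently dropped.
 */
req_flags_t truncate_to_32(req_flags_t flags)
{
	uint32_t narrow = (uint32_t)flags;

	return narrow;
}
```

Under these assumptions, a flag at bit 33 does not survive the round trip while a flag at bit 3 does, which is the class of bug the typed flags are meant to surface at build/check time rather than at runtime.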
* Re: [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits 2024-02-07 0:43 ` Pavel Begunkov @ 2024-02-07 2:18 ` Jens Axboe 2024-02-07 3:22 ` Pavel Begunkov 0 siblings, 1 reply; 18+ messages in thread From: Jens Axboe @ 2024-02-07 2:18 UTC (permalink / raw) To: Pavel Begunkov, io-uring On 2/6/24 5:43 PM, Pavel Begunkov wrote: > On 2/6/24 16:22, Jens Axboe wrote: >> We're out of space here, and none of the flags are easily reclaimable. >> Bump it to 64-bits and re-arrange the struct a bit to avoid gaps. >> >> Add a specific bitwise type for the request flags, io_request_flags_t. >> This will help catch violations of casting this value to a smaller type >> on 32-bit archs, like unsigned int. >> >> No functional changes intended in this patch. >> >> Signed-off-by: Jens Axboe <[email protected]> >> --- >> include/linux/io_uring_types.h | 87 ++++++++++++++++++--------------- >> include/trace/events/io_uring.h | 14 +++--- >> io_uring/filetable.h | 2 +- >> io_uring/io_uring.c | 9 ++-- >> 4 files changed, 60 insertions(+), 52 deletions(-) >> >> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h >> index 854ad67a5f70..5ac18b05d4ee 100644 >> --- a/include/linux/io_uring_types.h >> +++ b/include/linux/io_uring_types.h >> @@ -428,7 +428,7 @@ struct io_tw_state { >> bool locked; >> }; >> -enum { >> +enum io_req_flags { >> REQ_F_FIXED_FILE_BIT = IOSQE_FIXED_FILE_BIT, >> REQ_F_IO_DRAIN_BIT = IOSQE_IO_DRAIN_BIT, >> REQ_F_LINK_BIT = IOSQE_IO_LINK_BIT, >> @@ -468,70 +468,73 @@ enum { >> __REQ_F_LAST_BIT, >> }; >> +typedef enum io_req_flags __bitwise io_req_flags_t; >> +#define IO_REQ_FLAG(bitno) ((__force io_req_flags_t) BIT_ULL((bitno))) >> + >> enum { >> /* ctx owns file */ >> - REQ_F_FIXED_FILE = BIT(REQ_F_FIXED_FILE_BIT), >> + REQ_F_FIXED_FILE = IO_REQ_FLAG(REQ_F_FIXED_FILE_BIT), >> /* drain existing IO first */ >> - REQ_F_IO_DRAIN = BIT(REQ_F_IO_DRAIN_BIT), >> + REQ_F_IO_DRAIN = IO_REQ_FLAG(REQ_F_IO_DRAIN_BIT), >> /* linked sqes */ >> - 
REQ_F_LINK = BIT(REQ_F_LINK_BIT), >> + REQ_F_LINK = IO_REQ_FLAG(REQ_F_LINK_BIT), >> /* doesn't sever on completion < 0 */ >> - REQ_F_HARDLINK = BIT(REQ_F_HARDLINK_BIT), >> + REQ_F_HARDLINK = IO_REQ_FLAG(REQ_F_HARDLINK_BIT), >> /* IOSQE_ASYNC */ >> - REQ_F_FORCE_ASYNC = BIT(REQ_F_FORCE_ASYNC_BIT), >> + REQ_F_FORCE_ASYNC = IO_REQ_FLAG(REQ_F_FORCE_ASYNC_BIT), >> /* IOSQE_BUFFER_SELECT */ >> - REQ_F_BUFFER_SELECT = BIT(REQ_F_BUFFER_SELECT_BIT), >> + REQ_F_BUFFER_SELECT = IO_REQ_FLAG(REQ_F_BUFFER_SELECT_BIT), >> /* IOSQE_CQE_SKIP_SUCCESS */ >> - REQ_F_CQE_SKIP = BIT(REQ_F_CQE_SKIP_BIT), >> + REQ_F_CQE_SKIP = IO_REQ_FLAG(REQ_F_CQE_SKIP_BIT), >> /* fail rest of links */ >> - REQ_F_FAIL = BIT(REQ_F_FAIL_BIT), >> + REQ_F_FAIL = IO_REQ_FLAG(REQ_F_FAIL_BIT), >> /* on inflight list, should be cancelled and waited on exit reliably */ >> - REQ_F_INFLIGHT = BIT(REQ_F_INFLIGHT_BIT), >> + REQ_F_INFLIGHT = IO_REQ_FLAG(REQ_F_INFLIGHT_BIT), >> /* read/write uses file position */ >> - REQ_F_CUR_POS = BIT(REQ_F_CUR_POS_BIT), >> + REQ_F_CUR_POS = IO_REQ_FLAG(REQ_F_CUR_POS_BIT), >> /* must not punt to workers */ >> - REQ_F_NOWAIT = BIT(REQ_F_NOWAIT_BIT), >> + REQ_F_NOWAIT = IO_REQ_FLAG(REQ_F_NOWAIT_BIT), >> /* has or had linked timeout */ >> - REQ_F_LINK_TIMEOUT = BIT(REQ_F_LINK_TIMEOUT_BIT), >> + REQ_F_LINK_TIMEOUT = IO_REQ_FLAG(REQ_F_LINK_TIMEOUT_BIT), >> /* needs cleanup */ >> - REQ_F_NEED_CLEANUP = BIT(REQ_F_NEED_CLEANUP_BIT), >> + REQ_F_NEED_CLEANUP = IO_REQ_FLAG(REQ_F_NEED_CLEANUP_BIT), >> /* already went through poll handler */ >> - REQ_F_POLLED = BIT(REQ_F_POLLED_BIT), >> + REQ_F_POLLED = IO_REQ_FLAG(REQ_F_POLLED_BIT), >> /* buffer already selected */ >> - REQ_F_BUFFER_SELECTED = BIT(REQ_F_BUFFER_SELECTED_BIT), >> + REQ_F_BUFFER_SELECTED = IO_REQ_FLAG(REQ_F_BUFFER_SELECTED_BIT), >> /* buffer selected from ring, needs commit */ >> - REQ_F_BUFFER_RING = BIT(REQ_F_BUFFER_RING_BIT), >> + REQ_F_BUFFER_RING = IO_REQ_FLAG(REQ_F_BUFFER_RING_BIT), >> /* caller should reissue async */ >> - 
REQ_F_REISSUE = BIT(REQ_F_REISSUE_BIT), >> + REQ_F_REISSUE = IO_REQ_FLAG(REQ_F_REISSUE_BIT), >> /* supports async reads/writes */ >> - REQ_F_SUPPORT_NOWAIT = BIT(REQ_F_SUPPORT_NOWAIT_BIT), >> + REQ_F_SUPPORT_NOWAIT = IO_REQ_FLAG(REQ_F_SUPPORT_NOWAIT_BIT), >> /* regular file */ >> - REQ_F_ISREG = BIT(REQ_F_ISREG_BIT), >> + REQ_F_ISREG = IO_REQ_FLAG(REQ_F_ISREG_BIT), >> /* has creds assigned */ >> - REQ_F_CREDS = BIT(REQ_F_CREDS_BIT), >> + REQ_F_CREDS = IO_REQ_FLAG(REQ_F_CREDS_BIT), >> /* skip refcounting if not set */ >> - REQ_F_REFCOUNT = BIT(REQ_F_REFCOUNT_BIT), >> + REQ_F_REFCOUNT = IO_REQ_FLAG(REQ_F_REFCOUNT_BIT), >> /* there is a linked timeout that has to be armed */ >> - REQ_F_ARM_LTIMEOUT = BIT(REQ_F_ARM_LTIMEOUT_BIT), >> + REQ_F_ARM_LTIMEOUT = IO_REQ_FLAG(REQ_F_ARM_LTIMEOUT_BIT), >> /* ->async_data allocated */ >> - REQ_F_ASYNC_DATA = BIT(REQ_F_ASYNC_DATA_BIT), >> + REQ_F_ASYNC_DATA = IO_REQ_FLAG(REQ_F_ASYNC_DATA_BIT), >> /* don't post CQEs while failing linked requests */ >> - REQ_F_SKIP_LINK_CQES = BIT(REQ_F_SKIP_LINK_CQES_BIT), >> + REQ_F_SKIP_LINK_CQES = IO_REQ_FLAG(REQ_F_SKIP_LINK_CQES_BIT), >> /* single poll may be active */ >> - REQ_F_SINGLE_POLL = BIT(REQ_F_SINGLE_POLL_BIT), >> + REQ_F_SINGLE_POLL = IO_REQ_FLAG(REQ_F_SINGLE_POLL_BIT), >> /* double poll may active */ >> - REQ_F_DOUBLE_POLL = BIT(REQ_F_DOUBLE_POLL_BIT), >> + REQ_F_DOUBLE_POLL = IO_REQ_FLAG(REQ_F_DOUBLE_POLL_BIT), >> /* request has already done partial IO */ >> - REQ_F_PARTIAL_IO = BIT(REQ_F_PARTIAL_IO_BIT), >> + REQ_F_PARTIAL_IO = IO_REQ_FLAG(REQ_F_PARTIAL_IO_BIT), >> /* fast poll multishot mode */ >> - REQ_F_APOLL_MULTISHOT = BIT(REQ_F_APOLL_MULTISHOT_BIT), >> + REQ_F_APOLL_MULTISHOT = IO_REQ_FLAG(REQ_F_APOLL_MULTISHOT_BIT), >> /* recvmsg special flag, clear EPOLLIN */ >> - REQ_F_CLEAR_POLLIN = BIT(REQ_F_CLEAR_POLLIN_BIT), >> + REQ_F_CLEAR_POLLIN = IO_REQ_FLAG(REQ_F_CLEAR_POLLIN_BIT), >> /* hashed into ->cancel_hash_locked, protected by ->uring_lock */ >> - REQ_F_HASH_LOCKED = 
BIT(REQ_F_HASH_LOCKED_BIT), >> + REQ_F_HASH_LOCKED = IO_REQ_FLAG(REQ_F_HASH_LOCKED_BIT), >> /* don't use lazy poll wake for this request */ >> - REQ_F_POLL_NO_LAZY = BIT(REQ_F_POLL_NO_LAZY_BIT), >> + REQ_F_POLL_NO_LAZY = IO_REQ_FLAG(REQ_F_POLL_NO_LAZY_BIT), >> }; >> typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); >> @@ -592,15 +595,14 @@ struct io_kiocb { >> * and after selection it points to the buffer ID itself. >> */ >> u16 buf_index; >> - unsigned int flags; >> - struct io_cqe cqe; > > With the current layout the min number of lines we touch per > request is 2 (including the op specific 64B), that's includes > setting up cqe at init and using it for completing. Moving cqe > down makes it 3. > >> + atomic_t refs; > > We're pulling it refs, which is not touched at all in the hot > path. Even if there's a hole I'd argue it's better to leave it > at the end. > >> + >> + io_req_flags_t flags; >> struct io_ring_ctx *ctx; >> struct task_struct *task; >> - struct io_rsrc_node *rsrc_node; > > It's used in hot paths, registered buffers/files, would be > unfortunate to move it to the next line. Yep I did feel a bit bad about that one... Let me take another stab at it. >> - >> union { >> /* store used ubuf, so we can prevent reloading */ >> struct io_mapped_ubuf *imu; >> @@ -615,18 +617,23 @@ struct io_kiocb { >> struct io_buffer_list *buf_list; >> }; >> + /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */ >> + struct hlist_node hash_node; >> + > > And we're pulling hash_node into the hottest line, which is > used only when we arm a poll and remove poll. So, it's mostly > for networking, sends wouldn't use it much, and multishots > wouldn't normally touch it. > > As for ideas how to find space: > 1) iopoll_completed completed can be converted to flags2 That's a good idea, but won't immediately find any space as it'd just leave a hole anyway. 
But would be good to note in there perhaps, you never know when it needs re-arranging again. > 2) REQ_F_{SINGLE,DOUBLE}_POLL is a weird duplication. Can > probably be combined into one flag, or removed at all. > Again, sends are usually not so poll heavy and the hot > path for recv is multishot. Normal receive is also a hot path, even if multishot should be preferred in general. Ditto on non-sockets but still pollable files, doing e.g. a read. > 3) we can probably move req->task down and replace it with > > get_task() { > if (req->ctx->flags & DEFER_TASKRUN) > task = ctx->submitter_task; > else > task = req->task; > } Assuming ctx flags is hot, which it would generally be, that's not a bad idea at all. I'll do another loop over this one. -- Jens Axboe ^ permalink raw reply [flat|nested] 18+ messages in thread
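The get_task() idea quoted above can be sketched in userspace roughly as follows. The struct definitions, the flag value SKETCH_SETUP_DEFER_TASKRUN and the function name req_get_task() are assumptions made for illustration; the thread only gives the pseudocode, not a final helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ring's DEFER_TASKRUN setup flag. */
#define SKETCH_SETUP_DEFER_TASKRUN (1U << 0)

struct task { int id; };

struct ring_ctx {
	unsigned int flags;
	struct task *submitter_task;
};

struct request {
	struct ring_ctx *ctx;
	struct task *task;
};

/*
 * Pick the task for a request: with DEFER_TASKRUN the ring's submitter
 * task is used, so req->task need not sit in a hot cacheline.
 */
struct task *req_get_task(const struct request *req)
{
	if (req->ctx->flags & SKETCH_SETUP_DEFER_TASKRUN)
		return req->ctx->submitter_task;
	return req->task;
}
```

The trade-off discussed in the thread is that this adds a dependency on ctx->flags being cache-hot, which is expected to hold in the common case.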
* Re: [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits 2024-02-07 2:18 ` Jens Axboe @ 2024-02-07 3:22 ` Pavel Begunkov 0 siblings, 0 replies; 18+ messages in thread From: Pavel Begunkov @ 2024-02-07 3:22 UTC (permalink / raw) To: Jens Axboe, io-uring On 2/7/24 02:18, Jens Axboe wrote: > On 2/6/24 5:43 PM, Pavel Begunkov wrote: >> On 2/6/24 16:22, Jens Axboe wrote: >>> We're out of space here, and none of the flags are easily reclaimable. >>> Bump it to 64-bits and re-arrange the struct a bit to avoid gaps. >>> >>> Add a specific bitwise type for the request flags, io_request_flags_t. >>> This will help catch violations of casting this value to a smaller type >>> on 32-bit archs, like unsigned int. >>> >>> No functional changes intended in this patch. >>> >>> Signed-off-by: Jens Axboe <[email protected]> >>> --- ... >>> typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); >>> @@ -592,15 +595,14 @@ struct io_kiocb { >>> * and after selection it points to the buffer ID itself. >>> */ >>> u16 buf_index; >>> - unsigned int flags; >>> - struct io_cqe cqe; >> >> With the current layout the min number of lines we touch per >> request is 2 (including the op specific 64B), that's includes >> setting up cqe at init and using it for completing. Moving cqe >> down makes it 3. >> >>> + atomic_t refs; >> >> We're pulling it refs, which is not touched at all in the hot >> path. Even if there's a hole I'd argue it's better to leave it >> at the end. >> >>> + >>> + io_req_flags_t flags; >>> struct io_ring_ctx *ctx; >>> struct task_struct *task; >>> - struct io_rsrc_node *rsrc_node; >> >> It's used in hot paths, registered buffers/files, would be >> unfortunate to move it to the next line. > > Yep I did feel a bit bad about that one... Let me take another stab at > it. 
> >>> - >>> union { >>> /* store used ubuf, so we can prevent reloading */ >>> struct io_mapped_ubuf *imu; >>> @@ -615,18 +617,23 @@ struct io_kiocb { >>> struct io_buffer_list *buf_list; >>> }; >>> + /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */ >>> + struct hlist_node hash_node; >>> + >> >> And we're pulling hash_node into the hottest line, which is >> used only when we arm a poll and remove poll. So, it's mostly >> for networking, sends wouldn't use it much, and multishots >> wouldn't normally touch it. >> >> As for ideas how to find space: >> 1) iopoll_completed completed can be converted to flags2 > > That's a good idea, but won't immediately find any space as it'd just > leave a hole anyway. But would be good to note in there perhaps, you > never know when it needs re-arranging again. struct io_kiocb { unsigned flags; ... u8 flags2; }; I rather proposed to have this, which is definitely borderline ugly but certainly an option. >> 2) REQ_F_{SINGLE,DOUBLE}_POLL is a weird duplication. Can >> probably be combined into one flag, or removed at all. >> Again, sends are usually not so poll heavy and the hot >> path for recv is multishot. > > Normal receive is also a hot path, even if multishot should be preferred The degree of hotness is arguable. It's poll, which takes a spinlock (and disables irqs), does an indirect call, and goes into the socket internals, touching pretty contended parts like sock_wq. The relative overhead of looking at f_ops should be nothing. But the thought was more about combining them into one flag, REQ_F_POLL_ACTIVE, and clearing it only if it's not double poll. > in general. Ditto on non-sockets but still pollable files, doing eg read > for example. > >> 3) we can probably move req->task down and replace it with >> >> get_task() { >> if (req->ctx->flags & DEFER_TASKRUN) >> task = ctx->submitter_task; >> else >> task = req->task; >> } > > Assuming ctx flags is hot, which is would generally be, that's not a bad > idea at all. 
As mentioned, task_work_add would be the main user, and there is already a different branch for DEFER_TASKRUN, so it implicitly knows that ctx->submitter_task is correct. > I'll do another loop over this one. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/6] io_uring: add io_file_can_poll() helper 2024-02-06 16:22 [PATCHSET next 0/6] Misc cleanups / optimizations Jens Axboe 2024-02-06 16:22 ` [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits Jens Axboe @ 2024-02-06 16:22 ` Jens Axboe 2024-02-07 0:57 ` Pavel Begunkov 2024-02-06 16:22 ` [PATCH 3/6] io_uring/cancel: don't default to setting req->work.cancel_seq Jens Axboe ` (3 subsequent siblings) 5 siblings, 1 reply; 18+ messages in thread From: Jens Axboe @ 2024-02-06 16:22 UTC (permalink / raw) To: io-uring; +Cc: Jens Axboe This adds a flag to avoid dipping dereferencing file and then f_op to figure out if the file has a poll handler defined or not. We generally call this at least twice for networked workloads. Signed-off-by: Jens Axboe <[email protected]> --- include/linux/io_uring_types.h | 3 +++ io_uring/io_uring.c | 2 +- io_uring/io_uring.h | 12 ++++++++++++ io_uring/kbuf.c | 2 +- io_uring/poll.c | 2 +- io_uring/rw.c | 6 +++--- 6 files changed, 21 insertions(+), 6 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 5ac18b05d4ee..7f06cee02b58 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -463,6 +463,7 @@ enum io_req_flags { REQ_F_SUPPORT_NOWAIT_BIT, REQ_F_ISREG_BIT, REQ_F_POLL_NO_LAZY_BIT, + REQ_F_CAN_POLL_BIT, /* not a real bit, just to check we're not overflowing the space */ __REQ_F_LAST_BIT, @@ -535,6 +536,8 @@ enum { REQ_F_HASH_LOCKED = IO_REQ_FLAG(REQ_F_HASH_LOCKED_BIT), /* don't use lazy poll wake for this request */ REQ_F_POLL_NO_LAZY = IO_REQ_FLAG(REQ_F_POLL_NO_LAZY_BIT), + /* file is pollable */ + REQ_F_CAN_POLL = IO_REQ_FLAG(REQ_F_CAN_POLL_BIT), }; typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 360a7ee41d3a..d0e06784926f 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -1969,7 +1969,7 @@ void io_wq_submit_work(struct io_wq_work *work) if 
(req->flags & REQ_F_FORCE_ASYNC) { bool opcode_poll = def->pollin || def->pollout; - if (opcode_poll && file_can_poll(req->file)) { + if (opcode_poll && io_file_can_poll(req)) { needs_poll = true; issue_flags |= IO_URING_F_NONBLOCK; } diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index d5495710c178..2952551fe345 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -5,6 +5,7 @@ #include <linux/lockdep.h> #include <linux/resume_user_mode.h> #include <linux/kasan.h> +#include <linux/poll.h> #include <linux/io_uring_types.h> #include <uapi/linux/eventpoll.h> #include "io-wq.h" @@ -398,4 +399,15 @@ static inline size_t uring_sqe_size(struct io_ring_ctx *ctx) return 2 * sizeof(struct io_uring_sqe); return sizeof(struct io_uring_sqe); } + +static inline bool io_file_can_poll(struct io_kiocb *req) +{ + if (req->flags & REQ_F_CAN_POLL) + return true; + if (file_can_poll(req->file)) { + req->flags |= REQ_F_CAN_POLL; + return true; + } + return false; +} #endif diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 18df5a9d2f5e..71880615bb78 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -180,7 +180,7 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len, req->buf_list = bl; req->buf_index = buf->bid; - if (issue_flags & IO_URING_F_UNLOCKED || !file_can_poll(req->file)) { + if (issue_flags & IO_URING_F_UNLOCKED || !io_file_can_poll(req)) { /* * If we came in unlocked, we have no choice but to consume the * buffer here, otherwise nothing ensures that the buffer won't diff --git a/io_uring/poll.c b/io_uring/poll.c index 7513afc7b702..4afec733fef6 100644 --- a/io_uring/poll.c +++ b/io_uring/poll.c @@ -727,7 +727,7 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags) if (!def->pollin && !def->pollout) return IO_APOLL_ABORTED; - if (!file_can_poll(req->file)) + if (!io_file_can_poll(req)) return IO_APOLL_ABORTED; if (!(req->flags & REQ_F_APOLL_MULTISHOT)) mask |= EPOLLONESHOT; diff --git a/io_uring/rw.c 
b/io_uring/rw.c index d5e79d9bdc71..0fb7a045163a 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -682,7 +682,7 @@ static bool io_rw_should_retry(struct io_kiocb *req) * just use poll if we can, and don't attempt if the fs doesn't * support callback based unlocks */ - if (file_can_poll(req->file) || !(req->file->f_mode & FMODE_BUF_RASYNC)) + if (io_file_can_poll(req) || !(req->file->f_mode & FMODE_BUF_RASYNC)) return false; wait->wait.func = io_async_buf_func; @@ -831,7 +831,7 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags) * If we can poll, just do that. For a vectored read, we'll * need to copy state first. */ - if (file_can_poll(req->file) && !io_issue_defs[req->opcode].vectored) + if (io_file_can_poll(req) && !io_issue_defs[req->opcode].vectored) return -EAGAIN; /* IOPOLL retry should happen for io-wq threads */ if (!force_nonblock && !(req->ctx->flags & IORING_SETUP_IOPOLL)) @@ -930,7 +930,7 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags) /* * Multishot MUST be used on a pollable file */ - if (!file_can_poll(req->file)) + if (!io_file_can_poll(req)) return -EBADFD; ret = __io_read(req, issue_flags); -- 2.43.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
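A minimal userspace sketch of the caching pattern io_file_can_poll() uses in this patch: check a per-request flag first, and only dereference file and f_op when the flag is not yet set. The struct layouts, the bit position of SKETCH_REQ_F_CAN_POLL, the lookups counter and dummy_poll() are hypothetical and exist only to make the behaviour observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for struct file_operations / struct file. */
struct file_ops { int (*poll)(void); };
struct file { const struct file_ops *f_op; };

struct request {
	uint64_t flags;
	struct file *file;
};

/* Hypothetical bit position; the real one comes from the flags enum. */
#define SKETCH_REQ_F_CAN_POLL (UINT64_C(1) << 40)

/* Counts how often the slow path (two dependent loads) is taken. */
int lookups;

/* Slow path: file -> f_op -> poll, like file_can_poll(). */
static bool file_can_poll(const struct file *file)
{
	lookups++;
	return file->f_op->poll != NULL;
}

/* Fast path first; cache only the positive answer in req->flags. */
bool req_file_can_poll(struct request *req)
{
	if (req->flags & SKETCH_REQ_F_CAN_POLL)
		return true;
	if (file_can_poll(req->file)) {
		req->flags |= SKETCH_REQ_F_CAN_POLL;
		return true;
	}
	return false;
}

/* Dummy poll handler so a file can be marked pollable in a test. */
int dummy_poll(void)
{
	return 0;
}
```

Note that, as in the patch, only the "pollable" result is cached: a non-pollable file takes the slow path on every call, since a single flag bit cannot distinguish "not pollable" from "not yet checked".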
* Re: [PATCH 2/6] io_uring: add io_file_can_poll() helper 2024-02-06 16:22 ` [PATCH 2/6] io_uring: add io_file_can_poll() helper Jens Axboe @ 2024-02-07 0:57 ` Pavel Begunkov 2024-02-07 2:15 ` Jens Axboe 0 siblings, 1 reply; 18+ messages in thread From: Pavel Begunkov @ 2024-02-07 0:57 UTC (permalink / raw) To: Jens Axboe, io-uring On 2/6/24 16:22, Jens Axboe wrote: > This adds a flag to avoid dipping dereferencing file and then f_op > to figure out if the file has a poll handler defined or not. We > generally call this at least twice for networked workloads. Sends are not using poll every time. For recv, we touch it in io_arm_poll_handler(), which is done only once, and so amortised to 0 for multishots. Looking at the patch, the second time we might care about is in io_ring_buffer_select(), but I'd argue that it shouldn't be there in the first place. It's fragile, and I don't see why selected buffers would care specifically about polling but not asking more generally "can it go true async"? For reads you might want to also test FMODE_BUF_RASYNC. Also note that when called from recv we already know that it's pollable, it might be much easier to pass it in as an argument. 
> Signed-off-by: Jens Axboe <[email protected]> > --- > include/linux/io_uring_types.h | 3 +++ > io_uring/io_uring.c | 2 +- > io_uring/io_uring.h | 12 ++++++++++++ > io_uring/kbuf.c | 2 +- > io_uring/poll.c | 2 +- > io_uring/rw.c | 6 +++--- > 6 files changed, 21 insertions(+), 6 deletions(-) > > diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h > index 5ac18b05d4ee..7f06cee02b58 100644 > --- a/include/linux/io_uring_types.h > +++ b/include/linux/io_uring_types.h > @@ -463,6 +463,7 @@ enum io_req_flags { > REQ_F_SUPPORT_NOWAIT_BIT, > REQ_F_ISREG_BIT, > REQ_F_POLL_NO_LAZY_BIT, > + REQ_F_CAN_POLL_BIT, > > /* not a real bit, just to check we're not overflowing the space */ > __REQ_F_LAST_BIT, > @@ -535,6 +536,8 @@ enum { > REQ_F_HASH_LOCKED = IO_REQ_FLAG(REQ_F_HASH_LOCKED_BIT), > /* don't use lazy poll wake for this request */ > REQ_F_POLL_NO_LAZY = IO_REQ_FLAG(REQ_F_POLL_NO_LAZY_BIT), > + /* file is pollable */ > + REQ_F_CAN_POLL = IO_REQ_FLAG(REQ_F_CAN_POLL_BIT), > }; > > typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > index 360a7ee41d3a..d0e06784926f 100644 > --- a/io_uring/io_uring.c > +++ b/io_uring/io_uring.c > @@ -1969,7 +1969,7 @@ void io_wq_submit_work(struct io_wq_work *work) > if (req->flags & REQ_F_FORCE_ASYNC) { > bool opcode_poll = def->pollin || def->pollout; > > - if (opcode_poll && file_can_poll(req->file)) { > + if (opcode_poll && io_file_can_poll(req)) { > needs_poll = true; > issue_flags |= IO_URING_F_NONBLOCK; > } > diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h > index d5495710c178..2952551fe345 100644 > --- a/io_uring/io_uring.h > +++ b/io_uring/io_uring.h > @@ -5,6 +5,7 @@ > #include <linux/lockdep.h> > #include <linux/resume_user_mode.h> > #include <linux/kasan.h> > +#include <linux/poll.h> > #include <linux/io_uring_types.h> > #include <uapi/linux/eventpoll.h> > #include "io-wq.h" > @@ -398,4 +399,15 @@ static inline 
size_t uring_sqe_size(struct io_ring_ctx *ctx) > return 2 * sizeof(struct io_uring_sqe); > return sizeof(struct io_uring_sqe); > } > + > +static inline bool io_file_can_poll(struct io_kiocb *req) > +{ > + if (req->flags & REQ_F_CAN_POLL) > + return true; > + if (file_can_poll(req->file)) { > + req->flags |= REQ_F_CAN_POLL; > + return true; > + } > + return false; > +} > #endif > diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c > index 18df5a9d2f5e..71880615bb78 100644 > --- a/io_uring/kbuf.c > +++ b/io_uring/kbuf.c > @@ -180,7 +180,7 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len, > req->buf_list = bl; > req->buf_index = buf->bid; > > - if (issue_flags & IO_URING_F_UNLOCKED || !file_can_poll(req->file)) { > + if (issue_flags & IO_URING_F_UNLOCKED || !io_file_can_poll(req)) { > /* > * If we came in unlocked, we have no choice but to consume the > * buffer here, otherwise nothing ensures that the buffer won't > diff --git a/io_uring/poll.c b/io_uring/poll.c > index 7513afc7b702..4afec733fef6 100644 > --- a/io_uring/poll.c > +++ b/io_uring/poll.c > @@ -727,7 +727,7 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags) > > if (!def->pollin && !def->pollout) > return IO_APOLL_ABORTED; > - if (!file_can_poll(req->file)) > + if (!io_file_can_poll(req)) > return IO_APOLL_ABORTED; > if (!(req->flags & REQ_F_APOLL_MULTISHOT)) > mask |= EPOLLONESHOT; > diff --git a/io_uring/rw.c b/io_uring/rw.c > index d5e79d9bdc71..0fb7a045163a 100644 > --- a/io_uring/rw.c > +++ b/io_uring/rw.c > @@ -682,7 +682,7 @@ static bool io_rw_should_retry(struct io_kiocb *req) > * just use poll if we can, and don't attempt if the fs doesn't > * support callback based unlocks > */ > - if (file_can_poll(req->file) || !(req->file->f_mode & FMODE_BUF_RASYNC)) > + if (io_file_can_poll(req) || !(req->file->f_mode & FMODE_BUF_RASYNC)) > return false; > > wait->wait.func = io_async_buf_func; > @@ -831,7 +831,7 @@ static int __io_read(struct io_kiocb *req, 
unsigned int issue_flags) > * If we can poll, just do that. For a vectored read, we'll > * need to copy state first. > */ > - if (file_can_poll(req->file) && !io_issue_defs[req->opcode].vectored) > + if (io_file_can_poll(req) && !io_issue_defs[req->opcode].vectored) > return -EAGAIN; > /* IOPOLL retry should happen for io-wq threads */ > if (!force_nonblock && !(req->ctx->flags & IORING_SETUP_IOPOLL)) > @@ -930,7 +930,7 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags) > /* > * Multishot MUST be used on a pollable file > */ > - if (!file_can_poll(req->file)) > + if (!io_file_can_poll(req)) > return -EBADFD; > > ret = __io_read(req, issue_flags); -- Pavel Begunkov ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/6] io_uring: add io_file_can_poll() helper 2024-02-07 0:57 ` Pavel Begunkov @ 2024-02-07 2:15 ` Jens Axboe 2024-02-07 3:33 ` Pavel Begunkov 0 siblings, 1 reply; 18+ messages in thread From: Jens Axboe @ 2024-02-07 2:15 UTC (permalink / raw) To: Pavel Begunkov, io-uring On 2/6/24 5:57 PM, Pavel Begunkov wrote: > On 2/6/24 16:22, Jens Axboe wrote: >> This adds a flag to avoid dipping dereferencing file and then f_op >> to figure out if the file has a poll handler defined or not. We >> generally call this at least twice for networked workloads. > > Sends are not using poll every time. For recv, we touch it > in io_arm_poll_handler(), which is done only once, and so > ammortised to 0 for multishots. Correct > Looking at the patch, the second time we might care about is > in io_ring_buffer_select(), but I'd argue that it shouldn't > be there in the first place. It's fragile, and I don't see > why selected buffers would care specifically about polling > but not asking more generally "can it go true async"? For > reads you might want to also test FMODE_BUF_RASYNC. That is indeed the second case that is hit, and I don't think we can easily get around that which is the reason for the hint. > Also note that when called from recv we already know that > it's pollable, it might be much easier to pass it in as an > argument. I did think about that, but I don't see a clean way to do it. We could potentially do it as an issue flag, but that seems kind of ugly to me. Open to suggestions! -- Jens Axboe ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/6] io_uring: add io_file_can_poll() helper 2024-02-07 2:15 ` Jens Axboe @ 2024-02-07 3:33 ` Pavel Begunkov 0 siblings, 0 replies; 18+ messages in thread From: Pavel Begunkov @ 2024-02-07 3:33 UTC (permalink / raw) To: Jens Axboe, io-uring On 2/7/24 02:15, Jens Axboe wrote: > On 2/6/24 5:57 PM, Pavel Begunkov wrote: >> On 2/6/24 16:22, Jens Axboe wrote: >>> This adds a flag to avoid dipping dereferencing file and then f_op >>> to figure out if the file has a poll handler defined or not. We >>> generally call this at least twice for networked workloads. >> >> Sends are not using poll every time. For recv, we touch it >> in io_arm_poll_handler(), which is done only once, and so >> ammortised to 0 for multishots. > > Correct > >> Looking at the patch, the second time we might care about is >> in io_ring_buffer_select(), but I'd argue that it shouldn't >> be there in the first place. It's fragile, and I don't see >> why selected buffers would care specifically about polling >> but not asking more generally "can it go true async"? For >> reads you might want to also test FMODE_BUF_RASYNC. > > That is indeed the second case that is hit, and I don't think we can > easily get around that which is the reason for the hint. > >> Also note that when called from recv we already know that >> it's pollable, it might be much easier to pass it in as an >> argument. > > I did think about that, but I don't see a clean way to do it. We could > potentially do it as an issue flag, but that seems kind of ugly to me. > Open to suggestions! I'd argue passing it as an argument is much much cleaner and more robust design wise, those leaked abstractions are always fragile and unreliable. And now there is an argument that it's even faster because for recv you can just pass "true". IOW, I'd prefer here potentially a slightly uglier but safer code. 
Surely it would have been great to move this "eject buffer" thing out of the selection func and let the caller decide, but I haven't stared at the code for long enough to say anything concrete. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 18+ messages in thread
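One way to read the "pass it in as an argument" suggestion is sketched below. This is an interpretation, not code from the thread: the function name buffer_must_be_consumed() and the flag value SKETCH_F_UNLOCKED are hypothetical, and the condition simply mirrors the check in io_ring_buffer_select() with the pollable state supplied by the caller instead of derived from req->file.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for IO_URING_F_UNLOCKED. */
#define SKETCH_F_UNLOCKED (1U << 0)

/*
 * Decide whether a selected ring buffer has to be consumed right away.
 * The caller states whether the file is pollable, so no file/f_op
 * dereference is needed here; a recv path could just pass true.
 */
bool buffer_must_be_consumed(unsigned int issue_flags, bool can_poll)
{
	return (issue_flags & SKETCH_F_UNLOCKED) || !can_poll;
}
```

With this shape the selection function no longer needs to know how pollability is determined, which is the robustness argument made above; the cost is an extra parameter threaded through the callers.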
* [PATCH 3/6] io_uring/cancel: don't default to setting req->work.cancel_seq 2024-02-06 16:22 [PATCHSET next 0/6] Misc cleanups / optimizations Jens Axboe 2024-02-06 16:22 ` [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits Jens Axboe 2024-02-06 16:22 ` [PATCH 2/6] io_uring: add io_file_can_poll() helper Jens Axboe @ 2024-02-06 16:22 ` Jens Axboe 2024-02-06 16:22 ` [PATCH 4/6] io_uring: move io_kiocb->nr_tw into comp_list union Jens Axboe ` (2 subsequent siblings) 5 siblings, 0 replies; 18+ messages in thread From: Jens Axboe @ 2024-02-06 16:22 UTC (permalink / raw) To: io-uring; +Cc: Jens Axboe Just leave it unset by default, avoiding dipping into the last cacheline (which is otherwise untouched) for the fast path of using poll to drive networked traffic. Add a flag that tells us if the sequence is valid or not, and then we can defer actually assigning the flag and sequence until someone runs cancelations. Signed-off-by: Jens Axboe <[email protected]> --- include/linux/io_uring_types.h | 3 +++ io_uring/cancel.c | 3 +-- io_uring/cancel.h | 10 ++++++++++ io_uring/io_uring.c | 1 - io_uring/poll.c | 6 +----- 5 files changed, 15 insertions(+), 8 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 7f06cee02b58..69a043ff8460 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -464,6 +464,7 @@ enum io_req_flags { REQ_F_ISREG_BIT, REQ_F_POLL_NO_LAZY_BIT, REQ_F_CAN_POLL_BIT, + REQ_F_CANCEL_SEQ_BIT, /* not a real bit, just to check we're not overflowing the space */ __REQ_F_LAST_BIT, @@ -538,6 +539,8 @@ enum { REQ_F_POLL_NO_LAZY = IO_REQ_FLAG(REQ_F_POLL_NO_LAZY_BIT), /* file is pollable */ REQ_F_CAN_POLL = IO_REQ_FLAG(REQ_F_CAN_POLL_BIT), + /* cancel sequence is set and valid */ + REQ_F_CANCEL_SEQ = IO_REQ_FLAG(REQ_F_CANCEL_SEQ_BIT), }; typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); diff --git a/io_uring/cancel.c b/io_uring/cancel.c index 
8a8b07dfc444..acfcdd7f059a 100644 --- a/io_uring/cancel.c +++ b/io_uring/cancel.c @@ -58,9 +58,8 @@ bool io_cancel_req_match(struct io_kiocb *req, struct io_cancel_data *cd) return false; if (cd->flags & IORING_ASYNC_CANCEL_ALL) { check_seq: - if (cd->seq == req->work.cancel_seq) + if (io_cancel_match_sequence(req, cd->seq)) return false; - req->work.cancel_seq = cd->seq; } return true; diff --git a/io_uring/cancel.h b/io_uring/cancel.h index c0a8e7c520b6..76b32e65c03c 100644 --- a/io_uring/cancel.h +++ b/io_uring/cancel.h @@ -25,4 +25,14 @@ void init_hash_table(struct io_hash_table *table, unsigned size); int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg); bool io_cancel_req_match(struct io_kiocb *req, struct io_cancel_data *cd); +static inline bool io_cancel_match_sequence(struct io_kiocb *req, int sequence) +{ + if ((req->flags & REQ_F_CANCEL_SEQ) && sequence == req->work.cancel_seq) + return true; + + req->flags |= REQ_F_CANCEL_SEQ; + req->work.cancel_seq = sequence; + return false; +} + #endif diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index d0e06784926f..9b499864f10d 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -463,7 +463,6 @@ static void io_prep_async_work(struct io_kiocb *req) req->work.list.next = NULL; req->work.flags = 0; - req->work.cancel_seq = atomic_read(&ctx->cancel_seq); if (req->flags & REQ_F_FORCE_ASYNC) req->work.flags |= IO_WQ_WORK_CONCURRENT; diff --git a/io_uring/poll.c b/io_uring/poll.c index 4afec733fef6..3f3380dc5f68 100644 --- a/io_uring/poll.c +++ b/io_uring/poll.c @@ -588,10 +588,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req, struct io_poll_table *ipt, __poll_t mask, unsigned issue_flags) { - struct io_ring_ctx *ctx = req->ctx; - INIT_HLIST_NODE(&req->hash_node); - req->work.cancel_seq = atomic_read(&ctx->cancel_seq); io_init_poll_iocb(poll, mask); poll->file = req->file; req->apoll_events = poll->events; @@ -818,9 +815,8 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx 
*ctx, bool poll_only, if (poll_only && req->opcode != IORING_OP_POLL_ADD) continue; if (cd->flags & IORING_ASYNC_CANCEL_ALL) { - if (cd->seq == req->work.cancel_seq) + if (io_cancel_match_sequence(req, cd->seq)) continue; - req->work.cancel_seq = cd->seq; } *out_bucket = hb; return req; -- 2.43.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
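The lazy assignment described in this patch can be modelled outside the kernel. The following is only a minimal userspace sketch, assuming a simplified request with a 64-bit flags word; `struct model_req`, `MODEL_F_CANCEL_SEQ` and `model_cancel_match_sequence()` are made-up names that mirror the shape of `io_cancel_match_sequence()` from the diff above, not the kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for REQ_F_CANCEL_SEQ; the bit position is arbitrary. */
#define MODEL_F_CANCEL_SEQ (1ULL << 0)

/* Hypothetical, heavily reduced model of a request. */
struct model_req {
	uint64_t flags;
	int cancel_seq;	/* only meaningful once MODEL_F_CANCEL_SEQ is set */
};

/*
 * Returns true if this request was already visited by the cancelation
 * pass identified by 'sequence'. Otherwise records the sequence, marks
 * it valid, and returns false so the caller treats the request as new.
 */
static inline bool model_cancel_match_sequence(struct model_req *req, int sequence)
{
	if ((req->flags & MODEL_F_CANCEL_SEQ) && sequence == req->cancel_seq)
		return true;

	req->flags |= MODEL_F_CANCEL_SEQ;
	req->cancel_seq = sequence;
	return false;
}
```

Because the valid bit lives in the flags word, a request that is never targeted by a cancelation never has its `cancel_seq` field written, which is the point of the change.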
* [PATCH 4/6] io_uring: move io_kiocb->nr_tw into comp_list union 2024-02-06 16:22 [PATCHSET next 0/6] Misc cleanups / optimizations Jens Axboe ` (2 preceding siblings ...) 2024-02-06 16:22 ` [PATCH 3/6] io_uring/cancel: don't default to setting req->work.cancel_seq Jens Axboe @ 2024-02-06 16:22 ` Jens Axboe 2024-02-06 16:22 ` [PATCH 5/6] io_uring: mark the need to lock/unlock the ring as unlikely Jens Axboe 2024-02-06 16:22 ` [PATCH 6/6] io_uring/rw: remove dead file == NULL check Jens Axboe 5 siblings, 0 replies; 18+ messages in thread From: Jens Axboe @ 2024-02-06 16:22 UTC (permalink / raw) To: io-uring; +Cc: Jens Axboe comp_list is only used for completion purposes, which is why it currently shares space with apoll_events (which is only used for poll triggering). nr_tw is also not used with comp_list; it is just used for local task_list wakeup optimizations. This doesn't save any space in io_kiocb, rather it now leaves a 32-bit hole that can be used for something else, when the need arises. 
Signed-off-by: Jens Axboe <[email protected]> --- include/linux/io_uring_types.h | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 69a043ff8460..8c0742f5b57e 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -602,8 +602,6 @@ struct io_kiocb { */ u16 buf_index; - atomic_t refs; - io_req_flags_t flags; struct io_ring_ctx *ctx; @@ -629,8 +627,11 @@ struct io_kiocb { union { /* used by request caches, completion batching and iopoll */ struct io_wq_work_node comp_list; - /* cache ->apoll->events */ - __poll_t apoll_events; + struct { + /* cache ->apoll->events */ + __poll_t apoll_events; + unsigned nr_tw; + }; }; struct io_rsrc_node *rsrc_node; @@ -639,7 +640,7 @@ struct io_kiocb { struct io_task_work io_task_work; atomic_t poll_refs; - unsigned nr_tw; + atomic_t refs; /* internal polling, see IORING_FEAT_FAST_POLL */ struct async_poll *apoll; /* opcode allocated if it needs to store data for async defer */ -- 2.43.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 5/6] io_uring: mark the need to lock/unlock the ring as unlikely 2024-02-06 16:22 [PATCHSET next 0/6] Misc cleanups / optimizations Jens Axboe ` (3 preceding siblings ...) 2024-02-06 16:22 ` [PATCH 4/6] io_uring: move io_kiocb->nr_tw into comp_list union Jens Axboe @ 2024-02-06 16:22 ` Jens Axboe 2024-02-06 16:22 ` [PATCH 6/6] io_uring/rw: remove dead file == NULL check Jens Axboe 5 siblings, 0 replies; 18+ messages in thread From: Jens Axboe @ 2024-02-06 16:22 UTC (permalink / raw) To: io-uring; +Cc: Jens Axboe Any of the fast paths will already have this locked, this helper only exists to deal with io-wq invoking request issue where we do not have the ctx->uring_lock held already. This means that any common or fast path will already have this locked, mark it as such. Signed-off-by: Jens Axboe <[email protected]> --- io_uring/io_uring.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 2952551fe345..46795ee462df 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -208,7 +208,7 @@ static inline void io_ring_submit_unlock(struct io_ring_ctx *ctx, unsigned issue_flags) { lockdep_assert_held(&ctx->uring_lock); - if (issue_flags & IO_URING_F_UNLOCKED) + if (unlikely(issue_flags & IO_URING_F_UNLOCKED)) mutex_unlock(&ctx->uring_lock); } @@ -221,7 +221,7 @@ static inline void io_ring_submit_lock(struct io_ring_ctx *ctx, * The only exception is when we've detached the request and issue it * from an async worker thread, grab the lock for that case. */ - if (issue_flags & IO_URING_F_UNLOCKED) + if (unlikely(issue_flags & IO_URING_F_UNLOCKED)) mutex_lock(&ctx->uring_lock); lockdep_assert_held(&ctx->uring_lock); } -- 2.43.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
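For readers unfamiliar with the annotation, here is a rough userspace sketch of the branch hint used in this patch. It assumes GCC or Clang, since `__builtin_expect()` is a compiler builtin; `model_unlikely()`, `MODEL_F_UNLOCKED` and `struct model_ctx` are hypothetical stand-ins, and a counter replaces the real mutex.

```c
/* Same idea as the kernel's unlikely(): hint that the condition is rarely true. */
#define model_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for IO_URING_F_UNLOCKED. */
#define MODEL_F_UNLOCKED (1u << 0)

/* Hypothetical context: counts how often the slow path took the lock. */
struct model_ctx {
	int lock_depth;
};

/* Only the rare "caller does not hold the lock" case takes it here. */
static inline void model_submit_lock(struct model_ctx *ctx, unsigned issue_flags)
{
	if (model_unlikely(issue_flags & MODEL_F_UNLOCKED))
		ctx->lock_depth++;
}

/* Mirror image of model_submit_lock(). */
static inline void model_submit_unlock(struct model_ctx *ctx, unsigned issue_flags)
{
	if (model_unlikely(issue_flags & MODEL_F_UNLOCKED))
		ctx->lock_depth--;
}
```

The hint does not change behaviour; it only lets the compiler lay out the common, already-locked case as the fall-through path.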
* [PATCH 6/6] io_uring/rw: remove dead file == NULL check 2024-02-06 16:22 [PATCHSET next 0/6] Misc cleanups / optimizations Jens Axboe ` (4 preceding siblings ...) 2024-02-06 16:22 ` [PATCH 5/6] io_uring: mark the need to lock/unlock the ring as unlikely Jens Axboe @ 2024-02-06 16:22 ` Jens Axboe 5 siblings, 0 replies; 18+ messages in thread From: Jens Axboe @ 2024-02-06 16:22 UTC (permalink / raw) To: io-uring; +Cc: Jens Axboe Any read/write opcode has needs_file == true, which means that we would've failed the request long before reaching the issue stage if we didn't successfully assign a file. This check has been dead forever, and is really a leftover from generic code. Signed-off-by: Jens Axboe <[email protected]> --- io_uring/rw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/io_uring/rw.c b/io_uring/rw.c index 0fb7a045163a..8ba93fffc23a 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -721,7 +721,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode) struct file *file = req->file; int ret; - if (unlikely(!file || !(file->f_mode & mode))) + if (unlikely(!(file->f_mode & mode))) return -EBADF; if (!(req->flags & REQ_F_FIXED_FILE)) -- 2.43.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCHSET v2 0/6] Misc cleanups / optimizations @ 2024-02-07 17:17 Jens Axboe 2024-02-07 17:17 ` [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits Jens Axboe 0 siblings, 1 reply; 18+ messages in thread From: Jens Axboe @ 2024-02-07 17:17 UTC (permalink / raw) To: io-uring Hi, Nothing major in here: - Expand io_kiocb flags to 64-bits, so we can use two more bits for caching cancelation sequence and pollable state. - Misc cleanups Changes since v1: - Drop nr_tw union with comp_list, that breaks iopoll with DEFER_TASKRUN usage. - Rearrange io_kiocb again in patch 1, now just moving nr_tw up to fill the new hole, and shifting rsrc_node down to keep io_comp_list in the 2nd cacheline. - Add cleanup patch for io_req_complete_post() -- Jens Axboe ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits 2024-02-07 17:17 [PATCHSET v2 0/6] Misc cleanups / optimizations Jens Axboe @ 2024-02-07 17:17 ` Jens Axboe 2024-02-08 20:08 ` Gabriel Krisman Bertazi 0 siblings, 1 reply; 18+ messages in thread From: Jens Axboe @ 2024-02-07 17:17 UTC (permalink / raw) To: io-uring; +Cc: Jens Axboe We're out of space here, and none of the flags are easily reclaimable. Bump it to 64-bits and re-arrange the struct a bit to avoid gaps. Add a specific bitwise type for the request flags, io_request_flags_t. This will help catch violations of casting this value to a smaller type on 32-bit archs, like unsigned int. This creates a hole in the io_kiocb, so move nr_tw up and rsrc_node down to retain needing only cacheline 0 and 1 for non-polled opcodes. No functional changes intended in this patch. Signed-off-by: Jens Axboe <[email protected]> --- include/linux/io_uring_types.h | 77 ++++++++++++++++++--------------- include/trace/events/io_uring.h | 14 +++--- io_uring/filetable.h | 2 +- io_uring/io_uring.c | 9 ++-- 4 files changed, 55 insertions(+), 47 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 854ad67a5f70..56bf733d3ee6 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -468,70 +468,73 @@ enum { __REQ_F_LAST_BIT, }; +typedef u64 __bitwise io_req_flags_t; +#define IO_REQ_FLAG(bitno) ((__force io_req_flags_t) BIT_ULL((bitno))) + enum { /* ctx owns file */ - REQ_F_FIXED_FILE = BIT(REQ_F_FIXED_FILE_BIT), + REQ_F_FIXED_FILE = IO_REQ_FLAG(REQ_F_FIXED_FILE_BIT), /* drain existing IO first */ - REQ_F_IO_DRAIN = BIT(REQ_F_IO_DRAIN_BIT), + REQ_F_IO_DRAIN = IO_REQ_FLAG(REQ_F_IO_DRAIN_BIT), /* linked sqes */ - REQ_F_LINK = BIT(REQ_F_LINK_BIT), + REQ_F_LINK = IO_REQ_FLAG(REQ_F_LINK_BIT), /* doesn't sever on completion < 0 */ - REQ_F_HARDLINK = BIT(REQ_F_HARDLINK_BIT), + REQ_F_HARDLINK = IO_REQ_FLAG(REQ_F_HARDLINK_BIT), /* IOSQE_ASYNC */ - 
REQ_F_FORCE_ASYNC = BIT(REQ_F_FORCE_ASYNC_BIT), + REQ_F_FORCE_ASYNC = IO_REQ_FLAG(REQ_F_FORCE_ASYNC_BIT), /* IOSQE_BUFFER_SELECT */ - REQ_F_BUFFER_SELECT = BIT(REQ_F_BUFFER_SELECT_BIT), + REQ_F_BUFFER_SELECT = IO_REQ_FLAG(REQ_F_BUFFER_SELECT_BIT), /* IOSQE_CQE_SKIP_SUCCESS */ - REQ_F_CQE_SKIP = BIT(REQ_F_CQE_SKIP_BIT), + REQ_F_CQE_SKIP = IO_REQ_FLAG(REQ_F_CQE_SKIP_BIT), /* fail rest of links */ - REQ_F_FAIL = BIT(REQ_F_FAIL_BIT), + REQ_F_FAIL = IO_REQ_FLAG(REQ_F_FAIL_BIT), /* on inflight list, should be cancelled and waited on exit reliably */ - REQ_F_INFLIGHT = BIT(REQ_F_INFLIGHT_BIT), + REQ_F_INFLIGHT = IO_REQ_FLAG(REQ_F_INFLIGHT_BIT), /* read/write uses file position */ - REQ_F_CUR_POS = BIT(REQ_F_CUR_POS_BIT), + REQ_F_CUR_POS = IO_REQ_FLAG(REQ_F_CUR_POS_BIT), /* must not punt to workers */ - REQ_F_NOWAIT = BIT(REQ_F_NOWAIT_BIT), + REQ_F_NOWAIT = IO_REQ_FLAG(REQ_F_NOWAIT_BIT), /* has or had linked timeout */ - REQ_F_LINK_TIMEOUT = BIT(REQ_F_LINK_TIMEOUT_BIT), + REQ_F_LINK_TIMEOUT = IO_REQ_FLAG(REQ_F_LINK_TIMEOUT_BIT), /* needs cleanup */ - REQ_F_NEED_CLEANUP = BIT(REQ_F_NEED_CLEANUP_BIT), + REQ_F_NEED_CLEANUP = IO_REQ_FLAG(REQ_F_NEED_CLEANUP_BIT), /* already went through poll handler */ - REQ_F_POLLED = BIT(REQ_F_POLLED_BIT), + REQ_F_POLLED = IO_REQ_FLAG(REQ_F_POLLED_BIT), /* buffer already selected */ - REQ_F_BUFFER_SELECTED = BIT(REQ_F_BUFFER_SELECTED_BIT), + REQ_F_BUFFER_SELECTED = IO_REQ_FLAG(REQ_F_BUFFER_SELECTED_BIT), /* buffer selected from ring, needs commit */ - REQ_F_BUFFER_RING = BIT(REQ_F_BUFFER_RING_BIT), + REQ_F_BUFFER_RING = IO_REQ_FLAG(REQ_F_BUFFER_RING_BIT), /* caller should reissue async */ - REQ_F_REISSUE = BIT(REQ_F_REISSUE_BIT), + REQ_F_REISSUE = IO_REQ_FLAG(REQ_F_REISSUE_BIT), /* supports async reads/writes */ - REQ_F_SUPPORT_NOWAIT = BIT(REQ_F_SUPPORT_NOWAIT_BIT), + REQ_F_SUPPORT_NOWAIT = IO_REQ_FLAG(REQ_F_SUPPORT_NOWAIT_BIT), /* regular file */ - REQ_F_ISREG = BIT(REQ_F_ISREG_BIT), + REQ_F_ISREG = IO_REQ_FLAG(REQ_F_ISREG_BIT), /* has 
creds assigned */ - REQ_F_CREDS = BIT(REQ_F_CREDS_BIT), + REQ_F_CREDS = IO_REQ_FLAG(REQ_F_CREDS_BIT), /* skip refcounting if not set */ - REQ_F_REFCOUNT = BIT(REQ_F_REFCOUNT_BIT), + REQ_F_REFCOUNT = IO_REQ_FLAG(REQ_F_REFCOUNT_BIT), /* there is a linked timeout that has to be armed */ - REQ_F_ARM_LTIMEOUT = BIT(REQ_F_ARM_LTIMEOUT_BIT), + REQ_F_ARM_LTIMEOUT = IO_REQ_FLAG(REQ_F_ARM_LTIMEOUT_BIT), /* ->async_data allocated */ - REQ_F_ASYNC_DATA = BIT(REQ_F_ASYNC_DATA_BIT), + REQ_F_ASYNC_DATA = IO_REQ_FLAG(REQ_F_ASYNC_DATA_BIT), /* don't post CQEs while failing linked requests */ - REQ_F_SKIP_LINK_CQES = BIT(REQ_F_SKIP_LINK_CQES_BIT), + REQ_F_SKIP_LINK_CQES = IO_REQ_FLAG(REQ_F_SKIP_LINK_CQES_BIT), /* single poll may be active */ - REQ_F_SINGLE_POLL = BIT(REQ_F_SINGLE_POLL_BIT), + REQ_F_SINGLE_POLL = IO_REQ_FLAG(REQ_F_SINGLE_POLL_BIT), /* double poll may active */ - REQ_F_DOUBLE_POLL = BIT(REQ_F_DOUBLE_POLL_BIT), + REQ_F_DOUBLE_POLL = IO_REQ_FLAG(REQ_F_DOUBLE_POLL_BIT), /* request has already done partial IO */ - REQ_F_PARTIAL_IO = BIT(REQ_F_PARTIAL_IO_BIT), + REQ_F_PARTIAL_IO = IO_REQ_FLAG(REQ_F_PARTIAL_IO_BIT), /* fast poll multishot mode */ - REQ_F_APOLL_MULTISHOT = BIT(REQ_F_APOLL_MULTISHOT_BIT), + REQ_F_APOLL_MULTISHOT = IO_REQ_FLAG(REQ_F_APOLL_MULTISHOT_BIT), /* recvmsg special flag, clear EPOLLIN */ - REQ_F_CLEAR_POLLIN = BIT(REQ_F_CLEAR_POLLIN_BIT), + REQ_F_CLEAR_POLLIN = IO_REQ_FLAG(REQ_F_CLEAR_POLLIN_BIT), /* hashed into ->cancel_hash_locked, protected by ->uring_lock */ - REQ_F_HASH_LOCKED = BIT(REQ_F_HASH_LOCKED_BIT), + REQ_F_HASH_LOCKED = IO_REQ_FLAG(REQ_F_HASH_LOCKED_BIT), /* don't use lazy poll wake for this request */ - REQ_F_POLL_NO_LAZY = BIT(REQ_F_POLL_NO_LAZY_BIT), + REQ_F_POLL_NO_LAZY = IO_REQ_FLAG(REQ_F_POLL_NO_LAZY_BIT), }; typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); @@ -592,15 +595,17 @@ struct io_kiocb { * and after selection it points to the buffer ID itself. 
*/ u16 buf_index; - unsigned int flags; + + unsigned nr_tw; + + /* REQ_F_* flags */ + io_req_flags_t flags; struct io_cqe cqe; struct io_ring_ctx *ctx; struct task_struct *task; - struct io_rsrc_node *rsrc_node; - union { /* store used ubuf, so we can prevent reloading */ struct io_mapped_ubuf *imu; @@ -621,10 +626,12 @@ struct io_kiocb { /* cache ->apoll->events */ __poll_t apoll_events; }; + + struct io_rsrc_node *rsrc_node; + atomic_t refs; atomic_t poll_refs; struct io_task_work io_task_work; - unsigned nr_tw; /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */ struct hlist_node hash_node; /* internal polling, see IORING_FEAT_FAST_POLL */ diff --git a/include/trace/events/io_uring.h b/include/trace/events/io_uring.h index 69454f1f98b0..3d7704a52b73 100644 --- a/include/trace/events/io_uring.h +++ b/include/trace/events/io_uring.h @@ -148,7 +148,7 @@ TRACE_EVENT(io_uring_queue_async_work, __field( void *, req ) __field( u64, user_data ) __field( u8, opcode ) - __field( unsigned int, flags ) + __field( io_req_flags_t, flags ) __field( struct io_wq_work *, work ) __field( int, rw ) @@ -167,10 +167,10 @@ TRACE_EVENT(io_uring_queue_async_work, __assign_str(op_str, io_uring_get_opcode(req->opcode)); ), - TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%x, %s queue, work %p", + TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%lx, %s queue, work %p", __entry->ctx, __entry->req, __entry->user_data, - __get_str(op_str), - __entry->flags, __entry->rw ? "hashed" : "normal", __entry->work) + __get_str(op_str), (long) __entry->flags, + __entry->rw ? 
"hashed" : "normal", __entry->work) ); /** @@ -378,7 +378,7 @@ TRACE_EVENT(io_uring_submit_req, __field( void *, req ) __field( unsigned long long, user_data ) __field( u8, opcode ) - __field( u32, flags ) + __field( io_req_flags_t, flags ) __field( bool, sq_thread ) __string( op_str, io_uring_get_opcode(req->opcode) ) @@ -395,10 +395,10 @@ TRACE_EVENT(io_uring_submit_req, __assign_str(op_str, io_uring_get_opcode(req->opcode)); ), - TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%x, " + TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%lx, " "sq_thread %d", __entry->ctx, __entry->req, __entry->user_data, __get_str(op_str), - __entry->flags, __entry->sq_thread) + (long) __entry->flags, __entry->sq_thread) ); /* diff --git a/io_uring/filetable.h b/io_uring/filetable.h index b47adf170c31..b2435c4dca1f 100644 --- a/io_uring/filetable.h +++ b/io_uring/filetable.h @@ -17,7 +17,7 @@ int io_fixed_fd_remove(struct io_ring_ctx *ctx, unsigned int offset); int io_register_file_alloc_range(struct io_ring_ctx *ctx, struct io_uring_file_index_range __user *arg); -unsigned int io_file_get_flags(struct file *file); +io_req_flags_t io_file_get_flags(struct file *file); static inline void io_file_bitmap_clear(struct io_file_table *table, int bit) { diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index cd9a137ad6ce..b8ca907b77eb 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -1768,9 +1768,9 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags) } } -unsigned int io_file_get_flags(struct file *file) +io_req_flags_t io_file_get_flags(struct file *file) { - unsigned int res = 0; + io_req_flags_t res = 0; if (S_ISREG(file_inode(file)->i_mode)) res |= REQ_F_ISREG; @@ -2171,7 +2171,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, /* req is partially pre-initialised, see io_preinit_req() */ req->opcode = opcode = READ_ONCE(sqe->opcode); /* same numerical values with corresponding 
REQ_F_*, safe to copy */ - req->flags = sqe_flags = READ_ONCE(sqe->flags); + sqe_flags = READ_ONCE(sqe->flags); + req->flags = (io_req_flags_t) sqe_flags; req->cqe.user_data = READ_ONCE(sqe->user_data); req->file = NULL; req->rsrc_node = NULL; @@ -4153,7 +4154,7 @@ static int __init io_uring_init(void) BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8)); BUILD_BUG_ON((SQE_VALID_FLAGS | SQE_COMMON_FLAGS) != SQE_VALID_FLAGS); - BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int)); + BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof_field(struct io_kiocb, flags)); BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32)); -- 2.43.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
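To illustrate why the flag macro and the size check both have to move to the 64-bit type, here is a small userspace sketch. It assumes a C11 compiler for `_Static_assert`; `model_req_flags_t`, `MODEL_REQ_FLAG()` and the bit names are invented for this example and do not carry the sparse `__bitwise` checking that the kernel type gets.

```c
#include <stdint.h>

/* Hypothetical userspace model of a 64-bit request flags type. */
typedef uint64_t model_req_flags_t;

/* Build a single-bit mask in the full 64-bit width, in the spirit of BIT_ULL(). */
#define MODEL_REQ_FLAG(bitno) ((model_req_flags_t)1 << (bitno))

enum {
	MODEL_F_FIXED_FILE_BIT = 0,
	MODEL_F_HIGH_BIT = 33,	/* made-up bit beyond the old 32-bit limit */
	MODEL_F_LAST_BIT,	/* not a real bit, only used for the size check */
};

/* Compile-time check in the spirit of the BUILD_BUG_ON() in the patch. */
_Static_assert(MODEL_F_LAST_BIT <= 8 * sizeof(model_req_flags_t),
	       "flag bits overflow the flags word");

/* Returns nonzero if 'flag' is set in 'flags'. */
static inline int model_flag_set(model_req_flags_t flags, model_req_flags_t flag)
{
	return (flags & flag) != 0;
}
```

Passing the flags through an `unsigned int` anywhere along the way silently drops bit 33 in this model, which is the class of mistake the dedicated type is meant to make visible.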
* Re: [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits 2024-02-07 17:17 ` [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits Jens Axboe @ 2024-02-08 20:08 ` Gabriel Krisman Bertazi 2024-02-08 20:22 ` Jens Axboe 0 siblings, 1 reply; 18+ messages in thread From: Gabriel Krisman Bertazi @ 2024-02-08 20:08 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring Jens Axboe <[email protected]> writes: > - TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%x, %s queue, work %p", > + TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%lx, %s queue, work %p", > __entry->ctx, __entry->req, __entry->user_data, > - __get_str(op_str), > - __entry->flags, __entry->rw ? "hashed" : "normal", __entry->work) > + __get_str(op_str), (long) __entry->flags, Hi Jens, Minor, but on 32-bit kernel the cast is wrong since sizeof(long)==4. Afaik, io_uring still builds on 32-bit archs. If you use (unsigned long long), it will be 64 bit anywhere. > + __entry->rw ? "hashed" : "normal", __entry->work) > ); > > /** > @@ -378,7 +378,7 @@ TRACE_EVENT(io_uring_submit_req, > __field( void *, req ) > __field( unsigned long long, user_data ) > __field( u8, opcode ) > - __field( u32, flags ) > + __field( io_req_flags_t, flags ) > __field( bool, sq_thread ) > > __string( op_str, io_uring_get_opcode(req->opcode) ) > @@ -395,10 +395,10 @@ TRACE_EVENT(io_uring_submit_req, > __assign_str(op_str, io_uring_get_opcode(req->opcode)); > ), > > - TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%x, " > + TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%lx, " > "sq_thread %d", __entry->ctx, __entry->req, > __entry->user_data, __get_str(op_str), > - __entry->flags, __entry->sq_thread) > + (long) __entry->flags, __entry->sq_thread) likewise. 
> ); > > /* > diff --git a/io_uring/filetable.h b/io_uring/filetable.h > index b47adf170c31..b2435c4dca1f 100644 > --- a/io_uring/filetable.h > +++ b/io_uring/filetable.h > @@ -17,7 +17,7 @@ int io_fixed_fd_remove(struct io_ring_ctx *ctx, unsigned int offset); > int io_register_file_alloc_range(struct io_ring_ctx *ctx, > struct io_uring_file_index_range __user *arg); > > -unsigned int io_file_get_flags(struct file *file); > +io_req_flags_t io_file_get_flags(struct file *file); > > static inline void io_file_bitmap_clear(struct io_file_table *table, int bit) > { > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > index cd9a137ad6ce..b8ca907b77eb 100644 > --- a/io_uring/io_uring.c > +++ b/io_uring/io_uring.c > @@ -1768,9 +1768,9 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags) > } > } > > -unsigned int io_file_get_flags(struct file *file) > +io_req_flags_t io_file_get_flags(struct file *file) > { > - unsigned int res = 0; > + io_req_flags_t res = 0; > > if (S_ISREG(file_inode(file)->i_mode)) > res |= REQ_F_ISREG; > @@ -2171,7 +2171,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, > /* req is partially pre-initialised, see io_preinit_req() */ > req->opcode = opcode = READ_ONCE(sqe->opcode); > /* same numerical values with corresponding REQ_F_*, safe to copy */ > - req->flags = sqe_flags = READ_ONCE(sqe->flags); > + sqe_flags = READ_ONCE(sqe->flags); Did you consider that READ_ONCE won't protect from load tearing the userspace value in 32-bit architectures? It builds silently, though, and I suspect it is mostly fine in the current code, but might become a bug eventually. Thanks, -- Gabriel Krisman Bertazi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits 2024-02-08 20:08 ` Gabriel Krisman Bertazi @ 2024-02-08 20:22 ` Jens Axboe 2024-02-08 20:52 ` Gabriel Krisman Bertazi 0 siblings, 1 reply; 18+ messages in thread From: Jens Axboe @ 2024-02-08 20:22 UTC (permalink / raw) To: Gabriel Krisman Bertazi; +Cc: io-uring On 2/8/24 1:08 PM, Gabriel Krisman Bertazi wrote: > Jens Axboe <[email protected]> writes: > > >> - TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%x, %s queue, work %p", >> + TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%lx, %s queue, work %p", >> __entry->ctx, __entry->req, __entry->user_data, >> - __get_str(op_str), >> - __entry->flags, __entry->rw ? "hashed" : "normal", __entry->work) >> + __get_str(op_str), (long) __entry->flags, > > Hi Jens, > > Minor, but on 32-bit kernel the cast is wrong since > sizeof(long)==4. Afaik, io_uring still builds on 32-bit archs. > > If you use (unsigned long long), it will be 64 bit anywhere. Ah thanks, I'll make that edit. >> @@ -2171,7 +2171,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, >> /* req is partially pre-initialised, see io_preinit_req() */ >> req->opcode = opcode = READ_ONCE(sqe->opcode); >> /* same numerical values with corresponding REQ_F_*, safe to copy */ >> - req->flags = sqe_flags = READ_ONCE(sqe->flags); >> + sqe_flags = READ_ONCE(sqe->flags); > > Did you consider that READ_ONCE won't protect from load tearing the > userspace value in 32-bit architectures? It builds silently, though, and > I suspect it is mostly fine in the current code, but might become a bug > eventually. sqe->flags is just a byte, so no tearing is possible here. The only thing that changed type is req->flags. -- Jens Axboe ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/6] io_uring: expand main struct io_kiocb flags to 64-bits 2024-02-08 20:22 ` Jens Axboe @ 2024-02-08 20:52 ` Gabriel Krisman Bertazi 0 siblings, 0 replies; 18+ messages in thread From: Gabriel Krisman Bertazi @ 2024-02-08 20:52 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring Jens Axboe <[email protected]> writes: > On 2/8/24 1:08 PM, Gabriel Krisman Bertazi wrote: >> Jens Axboe <[email protected]> writes: >> >> >>> - TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%x, %s queue, work %p", >>> + TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%lx, %s queue, work %p", >>> __entry->ctx, __entry->req, __entry->user_data, >>> - __get_str(op_str), >>> - __entry->flags, __entry->rw ? "hashed" : "normal", __entry->work) >>> + __get_str(op_str), (long) __entry->flags, >> >> Hi Jens, >> >> Minor, but on 32-bit kernel the cast is wrong since >> sizeof(long)==4. Afaik, io_uring still builds on 32-bit archs. >> >> If you use (unsigned long long), it will be 64 bit anywhere. > > Ah thanks, I'll make that edit. > >>> @@ -2171,7 +2171,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, >>> /* req is partially pre-initialised, see io_preinit_req() */ >>> req->opcode = opcode = READ_ONCE(sqe->opcode); >>> /* same numerical values with corresponding REQ_F_*, safe to copy */ >>> - req->flags = sqe_flags = READ_ONCE(sqe->flags); >>> + sqe_flags = READ_ONCE(sqe->flags); >> >> Did you consider that READ_ONCE won't protect from load tearing the >> userspace value in 32-bit architectures? It builds silently, though, and >> I suspect it is mostly fine in the current code, but might become a bug >> eventually. > > sqe->flags is just a byte, so no tearing is possible here. The only > thing that changed type is req->flags. You're right, of course. I confused the source of the read with struct io_kiocb. -- Gabriel Krisman Bertazi ^ permalink raw reply [flat|nested] 18+ messages in thread
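The format-string point raised in this sub-thread can be shown with a short userspace sketch. It is only an illustration using standard `snprintf()`; `model_format_flags()` is a hypothetical helper, not the tracepoint code.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a 64-bit flags value portably. Casting to unsigned long long and
 * using %llx prints all 64 bits on both 32-bit and 64-bit targets, whereas
 * a (long) cast with %lx would truncate wherever sizeof(long) == 4.
 */
static int model_format_flags(char *buf, size_t len, uint64_t flags)
{
	return snprintf(buf, len, "flags 0x%llx", (unsigned long long) flags);
}
```

With this helper, a value that has bit 33 set keeps that bit in the output regardless of the width of `long` on the target.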