From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42B0AEB64DD for ; Thu, 13 Jul 2023 04:35:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233617AbjGMEfq (ORCPT ); Thu, 13 Jul 2023 00:35:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41112 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233855AbjGMEfj (ORCPT ); Thu, 13 Jul 2023 00:35:39 -0400 Received: from out-3.mta1.migadu.com (out-3.mta1.migadu.com [IPv6:2001:41d0:203:375::3]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EAC22213E for ; Wed, 12 Jul 2023 21:35:16 -0700 (PDT) Message-ID: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1689222915; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=c6klrh59rWQ6J2FCmK1SmLHxhKSESr6uzI/AZ5wCkHE=; b=biGk8zUQouiledIIofF3VyrBUK83SZXKN1mzNsa4PkYS89OSh2hMlBHKsq70Uo5XgssaP5 6X/lFlW+J+X0afBUCjKPz+QSXikBzcRfFQNT/m52VoUYSt79cOs+DWeqma+MGnytJH4a8d hqFj835t/oILTmamataDBPpvdVefPcQ= Date: Thu, 13 Jul 2023 12:35:07 +0800 MIME-Version: 1.0 Subject: Re: [PATCH 3/3] io_uring: add support for getdents Content-Language: en-US To: Christian Brauner Cc: io-uring@vger.kernel.org, Jens Axboe , Dominique Martinet , Pavel Begunkov , Alexander Viro , Stefan Roesch , Clay Harris , Dave Chinner , linux-fsdevel@vger.kernel.org, Wanpeng Li References: <20230711114027.59945-1-hao.xu@linux.dev> <20230711114027.59945-4-hao.xu@linux.dev> <20230712-alltag-abberufen-67a615152bee@brauner> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Hao Xu In-Reply-To: <20230712-alltag-abberufen-67a615152bee@brauner> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org On 7/12/23 23:27, Christian Brauner wrote: > On Tue, Jul 11, 2023 at 07:40:27PM +0800, Hao Xu wrote: >> From: Hao Xu >> >> This add support for getdents64 to io_uring, acting exactly like the >> syscall: the directory is iterated from it's current's position as >> stored in the file struct, and the file's position is updated exactly as >> if getdents64 had been called. >> >> For filesystems that support NOWAIT in iterate_shared(), try to use it >> first; if a user already knows the filesystem they use do not support >> nowait they can force async through IOSQE_ASYNC in the sqe flags, >> avoiding the need to bounce back through a useless EAGAIN return. >> >> Co-developed-by: Dominique Martinet >> Signed-off-by: Dominique Martinet >> Signed-off-by: Hao Xu >> --- >> include/uapi/linux/io_uring.h | 7 ++++ >> io_uring/fs.c | 60 +++++++++++++++++++++++++++++++++++ >> io_uring/fs.h | 3 ++ >> io_uring/opdef.c | 8 +++++ >> 4 files changed, 78 insertions(+) >> >> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h >> index 08720c7bd92f..6c0d521135a6 100644 >> --- a/include/uapi/linux/io_uring.h >> +++ b/include/uapi/linux/io_uring.h >> @@ -65,6 +65,7 @@ struct io_uring_sqe { >> __u32 xattr_flags; >> __u32 msg_ring_flags; >> __u32 uring_cmd_flags; >> + __u32 getdents_flags; >> }; >> __u64 user_data; /* data to be passed back at completion time */ >> /* pack this to avoid bogus arm OABI complaints */ >> @@ -235,6 +236,7 @@ enum io_uring_op { >> IORING_OP_URING_CMD, >> IORING_OP_SEND_ZC, >> IORING_OP_SENDMSG_ZC, >> + IORING_OP_GETDENTS, >> >> /* this goes last, obviously */ >> IORING_OP_LAST, >> @@ -273,6 +275,11 @@ enum io_uring_op { >> */ >> #define SPLICE_F_FD_IN_FIXED (1U << 31) /* the last bit of __u32 */ >> >> +/* >> + * sqe->getdents_flags >> + */ >> +#define IORING_GETDENTS_REWIND (1U << 0) >> + >> /* >> * POLL_ADD flags. Note that since sqe->poll_events is the flag space, the >> * command flags for POLL_ADD are stored in sqe->len. >> diff --git a/io_uring/fs.c b/io_uring/fs.c >> index f6a69a549fd4..77f00577e09c 100644 >> --- a/io_uring/fs.c >> +++ b/io_uring/fs.c >> @@ -47,6 +47,13 @@ struct io_link { >> int flags; >> }; >> >> +struct io_getdents { >> + struct file *file; >> + struct linux_dirent64 __user *dirent; >> + unsigned int count; >> + int flags; >> +}; >> + >> int io_renameat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) >> { >> struct io_rename *ren = io_kiocb_to_cmd(req, struct io_rename); >> @@ -291,3 +298,56 @@ void io_link_cleanup(struct io_kiocb *req) >> putname(sl->oldpath); >> putname(sl->newpath); >> } >> + >> +int io_getdents_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) >> +{ >> + struct io_getdents *gd = io_kiocb_to_cmd(req, struct io_getdents); >> + >> + if (READ_ONCE(sqe->off) != 0) >> + return -EINVAL; >> + >> + gd->dirent = u64_to_user_ptr(READ_ONCE(sqe->addr)); >> + gd->count = READ_ONCE(sqe->len); >> + >> + return 0; >> +} >> + >> +int io_getdents(struct io_kiocb *req, unsigned int issue_flags) >> +{ >> + struct io_getdents *gd = io_kiocb_to_cmd(req, struct io_getdents); >> + struct file *file; >> + unsigned long getdents_flags = 0; >> + bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK; >> + bool should_lock = false; >> + int ret; >> + >> + if (force_nonblock) { >> + if (!(req->file->f_mode & FMODE_NOWAIT)) >> + return -EAGAIN; >> + >> + getdents_flags = DIR_CONTEXT_F_NOWAIT; > > I mentioned this on the other patch but it seems really pointless to > have that extra flag. I would really like to hear a good reason for > this. > >> + } >> + >> + file = req->file; >> + if (file && (file->f_mode & FMODE_ATOMIC_POS)) { >> + if (file_count(file) > 1) > > Assume we have a regular non-threaded process that just opens an fd to a > file. The process registers an async readdir request via that fd for the > file with io_uring and goes to do other stuff while waiting for the > result. > > Some time later, io_uring gets to io_getdents() and the task is still > single threaded and the file hasn't been shared in the meantime. So > io_getdents() doesn't take the lock and starts the readdir() call. > > Concurrently, the process that registered the io_uring request was free > to do other stuff and issued a synchronous readdir() system call which > calls fdget_pos(). Since the fdtable still isn't shared it doesn't > increment f_count and doesn't acquire the mutex. Now there's another > concurrent readdir() going on. > > (Similar thing can happen if the process creates a thread for example.) > > Two readdir() requests now proceed concurrently which is not intended. > Now to verify that this race can't happen with io_uring: > > * regular fds: > It seems that io_uring calls fget() on each regular file descriptor > when an async request is registered. So that means that io_uring > always hold its own explicit reference here. > So as long as the original task is alive or another thread is alive > f_count is guaranteed to be > 1 and so the mutex would always be > acquired. > > If the registering process dies right before io_uring gets to the > io_getdents() request no other process can steal the fd anymore and in > that case the readdir call would not lock. But that's fine. > > * fixed fds: > I don't know the reference counting rules here. io_uring would need to > ensure that it's impossible for two async readdir requests via a fixed > fd to race because f_count is == 1. > > Iiuc, if a process registers a file it opened as a fixed file and > immediately closes the fd afterwards - without anyone else holding a > reference to that file - and only uses the fixed fd going forward, the > f_count of that file in io_uring's fixed file table is always 1. > > So one could issue any number of concurrent readdir requests with no > mutual exclusion. So for fixed files there definitely is a race, no? Hi Christian, The ref logic for fixed file is that it does fdget() when registering the file, and fdput() when unregistering it. So the ref in between is always > 1. The fixed file feature is to reduce frequent fdget/fdput, but it does call them at the register/unregister time. > > All of that could ofc be simplified if we could just always acquire the > mutex in fdget_pos() and other places and drop that file_count(file) > 1 > optimization everywhere. But I have no idea if the optimization for not > acquiring the mutex if f_count == 1 is worth it? > > I hope I didn't confuse myself here. > > Jens, do yo have any input here? > >> + should_lock = true; >> + } >> + if (should_lock) { >> + if (!force_nonblock) >> + mutex_lock(&file->f_pos_lock); >> + else if (!mutex_trylock(&file->f_pos_lock)) >> + return -EAGAIN; >> + } > > Open-coding this seems extremely brittle with an invitation for subtle > bugs. Could you elaborate on this, I'm not sure that I understand it quite well. Sorry for my poor English. Thanks, Hao > >> + >> + ret = vfs_getdents(file, gd->dirent, gd->count, getdents_flags); >> + if (should_lock) >> + mutex_unlock(&file->f_pos_lock); >> + >> + if (ret == -EAGAIN && force_nonblock) >> + return -EAGAIN; >> + >> + io_req_set_res(req, ret, 0); >> + return 0; >> +} >> + >> diff --git a/io_uring/fs.h b/io_uring/fs.h >> index 0bb5efe3d6bb..f83a6f3a678d 100644 >> --- a/io_uring/fs.h >> +++ b/io_uring/fs.h >> @@ -18,3 +18,6 @@ int io_symlinkat(struct io_kiocb *req, unsigned int issue_flags); >> int io_linkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); >> int io_linkat(struct io_kiocb *req, unsigned int issue_flags); >> void io_link_cleanup(struct io_kiocb *req); >> + >> +int io_getdents_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); >> +int io_getdents(struct io_kiocb *req, unsigned int issue_flags); >> diff --git a/io_uring/opdef.c b/io_uring/opdef.c >> index 3b9c6489b8b6..1bae6b2a8d0b 100644 >> --- a/io_uring/opdef.c >> +++ b/io_uring/opdef.c >> @@ -428,6 +428,11 @@ const struct io_issue_def io_issue_defs[] = { >> .prep = io_eopnotsupp_prep, >> #endif >> }, >> + [IORING_OP_GETDENTS] = { >> + .needs_file = 1, >> + .prep = io_getdents_prep, >> + .issue = io_getdents, >> + }, >> }; >> >> >> @@ -648,6 +653,9 @@ const struct io_cold_def io_cold_defs[] = { >> .fail = io_sendrecv_fail, >> #endif >> }, >> + [IORING_OP_GETDENTS] = { >> + .name = "GETDENTS", >> + }, >> }; >> >> const char *io_uring_get_opcode(u8 opcode) >> -- >> 2.25.1 >>