public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCHSET v2 0/6] Support selectable file descriptors
@ 2020-03-04 18:00 Jens Axboe
  2020-03-04 18:00 ` [PATCH 1/6] fs: openat2: Extend open_how to allow userspace-selected fds Jens Axboe
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Jens Axboe @ 2020-03-04 18:00 UTC (permalink / raw)
  To: io-uring; +Cc: jlayton, josh

One of the fabled features with chains has long been the desire to
support things like:

<open fileX><read from fileX><close fileX>

in a single chain. This currently doesn't work, since the read/close
depends on what file descriptor we get on open.

The original attempt at solving this provided a means to pass
descriptors between chains in a link, this version takes a different
route. Based on Josh's support for O_SPECIFIC_FD, we can instead control
what fd value we're going to get out of open (or accept). With that in
place, we don't need to do any magic to make this work. The above chain
then becomes:

<open fileX with fd Y><read from fd Y><close fd Y>

which is a lot more useful, and allows any sort of weird chains without
needing to nest "last open" file descriptors.

Updated the test program to use this approach:

https://git.kernel.dk/cgit/liburing/plain/test/orc.c?h=fd-select

which forces the use of fd==89 for the open, and then uses that for the
read and close.

Outside of this adaptation, fixed a few bugs and cleaned things up.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] fs: openat2: Extend open_how to allow userspace-selected fds
  2020-03-04 18:00 [PATCHSET v2 0/6] Support selectable file descriptors Jens Axboe
@ 2020-03-04 18:00 ` Jens Axboe
  2020-03-04 18:00 ` [PATCH 2/6] io_uring: move CLOSE req->file checking into handler Jens Axboe
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2020-03-04 18:00 UTC (permalink / raw)
  To: io-uring; +Cc: jlayton, josh, Jens Axboe

From: Josh Triplett <[email protected]>

Inspired by the X protocol's handling of XIDs, allow userspace to select
the file descriptor opened by openat2, so that it can use the resulting
file descriptor in subsequent system calls without waiting for the
response to openat2.

In io_uring, this allows sequences like openat2/read/close without
waiting for the openat2 to complete. Multiple such sequences can
overlap, as long as each uses a distinct file descriptor.

Add a new O_SPECIFIC_FD open flag to enable this behavior, only accepted
by openat2 for now (ignored by open/openat like all unknown flags). Add
an fd field to struct open_how (along with appropriate padding, and
verify that the padding is 0 to allow replacing the padding with a field
in the future).

The file table has a corresponding new function
get_specific_unused_fd_flags, which gets the specified file descriptor
if O_SPECIFIC_FD is set (and the fd isn't -1); otherwise it falls back
to get_unused_fd_flags, to simplify callers.

The specified file descriptor must not already be open; if it is,
get_specific_unused_fd_flags will fail with -EBUSY. This helps catch
userspace errors.

When O_SPECIFIC_FD is set, and fd is not -1, openat2 will use the
specified file descriptor rather than finding the lowest available one.

Test program:

    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        struct open_how how = { .flags = O_RDONLY | O_SPECIFIC_FD, .fd = 42 };
        int fd = openat2(AT_FDCWD, "/dev/null", &how, sizeof(how));
        if (fd < 0)
            err(1, "openat2");
        printf("fd=%d\n", fd); // prints fd=42
        return 0;
    }

Signed-off-by: Josh Triplett <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
---
 fs/fcntl.c                       |  2 +-
 fs/file.c                        | 33 ++++++++++++++++++++++++++++++++
 fs/io_uring.c                    |  2 +-
 fs/open.c                        |  6 ++++--
 include/linux/fcntl.h            |  5 +++--
 include/linux/file.h             |  1 +
 include/uapi/asm-generic/fcntl.h |  4 ++++
 include/uapi/linux/openat2.h     |  2 ++
 8 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 9bc167562ee8..1396bf8d9357 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -1031,7 +1031,7 @@ static int __init fcntl_init(void)
 	 * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY
 	 * is defined as O_NONBLOCK on some platforms and not on others.
 	 */
-	BUILD_BUG_ON(21 - 1 /* for O_RDONLY being 0 */ !=
+	BUILD_BUG_ON(22 - 1 /* for O_RDONLY being 0 */ !=
 		HWEIGHT32(
 			(VALID_OPEN_FLAGS & ~(O_NONBLOCK | O_NDELAY)) |
 			__FMODE_EXEC | __FMODE_NONOTIFY));
diff --git a/fs/file.c b/fs/file.c
index a364e1a9b7e8..1986d82fcf8f 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -564,6 +564,39 @@ void put_unused_fd(unsigned int fd)
 
 EXPORT_SYMBOL(put_unused_fd);
 
+int get_specific_unused_fd_flags(unsigned int fd, unsigned flags)
+{
+	int ret;
+	struct fdtable *fdt;
+	struct files_struct *files = current->files;
+
+	if (!(flags & O_SPECIFIC_FD) || fd == -1)
+		return get_unused_fd_flags(flags);
+
+	if (fd >= rlimit(RLIMIT_NOFILE))
+		return -EBADF;
+
+	spin_lock(&files->file_lock);
+	ret = expand_files(files, fd);
+	if (unlikely(ret < 0))
+		goto out_unlock;
+	fdt = files_fdtable(files);
+	if (fdt->fd[fd]) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+	__set_open_fd(fd, fdt);
+	if (flags & O_CLOEXEC)
+		__set_close_on_exec(fd, fdt);
+	else
+		__clear_close_on_exec(fd, fdt);
+	ret = fd;
+
+out_unlock:
+	spin_unlock(&files->file_lock);
+	return ret;
+}
+
 /*
  * Install a file pointer in the fd array.
  *
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6b32f45d3612..0fcd6968cf0f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2962,7 +2962,7 @@ static int io_openat2(struct io_kiocb *req, bool force_nonblock)
 	if (ret)
 		goto err;
 
-	ret = get_unused_fd_flags(req->open.how.flags);
+	ret = get_specific_unused_fd_flags(req->open.how.fd, req->open.how.flags);
 	if (ret < 0)
 		goto err;
 
diff --git a/fs/open.c b/fs/open.c
index 0788b3715731..570166eb11eb 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -961,7 +961,7 @@ EXPORT_SYMBOL(open_with_fake_path);
 inline struct open_how build_open_how(int flags, umode_t mode)
 {
 	struct open_how how = {
-		.flags = flags & VALID_OPEN_FLAGS,
+		.flags = flags & VALID_OPEN_FLAGS & ~O_SPECIFIC_FD,
 		.mode = mode & S_IALLUGO,
 	};
 
@@ -1144,7 +1144,7 @@ static long do_sys_openat2(int dfd, const char __user *filename,
 	if (IS_ERR(tmp))
 		return PTR_ERR(tmp);
 
-	fd = get_unused_fd_flags(how->flags);
+	fd = get_specific_unused_fd_flags(how->fd, how->flags);
 	if (fd >= 0) {
 		struct file *f = do_filp_open(dfd, tmp, &op);
 		if (IS_ERR(f)) {
@@ -1194,6 +1194,8 @@ SYSCALL_DEFINE4(openat2, int, dfd, const char __user *, filename,
 	err = copy_struct_from_user(&tmp, sizeof(tmp), how, usize);
 	if (err)
 		return err;
+	if (tmp.pad != 0)
+		return -EINVAL;
 
 	/* O_LARGEFILE is only allowed for non-O_PATH. */
 	if (!(tmp.flags & O_PATH) && force_o_largefile())
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index 7bcdcf4f6ab2..728849bcd8fa 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -10,7 +10,7 @@
 	(O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC | \
 	 O_APPEND | O_NDELAY | O_NONBLOCK | O_NDELAY | __O_SYNC | O_DSYNC | \
 	 FASYNC	| O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \
-	 O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE)
+	 O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE | O_SPECIFIC_FD)
 
 /* List of all valid flags for the how->upgrade_mask argument: */
 #define VALID_UPGRADE_FLAGS \
@@ -23,7 +23,8 @@
 
 /* List of all open_how "versions". */
 #define OPEN_HOW_SIZE_VER0	24 /* sizeof first published struct */
-#define OPEN_HOW_SIZE_LATEST	OPEN_HOW_SIZE_VER0
+#define OPEN_HOW_SIZE_VER1	32 /* added fd and pad */
+#define OPEN_HOW_SIZE_LATEST	OPEN_HOW_SIZE_VER1
 
 #ifndef force_o_largefile
 #define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T))
diff --git a/include/linux/file.h b/include/linux/file.h
index c6c7b24ea9f7..2bf699b36506 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -87,6 +87,7 @@ extern void set_close_on_exec(unsigned int fd, int flag);
 extern bool get_close_on_exec(unsigned int fd);
 extern int get_unused_fd_flags(unsigned flags);
 extern void put_unused_fd(unsigned int fd);
+extern int get_specific_unused_fd_flags(unsigned int fd, unsigned flags);
 
 extern void fd_install(unsigned int fd, struct file *file);
 
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index 9dc0bf0c5a6e..d3de5b8b3955 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -89,6 +89,10 @@
 #define __O_TMPFILE	020000000
 #endif
 
+#ifndef O_SPECIFIC_FD
+#define O_SPECIFIC_FD	01000000000	/* open as specified fd */
+#endif
+
 /* a horrid kludge trying to make sure that this will fail on old kernels */
 #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
 #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)      
diff --git a/include/uapi/linux/openat2.h b/include/uapi/linux/openat2.h
index 58b1eb711360..50d1206b64c2 100644
--- a/include/uapi/linux/openat2.h
+++ b/include/uapi/linux/openat2.h
@@ -20,6 +20,8 @@ struct open_how {
 	__u64 flags;
 	__u64 mode;
 	__u64 resolve;
+	__s32 fd;
+	__u32 pad; /* Must be 0 in the current version */
 };
 
 /* how->resolve flags for openat2(2). */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/6] io_uring: move CLOSE req->file checking into handler
  2020-03-04 18:00 [PATCHSET v2 0/6] Support selectable file descriptors Jens Axboe
  2020-03-04 18:00 ` [PATCH 1/6] fs: openat2: Extend open_how to allow userspace-selected fds Jens Axboe
@ 2020-03-04 18:00 ` Jens Axboe
  2020-03-04 18:00 ` [PATCH 3/6] io_uring: move read/write side file based prep into op handler Jens Axboe
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2020-03-04 18:00 UTC (permalink / raw)
  To: io-uring; +Cc: jlayton, josh, Jens Axboe

In preparation for not needing req->file in on the prep side at all.

Signed-off-by: Jens Axboe <[email protected]>
---
 fs/io_uring.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0fcd6968cf0f..c29a721114e0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3367,10 +3367,8 @@ static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return -EBADF;
 
 	req->close.fd = READ_ONCE(sqe->fd);
-	if (req->file->f_op == &io_uring_fops ||
-	    req->close.fd == req->ctx->ring_fd)
+	if (req->close.fd == req->ctx->ring_fd)
 		return -EBADF;
-
 	return 0;
 }
 
@@ -3400,6 +3398,9 @@ static int io_close(struct io_kiocb *req, bool force_nonblock)
 {
 	int ret;
 
+	if (req->file->f_op == &io_uring_fops)
+		return -EBADF;
+
 	req->close.put_file = NULL;
 	ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
 	if (ret < 0)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/6] io_uring: move read/write side file based prep into op handler
  2020-03-04 18:00 [PATCHSET v2 0/6] Support selectable file descriptors Jens Axboe
  2020-03-04 18:00 ` [PATCH 1/6] fs: openat2: Extend open_how to allow userspace-selected fds Jens Axboe
  2020-03-04 18:00 ` [PATCH 2/6] io_uring: move CLOSE req->file checking into handler Jens Axboe
@ 2020-03-04 18:00 ` Jens Axboe
  2020-03-04 18:00 ` [PATCH 4/6] io_uring: support deferred retrival of file from fd Jens Axboe
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2020-03-04 18:00 UTC (permalink / raw)
  To: io-uring; +Cc: jlayton, josh, Jens Axboe

In preparation for not needing req->file in on the prep side at all.

Signed-off-by: Jens Axboe <[email protected]>
---
 fs/io_uring.c | 73 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 29 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index c29a721114e0..ed70ac09ca18 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2030,37 +2030,19 @@ static bool io_file_supports_async(struct file *file)
 	return false;
 }
 
-static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
-		      bool force_nonblock)
+static int io_prep_rw(struct io_kiocb *req, bool force_nonblock)
 {
-	struct io_ring_ctx *ctx = req->ctx;
 	struct kiocb *kiocb = &req->rw.kiocb;
-	unsigned ioprio;
-	int ret;
 
 	if (S_ISREG(file_inode(req->file)->i_mode))
 		req->flags |= REQ_F_ISREG;
 
-	kiocb->ki_pos = READ_ONCE(sqe->off);
 	if (kiocb->ki_pos == -1 && !(req->file->f_mode & FMODE_STREAM)) {
 		req->flags |= REQ_F_CUR_POS;
 		kiocb->ki_pos = req->file->f_pos;
 	}
 	kiocb->ki_hint = ki_hint_validate(file_write_hint(kiocb->ki_filp));
-	kiocb->ki_flags = iocb_flags(kiocb->ki_filp);
-	ret = kiocb_set_rw_flags(kiocb, READ_ONCE(sqe->rw_flags));
-	if (unlikely(ret))
-		return ret;
-
-	ioprio = READ_ONCE(sqe->ioprio);
-	if (ioprio) {
-		ret = ioprio_check_cap(ioprio);
-		if (ret)
-			return ret;
-
-		kiocb->ki_ioprio = ioprio;
-	} else
-		kiocb->ki_ioprio = get_current_ioprio();
+	kiocb->ki_flags |= iocb_flags(kiocb->ki_filp);
 
 	/* don't allow async punt if RWF_NOWAIT was requested */
 	if ((kiocb->ki_flags & IOCB_NOWAIT) ||
@@ -2070,7 +2052,7 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	if (force_nonblock)
 		kiocb->ki_flags |= IOCB_NOWAIT;
 
-	if (ctx->flags & IORING_SETUP_IOPOLL) {
+	if (req->ctx->flags & IORING_SETUP_IOPOLL) {
 		if (!(kiocb->ki_flags & IOCB_DIRECT) ||
 		    !kiocb->ki_filp->f_op->iopoll)
 			return -EOPNOTSUPP;
@@ -2084,6 +2066,31 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		kiocb->ki_complete = io_complete_rw;
 	}
 
+	return 0;
+}
+
+static int io_sqe_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct kiocb *kiocb = &req->rw.kiocb;
+	unsigned ioprio;
+	int ret;
+
+	kiocb->ki_pos = READ_ONCE(sqe->off);
+	kiocb->ki_flags = 0;
+	ret = kiocb_set_rw_flags(kiocb, READ_ONCE(sqe->rw_flags));
+	if (unlikely(ret))
+		return ret;
+
+	ioprio = READ_ONCE(sqe->ioprio);
+	if (ioprio) {
+		ret = ioprio_check_cap(ioprio);
+		if (ret)
+			return ret;
+
+		kiocb->ki_ioprio = ioprio;
+	} else
+		kiocb->ki_ioprio = get_current_ioprio();
+
 	req->rw.addr = READ_ONCE(sqe->addr);
 	req->rw.len = READ_ONCE(sqe->len);
 	/* we own ->private, reuse it for the buffer index  / buffer ID */
@@ -2487,13 +2494,10 @@ static int io_read_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	struct iov_iter iter;
 	ssize_t ret;
 
-	ret = io_prep_rw(req, sqe, force_nonblock);
+	ret = io_sqe_prep_rw(req, sqe);
 	if (ret)
 		return ret;
 
-	if (unlikely(!(req->file->f_mode & FMODE_READ)))
-		return -EBADF;
-
 	/* either don't need iovec imported or already have it */
 	if (!req->io || req->flags & REQ_F_NEED_CLEANUP)
 		return 0;
@@ -2518,6 +2522,13 @@ static int io_read(struct io_kiocb *req, bool force_nonblock)
 	size_t iov_count;
 	ssize_t io_size, ret;
 
+	if (unlikely(!(req->file->f_mode & FMODE_READ)))
+		return -EBADF;
+
+	ret = io_prep_rw(req, force_nonblock);
+	if (ret)
+		return ret;
+
 	ret = io_import_iovec(READ, req, &iovec, &iter, !force_nonblock);
 	if (ret < 0)
 		return ret;
@@ -2576,13 +2587,10 @@ static int io_write_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	struct iov_iter iter;
 	ssize_t ret;
 
-	ret = io_prep_rw(req, sqe, force_nonblock);
+	ret = io_sqe_prep_rw(req, sqe);
 	if (ret)
 		return ret;
 
-	if (unlikely(!(req->file->f_mode & FMODE_WRITE)))
-		return -EBADF;
-
 	/* either don't need iovec imported or already have it */
 	if (!req->io || req->flags & REQ_F_NEED_CLEANUP)
 		return 0;
@@ -2607,6 +2615,13 @@ static int io_write(struct io_kiocb *req, bool force_nonblock)
 	size_t iov_count;
 	ssize_t ret, io_size;
 
+	if (unlikely(!(req->file->f_mode & FMODE_WRITE)))
+		return -EBADF;
+
+	ret = io_prep_rw(req, force_nonblock);
+	if (ret)
+		return ret;
+
 	ret = io_import_iovec(WRITE, req, &iovec, &iter, !force_nonblock);
 	if (ret < 0)
 		return ret;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/6] io_uring: support deferred retrival of file from fd
  2020-03-04 18:00 [PATCHSET v2 0/6] Support selectable file descriptors Jens Axboe
                   ` (2 preceding siblings ...)
  2020-03-04 18:00 ` [PATCH 3/6] io_uring: move read/write side file based prep into op handler Jens Axboe
@ 2020-03-04 18:00 ` Jens Axboe
  2020-03-04 18:00 ` [PATCH 5/6] net: allow specific fd selection for __sys_accept4_file() Jens Axboe
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2020-03-04 18:00 UTC (permalink / raw)
  To: io-uring; +Cc: jlayton, josh, Jens Axboe

We might not be able to grab the file we need at prep time, if the fd
needed is dependent on completion of another request. Support deferred
fd lookup.

Signed-off-by: Jens Axboe <[email protected]>
---
 fs/io_uring.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ed70ac09ca18..a1fc0d2acf91 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -511,6 +511,7 @@ enum {
 	REQ_F_OVERFLOW_BIT,
 	REQ_F_POLLED_BIT,
 	REQ_F_BUFFER_SELECTED_BIT,
+	REQ_F_DEFERRED_FD_BIT,
 
 	/* not a real bit, just to check we're not overflowing the space */
 	__REQ_F_LAST_BIT,
@@ -562,6 +563,8 @@ enum {
 	REQ_F_POLLED		= BIT(REQ_F_POLLED_BIT),
 	/* buffer already selected */
 	REQ_F_BUFFER_SELECTED	= BIT(REQ_F_BUFFER_SELECTED_BIT),
+	/* file assignment has been deferred */
+	REQ_F_DEFERRED_FD	= BIT(REQ_F_DEFERRED_FD_BIT),
 };
 
 struct async_poll {
@@ -599,6 +602,7 @@ struct io_kiocb {
 	struct io_async_ctx		*io;
 	bool				needs_fixed_file;
 	u8				opcode;
+	int				deferred_fd;
 
 	struct io_ring_ctx	*ctx;
 	struct list_head	list;
@@ -5002,6 +5006,12 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
 
+	if (req->flags & REQ_F_DEFERRED_FD) {
+		ret = io_file_get(NULL, req, req->deferred_fd, &req->file, false);
+		if (ret)
+			return ret;
+	}
+
 	switch (req->opcode) {
 	case IORING_OP_NOP:
 		ret = io_nop(req);
@@ -5328,7 +5338,7 @@ static int io_req_set_file(struct io_submit_state *state, struct io_kiocb *req,
 			   const struct io_uring_sqe *sqe)
 {
 	unsigned flags;
-	int fd;
+	int fd, ret;
 	bool fixed;
 
 	flags = READ_ONCE(sqe->flags);
@@ -5341,7 +5351,14 @@ static int io_req_set_file(struct io_submit_state *state, struct io_kiocb *req,
 	if (unlikely(!fixed && req->needs_fixed_file))
 		return -EBADF;
 
-	return io_file_get(state, req, fd, &req->file, fixed);
+	ret = io_file_get(state, req, fd, &req->file, fixed);
+	if (ret) {
+		if (fixed)
+			return ret;
+		req->deferred_fd = fd;
+		req->flags |= REQ_F_DEFERRED_FD;
+	}
+	return 0;
 }
 
 static int io_grab_files(struct io_kiocb *req)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/6] net: allow specific fd selection for __sys_accept4_file()
  2020-03-04 18:00 [PATCHSET v2 0/6] Support selectable file descriptors Jens Axboe
                   ` (3 preceding siblings ...)
  2020-03-04 18:00 ` [PATCH 4/6] io_uring: support deferred retrival of file from fd Jens Axboe
@ 2020-03-04 18:00 ` Jens Axboe
  2020-03-04 18:00 ` [PATCH 6/6] io_uring: allow specific fd for IORING_OP_ACCEPT Jens Axboe
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2020-03-04 18:00 UTC (permalink / raw)
  To: io-uring; +Cc: jlayton, josh, Jens Axboe

If 'open_fd' being passed in is != -1, make accept4() use the specific
fd instead of finding a free unused one.

Signed-off-by: Jens Axboe <[email protected]>
---
 include/linux/socket.h | 2 +-
 net/socket.c           | 8 +++++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index fc59ac825561..ce6c97d4a439 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -405,7 +405,7 @@ extern int __sys_sendto(int fd, void __user *buff, size_t len,
 			int addr_len);
 extern int __sys_accept4_file(struct file *file, unsigned file_flags,
 			struct sockaddr __user *upeer_sockaddr,
-			 int __user *upeer_addrlen, int flags);
+			 int __user *upeer_addrlen, int flags, int open_fd);
 extern int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
 			 int __user *upeer_addrlen, int flags);
 extern int __sys_socket(int family, int type, int protocol);
diff --git a/net/socket.c b/net/socket.c
index 70ede74ab24b..a9793b778701 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1707,7 +1707,7 @@ SYSCALL_DEFINE2(listen, int, fd, int, backlog)
 
 int __sys_accept4_file(struct file *file, unsigned file_flags,
 		       struct sockaddr __user *upeer_sockaddr,
-		       int __user *upeer_addrlen, int flags)
+		       int __user *upeer_addrlen, int flags, int open_fd)
 {
 	struct socket *sock, *newsock;
 	struct file *newfile;
@@ -1719,6 +1719,8 @@ int __sys_accept4_file(struct file *file, unsigned file_flags,
 
 	if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK))
 		flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
+	if (open_fd != -1)
+		flags |= O_SPECIFIC_FD;
 
 	sock = sock_from_file(file, &err);
 	if (!sock)
@@ -1738,7 +1740,7 @@ int __sys_accept4_file(struct file *file, unsigned file_flags,
 	 */
 	__module_get(newsock->ops->owner);
 
-	newfd = get_unused_fd_flags(flags);
+	newfd = get_specific_unused_fd_flags(open_fd, flags);
 	if (unlikely(newfd < 0)) {
 		err = newfd;
 		sock_release(newsock);
@@ -1807,7 +1809,7 @@ int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
 	f = fdget(fd);
 	if (f.file) {
 		ret = __sys_accept4_file(f.file, 0, upeer_sockaddr,
-						upeer_addrlen, flags);
+						upeer_addrlen, flags, -1);
 		if (f.flags)
 			fput(f.file);
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/6] io_uring: allow specific fd for IORING_OP_ACCEPT
  2020-03-04 18:00 [PATCHSET v2 0/6] Support selectable file descriptors Jens Axboe
                   ` (4 preceding siblings ...)
  2020-03-04 18:00 ` [PATCH 5/6] net: allow specific fd selection for __sys_accept4_file() Jens Axboe
@ 2020-03-04 18:00 ` Jens Axboe
  2020-03-04 19:02   ` Josh Triplett
  2020-03-04 19:03 ` [PATCHSET v2 0/6] Support selectable file descriptors Josh Triplett
  2020-03-09 20:33 ` Stefan Metzmacher
  7 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2020-03-04 18:00 UTC (permalink / raw)
  To: io-uring; +Cc: jlayton, josh, Jens Axboe

sqe->len can be set to a specific fd we want to use for accept4(), so
it uses that one instead of just assigning a free unused one.

Signed-off-by: Jens Axboe <[email protected]>
---
 fs/io_uring.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a1fc0d2acf91..11bd143847ad 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -354,6 +354,7 @@ struct io_accept {
 	struct sockaddr __user		*addr;
 	int __user			*addr_len;
 	int				flags;
+	int				open_fd;
 };
 
 struct io_sync {
@@ -3943,12 +3944,15 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 	if (unlikely(req->ctx->flags & (IORING_SETUP_IOPOLL|IORING_SETUP_SQPOLL)))
 		return -EINVAL;
-	if (sqe->ioprio || sqe->len || sqe->buf_index)
+	if (sqe->ioprio || sqe->buf_index)
 		return -EINVAL;
 
 	accept->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
 	accept->addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr2));
 	accept->flags = READ_ONCE(sqe->accept_flags);
+	accept->open_fd = READ_ONCE(sqe->len);
+	if (!accept->open_fd)
+		accept->open_fd = -1;
 	return 0;
 #else
 	return -EOPNOTSUPP;
@@ -3964,7 +3968,8 @@ static int __io_accept(struct io_kiocb *req, bool force_nonblock)
 
 	file_flags = force_nonblock ? O_NONBLOCK : 0;
 	ret = __sys_accept4_file(req->file, file_flags, accept->addr,
-					accept->addr_len, accept->flags);
+					accept->addr_len, accept->flags,
+					accept->open_fd);
 	if (ret == -EAGAIN && force_nonblock)
 		return -EAGAIN;
 	if (ret == -ERESTARTSYS)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] io_uring: allow specific fd for IORING_OP_ACCEPT
  2020-03-04 18:00 ` [PATCH 6/6] io_uring: allow specific fd for IORING_OP_ACCEPT Jens Axboe
@ 2020-03-04 19:02   ` Josh Triplett
  2020-03-04 19:09     ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Josh Triplett @ 2020-03-04 19:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, jlayton

On Wed, Mar 04, 2020 at 11:00:16AM -0700, Jens Axboe wrote:
> sqe->len can be set to a specific fd we want to use for accept4(), so
> it uses that one instead of just assigning a free unused one.
[...]
> +	accept->open_fd = READ_ONCE(sqe->len);
> +	if (!accept->open_fd)
> +		accept->open_fd = -1;

0 is a valid file descriptor. I realize that it seems unlikely, but I
went to a fair bit of trouble in my patch series to ensure that
userspace could use any valid file descriptor openat2, without a corner
case like "you can't open as file descriptor 0", even though it would
have been more convenient to just say "if you pass a non-zero fd in
open_how". Please consider accepting a flag to determine the validity of
open_fd.

- Josh Triplett

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET v2 0/6] Support selectable file descriptors
  2020-03-04 18:00 [PATCHSET v2 0/6] Support selectable file descriptors Jens Axboe
                   ` (5 preceding siblings ...)
  2020-03-04 18:00 ` [PATCH 6/6] io_uring: allow specific fd for IORING_OP_ACCEPT Jens Axboe
@ 2020-03-04 19:03 ` Josh Triplett
  2020-03-04 19:10   ` Jens Axboe
  2020-03-09 20:33 ` Stefan Metzmacher
  7 siblings, 1 reply; 19+ messages in thread
From: Josh Triplett @ 2020-03-04 19:03 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, jlayton

On Wed, Mar 04, 2020 at 11:00:10AM -0700, Jens Axboe wrote:
> One of the fabled features with chains has long been the desire to
> support things like:
> 
> <open fileX><read from fileX><close fileX>
> 
> in a single chain. This currently doesn't work, since the read/close
> depends on what file descriptor we get on open.
> 
> The original attempt at solving this provided a means to pass
> descriptors between chains in a link, this version takes a different
> route. Based on Josh's support for O_SPECIFIC_FD, we can instead control
> what fd value we're going to get out of open (or accept). With that in
> place, we don't need to do any magic to make this work. The above chain
> then becomes:
> 
> <open fileX with fd Y><read from fd Y><close fd Y>
> 
> which is a lot more useful, and allows any sort of weird chains without
> needing to nest "last open" file descriptors.
> 
> Updated the test program to use this approach:
> 
> https://git.kernel.dk/cgit/liburing/plain/test/orc.c?h=fd-select
> 
> which forces the use of fd==89 for the open, and then uses that for the
> read and close.
> 
> Outside of this adaptation, fixed a few bugs and cleaned things up.

I posted one comment about an issue in patch 6.

Patches 2-5 look great; for those:
Reviewed-by: Josh Triplett <[email protected]>

Thanks for picking this up and running with it!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] io_uring: allow specific fd for IORING_OP_ACCEPT
  2020-03-04 19:02   ` Josh Triplett
@ 2020-03-04 19:09     ` Jens Axboe
  2020-03-04 19:51       ` Josh Triplett
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2020-03-04 19:09 UTC (permalink / raw)
  To: Josh Triplett; +Cc: io-uring, jlayton

On 3/4/20 12:02 PM, Josh Triplett wrote:
> On Wed, Mar 04, 2020 at 11:00:16AM -0700, Jens Axboe wrote:
>> sqe->len can be set to a specific fd we want to use for accept4(), so
>> it uses that one instead of just assigning a free unused one.
> [...]
>> +	accept->open_fd = READ_ONCE(sqe->len);
>> +	if (!accept->open_fd)
>> +		accept->open_fd = -1;
> 
> 0 is a valid file descriptor. I realize that it seems unlikely, but I
> went to a fair bit of trouble in my patch series to ensure that
> userspace could use any valid file descriptor openat2, without a corner
> case like "you can't open as file descriptor 0", even though it would
> have been more convenient to just say "if you pass a non-zero fd in
> open_how". Please consider accepting a flag to determine the validity of
> open_fd.

Heh, I actually just changed this, just added that as a temporary hack to
verify that things were working. Now SOCK_SPECIFIC_FD is required, and we
just gate on that. OP_ACCEPT disallowed fd != 0 before, so we continue
to do that if SOCK_SPECIFIC_FD isn't set:

https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-fd-select

top two patches.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET v2 0/6] Support selectable file descriptors
  2020-03-04 19:03 ` [PATCHSET v2 0/6] Support selectable file descriptors Josh Triplett
@ 2020-03-04 19:10   ` Jens Axboe
  2020-03-04 19:28     ` Jeff Layton
  2020-03-04 19:56     ` Josh Triplett
  0 siblings, 2 replies; 19+ messages in thread
From: Jens Axboe @ 2020-03-04 19:10 UTC (permalink / raw)
  To: Josh Triplett; +Cc: io-uring, jlayton

On 3/4/20 12:03 PM, Josh Triplett wrote:
> On Wed, Mar 04, 2020 at 11:00:10AM -0700, Jens Axboe wrote:
>> One of the fabled features with chains has long been the desire to
>> support things like:
>>
>> <open fileX><read from fileX><close fileX>
>>
>> in a single chain. This currently doesn't work, since the read/close
>> depends on what file descriptor we get on open.
>>
>> The original attempt at solving this provided a means to pass
>> descriptors between chains in a link, this version takes a different
>> route. Based on Josh's support for O_SPECIFIC_FD, we can instead control
>> what fd value we're going to get out of open (or accept). With that in
>> place, we don't need to do any magic to make this work. The above chain
>> then becomes:
>>
>> <open fileX with fd Y><read from fd Y><close fd Y>
>>
>> which is a lot more useful, and allows any sort of weird chains without
>> needing to nest "last open" file descriptors.
>>
>> Updated the test program to use this approach:
>>
>> https://git.kernel.dk/cgit/liburing/plain/test/orc.c?h=fd-select
>>
>> which forces the use of fd==89 for the open, and then uses that for the
>> read and close.
>>
>> Outside of this adaptation, fixed a few bugs and cleaned things up.
> 
> I posted one comment about an issue in patch 6.
> 
> Patches 2-5 look great; for those:
> Reviewed-by: Josh Triplett <[email protected]>
> 
> Thanks for picking this up and running with it!

Thanks for doing the prep work! I think it turned out that much better
for it.

Are you going to post your series for general review? I just stole
your 1 patch that was needed for me.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET v2 0/6] Support selectable file descriptors
  2020-03-04 19:10   ` Jens Axboe
@ 2020-03-04 19:28     ` Jeff Layton
  2020-03-04 19:50       ` Jens Axboe
  2020-03-04 19:56     ` Josh Triplett
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2020-03-04 19:28 UTC (permalink / raw)
  To: Jens Axboe, Josh Triplett; +Cc: io-uring

On Wed, 2020-03-04 at 12:10 -0700, Jens Axboe wrote:
> On 3/4/20 12:03 PM, Josh Triplett wrote:
> > On Wed, Mar 04, 2020 at 11:00:10AM -0700, Jens Axboe wrote:
> > > One of the fabled features with chains has long been the desire to
> > > support things like:
> > > 
> > > <open fileX><read from fileX><close fileX>
> > > 
> > > in a single chain. This currently doesn't work, since the read/close
> > > depends on what file descriptor we get on open.
> > > 
> > > The original attempt at solving this provided a means to pass
> > > descriptors between chains in a link, this version takes a different
> > > route. Based on Josh's support for O_SPECIFIC_FD, we can instead control
> > > what fd value we're going to get out of open (or accept). With that in
> > > place, we don't need to do any magic to make this work. The above chain
> > > then becomes:
> > > 
> > > <open fileX with fd Y><read from fd Y><close fd Y>
> > > 
> > > which is a lot more useful, and allows any sort of weird chains without
> > > needing to nest "last open" file descriptors.
> > > 
> > > Updated the test program to use this approach:
> > > 
> > > https://git.kernel.dk/cgit/liburing/plain/test/orc.c?h=fd-select
> > > 
> > > which forces the use of fd==89 for the open, and then uses that for the
> > > read and close.
> > > 
> > > Outside of this adaptation, fixed a few bugs and cleaned things up.
> > 
> > I posted one comment about an issue in patch 6.
> > 
> > Patches 2-5 look great; for those:
> > Reviewed-by: Josh Triplett <[email protected]>
> > 
> > Thanks for picking this up and running with it!
> 
> Thanks for doing the prep work! I think it turned out that much better
> for it.
> 
> Are you going to post your series for general review? I just stole
> your 1 patch that was needed for me.
> 

This does seem like a better approach overall.

How should userland programs pick fds to use for this though? Should you
just start with some reasonably high number that you don't expect to
have been used by the current process or is there some more reliable way
to do it?

-- 
Jeff Layton <[email protected]>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET v2 0/6] Support selectable file descriptors
  2020-03-04 19:28     ` Jeff Layton
@ 2020-03-04 19:50       ` Jens Axboe
  0 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2020-03-04 19:50 UTC (permalink / raw)
  To: Jeff Layton, Josh Triplett; +Cc: io-uring

On 3/4/20 12:28 PM, Jeff Layton wrote:
> On Wed, 2020-03-04 at 12:10 -0700, Jens Axboe wrote:
>> On 3/4/20 12:03 PM, Josh Triplett wrote:
>>> On Wed, Mar 04, 2020 at 11:00:10AM -0700, Jens Axboe wrote:
>>>> One of the fabled features with chains has long been the desire to
>>>> support things like:
>>>>
>>>> <open fileX><read from fileX><close fileX>
>>>>
>>>> in a single chain. This currently doesn't work, since the read/close
>>>> depends on what file descriptor we get on open.
>>>>
>>>> The original attempt at solving this provided a means to pass
>>>> descriptors between chains in a link, this version takes a different
>>>> route. Based on Josh's support for O_SPECIFIC_FD, we can instead control
>>>> what fd value we're going to get out of open (or accept). With that in
>>>> place, we don't need to do any magic to make this work. The above chain
>>>> then becomes:
>>>>
>>>> <open fileX with fd Y><read from fd Y><close fd Y>
>>>>
>>>> which is a lot more useful, and allows any sort of weird chains without
>>>> needing to nest "last open" file descriptors.
>>>>
>>>> Updated the test program to use this approach:
>>>>
>>>> https://git.kernel.dk/cgit/liburing/plain/test/orc.c?h=fd-select
>>>>
>>>> which forces the use of fd==89 for the open, and then uses that for the
>>>> read and close.
>>>>
>>>> Outside of this adaptation, fixed a few bugs and cleaned things up.
>>>
>>> I posted one comment about an issue in patch 6.
>>>
>>> Patches 2-5 look great; for those:
>>> Reviewed-by: Josh Triplett <[email protected]>
>>>
>>> Thanks for picking this up and running with it!
>>
>> Thanks for doing the prep work! I think it turned out that much better
>> for it.
>>
>> Are you going to post your series for general review? I just stole
>> your 1 patch that was needed for me.
>>
> 
> This does seem like a better approach overall.
> 
> How should userland programs pick fds to use for this though? Should you
> just start with some reasonably high number that you don't expect to
> have been used by the current process or is there some more reliable way
> to do it?

If you look at Josh's separate posting, he has a patch that sets the
min_fd for the normal dynamic allocations. With that, you could set that
to eg 1000, and then know that anything below that would be fair game to
use for the selectable fd values.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] io_uring: allow specific fd for IORING_OP_ACCEPT
  2020-03-04 19:09     ` Jens Axboe
@ 2020-03-04 19:51       ` Josh Triplett
  0 siblings, 0 replies; 19+ messages in thread
From: Josh Triplett @ 2020-03-04 19:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, jlayton

On Wed, Mar 04, 2020 at 12:09:14PM -0700, Jens Axboe wrote:
> On 3/4/20 12:02 PM, Josh Triplett wrote:
> > On Wed, Mar 04, 2020 at 11:00:16AM -0700, Jens Axboe wrote:
> >> sqe->len can be set to a specific fd we want to use for accept4(), so
> >> it uses that one instead of just assigning a free unused one.
> > [...]
> >> +	accept->open_fd = READ_ONCE(sqe->len);
> >> +	if (!accept->open_fd)
> >> +		accept->open_fd = -1;
> > 
> > 0 is a valid file descriptor. I realize that it seems unlikely, but I
> > went to a fair bit of trouble in my patch series to ensure that
> > userspace could use any valid file descriptor openat2, without a corner
> > case like "you can't open as file descriptor 0", even though it would
> > have been more convenient to just say "if you pass a non-zero fd in
> > open_how". Please consider accepting a flag to determine the validity of
> > open_fd.
> 
> Heh, I actually just changed this, just added that as a temporary hack to
> verify that things were working. Now SOCK_SPECIFIC_FD is required, and we
> just gate on that. OP_ACCEPT disallowed fd != 0 before, so we continue
> to do that if SOCK_SPECIFIC_FD isn't set:
> 
> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-fd-select
> 
> top two patches.

That looks much better, thank you!

For the entire 6-patch series through commit
2e4ccbb5e66eaa5963dbaf502e8adf1c063c086b:

Reviewed-by: Josh Triplett <[email protected]>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET v2 0/6] Support selectable file descriptors
  2020-03-04 19:10   ` Jens Axboe
  2020-03-04 19:28     ` Jeff Layton
@ 2020-03-04 19:56     ` Josh Triplett
  2020-03-04 20:00       ` Jens Axboe
  1 sibling, 1 reply; 19+ messages in thread
From: Josh Triplett @ 2020-03-04 19:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, jlayton

On Wed, Mar 04, 2020 at 12:10:08PM -0700, Jens Axboe wrote:
> On 3/4/20 12:03 PM, Josh Triplett wrote:
> > On Wed, Mar 04, 2020 at 11:00:10AM -0700, Jens Axboe wrote:
> >> One of the fabled features with chains has long been the desire to
> >> support things like:
> >>
> >> <open fileX><read from fileX><close fileX>
> >>
> >> in a single chain. This currently doesn't work, since the read/close
> >> depends on what file descriptor we get on open.
> >>
> >> The original attempt at solving this provided a means to pass
> >> descriptors between chains in a link, this version takes a different
> >> route. Based on Josh's support for O_SPECIFIC_FD, we can instead control
> >> what fd value we're going to get out of open (or accept). With that in
> >> place, we don't need to do any magic to make this work. The above chain
> >> then becomes:
> >>
> >> <open fileX with fd Y><read from fd Y><close fd Y>
> >>
> >> which is a lot more useful, and allows any sort of weird chains without
> >> needing to nest "last open" file descriptors.
> >>
> >> Updated the test program to use this approach:
> >>
> >> https://git.kernel.dk/cgit/liburing/plain/test/orc.c?h=fd-select
> >>
> >> which forces the use of fd==89 for the open, and then uses that for the
> >> read and close.
> >>
> >> Outside of this adaptation, fixed a few bugs and cleaned things up.
> > 
> > I posted one comment about an issue in patch 6.
> > 
> > Patches 2-5 look great; for those:
> > Reviewed-by: Josh Triplett <[email protected]>
> > 
> > Thanks for picking this up and running with it!
> 
> Thanks for doing the prep work! I think it turned out that much better
> for it.
> 
> Are you going to post your series for general review? I just stole
> your 1 patch that was needed for me.

Since your patch series depends on mine, please feel free to run with
the series. Would you mind adding my patch 1 and 3 at the end of your
series? You need patch 1 to make this more usable for userspace, and
patch 3 would allow for an OP_PIPE which I'd love to have.

Do you plan to submit this during the next merge window?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET v2 0/6] Support selectable file descriptors
  2020-03-04 19:56     ` Josh Triplett
@ 2020-03-04 20:00       ` Jens Axboe
  2020-03-04 20:09         ` Josh Triplett
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2020-03-04 20:00 UTC (permalink / raw)
  To: Josh Triplett; +Cc: io-uring, jlayton

On 3/4/20 12:56 PM, Josh Triplett wrote:
> On Wed, Mar 04, 2020 at 12:10:08PM -0700, Jens Axboe wrote:
>> On 3/4/20 12:03 PM, Josh Triplett wrote:
>>> On Wed, Mar 04, 2020 at 11:00:10AM -0700, Jens Axboe wrote:
>>>> One of the fabled features with chains has long been the desire to
>>>> support things like:
>>>>
>>>> <open fileX><read from fileX><close fileX>
>>>>
>>>> in a single chain. This currently doesn't work, since the read/close
>>>> depends on what file descriptor we get on open.
>>>>
>>>> The original attempt at solving this provided a means to pass
>>>> descriptors between chains in a link, this version takes a different
>>>> route. Based on Josh's support for O_SPECIFIC_FD, we can instead control
>>>> what fd value we're going to get out of open (or accept). With that in
>>>> place, we don't need to do any magic to make this work. The above chain
>>>> then becomes:
>>>>
>>>> <open fileX with fd Y><read from fd Y><close fd Y>
>>>>
>>>> which is a lot more useful, and allows any sort of weird chains without
>>>> needing to nest "last open" file descriptors.
>>>>
>>>> Updated the test program to use this approach:
>>>>
>>>> https://git.kernel.dk/cgit/liburing/plain/test/orc.c?h=fd-select
>>>>
>>>> which forces the use of fd==89 for the open, and then uses that for the
>>>> read and close.
>>>>
>>>> Outside of this adaptation, fixed a few bugs and cleaned things up.
>>>
>>> I posted one comment about an issue in patch 6.
>>>
>>> Patches 2-5 look great; for those:
>>> Reviewed-by: Josh Triplett <[email protected]>
>>>
>>> Thanks for picking this up and running with it!
>>
>> Thanks for doing the prep work! I think it turned out that much better
>> for it.
>>
>> Are you going to post your series for general review? I just stole
>> your 1 patch that was needed for me.
> 
> Since your patch series depends on mine, please feel free to run with
> the series. Would you mind adding my patch 1 and 3 at the end of your
> series? You need patch 1 to make this more usable for userspace, and
> patch 3 would allow for an OP_PIPE which I'd love to have.

Let me add patch 1 to the top of the stack, for the pipe part that
probably should be taken in separately. But not a huge deal to me,
as long as we can get it reviewed.

I'll post the series broader soon.

> Do you plan to submit this during the next merge window?

Maybe? In terms of timing, I think we're well within the opportunity
to do so, at least.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET v2 0/6] Support selectable file descriptors
  2020-03-04 20:00       ` Jens Axboe
@ 2020-03-04 20:09         ` Josh Triplett
  2020-03-04 20:14           ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Josh Triplett @ 2020-03-04 20:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, jlayton

On Wed, Mar 04, 2020 at 01:00:05PM -0700, Jens Axboe wrote:
> On 3/4/20 12:56 PM, Josh Triplett wrote:
> > On Wed, Mar 04, 2020 at 12:10:08PM -0700, Jens Axboe wrote:
> >> On 3/4/20 12:03 PM, Josh Triplett wrote:
> >>> On Wed, Mar 04, 2020 at 11:00:10AM -0700, Jens Axboe wrote:
> >>>> One of the fabled features with chains has long been the desire to
> >>>> support things like:
> >>>>
> >>>> <open fileX><read from fileX><close fileX>
> >>>>
> >>>> in a single chain. This currently doesn't work, since the read/close
> >>>> depends on what file descriptor we get on open.
> >>>>
> >>>> The original attempt at solving this provided a means to pass
> >>>> descriptors between chains in a link, this version takes a different
> >>>> route. Based on Josh's support for O_SPECIFIC_FD, we can instead control
> >>>> what fd value we're going to get out of open (or accept). With that in
> >>>> place, we don't need to do any magic to make this work. The above chain
> >>>> then becomes:
> >>>>
> >>>> <open fileX with fd Y><read from fd Y><close fd Y>
> >>>>
> >>>> which is a lot more useful, and allows any sort of weird chains without
> >>>> needing to nest "last open" file descriptors.
> >>>>
> >>>> Updated the test program to use this approach:
> >>>>
> >>>> https://git.kernel.dk/cgit/liburing/plain/test/orc.c?h=fd-select
> >>>>
> >>>> which forces the use of fd==89 for the open, and then uses that for the
> >>>> read and close.
> >>>>
> >>>> Outside of this adaptation, fixed a few bugs and cleaned things up.
> >>>
> >>> I posted one comment about an issue in patch 6.
> >>>
> >>> Patches 2-5 look great; for those:
> >>> Reviewed-by: Josh Triplett <[email protected]>
> >>>
> >>> Thanks for picking this up and running with it!
> >>
> >> Thanks for doing the prep work! I think it turned out that much better
> >> for it.
> >>
> >> Are you going to post your series for general review? I just stole
> >> your 1 patch that was needed for me.
> > 
> > Since your patch series depends on mine, please feel free to run with
> > the series. Would you mind adding my patch 1 and 3 at the end of your
> > series? You need patch 1 to make this more usable for userspace, and
> > patch 3 would allow for an OP_PIPE which I'd love to have.
> 
> Let me add patch 1 to the top of the stack, for the pipe part that
> probably should be taken in separately. But not a huge deal to me,
> as long as we can get it reviewed.

That works for me; I don't mind if the pipe support goes in a bit later.
And there are many other fd-producing syscalls that need support for
userspace-selected FDs, including signalfd4, eventfd2, timerfd_create,
epoll_create1, memfd_create, userfaultfd, and the pidfd family.

> > Do you plan to submit this during the next merge window?
>
> Maybe? In terms of timing, I think we're well within the opportunity
> to do so, at least.

I look forward to seeing it go in.

- Josh Triplett

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET v2 0/6] Support selectable file descriptors
  2020-03-04 20:09         ` Josh Triplett
@ 2020-03-04 20:14           ` Jens Axboe
  0 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2020-03-04 20:14 UTC (permalink / raw)
  To: Josh Triplett; +Cc: io-uring, jlayton

On 3/4/20 1:09 PM, Josh Triplett wrote:
> On Wed, Mar 04, 2020 at 01:00:05PM -0700, Jens Axboe wrote:
>> On 3/4/20 12:56 PM, Josh Triplett wrote:
>>> On Wed, Mar 04, 2020 at 12:10:08PM -0700, Jens Axboe wrote:
>>>> On 3/4/20 12:03 PM, Josh Triplett wrote:
>>>>> On Wed, Mar 04, 2020 at 11:00:10AM -0700, Jens Axboe wrote:
>>>>>> One of the fabled features with chains has long been the desire to
>>>>>> support things like:
>>>>>>
>>>>>> <open fileX><read from fileX><close fileX>
>>>>>>
>>>>>> in a single chain. This currently doesn't work, since the read/close
>>>>>> depends on what file descriptor we get on open.
>>>>>>
>>>>>> The original attempt at solving this provided a means to pass
>>>>>> descriptors between chains in a link, this version takes a different
>>>>>> route. Based on Josh's support for O_SPECIFIC_FD, we can instead control
>>>>>> what fd value we're going to get out of open (or accept). With that in
>>>>>> place, we don't need to do any magic to make this work. The above chain
>>>>>> then becomes:
>>>>>>
>>>>>> <open fileX with fd Y><read from fd Y><close fd Y>
>>>>>>
>>>>>> which is a lot more useful, and allows any sort of weird chains without
>>>>>> needing to nest "last open" file descriptors.
>>>>>>
>>>>>> Updated the test program to use this approach:
>>>>>>
>>>>>> https://git.kernel.dk/cgit/liburing/plain/test/orc.c?h=fd-select
>>>>>>
>>>>>> which forces the use of fd==89 for the open, and then uses that for the
>>>>>> read and close.
>>>>>>
>>>>>> Outside of this adaptation, fixed a few bugs and cleaned things up.
>>>>>
>>>>> I posted one comment about an issue in patch 6.
>>>>>
>>>>> Patches 2-5 look great; for those:
>>>>> Reviewed-by: Josh Triplett <[email protected]>
>>>>>
>>>>> Thanks for picking this up and running with it!
>>>>
>>>> Thanks for doing the prep work! I think it turned out that much better
>>>> for it.
>>>>
>>>> Are you going to post your series for general review? I just stole
>>>> your 1 patch that was needed for me.
>>>
>>> Since your patch series depends on mine, please feel free to run with
>>> the series. Would you mind adding my patch 1 and 3 at the end of your
>>> series? You need patch 1 to make this more usable for userspace, and
>>> patch 3 would allow for an OP_PIPE which I'd love to have.
>>
>> Let me add patch 1 to the top of the stack, for the pipe part that
>> probably should be taken in separately. But not a huge deal to me,
>> as long as we can get it reviewed.
> 
> That works for me; I don't mind if the pipe support goes in a bit later.
> And there are many other fd-producing syscalls that need support for
> userspace-selected FDs, including signalfd4, eventfd2, timerfd_create,
> epoll_create1, memfd_create, userfaultfd, and the pidfd family.

Right, at least on the io_uring front, adding IORING_OP_SOCKET and
providing support for SOCK_SPECIFIC_FD through that would be trivial and
a few lines of change. In general, we can more easily do all that
through io_uring, as we have room to shove in that extra 'fd'.

I've queued up patch 1 as well.

>>> Do you plan to submit this during the next merge window?
>>
>> Maybe? In terms of timing, I think we're well within the opportunity
>> to do so, at least.
> 
> I look forward to seeing it go in.

Me too :-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET v2 0/6] Support selectable file descriptors
  2020-03-04 18:00 [PATCHSET v2 0/6] Support selectable file descriptors Jens Axboe
                   ` (6 preceding siblings ...)
  2020-03-04 19:03 ` [PATCHSET v2 0/6] Support selectable file descriptors Josh Triplett
@ 2020-03-09 20:33 ` Stefan Metzmacher
  7 siblings, 0 replies; 19+ messages in thread
From: Stefan Metzmacher @ 2020-03-09 20:33 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: jlayton, josh


[-- Attachment #1.1: Type: text/plain, Size: 1294 bytes --]

Hi Jens,

> One of the fabled features with chains has long been the desire to
> support things like:
> 
> <open fileX><read from fileX><close fileX>
> 
> in a single chain. This currently doesn't work, since the read/close
> depends on what file descriptor we get on open.
> 
> The original attempt at solving this provided a means to pass
> descriptors between chains in a link, this version takes a different
> route. Based on Josh's support for O_SPECIFIC_FD, we can instead control
> what fd value we're going to get out of open (or accept). With that in
> place, we don't need to do any magic to make this work. The above chain
> then becomes:
> 
> <open fileX with fd Y><read from fd Y><close fd Y>
> 
> which is a lot more useful, and allows any sort of weird chains without
> needing to nest "last open" file descriptors.

I like that approach, however I'm wondering if this able to be used by
libraries as it requires coordination of fd values within the whole
process. I don't think this would be a reason to block that feature,
but it might be good to be added to the manpages.

As future addition I'd like to get support for something similar
but for io_uring fixed file slots. That could also be used by libraries
using a private ring fd.

metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-03-09 20:33 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-03-04 18:00 [PATCHSET v2 0/6] Support selectable file descriptors Jens Axboe
2020-03-04 18:00 ` [PATCH 1/6] fs: openat2: Extend open_how to allow userspace-selected fds Jens Axboe
2020-03-04 18:00 ` [PATCH 2/6] io_uring: move CLOSE req->file checking into handler Jens Axboe
2020-03-04 18:00 ` [PATCH 3/6] io_uring: move read/write side file based prep into op handler Jens Axboe
2020-03-04 18:00 ` [PATCH 4/6] io_uring: support deferred retrival of file from fd Jens Axboe
2020-03-04 18:00 ` [PATCH 5/6] net: allow specific fd selection for __sys_accept4_file() Jens Axboe
2020-03-04 18:00 ` [PATCH 6/6] io_uring: allow specific fd for IORING_OP_ACCEPT Jens Axboe
2020-03-04 19:02   ` Josh Triplett
2020-03-04 19:09     ` Jens Axboe
2020-03-04 19:51       ` Josh Triplett
2020-03-04 19:03 ` [PATCHSET v2 0/6] Support selectable file descriptors Josh Triplett
2020-03-04 19:10   ` Jens Axboe
2020-03-04 19:28     ` Jeff Layton
2020-03-04 19:50       ` Jens Axboe
2020-03-04 19:56     ` Josh Triplett
2020-03-04 20:00       ` Jens Axboe
2020-03-04 20:09         ` Josh Triplett
2020-03-04 20:14           ` Jens Axboe
2020-03-09 20:33 ` Stefan Metzmacher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox