public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed
From: David Laight <david.laight.linux@gmail.com>
To: Breno Leitao <leitao@debian.org>
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	Willem de Bruijn <willemb@google.com>,
	metze@samba.org, axboe@kernel.dk,
	Stanislav Fomichev <sdf@fomichev.me>,
	io-uring@vger.kernel.org, bpf@vger.kernel.org,
	netdev@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH net-next RFC 0/3] net: move .getsockopt away from __user buffers
Date: Mon, 2 Feb 2026 22:31:31 +0000	[thread overview]
Message-ID: <20260202223131.44e81ba1@pumpkin> (raw)
In-Reply-To: <aYCUpNyNihhR7no0@gmail.com>

On Mon, 2 Feb 2026 04:32:42 -0800
Breno Leitao <leitao@debian.org> wrote:

> Hello David,
> 
> On Fri, Jan 30, 2026 at 08:52:27PM +0000, David Laight wrote:
> 
> > The system call wrapper can do the user copies, it can also suppress
> > the write if the value is unchanged (which matters with clac/slac).  
> 
> This aligns with my proposal: using an in-kernel optlen that protocol
> functions can operate on directly:
> 
> 	typedef struct sockopt {
> 		struct iov_iter iter;
> 		int optlen;
> 	} sockopt_t;
> 
> > The obvious change would be to pass the length itself and make the
> > return value -ERRNO or the size.  
> 
> I explored this approach to avoid embedding optlen in sockopt (which was
> Linus' original suggestion). I attempted returning the length both via
> iov_iter and as a return value, but neither proved ideal.
> 
> > #define GETSOCKOPT_RVAL(errval, size) (1 << 31 | (errval) << 20 | (size))
> > which would get picked in the rval < 0 path.
> > It would also let 'return 0' mean 'don't change the size' requiring
> > a special return for the one (or two?) places that want to set the
> > size to zero and return success.  
> 
> My conclusion is that encoding both optlen and error in the return value
> requires pointer manipulation that isn't justified for this slow path.
> While technically feasible, the resulting "mixed pointer abomination"
> won't be worth it.

Not really, they are both just numbers.
99% of the protocol code can just do 'return -Exxxx' or 'return size'.
That is all simple and foolproof.
The calling code (not many copies) does:
	rval = foo->getsockopt(..., size_in);
	size_out = size_in;
	if (rval >= 0) {
		if (rval > 0)
			size_out = rval;
		rval = 0;
	} else {
		/* abnormal path */
		if ((rval & (1 << 30))) {
			size_out = rval & 0xffffff;
			rval = -((rval & ~(1 << 31)) >> 20);
		}
	}
	if (size_out != size_in)
		put_user(size_out);
	return rval;
(Or something similar depending on exactly how the values are merged.)

> 
> > There is not much point making the 'optval' parameter more than
> > a structure of a user and kernel address - one of which will be NULL.
> > (This is safer than sockptr_t's discriminant union.)  
> 
> This approach forces every protocol to distinguish between userspace and
> kernelspace, then perform the appropriate copy:
> 
>   static inline int mgetsockopt(void *kernel_optlen, void *user_optlen, ..)
>   {
> 	....
> 	if (kernel_optlen)
> 		memcpy(kernel_optlen, newoptlen, ...
> 	else
> 		copy_to_user(user_optlen, newoptlen, ...
>   }

That is a function provided by the implementation.
It is no different from using the ones that act on iov_iter.
The real difficultly is stopping the usual culprits (bpf an io_uring)
from cheating and looking inside the structures.

> Additionally, you'd need safeguards ensuring callers never pass both user
> and kernel pointers simultaneously. This seems significantly worse than
> using sockptr.

Sockptr has the real disadvantage that it is very easy to mix up the
kernel and user pointers (there is some horrid code that looks inside).
If you have separate pointers that can't happen.
You might access NULL, but you are never going to use the wrong address.
Remember some systems (s390?) use the same numbers for user and kernel
addresses - you have to get it right.
In any case, if both addresses are set you can just have a rule that
one is used by preference - it isn't a problem.

There might be legitimate reasons for setting both pointers.
Consider setsockopt, the wrapper could copy small user structures
into an on-stack buffer.
The structure would then need to contain the address/length of the
kernel buffer as well as the actual user address in case the code
wants to read more that the expected data length.
For a kernel caller you also want the actual length of the buffer
as a separate field from the length of the [sg]etsockopt().

I'm not sure what fields you need for the address buffer.
Probably 'user address', 'kernel address' and 'kernel length',
what you don't need is support for scatter-gather, page list,
pipes etc.


> 
> --breno
> 


      reply	other threads:[~2026-02-02 22:31 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-30 18:46 [PATCH net-next RFC 0/3] net: move .getsockopt away from __user buffers Breno Leitao
2026-01-30 18:46 ` [PATCH net-next RFC 1/3] net: add getsockopt_iter callback to proto_ops Breno Leitao
2026-01-30 18:46 ` [PATCH net-next RFC 2/3] net: prefer getsockopt_iter in do_sock_getsockopt Breno Leitao
2026-01-30 18:46 ` [PATCH net-next RFC 3/3] netlink: convert to getsockopt_iter Breno Leitao
2026-01-30 20:52 ` [PATCH net-next RFC 0/3] net: move .getsockopt away from __user buffers David Laight
2026-01-31  1:19   ` Linus Torvalds
2026-01-31 15:37     ` David Laight
2026-01-31 15:53       ` Jens Axboe
2026-02-02 12:32   ` Breno Leitao
2026-02-02 22:31     ` David Laight [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260202223131.44e81ba1@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=bpf@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=io-uring@vger.kernel.org \
    --cc=kernel-team@meta.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=leitao@debian.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=metze@samba.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=torvalds@linux-foundation.org \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox