public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed
* SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction
@ 2025-03-28 14:27 Stefan Metzmacher
  2025-03-28 14:30 ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Metzmacher @ 2025-03-28 14:27 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring

Hi Jens,

while playing with the kernel QUIC driver [1],
I noticed it does a lot of getsockopt() and setsockopt()
calls to sync the required state into and out of the kernel.

My long term plan is to let the userspace quic handshake logic
work with SOCKET_URING_OP_GETSOCKOPT and SOCKET_URING_OP_SETSOCKOPT.

The used level is SOL_QUIC and that won't work
as io_uring_cmd_getsockopt() has a restriction to
SOL_SOCKET, while there's no restriction in
io_uring_cmd_setsockopt().

What's the reason to have that restriction?
And why is it only for the get path and not
the set path?

metze

[1] https://github.com/lxin/quic

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction
  2025-03-28 14:27 SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction Stefan Metzmacher
@ 2025-03-28 14:30 ` Jens Axboe
  2025-03-28 15:02   ` Stefan Metzmacher
  2025-03-28 15:02   ` Pavel Begunkov
  0 siblings, 2 replies; 13+ messages in thread
From: Jens Axboe @ 2025-03-28 14:30 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: io-uring

On 3/28/25 8:27 AM, Stefan Metzmacher wrote:
> Hi Jens,
> 
> while playing with the kernel QUIC driver [1],
> I noticed it does a lot of getsockopt() and setsockopt()
> calls to sync the required state into and out of the kernel.
> 
> My long term plan is to let the userspace quic handshake logic
> work with SOCKET_URING_OP_GETSOCKOPT and SOCKET_URING_OP_SETSOCKOPT.
> 
> The used level is SOL_QUIC and that won't work
> as io_uring_cmd_getsockopt() has a restriction to
> SOL_SOCKET, while there's no restriction in
> io_uring_cmd_setsockopt().
> 
> What's the reason to have that restriction?
> And why is it only for the get path and not
> the set path?

There's absolutely no reason for that, looks like a pure oversight?!

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction
  2025-03-28 14:30 ` Jens Axboe
@ 2025-03-28 15:02   ` Stefan Metzmacher
  2025-03-28 15:08     ` Stefan Metzmacher
  2025-03-28 16:24     ` Breno Leitao
  2025-03-28 15:02   ` Pavel Begunkov
  1 sibling, 2 replies; 13+ messages in thread
From: Stefan Metzmacher @ 2025-03-28 15:02 UTC (permalink / raw)
  To: Jens Axboe, Breno Leitao; +Cc: io-uring

Am 28.03.25 um 15:30 schrieb Jens Axboe:
> On 3/28/25 8:27 AM, Stefan Metzmacher wrote:
>> Hi Jens,
>>
>> while playing with the kernel QUIC driver [1],
>> I noticed it does a lot of getsockopt() and setsockopt()
>> calls to sync the required state into and out of the kernel.
>>
>> My long term plan is to let the userspace quic handshake logic
>> work with SOCKET_URING_OP_GETSOCKOPT and SOCKET_URING_OP_SETSOCKOPT.
>>
>> The used level is SOL_QUIC and that won't work
>> as io_uring_cmd_getsockopt() has a restriction to
>> SOL_SOCKET, while there's no restriction in
>> io_uring_cmd_setsockopt().
>>
>> What's the reason to have that restriction?
>> And why is it only for the get path and not
>> the set path?
> 
> There's absolutely no reason for that, looks like a pure oversight?!

It seems RFC had the limitation on both:
https://lore.kernel.org/io-uring/20230724142237.358769-1-leitao@debian.org/

v0 had it only for get because of some userpointer restrictions:
https://lore.kernel.org/io-uring/20230724142237.358769-1-leitao@debian.org/

The merged v7 also talks about the restriction:
https://lore.kernel.org/all/20231016134750.1381153-1-leitao@debian.org/

Adding Breno ...

It seems proto_ops->getsockopt is the problem as it's not changed
to sockptr_t yet.

metze

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction
  2025-03-28 14:30 ` Jens Axboe
  2025-03-28 15:02   ` Stefan Metzmacher
@ 2025-03-28 15:02   ` Pavel Begunkov
  2025-03-28 15:03     ` Pavel Begunkov
  2025-03-28 16:34     ` Jens Axboe
  1 sibling, 2 replies; 13+ messages in thread
From: Pavel Begunkov @ 2025-03-28 15:02 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Breno Leitao

On 3/28/25 14:30, Jens Axboe wrote:
> On 3/28/25 8:27 AM, Stefan Metzmacher wrote:
>> Hi Jens,
>>
>> while playing with the kernel QUIC driver [1],
>> I noticed it does a lot of getsockopt() and setsockopt()
>> calls to sync the required state into and out of the kernel.
>>
>> My long term plan is to let the userspace quic handshake logic
>> work with SOCKET_URING_OP_GETSOCKOPT and SOCKET_URING_OP_SETSOCKOPT.
>>
>> The used level is SOL_QUIC and that won't work
>> as io_uring_cmd_getsockopt() has a restriction to
>> SOL_SOCKET, while there's no restriction in
>> io_uring_cmd_setsockopt().
>>
>> What's the reason to have that restriction?
>> And why is it only for the get path and not
>> the set path?
> 
> There's absolutely no reason for that, looks like a pure oversight?!

Cc Breno, he can explain better, but IIRC that's because most
of set/get sockopt options expect user pointers to be passed in,
and io_uring wants to use kernel memory. It's plumbed for
SOL_SOCKET with sockptr_t, but there was a push back against
converting the rest.

The implications are not the uapi side. For example, io_uring
get/setsockopt returns err / len in a cqe->res. We can't do
that for SOL_SOCKET without the kernel pointer support, otherwise
io-uring uapi would need to get hacky. E.g. you'd need to
pass another user pointer is an SQE for socklen and read it
after completion.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction
  2025-03-28 15:02   ` Pavel Begunkov
@ 2025-03-28 15:03     ` Pavel Begunkov
  2025-03-28 16:34     ` Jens Axboe
  1 sibling, 0 replies; 13+ messages in thread
From: Pavel Begunkov @ 2025-03-28 15:03 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Breno Leitao

On 3/28/25 15:02, Pavel Begunkov wrote:
> On 3/28/25 14:30, Jens Axboe wrote:
>> On 3/28/25 8:27 AM, Stefan Metzmacher wrote:
>>> Hi Jens,
>>>
>>> while playing with the kernel QUIC driver [1],
>>> I noticed it does a lot of getsockopt() and setsockopt()
>>> calls to sync the required state into and out of the kernel.
>>>
>>> My long term plan is to let the userspace quic handshake logic
>>> work with SOCKET_URING_OP_GETSOCKOPT and SOCKET_URING_OP_SETSOCKOPT.
>>>
>>> The used level is SOL_QUIC and that won't work
>>> as io_uring_cmd_getsockopt() has a restriction to
>>> SOL_SOCKET, while there's no restriction in
>>> io_uring_cmd_setsockopt().
>>>
>>> What's the reason to have that restriction?
>>> And why is it only for the get path and not
>>> the set path?
>>
>> There's absolutely no reason for that, looks like a pure oversight?!
> 
> Cc Breno, he can explain better, but IIRC that's because most
> of set/get sockopt options expect user pointers to be passed in,
> and io_uring wants to use kernel memory. It's plumbed for
> SOL_SOCKET with sockptr_t, but there was a push back against
> converting the rest.
> 
> The implications are not the uapi side. For example, io_uring

s/not/on/

> get/setsockopt returns err / len in a cqe->res. We can't do
> that for SOL_SOCKET without the kernel pointer support, otherwise
> io-uring uapi would need to get hacky. E.g. you'd need to
> pass another user pointer is an SQE for socklen and read it
> after completion.
> 

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction
  2025-03-28 15:02   ` Stefan Metzmacher
@ 2025-03-28 15:08     ` Stefan Metzmacher
  2025-03-28 16:24     ` Breno Leitao
  1 sibling, 0 replies; 13+ messages in thread
From: Stefan Metzmacher @ 2025-03-28 15:08 UTC (permalink / raw)
  To: Jens Axboe, Breno Leitao, Christoph Hellwig; +Cc: io-uring

Am 28.03.25 um 16:02 schrieb Stefan Metzmacher:
> Am 28.03.25 um 15:30 schrieb Jens Axboe:
>> On 3/28/25 8:27 AM, Stefan Metzmacher wrote:
>>> Hi Jens,
>>>
>>> while playing with the kernel QUIC driver [1],
>>> I noticed it does a lot of getsockopt() and setsockopt()
>>> calls to sync the required state into and out of the kernel.
>>>
>>> My long term plan is to let the userspace quic handshake logic
>>> work with SOCKET_URING_OP_GETSOCKOPT and SOCKET_URING_OP_SETSOCKOPT.
>>>
>>> The used level is SOL_QUIC and that won't work
>>> as io_uring_cmd_getsockopt() has a restriction to
>>> SOL_SOCKET, while there's no restriction in
>>> io_uring_cmd_setsockopt().
>>>
>>> What's the reason to have that restriction?
>>> And why is it only for the get path and not
>>> the set path?
>>
>> There's absolutely no reason for that, looks like a pure oversight?!
> 
> It seems RFC had the limitation on both:
> https://lore.kernel.org/io-uring/20230724142237.358769-1-leitao@debian.org/
> 
> v0 had it only for get because of some userpointer restrictions:
> https://lore.kernel.org/io-uring/20230724142237.358769-1-leitao@debian.org/
> 
> The merged v7 also talks about the restriction:
> https://lore.kernel.org/all/20231016134750.1381153-1-leitao@debian.org/
> 
> Adding Breno ...
> 
> It seems proto_ops->getsockopt is the problem as it's not changed
> to sockptr_t yet.

commit a7b75c5a8c41445f33efb663887ff5f5c3b4454b
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Jul 23 08:09:07 2020 +0200

     net: pass a sockptr_t into ->setsockopt

     Rework the remaining setsockopt code to pass a sockptr_t instead of a
     plain user pointer.  This removes the last remaining set_fs(KERNEL_DS)
     outside of architecture specific code.

     Signed-off-by: Christoph Hellwig <hch@lst.de>
     Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154]
     Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
     Signed-off-by: David S. Miller <davem@davemloft.net>

only converted setsockopt function pointer...

Is there any reason not to change getsockopt too?
Except for someone needs to do the work?

metze


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction
  2025-03-28 15:02   ` Stefan Metzmacher
  2025-03-28 15:08     ` Stefan Metzmacher
@ 2025-03-28 16:24     ` Breno Leitao
  1 sibling, 0 replies; 13+ messages in thread
From: Breno Leitao @ 2025-03-28 16:24 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: Jens Axboe, io-uring

Hello Stefan,

On Fri, Mar 28, 2025 at 04:02:35PM +0100, Stefan Metzmacher wrote:
> Am 28.03.25 um 15:30 schrieb Jens Axboe:
> > On 3/28/25 8:27 AM, Stefan Metzmacher wrote:
> > > Hi Jens,
> > > 
> > > while playing with the kernel QUIC driver [1],
> > > I noticed it does a lot of getsockopt() and setsockopt()
> > > calls to sync the required state into and out of the kernel.
> > > 
> > > My long term plan is to let the userspace quic handshake logic
> > > work with SOCKET_URING_OP_GETSOCKOPT and SOCKET_URING_OP_SETSOCKOPT.
> > > 
> > > The used level is SOL_QUIC and that won't work
> > > as io_uring_cmd_getsockopt() has a restriction to
> > > SOL_SOCKET, while there's no restriction in
> > > io_uring_cmd_setsockopt().
> > > 
> > > What's the reason to have that restriction?
> > > And why is it only for the get path and not
> > > the set path?
> > 
> > There's absolutely no reason for that, looks like a pure oversight?!
> 
> It seems RFC had the limitation on both:
> https://lore.kernel.org/io-uring/20230724142237.358769-1-leitao@debian.org/
> 
> v0 had it only for get because of some userpointer restrictions:
> https://lore.kernel.org/io-uring/20230724142237.358769-1-leitao@debian.org/
> 
> The merged v7 also talks about the restriction:
> https://lore.kernel.org/all/20231016134750.1381153-1-leitao@debian.org/
> 
> Adding Breno ...
> 
> It seems proto_ops->getsockopt is the problem as it's not changed
> to sockptr_t yet.

Correct. That is because Linus detests sockptr and didn't recommend
adding new code to use it.

 > New code does *not* have that excuse.

  https://lore.kernel.org/all/CAHk-=wgGV61xrG=gO0=dXH64o2TDWWrXn1mx-CX885JZ7h84Og@mail.gmail.com/

This was raised by Jakub in:

  https://lore.kernel.org/all/20230905154951.0d0d3962@kernel.org/

So, in order to implement the missing part, we need to move this to
something else. The initial suggestion was to use iovec, but, I found it
very hard to move that code to iovec.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction
  2025-03-28 15:02   ` Pavel Begunkov
  2025-03-28 15:03     ` Pavel Begunkov
@ 2025-03-28 16:34     ` Jens Axboe
  2025-03-28 17:18       ` Pavel Begunkov
  1 sibling, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2025-03-28 16:34 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Breno Leitao

On 3/28/25 9:02 AM, Pavel Begunkov wrote:
> On 3/28/25 14:30, Jens Axboe wrote:
>> On 3/28/25 8:27 AM, Stefan Metzmacher wrote:
>>> Hi Jens,
>>>
>>> while playing with the kernel QUIC driver [1],
>>> I noticed it does a lot of getsockopt() and setsockopt()
>>> calls to sync the required state into and out of the kernel.
>>>
>>> My long term plan is to let the userspace quic handshake logic
>>> work with SOCKET_URING_OP_GETSOCKOPT and SOCKET_URING_OP_SETSOCKOPT.
>>>
>>> The used level is SOL_QUIC and that won't work
>>> as io_uring_cmd_getsockopt() has a restriction to
>>> SOL_SOCKET, while there's no restriction in
>>> io_uring_cmd_setsockopt().
>>>
>>> What's the reason to have that restriction?
>>> And why is it only for the get path and not
>>> the set path?
>>
>> There's absolutely no reason for that, looks like a pure oversight?!
> 
> Cc Breno, he can explain better, but IIRC that's because most
> of set/get sockopt options expect user pointers to be passed in,
> and io_uring wants to use kernel memory. It's plumbed for
> SOL_SOCKET with sockptr_t, but there was a push back against
> converting the rest.

Gah yes, now I remember. What's pretty annoying though, as it leaves the
get/setsockopt parts less useful than they should be, compared to the
regular syscalls.

Did we ever ponder ways of getting this sorted out on the net side?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction
  2025-03-28 16:34     ` Jens Axboe
@ 2025-03-28 17:18       ` Pavel Begunkov
  2025-03-28 17:21         ` Pavel Begunkov
  0 siblings, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2025-03-28 17:18 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Breno Leitao

On 3/28/25 16:34, Jens Axboe wrote:
> On 3/28/25 9:02 AM, Pavel Begunkov wrote:
>> On 3/28/25 14:30, Jens Axboe wrote:
>>> On 3/28/25 8:27 AM, Stefan Metzmacher wrote:
>>>> Hi Jens,
>>>>
>>>> while playing with the kernel QUIC driver [1],
>>>> I noticed it does a lot of getsockopt() and setsockopt()
>>>> calls to sync the required state into and out of the kernel.
>>>>
>>>> My long term plan is to let the userspace quic handshake logic
>>>> work with SOCKET_URING_OP_GETSOCKOPT and SOCKET_URING_OP_SETSOCKOPT.
>>>>
>>>> The used level is SOL_QUIC and that won't work
>>>> as io_uring_cmd_getsockopt() has a restriction to
>>>> SOL_SOCKET, while there's no restriction in
>>>> io_uring_cmd_setsockopt().
>>>>
>>>> What's the reason to have that restriction?
>>>> And why is it only for the get path and not
>>>> the set path?
>>>
>>> There's absolutely no reason for that, looks like a pure oversight?!
>>
>> Cc Breno, he can explain better, but IIRC that's because most
>> of set/get sockopt options expect user pointers to be passed in,
>> and io_uring wants to use kernel memory. It's plumbed for
>> SOL_SOCKET with sockptr_t, but there was a push back against
>> converting the rest.
> 
> Gah yes, now I remember. What's pretty annoying though, as it leaves the
> get/setsockopt parts less useful than they should be, compared to the
> regular syscalls.
> 
> Did we ever ponder ways of getting this sorted out on the net side?

I remember Breno looking at several different options.

Breno, can you remind me, why can't we convert ->getsockopt to
take a normal kernel ptr for length while passing a user ptr
for value as before?

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction
  2025-03-28 17:18       ` Pavel Begunkov
@ 2025-03-28 17:21         ` Pavel Begunkov
  2025-03-28 18:22           ` Breno Leitao
  2025-03-28 19:41           ` Stefan Metzmacher
  0 siblings, 2 replies; 13+ messages in thread
From: Pavel Begunkov @ 2025-03-28 17:21 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Breno Leitao

On 3/28/25 17:18, Pavel Begunkov wrote:
> On 3/28/25 16:34, Jens Axboe wrote:
>> On 3/28/25 9:02 AM, Pavel Begunkov wrote:
>>> On 3/28/25 14:30, Jens Axboe wrote:
>>>> On 3/28/25 8:27 AM, Stefan Metzmacher wrote:
>>>>> Hi Jens,
>>>>>
>>>>> while playing with the kernel QUIC driver [1],
>>>>> I noticed it does a lot of getsockopt() and setsockopt()
>>>>> calls to sync the required state into and out of the kernel.
>>>>>
>>>>> My long term plan is to let the userspace quic handshake logic
>>>>> work with SOCKET_URING_OP_GETSOCKOPT and SOCKET_URING_OP_SETSOCKOPT.
>>>>>
>>>>> The used level is SOL_QUIC and that won't work
>>>>> as io_uring_cmd_getsockopt() has a restriction to
>>>>> SOL_SOCKET, while there's no restriction in
>>>>> io_uring_cmd_setsockopt().
>>>>>
>>>>> What's the reason to have that restriction?
>>>>> And why is it only for the get path and not
>>>>> the set path?
>>>>
>>>> There's absolutely no reason for that, looks like a pure oversight?!
>>>
>>> Cc Breno, he can explain better, but IIRC that's because most
>>> of set/get sockopt options expect user pointers to be passed in,
>>> and io_uring wants to use kernel memory. It's plumbed for
>>> SOL_SOCKET with sockptr_t, but there was a push back against
>>> converting the rest.
>>
>> Gah yes, now I remember. What's pretty annoying though, as it leaves the
>> get/setsockopt parts less useful than they should be, compared to the
>> regular syscalls.
>>
>> Did we ever ponder ways of getting this sorted out on the net side?
> 
> I remember Breno looking at several different options.
> 
> Breno, can you remind me, why can't we convert ->getsockopt to
> take a normal kernel ptr for length while passing a user ptr
> for value as before?

Similar to this:

getsockopt_syscall(void __user *len_uptr) {
	int klen;

	copy_from_user(&klen, len_uptr);
	->getsockopt(&klen);
	copy_to_user(len_uptr, &klen);
}

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction
  2025-03-28 17:21         ` Pavel Begunkov
@ 2025-03-28 18:22           ` Breno Leitao
  2025-03-29 10:59             ` Pavel Begunkov
  2025-03-28 19:41           ` Stefan Metzmacher
  1 sibling, 1 reply; 13+ messages in thread
From: Breno Leitao @ 2025-03-28 18:22 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, Stefan Metzmacher, io-uring

Hello Pavel,

On Fri, Mar 28, 2025 at 05:21:06PM +0000, Pavel Begunkov wrote:
> On 3/28/25 17:18, Pavel Begunkov wrote:
> > On 3/28/25 16:34, Jens Axboe wrote:
> > > On 3/28/25 9:02 AM, Pavel Begunkov wrote:
> > > > On 3/28/25 14:30, Jens Axboe wrote:
> > > > > On 3/28/25 8:27 AM, Stefan Metzmacher wrote:
> > I remember Breno looking at several different options.
> > 
> > Breno, can you remind me, why can't we convert ->getsockopt to
> > take a normal kernel ptr for length while passing a user ptr
> > for value as before?
> 
> Similar to this:
> 
> getsockopt_syscall(void __user *len_uptr) {
> 	int klen;
> 
> 	copy_from_user(&klen, len_uptr);
> 	->getsockopt(&klen);
> 	copy_to_user(len_uptr, &klen);
> }

We have a few limitations if I remember correct:

getsockopt() callback expects __user pointers:

	int             (*getsockopt)(struct socket *sock, int level,
			int optname, char __user *optval, int __user *optlen);


So, you cannot copy the memory content and call ->getsockopt() with
kernel memory.

A solution was to use sockptr, as done by setsockopt(), but, that was
discouraged.

Another important thing, some getsockopt() callback changes the pointer,
so, doing copy_to_user() directly in the getsocktopt callback, which
would break your approach above.

I understand that the next steps here are:

 1) Make getsockopt() operate with either userspace or kernel buffer.
    a) This buffer needs will be written and read on both side. I.e, you
    pass data in the buffer from userspace to kernel space, and kernel
    will overwrite that buffer in kernelspace.

    In other words, this is a read-write buffer (which is not something
    we have in iovec IIRC).

 2) Call the same callbacks from io_uring subsystem using kernel memory

 3) Regular syscalls will continue to user userspace memory.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction
  2025-03-28 17:21         ` Pavel Begunkov
  2025-03-28 18:22           ` Breno Leitao
@ 2025-03-28 19:41           ` Stefan Metzmacher
  1 sibling, 0 replies; 13+ messages in thread
From: Stefan Metzmacher @ 2025-03-28 19:41 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Breno Leitao

Am 28.03.25 um 18:21 schrieb Pavel Begunkov:
> On 3/28/25 17:18, Pavel Begunkov wrote:
>> On 3/28/25 16:34, Jens Axboe wrote:
>>> On 3/28/25 9:02 AM, Pavel Begunkov wrote:
>>>> On 3/28/25 14:30, Jens Axboe wrote:
>>>>> On 3/28/25 8:27 AM, Stefan Metzmacher wrote:
>>>>>> Hi Jens,
>>>>>>
>>>>>> while playing with the kernel QUIC driver [1],
>>>>>> I noticed it does a lot of getsockopt() and setsockopt()
>>>>>> calls to sync the required state into and out of the kernel.
>>>>>>
>>>>>> My long term plan is to let the userspace quic handshake logic
>>>>>> work with SOCKET_URING_OP_GETSOCKOPT and SOCKET_URING_OP_SETSOCKOPT.
>>>>>>
>>>>>> The used level is SOL_QUIC and that won't work
>>>>>> as io_uring_cmd_getsockopt() has a restriction to
>>>>>> SOL_SOCKET, while there's no restriction in
>>>>>> io_uring_cmd_setsockopt().
>>>>>>
>>>>>> What's the reason to have that restriction?
>>>>>> And why is it only for the get path and not
>>>>>> the set path?
>>>>>
>>>>> There's absolutely no reason for that, looks like a pure oversight?!
>>>>
>>>> Cc Breno, he can explain better, but IIRC that's because most
>>>> of set/get sockopt options expect user pointers to be passed in,
>>>> and io_uring wants to use kernel memory. It's plumbed for
>>>> SOL_SOCKET with sockptr_t, but there was a push back against
>>>> converting the rest.
>>>
>>> Gah yes, now I remember. What's pretty annoying though, as it leaves the
>>> get/setsockopt parts less useful than they should be, compared to the
>>> regular syscalls.
>>>
>>> Did we ever ponder ways of getting this sorted out on the net side?
>>
>> I remember Breno looking at several different options.
>>
>> Breno, can you remind me, why can't we convert ->getsockopt to
>> take a normal kernel ptr for length while passing a user ptr
>> for value as before?
> 
> Similar to this:
> 
> getsockopt_syscall(void __user *len_uptr) {
>      int klen;
> 
>      copy_from_user(&klen, len_uptr);
>      ->getsockopt(&klen);
>      copy_to_user(len_uptr, &klen);
> }

Yes, I was thinking about something similar.

I'll give it a go next week.

Thanks!
metze


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction
  2025-03-28 18:22           ` Breno Leitao
@ 2025-03-29 10:59             ` Pavel Begunkov
  0 siblings, 0 replies; 13+ messages in thread
From: Pavel Begunkov @ 2025-03-29 10:59 UTC (permalink / raw)
  To: Breno Leitao; +Cc: Jens Axboe, Stefan Metzmacher, io-uring

On 3/28/25 18:22, Breno Leitao wrote:
> Hello Pavel,
> 
> On Fri, Mar 28, 2025 at 05:21:06PM +0000, Pavel Begunkov wrote:
>> On 3/28/25 17:18, Pavel Begunkov wrote:
>>> On 3/28/25 16:34, Jens Axboe wrote:
>>>> On 3/28/25 9:02 AM, Pavel Begunkov wrote:
>>>>> On 3/28/25 14:30, Jens Axboe wrote:
>>>>>> On 3/28/25 8:27 AM, Stefan Metzmacher wrote:
>>> I remember Breno looking at several different options.
>>>
>>> Breno, can you remind me, why can't we convert ->getsockopt to
>>> take a normal kernel ptr for length while passing a user ptr
>>> for value as before?
>>
>> Similar to this:
>>
>> getsockopt_syscall(void __user *len_uptr) {
>> 	int klen;
>>
>> 	copy_from_user(&klen, len_uptr);
>> 	->getsockopt(&klen);
>> 	copy_to_user(len_uptr, &klen);
>> }
> 
> We have a few limitations if I remember correct:
> 
> getsockopt() callback expects __user pointers:
> 
> 	int             (*getsockopt)(struct socket *sock, int level,
> 			int optname, char __user *optval, int __user *optlen);
> 
> 
> So, you cannot copy the memory content and call ->getsockopt() with
> kernel memory.

Right, I'm rather asking about changing the callback to pass
a kernel pointer and make the caller to do the copy_to_user
if needed.

> A solution was to use sockptr, as done by setsockopt(), but, that was
> discouraged.
> 
> Another important thing, some getsockopt() callback changes the pointer,
> so, doing copy_to_user() directly in the getsocktopt callback, which
> would break your approach above.

Do you mean writing to it? That's why the snippet passes a kernel
_pointer_ to length. Did I misunderstand you?

> I understand that the next steps here are:
> 
>   1) Make getsockopt() operate with either userspace or kernel buffer.
>      a) This buffer needs will be written and read on both side. I.e, you
>      pass data in the buffer from userspace to kernel space, and kernel
>      will overwrite that buffer in kernelspace.
> 
>      In other words, this is a read-write buffer (which is not something
>      we have in iovec IIRC).
> 
>   2) Call the same callbacks from io_uring subsystem using kernel memory
> 
>   3) Regular syscalls will continue to user userspace memory.
> 

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-03-29 10:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-28 14:27 SOCKET_URING_OP_GETSOCKOPT SOL_SOCKET restriction Stefan Metzmacher
2025-03-28 14:30 ` Jens Axboe
2025-03-28 15:02   ` Stefan Metzmacher
2025-03-28 15:08     ` Stefan Metzmacher
2025-03-28 16:24     ` Breno Leitao
2025-03-28 15:02   ` Pavel Begunkov
2025-03-28 15:03     ` Pavel Begunkov
2025-03-28 16:34     ` Jens Axboe
2025-03-28 17:18       ` Pavel Begunkov
2025-03-28 17:21         ` Pavel Begunkov
2025-03-28 18:22           ` Breno Leitao
2025-03-29 10:59             ` Pavel Begunkov
2025-03-28 19:41           ` Stefan Metzmacher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox