* shutdown not affecting connection?
From: Glauber Costa @ 2020-02-08 13:55 UTC
To: io-uring, Jens Axboe, Avi Kivity

Hi

I've been trying to make sense of some weird behavior with the seastar
implementation of io_uring, and started to suspect a bug in io_uring's
connect.

The situation is as follows:

- A connect() call is issued (and in the backend I can choose whether I
  use uring or not)
- The connection is supposed to take a while to establish.
- I call shutdown() on the file descriptor

If io_uring is not used:
- connect() starts by returning EINPROGRESS as expected, and after the
  shutdown the file descriptor is finally made ready for epoll. I call
  getsockopt(SOL_SOCKET, SO_ERROR) and see the error (104, ECONNRESET).

If io_uring is used:
- if the SQE has the IOSQE_ASYNC flag on, connect() never returns.
- if the SQE *does not* have the IOSQE_ASYNC flag on, then most of the
  time the test works as intended and connect() returns 104, but
  occasionally it hangs too. Note that seastar may choose not to call
  io_uring_enter immediately, batching sqes instead.

Sounds like some kind of race?

I know C++ probably stinks like the devil for you guys, but if you are
curious to see the code, this fails one of our unit tests:

https://github.com/scylladb/seastar/blob/master/tests/unit/connect_test.cc
See test_connection_attempt_is_shutdown
(the above is the master seastar tree, not including the io_uring
implementation)

Please let me know if this rings a bell and if there is anything I
should be verifying here.

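For context, the non-io_uring sequence described above boils down to roughly
the following standalone sketch. This is illustrative only: the address and
port are placeholders, and seastar's actual code path is its epoll-driven
reactor, not this loop.

    #include <arpa/inet.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <netinet/in.h>
    #include <poll.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        struct sockaddr_in addr;
        int fd, err;
        socklen_t len = sizeof(err);

        fd = socket(AF_INET, SOCK_STREAM, 0);
        fcntl(fd, F_SETFL, O_NONBLOCK);

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(9999);                  /* placeholder port */
        addr.sin_addr.s_addr = inet_addr("10.0.0.1"); /* placeholder, slow/unreachable host */

        /* non-blocking connect: returns -1 with errno == EINPROGRESS */
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 &&
            errno != EINPROGRESS)
            perror("connect");

        /* tearing the socket down while the connect is still pending... */
        shutdown(fd, SHUT_RDWR);

        /* ...makes the fd ready, and SO_ERROR reports the failure
         * (104, ECONNRESET, in the seastar test) */
        struct pollfd p = { .fd = fd, .events = POLLOUT };
        poll(&p, 1, -1);
        getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
        printf("SO_ERROR = %d\n", err);

        close(fd);
        return 0;
    }
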
* Re: shutdown not affecting connection?
From: Pavel Begunkov @ 2020-02-08 14:26 UTC
To: Glauber Costa, io-uring, Jens Axboe, Avi Kivity

Hi

On 2/8/2020 4:55 PM, Glauber Costa wrote:
> [...]
>
> I know C++ probably stinks like the devil for you guys, but if you are
> curious to see the code, this fails one of our unit tests:
>
> https://github.com/scylladb/seastar/blob/master/tests/unit/connect_test.cc
> See test_connection_attempt_is_shutdown
> (the above is the master seastar tree, not including the io_uring
> implementation)

Is this chaining with connect().then_wrapped() asynchronous? Like some
kind of future/promise stuff? I wonder if connect() and shutdown() there
may be executed in the reverse order.

The hang with IOSQE_ASYNC sounds strange anyway.

> Please let me know if this rings a bell and if there is anything I
> should be verifying here

--
Pavel Begunkov

* Re: shutdown not affecting connection?
From: Glauber Costa @ 2020-02-08 18:42 UTC
To: Pavel Begunkov; +Cc: io-uring, Jens Axboe, Avi Kivity

Hi

BTW, my apologies, but I should have specified the kernel I am running:
90206ac99c1f25b7f7a4c2c40a0b9d4561ffa9bf

On Sat, Feb 8, 2020 at 9:26 AM Pavel Begunkov <[email protected]> wrote:
>
> [...]
>
> Is this chaining with connect().then_wrapped() asynchronous? Like some
> kind of future/promise stuff?

Correct.
then_wrapped executes eventually, when connect returns either success or
failure.

> I wonder if connect() and shutdown() there may be executed in the
> reverse order.

The methods connect and shutdown will execute in this order.
But connect will just queue something that will later be sent down to
the kernel.

I initially suspected an ordering issue on my side. What made me start
suspecting a bug are two things:
- I can force the code to grab an sqe and call io_uring_enter at the
  moment the connect() call happens: I see no change.
- IOSQE_ASYNC changes this behavior, as you acknowledged yourself.

It seems to me that if shutdown happens while the sqe is sitting on a
kernel queue somewhere, the connection will hang forever instead of
failing right away, as I would expect (and as happens if shutdown comes
after the call to io_uring_enter).

> The hang with IOSQE_ASYNC sounds strange anyway.

* Re: shutdown not affecting connection?
From: Avi Kivity @ 2020-02-08 18:48 UTC
To: Glauber Costa, Pavel Begunkov; +Cc: io-uring, Jens Axboe

On 2/8/20 8:42 PM, Glauber Costa wrote:
> [...]
>
> I initially suspected an ordering issue on my side. What made me start
> suspecting a bug are two things:
> - I can force the code to grab an sqe and call io_uring_enter at the
>   moment the connect() call happens: I see no change.
> - IOSQE_ASYNC changes this behavior, as you acknowledged yourself.
>
> It seems to me that if shutdown happens while the sqe is sitting on a
> kernel queue somewhere, the connection will hang forever instead of
> failing right away, as I would expect (and as happens if shutdown comes
> after the call to io_uring_enter).

You can try to cancel the sqe before you shutdown the socket. This will
flush the queue (even if the cancellation fails).

However, if you io_uring_enter before calling shutdown and connect does
not return, I'd consider that a kernel bug. Perhaps you can reduce the
problem to a small C reproducer?

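A minimal sketch of the "cancel before shutdown" idea, assuming a liburing
with io_uring_prep_cancel / IORING_OP_ASYNC_CANCEL support (kernel 5.5+);
connect_tag here stands for whatever user_data the connect sqe was submitted
with, and is purely illustrative:

    #include <sys/socket.h>
    #include "liburing.h"

    static void cancel_then_shutdown(struct io_uring *ring, int fd,
                                     void *connect_tag)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (sqe) {
            /* ask the kernel to cancel the still-pending connect request */
            io_uring_prep_cancel(sqe, connect_tag, 0);
            sqe->user_data = 0xdead; /* arbitrary tag for the cancel cqe */
            io_uring_submit(ring);
        }

        /* even if the cancellation itself fails, the queue has been flushed */
        shutdown(fd, SHUT_RDWR);

        /* the connect cqe should now complete, either with -ECANCELED or
         * with the actual connect result */
    }
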
* Re: shutdown not affecting connection?
From: Glauber Costa @ 2020-02-08 18:57 UTC
To: Avi Kivity; +Cc: Pavel Begunkov, io-uring, Jens Axboe

On Sat, Feb 8, 2020 at 1:48 PM Avi Kivity <[email protected]> wrote:
>
> [...]
>
> You can try to cancel the sqe before you shutdown the socket. This will
> flush the queue (even if the cancellation fails).
>
> However, if you io_uring_enter before calling shutdown and connect does
> not return, I'd consider that a kernel bug.

That is definitely what happens: I've changed the code to do it
synchronously (with our flush() implementation, which will loop until an
sqe can be acquired and io_uring_enter returns success), so at this point
I am sure this is in the kernel.

> Perhaps you can reduce the problem to a small C reproducer?

That was my intended next step, yes.

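For what it's worth, a loop of that shape might look roughly like the
following - a sketch of one reading of "loop until an sqe can be acquired
and io_uring_enter returns success", not seastar's actual flush():

    #include "liburing.h"

    /* Grab an sqe, submitting whatever is already queued if the SQ ring
     * is full. */
    static struct io_uring_sqe *get_sqe_blocking(struct io_uring *ring)
    {
        struct io_uring_sqe *sqe;

        while ((sqe = io_uring_get_sqe(ring)) == NULL)
            io_uring_submit(ring); /* free up a slot */
        return sqe;
    }

    /* Submit synchronously: retry until io_uring_enter() (via
     * io_uring_submit()) does not fail. */
    static void submit_sync(struct io_uring *ring)
    {
        while (io_uring_submit(ring) < 0)
            ;
    }
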
* Re: shutdown not affecting connection?
From: Glauber Costa @ 2020-02-08 20:20 UTC
To: Avi Kivity; +Cc: Pavel Begunkov, io-uring, Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 766 bytes --]

> > > Perhaps you can reduce the problem to a small C reproducer?
> >
> > That was my intended next step, yes

s***, I didn't resist and I had to explain to my wife that no, I don't
like io_uring more than I like her.

But here it is.

This is a modification of test/connect.c.
I added a pthread comparison example that should achieve the same
sequence of events:
- try to sync connect
- wait a bit
- shutdown

I added a fixed wait for pthread to make sure that shutdown is not
called before connect.

For io_uring, the shutdown is configurable with the program argument.
This works just fine if I sleep before shutdown (as I would expect from
a race). This hangs every time if I don't.

Unless I am missing something, I don't think this is the expected
behavior.

[-- Attachment #2: connect.c --]
[-- Type: text/x-csrc, Size: 3491 bytes --]

/*
 * Check that IORING_OP_CONNECT works, with and without other side
 * being open.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <poll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include "liburing.h"
#include <pthread.h>

static int create_socket(void)
{
	int fd;

	fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (fd == -1) {
		perror("socket()");
		return -1;
	}

	return fd;
}

static int submit_and_wait(struct io_uring *ring, int fd, int *res, int sleep_for)
{
	struct io_uring_cqe *cqe;
	int ret;

	ret = io_uring_submit(ring);
	if (ret != 1) {
		fprintf(stderr, "io_using_submit: got %d\n", ret);
		return 1;
	}

	if (sleep_for) {
		sleep(sleep_for);
	}
	shutdown(fd, SHUT_RDWR);

	while (!io_uring_cq_ready(ring)) {}

	ret = io_uring_peek_cqe(ring, &cqe);
	if (ret < 0) {
		printf("peek cqe?\n");
		return -1;
	}

	*res = cqe->res;
	io_uring_cqe_seen(ring, cqe);
	return 0;
}

static void *thread_start(void *arg)
{
	int fd = (int)(uint64_t)arg;

	sleep(1);
	shutdown(fd, SHUT_RDWR);
	return NULL;
}

static int pthread_connect_socket(int fd, int *code)
{
	struct sockaddr_in addr;
	int ret;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = 0;
	addr.sin_addr.s_addr = 0x10010ac;

	pthread_attr_t attr;
	int s = pthread_attr_init(&attr);
	if (s != 0) {
		perror("pthread_attr_init");
		return -1;
	}

	pthread_t thr;
	s = pthread_create(&thr, &attr, &thread_start, (void*)(uint64_t)fd);
	if (s != 0) {
		perror("pthread_create");
		return -1;
	}

	ret = connect(fd, (struct sockaddr*)&addr, sizeof(addr));
	return (ret < 0);
}

static int connect_socket(struct io_uring *ring, int fd, int *code, int sleep_for)
{
	struct io_uring_sqe *sqe;
	struct sockaddr_in addr;
	int res;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = 0;
	addr.sin_addr.s_addr = 0x10010ac;

	sqe = io_uring_get_sqe(ring);
	if (!sqe) {
		fprintf(stderr, "unable to get sqe\n");
		return -1;
	}

	io_uring_prep_connect(sqe, fd, (struct sockaddr*)&addr, sizeof(addr));
	sqe->user_data = 1;
	io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);

	return submit_and_wait(ring, fd, &res, sleep_for);
}

static int test_connect(struct io_uring *ring, int use_uring, int sleep_for)
{
	int connect_fd;
	int ret, code = 0;

	connect_fd = create_socket();
	if (connect_fd == -1)
		return -1;

	if (use_uring) {
		ret = connect_socket(ring, connect_fd, &code, sleep_for);
	} else {
		ret = pthread_connect_socket(connect_fd, &code);
	}
	if (ret == -1)
		goto err;

	if (code != 0) {
		fprintf(stderr, "connect failed with %d\n", code);
		goto err;
	}

	close(connect_fd);
	return 0;

err:
	close(connect_fd);
	return -1;
}

int main(int argc, char *argv[])
{
	struct io_uring ring;
	int ret;
	int sleep_for;

	if (argc == 1) {
		sleep_for = 0;
	} else {
		sleep_for = atoi(argv[1]);
	}

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret == -1) {
		perror("io_uring_queue_setup()");
		return 1;
	}

	ret = test_connect(&ring, 0, sleep_for);
	if (ret == 0) {
		printf("shutting down a socket trying to connect works with pthread\n");
	} else {
		return -1;
	}

	ret = test_connect(&ring, 1, sleep_for);
	if (ret == 0) {
		printf("shutting down a socket trying to connect with uring works too, waited %d s before shutdown\n", sleep_for);
	} else {
		return -1;
	}

	io_uring_queue_exit(&ring);
	return 0;
}

* Re: shutdown not affecting connection?
From: Avi Kivity @ 2020-02-08 20:28 UTC
To: Glauber Costa; +Cc: Pavel Begunkov, io-uring, Jens Axboe

On 2/8/20 10:20 PM, Glauber Costa wrote:
> [...]
>
> This is a modification of test/connect.c.
> I added a pthread comparison example that should achieve the same
> sequence of events:
> - try to sync connect
> - wait a bit
> - shutdown
>
> [...]
>
> For io_uring, the shutdown is configurable with the program argument.
> This works just fine if I sleep before shutdown (as I would expect from
> a race). This hangs every time if I don't.
>
> Unless I am missing something, I don't think this is the expected
> behavior.

I think it is understandable. Since the socket is blocking, uring moves
the work to a workqueue, and the shutdown() happens before the workqueue
has had a chance to process the connection attempt. So we'll have to
cancel the sqe.

Jens, does the blocking connect consume a kernel thread while it's
waiting for the connection? Or does it just set things up and move on?

* Re: shutdown not affecting connection?
From: Glauber Costa @ 2020-02-08 20:43 UTC
To: Avi Kivity; +Cc: Pavel Begunkov, io-uring, Jens Axboe

On Sat, Feb 8, 2020 at 3:29 PM Avi Kivity <[email protected]> wrote:
>
> [...]
>
> I think it is understandable. Since the socket is blocking, uring moves
> the work to a workqueue, and the shutdown() happens before the workqueue
> has had a chance to process the connection attempt. So we'll have to
> cancel the sqe.

It does seem to me that this implies that every shutdown must be
accompanied by a cancel of the pending connect.

From the user's perspective, this still feels like a bug to me: the fact
that we had to move this to a work queue is an implementation detail:

1) we asked the kernel to do something
2) the kernel returned
3) we called shutdown() expecting that pending connect to go away, and it
   never returned.

If cancel-after-connect to avoid these races is the intended behavior, it
would be nice to get this documented somehow in io_uring's fantastic
documentation. In hindsight, cancel-on-shutdown is quite obvious and
natural. But I just spent two days getting to this obvious and natural
conclusion.

> Jens, does the blocking connect consume a kernel thread while it's
> waiting for the connection? Or does it just set things up and move on?

* Re: shutdown not affecting connection?
From: Andres Freund @ 2020-02-08 18:48 UTC
To: Glauber Costa; +Cc: io-uring, Jens Axboe, Avi Kivity

Hi,

On 2020-02-08 08:55:25 -0500, Glauber Costa wrote:
> - A connect() call is issued (and in the backend I can choose whether I
>   use uring or not)
> - The connection is supposed to take a while to establish.
> - I call shutdown() on the file descriptor
>
> If io_uring is not used:
> - connect() starts by returning EINPROGRESS as expected, and after the
>   shutdown the file descriptor is finally made ready for epoll. I call
>   getsockopt(SOL_SOCKET, SO_ERROR) and see the error (104, ECONNRESET).
>
> If io_uring is used:
> - if the SQE has the IOSQE_ASYNC flag on, connect() never returns.

That should be easy enough to reproduce without seastar, as it sounds
deterministic - how about modifying liburing's test/connect.c test to
behave this way?

Hm, any chance you set O_NONBLOCK on the fd before calling the async
connect? Wonder if io_connect()

	file_flags = force_nonblock ? O_NONBLOCK : 0;

	ret = __sys_connect_file(req->file, &io->connect.address,
				 req->connect.addr_len, file_flags);
	if ((ret == -EAGAIN || ret == -EINPROGRESS) && force_nonblock) {

fully takes into account that __sys_connect_file

	err = sock->ops->connect(sock, (struct sockaddr *)address, addrlen,
				 sock->file->f_flags | file_flags);

appears to leave O_NONBLOCK set on the file in place, which'd then not
block in the wq?

Greetings,

Andres Freund

* Re: shutdown not affecting connection?
From: Glauber Costa @ 2020-02-08 18:54 UTC
To: Andres Freund; +Cc: io-uring, Jens Axboe, Avi Kivity

On Sat, Feb 8, 2020 at 1:48 PM Andres Freund <[email protected]> wrote:
>
> [...]
>
> That should be easy enough to reproduce without seastar, as it sounds
> deterministic - how about modifying liburing's test/connect.c test to
> behave this way?

My plan was to work on that on Monday, but I wanted to get the message
out earlier in case it was a known issue or rang an obvious bell. It
seems like it's not, so I'll stick to my plan.

> Hm, any chance you set O_NONBLOCK on the fd before calling the async
> connect?

In fact I do the opposite, and I force-remove the O_NONBLOCK flag.
But I actually played around with it while chasing this, and I did, at
some point, set O_NONBLOCK.

This is what the seastar code for connect (without uring) looks like:

    // socket is non-block here
    pfd->get_file_desc().connect(sa.u.sa, sa.length());
    return pfd->writeable().then([pfd]() mutable {
        auto err = pfd->get_file_desc().getsockopt<int>(SOL_SOCKET, SO_ERROR);
        if (err != 0) {
            throw std::system_error(err, std::system_category());
        }
        return make_ready_future<>();
    });

So it essentially issues a nonblocking connect, waits for the fd to be
writeable, and then uses getsockopt to figure out what happened.

With io_uring, what I see on a nonblocking socket is:
- it returns EINPROGRESS as I would expect
- it is never made writeable.

> Wonder if io_connect()
>
>	file_flags = force_nonblock ? O_NONBLOCK : 0;
>
>	ret = __sys_connect_file(req->file, &io->connect.address,
>				 req->connect.addr_len, file_flags);
>	if ((ret == -EAGAIN || ret == -EINPROGRESS) && force_nonblock) {
>
> fully takes into account that __sys_connect_file
>
>	err = sock->ops->connect(sock, (struct sockaddr *)address, addrlen,
>				 sock->file->f_flags | file_flags);
>
> appears to leave O_NONBLOCK set on the file in place, which'd then not
> block in the wq?

Isn't not blocking the exact opposite of what I am seeing? If this was
really not blocking, I'd expect it to return me something immediately,
even if it was the wrong thing.

> Greetings,
>
> Andres Freund

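For completeness, "force-removing" the flag as described above amounts to a
couple of fcntl calls - a sketch only, with no claim that this is how
seastar actually does it:

    #include <fcntl.h>

    /* Clear O_NONBLOCK on a socket fd before handing it to
     * IORING_OP_CONNECT. Returns 0 on success, -1 on error (errno set
     * by fcntl). */
    static int force_blocking(int fd)
    {
        int flags = fcntl(fd, F_GETFL, 0);

        if (flags < 0)
            return -1;
        if (flags & O_NONBLOCK)
            return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
        return 0;
    }
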