public inbox for [email protected]
 help / color / mirror / Atom feed
From: Helge Deller <[email protected]>
To: Jens Axboe <[email protected]>, [email protected]
Cc: John David Anglin <[email protected]>,
	linux-parisc <[email protected]>
Subject: Re: io_uring failure on parisc (32-bit userspace and 64-bit kernel)
Date: Mon, 13 Feb 2023 21:59:38 +0100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 2/13/23 17:15, Jens Axboe wrote:
> On 2/12/23 3:31?PM, Helge Deller wrote:
>> On 2/12/23 23:20, Helge Deller wrote:
>>> On 2/12/23 22:48, Jens Axboe wrote:
>>>> On 2/12/23 1:01?PM, Helge Deller wrote:
>>>>> On 2/12/23 20:42, Jens Axboe wrote:
>>>>>> On 2/12/23 12:35?PM, Helge Deller wrote:
>>>>>>> On 2/12/23 15:03, Helge Deller wrote:
>>>>>>>> On 2/12/23 14:35, Jens Axboe wrote:
>>>>>>>>> On 2/12/23 6:28?AM, Helge Deller wrote:
>>>>>>>>>> On 2/12/23 14:16, Jens Axboe wrote:
>>>>>>>>>>> On 2/12/23 2:47?AM, Helge Deller wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> We see io-uring failures on the parisc architecture with this testcase:
>>>>>>>>>>>> https://github.com/axboe/liburing/blob/master/examples/io_uring-test.c
>>>>>>>>>>>>
>>>>>>>>>>>> parisc is always big-endian 32-bit userspace, with either 32- or 64-bit kernel.
>>>>>>>>>>>>
>>>>>>>>>>>> On a 64-bit kernel (6.1.11):
>>>>>>>>>>>> deller@parisc:~$ ./io_uring-test test.file
>>>>>>>>>>>> ret=0, wanted 4096
>>>>>>>>>>>> Submitted=4, completed=1, bytes=0
>>>>>>>>>>>> -> failure
>>>>>>>>>>>>
>>>>>>>>>>>> strace shows:
>>>>>>>>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>>>>>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf7522000
>>>>>>>>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf6922000
>>>>>>>>>>>> openat(AT_FDCWD, "libell0-dbgsym_0.56-2_hppa.deb", O_RDONLY|O_DIRECT) = 4
>>>>>>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=689308, ...}) = 0
>>>>>>>>>>>> getrandom("\x5c\xcf\x38\x2d", 4, GRND_NONBLOCK) = 4
>>>>>>>>>>>> brk(NULL)                               = 0x4ae000
>>>>>>>>>>>> brk(0x4cf000)                           = 0x4cf000
>>>>>>>>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 0
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Running the same testcase on a 32-bit kernel (6.1.11) works:
>>>>>>>>>>>> root@debian:~# ./io_uring-test test.file
>>>>>>>>>>>> Submitted=4, completed=4, bytes=16384
>>>>>>>>>>>> -> ok.
>>>>>>>>>>>>
>>>>>>>>>>>> strace:
>>>>>>>>>>>> io_uring_setup(4, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=4, cq_entries=8, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|0x1f80, sq_off={head=0, tail=16, ring_mask=64, ring_entries=72, flags=84, dropped=80, array=224}, cq_off={head=32, tail=48, ring_mask=68, ring_entries=76, overflow=92, cqes=96, flags=0x58 /* IORING_CQ_??? */}}) = 3
>>>>>>>>>>>> mmap2(NULL, 240, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0) = 0xf6d4c000
>>>>>>>>>>>> mmap2(NULL, 256, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x10000000) = 0xf694c000
>>>>>>>>>>>> openat(AT_FDCWD, "trace.dat", O_RDONLY|O_DIRECT) = 4
>>>>>>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=1855488, ...}) = 0
>>>>>>>>>>>> getrandom("\xb2\x3f\x0c\x65", 4, GRND_NONBLOCK) = 4
>>>>>>>>>>>> brk(NULL)                               = 0x15000
>>>>>>>>>>>> brk(0x36000)                            = 0x36000
>>>>>>>>>>>> io_uring_enter(3, 4, 0, 0, NULL, 8)     = 4
>>>>>>>>>>>>
>>>>>>>>>>>> I'm happy to test any patch if someone has an idea....
>>>>>>>>>>>
>>>>>>>>>>> No idea what this could be, to be honest. I tried your qemu vm image,
>>>>>>>>>>> and it does boot, but it's missing keys to be able to update apt and
>>>>>>>>>>> install packages... After fiddling with this for 30 min I gave up, any
>>>>>>>>>>> chance you can update the sid image? Given how slow this thing is
>>>>>>>>>>> running, it'd take me all day to do a fresh install and I have to admit
>>>>>>>>>>> I'm not THAT motivated about parisc to do that :)
>>>>>>>>>>
>>>>>>>>>> Yes, I will update that image, but qemu currently only supports a
>>>>>>>>>> 32-bit PA-RISC CPU which can only run the 32-bit kernel. So even if I
>>>>>>>>>> update it, you won't be able to reproduce it, as it only happens with
>>>>>>>>>> the 64-bit kernel. I'm sure it's some kind of missing 32-to-64bit
>>>>>>>>>> translation in the kernel, which triggers only big-endian machines.
>>>>>>>>>
>>>>>>>>> I built my own kernel for it, so that should be fine, correct?
>>>>>>>>
>>>>>>>> No, as qemu won't boot the 64-bit kernel.
>>>>>>>>
>>>>>>>>> We'll see soon enough, managed to disable enough checks on the
>>>>>>>>> debian-10 image to actually make it install packages.
>>>>>>>>>
>>>>>>>>>> Does powerpc with a 64-bit ppc64 kernel work?
>>>>>>>>>> I'd assume it will show the same issue.
>>>>>>>>>
>>>>>>>>> No idea... Only stuff I use and test on is x86-64/32 and arm64.
>>>>>>>>
>>>>>>>> Would be interesting if someone could test...
>>>>>>>>
>>>>>>>>>> I will try to add some printks and compare the output of 32- and
>>>>>>>>>> 64-bit kernels. If you have some suggestion where to add such (which?)
>>>>>>>>>> debug code, it would help me a lot.
>>>>>>>>>
>>>>>>>>> I'd just try:
>>>>>>>>>
>>>>>>>>> echo 1 > /sys/kernel/debug/tracing/events/io_uring
>>>>>>>>
>>>>>>>> I'll try, but will take some time...
>>>>>>>>
>>>>>>>
>>>>>>> At entry of io_submit_sqes(), io_sqring_entries() returns 0, because
>>>>>>> ctx->rings->sq.tail is 0 (wrongly on broken 64-bit, but ok value 4 on 32-bit), and
>>>>>>> ctx->cached_sq_head is 0 in both cases.
>>>>>>
>>>>>> cached_sq_head will get updated as sqes are consumed, but since sq.tail
>>>>>> is zero, there's nothing to submit as far as io_uring is concerned.
>>>>>>
>>>>>> Can you dump addresses/offsets of the sq and cq heads/tails in userspace
>>>>>> and in the kernel? They are u32, so same size of 32 and 64-bit.
>>>>>
>>>>> For both kernels (32- and 64-bit) I get:
>>>>> p->sq_off.head = 0  p->sq_off.tail = 16
>>>>> p->cq_off.head = 32  p->cq_off.tail = 48
>>>>
>>>> So all that looks as expected. Is it perhaps some mmap thing on 64-bit
>>>> kernels? The kernel isn't seeing the updates. You could add the below
>>>> debugging, and keep your kernel side stuff. Sounds like they don't quite
>>>> agree.
>>>>
>>>>
>>>> diff --git a/examples/io_uring-test.c b/examples/io_uring-test.c
>>>> index 1a685360bff6..f1cfda90c018 100644
>>>> --- a/examples/io_uring-test.c
>>>> +++ b/examples/io_uring-test.c
>>>> @@ -73,7 +73,9 @@ int main(int argc, char *argv[])
>>>>                break;
>>>>        } while (1);
>>>>
>>>> +    printf("pre-submit sq head/tail %d/%d, %d/%d\n", *ring.sq.khead, *ring.sq.ktail, ring.sq.sqe_head, ring.sq.sqe_tail);
>>>>        ret = io_uring_submit(&ring);
>>>> +    printf("post-submit sq head/tail %d/%d, %d/%d\n", *ring.sq.khead, *ring.sq.ktail, ring.sq.sqe_head, ring.sq.sqe_tail);
>>>>        if (ret < 0) {
>>>>            fprintf(stderr, "io_uring_submit: %s\n", strerror(-ret));
>>>>            return 1;
>>>
>>> Result is:
>>> pre-submit sq head/tail 0/0, 0/4
>>> ..
>>> post-submit sq head/tail 0/4, 4/4
>>
>> I have to correct myself!
>> The problem exists on both, 32- and 64-bit kernels.
>> My current testing with 32-bit kernel was on qemu, but after booting the same kernel
>> on a physical box, I see the testcase failing on the 32-bit kernel too.
>>
>> So, probably some cache-flushing / alias-handling is needed....
>> This is a 1-CPU box, so SMP isn't involved.
>
> Yep sounds like it. What's the caching architecture of parisc?

parisc is Virtually Indexed, Physically Tagged (VIPT).


> Something
> like this perhaps, but may not be complete as we'd need to do something
> for the cqe writes too I think, not just the cq tail.
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index db623b3185c8..ab0d1297bb0e 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -2338,6 +2338,8 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
>   	unsigned head, mask = ctx->sq_entries - 1;
>   	unsigned sq_idx = ctx->cached_sq_head++ & mask;
>
> +	flush_dcache_page(virt_to_page(ctx->sq_array + sq_idx));
> +
>   	/*
>   	 * The cached sq head (or cq tail) serves two purposes:
>   	 *
> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
> index ab4b2a1c3b7e..b132f44a9364 100644
> --- a/io_uring/io_uring.h
> +++ b/io_uring/io_uring.h
> @@ -220,6 +220,7 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
>   {
>   	/* order cqe stores with ring update */
>   	smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
> +	flush_dcache_page(virt_to_page(ctx->rings));
>   }
>
>   /* requires smb_mb() prior, see wq_has_sleeper() */

Thanks for the patch!
Sadly it doesn't fix the problem, as the kernel still sees
ctx->rings->sq.tail as being 0.
Interestingly it worked once (not reproduceable) directly after bootup,
which indicates that we at least look at the right address from kernel side.

So, still needs more debugging/testing.

Helge

  reply	other threads:[~2023-02-13 20:59 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-12  9:47 io_uring failure on parisc (32-bit userspace and 64-bit kernel) Helge Deller
2023-02-12 13:16 ` Jens Axboe
2023-02-12 13:28   ` Helge Deller
2023-02-12 13:35     ` Jens Axboe
2023-02-12 14:00       ` Jens Axboe
2023-02-12 14:03       ` Helge Deller
2023-02-12 19:35         ` Helge Deller
2023-02-12 19:42           ` Jens Axboe
2023-02-12 20:01             ` Helge Deller
2023-02-12 21:48               ` Jens Axboe
2023-02-12 22:20                 ` Helge Deller
2023-02-12 22:31                   ` Helge Deller
2023-02-13 16:15                     ` Jens Axboe
2023-02-13 20:59                       ` Helge Deller [this message]
2023-02-13 21:05                         ` Jens Axboe
2023-02-13 22:05                           ` Helge Deller
2023-02-13 22:50                             ` John David Anglin
2023-02-14 23:09                               ` io_uring failure on parisc with VIPT caches Helge Deller
2023-02-14 23:29                                 ` Jens Axboe
2023-02-15  2:12                                   ` John David Anglin
2023-02-15 15:16                                     ` Jens Axboe
2023-02-15 15:52                                       ` Helge Deller
2023-02-15 15:56                                         ` Jens Axboe
2023-02-15 16:02                                           ` Helge Deller
2023-02-15 16:04                                             ` Jens Axboe
2023-02-15 21:40                                               ` Helge Deller
2023-02-15 23:04                                                 ` Jens Axboe
2023-02-15 16:38                                           ` John David Anglin
2023-02-15 17:01                                             ` Jens Axboe
2023-02-15 19:00                                               ` Jens Axboe
2023-02-15 19:16                                                 ` Jens Axboe
2023-02-15 20:27                                                   ` John David Anglin
2023-02-15 20:37                                                     ` Jens Axboe
2023-02-15 21:06                                                       ` John David Anglin
2023-02-15 21:38                                                         ` Jens Axboe
2023-02-15 21:39                                                         ` John David Anglin
2023-02-15 22:10                                                           ` John David Anglin
2023-02-15 23:02                                                             ` Jens Axboe
2023-02-15 23:43                                                               ` John David Anglin
2023-02-16  2:40                                                               ` John David Anglin
2023-02-16  2:50                                                                 ` Jens Axboe
2023-02-16  8:24                                                                   ` Helge Deller
2023-02-16 15:22                                                                     ` Jens Axboe
2023-02-16 20:35                                                                     ` John David Anglin
2023-02-15 23:03                                                           ` Jens Axboe
2023-02-15 19:20                                                 ` John David Anglin
2023-02-15 19:24                                                   ` Jens Axboe
2023-02-15 16:18                                         ` John David Anglin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox