Re: io_uring failure on parisc with VIPT caches

From: Helge Deller <[email protected]>
To: Jens Axboe <[email protected]>,
	John David Anglin <[email protected]>,
	[email protected], [email protected],
	James Bottomley <[email protected]>
Subject: Re: io_uring failure on parisc with VIPT caches
Date: Wed, 15 Feb 2023 16:52:06 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 2/15/23 16:16, Jens Axboe wrote:
> On 2/14/23 7:12 PM, John David Anglin wrote:
>> On 2023-02-14 6:29 p.m., Jens Axboe wrote:
>>> On 2/14/23 4:09 PM, Helge Deller wrote:
>>>> * John David Anglin<[email protected]>:
>>>>> On 2023-02-13 5:05 p.m., Helge Deller wrote:
>>>>>> On 2/13/23 22:05, Jens Axboe wrote:
>>>>>>> On 2/13/23 1:59 PM, Helge Deller wrote:
>>>>>>>>> Yep sounds like it. What's the caching architecture of parisc?
>>>>>>>> parisc is Virtually Indexed, Physically Tagged (VIPT).
>>>>>>> That's what I assumed, so virtual aliasing is what we're dealing with
>>>>>>> here.
>>>>>>>
>>>>>>>> Thanks for the patch!
>>>>>>>> Sadly it doesn't fix the problem, as the kernel still sees
>>>>>>>> ctx->rings->sq.tail as being 0.
>>>>>>>> Interestingly it worked once (not reproducible) directly after bootup,
>>>>>>>> which indicates that we at least look at the right address from kernel side.
>>>>>>>>
>>>>>>>> So, still needs more debugging/testing.
>>>>>>> It's not like this is untested stuff, so yeah it'll generally be
>>>>>>> correct, it just seems that parisc is a bit odd in that the virtual
>>>>>>> aliasing occurs between the kernel and userspace addresses too. At least
>>>>>>> that's what it seems like.
>>>>>> True.
>>>>>>
>>>>>>> But I wonder if what needs flushing is the user side, not the kernel
>>>>>>> side? Either that, or my patch is not flushing the right thing on the
>>>>>>> kernel side.
>>>> The patch below seems to fix the issue.
>>>>
>>>> I've successfully tested it with the io_uring-test testcase on
>>>> physical parisc machines with 32- and 64-bit 6.1.11 kernels.
>>>>
>>>> The idea is similar to how a file is mmapped shared by two
>>>> userspace processes: by keeping the lower bits of the virtual address
>>>> the same.
>>>>
>>>> Cache flushes from userspace don't seem to be needed.
>>> Are they needed from the kernel side, if the lower bits mean we end up
>>> with the same coloring? Because I think this is a bit of a big
>>> hammer, in terms of overhead for flushing. As an example, on arm64,
>>> which is perfectly fine with the existing code, it's about a 20-25%
>>> performance hit.
>>
>> The io_uring-test testcase still works on rp3440 with the kernel
>> flushes removed.
>
> That's what I suspected, the important bit here is just aligning it for
> identical coloring. Can you confirm if the below works for you? Had to
> fiddle it a bit to get it to work without coloring.

Yes, the patch works for me on 32- and 64-bit, even with PA8900 CPUs...

Is there a more detailed testcase somewhere which I could try too?

Some nits below...

> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index db623b3185c8..1d4562067949 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -72,6 +72,7 @@
>   #include <linux/io_uring.h>
>   #include <linux/audit.h>
>   #include <linux/security.h>
> +#include <asm/shmparam.h>
>
>   #define CREATE_TRACE_POINTS
>   #include <trace/events/io_uring.h>
> @@ -3200,6 +3201,51 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
>   	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
>   }
>
> +static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
> +			unsigned long addr, unsigned long len,
> +			unsigned long pgoff, unsigned long flags)
> +{
> +	const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
> +	struct vm_unmapped_area_info info;
> +	void *ptr;
> +
> +	ptr = io_uring_validate_mmap_request(filp, pgoff, len);
> +	if (IS_ERR(ptr))
> +		return -ENOMEM;
> +
> +	/* we do not support requesting a specific address */
> +	if (addr)
> +		return -EINVAL;

With this ^ we disallow users from providing a proposed address.
I think this is OK and I suggest keeping it that way.

Alternatively one could check the given address against the
alignment which is calculated below, but IMHO this would make the
code unnecessarily bigger.
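
For illustration only, here is a userspace sketch of what such a check
against the alignment might look like. It is not part of the patch. The
helper name addr_has_ring_colour(), the SKETCH_* constants and the
colour_span parameter (standing in for SHM_COLOUR or SHMLBA) are all
hypothetical and chosen just for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration; a kernel takes these from arch headers. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Hypothetical helper: returns true if a caller-proposed address is
 * page-aligned and has the same colour bits as the kernel address of
 * the ring memory. colour_span must be a power of two.
 */
static bool addr_has_ring_colour(unsigned long addr, unsigned long kernel_ptr,
				 unsigned long colour_span)
{
	unsigned long mask;

	assert(colour_span != 0 && (colour_span & (colour_span - 1)) == 0);

	/* Page-aligned bits below colour_span select the cache colour. */
	mask = SKETCH_PAGE_MASK & (colour_span - 1);

	if (addr & ~SKETCH_PAGE_MASK)
		return false;	/* not page-aligned */

	/* Same colour means no difference in any of the colour bits. */
	return ((addr ^ kernel_ptr) & mask) == 0;
}
```

A caller would reject the request (for example with -EINVAL) whenever
the helper returns false. Even in this small form it adds a second code
path that has to be kept in sync with the mask calculation.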

> +
> +	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
> +	info.length = len;
> +	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
> +	info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
> +	info.align_mask = PAGE_MASK;
> +	info.align_offset = (unsigned long) ptr;

For parisc I introduced SHM_COLOUR because it allows userspace
to map a shared file initially at any PAGE_SIZE-aligned address.
Only when a second user then maps the same file will the aliasing be enforced.

Other platforms just have SHMLBA, and for some of them SHMLBA is > PAGE_SIZE.
So, instead of the code above, this untested code might be better for those
other platforms?
	info.align_mask = PAGE_MASK & (SHMLBA - 1);
	info.align_offset = (unsigned long)ptr & (SHMLBA - 1);
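
To make the effect of align_mask and align_offset concrete, here is a
small userspace sketch of the arithmetic. It is only a model under
stated assumptions: the names colour_mask() and place_topdown(), the
SKETCH_* constants and the colour_span parameter are hypothetical, and
the subtraction is modelled on what the top-down search is understood
to do with these two fields, not copied from the mm code.

```c
#include <assert.h>

/* Assumed values for illustration; a kernel takes these from arch headers. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Page-aligned address bits below colour_span (a power of two). */
static unsigned long colour_mask(unsigned long colour_span)
{
	assert(colour_span != 0 && (colour_span & (colour_span - 1)) == 0);
	return SKETCH_PAGE_MASK & (colour_span - 1);
}

/*
 * Hypothetical model: the highest address <= gap whose colour bits equal
 * those of kernel_ptr. gap is assumed to be page-aligned and at least
 * colour_span, so the subtraction cannot wrap.
 */
static unsigned long place_topdown(unsigned long gap, unsigned long kernel_ptr,
				   unsigned long colour_span)
{
	unsigned long mask = colour_mask(colour_span);
	unsigned long offset = kernel_ptr & mask;	/* align_offset */

	assert(gap >= colour_span);
	/* Step down by the colour distance between gap and offset. */
	return gap - ((gap - offset) & mask);
}
```

With colour_span equal to the page size the mask is empty and the
candidate address is returned unchanged, which corresponds to a platform
without colouring constraints.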

this is ok ->
> +#ifdef SHM_COLOUR
> +	info.align_mask &= (SHM_COLOUR - 1);
> +	info.align_offset &= (SHM_COLOUR - 1)

^^ this line is missing a ";" at the end.

Helge

> +#endif
> +
> +	/*
> +	 * A failed mmap() very likely causes application failure,
> +	 * so fall back to the bottom-up function here. This scenario
> +	 * can happen with large stack limits and large mmap()
> +	 * allocations.
> +	 */
> +	addr = vm_unmapped_area(&info);
> +	if (offset_in_page(addr)) {
> +		VM_BUG_ON(addr != -ENOMEM);
> +		info.flags = 0;
> +		info.low_limit = TASK_UNMAPPED_BASE;
> +		info.high_limit = mmap_end;
> +		addr = vm_unmapped_area(&info);
> +	}
> +
> +	return addr;
> +}
> +
>   #else /* !CONFIG_MMU */
>
>   static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
> @@ -3414,6 +3460,8 @@ static const struct file_operations io_uring_fops = {
>   #ifndef CONFIG_MMU
>   	.get_unmapped_area = io_uring_nommu_get_unmapped_area,
>   	.mmap_capabilities = io_uring_nommu_mmap_capabilities,
> +#else
> +	.get_unmapped_area = io_uring_mmu_get_unmapped_area,
>   #endif
>   	.poll		= io_uring_poll,
>   #ifdef CONFIG_PROC_FS
>
