public inbox for [email protected]
From: Miklos Szeredi <[email protected]>
To: Jens Axboe <[email protected]>
Cc: [email protected]
Subject: Re: io_uring_prep_openat_direct() and link/drain
Date: Tue, 29 Mar 2022 20:31:33 +0200
Message-ID: <CAJfpegs=GcTuXcor-pbhaAxDKeS5XRy5rwTGXUcZM0BYYUK2LA@mail.gmail.com>
In-Reply-To: <[email protected]>

On Tue, 29 Mar 2022 at 20:26, Jens Axboe <[email protected]> wrote:
>
> On 3/29/22 12:21 PM, Miklos Szeredi wrote:
> > On Tue, 29 Mar 2022 at 19:04, Jens Axboe <[email protected]> wrote:
> >>
> >> On 3/29/22 10:08 AM, Jens Axboe wrote:
> >>> On 3/29/22 7:20 AM, Miklos Szeredi wrote:
> >>>> Hi,
> >>>>
> >>>> I'm trying to read multiple files with io_uring and getting stuck,
> >>>> because the link and drain flags don't seem to do what they are
> >>>> documented to do.
> >>>>
> >>>> Kernel is v5.17 and liburing is compiled from the git tree at
> >>>> 7a3a27b6a384 ("add tests for nonblocking accept sockets").
> >>>>
> >>>> Without those flags the attached example works some of the time, but
> >>>> that's probably accidental since ordering is not ensured.
> >>>>
> > >>>> Adding the drain or link flags makes it even worse (fails in cases that
> > >>>> the unordered one didn't).
> >>>>
> >>>> What am I missing?
> >>>
> >>> I don't think you're missing anything, it looks like a bug. What you
> >>> want here is:
> >>>
> >>> prep_open_direct(sqe);
> >>> sqe->flags |= IOSQE_IO_LINK;
> >>> ...
> >>> prep_read(sqe);
> >
> > So with the below merge this works. But if instead I do
> >
> > prep_open_direct(sqe);
> >  ...
> > prep_read(sqe);
> > sqe->flags |= IOSQE_IO_DRAIN;
> >
> > then it doesn't. Shouldn't drain have a stronger ordering guarantee than link?
>
> I didn't test that, but I bet it's running into the same kind of issue
> wrt prep. Are you getting -EBADF? The drain will indeed ensure that
> _execution_ doesn't start until the previous requests have completed,
> but it's still prepared before.
>
> For your use case, IO_LINK is what you want and that must work.
>
> I'll check the drain case just in case, it may in fact work if you just
> edit the code base you're running now and remove these two lines from
> io_init_req():
>
> if (unlikely(!req->file)) {
> -        if (!ctx->submit_state.link.head)
> -                return -EBADF;
>         req->result = fd;
>         req->flags |= REQ_F_DEFERRED_FILE;
> }
>
> to not make it dependent on link.head. Probably not a bad idea in
> general, as the rest of the handlers have been audited for req->file
> usage in prep.

Nope, that results in the following Oops:

BUG: kernel NULL pointer dereference, address: 0000000000000044
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 3 PID: 1126 Comm: readfiles Not tainted 5.17.0-00065-g3287b182c9c3-dirty #623
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-29-g6a62e0cb0dfe-prebuilt.qemu.org 04/01/2014
RIP: 0010:io_rw_init_file+0x15/0x170
Code: 00 6d 22 82 0f 95 c0 83 c0 02 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 41 54 55 53 4c 8b 2f 4c 8b 67 58 8b 6f 20 <41> 23 75 44 0f 84 28 01 00 00 48 89 fb f6 47 44 01 0f 84 08 01 00
RSP: 0018:ffffc9000108fba8 EFLAGS: 00010207
RAX: 0000000000000001 RBX: ffff888103ddd688 RCX: ffffc9000108fc18
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888103ddd600
RBP: 0000000000000000 R08: ffffc9000108fbd8 R09: 00007ffffffff000
R10: 0000000000020000 R11: 000056012e2ce2e0 R12: ffff88810276b800
R13: 0000000000000000 R14: 0000000000000000 R15: ffff888103ddd600
FS:  00007f9058d72580(0000) GS:ffff888237d80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000044 CR3: 0000000100966004 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 io_read+0x65/0x4d0
 ? select_task_rq_fair+0x602/0xf20
 ? newidle_balance.constprop.0+0x2ff/0x3a0
 io_issue_sqe+0xd86/0x21a0
 ? __schedule+0x228/0x610
 ? timerqueue_del+0x2a/0x40
 io_req_task_submit+0x26/0x100
 tctx_task_work+0x172/0x4b0
 task_work_run+0x5c/0x90
 io_cqring_wait+0x48d/0x790
 ? io_eventfd_put+0x20/0x20
 __do_sys_io_uring_enter+0x28d/0x5e0
 ? __cond_resched+0x16/0x40
 ? task_work_run+0x61/0x90
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x56012d87159c
Code: c2 41 8b 54 24 04 8b bd cc 00 00 00 41 83 ca 10 f6 85 d0 00 00 00 01 4d 8b 44 24 10 44 0f 44 d0 45 8b 4c 24 0c 44 89 f0 0f 05 <41> 89 c3 85 c0 0f 88 4a ff ff ff 41 29 04 24 bf 01 00 00 00 48 85
RSP: 002b:00007ffc8db5c550 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000056012d87159c
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007ffc8db5c620 R08: 0000000000000000 R09: 0000000000000008
R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffc8db5c580
R13: 00007ffc8db5c618 R14: 00000000000001aa R15: 0000000000000000
 </TASK>
Modules linked in:
CR2: 0000000000000044
---[ end trace 0000000000000000 ]---
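
Jens's explanation quoted above (drain orders _execution_, but the read is still prepared, and its fixed-file slot looked up, before the open has run) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not kernel or liburing code: every name in it (struct toy_req, toy_prep(), toy_issue(), toy_open_then_read(), TOY_FILE) is hypothetical. Both link and drain are modeled identically as "issue strictly in order"; the only difference is the defer_lookup flag, which stands in for the REQ_F_DEFERRED_FILE path in the io_init_req() snippet, a path that code base only takes when a link head exists.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not kernel or liburing API. */
#define NR_SLOTS   4
#define SLOT_EMPTY (-1)
#define TOY_FILE   42 /* arbitrary stand-in for an opened file */

enum toy_op { TOY_OPEN_DIRECT, TOY_READ_FIXED };

struct toy_req {
	enum toy_op op;
	int slot;          /* fixed-file slot index */
	bool defer_lookup; /* resolve the slot at issue time instead of at prep */
	int file;          /* file resolved for a read, SLOT_EMPTY if none */
	int result;        /* 0 or -errno */
};

static int slots[NR_SLOTS];

/* Prep runs for every request at submission, before any request executes. */
static void toy_prep(struct toy_req *req)
{
	assert(req->slot >= 0 && req->slot < NR_SLOTS);
	req->file = SLOT_EMPTY;
	if (req->op == TOY_READ_FIXED && !req->defer_lookup)
		req->file = slots[req->slot]; /* the open has not run yet */
}

/* Issue runs strictly in order, as if each request waited for the previous one. */
static void toy_issue(struct toy_req *req)
{
	if (req->op == TOY_OPEN_DIRECT) {
		slots[req->slot] = TOY_FILE; /* install the opened file */
		req->result = 0;
		return;
	}
	if (req->defer_lookup)
		req->file = slots[req->slot]; /* looked up after the open completed */
	req->result = (req->file == SLOT_EMPTY) ? -EBADF : 0;
}

/* Submit "open into slot 0" then "read from slot 0"; return the read's result. */
int toy_open_then_read(bool defer_lookup)
{
	struct toy_req reqs[2] = {
		{ .op = TOY_OPEN_DIRECT, .slot = 0 },
		{ .op = TOY_READ_FIXED, .slot = 0, .defer_lookup = defer_lookup },
	};

	for (int i = 0; i < NR_SLOTS; i++)
		slots[i] = SLOT_EMPTY;
	for (int i = 0; i < 2; i++)
		toy_prep(&reqs[i]);
	for (int i = 0; i < 2; i++)
		toy_issue(&reqs[i]);
	return reqs[1].result;
}
```

In this model, toy_open_then_read(false) corresponds to the IOSQE_IO_DRAIN case and returns -EBADF, while toy_open_then_read(true) corresponds to the linked case with deferred lookup and returns 0. The model does not attempt to explain the Oops above.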


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJfpegs=GcTuXcor-pbhaAxDKeS5XRy5rwTGXUcZM0BYYUK2LA@mail.gmail.com' \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox