From: Jens Axboe <axboe@kernel.dk>
To: Jacob Thompson <jacobT@beta.pyu.ca>, io-uring@vger.kernel.org
Subject: Re: CQE repeats the first item?
Date: Sun, 5 Oct 2025 19:36:28 -0600
Message-ID: <4bb29dbd-cc25-4f5d-9156-c37c918d2b42@kernel.dk>
In-Reply-To: <d5f48608-5a19-434b-bb48-e60c91e01599@kernel.dk>
[-- Attachment #1: Type: text/plain, Size: 4014 bytes --]
On 10/5/25 7:31 PM, Jens Axboe wrote:
> On 10/5/25 7:25 PM, Jacob Thompson wrote:
>> On Sun, Oct 05, 2025 at 07:09:53PM -0600, Jens Axboe wrote:
>>> On 10/5/25 3:54 PM, Jacob Thompson wrote:
>>>> On Sun, Oct 05, 2025 at 02:56:05PM -0600, Jens Axboe wrote:
>>>>> On 10/5/25 2:21 PM, Jacob Thompson wrote:
>>>>>> I'm doing something wrong, and I wanted to know if anyone can tell
>>>>>> what from the description. I'm using syscalls to call
>>>>>> io_uring_setup and io_uring_enter. I managed to submit 1 item without
>>>>>> an issue, but any more gets me the first item over and over again. In
>>>>>> my test I did a memset -1 on cqes and sqes, I memset 0 the first ten
>>>>>> sqes with different user_data (0x1234 + i), and I used the opcode
>>>>>> IORING_OP_NOP. I called "io_uring_enter(fd, 10, 0,
>>>>>> IORING_ENTER_SQ_WAKEUP, 0)" and looked at the cq. Item 11 has the
>>>>>> user_data as '18446744073709551615', which is correct, but the first 10
>>>>>> all have user_data 0x1234, which is weird AF since only one item has
>>>>>> that user_data and I submitted 10. I considered maybe the debugger was
>>>>>> giving me incorrect values, so I tried printing the user data in a
>>>>>> loop. I have no idea why the first one repeats 10 times. I only called
>>>>>> enter once.
>>>>>>
>>>>>> Id is 4660
>>>>>> Id is 4660
>>>>>> Id is 4660
>>>>>> Id is 4660
>>>>>> Id is 4660
>>>>>> Id is 4660
>>>>>> Id is 4660
>>>>>> Id is 4660
>>>>>> Id is 4660
>>>>>> Id is 4660
>>>>>> Id is 18446744073709551615
>>>>>
>>>>> You're presumably not updating your side of the CQ ring correctly, see
>>>>> what liburing does when you call io_uring_cqe_seen(). If that's not it,
>>>>> then you're probably mishandling something else and an example would be
>>>>> useful as otherwise I'd just be guessing. There's really not much to go
>>>>> from in this report.
>>>>>
>>>>> --
>>>>> Jens Axboe
>>>>
>>>> I tried reproducing it in a smaller file. Assume I did everything wrong but somehow I seem to get results and they're not correct.
>>>>
>>>> The codebase I'd like to use this in has very little activity (could go seconds without a single syscall), then execute a few hundreds-thousand (which I like to be async).
>>>> SQPOLL sounds like the best one for my use case. You can see I updated the sq tail before enter and I used IORING_ENTER_SQ_WAKEUP + slept for a second.
>>>> The sq tail isn't zero, which means I have results? And you can see it's 10 of the same user_data.
>>>>
>>>> cq head is 0 enter result was 10
>>>> 1234 0
>>>> 1234 0
>>>> 1234 0
>>>> 1234 0
>>>> 1234 0
>>>> 1234 0
>>>> 1234 0
>>>> 1234 0
>>>> 1234 0
>>>> 1234 0
>>>> FFFFFFFF -1
>>>
>>> I looked at your test code, and you're setting up 10 NOP requests with
>>> userdata == 0x1234, and hence you get 10 completions with that userdata.
>>> For some reason you iterate 11 CQEs, which means your last one is the one
>>> that you already filled with -1.
>>>
>>> In other words, it very much looks like it's working as it should. Any
>>> reason why you're using the raw interface rather than liburing? All of
>>> this seems to be not understanding how the ring works, and liburing
>>> helps isolate you from that. The SQ ring doesn't tell you anything about
>>> whether you have results (CQEs?); the difference between the SQ head and
>>> tail just tells you if there's something to submit. The CQ ring head and
>>> tail would tell you if there are CQEs to reap or not.
>>>
>>> --
>>> Jens Axboe
>>
>> You must be seeing something that I'm not. I had a +i in the line,
>> should the user_data not increment every item? The line was
>> 'sqes[i].user_data = 0x1234+i;'. The 11th iteration is intentional to
>> see the value of the memset earlier.
>
> You're not using IORING_SETUP_NO_SQARRAY, hence it's submitting index 0
> every time. In other words, you're submitting the same SQE 10 times, not
> 10 different SQEs. That then yields 10 completions for an SQE with the
> same userdata, and hence your CQEs all look identical.
Ala the attached.
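
To make the index indirection concrete, here is a small userspace model of how a submission slot is resolved in the two layouts. It is only a sketch under stated assumptions: `ModelSqe`, `model_consume` and `make_sqes` are made-up names for illustration, nothing here talks to the kernel, and an out-of-range index is simply asserted on rather than handled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Userspace model only: illustrative types, no io_uring syscalls are made.
struct ModelSqe { std::uint64_t user_data; };

// Return the user_data values consumed for ring positions [head, tail).
// use_sq_array == true  : default layout, the slot is looked up in sq_array.
// use_sq_array == false : IORING_SETUP_NO_SQARRAY layout, the ring position
//                         is used as the sqes[] slot directly.
std::vector<std::uint64_t> model_consume(const std::vector<ModelSqe>& sqes,
                                         const std::vector<std::uint32_t>& sq_array,
                                         std::uint32_t head, std::uint32_t tail,
                                         bool use_sq_array)
{
    const std::uint32_t entries = static_cast<std::uint32_t>(sqes.size());
    assert(entries != 0 && (entries & (entries - 1)) == 0); // power of two
    assert(sq_array.size() == sqes.size());
    const std::uint32_t mask = entries - 1;

    std::vector<std::uint64_t> seen;
    for (std::uint32_t pos = head; pos != tail; ++pos) {
        std::uint32_t slot = pos & mask;
        if (use_sq_array) {
            slot = sq_array[slot];      // indirection through the index array
            assert(slot < entries);     // simplification: the model just asserts
        }
        seen.push_back(sqes[slot].user_data);
    }
    return seen;
}

// The scenario from this thread on a 16-entry model ring:
// ten SQEs with user_data 0x1234 + i, the rest left at 0.
std::vector<ModelSqe> make_sqes()
{
    std::vector<ModelSqe> sqes(16, ModelSqe{0});
    for (std::uint32_t i = 0; i < 10; ++i)
        sqes[i].user_data = 0x1234 + i;
    return sqes;
}
```

With an all-zero `sq_array` and `use_sq_array == true`, every position resolves to slot 0, which is the repeated 0x1234 in the report. Filling the array with `sq_array[i] = i`, or skipping the array altogether as with IORING_SETUP_NO_SQARRAY, yields the ten distinct values.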
--
Jens Axboe
[-- Attachment #2: test.cpp --]
[-- Type: text/x-c++src, Size: 2486 bytes --]
#include <unistd.h>
#include <stdio.h>
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>
int io_uring_setup(unsigned entries, io_uring_params*params) { return syscall(__NR_io_uring_setup, entries, params); }
int io_uring_enter(int ring_fd, unsigned int to_submit, unsigned int min_complete, unsigned int flags, void*sig) { return (int) syscall(__NR_io_uring_enter, ring_fd, to_submit, min_complete, flags, 0); }
typedef int* IntPtr;
#define X(NAME) NAME = (int*)(p+params.sq_off.NAME)
struct sqinfo
{
    IntPtr head, tail, ring_mask, ring_entries, flags, dropped;
    int*array;
    void set(char*p, io_uring_params&params) { X(head); X(tail); X(ring_mask); X(ring_entries); X(flags); X(dropped); array = (int*)(p+params.sq_off.array); }
};
#undef X
#define X(NAME) NAME = (int*)(p+params.cq_off.NAME)
struct cqinfo
{
    IntPtr head, tail, ring_mask, ring_entries, overflow, flags;
    io_uring_cqe*cqes;
    void set(char*p, io_uring_params&params) { X(head); X(tail); X(ring_mask); X(ring_entries); X(overflow); X(flags); cqes = (io_uring_cqe*)(p+params.cq_off.cqes); }
};
#undef X
int main(int argc, char*argv[])
{
    int queue_size = 256;
    io_uring_params param{}; // zero init
    // No SQ index array: sqes[] slots are consumed in ring order.
    param.flags = IORING_SETUP_NO_SQARRAY;
    int ringFD = io_uring_setup(queue_size, &param);
    assert(ringFD>0);
    assert(param.features & IORING_FEAT_SINGLE_MMAP);

    // With IORING_FEAT_SINGLE_MMAP one mapping covers both rings, so size
    // it for the larger of the two.
    size_t sq_len = param.sq_off.array + param.sq_entries * sizeof(unsigned);
    size_t cq_len = param.cq_off.cqes + param.cq_entries * sizeof(io_uring_cqe);
    size_t base_length = sq_len > cq_len ? sq_len : cq_len;
    char *base = (char*) mmap(0, base_length, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, ringFD, IORING_OFF_SQ_RING);
    assert(base != MAP_FAILED);
    auto sqes = (io_uring_sqe*)mmap(0, param.sq_entries * sizeof(io_uring_sqe), PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, ringFD, IORING_OFF_SQES);
    assert(sqes != MAP_FAILED);

    unsigned tail;
    cqinfo cq;
    sqinfo sq;
    cq.set(base, param);
    sq.set(base, param);

    // Take 10 items
    assert(*sq.tail == 0);
    for(int i=0; i<10; i++) {
        memset(&sqes[i], 0, sizeof(struct io_uring_sqe));
        sqes[i].opcode = IORING_OP_NOP;
        sqes[i].user_data = 0x1234+i;
    }
    // Publish the 10 new entries by moving the SQ tail.
    __atomic_store_n(sq.tail, 10, __ATOMIC_RELEASE);

    //int res = io_uring_enter(ringFD, 10, 10, IORING_ENTER_SQ_WAIT, 0);
    int res = io_uring_enter(ringFD, 10, 10, IORING_ENTER_SQ_WAKEUP, 0);
    sleep(1);

    // The CQ tail, not the SQ ring, says how many completions were posted.
    tail = __atomic_load_n(cq.tail, __ATOMIC_ACQUIRE);
    printf("cq tail is %u enter result was %d\n", tail, res);
    for(unsigned i=0; i<tail; i++) {
        // user_data is 64 bits wide, so print it with a matching format.
        printf("%llX %d\n", (unsigned long long) cq.cqes[i].user_data, cq.cqes[i].res);
    }
    return 0;
}
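
The test above only reads the first `tail` CQEs once and never advances the CQ head. For longer-running code, the head/tail handling mentioned earlier in the thread could look roughly like the following userspace model. It is a sketch, not liburing's code: `ModelCqe`, `ModelCq` and `model_reap` are hypothetical names, the ring here is a plain struct rather than the mmap'ed region, and the `__atomic` builtins assume GCC or Clang, as in the attachment.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative stand-ins for the shared CQ ring; not the kernel structures.
struct ModelCqe { std::uint64_t user_data; std::int32_t res; };

struct ModelCq {
    unsigned head = 0;          // advanced by the application
    unsigned tail = 0;          // advanced by the producer (the kernel in real use)
    unsigned ring_mask = 0;     // entries - 1
    std::vector<ModelCqe> cqes;
};

// Reap every completion between head and tail, then publish the new head
// so the producer may reuse those slots.
std::vector<std::uint64_t> model_reap(ModelCq& cq)
{
    assert(cq.cqes.size() == cq.ring_mask + 1u);

    std::vector<std::uint64_t> out;
    unsigned head = cq.head;    // only this side writes head
    // Acquire pairs with the producer's release store of tail, so the CQE
    // contents are visible before they are read.
    const unsigned tail = __atomic_load_n(&cq.tail, __ATOMIC_ACQUIRE);

    while (head != tail) {
        const ModelCqe& cqe = cq.cqes[head & cq.ring_mask]; // wrap with the mask
        out.push_back(cqe.user_data);
        ++head;
    }

    // Release store: the slots are handed back only after they were read.
    __atomic_store_n(&cq.head, head, __ATOMIC_RELEASE);
    return out;
}
```

Head and tail are free-running counters; only the masked value indexes into `cqes`, so a second call after the producer has posted more entries picks up where the previous one stopped, including across the wrap-around.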