public inbox for io-uring@vger.kernel.org
* [SECURITY] io_uring: multishot poll stall via EPOLL_URING_WAKE (apoll_events not synced)
From: Azizcan Daştan @ 2026-04-20 17:50 UTC
  To: io-uring

[-- Attachment #1: Type: text/plain, Size: 1819 bytes --]

Bug Description: I am reporting a logic bug in the Linux kernel's
io_uring poll subsystem that causes multishot POLL_ADD requests to
permanently stall.

Bug Analysis: io_poll_wake() sets EPOLLONESHOT on poll->events when it
detects EPOLL_URING_WAKE (to break the circular dependency), but it does
not update req->apoll_events. io_poll_check_events() reads
req->apoll_events to decide whether the poll is oneshot. Seeing no
EPOLLONESHOT there, it takes the multishot path and posts a CQE with
IORING_CQE_F_MORE, even though the waitqueue entry has already been
removed. With no waitqueue entry, no future event can wake the request,
so it is permanently stuck.

Root Cause: The issue lies within the following logic:

    if (mask & EPOLL_URING_WAKE) {
        poll->events |= EPOLLONESHOT;
        /* req->apoll_events is NOT updated here */
    }

The fix requires updating req->apoll_events: req->apoll_events |= EPOLLONESHOT;
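
To make the mismatch concrete, here is a minimal user-space model of the two event masks. It is not kernel code: the struct, the function names (model_req, model_wake, model_is_stalled) and the MODEL_* bit values are hypothetical stand-ins that only mirror the behavior described above, namely that one mask gains the oneshot bit while the other is consulted when deciding whether to promise more events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for EPOLLONESHOT / EPOLL_URING_WAKE.
 * The bit positions are illustrative only. */
#define MODEL_ONESHOT    (1u << 30)
#define MODEL_URING_WAKE (1u << 27)

/* Simplified model of the state kept for one poll request. */
struct model_req {
    uint32_t poll_events;   /* models poll->events */
    uint32_t apoll_events;  /* models req->apoll_events */
    bool     on_waitqueue;  /* is the wait entry still armed? */
};

/* Models the wake path. With apply_fix set, both masks are updated,
 * which is the one-line change proposed in the report. */
static void model_wake(struct model_req *req, uint32_t mask, bool apply_fix)
{
    assert(req != NULL);
    if (mask & MODEL_URING_WAKE) {
        req->poll_events |= MODEL_ONESHOT;
        if (apply_fix)
            req->apoll_events |= MODEL_ONESHOT;
    }
    /* A oneshot entry is taken off the waitqueue when it is woken. */
    if (req->poll_events & MODEL_ONESHOT)
        req->on_waitqueue = false;
}

/* Models the completion-side decision: does the CQE claim more will follow? */
static bool model_cqe_claims_more(const struct model_req *req)
{
    return !(req->apoll_events & MODEL_ONESHOT);
}

/* A request is stalled when it promises more events but can never be woken. */
static bool model_is_stalled(const struct model_req *req)
{
    assert(req != NULL);
    return model_cqe_claims_more(req) && !req->on_waitqueue;
}
```

Calling model_wake() with MODEL_URING_WAKE and apply_fix set to false leaves the model in the stalled state; with apply_fix set to true the two masks agree and the request is reported as finished rather than stalled.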

Regression History: This bug class was previously fixed in commit
aacf2f9f382c (2022-06-21). However, it was reintroduced by commit
4464853277d0 (2022-11-20) when EPOLL_URING_WAKE was added. The fix
from aacf2f9f382c was not applied to the new path.

Impact:

Permanent resource leak (stuck request)

Application hang / potential DoS

Protocol violation: IORING_CQE_F_MORE promised but never delivered

Affected Versions: All kernels since commit 4464853277d0 (Nov 2022)
through current master. Tested and confirmed on: Linux
6.8.0-106-generic (Ubuntu 24.04 LTS).

Runtime Confirmation:

    [*] CQE #1: res=0x1, flags=0x2 [MORE]
    [*] Writing to eventfd again to trigger next event...
    [*] CQE #2: res=0x1, flags=0x2 [MORE]
    [*] Writing to eventfd again to trigger next event...
    [*] Timeout waiting for CQE
    [!] BUG TRIGGERED: multishot poll stopped delivering events
    [!] Last CQE had IORING_CQE_F_MORE but no more events arrived
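
For readers decoding the output above, the following sketch shows how flags=0x2 maps to the [MORE] annotation and how a stall verdict could be derived from it. The constant mirrors IORING_CQE_F_MORE (bit 1) but is redefined locally so the snippet builds without kernel or liburing headers. The helper names cqe_promises_more and looks_stalled are hypothetical and not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of IORING_CQE_F_MORE (bit 1 of cqe->flags). */
#define SKETCH_CQE_F_MORE (1u << 1)

/* True when the CQE says the multishot request will post further CQEs. */
static bool cqe_promises_more(uint32_t flags)
{
    return (flags & SKETCH_CQE_F_MORE) != 0;
}

/* Hypothetical verdict: a timeout only indicates a stall if the last
 * CQE seen had promised more. A timeout after a final CQE is normal. */
static bool looks_stalled(uint32_t last_flags, bool timed_out)
{
    return timed_out && cqe_promises_more(last_flags);
}

/* Sanity checks against the values seen in the log above. */
static void sketch_self_check(void)
{
    assert(cqe_promises_more(0x2));   /* flags=0x2 -> [MORE] */
    assert(looks_stalled(0x2, true)); /* MORE followed by a timeout */
}
```

Tying the verdict to the last CQE's flags, rather than to a CQE count, avoids flagging a request that ended cleanly with a CQE lacking the MORE bit.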

Details, POC, and steps are provided in the attachments.

Best Regards,

[-- Attachment #2: finding-006-report.docx --]
[-- Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document, Size: 13647 bytes --]

[-- Attachment #3: poc.c --]
[-- Type: application/octet-stream, Size: 6372 bytes --]

/*
 * PoC: io_uring multishot POLL_ADD stall via EPOLL_URING_WAKE
 *
 * Bug: io_uring/poll.c io_poll_wake() sets EPOLLONESHOT on poll->events
 * when it detects EPOLL_URING_WAKE, but does NOT update req->apoll_events.
 * io_poll_check_events() reads req->apoll_events to determine if the poll
 * is oneshot, so it enters the multishot path, posts a CQE with
 * IORING_CQE_F_MORE, but the waitqueue entry was already removed.
 * The request becomes permanently stuck: no waitqueue = no future events.
 *
 * Original bug class fixed: aacf2f9f382c (2022-06-21)
 * Reintroduced by: 4464853277d0 (2022-11-20, EPOLL_URING_WAKE)
 *
 * Impact: Resource leak, DoS, protocol violation (IORING_CQE_F_MORE
 *         promised but never delivered)
 * Affected: All kernels since EPOLL_URING_WAKE (Nov 2022) through master
 *
 * Build: gcc -o poc_poll_stall poc_poll_epoll_uring_wake.c -luring
 * Run:   ./poc_poll_stall
 *
 * Expected: After the first CQE or two, no more events arrive despite
 *           IORING_CQE_F_MORE being set. The multishot poll is stuck.
 */

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/eventfd.h>
#include <poll.h>
#include <liburing.h>

#define RING_SIZE 16
#define TIMEOUT_MS 2000

int main(void)
{
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    int evfd;
    int ret;
    uint64_t val;
    int cqe_count = 0;

    printf("[*] PoC: multishot POLL_ADD stall via EPOLL_URING_WAKE\n");
    printf("[*] Bug: io_uring/poll.c apoll_events not updated on EPOLL_URING_WAKE\n\n");

    /* Create eventfd */
    evfd = eventfd(0, EFD_NONBLOCK);
    if (evfd < 0) {
        perror("[-] eventfd");
        return 1;
    }
    printf("[+] eventfd created: fd %d\n", evfd);

    /* Create io_uring instance */
    ret = io_uring_queue_init(RING_SIZE, &ring, 0);
    if (ret < 0) {
        fprintf(stderr, "[-] io_uring_queue_init: %s\n", strerror(-ret));
        return 1;
    }
    printf("[+] io_uring initialized\n");

    /*
     * Register eventfd as CQ notification fd.
     * This causes EPOLL_URING_WAKE to be set on wakeups triggered
     * by CQE posting to this ring's CQ.
     */
    ret = io_uring_register_eventfd(&ring, evfd);
    if (ret < 0) {
        fprintf(stderr, "[-] io_uring_register_eventfd: %s\n", strerror(-ret));
        return 1;
    }
    printf("[+] eventfd registered as CQ notification\n");

    /*
     * Submit a multishot POLL_ADD on the eventfd itself.
     *
     * This creates a circular dependency:
     * - CQE posted -> eventfd notified -> POLL wakeup -> more CQE -> ...
     *
     * io_poll_wake detects this via EPOLL_URING_WAKE and sets EPOLLONESHOT
     * on poll->events to break the cycle. But it doesn't update
     * req->apoll_events, so io_poll_check_events thinks it's still multishot.
     */
    sqe = io_uring_get_sqe(&ring);
    if (!sqe) {
        fprintf(stderr, "[-] io_uring_get_sqe failed\n");
        return 1;
    }

    io_uring_prep_poll_multishot(sqe, evfd, POLLIN);
    sqe->user_data = 0xDEAD;

    printf("[+] Submitted multishot POLL_ADD on eventfd\n");

    ret = io_uring_submit(&ring);
    if (ret < 0) {
        fprintf(stderr, "[-] io_uring_submit: %s\n", strerror(-ret));
        return 1;
    }
    printf("[+] Submitted %d SQE(s)\n\n", ret);

    /*
     * Write to the eventfd to trigger the first poll event.
     * This will:
     * 1. Wake the poll -> io_poll_wake called
     * 2. Since we're not on the uring wake path yet, normal wake happens
     * 3. CQE is posted -> eventfd is notified (EPOLL_URING_WAKE)
     * 4. Second wake: io_poll_wake sees EPOLL_URING_WAKE
     * 5. Sets poll->events |= EPOLLONESHOT (but NOT apoll_events)
     * 6. Removes waitqueue entry
     * 7. io_poll_check_events: apoll_events lacks EPOLLONESHOT -> multishot path
     * 8. Posts CQE with IORING_CQE_F_MORE
     * 9. Request stuck: no waitqueue entry, no more events possible
     */
    printf("[*] Writing to eventfd to trigger initial wakeup...\n");
    val = 1;
    ret = write(evfd, &val, sizeof(val));
    if (ret != sizeof(val)) {
        perror("[-] write eventfd");
        return 1;
    }

    /* Drain the eventfd so it doesn't stay readable */
    if (read(evfd, &val, sizeof(val)) < 0 && errno != EAGAIN)
        perror("[-] read eventfd");

    /* Collect CQEs */
    printf("[*] Waiting for CQEs (timeout %d ms)...\n\n", TIMEOUT_MS);

    struct __kernel_timespec ts = {
        .tv_sec = TIMEOUT_MS / 1000,
        .tv_nsec = (TIMEOUT_MS % 1000) * 1000000LL,
    };

    int last_had_more = 0;  /* did the most recent CQE carry IORING_CQE_F_MORE? */
    int timed_out = 0;      /* did the loop end on -ETIME? */

    while (1) {
        ret = io_uring_wait_cqe_timeout(&ring, &cqe, &ts);
        if (ret == -ETIME) {
            printf("[*] Timeout waiting for CQE\n");
            timed_out = 1;
            break;
        }
        if (ret < 0) {
            fprintf(stderr, "[-] io_uring_wait_cqe_timeout: %s\n", strerror(-ret));
            break;
        }

        cqe_count++;
        last_had_more = !!(cqe->flags & IORING_CQE_F_MORE);
        printf("[*] CQE #%d: res=0x%x, flags=0x%x", cqe_count, cqe->res, cqe->flags);
        if (last_had_more)
            printf(" [MORE]");
        printf("\n");

        io_uring_cqe_seen(&ring, cqe);

        if (!last_had_more) {
            printf("[*] No IORING_CQE_F_MORE flag - poll terminated normally\n");
            break;
        }

        /* Write again to trigger another event, then drain the counter */
        printf("[*] Writing to eventfd again to trigger next event...\n");
        val = 1;
        if (write(evfd, &val, sizeof(val)) != sizeof(val))
            perror("[-] write eventfd");
        if (read(evfd, &val, sizeof(val)) < 0 && errno != EAGAIN)
            perror("[-] read eventfd");

        /* Short timeout for subsequent events */
        ts.tv_sec = 1;
        ts.tv_nsec = 0;
    }

    printf("\n[*] === RESULTS ===\n");
    printf("[*] Total CQEs received: %d\n", cqe_count);

    /* Only a timeout right after a CQE that promised more indicates the stall */
    if (timed_out && last_had_more) {
        printf("[!] BUG LIKELY TRIGGERED: multishot poll stopped delivering events\n");
        printf("[!] The last CQE had IORING_CQE_F_MORE but no more events arrived\n");
        printf("[!] This confirms: apoll_events not updated on EPOLL_URING_WAKE\n");
        printf("[!] Request is permanently stuck (resource leak + protocol violation)\n");
    } else {
        printf("[*] No stall observed - bug may not have triggered\n");
        printf("[*] This can happen if EPOLL_URING_WAKE timing varies\n");
    }

    /* Cleanup */
    close(evfd);
    io_uring_queue_exit(&ring);

    printf("\n[*] Done.\n");
    return 0;
}


* Re: [SECURITY] io_uring: multishot poll stall via EPOLL_URING_WAKE (apoll_events not synced)
From: Jens Axboe @ 2026-04-20 17:57 UTC
  To: Azizcan Daştan, io-uring

On 4/20/26 11:50 AM, Azizcan Daştan wrote:
> Bug Description: I am reporting a logic bug in the Linux kernel's
> io_uring poll subsystem that causes multishot POLL_ADD requests to
> permanently stall.

This is just a bug, it's not a security issue. And don't send binary
attachments, nobody sane would open a binary attachment sent on a
mailing list.

If you want to get credit for finding and fixing a bug, just send a
patch for it. That's what I told you to do, not send the same "report"
style on the public list. A patch is something that can get reviewed and
applied.

-- 
Jens Axboe
