public inbox for [email protected]
* [PATCH] io_uring: zero remained bytes when reading to fixed kernel buffer
@ 2025-03-22  7:56 Ming Lei
  2025-03-22 12:02 ` Pavel Begunkov
  2025-03-22 18:10 ` Caleb Sander Mateos
  0 siblings, 2 replies; 9+ messages in thread
From: Ming Lei @ 2025-03-22  7:56 UTC
  To: Jens Axboe, io-uring; +Cc: Ming Lei, Caleb Sander Mateos, Keith Busch

So far, fixed kernel buffers are only used for FS read/write, in which
case the remaining bytes need to be zeroed on a short read; otherwise,
kernel data may be leaked to userspace.

Add two helpers to fix this issue, and meanwhile replace one open-coded
check with io_use_fixed_kbuf().

Cc: Caleb Sander Mateos <[email protected]>
Cc: Keith Busch <[email protected]>
Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
Signed-off-by: Ming Lei <[email protected]>
---
 io_uring/rsrc.h | 16 ++++++++++++++++
 io_uring/rw.c   |  8 +++++++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index b52242852ff3..6996eb8e5b7d 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -169,4 +169,20 @@ static inline void io_alloc_cache_vec_kasan(struct iou_vec *iv)
 		io_vec_free(iv);
 }
 
+/* must not be called before the buffer node is assigned to the request */
+static inline bool io_use_fixed_kbuf(struct io_kiocb *req)
+{
+	return (req->flags & REQ_F_BUF_NODE) && req->buf_node->buf->is_kbuf;
+}
+
+/* zero the remaining bytes of a kernel buffer to avoid leaking data */
+static inline void io_req_zero_remained(struct io_kiocb *req,
+					struct iov_iter *iter)
+{
+	size_t left = iov_iter_count(iter);
+
+	if (left > 0 && iov_iter_rw(iter) == READ)
+		iov_iter_zero(left, iter);
+}
+
 #endif
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 039e063f7091..67dc1a6710c9 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -541,6 +541,12 @@ static void __io_complete_rw_common(struct io_kiocb *req, long res)
 	} else {
 		req_set_fail(req);
 		req->cqe.res = res;
+
+		if (io_use_fixed_kbuf(req)) {
+			struct io_async_rw *io = req->async_data;
+
+			io_req_zero_remained(req, &io->iter);
+		}
 	}
 }
 
@@ -692,7 +698,7 @@ static ssize_t loop_rw_iter(int ddir, struct io_rw *rw, struct iov_iter *iter)
 	if ((kiocb->ki_flags & IOCB_NOWAIT) &&
 	    !(kiocb->ki_filp->f_flags & O_NONBLOCK))
 		return -EAGAIN;
-	if ((req->flags & REQ_F_BUF_NODE) && req->buf_node->buf->is_kbuf)
+	if (io_use_fixed_kbuf(req))
 		return -EFAULT;
 
 	ppos = io_kiocb_ppos(kiocb);
-- 
2.47.1
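
A minimal userspace model of what the two helpers in this patch are meant to do may help when following the discussion below. It is an illustration only, not kernel code: `struct model_req`, `struct model_iter`, `MODEL_F_BUF_NODE` and the `model_*` functions are hypothetical stand-ins for `struct io_kiocb`, `struct iov_iter`, `REQ_F_BUF_NODE`, `io_use_fixed_kbuf()` and `io_req_zero_remained()`, and a plain `memset()` stands in for `iov_iter_zero()`. The simulated device transfers only part of the requested length, and the unfilled tail is zeroed only when the request uses a fixed kernel buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for REQ_F_BUF_NODE. */
#define MODEL_F_BUF_NODE (1u << 0)

enum model_dir { MODEL_READ, MODEL_WRITE };

/* Hypothetical stand-in for the parts of struct iov_iter the patch uses. */
struct model_iter {
	unsigned char *pos;  /* first byte not yet filled */
	size_t count;        /* bytes not yet transferred (iov_iter_count()) */
	enum model_dir dir;  /* direction (iov_iter_rw()) */
};

/* Hypothetical stand-in for struct io_kiocb. */
struct model_req {
	unsigned int flags;
	bool is_kbuf;        /* models req->buf_node->buf->is_kbuf */
	struct model_iter iter;
};

/* Same test as io_use_fixed_kbuf(); only valid once a buffer node is set. */
static bool model_use_fixed_kbuf(const struct model_req *req)
{
	return (req->flags & MODEL_F_BUF_NODE) && req->is_kbuf;
}

/* Same idea as io_req_zero_remained(); memset() models iov_iter_zero(). */
static void model_zero_remained(struct model_iter *iter)
{
	if (iter->count > 0 && iter->dir == MODEL_READ) {
		memset(iter->pos, 0, iter->count);
		iter->pos += iter->count;
		iter->count = 0;
	}
}

/* A device that transfers only `done` bytes, followed by completion. */
static void model_short_read(struct model_req *req, const unsigned char *src,
			     size_t done)
{
	assert(done <= req->iter.count);
	memcpy(req->iter.pos, src, done);
	req->iter.pos += done;
	req->iter.count -= done;

	/* Loosely mirrors the new branch in __io_complete_rw_common(). */
	if (model_use_fixed_kbuf(req))
		model_zero_remained(&req->iter);
}
```

With a fixed kernel buffer, the bytes past the short read come back as zero; for a request without one, the model leaves the tail untouched, as the patch does.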



* Re: [PATCH] io_uring: zero remained bytes when reading to fixed kernel buffer
  2025-03-22  7:56 [PATCH] io_uring: zero remained bytes when reading to fixed kernel buffer Ming Lei
@ 2025-03-22 12:02 ` Pavel Begunkov
  2025-03-22 13:50   ` Ming Lei
  2025-03-22 18:10 ` Caleb Sander Mateos
  1 sibling, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2025-03-22 12:02 UTC
  To: Ming Lei, Jens Axboe, io-uring; +Cc: Caleb Sander Mateos, Keith Busch

On 3/22/25 07:56, Ming Lei wrote:
> So far, fixed kernel buffers are only used for FS read/write, in which
> case the remaining bytes need to be zeroed on a short read; otherwise,
> kernel data may be leaked to userspace.

Can you remind me how that can happen? Normally, IIUC, you register
a request filled with user pages, so there is no kernel data there.
Is it some kind of bounce buffer?

> Add two helpers to fix this issue, and meanwhile replace one open-coded
> check with io_use_fixed_kbuf().
> 
> Cc: Caleb Sander Mateos <[email protected]>
> Cc: Keith Busch <[email protected]>
> Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
> Signed-off-by: Ming Lei <[email protected]>
> ---
...
> +/* zero the remaining bytes of a kernel buffer to avoid leaking data */
> +static inline void io_req_zero_remained(struct io_kiocb *req,
> +					struct iov_iter *iter)
> +{
> +	size_t left = iov_iter_count(iter);
> +
> +	if (left > 0 && iov_iter_rw(iter) == READ)
> +		iov_iter_zero(left, iter);
> +}
> +
>   #endif
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 039e063f7091..67dc1a6710c9 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -541,6 +541,12 @@ static void __io_complete_rw_common(struct io_kiocb *req, long res)
>   	} else {
>   		req_set_fail(req);
>   		req->cqe.res = res;
> +
> +		if (io_use_fixed_kbuf(req)) {
> +			struct io_async_rw *io = req->async_data;
> +
> +			io_req_zero_remained(req, &io->iter);
> +		}

I think it can be exploited. It's called from ->ki_complete, i.e.
io_complete_rw, which can run in [soft]irq context, so make the request
large enough and you're stuck zeroing memory in [soft]irq for too long.

-- 
Pavel Begunkov



* Re: [PATCH] io_uring: zero remained bytes when reading to fixed kernel buffer
  2025-03-22 12:02 ` Pavel Begunkov
@ 2025-03-22 13:50   ` Ming Lei
  2025-03-22 17:52     ` Keith Busch
  2025-03-22 18:15     ` Pavel Begunkov
  0 siblings, 2 replies; 9+ messages in thread
From: Ming Lei @ 2025-03-22 13:50 UTC
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, Caleb Sander Mateos, Keith Busch

On Sat, Mar 22, 2025 at 12:02:02PM +0000, Pavel Begunkov wrote:
> On 3/22/25 07:56, Ming Lei wrote:
> > So far, fixed kernel buffers are only used for FS read/write, in which
> > case the remaining bytes need to be zeroed on a short read; otherwise,
> > kernel data may be leaked to userspace.
> 
> Can you remind me how that can happen? Normally, IIUC, you register
> a request filled with user pages, so there is no kernel data there.
> Is it some kind of bounce buffer?

For direct IO, it is filled with user pages, but it can also be buffered
IO, and then the page can be mapped to userspace.

> 
> > Add two helpers to fix this issue, and meanwhile replace one open-coded
> > check with io_use_fixed_kbuf().
> > 
> > Cc: Caleb Sander Mateos <[email protected]>
> > Cc: Keith Busch <[email protected]>
> > Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
> > Signed-off-by: Ming Lei <[email protected]>
> > ---
> ...
> > +/* zero the remaining bytes of a kernel buffer to avoid leaking data */
> > +static inline void io_req_zero_remained(struct io_kiocb *req,
> > +					struct iov_iter *iter)
> > +{
> > +	size_t left = iov_iter_count(iter);
> > +
> > +	if (left > 0 && iov_iter_rw(iter) == READ)
> > +		iov_iter_zero(left, iter);
> > +}
> > +
> >   #endif
> > diff --git a/io_uring/rw.c b/io_uring/rw.c
> > index 039e063f7091..67dc1a6710c9 100644
> > --- a/io_uring/rw.c
> > +++ b/io_uring/rw.c
> > @@ -541,6 +541,12 @@ static void __io_complete_rw_common(struct io_kiocb *req, long res)
> >   	} else {
> >   		req_set_fail(req);
> >   		req->cqe.res = res;
> > +
> > +		if (io_use_fixed_kbuf(req)) {
> > +			struct io_async_rw *io = req->async_data;
> > +
> > +			io_req_zero_remained(req, &io->iter);
> > +		}
> 
> I think it can be exploited. It's called from ->ki_complete, i.e.
> io_complete_rw, which can run in [soft]irq context, so make the request
> large enough and you're stuck zeroing memory in [soft]irq for too long.

Short reads seldom happen, so how can it be exploited? And the request
size can't be too big in this (ublk) use case.

Thanks,
Ming



* Re: [PATCH] io_uring: zero remained bytes when reading to fixed kernel buffer
  2025-03-22 13:50   ` Ming Lei
@ 2025-03-22 17:52     ` Keith Busch
  2025-03-22 18:21       ` Pavel Begunkov
  2025-03-22 23:58       ` Ming Lei
  2025-03-22 18:15     ` Pavel Begunkov
  1 sibling, 2 replies; 9+ messages in thread
From: Keith Busch @ 2025-03-22 17:52 UTC
  To: Ming Lei; +Cc: Pavel Begunkov, Jens Axboe, io-uring, Caleb Sander Mateos

On Sat, Mar 22, 2025 at 09:50:37PM +0800, Ming Lei wrote:
> On Sat, Mar 22, 2025 at 12:02:02PM +0000, Pavel Begunkov wrote:
> > On 3/22/25 07:56, Ming Lei wrote:
> > > So far, fixed kernel buffers are only used for FS read/write, in which
> > > case the remaining bytes need to be zeroed on a short read; otherwise,
> > > kernel data may be leaked to userspace.
> > 
> > Can you remind me how that can happen? Normally, IIUC, you register
> > a request filled with user pages, so there is no kernel data there.
> > Is it some kind of bounce buffer?
> 
> For direct IO, it is filled with user pages, but it can also be buffered
> IO, and then the page can be mapped to userspace.

I may be missing something here, because that doesn't sound specific to
kernel registered bvecs. Is page cache memory not already zeroed out to
protect against short reads?

I can easily wire up a flakey device that won't fill the requested
memory. What do I need to do to observe this data leak?


* Re: [PATCH] io_uring: zero remained bytes when reading to fixed kernel buffer
  2025-03-22  7:56 [PATCH] io_uring: zero remained bytes when reading to fixed kernel buffer Ming Lei
  2025-03-22 12:02 ` Pavel Begunkov
@ 2025-03-22 18:10 ` Caleb Sander Mateos
  2025-03-23  0:08   ` Ming Lei
  1 sibling, 1 reply; 9+ messages in thread
From: Caleb Sander Mateos @ 2025-03-22 18:10 UTC
  To: Ming Lei; +Cc: Jens Axboe, io-uring, Keith Busch

On Sat, Mar 22, 2025 at 12:56 AM Ming Lei <[email protected]> wrote:
>
> So far, fixed kernel buffers are only used for FS read/write, in which
> case the remaining bytes need to be zeroed on a short read; otherwise,
> kernel data may be leaked to userspace.

I'm not sure I have all the background to understand whether kernel
data can be leaked through ublk requests, but I share Pavel and
Keith's questions about whether this scenario is even possible. If it
is possible, I don't think this patch would cover all the affected
cases:
- Registered ublk buffers can be used with any io_uring operation, not
just read/write. Wouldn't the same issue apply when using the ublk
buffer with, say, a socket recv or an NVMe passthru operation?
- Wouldn't the same issue apply if the ublk server completes a ublk
read request without performing any I/O (zero-copy or not) to read
data into its buffer?

Best,
Caleb


* Re: [PATCH] io_uring: zero remained bytes when reading to fixed kernel buffer
  2025-03-22 13:50   ` Ming Lei
  2025-03-22 17:52     ` Keith Busch
@ 2025-03-22 18:15     ` Pavel Begunkov
  1 sibling, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2025-03-22 18:15 UTC
  To: Ming Lei; +Cc: Jens Axboe, io-uring, Caleb Sander Mateos, Keith Busch

On 3/22/25 13:50, Ming Lei wrote:
> On Sat, Mar 22, 2025 at 12:02:02PM +0000, Pavel Begunkov wrote:
>> On 3/22/25 07:56, Ming Lei wrote:
>>> So far, fixed kernel buffers are only used for FS read/write, in which
>>> case the remaining bytes need to be zeroed on a short read; otherwise,
>>> kernel data may be leaked to userspace.
>>
>> Can you remind me how that can happen? Normally, IIUC, you register
>> a request filled with user pages, so there is no kernel data there.
>> Is it some kind of bounce buffer?
> 
> For direct IO, it is filled with user pages, but it can also be buffered
> IO, and then the page can be mapped to userspace.

I see. I don't mind the patch personally, but I think it's a security
concern: it's still a userspace app, even though a privileged one. Is
there perhaps a precedent, e.g. in FUSE, where we trust the userspace
driver enough to expose kernel memory to it?

One option is to try to detect whether it contains user pages,
and conditionally zero it in ublk beforehand.

But if we consider that fine, can ublk zero the buffer during struct
request completion? ublk should already know from the userspace driver
whether the I/O failed or was short.
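
Pavel's alternative above, zeroing in ublk when the struct request completes, could look roughly like the following sketch. It is a hypothetical userspace model, not ublk code: `struct model_seg` stands in for one bio_vec-like segment of the request buffer, and `model_zero_tail()` is an invented helper that zeroes every byte past the first `done` bytes, walking the segments in order. How ublk would actually learn `done` from the userspace driver is assumed here, not shown.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one bio_vec-like segment of a request's buffer. */
struct model_seg {
	unsigned char *base;
	size_t len;
};

/*
 * Zero every byte past the first `done` bytes of the request, walking the
 * segments in order. Returns the number of bytes zeroed.
 */
static size_t model_zero_tail(struct model_seg *segs, size_t nr, size_t done)
{
	size_t zeroed = 0;

	for (size_t i = 0; i < nr; i++) {
		size_t len = segs[i].len;

		if (done >= len) {
			/* Segment fully covered by the short read: keep it. */
			done -= len;
			continue;
		}
		/* Zero from `done` (0 for all later segments) to segment end. */
		memset(segs[i].base + done, 0, len - done);
		zeroed += len - done;
		done = 0;
	}
	return zeroed;
}
```

Because the walk is over the request's own segments, this variant would not depend on which io_uring operation consumed the buffer.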


>>> Add two helpers to fix this issue, and meanwhile replace one open-coded
>>> check with io_use_fixed_kbuf().
>>>
>>> Cc: Caleb Sander Mateos <[email protected]>
>>> Cc: Keith Busch <[email protected]>
>>> Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
>>> Signed-off-by: Ming Lei <[email protected]>
>>> ---
>> ...
>>> +/* zero the remaining bytes of a kernel buffer to avoid leaking data */
>>> +static inline void io_req_zero_remained(struct io_kiocb *req,
>>> +					struct iov_iter *iter)
>>> +{
>>> +	size_t left = iov_iter_count(iter);
>>> +
>>> +	if (left > 0 && iov_iter_rw(iter) == READ)
>>> +		iov_iter_zero(left, iter);
>>> +}
>>> +
>>>    #endif
>>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>>> index 039e063f7091..67dc1a6710c9 100644
>>> --- a/io_uring/rw.c
>>> +++ b/io_uring/rw.c
>>> @@ -541,6 +541,12 @@ static void __io_complete_rw_common(struct io_kiocb *req, long res)
>>>    	} else {
>>>    		req_set_fail(req);
>>>    		req->cqe.res = res;
>>> +
>>> +		if (io_use_fixed_kbuf(req)) {
>>> +			struct io_async_rw *io = req->async_data;
>>> +
>>> +			io_req_zero_remained(req, &io->iter);
>>> +		}
>>
>> I think it can be exploited. It's called from ->ki_complete, i.e.
>> io_complete_rw, which can run in [soft]irq context, so make the request
>> large enough and you're stuck zeroing memory in [soft]irq for too long.
> 
> Short reads seldom happen, so how can it be exploited? And the request
> size can't be too big in this (ublk) use case.

Denial of service by blocking irqs. I'm pretty sure we can construct
a fairly large bio / request in the general case, e.g. with huge pages.
Maybe ublk forces splitting, but I wouldn't rely on ublk's behaviour,
as this is a generic feature even though it currently has only one user.
We should move the zeroing to task context, where io_uring requests end
up anyway. I'm pretty sure it can be cleaned up later so that it doesn't
add any overhead.

-- 
Pavel Begunkov
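
The concern above is that zeroing from ->ki_complete does work proportional to the request size in [soft]irq context. One way to picture the suggestion of moving it to task context is to split completion into a constant-time step that only records what is pending and a later step that does the memset. This is a hypothetical userspace sketch, not io_uring's task_work API: `struct model_req`, `model_complete_irq()` and `model_complete_task()` are invented names, and no real interrupt or task context is involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request state; not the io_uring structures. */
struct model_req {
	unsigned char *tail;  /* first byte the short read left unfilled */
	size_t left;          /* number of unfilled bytes */
	bool zero_pending;    /* set in "irq" step, consumed in "task" step */
	long res;
};

/* Models ->ki_complete in [soft]irq context: O(1), touches no buffer memory. */
static void model_complete_irq(struct model_req *req, long res)
{
	req->res = res;
	req->zero_pending = req->left > 0;
}

/* Models the later task-context completion, where the O(n) work happens. */
static void model_complete_task(struct model_req *req)
{
	if (req->zero_pending) {
		memset(req->tail, 0, req->left);
		req->left = 0;
		req->zero_pending = false;
	}
}
```

The design choice being illustrated is simply that the cost bounded by request size moves out of the interrupt path, so a large request delays only the submitting task.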



* Re: [PATCH] io_uring: zero remained bytes when reading to fixed kernel buffer
  2025-03-22 17:52     ` Keith Busch
@ 2025-03-22 18:21       ` Pavel Begunkov
  2025-03-22 23:58       ` Ming Lei
  1 sibling, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2025-03-22 18:21 UTC
  To: Keith Busch, Ming Lei; +Cc: Jens Axboe, io-uring, Caleb Sander Mateos

On 3/22/25 17:52, Keith Busch wrote:
> On Sat, Mar 22, 2025 at 09:50:37PM +0800, Ming Lei wrote:
>> On Sat, Mar 22, 2025 at 12:02:02PM +0000, Pavel Begunkov wrote:
>>> On 3/22/25 07:56, Ming Lei wrote:
>>>> So far, fixed kernel buffers are only used for FS read/write, in which
>>>> case the remaining bytes need to be zeroed on a short read; otherwise,
>>>> kernel data may be leaked to userspace.
>>>
>>> Can you remind me how that can happen? Normally, IIUC, you register
>>> a request filled with user pages, so there is no kernel data there.
>>> Is it some kind of bounce buffer?
>>
>> For direct IO, it is filled with user pages, but it can also be buffered
>> IO, and then the page can be mapped to userspace.
> 
> I may be missing something here, because that doesn't sound specific to
> kernel registered bvecs. Is page cache memory not already zeroed out to
> protect against short reads?

In which case it's not up to io_uring to handle it. Just to be clear,
maybe you implied that as well. But another question is the level of
trust in kernel drivers vs. userspace drivers. One may argue that you
have to trust kernel drivers.

> I can easily wire up a flakey device that won't fill the requested
> memory. What do I need to do to observe this data leak?

-- 
Pavel Begunkov



* Re: [PATCH] io_uring: zero remained bytes when reading to fixed kernel buffer
  2025-03-22 17:52     ` Keith Busch
  2025-03-22 18:21       ` Pavel Begunkov
@ 2025-03-22 23:58       ` Ming Lei
  1 sibling, 0 replies; 9+ messages in thread
From: Ming Lei @ 2025-03-22 23:58 UTC
  To: Keith Busch; +Cc: Pavel Begunkov, Jens Axboe, io-uring, Caleb Sander Mateos

On Sat, Mar 22, 2025 at 11:52:20AM -0600, Keith Busch wrote:
> On Sat, Mar 22, 2025 at 09:50:37PM +0800, Ming Lei wrote:
> > On Sat, Mar 22, 2025 at 12:02:02PM +0000, Pavel Begunkov wrote:
> > > On 3/22/25 07:56, Ming Lei wrote:
> > > > So far, fixed kernel buffers are only used for FS read/write, in which
> > > > case the remaining bytes need to be zeroed on a short read; otherwise,
> > > > kernel data may be leaked to userspace.
> > > 
> > > Can you remind me how that can happen? Normally, IIUC, you register
> > > a request filled with user pages, so there is no kernel data there.
> > > Is it some kind of bounce buffer?
> > 
> > For direct IO, it is filled with user pages, but it can also be buffered
> > IO, and then the page can be mapped to userspace.
> 
> I may be missing something here, because that doesn't sound specific to
> kernel registered bvecs. Is page cache memory not already zeroed out to
> protect against short reads?

I'm not sure whether mm/fs does that for short reads.

At least some drivers do zero the data on a short read, such as loop,
erofs_fileio_ki_complete(), and null_blk...

> 
> I can easily wire up a flakey device that won't fill the requested
> memory. What do I need to do to observe this data leak?

You can observe non-zero data this way.

With respect to zero copy, the device needs to be trusted; that is why
neither the USER_COPY nor the ZERO_COPY feature is available in
unprivileged mode.

Thinking about it further, this patch isn't needed, because the ublk
driver already handles a short READ by requeuing the request.


Thanks,
Ming
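
The point above is that the ublk driver handles a short READ by requeuing the request, the idea being that the request is not completed with an unfilled tail. That idea can be modelled in userspace as a loop that re-issues the unfilled remainder until the whole buffer is filled. This is a hypothetical sketch, not the ublk driver's code: `struct model_dev`, `model_dev_read()` and `model_read_with_requeue()` are invented, and treating a zero-length transfer as an error is an assumption added so the loop cannot spin forever.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flaky device that returns at most `max_chunk` bytes per call. */
struct model_dev {
	const unsigned char *data;
	size_t max_chunk;
	unsigned int calls;
};

/* One (possibly short) read at offset `off`; returns bytes transferred. */
static size_t model_dev_read(struct model_dev *dev, size_t off,
			     unsigned char *dst, size_t len)
{
	size_t n = len < dev->max_chunk ? len : dev->max_chunk;

	memcpy(dst, dev->data + off, n);
	dev->calls++;
	return n;
}

/*
 * Re-issue ("requeue") the unfilled remainder until the buffer is full, so
 * the caller never sees stale bytes. Returns 0 on success, -1 if the device
 * stops making progress.
 */
static int model_read_with_requeue(struct model_dev *dev, unsigned char *buf,
				   size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t n = model_dev_read(dev, done, buf + done, len - done);

		if (n == 0)
			return -1;
		done += n;
	}
	return 0;
}
```

With a device limited to 4 bytes per call, a 10-byte read completes after three transfers (4 + 4 + 2) and the whole buffer holds device data.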



* Re: [PATCH] io_uring: zero remained bytes when reading to fixed kernel buffer
  2025-03-22 18:10 ` Caleb Sander Mateos
@ 2025-03-23  0:08   ` Ming Lei
  0 siblings, 0 replies; 9+ messages in thread
From: Ming Lei @ 2025-03-23  0:08 UTC
  To: Caleb Sander Mateos; +Cc: Jens Axboe, io-uring, Keith Busch

On Sat, Mar 22, 2025 at 11:10:23AM -0700, Caleb Sander Mateos wrote:
> On Sat, Mar 22, 2025 at 12:56 AM Ming Lei <[email protected]> wrote:
> >
> > So far, fixed kernel buffers are only used for FS read/write, in which
> > case the remaining bytes need to be zeroed on a short read; otherwise,
> > kernel data may be leaked to userspace.
> 
> I'm not sure I have all the background to understand whether kernel
> data can be leaked through ublk requests, but I share Pavel and
> Keith's questions about whether this scenario is even possible. If it
> is possible, I don't think this patch would cover all the affected
> cases:
> - Registered ublk buffers can be used with any io_uring operation, not
> just read/write. Wouldn't the same issue apply when using the ublk
> buffer with, say, a socket recv or an NVMe passthru operation?

IORING_RECVSEND_FIXED_BUF isn't handled for recv yet, so it looks like
socket recv isn't enabled...

> - Wouldn't the same issue apply if the ublk server completes a ublk
> read request without performing any I/O (zero-copy or not) to read
> data into its buffer?

Yes, it requires the ublk zero-copy (zc) server implementation to be
trusted, and ublk zc can't work in unprivileged mode.

For non-zc, there is no such risk, because the request buffer is filled
with user data.

Thanks,
Ming

