Re: [PATCH 1/2] iouring: one capable call per iouring instance

public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed

From: Ming Lei <ming.lei@redhat.com>
To: Keith Busch <kbusch@kernel.org>
Cc: Jeff Moyer <jmoyer@redhat.com>, Keith Busch <kbusch@meta.com>,
	linux-nvme@lists.infradead.org, io-uring@vger.kernel.org,
	axboe@kernel.dk, hch@lst.de, sagi@grimberg.me,
	asml.silence@gmail.com, linux-security-module@vger.kernel.org,
	Kanchan Joshi <joshi.k@samsung.com>
Subject: Re: [PATCH 1/2] iouring: one capable call per iouring instance
Date: Thu, 7 Dec 2023 09:23:06 +0800	[thread overview]
Message-ID: <ZXEeei6eoDW87xcN@fedora> (raw)
In-Reply-To: <ZXCT6mpt2Tq0k-Nw@kbusch-mbp>

On Wed, Dec 06, 2023 at 08:31:54AM -0700, Keith Busch wrote:
> On Wed, Dec 06, 2023 at 11:08:17AM +0800, Ming Lei wrote:
> > On Tue, Dec 05, 2023 at 08:45:10AM -0700, Keith Busch wrote:
> > > 
> > > It's not necessarily about the read/write passthrough commands. It's for
> > > commands we don't know about today. Do we want to revisit this problem
> > > every time spec provides another operation? Are vendor unique solutions
> > > not allowed to get high IOPs access?
> > 
> > Except for read/write, what other commands are performance sensitive?
> 
> It varies by command set, but this question is irrelevant. I'm not
> interested in gatekeeping the fast path.

IMO, it doesn't make sense to run such optimization for commands which aren't
performance sensitive.

>  
> > > Secondly, some people have rediscovered you can abuse this interface to
> > > corrupt kernel memory, so there are considerations to restricting this
> > 
> > Just wondering why ADMIN won't corrupt kernel memory, and only normal
> > user can, looks it is kernel bug instead of permission related issue.
> 
> Admin can corrupt memory as easily as a normal user through this
> interface. We just don't want such capabilities to be available to
> regular users.
> 
> And it's a user bug: user told the kernel to map buffer of size X, but
> the device transfers size Y into it. Kernel can't do anything about that
> (other than remove the interface, but such an action will break many
> existing users) because we fundamentally do not know the true transfer
> size of a random command. Many NVMe commands don't explicitly encode
> transfer lengths, so disagreement between host and device on implicit
> lengths risk corruption. It's a protocol "feature".

Got it, thanks for the explanation, and looks one big defect of
NVMe protocol or the device implementation.

> 
> > > to CAP_SYS_ADMIN anyway, so there's no cheap check available today if we
> > > have to go that route.
> > 
> > If capable(CAP_SYS_ADMIN) is really slow, I am wondering why not
> > optimize it in task_struct?
> 
> That's an interesting point to look into. I was hoping to not touch such
> a common struct, but I'm open to all options.
 
capability is per-thread, and it is updated in current process/pthread, so
the correct place to cache this info is 'task_struct'.


Thanks,
Ming

next prev parent reply	other threads:[~2023-12-07  1:24 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-04 17:53 [PATCH 1/2] iouring: one capable call per iouring instance Keith Busch
2023-12-04 17:53 ` [PATCH 2/2] nvme: use uring_cmd sys_admin flag Keith Busch
2023-12-04 18:05 ` [PATCH 1/2] iouring: one capable call per iouring instance Jens Axboe
2023-12-04 18:45   ` Pavel Begunkov
2023-12-05 16:21   ` Kanchan Joshi
2023-12-06 21:09     ` Keith Busch
2023-12-04 18:15 ` Jens Axboe
2023-12-04 18:40 ` Jeff Moyer
2023-12-04 18:57   ` Keith Busch
2023-12-05  4:14     ` Ming Lei
2023-12-05  4:31       ` Keith Busch
2023-12-05  5:25         ` Ming Lei
2023-12-05 15:45           ` Keith Busch
2023-12-06  3:08             ` Ming Lei
2023-12-06 15:31               ` Keith Busch
2023-12-07  1:23                 ` Ming Lei [this message]
2023-12-07 17:48                   ` Christoph Hellwig
2023-12-04 19:01   ` Jens Axboe
2023-12-04 19:22     ` Jeff Moyer
2023-12-04 19:33       ` Jens Axboe
2023-12-04 19:37       ` Keith Busch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZXEeei6eoDW87xcN@fedora \
    --to=ming.lei@redhat.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=io-uring@vger.kernel.org \
    --cc=jmoyer@redhat.com \
    --cc=joshi.k@samsung.com \
    --cc=kbusch@kernel.org \
    --cc=kbusch@meta.com \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox