From: Daniel Vetter <[email protected]>
To: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>,
Christian Brauner <[email protected]>,
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected],
[email protected],
[email protected]
Subject: Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
Date: Mon, 6 May 2024 14:47:23 +0200 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAHk-=whFg8-WyMbVUGW5c0baurGzqmRtzFLoU-gxtRXq2nVZ+w@mail.gmail.com>
On Sun, May 05, 2024 at 01:53:48PM -0700, Linus Torvalds wrote:
> On Sun, 5 May 2024 at 13:30, Al Viro <[email protected]> wrote:
> >
> > 0. special-cased ->f_count rule for ->poll() is a wart and it's
> > better to get rid of it.
> >
> > 1. fs/eventpoll.c is a steaming pile of shit and I'd be glad to see
> > git rm taken to it. Short of that, by all means, let's grab reference
> > in there around the call of vfs_poll() (see (0)).
>
> Agreed on 0/1.
>
> > 2. having ->poll() instances grab extra references to file passed
> > to them is not something that should be encouraged; there's a plenty
> > of potential problems, and "caller has it pinned, so we are fine with
> > grabbing extra refs" is nowhere near enough to eliminate those.
>
> So it's not clear why you hate it so much, since those extra
> references are totally normal in all the other VFS paths.
>
> I mean, they are perhaps not the *common* case, but we have a lot of
> random get_file() calls sprinkled around in various places when you
> end up passing a file descriptor off to some asynchronous operation
> thing.
>
> Yeah, I think most of them tend to be special operations (eg the tty
> TIOCCONS ioctl to redirect the console), but it's not like vfs_ioctl()
> is *that* different from vfs_poll. Different operation, not somehow
> "one is more special than the other".
>
> cachefiles and backing-file does it for regular IO, and drop it at IO
> completion - not that different from what dma-buf does. It's in
> ->read_iter() rather than ->poll(), but again: different operations,
> but not "one of them is somehow fundamentally different".
>
> > 3. dma-buf uses of get_file() are probably safe (epoll shite aside),
> > but they do look fishy. That has nothing to do with epoll.
>
> Now, what dma-buf basically seems to do is to avoid ref-counting its
> own fundamental data structure, and replaces that by refcounting the
> 'struct file' that *points* to it instead.
>
> And it is a bit odd, but it actually makes some amount of sense,
> because then what it passes around is that file pointer (and it allows
> passing it around from user space *as* that file).
>
> And honestly, if you look at why it then needs to add its refcount to
> it all, it actually makes sense. dma-bufs have this notion of
> "fences" that are basically completion points for the asynchronous
> DMA. Doing a "poll()" operation will add a note to the fence to get
> that wakeup when it's done.
>
> And yes, logically it takes a ref to the "struct dma_buf", but because
> of how the lifetime of the dma_buf is associated with the lifetime of
> the 'struct file', that then turns into taking a ref on the file.
>
> Unusual? Yes. But not illogical. Not obviously broken. Tying the
> lifetime of the dma_buf to the lifetime of a file that is passed along
> makes _sense_ for that use.
>
> I'm sure dma-bufs could add another level of refcounting on the
> 'struct dma_buf' itself, and not make it be 1:1 with the file, but
> it's not clear to me what the advantage would really be, or why it
> would be wrong to re-use a refcount that is already there.
So there is generally another refcount, because dma_buf is just the
cross-driver interface to some kind of real underlying buffer object from
the various graphics related subsystems we have.
And since it's a pure file based api thing that ceases to serve any
function once the fd/file is gone we tied all the dma_buf refcounting to
the refcount struct file already maintains. But the underlying buffer
object can easily outlive the dma_buf, and over the lifetime of an
underlying buffer object you might actually end up creating different
dma_buf api wrappers for it (but at least in drm we guarantee there's at
most one, hence why vmwgfx does the atomic_inc_unless_zero trick, which I
don't particularly like and isn't really needed).
But we could add another refcount, it just means we have 3 of those then
when only really 2 are needed.
Also maybe here two: dma_fence are bounded like other disk i/o (including
the option of timeouts if things go very wrong), so it's very much not
forever but at most a few seconds worst case (shit hw/driver excluded, as
usual).
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
next prev parent reply other threads:[~2024-05-06 12:47 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-08 8:26 [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove syzbot
2024-04-15 14:31 ` Jens Axboe
2024-04-15 14:57 ` Pavel Begunkov
2024-05-03 11:54 ` Bui Quang Minh
2024-05-03 18:26 ` get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove) Kees Cook
2024-05-03 18:49 ` Jens Axboe
2024-05-03 19:22 ` Kees Cook
2024-05-03 19:35 ` Jens Axboe
2024-05-03 19:59 ` Kees Cook
2024-05-03 20:28 ` Kees Cook
2024-05-03 21:11 ` Al Viro
2024-05-03 21:24 ` Linus Torvalds
2024-05-03 21:30 ` Al Viro
2024-05-06 17:46 ` Stefan Metzmacher
2024-05-06 18:17 ` Linus Torvalds
2024-05-08 8:47 ` David Laight
2024-05-03 21:36 ` Al Viro
2024-05-03 21:42 ` Linus Torvalds
2024-05-03 21:53 ` Al Viro
2024-05-06 12:23 ` Daniel Vetter
2024-05-04 9:59 ` Christian Brauner
2024-05-03 21:11 ` [PATCH] epoll: try to be a _bit_ better about file lifetimes Linus Torvalds
2024-05-03 21:24 ` Al Viro
2024-05-03 21:33 ` Linus Torvalds
2024-05-03 21:45 ` Al Viro
2024-05-03 21:52 ` Linus Torvalds
2024-05-03 22:01 ` Al Viro
2024-05-03 22:07 ` Al Viro
2024-05-03 23:16 ` Linus Torvalds
2024-05-03 23:39 ` Al Viro
2024-05-03 23:54 ` Linus Torvalds
2024-05-04 10:44 ` Christian Brauner
2024-05-03 22:46 ` Kees Cook
2024-05-03 23:03 ` Al Viro
2024-05-03 23:23 ` Kees Cook
2024-05-03 23:41 ` Linus Torvalds
2024-05-04 9:19 ` Christian Brauner
2024-05-06 12:37 ` Daniel Vetter
2024-05-04 9:37 ` Christian Brauner
2024-05-04 15:32 ` Linus Torvalds
2024-05-04 15:40 ` Linus Torvalds
2024-05-04 15:53 ` Linus Torvalds
2024-05-05 19:46 ` Al Viro
2024-05-05 20:03 ` Linus Torvalds
2024-05-05 20:30 ` Al Viro
2024-05-05 20:53 ` Linus Torvalds
2024-05-06 12:47 ` Daniel Vetter [this message]
2024-05-06 14:46 ` Christian Brauner
2024-05-07 10:58 ` Daniel Vetter
2024-05-06 16:15 ` Christian König
2024-05-05 10:50 ` Christian Brauner
2024-05-05 16:46 ` Linus Torvalds
2024-05-05 17:55 ` [PATCH v2] epoll: be " Linus Torvalds
2024-05-05 18:04 ` Jens Axboe
2024-05-05 20:01 ` David Laight
2024-05-05 20:16 ` Linus Torvalds
2024-05-05 20:12 ` [PATCH] epoll: try to be a _bit_ " Al Viro
2024-05-06 8:45 ` Christian Brauner
2024-05-06 9:26 ` Christian Brauner
2024-05-06 14:19 ` Christian Brauner
2024-05-07 21:02 ` David Laight
2024-05-04 18:20 ` Linus Torvalds
2024-05-06 14:29 ` [Linaro-mm-sig] " Christian König
2024-05-07 11:02 ` Daniel Vetter
2024-05-07 16:46 ` Linus Torvalds
2024-05-07 17:45 ` Christian König
2024-05-08 7:51 ` Michel Dänzer
2024-05-08 7:59 ` Christian König
2024-05-08 8:23 ` Christian Brauner
2024-05-08 9:10 ` Christian König
2024-05-07 18:04 ` Daniel Vetter
2024-05-07 19:07 ` Linus Torvalds
2024-05-08 5:55 ` Christian König
2024-05-08 8:32 ` Daniel Vetter
2024-05-08 10:16 ` Christian Brauner
2024-05-08 8:05 ` Christian Brauner
2024-05-08 16:19 ` Linus Torvalds
2024-05-08 17:14 ` Linus Torvalds
2024-05-09 11:38 ` Christian Brauner
2024-05-09 15:48 ` Linus Torvalds
2024-05-10 6:33 ` Christian Brauner
2024-05-08 10:08 ` Christian Brauner
2024-05-08 15:45 ` Daniel Vetter
2024-05-10 10:55 ` Christian Brauner
2024-05-11 18:25 ` David Laight
2024-05-05 17:31 ` Jens Axboe
2024-05-04 9:45 ` get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove) Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox