Re: [PATCH v1 2/3] x86/entry/64: Add info about registers on exit


From: Andy Lutomirski <[email protected]>
To: Ammar Faizi <[email protected]>,
	Thomas Gleixner <[email protected]>,
	Ingo Molnar <[email protected]>, Borislav Petkov <[email protected]>,
	Dave Hansen <[email protected]>,
	"H. Peter Anvin" <[email protected]>
Cc: "H.J. Lu" <[email protected]>, Michael Matz <[email protected]>,
	GNU/Weeb Mailing List <[email protected]>, x86-ml <[email protected]>,
	lkml <[email protected]>, Willy Tarreau <[email protected]>
Subject: Re: [PATCH v1 2/3] x86/entry/64: Add info about registers on exit
Date: Fri, 7 Jan 2022 16:03:46 -0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 1/7/22 15:52, Ammar Faizi wrote:
> There was a controversial discussion about the wording in the System
> V ABI document regarding what registers the kernel is allowed to
> clobber when the userspace executes syscall.
> 
> The resolution of the discussion was reviewing the clobber list in
> the glibc source. For a historical reason in the glibc source, the
> kernel must restore all registers before returning to the userspace
> (except for rax, rcx and r11).
> 
> Link: https://lore.kernel.org/lkml/[email protected]/
> Link: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/25
> 
> This adds info about registers on exit.
> 
> Cc: Andy Lutomirski <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Michael Matz <[email protected]>
> Cc: "H.J. Lu" <[email protected]>
> Cc: Willy Tarreau <[email protected]>
> Cc: x86-ml <[email protected]>
> Cc: lkml <[email protected]>
> Cc: GNU/Weeb Mailing List <[email protected]>
> Signed-off-by: Ammar Faizi <[email protected]>
> ---
> 
> Quoted the full comment in that file after patched, so it's easier to
> review:
> /*
>   * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
>   *
>   * This is the only entry point used for 64-bit system calls.  The
>   * hardware interface is reasonably well designed and the register to
>   * argument mapping Linux uses fits well with the registers that are
>   * available when SYSCALL is used.
>   *
>   * SYSCALL instructions can be found inlined in libc implementations as
>   * well as some other programs and libraries.  There are also a handful
>   * of SYSCALL instructions in the vDSO used, for example, as a
>   * clock_gettimeofday fallback.
>   *
>   * 64-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
>   * then loads new ss, cs, and rip from previously programmed MSRs.
>   * rflags gets masked by a value from another MSR (so CLD and CLAC
>   * are not needed). SYSCALL does not save anything on the stack
>   * and does not change rsp.
>   *
>   * Registers on entry:
>   * rax  system call number
>   * rcx  return address
>   * r11  saved rflags (note: r11 is callee-clobbered register in C ABI)
>   * rdi  arg0
>   * rsi  arg1
>   * rdx  arg2
>   * r10  arg3 (needs to be moved to rcx to conform to C ABI)
>   * r8   arg4
>   * r9   arg5
>   * (note: r12-r15, rbp, rbx are callee-preserved in C ABI)
>   *
>   * Only called from user space.
>   *
>   * Registers on exit:
>   * rax  syscall return value
>   * rcx  return address
>   * r11  rflags
>   *
>   * For a historical reason in the glibc source, the kernel must restore all
>   * registers except the rax (syscall return value) before returning to the
>   * userspace.
>   *
>   * In other words, with respect to the userspace, when the kernel returns
>   * to the userspace, only 3 registers are clobbered, they are rax, rcx,
>   * and r11.
>   *
>   * When user can change pt_regs->foo always force IRET. That is because
>   * it deals with uncanonical addresses better. SYSRET has trouble
>   * with them due to bugs in both AMD and Intel CPUs.
>   */
> 
> ---
> 
>   arch/x86/entry/entry_64.S | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index e432dd075291..1111fff2e05f 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -79,6 +79,19 @@
>    *
>    * Only called from user space.
>    *
> + * Registers on exit:
> + * rax  syscall return value
> + * rcx  return address
> + * r11  rflags
> + *
> + * For a historical reason in the glibc source, the kernel must restore all
> + * registers except the rax (syscall return value) before returning to the
> + * userspace.
> + *
> + * In other words, with respect to the userspace, when the kernel returns
> + * to the userspace, only 3 registers are clobbered, they are rax, rcx,
> + * and r11.
> + *

I would say this much more concisely:

The Linux kernel preserves all registers (even C callee-clobbered 
registers) except for rax, rcx and r11 across system calls, and existing 
user code relies on this behavior.
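
To illustrate what "user code relies on this behavior" means in practice,
here is a minimal sketch of a raw syscall wrapper for x86-64 Linux using
GCC/Clang extended asm. It is not part of the patch; the helper name
raw_syscall3 is hypothetical. The clobber list names only rcx and r11 (rax
is the output operand), so the compiler is free to keep live values in any
other register across the SYSCALL instruction:

```c
#include <sys/syscall.h>	/* SYS_getpid, SYS_write */
#include <unistd.h>		/* getpid() */

/*
 * Hypothetical helper, for illustration only: issue a 3-argument
 * system call directly with the SYSCALL instruction (x86-64 Linux).
 *
 * Register usage follows the entry comment quoted above:
 *   rax = syscall number on entry, return value on exit
 *   rdi, rsi, rdx = arg0, arg1, arg2
 *
 * Only rcx and r11 are declared clobbered.  Every other register is
 * assumed to be preserved by the kernel, which is exactly the
 * behavior existing user code depends on.
 */
static inline long raw_syscall3(long nr, long a0, long a1, long a2)
{
	long ret;

	__asm__ volatile (
		"syscall"
		: "=a" (ret)				/* rax: return value */
		: "0" (nr), "D" (a0), "S" (a1), "d" (a2)
		: "rcx", "r11", "memory"
	);
	return ret;
}
```

With this sketch, raw_syscall3(SYS_getpid, 0, 0, 0) should return the same
value as getpid(). If the kernel ever clobbered another register, say r8,
wrappers written this way would silently miscompile, because the compiler
was never told that r8 could change.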

>    * When user can change pt_regs->foo always force IRET. That is because
>    * it deals with uncanonical addresses better. SYSRET has trouble
>    * with them due to bugs in both AMD and Intel CPUs.
> 


Thread overview: 9+ messages
2022-01-07 23:52 [PATCH v1 0/3] x86-64 entry documentation and clean up Ammar Faizi
2022-01-07 23:52 ` [PATCH v1 1/3] x86/entry/64: Clean up spaces after the instruction Ammar Faizi
2022-01-07 23:52 ` [PATCH v1 2/3] x86/entry/64: Add info about registers on exit Ammar Faizi
2022-01-08  0:03   ` Andy Lutomirski [this message]
2022-01-08  0:34     ` Ammar Faizi
2022-01-07 23:52 ` [PATCH v1 3/3] Documentation: x86-64: Document registers on entry and exit Ammar Faizi
2022-01-08  0:02   ` Andy Lutomirski
2022-01-08  0:38     ` Ammar Faizi
2022-01-21 13:32     ` Borislav Petkov
