Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALmYWFscz5W6xSXD-+dimzbj=TykNJEDa0m5gvBx93N-J+3nKA@mail.gmail.com>
Date: Mon, 8 Jul 2024 09:17:32 -0700
From: Jeff Xu <jeffxu@...gle.com>
To: Mickaël Salaün <mic@...ikod.net>
Cc: Al Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, 
	Kees Cook <keescook@...omium.org>, Linus Torvalds <torvalds@...ux-foundation.org>, 
	Paul Moore <paul@...l-moore.com>, "Theodore Ts'o" <tytso@....edu>, 
	Alejandro Colomar <alx.manpages@...il.com>, Aleksa Sarai <cyphar@...har.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, Andy Lutomirski <luto@...nel.org>, 
	Arnd Bergmann <arnd@...db.de>, Casey Schaufler <casey@...aufler-ca.com>, 
	Christian Heimes <christian@...hon.org>, Dmitry Vyukov <dvyukov@...gle.com>, 
	Eric Biggers <ebiggers@...nel.org>, Eric Chiang <ericchiang@...gle.com>, 
	Fan Wu <wufan@...ux.microsoft.com>, Florian Weimer <fweimer@...hat.com>, 
	Geert Uytterhoeven <geert@...ux-m68k.org>, James Morris <jamorris@...ux.microsoft.com>, 
	Jan Kara <jack@...e.cz>, Jann Horn <jannh@...gle.com>, Jonathan Corbet <corbet@....net>, 
	Jordan R Abrahams <ajordanr@...gle.com>, Lakshmi Ramasubramanian <nramas@...ux.microsoft.com>, 
	Luca Boccassi <bluca@...ian.org>, Luis Chamberlain <mcgrof@...nel.org>, 
	"Madhavan T . Venkataraman" <madvenka@...ux.microsoft.com>, Matt Bobrowski <mattbobrowski@...gle.com>, 
	Matthew Garrett <mjg59@...f.ucam.org>, Matthew Wilcox <willy@...radead.org>, 
	Miklos Szeredi <mszeredi@...hat.com>, Mimi Zohar <zohar@...ux.ibm.com>, 
	Nicolas Bouchinet <nicolas.bouchinet@....gouv.fr>, Scott Shell <scottsh@...rosoft.com>, 
	Shuah Khan <shuah@...nel.org>, Stephen Rothwell <sfr@...b.auug.org.au>, 
	Steve Dower <steve.dower@...hon.org>, Steve Grubb <sgrubb@...hat.com>, 
	Thibaut Sautereau <thibaut.sautereau@....gouv.fr>, 
	Vincent Strubel <vincent.strubel@....gouv.fr>, Xiaoming Ni <nixiaoming@...wei.com>, 
	Yin Fengwei <fengwei.yin@...el.com>, kernel-hardening@...ts.openwall.com, 
	linux-api@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	linux-integrity@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-security-module@...r.kernel.org
Subject: Re: [RFC PATCH v19 2/5] security: Add new SHOULD_EXEC_CHECK and
 SHOULD_EXEC_RESTRICT securebits

Hi

On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@...ikod.net> wrote:
>
> These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and
> their *_LOCKED counterparts are designed to be set by processes setting
> up an execution environment, such as a user session, a container, or a
> security sandbox.  Like seccomp filters or Landlock domains, the
> securebits are inherited across proceses.
>
> When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should
> check executable resources with execveat(2) + AT_CHECK (see previous
> patch).
>
> When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK).
>
Do we need both bits ?
When CHECK is set and RESTRICT is not, the "check fail" executable
will still get executed, so CHECK is for logging ?
Does RESTRICT imply CHECK is set, e.g. What if CHECK=0 and RESTRICT = 1 ?

> For a secure environment, we might also want
> SECBIT_SHOULD_EXEC_CHECK_LOCKED and SECBIT_SHOULD_EXEC_RESTRICT_LOCKED
> to be set.  For a test environment (e.g. testing on a fleet to identify
> potential issues), only the SECBIT_SHOULD_EXEC_CHECK* bits can be set to
> still be able to identify potential issues (e.g. with interpreters logs
> or LSMs audit entries).
>
> It should be noted that unlike other security bits, the
> SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT bits are
> dedicated to user space willing to restrict itself.  Because of that,
> they only make sense in the context of a trusted environment (e.g.
> sandbox, container, user session, full system) where the process
> changing its behavior (according to these bits) and all its parent
> processes are trusted.  Otherwise, any parent process could just execute
> its own malicious code (interpreting a script or not), or even enforce a
> seccomp filter to mask these bits.
>
> Such a secure environment can be achieved with an appropriate access
> control policy (e.g. mount's noexec option, file access rights, LSM
> configuration) and an enlighten ld.so checking that libraries are
> allowed for execution e.g., to protect against illegitimate use of
> LD_PRELOAD.
>
> Scripts may need some changes to deal with untrusted data (e.g. stdin,
> environment variables), but that is outside the scope of the kernel.
>
> The only restriction enforced by the kernel is the right to ptrace
> another process.  Processes are denied to ptrace less restricted ones,
> unless the tracer has CAP_SYS_PTRACE.  This is mainly a safeguard to
> avoid trivial privilege escalations e.g., by a debugging process being
> abused with a confused deputy attack.
>
> Cc: Al Viro <viro@...iv.linux.org.uk>
> Cc: Christian Brauner <brauner@...nel.org>
> Cc: Kees Cook <keescook@...omium.org>
> Cc: Paul Moore <paul@...l-moore.com>
> Signed-off-by: Mickaël Salaün <mic@...ikod.net>
> Link: https://lore.kernel.org/r/20240704190137.696169-3-mic@digikod.net
> ---
>
> New design since v18:
> https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> ---
>  include/uapi/linux/securebits.h | 56 ++++++++++++++++++++++++++++-
>  security/commoncap.c            | 63 ++++++++++++++++++++++++++++-----
>  2 files changed, 110 insertions(+), 9 deletions(-)
>
> diff --git a/include/uapi/linux/securebits.h b/include/uapi/linux/securebits.h
> index d6d98877ff1a..3fdb0382718b 100644
> --- a/include/uapi/linux/securebits.h
> +++ b/include/uapi/linux/securebits.h
> @@ -52,10 +52,64 @@
>  #define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \
>                         (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED))
>
> +/*
> + * When SECBIT_SHOULD_EXEC_CHECK is set, a process should check all executable
> + * files with execveat(2) + AT_CHECK.  However, such check should only be
> + * performed if all to-be-executed code only comes from regular files.  For
> + * instance, if a script interpreter is called with both a script snipped as
> + * argument and a regular file, the interpreter should not check any file.
> + * Doing otherwise would mislead the kernel to think that only the script file
> + * is being executed, which could for instance lead to unexpected permission
> + * change and break current use cases.
> + *
> + * This secure bit may be set by user session managers, service managers,
> + * container runtimes, sandboxer tools...  Except for test environments, the
> + * related SECBIT_SHOULD_EXEC_CHECK_LOCKED bit should also be set.
> + *
> + * Ptracing another process is deny if the tracer has SECBIT_SHOULD_EXEC_CHECK
> + * but not the tracee.  SECBIT_SHOULD_EXEC_CHECK_LOCKED also checked.
> + */
> +#define SECURE_SHOULD_EXEC_CHECK               8
> +#define SECURE_SHOULD_EXEC_CHECK_LOCKED                9  /* make bit-8 immutable */
> +
> +#define SECBIT_SHOULD_EXEC_CHECK (issecure_mask(SECURE_SHOULD_EXEC_CHECK))
> +#define SECBIT_SHOULD_EXEC_CHECK_LOCKED \
> +                       (issecure_mask(SECURE_SHOULD_EXEC_CHECK_LOCKED))
> +
> +/*
> + * When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow
> + * execution of approved files, if any (see SECBIT_SHOULD_EXEC_CHECK).  For
> + * instance, script interpreters called with a script snippet as argument
> + * should always deny such execution if SECBIT_SHOULD_EXEC_RESTRICT is set.
> + * However, if a script interpreter is called with both
> + * SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT, they should
> + * interpret the provided script files if no unchecked code is also provided
> + * (e.g. directly as argument).
> + *
> + * This secure bit may be set by user session managers, service managers,
> + * container runtimes, sandboxer tools...  Except for test environments, the
> + * related SECBIT_SHOULD_EXEC_RESTRICT_LOCKED bit should also be set.
> + *
> + * Ptracing another process is deny if the tracer has
> + * SECBIT_SHOULD_EXEC_RESTRICT but not the tracee.
> + * SECBIT_SHOULD_EXEC_RESTRICT_LOCKED is also checked.
> + */
> +#define SECURE_SHOULD_EXEC_RESTRICT            10
> +#define SECURE_SHOULD_EXEC_RESTRICT_LOCKED     11  /* make bit-8 immutable */
> +
> +#define SECBIT_SHOULD_EXEC_RESTRICT (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
> +#define SECBIT_SHOULD_EXEC_RESTRICT_LOCKED \
> +                       (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT_LOCKED))
> +
>  #define SECURE_ALL_BITS                (issecure_mask(SECURE_NOROOT) | \
>                                  issecure_mask(SECURE_NO_SETUID_FIXUP) | \
>                                  issecure_mask(SECURE_KEEP_CAPS) | \
> -                                issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE))
> +                                issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE) | \
> +                                issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \
> +                                issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
>  #define SECURE_ALL_LOCKS       (SECURE_ALL_BITS << 1)
>
> +#define SECURE_ALL_UNPRIVILEGED (issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \
> +                                issecure_mask(SECURE_SHOULD_EXEC_RESTRICT))
> +
>  #endif /* _UAPI_LINUX_SECUREBITS_H */
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 162d96b3a676..34b4493e2a69 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -117,6 +117,33 @@ int cap_settime(const struct timespec64 *ts, const struct timezone *tz)
>         return 0;
>  }
>
> +static bool ptrace_secbits_allowed(const struct cred *tracer,
> +                                  const struct cred *tracee)
> +{
> +       const unsigned long tracer_secbits = SECURE_ALL_UNPRIVILEGED &
> +                                            tracer->securebits;
> +       const unsigned long tracee_secbits = SECURE_ALL_UNPRIVILEGED &
> +                                            tracee->securebits;
> +       /* Ignores locking of unset secure bits (cf. SECURE_ALL_LOCKS). */
> +       const unsigned long tracer_locked = (tracer_secbits << 1) &
> +                                           tracer->securebits;
> +       const unsigned long tracee_locked = (tracee_secbits << 1) &
> +                                           tracee->securebits;
> +
> +       /* The tracee must not have less constraints than the tracer. */
> +       if ((tracer_secbits | tracee_secbits) != tracee_secbits)
> +               return false;
> +
> +       /*
> +        * Makes sure that the tracer's locks for restrictions are the same for
> +        * the tracee.
> +        */
> +       if ((tracer_locked | tracee_locked) != tracee_locked)
> +               return false;
> +
> +       return true;
> +}
> +
>  /**
>   * cap_ptrace_access_check - Determine whether the current process may access
>   *                        another
> @@ -146,7 +173,8 @@ int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
>         else
>                 caller_caps = &cred->cap_permitted;
>         if (cred->user_ns == child_cred->user_ns &&
> -           cap_issubset(child_cred->cap_permitted, *caller_caps))
> +           cap_issubset(child_cred->cap_permitted, *caller_caps) &&
> +           ptrace_secbits_allowed(cred, child_cred))
>                 goto out;
>         if (ns_capable(child_cred->user_ns, CAP_SYS_PTRACE))
>                 goto out;
> @@ -178,7 +206,8 @@ int cap_ptrace_traceme(struct task_struct *parent)
>         cred = __task_cred(parent);
>         child_cred = current_cred();
>         if (cred->user_ns == child_cred->user_ns &&
> -           cap_issubset(child_cred->cap_permitted, cred->cap_permitted))
> +           cap_issubset(child_cred->cap_permitted, cred->cap_permitted) &&
> +           ptrace_secbits_allowed(cred, child_cred))
>                 goto out;
>         if (has_ns_capability(parent, child_cred->user_ns, CAP_SYS_PTRACE))
>                 goto out;
> @@ -1302,21 +1331,39 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
>                      & (old->securebits ^ arg2))                        /*[1]*/
>                     || ((old->securebits & SECURE_ALL_LOCKS & ~arg2))   /*[2]*/
>                     || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS))   /*[3]*/
> -                   || (cap_capable(current_cred(),
> -                                   current_cred()->user_ns,
> -                                   CAP_SETPCAP,
> -                                   CAP_OPT_NONE) != 0)                 /*[4]*/
>                         /*
>                          * [1] no changing of bits that are locked
>                          * [2] no unlocking of locks
>                          * [3] no setting of unsupported bits
> -                        * [4] doing anything requires privilege (go read about
> -                        *     the "sendmail capabilities bug")
>                          */
>                     )
>                         /* cannot change a locked bit */
>                         return -EPERM;
>
> +               /*
> +                * Doing anything requires privilege (go read about the
> +                * "sendmail capabilities bug"), except for unprivileged bits.
> +                * Indeed, the SECURE_ALL_UNPRIVILEGED bits are not
> +                * restrictions enforced by the kernel but by user space on
> +                * itself.  The kernel is only in charge of protecting against
> +                * privilege escalation with ptrace protections.
> +                */
> +               if (cap_capable(current_cred(), current_cred()->user_ns,
> +                               CAP_SETPCAP, CAP_OPT_NONE) != 0) {
> +                       const unsigned long unpriv_and_locks =
> +                               SECURE_ALL_UNPRIVILEGED |
> +                               SECURE_ALL_UNPRIVILEGED << 1;
> +                       const unsigned long changed = old->securebits ^ arg2;
> +
> +                       /* For legacy reason, denies non-change. */
> +                       if (!changed)
> +                               return -EPERM;
> +
> +                       /* Denies privileged changes. */
> +                       if (changed & ~unpriv_and_locks)
> +                               return -EPERM;
> +               }
> +
>                 new = prepare_creds();
>                 if (!new)
>                         return -ENOMEM;
> --
> 2.45.2
>

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.