Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 9 Jul 2020 19:26:18 +0200
From: Arnd Bergmann <>
To: Mickaël Salaün <>
Cc: "" <>, Al Viro <>, 
	Andy Lutomirski <>, Anton Ivanov <>, 
	Casey Schaufler <>, James Morris <>, Jann Horn <>, 
	Jeff Dike <>, Jonathan Corbet <>, Kees Cook <>, 
	Michael Kerrisk <>, Mickaël Salaün <>, 
	Richard Weinberger <>, "Serge E . Hallyn" <>, Shuah Khan <>, 
	Vincent Dagonneau <>, 
	Kernel Hardening <>, Linux API <>, 
	linux-arch <>, 
	"open list:DOCUMENTATION" <>, 
	Linux FS-devel Mailing List <>, 
	LSM List <>, 
	"the arch/x86 maintainers" <>
Subject: Re: [PATCH v19 08/12] landlock: Add syscall implementation

On Wed, Jul 8, 2020 at 7:50 PM Mickaël Salaün <> wrote:
> On 08/07/2020 15:49, Arnd Bergmann wrote:
> > On Wed, Jul 8, 2020 at 3:04 PM Mickaël Salaün <> wrote:
> >> On 08/07/2020 10:57, Arnd Bergmann wrote:
> >>> On Tue, Jul 7, 2020 at 8:10 PM Mickaël Salaün <> wrote:
> >>>
> >>> It looks like all you need here today is a single argument bit, plus
> >>> possibly some room for extensibility. I would suggest removing all
> >>> the extra bits and using a syscall like
> >>>
> >>> SYSCALL_DEFINE1(landlock_create_ruleset, u32, flags);
> >>>
> >>> I don't really see how this needs any variable-length arguments,
> >>> it really doesn't do much.
> >>
> >> We need the attr_ptr/attr_size pattern because the number of ruleset
> >> properties will increase (e.g. network access mask).
> >
> > But how many bits do you think you will *actually* need in total that
> > this needs to be a two-dimensional set of flags? At the moment you
> > only have a single bit that you interpret.
> I think there is a misunderstanding. For this syscall I wasn't talking
> about the "options" field but about the "handled_access_fs" field which
> has 14 bits dedicated to control access to the file system:

Ok, got it. I didn't read far enough there.

> The idea is to add other handled_access_* fields for other kernel object
> types (e.g. network, process, etc.).
> The "options" field is fine as a raw __u32 syscall argument.

I'd still like to avoid having it variable-length and structured though.
How about having a __u32 "options" flag, plus an indirect argument
with 32 fixed-length (all 32 bit or all 64 bit) flag words, each of which
corresponds to one of the option bits?

It's still fairly complex that way, but not as much as the version
you have right now that can be extended in multiple dimensions.

This could possibly also help avoid the need for the get_features
syscall: If user space just passes the bitmap of all the access flags
it wants to use in a fixed-size structure, the kernel can update the
bits to mask out the ones it does not understand and write back
that bitmap as the result of create_ruleset().

> >>> To be on the safe side, you might split up the flags into either the
> >>> upper/lower 16 bits or two u32 arguments, to allow both compatible
> >>> (ignored by older kernels if flag is set) and incompatible (return error
> >>> when an unknown flag is set) bits.
> >>
> >> This may be a good idea in general, but in the case of Landlock, because
> >> this kind of (discretionary) sandboxing should be a best-effort security
> >> feature, we should avoid incompatible behavior. In practice, every
> >> unknown bit returns an error because userland can probe for available
> >> bits thanks to the get_features command. This kind of (in)compatibility
> >> can then be handled by userland.
> >
> > If there are not going to be incompatible extensions, then just ignore
> > all unknown bits and never return an error but get rid of the user
> > space probing that just complicates the interface.
> There was multiple discussions about ABI compatibility, especially
> inspired by open(2) vs. openat2(2), and ignoring flags seems to be a bad
> idea. In the "sandboxer" example, we first probe the supported features
> and then mask unknown bits (i.e. access rights) at run time in userland.
> This strategy is quite straightforward, backward compatible and
> future-proof.

For behavior changing flags, I agree they should be seen as
incompatible flags (i.e. return an error if an unknown bit is set).

However, for the flags you pass in in an allowlist, treating them
as compatible (i.e. ignore any unknown flags, allowing everything
you are not forbidding already) seems completely reasonable
to me. Do you foresee user space doing anything other than masking
out the bits that the kernel doesn't know about? If not, then doing
it in the  kernel should always be simpler.

> >> I suggest this syscall signature:
> >> SYSCALL_DEFINE3(landlock_create_ruleset, __u32, options, const struct
> >> landlock_attr_ruleset __user *, ruleset_ptr, size_t, ruleset_size);
> >
> > The other problem here is that indirect variable-size structured arguments
> > are a pain to instrument with things like strace or seccomp, so you
> > should first try to use a fixed argument list, and fall back to a fixed
> > structure if that fails.
> I agree that it is not perfect with the current tools but this kind of
> extensible structs are becoming common and well defined (e.g. openat2).
> Moreover there is some work going on for seccomp to support "extensible
> argument" syscalls:

openat2() is already more complex than we'd ideally want, I think we
should try hard to make new syscalls simpler than that, following the
rule that any interface should be as simple as possible, but no simpler.

> >>>> +static int syscall_add_rule_path_beneath(const void __user *const attr_ptr,
> >>>> +               const size_t attr_size)
> >>>> +{
> >>>> +       struct landlock_attr_path_beneath attr_path_beneath;
> >>>> +       struct path path;
> >>>> +       struct landlock_ruleset *ruleset;
> >>>> +       int err;
> >>>
> >>> Similarly, it looks like this wants to be
> >>>
> >>> SYSCALL_DEFINE3(landlock_add_rule_path_beneath, int, ruleset, int,
> >>> path, __u32, flags)
> >>>
> >>> I don't see any need to extend this in a way that wouldn't already
> >>> be served better by adding another system call. You might argue
> >>> that 'flags' and 'allowed_access' could be separate, with the latter
> >>> being an indirect in/out argument here, like
> >>>
> >>> SYSCALL_DEFINE4(landlock_add_rule_path_beneath, int, ruleset, int, path,
> >>>                            __u64 *, allowed_acces, __u32, flags)
> >>
> >> To avoid adding a new syscall for each new rule type (e.g. path_beneath,
> >> path_range, net_ipv4_range, etc.), I think it would be better to keep
> >> the attr_ptr/attr_size pattern and to explicitely set a dedicated option
> >> flag to specify the attr type.
> >>
> >> This would look like this:
> >> SYSCALL_DEFINE4(landlock_add_rule, __u32, options, int, ruleset, const
> >> void __user *, rule_ptr, size_t, rule_size);
> >>
> >> The rule_ptr could then point to multiple types like struct
> >> landlock_attr_path_beneath (without the current ruleset_fd field).
> >
> > This again introduces variable-sized structured data. How many different
> > kinds of rule types do you think there will be (most likely, and maybe an
> > upper bound)?
> I don't know how many rule types will come, but right now I think it may
> be less than 10.


> > Could (some of) these be generalized to use the same data structure?
> I don't think so, file path and network addresses are an example of very
> different types.

Clearly the target object is something different, but maybe there is
enough commonality to still make them fit into a more regular form.

For the file system case, you have an identify for an object
(the file descriptor) and the  '__u64 allowed_access'. I would
expect that the 'allowed_access' concept is generic enough that
you can make it a direct argument (32 bit register arg, or pointer
to a __u64). Do you expect others to need something besides
an object identifier and a permission bitmask? Maybe it could
be something like

 SYSCALL_DEFINE4(landlock_add_rule, int, ruleset, __u32, options,
                       const void __user *, object, const __u64 __user
*, allowed_access,
                       __u32, flags);

with a fixed-length 'object' identifier type (file descriptor,
sockaddr_storage, ...) for each option.


Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.