Date: Sun, 13 Dec 2020 20:04:42 +0100
To: Jann Horn <>
Cc: Kernel Hardening <>,
Subject: Re: Kernel complexity

Thank you for the extensive response.

On Sat, Dec 12, 2020 at 11:34:12PM +0100, Jann Horn wrote:
> On Sat, Dec 12, 2020 at 9:14 PM <> wrote:
> > Personally I am interested in Linux Kernel Security and especially features supporting attack surface reduction. In the past I did some work on sandboxing features like seccomp support in user space applications. I have been rather hesitant to get involved here, since I am not a full time developer and certainly not an expert in C programming.
> (By the way, one interesting area where upstream development is
> currently happening that's related to userspace sandboxing is the
> Landlock patchset by Mickaël Salaün, which adds an API that allows
> unprivileged processes to restrict their filesystem access without
> having to mess around with stuff like mount namespaces and broker
> processes; the latest version is at
> <>.
> That might be relevant to your interests.)

That sounds very interesting indeed, thank you.

> > However, I am currently doing a research project that aims to identify risk areas in the kernel by measuring code complexity metrics. Assuming this might be useful here, I would like to ask for some feedback on whether this work can actually help this project.
> >
> > My approach is basically to take a look at the different system calls and measure the complexity of the code involved in their execution. Since code complexity has already been found to have a strong correlation with the probability of existing vulnerabilities, this might indicate kernel areas that need a closer look.
> Keep in mind that while system calls are one of the main entry points
> from userspace into the kernel, and the main way in which userspace
> can trigger kernel bugs, syscalls do not necessarily closely
> correspond to specific kernel subsystems.
> For example, system calls like read() and write() can take a gigantic
> number of execution paths because, especially when you take files in
> /proc and /sys into consideration, they interact with things all over
> the place across the kernel. For example, write() can modify page
> tables of other processes, can trigger page allocation and reclaim,
> can modify networking configuration, can interact with filesystems and
> block devices and networking and user namespace configuration and
> pipes, and so on. But the areas that are reachable through this
> syscall depend on other ways in which the process is limited - in
> particular, what kinds of files it can open.
> Also keep in mind that even a simple syscall like getresuid() can,
> through the page fault handling code, end up in subsystems related to
> filesystems, block devices, networking, graphics and so on - so you'd
> probably have to exclude any control flows that go through certain
> pieces of core kernel infrastructure.
> > Additionally the functionality of the syscall will also be considered for a final risk score, although most of the work for this part has already been done in [1].
> That's a paper from 2002 that talks about "UNIX system calls", and
> categorizes syscalls like init_module as being of the highest "threat
> level" even though that syscall does absolutely nothing unless you're
> already root. It also has "denial of service attacks" as the
> second-highest "threat level classification", which I don't think
> makes any sense - I don't think that current OS kernels are designed
> to prevent an attacker with the ability to execute arbitrary syscalls
> from userspace from slowing the system down. Fundamentally it looks to
> me as if it classifies syscalls by the risk caused if you let an
> attacker run arbitrary code in userspace **with root privileges**,
> which seems to me like an extremely silly threat model.
> > The objective is to create a risk score matrix for linux syscalls that consists of the functionality risk according to [1], times the measured complexity.
> I don't understand why you would multiply functionality risk and
> complexity. They're probably more additive than multiplicative, since
> in a per-subsystem view, risk caused by functionality and complexity
> of the implementation are often completely separate. For example, the
> userfaultfd subsystem introduces functionality risk by allowing
> attackers to arbitrarily pause the kernel at any copy_from_user()
> call, but that doesn't combine with the complexity of the userfaultfd
> subsystem, but with the complexity of all copy_from_user() callers
> everywhere across the kernel.
> > This will (hopefully) be helpful to identify risk areas in the kernel and provide user space developers with a measurement that can help design secure software and sandboxing features.
> I'm not sure whether this would really be all that helpful for
> userspace sandboxing decisions - as far as I know, userspace normally
> isn't in a position where it can really choose which syscalls it wants
> to use, but instead the choice of syscalls to use is driven by the
> requirements that userspace has. If you tell userspace that write()
> can hit tons of kernel code, it's not like userspace can just stop
> using write(); and if you then also tell userspace that pwrite() can
> also hit a lot of kernel code, that may be misinterpreted as meaning
> that pwrite() adds lots of risk while actually, write() and pwrite()
> reach (almost) the same areas of code. Also, the areas of code that a
> syscall like write() can hit depend hugely on file system access
> policies.

Some issues I have come across revolve around how much attention the
avoidance of certain system calls should get, based on their risk.
Many applications, such as "file", include a seccomp filter that
prevents most system calls from ever being used, without relying on a
broker architecture. This is feasible for small applications that do not
always need to do dangerous things like execve or open (for write).
However, this decision is often made without extensive research into
which system calls provide dangerous functionality. The idea was to
change that by providing a risk score for system calls.
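
To make the scoring idea concrete, here is a minimal sketch of how a
functionality score and a complexity score could be combined, both as a
product (as originally proposed) and as a weighted sum (as suggested in
the reply). Everything in it is an assumption for illustration: the
entry names sys_a, sys_b and sys_c, the 0..1 scales, the weight, and all
numbers are placeholders, not measurements of any real system call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyscallScore:
    name: str
    functionality: float  # placeholder functionality risk, scaled 0..1
    complexity: float     # placeholder normalized complexity, scaled 0..1


def multiplicative(score: SyscallScore) -> float:
    """Risk as functionality times complexity."""
    return score.functionality * score.complexity


def additive(score: SyscallScore, weight: float = 0.5) -> float:
    """Risk as a weighted sum; the weight is an arbitrary assumption."""
    return weight * score.functionality + (1.0 - weight) * score.complexity


def rank(scores, combine):
    """Return entry names ordered from highest to lowest combined risk."""
    return [s.name for s in sorted(scores, key=combine, reverse=True)]


# Hypothetical entries, chosen only to show that the two models can disagree.
PLACEHOLDER_SCORES = [
    SyscallScore("sys_a", functionality=1.0, complexity=0.1),
    SyscallScore("sys_b", functionality=0.4, complexity=0.4),
    SyscallScore("sys_c", functionality=0.1, complexity=0.9),
]

if __name__ == "__main__":
    print("multiplicative:", rank(PLACEHOLDER_SCORES, multiplicative))
    print("additive:      ", rank(PLACEHOLDER_SCORES, additive))
```

With these placeholder values the product ranks sys_b first, while the
weighted sum ranks sys_a first, so the choice of model alone can change
which entries look most worth restricting.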

> I also don't think that doing something like this on a per-syscall
> basis would be very beneficial for informing something like priorities
> for auditing kernel code; only a small chunk of the kernel even has
> its own syscalls, while most of it receives commands through
> more-or-less generic syscalls that are then plumbed through.

Thank you for that explanation. I was afraid something like this might
be the case. I suppose I will change my approach to something more
generic and not focus on system calls for the complexity analysis.
Hopefully this will yield some helpful results.

> > One major aspect I am still not sure about is the challenge of dynamically measuring code path execution. While it is possible to measure the cyclomatic complexity of the kernel code with existing tools, I am not sure how much value the results would have, given that this does not include the dynamic code path behind each syscall. I was thinking of using ftrace to follow and measure the execution path. Any feedback and advice on this would be appreciated.
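
As a rough sketch of the ftrace idea, the snippet below post-processes
text in the layout written by the ftrace "function" tracer (lines ending
in "callee <-caller") and collects the distinct functions and
caller/callee pairs that were hit. The sample text is hand-written for
illustration and was not captured from a real run, and the helper names
are my own. Capturing a real trace (for example via current_tracer and
set_ftrace_pid in tracefs) is not shown here, and paths entered through
page faults or interrupts would presumably still need to be filtered
out.

```python
import re
from collections import Counter

# Matches the tail of a function-tracer line: "<timestamp>: callee <-caller".
LINE_RE = re.compile(r":\s+(?P<func>\S+)\s+<-(?P<caller>\S+)\s*$")


def parse_function_trace(text: str) -> Counter:
    """Count (caller, callee) pairs in function-tracer style output."""
    edges = Counter()
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and the tracer's header comments
        match = LINE_RE.search(line)
        if match:
            edges[(match["caller"], match["func"])] += 1
    return edges


def distinct_functions(edges: Counter) -> set:
    """Return the set of traced callee names."""
    return {func for _caller, func in edges}


# Illustrative sample only; not output from an actual trace.
SAMPLE_TRACE = """\
# tracer: function
#
            cat-1234  [000] ....  100.000001: ksys_write <-__x64_sys_write
            cat-1234  [000] ....  100.000002: vfs_write <-ksys_write
            cat-1234  [000] ....  100.000003: rw_verify_area <-vfs_write
            cat-1234  [000] ....  100.000004: vfs_write <-ksys_write
"""

if __name__ == "__main__":
    edges = parse_function_trace(SAMPLE_TRACE)
    print(sorted(distinct_functions(edges)))
    for (caller, func), count in edges.most_common():
        print(f"{caller} -> {func}: {count}")
```

The set of functions obtained this way could then be joined with
per-function complexity numbers from a static analysis tool, which is
the combination of dynamic and static measurement I had in mind.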

