Message-ID: <20210302183032.GA3049@ubuntu>
Date: Tue, 2 Mar 2021 19:31:05 +0100
From: John Wood <john.wood@....com>
To: Andi Kleen <ak@...ux.intel.com>
Cc: John Wood <john.wood@....com>, Kees Cook <keescook@...omium.org>,
	Jann Horn <jannh@...gle.com>, Randy Dunlap <rdunlap@...radead.org>,
	Jonathan Corbet <corbet@....net>, James Morris <jmorris@...ei.org>,
	Shuah Khan <shuah@...nel.org>, "Serge E. Hallyn" <serge@...lyn.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-security-module@...r.kernel.org,
	linux-kselftest@...r.kernel.org,
	kernel-hardening@...ts.openwall.com
Subject: Re: [PATCH v5 7/8] Documentation: Add documentation for the Brute LSM

On Sun, Feb 28, 2021 at 10:56:45AM -0800, Andi Kleen wrote:
> John Wood <john.wood@....com> writes:
> > +
> > +To detect a brute force attack it is necessary that the statistics shared by all
> > +the fork hierarchy processes be updated in every fatal crash and the most
> > +important data to update is the application crash period.
>
> So I haven't really followed the discussion and also not completely read
> the patches (so apologies if that was already explained or is documented
> somewhere else).
>
> But what I'm missing here is some indication how much
> memory these statistics can use up and how are they limited.

The statistics shared by all the fork hierarchy processes are held in the
"brute_stats" struct.

struct brute_cred {
	kuid_t uid;
	kgid_t gid;
	kuid_t suid;
	kgid_t sgid;
	kuid_t euid;
	kgid_t egid;
	kuid_t fsuid;
	kgid_t fsgid;
};

struct brute_stats {
	spinlock_t lock;
	refcount_t refc;
	unsigned char faults;
	u64 jiffies;
	u64 period;
	struct brute_cred saved_cred;
	unsigned char network : 1;
	unsigned char bounds_crossed : 1;
};

This is a fixed size struct where, on every process crash (due to a fatal
signal), the application crash period (the "period" field) is updated on an
ongoing basis using an exponential moving average (this way it is not necessary
to save the old crash period values). The "jiffies" and "faults" fields complete
the basic statistics info. The "saved_cred" field, also of fixed size, is used
to fine-tune the detection (detect if the privileges have changed). And the
remaining flags are likewise used to narrow the detection (detect if a privilege
boundary has been crossed).
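
Roughly, the per-crash update of the period field looks like the following
sketch (the weight values and helper name are illustrative, not the exact
patch code):

#define BRUTE_EMA_WEIGHT_NUMERATOR	7
#define BRUTE_EMA_WEIGHT_DENOMINATOR	10

/* Called on every fatal crash with stats->lock held. */
static void brute_update_crash_period(struct brute_stats *stats)
{
	u64 now = get_jiffies_64();
	u64 crash_period = now - stats->jiffies;

	/* period = w * crash_period + (1 - w) * period */
	stats->period = (BRUTE_EMA_WEIGHT_NUMERATOR * crash_period +
			 (BRUTE_EMA_WEIGHT_DENOMINATOR -
			  BRUTE_EMA_WEIGHT_NUMERATOR) * stats->period) /
			BRUTE_EMA_WEIGHT_DENOMINATOR;

	stats->jiffies = now;
	if (stats->faults < U8_MAX)
		stats->faults += 1;
}

Since only the running average, the last crash timestamp and a fault counter
are kept, the memory cost does not grow with the number of crashes.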

> How much is the worst case extra memory consumption?

On every fork system call the parent's statistics are shared with the child
process. On every execve system call a new brute_stats struct is allocated. So
only one brute_stats struct is allocated per fork hierarchy (the hierarchy of
processes spawned from an execve system call). The more processes are running,
the more memory will be used.
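
In LSM hook terms this is, roughly, the following (a sketch; brute_stats_ptr(),
brute_share_stats_put() and brute_new_stats() are illustrative helpers for
accessing and managing the task's security blob, not the exact patch code):

/* fork path: the child shares the parent's statistics. */
static int brute_task_alloc(struct task_struct *task,
			    unsigned long clone_flags)
{
	struct brute_stats **stats = brute_stats_ptr(task);
	struct brute_stats **p_stats = brute_stats_ptr(current);

	refcount_inc(&(*p_stats)->refc);
	*stats = *p_stats;
	return 0;
}

/* execve path: the process starts a new fork hierarchy. */
static void brute_bprm_committing_creds(struct linux_binprm *bprm)
{
	struct brute_stats **stats = brute_stats_ptr(current);

	brute_share_stats_put(*stats);	/* drop the old shared stats */
	*stats = brute_new_stats();	/* fresh zeroed statistics */
}

So the worst case extra memory consumption is one brute_stats struct per
execve call still alive, i.e. bounded by the number of running fork
hierarchies. On a typical 64-bit build the struct is well under a hundred
bytes (the exact size depends on the kernel config and padding).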

> If there is no limit how is DoS prevented?
>
> If there is a limit, there likely needs to be a way to throw out
> information, and so the attack would just shift to forcing the kernel
> to throw out this information before retrying.
>
> e.g. if the data is held for the parent shell: restart the parent
> shell all the time.
> e.g. if the data is held for the sshd daemon used to log in:
> Somehow cause sshd to respawn to discard the statistics.

When a process crashes due to a fatal signal delivered by the kernel (with some
user signal exceptions), the statistics shared by this process with all the fork
hierarchy processes are updated. This allows us to detect a brute force attack
through the "fork" system call. If these statistics show a fast crash rate, a
mitigation is triggered. Also, these statistics are removed when all the
processes in the hierarchy have finished.

At the same time these statistics are updated, the statistics of the parent
fork hierarchy (the statistics shared by the process that "exec"ed the crashed
process, as noted in the last paragraph) are also updated. This way a brute
force attack through the "execve" system call can be detected. If these new
statistics also show a fast crash rate, the mitigation is triggered.
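
Once the period field has been updated, the detection check itself reduces to
comparing the average against a threshold after enough faults have
accumulated, roughly (the macro names are illustrative):

/* Called with stats->lock held, after the crash period update. */
static bool brute_attack_running(const struct brute_stats *stats)
{
	/* Too few crashes to judge yet. */
	if (stats->faults < BRUTE_MIN_FAULTS)
		return false;

	/* A small average period means the application crashes fast. */
	return stats->period < BRUTE_CRASH_PERIOD_THRESHOLD;
}

This check is run against both the crashed process' own statistics and its
parent hierarchy's statistics, so both attack vectors are covered.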

> Do I miss something here? How is that mitigated?

As a mitigation method, all the offending tasks involved in the attack are
killed. In other words, all the tasks that share the same statistics
(statistics showing a fast crash rate) are killed.
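
In sketch form, the mitigation walks the task list and kills every task whose
security blob points to the offending statistics (simplified; the real code
also needs to deal with locking and with the crashing task itself, and
brute_stats_ptr() is the illustrative helper from above):

static void brute_kill_offending_tasks(struct brute_stats *stats)
{
	struct task_struct *p;

	rcu_read_lock();
	for_each_process(p) {
		if (*brute_stats_ptr(p) == stats)
			send_sig_info(SIGKILL, SEND_SIG_PRIV, p);
	}
	rcu_read_unlock();
}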

> Instead of discussing all the low level tedious details of the
> statistics it would be better to focus on these "high level"
> problems here.

Thanks for the advice. I will improve the documentation by adding these high
level details.

> -Andi
>

I hope this info clarifies your questions. If not, I will try again.

Thanks,
John Wood
