kernel-hardening - [PATCH v4] scripts: add leaking_addresses.pl

Message-Id: <1510050731-32446-1-git-send-email-me@tobin.cc>
Date: Tue,  7 Nov 2017 21:32:11 +1100
From: "Tobin C. Harding" <me@...in.cc>
To: kernel-hardening@...ts.openwall.com
Cc: "Tobin C. Harding" <me@...in.cc>,
	"Jason A. Donenfeld" <Jason@...c4.com>,
	Theodore Ts'o <tytso@....edu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Kees Cook <keescook@...omium.org>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Tycho Andersen <tycho@...ker.com>,
	"Roberts, William C" <william.c.roberts@...el.com>,
	Tejun Heo <tj@...nel.org>,
	Jordan Glover <Golden_Miller83@...tonmail.ch>,
	Greg KH <gregkh@...uxfoundation.org>,
	Petr Mladek <pmladek@...e.com>,
	Joe Perches <joe@...ches.com>,
	Ian Campbell <ijc@...lion.org.uk>,
	Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <wilal.deacon@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Chris Fries <cfries@...gle.com>,
	Dave Weinstein <olorin@...gle.com>,
	Daniel Micay <danielmicay@...il.com>,
	Djalal Harouni <tixxdz@...il.com>,
	linux-kernel@...r.kernel.org,
	Network Development <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>
Subject: [PATCH v4] scripts: add leaking_addresses.pl

Currently we are leaking addresses from the kernel to user space. This
script is an attempt to find some of those leakages. The script parses
`dmesg` output and /proc and /sys files for hex strings that look like
kernel addresses.
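
To illustrate what "hex strings that look like kernel addresses" means in
practice, here is a minimal sketch (not code from the patch). It reads lines
from a filehandle and tags every line containing an optionally 0x-prefixed
`ffff` plus 12 hex digits with its source. This mirrors the `path: line`
records the script writes. The helper name `scan_lines` is hypothetical, and
an in-memory filehandle stands in for a real /proc or /sys file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the patch): return "label: line" for each
# line read from $fh that contains something shaped like a 64-bit kernel
# address, i.e. "ffff" followed by 12 more hex digits, optionally 0x-prefixed.
sub scan_lines {
    my ($label, $fh) = @_;
    my @hits;
    while (my $line = <$fh>) {
        push @hits, "$label: $line"
            if $line =~ /\b(?:0x)?ffff[[:xdigit:]]{12}\b/;
    }
    return @hits;
}

# In-memory file standing in for a /proc or /sys file.
my $data = "nothing to see\nptr=ffff8800deadbeef\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @hits = scan_lines('/proc/example', $fh);
print @hits;
```

The script does more than this. It also skips symlinks and a list of files,
such as /proc/kmsg and trace_pipe, that it does not parse.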

Only works for 64-bit kernels, because kernel addresses on 64-bit
kernels have 'ffff' as the leading bit pattern, making grepping
possible. On 32-bit kernels we don't have this luxury.
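
The sketch below shows why the `ffff` prefix makes grepping workable on 64-bit
kernels. It assumes x86-64 with 4-level paging; arm64 with 48-bit virtual
addresses is similar. Kernel addresses sit at the top of the address space and
print as 16 hex digits starting with `ffff`. User-space addresses start with
`0000`. `looks_like_kernel_addr64` is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumption: x86-64 with 4-level paging (arm64 with 48-bit VAs behaves the
# same way). Kernel addresses live in the top half of the address space, so
# when printed as 16 hex digits they begin with "ffff"; user-space addresses
# begin with "0000". Hypothetical helper, not taken from the patch.
sub looks_like_kernel_addr64 {
    my ($tok) = @_;
    return $tok =~ /^(?:0x)?ffff[[:xdigit:]]{12}$/ ? 1 : 0;
}

for my $tok (qw(ffffffff81000000 0xffff888012345678 00007ffd12345678 c1000000)) {
    printf "%-20s %s\n", $tok, looks_like_kernel_addr64($tok) ? 'kernel?' : 'no';
}
```

The last token is 8 digits wide, as a 32-bit address would be. With the
common x86-32 3G/1G split the kernel lives at 0xc0000000 and above. That
leading digit alone does not separate kernel pointers from other 8-digit hex
data, such as checksums.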

The script is _slightly_ smarter than a straight grep; we check for false
positives (all 0's or all 1's, and vsyscall start/finish addresses).
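
The script's own comment notes that the vsyscall check should probably use a
range rather than two fixed addresses. A minimal sketch of that idea follows.
`is_false_positive_range` is hypothetical and rests on two assumptions. First,
the vsyscall region is taken to be the inclusive span ffffffffff600000 to
ffffffffff601000, the two addresses the script special-cases. Second, callers
pass 16-digit matches, as produced by the script's address regex.

```perl
use strict;
use warnings;

# Hypothetical range-based variant of the false-positive check. Assumption:
# the vsyscall region is treated as ffffffffff600000..ffffffffff601000
# inclusive, i.e. the span between the two addresses the script special-cases.
# Assumes $match is a 16-digit hex string, optionally 0x-prefixed.
sub is_false_positive_range {
    my ($match) = @_;
    (my $hex = lc $match) =~ s/^0x//;

    # All 1's or all 0's.
    return 1 if $hex eq 'f' x 16 or $hex eq '0' x 16;

    # Fixed-width lowercase hex strings sort in the same order as the numbers
    # they encode, so plain string comparison works without 64-bit integers.
    return 1 if $hex ge 'ffffffffff600000' and $hex le 'ffffffffff601000';

    return 0;
}

for my $m (qw(ffffffffffffffff ffffffffff600800 ffffffff81000000)) {
    printf "%s => %d\n", $m, is_false_positive_range($m);
}
```

Comparing equal-width strings also avoids calling hex() on values above
0xffffffff. Perl flags such values as non-portable under `use warnings`.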

Raw output is saved to a file to expedite repeated formatting/viewing
of the results.
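
Here is a sketch of that scan/format split, assuming a plain-text file of
`source: line` records like the one the script writes. The scan step writes
the records once, and any number of format passes re-read them without walking
/proc and /sys again. `save_raw` and `load_raw` are hypothetical helpers. A
File::Temp file stands in for the default scan.out.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sketch of the scan/format split: "scan" writes raw "source: line" records to
# a file once, and "format" can then re-read that file as often as needed.
# Helper names are hypothetical.
sub save_raw {
    my ($path, @records) = @_;
    open my $out, '>', $path or die "Cannot open $path: $!\n";
    print {$out} @records;
    close $out or die "Cannot close $path: $!\n";
}

sub load_raw {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!\n";
    my @records = <$in>;
    close $in;
    return @records;
}

# A temporary file (removed at exit) stands in for the default ./scan.out.
my (undef, $file) = tempfile(UNLINK => 1);
save_raw($file, "dmesg: ptr ffff888012345678\n", "/proc/x: ffffffff81000000\n");
my @again = load_raw($file);
print scalar(@again), " records reloaded\n";
```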

Signed-off-by: Tobin C. Harding <me@...in.cc>
---

This version outputs a report instead of the raw results by default. Designing
this proved to be non-trivial because it is not immediately clear what
constitutes a duplicate entry (similar message, address range, same file?).
Also, the aim of the report is to help users _not_ miss correct results;
limiting the output is inherently a trade-off between noise and correct, clear
results.
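
One possible notion of "duplicate" is "same file name, regardless of
directory". This collapses per-PID entries such as /proc/1/stack and
/proc/2/stack. The sketch below uses a hypothetical `squash_by_basename`
helper to group raw `path: text` records that way and count dmesg records
separately. It splits on the first ": " rather than the first ":", so that
sysfs paths containing colons, such as /sys/devices/pci0000:00/..., are not
cut short.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical sketch of "squash by filename": group raw "path: text" records
# by the basename of the path and count dmesg records separately. Splitting on
# the first ": " (not ":") keeps colon-containing sysfs paths intact.
sub squash_by_basename {
    my (@raw) = @_;
    my %groups;
    my $dmesg = 0;
    for my $rec (@raw) {
        if (index($rec, 'dmesg: ') == 0) {
            $dmesg++;
            next;
        }
        my $i = index($rec, ': ');
        next if $i < 0;    # not a "path: text" record
        my $key = basename(substr($rec, 0, $i));
        push @{ $groups{$key} }, substr($rec, $i + 2);
    }
    return (\%groups, $dmesg);
}

my ($g, $n) = squash_by_basename(
    "/proc/1/stack: [<ffffffff81000000>] foo\n",
    "/proc/2/stack: [<ffffffff81000010>] bar\n",
    "dmesg: ptr ffff888012345678\n",
);
for my $key (sort keys %$g) {
    # <number of results> <key> <example result>
    printf "[%d %s] %s", scalar @{ $g->{$key} }, $key, $g->{$key}[0];
}
print "dmesg results: $n\n";
```

Grouping by basename keeps the report short, at the cost of hiding which
PIDs or devices each entry came from. That is the noise-versus-clarity
trade-off described above.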

Without testing on various real kernels it's not clear that this reporting is
any good; my test cases were a bit contrived. Your mileage may vary.

It would be super helpful to get some comments from people running this with
different setups.

Please feel free to say 'try harder Tobin, this reporting is shit'.

Thanks, appreciate your time,
Tobin.

v4:
 - Add `scan` and `format` sub-commands.
 - Output report by default.
 - Add command line option to send scan results (to me).

v3:
 - Iterate matches to check for results instead of matching input line against
   false positives, i.e. catch lines that contain results as well as false
   positives.
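
The difference between the v2 and v3 approaches shows up on a line that holds
both a false positive and a real-looking address. The v2 code is not included
here, so `line_check` only approximates matching the whole line against the
false-positive patterns. `match_check` follows the v3 idea of iterating over
every match. Both helpers are hypothetical.

```perl
use strict;
use warnings;

my $addr_re = qr/\b(?:0x)?ffff[[:xdigit:]]{12}\b/;
my $fp_re   = qr/^(?:0x)?(?:f{16}|0{16})$/i;

# Simplified v2-style check (an approximation, not the v2 code): drop the
# whole line as soon as any address-shaped token on it is a false positive.
sub line_check {
    my ($line) = @_;
    return 0 if grep { /$fp_re/ } $line =~ /($addr_re)/g;
    return $line =~ $addr_re ? 1 : 0;
}

# v3-style check: iterate every match and count those that are not false
# positives, so one false positive cannot hide a real address on the line.
sub match_check {
    my ($line) = @_;
    return scalar grep { !/$fp_re/ } $line =~ /($addr_re)/g;
}

my $line = "mask=ffffffffffffffff ptr=ffff888012345678\n";
printf "line-level: %d, per-match: %d\n", line_check($line), match_check($line);
```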

v2:
 - Add regexes to prevent false positives.
 - Clean up white space.

 MAINTAINERS                  |   5 +
 scripts/leaking_addresses.pl | 437 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 442 insertions(+)
 create mode 100755 scripts/leaking_addresses.pl

diff --git a/MAINTAINERS b/MAINTAINERS
index 2f4e462aa4a2..a7995c737728 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7745,6 +7745,11 @@ S:	Maintained
 F:	Documentation/scsi/53c700.txt
 F:	drivers/scsi/53c700*
 
+LEAKING_ADDRESSES
+M:	Tobin C. Harding <me@...in.cc>
+S:	Maintained
+F:	scripts/leaking_addresses.pl
+
 LED SUBSYSTEM
 M:	Richard Purdie <rpurdie@...ys.net>
 M:	Jacek Anaszewski <jacek.anaszewski@...il.com>
diff --git a/scripts/leaking_addresses.pl b/scripts/leaking_addresses.pl
new file mode 100755
index 000000000000..282c0cc2bdea
--- /dev/null
+++ b/scripts/leaking_addresses.pl
@@ -0,0 +1,437 @@
+#!/usr/bin/env perl
+#
+# (c) 2017 Tobin C. Harding <me@...in.cc>
+# Licensed under the terms of the GNU GPL License version 2
+#
+# leaking_addresses.pl: Scan 64 bit kernel for potential leaking addresses.
+#  - Scans dmesg output.
+#  - Walks directory tree and parses each file (for each directory in @DIRS).
+#
+# Use --debug to output path before parsing, this is useful to find files that
+# cause the script to choke.
+#
+# You may like to set kptr_restrict=2 before running script
+# (see Documentation/sysctl/kernel.txt).
+
+use warnings;
+use strict;
+use POSIX;
+use File::Basename;
+use File::Spec;
+use Cwd 'abs_path';
+use Term::ANSIColor qw(:constants);
+use Getopt::Long qw(:config no_auto_abbrev);
+use File::Spec::Functions 'catfile';
+
+my $P = $0;
+my $V = '0.01';
+
+# Directories to scan (we scan `dmesg` also).
+my @DIRS = ('/proc', '/sys');
+
+# Output path for raw scan data, set by set_output_path().
+my $OUTPUT = "";
+
+# Command line options.
+my $output = "";
+my $suppress_dmesg = 0;
+my $squash_by_path = 0;
+my $raw = 0;
+my $send_report = 0;
+my $help = 0;
+my $debug = 0;
+
+# Do not parse these files (absolute path).
+my @skip_parse_files_abs = ('/proc/kmsg',
+			    '/proc/kcore',
+			    '/proc/fs/ext4/sdb1/mb_groups',
+			    '/proc/1/fd/3',
+			    '/sys/kernel/debug/tracing/trace_pipe',
+			    '/sys/kernel/security/apparmor/revision');
+
+# Do not parse these files under any subdirectory.
+my @skip_parse_files_any = ('0',
+			    '1',
+			    '2',
+			    'pagemap',
+			    'events',
+			    'access',
+			    'registers',
+			    'snapshot_raw',
+			    'trace_pipe_raw',
+			    'ptmx',
+			    'trace_pipe');
+
+# Do not walk these directories (absolute path).
+my @skip_walk_dirs_abs = ();
+
+# Do not walk these directories under any subdirectory.
+my @skip_walk_dirs_any = ('self',
+			  'thread-self',
+			  'cwd',
+			  'fd',
+			  'stderr',
+			  'stdin',
+			  'stdout');
+
+sub help
+{
+	my ($exitcode) = @_;
+
+	print << "EOM";
+Usage: $P COMMAND [OPTIONS]
+Version: $V
+
+Commands:
+
+	scan	Scan the kernel (saves raw results to file and runs `format`).
+	format	Parse results file and format output.
+
+Options:
+	-o, --output=<path>	 Accepts absolute or relative filename or directory name.
+	    --suppress-dmesg	 Don't show dmesg results.
+	    --squash-by-path	 Show one result per unique path.
+	    --raw	 	 Show raw results.
+	    --send-report	 Submit raw results for someone else to worry about.
+	-d, --debug              Display debugging output.
+	-h, --help, --version    Display this help and exit.
+
+Scans the running (64 bit) kernel for potential leaking addresses.
+
+
+EOM
+	exit($exitcode);
+}
+
+GetOptions(
+        'o|output=s'		=> \$output,
+        'suppress-dmesg'	=> \$suppress_dmesg,
+        'squash-by-path'	=> \$squash_by_path,
+        'raw'			=> \$raw,
+        'send-report'		=> \$send_report,
+        'd|debug'		=> \$debug,
+        'h|help'		=> \$help,
+        'version'		=> \$help
+) or help(1);
+
+help(0) if ($help);
+
+my ($command) = @ARGV;
+if (not defined $command) {
+        help(128);
+}
+
+set_output_path($output);
+
+if ($command ne 'scan' and $command ne 'format') {
+        printf "\nUnknown command: %s\n\n", $command;
+        help(128);
+}
+
+if ($command eq 'scan') {
+        scan();
+}
+
+if ($send_report) {
+        send_report();
+        print "Raw scan results sent, thank you.\n";
+        exit(0);
+}
+
+format_output();
+
+exit 0;
+
+sub dprint
+{
+	printf(STDERR @_) if $debug;
+}
+
+# Sets global $OUTPUT, defaults to "./scan.out"
+# Accepts relative or absolute path (directory name or filename).
+sub set_output_path
+{
+        my ($path) = @_;
+        my $def_filename = "scan.out";
+        my $def_dirname = getcwd();
+
+        if ($path eq "") {
+                $OUTPUT = catfile($def_dirname, $def_filename);
+                return;
+        }
+
+        my($filename, $dirs, $suffix) = fileparse($path);
+
+        if ($filename eq "" or -d $path) {
+                $OUTPUT = catfile($path, $def_filename);
+        } else {
+                $OUTPUT = catfile($dirs, $filename);
+        }
+}
+
+sub scan
+{
+        open (my $fh, '>', "$OUTPUT") or die "Cannot open $OUTPUT\n";
+        select $fh;
+
+        parse_dmesg();
+        walk(@DIRS);
+
+        select STDOUT;
+}
+
+sub send_report
+{
+        system("mail -s 'LEAK REPORT' leaks\@tobin.cc < $OUTPUT");
+}
+
+sub parse_dmesg
+{
+	open my $cmd, '-|', 'dmesg';
+	while (<$cmd>) {
+		if (may_leak_address($_)) {
+			print 'dmesg: ' . $_;
+		}
+	}
+	close $cmd;
+}
+
+# Recursively walk directory tree.
+sub walk
+{
+	my @dirs = @_;
+	my %seen;
+
+	while (my $pwd = shift @dirs) {
+		next if (skip_walk($pwd));
+		next if (!opendir(DIR, $pwd));
+		my @files = readdir(DIR);
+		closedir(DIR);
+
+		foreach my $file (@files) {
+			next if ($file eq '.' or $file eq '..');
+
+			my $path = "$pwd/$file";
+			next if (-l $path);
+
+			if (-d $path) {
+				push @dirs, $path;
+			} else {
+				parse_file($path);
+			}
+		}
+	}
+}
+
+# True if argument potentially contains a kernel address.
+sub may_leak_address
+{
+        my ($line) = @_;
+
+        my @addresses = extract_addresses($line);
+        return @addresses > 0;
+}
+
+# Return _all_ non false positive addresses from $line.
+sub extract_addresses
+{
+        my ($line) = @_;
+        my $address = '\b(0x)?ffff[[:xdigit:]]{12}\b';
+        my (@addresses, @empty);
+
+        # Signal masks.
+        if ($line =~ '^SigBlk:' or
+            $line =~ '^SigCgt:') {
+                return @empty;
+        }
+
+        if ($line =~ '\bKEY=[[:xdigit:]]{14} [[:xdigit:]]{16} [[:xdigit:]]{16}\b' or
+            $line =~ '\b[[:xdigit:]]{14} [[:xdigit:]]{16} [[:xdigit:]]{16}\b') {
+                return @empty;
+        }
+
+        while ($line =~ /($address)/g) {
+                if (!is_false_positive($1)) {
+                        push @addresses, $1;
+                }
+        }
+
+        return @addresses;
+}
+
+# True if we should skip walking this directory.
+sub skip_walk
+{
+	my ($path) = @_;
+	return skip($path, \@skip_walk_dirs_abs, \@skip_walk_dirs_any)
+}
+
+sub parse_file
+{
+	my ($file) = @_;
+
+	if (! -R $file) {
+		return;
+	}
+
+	if (skip_parse($file)) {
+		dprint "skipping file: $file\n";
+		return;
+	}
+	dprint "parsing: $file\n";
+
+	open my $fh, "<", $file or return;
+	while ( <$fh> ) {
+		if (may_leak_address($_)) {
+			print $file . ': ' . $_;
+		}
+	}
+	close $fh;
+}
+
+sub is_false_positive
+{
+        my ($match) = @_;
+
+        if ($match =~ '\b(0x)?(f|F){16}\b' or
+            $match =~ '\b(0x)?0{16}\b') {
+                return 1;
+        }
+
+        # vsyscall memory region, we should probably check against a range here.
+        if ($match =~ '\bf{10}600000\b' or
+            $match =~ '\bf{10}601000\b') {
+                return 1;
+        }
+
+        return 0;
+}
+
+# True if we should skip this path.
+sub skip
+{
+	my ($path, $paths_abs, $paths_any) = @_;
+
+	foreach (@$paths_abs) {
+		return 1 if ($_ eq $path);
+	}
+
+	my($filename, $dirs, $suffix) = fileparse($path);
+	foreach (@$paths_any) {
+		return 1 if ($_ eq $filename);
+	}
+
+	return 0;
+}
+
+sub skip_parse
+{
+	my ($path) = @_;
+	return skip($path, \@skip_parse_files_abs, \@skip_parse_files_any);
+}
+
+sub format_output
+{
+        if ($raw) {
+                dump_raw_output();
+                return;
+        }
+
+        my ($total, $dmesg, $paths, $files) = parse_raw_file();
+
+        printf "\nTotal number of results from scan (incl dmesg): %d\n", $total;
+
+        if (!$suppress_dmesg) {
+                print_dmesg($dmesg);
+        }
+        squash_by($files, 'filename');
+
+        if ($squash_by_path) {
+                squash_by($paths, 'path');
+        }
+}
+
+sub dump_raw_output
+{
+        open (my $fh, '<', $OUTPUT) or die "Cannot open $OUTPUT\n";
+        while (<$fh>) {
+                print $_;
+        }
+        close $fh;
+}
+
+sub print_dmesg
+{
+        my ($dmesg) = @_;
+
+        print "\ndmesg output:\n";
+        foreach(@$dmesg) {
+                my $index = index($_, ':');
+                $index += 2;    # skip ': '
+                print substr($_, $index);
+        }
+}
+
+sub squash_by
+{
+        my ($ref, $desc) = @_;
+
+        print "\nResults squashed by $desc (excl dmesg). ";
+        print "Displaying <number of results>, <$desc>, <example result>\n";
+        foreach(keys %$ref) {
+                my $lines = $ref->{$_};
+                my $length = @$lines;
+                printf "[%d %s] %s", $length, $_, $lines->[0];
+        }
+}
+
+sub parse_raw_file
+{
+        my $total = 0;          # Total number of lines parsed.
+        my @dmesg;              # dmesg output.
+        my %files;              # Unique filenames containing leaks.
+        my %paths;              # Unique paths containing leaks.
+
+        open (my $fh, '<', $OUTPUT) or die "Cannot open $OUTPUT\n";
+
+        while (my $line = <$fh>) {
+                $total++;
+
+                if ("dmesg:" eq substr($line, 0, 6)) {
+                        push @dmesg, $line;
+                        next;
+                }
+
+                cache_path(\%paths, $line);
+                cache_filename(\%files, $line);
+        }
+
+        return $total, \@dmesg, \%paths, \%files;
+}
+
+sub cache_path
+{
+        my ($paths, $line) = @_;
+
+        my $index = index($line, ': ');
+        my $path = substr($line, 0, $index);
+
+        if (!$paths->{$path}) {
+                $paths->{$path} = [];
+        }
+        push @{$paths->{$path}}, $line;
+}
+
+sub cache_filename
+{
+        my ($files, $line) = @_;
+
+        my $index = index($line, ': ');
+        my $path = substr($line, 0, $index);
+        my $filename = basename($path);
+        if (!$files->{$filename}) {
+                $files->{$filename} = [];
+        }
+        $index += 2;            # skip ': '
+        push @{$files->{$filename}}, substr($line, $index);
+}
-- 
2.7.4