Message-ID: <20240427234834.c0219029-fe37-49ef-a563-4d24eea118c2@korelogic.com>
Date: Sun, 28 Apr 2024 00:34:36 -0600
From: Hank Leininger <hlein@...elogic.com>
To: oss-security@...ts.openwall.com
Subject: Re: Update on the distro-backdoor-scanner effort

On 2024-04-27, Jacob Bachmeyer wrote:

> >   - Output is manageable; able to rule out all hits not part of the
> >     actual xz-utils backdoors as false positives.
> 
> This is what I would expect:  the backdoor dropper appears to have
> been specifically developed for xz-utils, but could /possibly/ be
> adaptable to other compression tools.

Indeed, though my thinking was more along the lines of: "This is an
impressive number of moving parts to be created new for this project,
and burned for just this project. What if what we are seeing is the 2nd
or 3rd gen of such a toolkit, and earlier ones have some similar
characteristics but maybe fewer layers, etc. so we could spot them?" As
well as "a human made choices here and here and there. Did the same
human make the same choices (muscle memory, etc.) elsewhere?" Basically
did they make an opsec mistake we can catch?

We'd never know until we looked. And while the liblzma decompressor had
certain things going for it as a target, why not other things? (Nor am I
restricting to "other things that would get linked into sshd", but
really more broadly.)

As with pretty much all of this, anyone who knows or just assumes that
someone is looking can adjust their behavior going forward. But did
they take baby steps toward this 5-10 years ago, and is what we're now
discussing a descendant of something we can identify?

Maybe 20 years ago I was involved in catching/analyzing './configure'
backdoors in a few things - irssi, fragroute, BitchX, fragrouter, that
typically involved upstream distribution sites getting hacked, and
sometimes attempts to phish kernel developers on LKML, etc. The
xz-utils stuff is many generations ahead of those; it would be
interesting to spot some missing links.

> You might get better results by indexing macro definitions found in
> *.m4 files, instead of trying to fuzzily hash the files.  The
> interesting comparison is then different definitions of macros with
> the same name.

I like this a lot as a potential next layer for the m4 reconciliation.
Essentially a field-level matching once things that match at a
file-level have been eliminated. I don't see why (he says, not actually
having dug into the m4 format much) we couldn't break apart all the m4s
we are choosing to consider known-good, and catalog each individual
macro, and then do the same when hashing project-specific files. Could
be we can entirely "clear" a file with an unknown checksum because it
consists 100% of individual macros that are known. (Insert weird machine
here that combined OK macros in surprising ways.)
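
A minimal sketch of that macro-level cataloging, in Python, under loud
assumptions: it only recognizes definitions written as
AC_DEFUN([NAME], ...) (plus AC_DEFUN_ONCE, AU_DEFUN, m4_define and
m4_defun), it assumes the default [ ] quoting, and it ignores dnl
comments and changequote. The function names extract_macros and
unknown_macros are made up for illustration; this is not a real m4
parser.

```python
import hashlib
import re

# Start of a definition such as: AC_DEFUN([NAME],
DEFUN_RE = re.compile(
    r'\b(AC_DEFUN(?:_ONCE)?|AU_DEFUN|m4_define|m4_defun)'
    r'\(\s*\[([A-Za-z_][A-Za-z0-9_]*)\]\s*,'
)

def extract_macros(text):
    """Return {macro_name: set of sha256 hex digests of its bodies}."""
    macros = {}
    for m in DEFUN_RE.finditer(text):
        name = m.group(2)
        i = m.end()
        quote_depth = 0   # nesting of [ ] quotes
        paren_depth = 1   # we are inside the definer's opening parenthesis
        while i < len(text):
            c = text[i]
            if c == '[':
                quote_depth += 1
            elif c == ']':
                quote_depth -= 1
            elif quote_depth == 0 and c == '(':
                paren_depth += 1
            elif quote_depth == 0 and c == ')':
                paren_depth -= 1
                if paren_depth == 0:
                    break
            i += 1
        body = text[m.end():i].strip()
        digest = hashlib.sha256(body.encode()).hexdigest()
        macros.setdefault(name, set()).add(digest)
    return macros

def unknown_macros(candidate_text, known_index):
    """Names of macros in candidate_text whose body is not in known_index."""
    unknown = []
    for name, digests in extract_macros(candidate_text).items():
        if not digests <= known_index.get(name, set()):
            unknown.append(name)
    return sorted(unknown)
```

A file with an unknown checksum would be "cleared" at this layer only
when unknown_macros() returns an empty list against the known-good
index. Text outside any recognized definition is not examined by this
sketch and would need its own check.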

> many (most?) modern Linux kernels are compressed using xz, which means
> that a Thompsonesque attack could binary-patch a freshly built kernel
> while compressing vmlinux to make vmlinuz.

Good call. This may be far out on the.... "far out" scale, but it'd be
pretty trivial to harvest distro kernels that used xz to make their
vmlinuz, and then run those through multiple independent implementations
like we have done with .tar.xz files. Sold.
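
A rough Python sketch of the harvesting half of that idea, assuming the
compressed payload can be found by scanning the vmlinuz image for the
xz stream magic. Python's lzma module is backed by liblzma, so it does
NOT count as an independent implementation; compare_outputs is a
hypothetical helper meant to receive the bytes produced by whatever
other decompressors get run on the same carved stream.

```python
import hashlib
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"  # xz stream header magic bytes

def carve_xz_streams(blob):
    """Yield (offset, decompressed_bytes) for each complete xz stream found."""
    pos = blob.find(XZ_MAGIC)
    while pos != -1:
        dec = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
        try:
            data = dec.decompress(blob[pos:])
            # eof is only set once the stream footer has been verified;
            # trailing bytes after the stream end up in dec.unused_data.
            if dec.eof:
                yield pos, data
        except lzma.LZMAError:
            pass  # magic bytes occurred by chance, or the stream is corrupt
        pos = blob.find(XZ_MAGIC, pos + 1)

def compare_outputs(outputs):
    """outputs maps implementation name -> decompressed bytes.

    Returns (all_agree, {name: sha256 hex digest})."""
    digests = {n: hashlib.sha256(d).hexdigest() for n, d in outputs.items()}
    return len(set(digests.values())) == 1, digests
```

Any disagreement between implementations on the same carved stream
would be the signal worth looking into.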

> The IFUNC mechanism is actually a performance feature.  In "inner-loop"
> code, having multiple implementations with different optimizations
> with the preferred implementation for the local processor chosen at
> runtime is fairly common.

Thanks for this! I've seen that discussed as a (valid, useful) use of
IFUNC but also AFAIK things like musl don't implement such a thing, so
either software that wants it just doesn't support musl, or can't pick
an optimization, or does so in an undesirable way like writable pointers
in the data segment or... some other option? The glibc thread was
discussing some options like a cpu_features mask although that looks to
get impractically unwieldy in a hurry.

Aside from any discussions about "can/should we deprecate IFUNCs" which
are above my pay grade, I'd be satisfied being able to say "IFUNCs are
used by 15 out of 10,000 packages; that's a small enough number we can
a) audit them all b) add alerting to tooling used for builds; when a
package suddenly starts using them, look into it." If the real number
turns out to be 1,000 out of 10,000, then that's good to know, and we'd
probably give up on that approach.
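
A small Python sketch of that kind of census and alerting, assuming
binutils readelf is available and that its symbol table listing
("readelf -sW") shows GNU indirect functions with type IFUNC. The
helper names and the symbol names in the usage note are invented for
illustration, and stripped binaries may only expose their dynamic
symbols.

```python
import subprocess

def ifunc_symbols(readelf_output):
    """Return the names of symbols whose Type column is IFUNC."""
    names = []
    for line in readelf_output.splitlines():
        parts = line.split()
        # Columns: Num: Value Size Type Bind Vis Ndx Name
        if (len(parts) >= 8
                and parts[0].rstrip(":").isdigit()
                and parts[3] == "IFUNC"):
            names.append(parts[7])
    return names

def scan_file(path):
    """Run readelf on one ELF file and return its IFUNC symbol names."""
    result = subprocess.run(
        ["readelf", "-sW", path],
        capture_output=True, text=True, check=True,
    )
    return ifunc_symbols(result.stdout)

def newly_using_ifunc(previous, current):
    """Both arguments map package name -> list of IFUNC symbol names.

    Returns packages that have IFUNCs now but had none before."""
    return sorted(
        pkg for pkg, syms in current.items()
        if syms and not previous.get(pkg)
    )
```

Running scan_file() over every ELF object in a distro's packages gives
the "N out of 10,000" number; newly_using_ifunc() is the alerting hook
for build tooling, comparing the last known results with the current
build.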

> I currently suspect that the crackers used IFUNC support as a covert
> flag.  The "jankiness" of the current glibc IFUNC implementation
> provided a convenient excuse to ask oss-fuzz to --disable-ifunc when
> building xz-utils, which *also* conveniently inhibited the backdoor
> dropper and ensured that the fuzzing builds would not contain the
> backdoor.

Uncynically, I like this conspiracy theory.

> > - Check for irregular contents in .pc files, inspired by Vegard
> >   Nossum's oss-security post

> Much easier:  look for pkg-config descriptions containing text other
> than a variable definition.  The pkg-config tool itself should
> probably enforce "cleanliness" on this matter and refuse to process
> files containing other text.  (It also should complain about and
> reject an *-uninstalled.pc file found in the system directories, which
> was another logic error exploited in that sample backdoor.)

Really, doing this seems a more robust approach anyway, because allowing
only known-good > rejecting known-bad. I was mostly driven by "hang on,
how many of the things Nossum's example does are actually used by real
files?" and the answer from my initial sample size was zero, so it'd be
trivial to extend that check to every .pc file shipped by every current
distro's packages.

I think Sam looked into existing pkg-config verifiers and found they do
not complain about things we thought they should complain about (this
could just mean we misunderstand their purpose). A strict lint-checker
for such files would be better than just checking for specific
suspicious patterns. But, I don't yet know how strict a format we could
insist on (would it turn out 10% of files in fact break what we
initially think are reasonable rules?). Even still, I think you could
embed badness in legit variables, although I haven't dug in enough to
know that for sure.
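
To make the "allow only known-good" idea concrete, here is a minimal
Python sketch of a strict .pc lint. The rules are my assumption of what
"reasonable" might mean, not an agreed format: every line must be
blank, a # comment, a "name=value" variable, a "Keyword: value" field
from a fixed allowlist, or a backslash continuation of the previous
line. The keyword set is taken from the fields described in the
pkg-config guide and may well be too narrow for real-world files. It
does nothing about badness embedded inside otherwise legitimate values.

```python
import re

# Assumed allowlist; other implementations may accept more fields.
KNOWN_KEYWORDS = {
    "Name", "Description", "URL", "Version",
    "Requires", "Requires.private", "Conflicts",
    "Cflags", "Libs", "Libs.private",
}

# A tag followed by ':' (keyword field) or '=' (variable definition)
LINE_RE = re.compile(r'^([A-Za-z0-9_.]+)\s*([:=])')

def lint_pc(text):
    """Return a list of (line_number, problem) for a .pc file's text."""
    problems = []
    continued = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if continued:
            # Continuation of the previous variable or keyword line
            continued = line.endswith("\\")
            continue
        if not line or line.startswith("#"):
            continue
        continued = line.endswith("\\")
        m = LINE_RE.match(line)
        if m is None:
            problems.append((lineno, "not a variable or keyword line"))
        elif m.group(2) == ":" and m.group(1) not in KNOWN_KEYWORDS:
            problems.append((lineno, "unknown keyword " + m.group(1)))
    return problems
```

Running this across every .pc file shipped by a distro would answer the
"do 10% of files break the rules?" question directly: the share of
files with a non-empty result is the breakage rate for this particular
rule set.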

Thanks,

-- 

Hank Leininger <hlein@...elogic.com>
8428 ED14 5268 C727 0C48  F454 846F 0637 5FEB 1612
