Date: Fri, 26 Apr 2024 14:06:16 -0600
From: Hank Leininger <hlein@...elogic.com>
To: oss-security@...ts.openwall.com
Subject: Update on the distro-backdoor-scanner effort

tl;dr: We've pursued a number of avenues, with plans for more; so far
no "smoking gun" of other backdoors of a similar vein; help wanted.

So: what is this, where are we, what's next, what do we need, credits.

What is this?

- Ongoing work to look for backdoors similar to those found in 
  xz-utils, or using vectors that were discussed in its aftermath, 
  that have made their way into Linux distributions' build pipelines;
  see https://marc.info/?l=oss-security&m=171208242904550&w=4

- All the below is also covered in code, READMEs, or GH issues at
  https://github.com/hlein/distro-backdoor-scanner; the tools are
  intended to be bog-standard and work on any of the supported 
  distros, and to document what they need so our work should be
  repeatable by anybody.

Where we are: main things investigated:

- Similar exploitation toolkits / operator-behavior in other packages?

  - Unpack and scan all packages in multiple distribution families
    looking for the "fist" of the operator: similar stage0 / stage1 /
    stage2 loaders, characteristic command-line switch combinations,
    etc.; can we find earlier generations of any of the widgets used
    (and now burned) in other packages previously backdoored? See the
    patterns in bin/package_scan_all.sh

  - Unpacked and scanned (generally post-distro-patches, to focus on
    as-used-by-distros, not just rescan the same upstreams 4 times):

    - ~11k EndeavourOS/Arch packages
    - ~40k Debian packages
    - ~19k Gentoo packages
    - ~9k Rocky/RPM packages

  - Output is manageable; able to rule out all hits not part of the
    actual xz-utils backdoors as false positives.
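
A rough Python sketch of the scanning idea above, for anyone who wants
to see the shape of it. This is not the project's scanner (that is
bin/package_scan_all.sh); the two patterns and the name scan_tree are
made up for illustration and are far cruder than the real list.

```python
import os
import re

# Illustrative patterns only (assumptions for this sketch); the real
# signatures live in bin/package_scan_all.sh in the repository.
PATTERNS = {
    # a decompressed stream piped straight into a shell
    "decompress-pipe-to-shell": re.compile(
        rb"xz\s+-dc?\b[^\n]*\|\s*(?:/bin/)?(?:ba)?sh\b"
    ),
    # three or more byte-skipping `head -c` calls on a single line
    "head-skip-chain": re.compile(
        rb"(?:head\s+-c\s*\+?\d+[^\n]*?){3,}"
    ),
}


def scan_tree(root):
    """Walk an unpacked source tree; return sorted (path, label) hits."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # skip symlinks, sockets, device nodes and the like
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    data = fh.read()
            except OSError:
                continue
            for label, rx in PATTERNS.items():
                if rx.search(data):
                    hits.append((path, label))
    return sorted(hits)
```

Point scan_tree() at a directory of unpacked, distro-patched sources;
every hit still needs a human to rule it in or out.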

- Examine the provenance of every .m4 in every package unpacked above

  - Which m4 macro files are unique to, or introduced by, a project?
    Which are recognizable, but updated/modified in some way from any
    revision ever shipped by an upstream (GNU autotools, etc.)? m4
    files have serial numbers, and the xz-utils backdoor used a big
    jump in its backdoor .m4; maybe an attempt to keep from getting
    clobbered after upstream upgrades?

  - Turns out serial numbers are made up and the points don't matter.
    But still, this author appears to have _thought_ they were
    important. So if they'd done similar somewhere else, should stand
    out there too.

  - Analyzing about 50k m4 files found about 5k that didn't match an
    upstream. Around 1k of those had a near-match so we can diff them.
    That's still too many to digest manually. 3 had big serial jumps.
    2 of those seem benign; the third is, of course, the trojan in
    xz-utils.

  - Big TODOs here: implement fuzzy hashing for when we don't have a
    perfect match, so that we can pick the best known-good candidate
    to offer a diff against and group the unknowns amongst themselves;
    add something to facilitate tracking of diff-review (CSV or
    another sqlite DB that tracks review status?); and then actually
    read all the diffs (currently only spot-checked).
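
To make the serial-number and near-match ideas concrete, here is a
small Python sketch. It is not the project's code. The function names
and the jump threshold of 10 are assumptions, and difflib's line-based
similarity is only a stand-in for real fuzzy hashing, which is still a
TODO.

```python
import difflib
import re

# Matches "# serial 12", "#serial 5", "dnl serial 3" and
# "# some-file.m4 serial 30" at the start of a line.
SERIAL_RE = re.compile(
    r"^(?:#|dnl)\s*(?:\S+\.m4\s+)?serial\s+(\d+)", re.MULTILINE
)


def parse_serial(text):
    """Return the first serial number found in an m4 file, or None."""
    m = SERIAL_RE.search(text)
    return int(m.group(1)) if m else None


def serial_jump(candidate_text, known_serials, threshold=10):
    """Return how far the candidate's serial exceeds the highest known
    upstream serial, or None if there is no jump of at least
    `threshold` (an arbitrary cut-off chosen for this sketch)."""
    serial = parse_serial(candidate_text)
    if serial is None or not known_serials:
        return None
    jump = serial - max(known_serials)
    return jump if jump >= threshold else None


def best_known_good(candidate_text, known_good):
    """Pick the known-good file most similar to the candidate.

    known_good maps a label to file text. Returns (label, ratio);
    label is None when nothing is similar at all."""
    best = (None, 0.0)
    cand_lines = candidate_text.splitlines()
    for label, text in known_good.items():
        matcher = difflib.SequenceMatcher(None, cand_lines, text.splitlines())
        ratio = matcher.ratio()
        if ratio > best[1]:
            best = (label, ratio)
    return best
```

The label returned by best_known_good() is the file to diff the
unknown against; a big value from serial_jump() is a reason to read
that diff first.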

- Compare decompression of xz-utils vs other compatible tools

  - Just to check for some obvious Thompsonesque weird machine where
    xz injects malicious .c code into a tarball it unpacks, etc. Very
    unlikely to find anything.

  - Found nothing except some minor bugs in other decompressors (will
    submit upstream bugs, but low priority).

  - Still plan to add more different decompressors for completeness.
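
The comparison itself is simple; a Python sketch of the idea follows.
It is not the project's tooling. Note that Python's lzma module is a
binding to liblzma, so it counts as the xz-utils side of the
comparison. The external commands listed are assumptions about what
is installed (including that `busybox xzcat` reads stdin when given
no file), and all function names are made up for this example.

```python
import hashlib
import lzma
import shutil
import subprocess


def via_command(argv):
    """Build a decoder that pipes the input through an external tool."""
    def run(blob):
        proc = subprocess.run(
            argv, input=blob, stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL, check=True,
        )
        return proc.stdout
    return run


def available_decoders():
    """Collect decoders present on this machine (assumed candidates)."""
    decoders = {"python-lzma (liblzma)": lzma.decompress}
    candidates = {"xz": ["xz", "-dc"], "busybox": ["busybox", "xzcat"]}
    for name, argv in candidates.items():
        if shutil.which(argv[0]):
            decoders[name] = via_command(argv)
    return decoders


def compare_decoders(blob, decoders):
    """Return {digest: [decoder names]}.

    More than one key means the decoders disagree about the output
    (or at least one of them failed)."""
    groups = {}
    for name, decode in decoders.items():
        try:
            digest = hashlib.sha256(decode(blob)).hexdigest()
        except Exception as exc:  # any failure is itself a finding
            digest = "error: " + type(exc).__name__
        groups.setdefault(digest, []).append(name)
    return groups
```

Feed it the raw bytes of each .xz file of interest; a result with a
single key means every decoder produced identical output.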

What's next: rough notions only, not yet implemented:

- Analyze IFUNC real-world use. They're dodgy and weird and useful for
  backdoors like this one. Removing IFUNC support from glibc has been
  floated: https://marc.info/?l=glibc-alpha&m=171389592724184&w=4
  But that'll get hung up on "but what if users". AFAWK nobody knows.
  So let's find out: survey sources & binaries from major distros and
  get some actual numbers. Also thegrugq made an interesting
  observation: it'd be telling which projects recently _added_ IFUNC
  use, if any. See
  https://github.com/hlein/distro-backdoor-scanner/issues/16
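
One possible starting point for such a survey, as a Python sketch:
count IFUNC symbols in binaries by parsing `readelf -sW` output, and
spot the GCC ifunc attribute in sources with a regex. None of this is
implemented in the repo yet; the function names are hypothetical and
the regex will miss uses hidden behind macros.

```python
import re
import subprocess

# Matches the resolver name in __attribute__((ifunc("resolver")))
IFUNC_ATTR_RE = re.compile(r'\bifunc\s*\(\s*"([^"]+)"\s*\)')


def ifunc_resolvers_in_source(text):
    """Return resolver names referenced by ifunc attributes in C source."""
    return IFUNC_ATTR_RE.findall(text)


def ifunc_symbols(readelf_output):
    """Extract names of IFUNC-typed symbols from `readelf -sW` text."""
    names = []
    for line in readelf_output.splitlines():
        fields = line.split()
        # Symbol rows look like: Num: Value Size Type Bind Vis Ndx Name
        if (len(fields) >= 8 and fields[0].endswith(":")
                and fields[3] == "IFUNC"):
            names.append(fields[7])
    return names


def ifunc_symbols_in_binary(path):
    """Run readelf on one file; non-ELF input just yields no symbols."""
    proc = subprocess.run(
        ["readelf", "-sW", path],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
        text=True, check=False,
    )
    return sorted(set(ifunc_symbols(proc.stdout)))
```

Run over two releases of the same package, the difference between the
two symbol lists would show which projects recently added IFUNC use.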

- Check for irregular contents in .pc files, inspired by Vegard
  Nossum's oss-security post
  https://marc.info/?l=oss-security&m=171335763115933&w=4
  It seems it'd be pretty easy to look for known bads. Starting
  notes: https://github.com/hlein/distro-backdoor-scanner/issues/7
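
As a strawman for what a "known bads" check could look like, here is
a Python sketch. The heuristics are assumptions of this example, not
taken from the post or the issue: it flags command substitution
anywhere in a .pc file, and a few GCC options that pull in extra
files or code (-include, -fplugin=, -specs=, -wrapper) in the
Cflags/Libs fields. The name check_pc is made up.

```python
import re

# `${var}` is normal pkg-config syntax; backticks and `$(` are not.
COMMAND_SUBST_RE = re.compile(r"`|\$\(")
# GCC options that load additional files or code (heuristic list).
RISKY_FLAG_RE = re.compile(
    r"(?:^|\s)(-include\b|-fplugin=|-specs=|-wrapper\b)"
)
FLAG_FIELDS = {"cflags", "cflags.private", "libs", "libs.private"}


def check_pc(text):
    """Return a list of (line_number, reason) findings for a .pc file."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), 1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if COMMAND_SUBST_RE.search(stripped):
            findings.append((lineno, "command substitution"))
        key, sep, value = stripped.partition(":")
        if sep and key.strip().lower() in FLAG_FIELDS:
            m = RISKY_FLAG_RE.search(value)
            if m:
                findings.append((lineno, "risky flag " + m.group(1)))
    return findings
```

An empty list only means none of these particular patterns matched;
anything it does report is just a prompt to go read the file.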

- Systematically compare git-tagged versions of software to release
  artifacts for that same version. What differs, and why? There are
  often minor differences for what seem like good releng reasons. But
  in the xz-utils case, the backdoor author was able to get access to
  post release assets even w/o commit/merge access; their backdoor was
  injected in files/contents that didn't match the Git repo contents.
  See https://github.com/hlein/distro-backdoor-scanner/issues/17
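
The core of that comparison could be as small as the Python sketch
below: hash every file in the release tarball and in a `git archive`
of the tag, then report what exists on only one side or differs. This
is an illustration, not code from the repo; the function names and
the "src/" prefix are arbitrary choices.

```python
import hashlib
import io
import subprocess
import tarfile


def tar_digests(fileobj, strip_components=1):
    """Map each regular file in a tarball to its SHA-256, dropping the
    leading path component(s), e.g. the "pkg-1.0/" top-level dir."""
    digests = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            parts = name.split("/")[strip_components:]
            if not parts:
                continue
            data = tar.extractfile(member).read()
            digests["/".join(parts)] = hashlib.sha256(data).hexdigest()
    return digests


def git_archive(repo, tag, prefix="src/"):
    """Return the tree at `tag` as an in-memory tar stream."""
    proc = subprocess.run(
        ["git", "-C", repo, "archive", "--format=tar",
         "--prefix=" + prefix, tag],
        stdout=subprocess.PIPE, check=True,
    )
    return io.BytesIO(proc.stdout)


def compare_trees(release, from_git):
    """Summarize differences between two {path: digest} mappings."""
    rel, git = set(release), set(from_git)
    return {
        "only_in_release": sorted(rel - git),
        "only_in_git": sorted(git - rel),
        "changed": sorted(p for p in rel & git
                          if release[p] != from_git[p]),
    }
```

Expect plenty of legitimate "only_in_release" entries (generated
configure scripts and the like); the interesting part is working out
which of them cannot be explained by the release process.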

What do we need:

- Testers, especially on other distros in a family we support but
  where we've only tested one member so far.

- Reproducers to rerun our analysis yourself and make sure you concur
  with our conclusions.

- Contributors to the currently outstanding issues/tools.

- Analysis help on the m4 diffs (once we have fuzzy-matching to choose
  best-fit diff comparison targets).

- Brainstorming to come up with the next big items to put on the list.

Credits:

  Most of this work has been done by Sam James of the Gentoo team and
  Hank Leininger (me), partially sponsored by KoreLogic. Thanks also
  to folks who helped us get a handle on a lot of different distros'
  ecosystems, especially Solar Designer (Rocky/RPM family) and
  brocellous (Arch family).
