Message-ID: <66175855.2090805@gmail.com>
Date: Wed, 10 Apr 2024 22:26:13 -0500
From: Jacob Bachmeyer <jcb62281@...il.com>
To: oss-security@...ts.openwall.com
CC: Alejandro Colomar <alx@...nel.org>, Sam James <sam@...too.org>,
Joey Hess <id@...yh.name>,
Jonathan Nieder <jrnieder@...il.com>, Andres Freund <andres@...razel.de>,
Lasse Collin <lasse.collin@...aani.org>,
xz@...aani.org
Subject: Re: Analysis on who is Jia Tan, and who he could work
for, reading xz.git

Solar Designer wrote:
> On Wed, Apr 10, 2024 at 05:16:52AM +0200, Alejandro Colomar wrote:
>
>> I've been researching xz.git to learn about this malicious actor, and
>> who he might have worked for.
>>
>
> As a moderator, I reluctantly let this through out of respect for
> Alejandro's time and knowing that many readers will find it interesting.
>
> However:
>
> This is almost off-topic for oss-security and it risks provoking further
> speculation and potentially hatred in follow-ups. Related analyses,
> including not only of timezones but also of commit times, were already
> posted elsewhere (e.g., a Wired story). So let's please limit the
> follow-ups to (1) corrections of any factual errors or major omissions
> (to the extent of being misleading) there might be in Alejandro's
> postings and (2) observations that more directly help us identify or
> prevent more compromises like this (if any can be made based on this
> analysis, which I doubt). One major omission I'd like to point out is
> that timezones can be faked - we have no reliable way to know which of
> these, if any, actually correspond to where Jia Tan was.
>
> Note that other recent threads in here about search for code patterns
> similar to Jia Tan's and even for PGP keys similar to Jia Tan's are more
> relevant to oss-security, because they're aimed to uncover potential
> related backdoor code in other projects. In contrast, identifying who
> Jia Tan is or what country/ies they're from doesn't obviously help. At
> best, it may give us guesses on where the presumed targets are, but then
> what? We need to protect the whole ecosystem regardless of who/where
> the current attackers are, and we need to develop means to detect such
> attacks everywhere, not only at currently likely targets.
>
First, a factual correction: The hypothesis that "Jia Tan" was actually
in UTC+03 seems to have been backwards, since the peak activity overlaps
only partially with office hours in UTC+03, but does indeed start with
9AM in *UTC-03* by my reckoning. The only problem is that UTC-01
through UTC-03 cover various islands in the Atlantic Ocean and a few
Eastern parts of South America. All of these strike me as unlikely
sockmaster bases. The problem with time zones east of UTC is the
observed UTC 17:00 "quitting time" (more below) which only gets /later/
in the local day as you move east.
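
The offset arithmetic behind this comparison can be sketched as follows. This is only an illustration: the helper name local_hour is hypothetical, and it assumes whole-hour offsets and the UTC 12:00 to 17:00 activity window discussed in this message.

```shell
# Hypothetical helper: convert a UTC hour to the local hour at a
# whole-hour UTC offset (assumption: no half-hour zones, no DST).
local_hour() {
    utc_hour=$1
    offset=$2
    # Add 24 before taking the remainder so negative offsets stay non-negative.
    echo $(( (utc_hour + offset + 24) % 24 ))
}

# Where the UTC 12:00-17:00 window falls in the offsets under discussion.
for offset in -3 3 8; do
    printf 'UTC%+03d: %02d:00 to %02d:00 local\n' "$offset" \
        "$(local_hour 12 "$offset")" "$(local_hour 17 "$offset")"
done
```

The window opens at 09:00 in UTC-03, at 15:00 in UTC+03, and at 20:00 in UTC+08, so each step east pushes the observed "quitting time" later into the local day.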

Second, I think that we can probably put the "Israeli" hypothesis to
bed: There seems to be no 24-hour period where "Jia" made no commits,
and what I think is Friday night into Saturday (therefore the Jewish
Sabbath) is one of the more frequent late-night periods, while "Jia"
seemingly (mostly) took Sundays off. I have read reports where
activities were attributed to Israel and two of the key arguments were
that the APT group did /nothing/ on Friday evenings or Saturdays, and
that Sunday seemed to be an ordinary work day for them. These
characteristics do /not/ describe the "Jia" crew. Whoever "Jia" is, an
observant Jew he is not.

I have been looking at this from a different angle, assuming that all of
the time zone information in the commits is bogus and looking for
patterns in the commit epoch timestamps, which are harder to
convincingly fake. The attached "collect.sh" is intended to run in a
directory next to a copy of the repository as "xz-backdoored" and
extracts the commit and author timestamps in epoch time, further
decomposing them into week/time-of-week and day/time-of-day for analysis
and plotting. The week and day numbers are counted from 1 Jan 1970,
which was a Thursday, so the time-of-week numbers in the output of the
attached script are seconds from midnight UTC on Thursday. An epoch day
number X can be converted back to a date with
`date --date='1 Jan 1970 UTC + X days'` and an analogous command
converts week numbers to Thursdays.
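
As a rough sketch of that decomposition (not the attached collect.sh, whose contents are not reproduced here): the helper name decompose_epoch is hypothetical, and the five-column output order is an assumption chosen so that the epoch day lands in field 4 and the time-of-day in field 5, as the Awk commands in this message expect.

```shell
# Hypothetical helper, NOT taken from collect.sh.
# Prints: epoch  week  time-of-week  day  time-of-day
# Weeks and days count from 1 Jan 1970 UTC (a Thursday), so time-of-week
# is seconds since midnight UTC on Thursday.
decompose_epoch() {
    epoch=$1
    day=$(( epoch / 86400 ))      # whole days since the epoch
    tod=$(( epoch % 86400 ))      # seconds since UTC midnight
    week=$(( epoch / 604800 ))    # whole weeks since the epoch
    tow=$(( epoch % 604800 ))     # seconds since Thursday 00:00 UTC
    echo "$epoch $week $tow $day $tod"
}

decompose_epoch 1704067200    # midnight UTC on 1 Jan 2024
```

For that timestamp the time-of-week is 345600 seconds, i.e. four days past Thursday, which matches 1 Jan 2024 being a Monday.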

This is a work in progress and I am not yet fully confident that my
analysis is correct, in part because my results are different from what
others had found before I started, so I am presenting the data
extraction script for others to either find problems with or replicate
my results. The script was run on a repository clone with master
checked out at commit f9cf4c05edd14dedfe63833f8ccbe41b55823b00.

There is a noticeable cluster in the plot, and about 85% of "Jia Tan"'s
commits were in the five hours starting at UTC noon. If we exclude
2024, which seems to have been "crunch time" on getting the backdoor
out, that jumps to about 91%. I believe that this pattern *might* be a
good indicator for the sock farm containing "Jia Tan" but there are
likely to be false positives, so it is probably a weak indicator.
Combining this pattern with a claimed timezone (like "Jia"'s UTC+08)
where that period is into the night might work better. In UTC+08, that
period is 8PM to 1AM, which are unlikely office hours. The peak also
ends almost as abruptly as it begins, suggesting that UTC 17:00 was
"quitting time" at "Jia"'s office, but that "Jia" did occasionally work
late. The five-hour active period is consistent with morning planning
meetings, followed by general work keeping up "Jia"'s appearances, with
a floating lunch break somewhere. Think "rogue state bureaucracy" here.

The percentages above were calculated with these Awk commands:

    awk '{ if ($5>(12*3600) && $5<(17*3600)) A++; else B++ }
         END {print "in: "A" out: "B" all: "A+B" %in: "100*A/(A+B)}' \
        timedata-committer-JiaTan

    awk '$4 < 19723 { if ($5>(12*3600) && $5<(17*3600)) A++; else B++ }
         END {print "in: "A" out: "B" all: "A+B" %in: "100*A/(A+B)}' \
        timedata-committer-JiaTan

Epoch day 19723 is 1 Jan 2024 by my reckoning
(`TZ=UTC date --date='1 jan 1970 UTC + 19723 days'`), so the second
command repeats the count, excluding 2024.
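
To see how the window test in those commands treats individual rows, here is a small self-contained sketch. The wrapper name window_share is hypothetical, the sample rows are invented (they are not real commit data), and the layout assumes only what the commands above use: epoch day in field 4 and time-of-day in seconds in field 5.

```shell
# Hypothetical wrapper around the same comparison as the commands above.
# Reads rows on stdin; field 5 is seconds since UTC midnight.
window_share() {
    awk '{ if ($5 > 12*3600 && $5 < 17*3600) A++; else B++ }
         END { printf "in: %d out: %d all: %d\n", A, B, A+B }'
}

# Invented rows: 12:30, 14:00, exactly 17:00, and 01:00 UTC.
printf '%s\n' \
    '0 0 0 19000 45000' \
    '0 0 0 19001 50400' \
    '0 0 0 19002 61200' \
    '0 0 0 19003 3600' | window_share
```

Because both comparisons are strict, a commit at exactly 12:00:00 or 17:00:00 UTC counts as outside the window; with one-second timestamps this should barely move the percentages, but it is worth knowing when replicating the numbers.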

This thread landed in my inbox as I was planning to start work on
further partitioning the "Jia Tan" commits, initially by keywords in the
commit message. Do commits involving "ifunc" stand out in time from all
others? Alejandro's work raises another question: Does time-of-commit
correlate to diff size? Alternatively: Was the more complex work
seemingly done in a different time zone?

Lastly, I believe that if (a big "if") enough evidence can be found to
make attribution of the xz backdoor stick, the results are likely to be
a political scandal that will serve to deter others from similarly going
rogue, so pinning the "Jia" on the sockmaster might be a good step to
reduce the overall threat to the community.

-- Jacob

Download attachment "collect.sh" of type "application/x-sh" (987 bytes)