john-users - Re: Anyone looked at the Ashley Madison data yet?

Message-ID: <20150826054905.GA2804@openwall.com>
Date: Wed, 26 Aug 2015 08:49:05 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Anyone looked at the Ashley Madison data yet?

On Wed, Aug 26, 2015 at 06:15:55AM +0300, Solar Designer wrote:
> For the "top N" work, you need to "shuf" the dump and choose specific
> e.g. 100k lines from it (e.g. for intending to produce a top 100 list).
> To make this even safer, "shuf" the 100k sub-list of hashes for each
> potential contributor separately, and give each contributor only their
> shuffled list.  This extra measure is in case of interrupted attacks, so
> that with a large number of contributors the original 100k list is
> attacked uniformly anyway.  (It wouldn't be fatal even if it's not,
> though, since it's already shuffled.  However, if a particularly common
> password is found closer to the start of the 100k list, it might appear
> as even more common than it actually is if some attacks are interrupted.)
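
A minimal sketch of the shuffle-then-sample step and of the
per-contributor reshuffle described in the quoted text, written in
Python rather than with "shuf".  The function names, the contributor
names, and the idea of holding the dump in memory as a list of lines are
assumptions made for illustration only:

```python
import random


def draw_sample(hash_lines, sample_size, rng):
    """Shuffle a copy of the dump, then keep the first sample_size lines.

    This mirrors "shuf dump | head -n sample_size".
    """
    shuffled = list(hash_lines)
    rng.shuffle(shuffled)
    return shuffled[:sample_size]


def per_contributor_lists(sample, contributors, rng):
    """Give every contributor the same sample in an independent order.

    If some attacks are interrupted, the partial coverage is then spread
    over the sample instead of being concentrated near its start.
    """
    lists = {}
    for name in contributors:
        order = list(sample)
        rng.shuffle(order)
        lists[name] = order
    return lists
```

For example, draw_sample(dump_lines, 100000, random.Random()) would
produce the 100k sample, and per_contributor_lists(sample, names,
random.Random()) the individually shuffled copies.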

Actually, for a likely top 100 list from a 100k sub-list, you don't need
a community effort.  This can be done by one person using one machine in
about a week.  Just take a few hundred top passwords from existing top
password lists, add four lines:

ashley
madison
Ashley
Madison

and run it until completion against the 100k sample (it's crucial to
"shuf" the original list before you extract this sample).  Of the four
lines I suggested adding, I guess the all-lowercase ones are somewhat
likely to appear in the top 100.  The capitalized ones probably aren't
popular enough, but are worth testing as well (you can't rule out their
being in the top 100 without testing).
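
Assembling the candidate list could look roughly like the following
Python sketch.  The helper name build_candidates and the constant
SITE_WORDS are made up for this example; the only inputs taken from the
text above are the four site-specific words:

```python
# The four site-specific lines suggested above.
SITE_WORDS = ["ashley", "madison", "Ashley", "Madison"]


def build_candidates(top_passwords, extras):
    """Merge candidate lists, dropping duplicates and empty lines.

    First-seen order is kept, so the existing top list stays in front.
    """
    seen = set()
    merged = []
    for word in list(top_passwords) + list(extras):
        if word and word not in seen:
            seen.add(word)
            merged.append(word)
    return merged
```

Deduplicating matters here because every extra candidate costs a full
pass over the sample.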

To test 300 candidate passwords against every hash in a 100k sample at
50 c/s (one modern quad-core CPU), you need:

300 * 100000 / 50 / 86400 = ~7 days
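
The same arithmetic as a small Python sketch, in case you want to plug
in your own numbers.  The function name days_to_test is hypothetical,
and the model simply assumes that every candidate is tried against
every hash at a constant rate:

```python
SECONDS_PER_DAY = 86400


def days_to_test(candidates, hashes, rate_per_second):
    """Days needed to try every candidate against every hash."""
    return candidates * hashes / rate_per_second / SECONDS_PER_DAY


# 300 candidates, 100k hashes, 50 c/s: a little under 7 days.
print(round(days_to_test(300, 100000, 50), 2))
```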

300 is probably enough to have good confidence that ~90% of the eventual
top 100 were included in testing.  Someone might want to confirm or
disprove this by comparing similar portions of existing top lists from
various leaks, assuming that AM is similar in this respect.
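
One way to run that comparison is sketched below in Python.  The
function name top_n_coverage is made up, and it assumes you already have
each leak's passwords sorted from most to least common; it does not say
anything about what the result would be for AM:

```python
def top_n_coverage(candidates, leak_top, n=100):
    """Fraction of a leak's top n passwords present in the candidates.

    leak_top must be ordered from most to least common.
    """
    top = leak_top[:n]
    if not top:
        return 0.0
    candidate_set = set(candidates)
    hits = sum(1 for password in top if password in candidate_set)
    return hits / len(top)
```

Building the 300 candidates from some leaks and measuring coverage of
the top 100 of a leak that was held out would give a number to compare
with the ~90% guess.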

Adding a few hundred of the most common already cracked AM passwords
(cracked without following this methodology, so without being limited to
this sample) to the list of candidate passwords to test against the 100k
sample is also a good idea, if you already have those other cracks.
They will compete against the usual top 300 (derived from other top
lists), in case AM has enough specifics that some (many?) passwords
outside the usual top 300 are in AM's top 100.  This may take a second
week, or a second CPU.  Alternatively, those passwords may be probed in
a day against a 10k sample first, with only those that are common enough
in that sample to potentially be in the top 100 then tested against the
100k sample.  Then it's just a day more.
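
The filtering step between the 10k probe and the 100k run might be
sketched like this in Python.  The name shortlist and the min_hits
cutoff are assumptions for illustration; the message above does not say
what count in the 10k sample should qualify a password:

```python
def shortlist(counts_in_small_sample, min_hits):
    """Keep passwords seen at least min_hits times in the small sample.

    counts_in_small_sample maps each probed password to the number of
    hashes it cracked in the 10k sample.
    """
    return [
        password
        for password, hits in counts_in_small_sample.items()
        if hits >= min_hits
    ]
```

Only the passwords that survive this cut then need to be added to the
candidate list for the 100k sample.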

So it's unclear if a community effort is justified.  For a top 100 list,
if desired, someone just needs to do it right.  And doing it right is
more important than testing a larger candidate password list against a
larger sample.

Alexander
