john-users - Re: How long should I let JtR munch?

Message-ID: <CANc-5UwQnxt5beGm0skDEHT8Zb9+c4Bxbu2ZcTPi3SVZPKUfbw@mail.gmail.com>
Date: Fri, 12 Aug 2016 12:07:06 -0500
From: Skip Montanaro <skip.montanaro@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: How long should I let JtR munch?

> Also, I hope you're aware the XKCD was by far not the first time the use
> of (generated) passphrases was proposed. For example, there was
> Diceware (1995?), and our passwdqc included a passphrase generator right
> away when I wrote it in 2000. You might want to play with it:
>
> http://www.openwall.com/passwdqc/

Yes, I'm aware there is previous work in passphrase generation. XKCD 936
has the advantage of being presented in cartoon form which is easy for
dummies like me to understand. :-)

> polly looks broken in at least its use of a non-cryptographic PRNG.
> Per Python documentation, this is Mersenne Twister. Cracking it amounts
> to finding the 32-bit seed, somewhat similarly to (but also differently
> from) how I did it for PHP's here:
>
> http://www.openwall.com/php_mt_seed/
>
> Also take a look at the Strip external mode in the default john.conf for
> an example of attacking a similarly broken password generator with small
> seed space.

> Even if this is fixed by substituting a CSPRNG into polly, it doesn't
> look trivial to figure out whether your generated passphrases are
> distributed uniformly or not (most likely not), and what the total size
> of the search space is. It's a pitfall similar to the one pwgen fell
> into (albeit with phonemes rather than words), and it's especially bad
> if you don't use word separators:
>
> http://www.openwall.com/lists/oss-security/2012/01/22/6
>
> For a decent password/passphrase generator, you should be able to prove
> that the distribution is uniform, and calculate exactly how large the
> search space is. If you can't, then the generator is presumably broken.
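
To illustrate the word-separator point quoted above, here is a minimal sketch
with a made-up four-word list (the names TOY_WORDS and distinct_outputs are
hypothetical, not from polly or passwdqc). It counts how many distinct strings
result from every ordered pair of words, with and without a separator. Without
one, two different word sequences can collapse into the same string, so the
outputs are no longer uniformly distributed and the search space is smaller
than the number of sequences:

```python
from itertools import product

# Toy word list chosen so that some concatenations collide (an assumption
# for illustration; real dictionaries collide less often, but they do).
TOY_WORDS = ["a", "ab", "b", "ba"]

def distinct_outputs(words, count, sep):
    """Return the set of distinct strings over all ordered `count`-word picks."""
    return {sep.join(combo) for combo in product(words, repeat=count)}

if __name__ == "__main__":
    sequences = len(TOY_WORDS) ** 2
    joined = distinct_outputs(TOY_WORDS, 2, "")
    separated = distinct_outputs(TOY_WORDS, 2, " ")
    print("word sequences:      ", sequences)
    print("distinct, no sep:    ", len(joined))
    print("distinct, with space:", len(separated))
```

Here "a" + "ba" and "ab" + "a" both give "aba", so that string is twice as
likely as its neighbors when no separator is used.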

I think I've at least partially addressed these problems by replacing calls
to random.random and random.shuffle with calls to those methods on a
random.SystemRandom instance, which uses os.urandom under the covers.
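
A minimal sketch of that kind of substitution follows. It is not polly's
actual code: the WORDS list and the make_passphrase helper are hypothetical
names used only for illustration. The point is just that the same methods
are called on a random.SystemRandom instance rather than on the module-level
Mersenne Twister:

```python
import random

# Hypothetical stand-in word list; polly's real dictionary is built from mail.
WORDS = ["correct", "horse", "battery", "staple", "munch", "ripper"]

# SystemRandom draws from os.urandom instead of the seedable Mersenne Twister.
_sysrand = random.SystemRandom()

def make_passphrase(words, count=4, sep=" "):
    """Pick `count` words independently and uniformly, joined by `sep`."""
    return sep.join(_sysrand.choice(words) for _ in range(count))

if __name__ == "__main__":
    print(make_passphrase(WORDS))
```

Because SystemRandom ignores seed(), there is no small seed space to search;
what remains is the size and uniformity of the word selection itself.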

The dictionary I use is another issue which I've yet to consider. When mail
arrives in my Gmail inbox, it gets a "polly" tag. When run without
generating passwords, polly will pull messages with that tag and
incorporate the words it finds into polly's dictionary. They are certainly
not going to be random, but it's an odd set of "words" which are specific
to my interests. I don't know if that's good or bad. I've toyed with the
idea of mixing in /usr/dict/words as well, but haven't yet done that. For
one thing, the size of the words file would swamp the size of my current
dictionary (about 10x larger), making it much less likely that my odd words
would be seen. Also, about half of my polly word list is in words, so there
are even fewer unique words. OTOH, the dictionary size would increase
dramatically.
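
The trade-off can be put into rough numbers with a sketch like the one below.
It assumes words are drawn uniformly and independently and joined with a
separator, and the list sizes are illustrative guesses that only follow the
"10x larger" and "about half shared" proportions above; they are not polly's
real counts. The helper names are hypothetical:

```python
import math

def passphrase_bits(dict_size, count):
    """Bits of entropy for `count` words drawn uniformly and independently
    from `dict_size` distinct words, joined with a separator."""
    return count * math.log2(dict_size)

def merged_size(size_a, size_b, overlap):
    """Distinct words after merging two lists that share `overlap` words."""
    return size_a + size_b - overlap

if __name__ == "__main__":
    # Illustrative sizes only (assumptions, not measured values).
    own = 10_000        # polly's mail-derived list
    system = 100_000    # system words file, about 10x larger
    shared = 5_000      # about half of the own list also in the words file
    merged = merged_size(own, system, shared)
    for label, n in (("own list", own), ("merged list", merged)):
        bits = passphrase_bits(n, 4)
        print(f"{label}: {n} words, {bits:.1f} bits for 4 words")
    # Chance that any one picked word comes from the own list after merging.
    print(f"own-list share after merge: {own / merged:.1%}")
```

Under these assumptions the merged list adds a few bits per word, while the
share of picks coming from the mail-derived words drops to under a tenth.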

This idea (and the name of the program) came from a similar concept in
another Python programmer's MUD. He had a character named Polly who would
spit out passphrases from the conversation of participants in the MUD.

>> I was unaware of the "$n$" prefix in general, but knew
>> "$1$" meant "MD5". BTW, I chose MD5 precisely because I see that used for
>> the bulk of the passwords in the NIS password database where I work.
>
> "$1$" refers to md5crypt.
>
> You might be confusing raw MD5 and md5crypt. There's roughly a 1000
> times difference in performance between the two, and md5crypt is salted.
> Unlike raw MD5, md5crypt was in fact intended for password hashing back
> when it was introduced in 1994. (Yet md5crypt has been declared
> end-of-life by its designer in 2012, since attacks on it became too fast
> and since we have better password hashing methods now.)

Hmmm... okay. md5crypt. Will do some more reading. At any rate, the $dummy$
hex form sounds better for my testing.

>> Obviously, I have no control over the hashing [...] used by my systems.
>
> If those systems are yours, then technically you have control, but you
> might not want (and know how) to deviate from the system vendors'
> supported settings and code.

Yeah, sorry, by "my systems" I meant anywhere I might have a
password-protected account. In this day and age, there are really only two
systems over which I might conceivably control the password hashing, the
two Macs at home. I have no such control over the systems at work or any of
the many websites where I have accounts. If any of them fail to use a
cryptographically strong hash, then I think I'm correct in assuming the
strength of my passwords is that much more important.

I pushed the change to random.SystemRandom and added a "do not use" warning
at the top of my README.md file.

Thanks again for your help.

Skip