Date: Wed, 26 Sep 2018 15:35:33 +0200
From: Solar Designer <>
Subject: Re: pwqgen vs diceware

On Tue, Sep 25, 2018 at 10:00:43PM -0400, John Roman wrote:
> I'm certainly not here to start a flame war, but I had wondered casually
> which would be most suitable for a user generating a password:  pwqgen,
> or diceware?  

Of course, I prefer pwqgen, but I'm biased.

> what is the random dictionary used for pwqgen?

It's a list of 4096 English words found in the file wordset_4k.c in the
passwdqc source tree.  That file starts with the following comment:

 * 4096 English words for generation of easy to memorize random passphrases.
 * This list comes from the MakePass passphrase generator developed by
 * Dianelos Georgoudis <dianelos at>, which was announced on
 * sci.crypt on 1997/10/24.  Here's a relevant excerpt from that posting:
 * > The 4096 words in the word list were chosen according to the following
 * > criteria:
 * >    - each word must contain between 3 and 6 characters
 * >    - each word must be a common English word
 * >    - each word should be clearly different from each other
 * >      word, orthographically or semantically
 * >
 * > The MakePass word list has been placed in the public domain
 * At least two other sci.crypt postings by Dianelos Georgoudis also state
 * that the word list is in the public domain, and so did the web page at:
 * which existed until 2006 and is available from the Wayback Machine as of
 * this writing (March 2010).  Specifically, the web page said:
 * > The MakePass word list has been placed in the public domain.  To download
 * > a copy click here.  You can use the MakePass word list for many other
 * > purposes.
 * "To download a copy click here" was a link to free/makepass.lst, which is
 * currently available via the Wayback Machine:
 * Even though the original description of the list stated that "each word
 * must contain between 3 and 6 characters", there were two 7-character words:
 * "England" and "Germany".  For use in passwdqc, these have been replaced
 * with "erase" and "gag".
 * The code in passwdqc_check.c and passwdqc_random.c makes the following
 * assumptions about this list:
 * - there are exactly 4096 words;
 * - the words are of up to 6 characters long;
 * - although some words may contain capital letters, no two words differ by
 * the case of characters alone (e.g., converting the list to all-lowercase
 * would yield a list of 4096 unique words);
 * - the words contain alphabetical characters only;
 * - if an entire word on this list matches the initial substring of other
 * word(s) on the list, it is placed immediately before those words (e.g.,
 * "bake", "baker", "bakery").
 * Additionally, the default minimum passphrase length of 11 characters
 * specified in passwdqc_parse.c has been chosen such that a passphrase
 * consisting of any three words from this list with two separator
 * characters will pass the minimum length check.  In other words, this
 * default assumes that no word is shorter than 3 characters.
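
The assumptions listed in that comment can be checked mechanically.  Below
is a minimal sketch in Python, not code from passwdqc: the function name
check_wordlist() and its parameters are hypothetical, and comparing words
case-insensitively for the prefix rule is an assumption of this sketch.

```python
def check_wordlist(words, expected_count=4096, max_len=6):
    """Return a list of human-readable violations (empty if there are none)."""
    problems = []
    if len(words) != expected_count:
        problems.append(f"expected {expected_count} words, got {len(words)}")
    lowered = [w.lower() for w in words]
    # No two words may differ by the case of characters alone.
    if len(set(lowered)) != len(lowered):
        problems.append("two words differ by case alone (or are identical)")
    for w in words:
        if len(w) > max_len:
            problems.append(f"{w!r} is longer than {max_len} characters")
        if not (w.isascii() and w.isalpha()):
            problems.append(f"{w!r} is not purely alphabetical")
    # A word that is a prefix of other words must come immediately before
    # them, e.g. "bake", "baker", "bakery".
    position = {w: i for i, w in enumerate(lowered)}
    for j, w in enumerate(lowered):
        for k in range(1, len(w)):
            prefix = w[:k]
            i = position.get(prefix)
            if i is None:
                continue
            if i > j or not all(x.startswith(prefix) for x in lowered[i:j]):
                problems.append(f"{prefix!r} is not immediately before {w!r}")
    return problems


# Shortest three-word passphrase: three 3-character words plus two separators.
MIN_WORD_LEN = 3
shortest_three_word_passphrase = 3 * MIN_WORD_LEN + 2

print(check_wordlist(["bake", "baker", "bakery", "cat"], expected_count=4))
print(shortest_three_word_passphrase)
```

The last line reproduces the reasoning behind the default minimum length
of 11 characters.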

> are they similar?

No, they're not very similar.  passwdqc (pwqgen) uses 4096 words of
lengths 3 to 6.  Diceware uses 7776 "words" of a wider variety of
lengths, and some are not actually dictionary words (e.g., digits).

There are two major versions of Diceware - an older one, and a newer one
introduced by EFF.  The EFF one is better in some ways.

As you note, passwdqc (pwqgen) typically alters the case of the first
letter of words, which results in 8192 possibilities per word (13 bits
instead of 12).  Each punctuation or digit separator between words
multiplies this by another 16 (4 more bits), so a word with its adjacent
separator character encodes 131072 possibilities (17 bits).
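
As a quick check of this arithmetic, here is a small Python sketch.  The
constant names are only for illustration; the values are the ones stated
above.

```python
import math

WORDS = 4096        # size of the passwdqc word list
SEPARATORS = 16     # possible separator characters, per the text above

bits_word = math.log2(WORDS)                            # plain word
bits_word_case = math.log2(WORDS * 2)                   # word with random case
bits_separator = math.log2(SEPARATORS)                  # one random separator
bits_word_case_sep = math.log2(WORDS * 2 * SEPARATORS)  # all of the above

print(bits_word, bits_word_case, bits_separator, bits_word_case_sep)
# prints: 12.0 13.0 4.0 17.0
```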

> as pwqgen generated phrases increase in size, so to do they increase in
> difficulty to remember.  this difficulty is bolstered by the strength
> imparted by pwqgens random inclusion of case, numerics, and specials.

passwdqc (pwqgen) chooses to alter the case of the first letter of words
or not, and to include the random separator characters or not, depending
on the amount of randomness you try to encode.  For example, this
uses neither random case nor random separators:

$ for n in `seq 1 5`; do pwqgen random=24; done

("Suez" was already capitalized in the input wordlist, and the separator
is always a "-".)

This encodes 2 bits more by starting to use the case:

$ for n in `seq 1 5`; do pwqgen random=26; done

This encodes another 4 bits (6 bits more than the original) by also
randomizing the separator:

$ for n in `seq 1 5`; do pwqgen random=30; done

At these requested bit sizes, the alternative to using random case and
separators would have been to add a third word, which I think would have
been harder to memorize.  But we may reasonably go for it when encoding
even more randomness:

$ for n in `seq 1 5`; do pwqgen random=36; done

("Taiwan" and "Rhine" were that way in the wordlist.)

Again, adding 3 more bits starts to randomize the case:

$ for n in `seq 1 5`; do pwqgen random=39; done

Adding another 8 bits (11 more bits total) randomizes the separators:

$ for n in `seq 1 5`; do pwqgen random=47; done

And the above is passwdqc's current default, encoding 47 bits in lengths
ranging from 11 to 20.

Unfortunately, 48 would be ambiguous: it could request either 4 words
without case toggling and without random separators, or 3 words with one
more (trailing) separator.  pwqgen does the latter, which actually
encodes 51 bits in 12 to 21 characters and is much shorter than adding a
fourth word.
You can actually get four words starting with 52 bits, and that also
includes case toggling:

$ for n in `seq 1 5`; do pwqgen random=52; done

Also use random separators, and you can encode 64 bits in four words.
For most current purposes, you won't need to use more than this:

$ for n in `seq 1 5`; do pwqgen random=64; done

Note that 5 words without case toggling and without separators would
only give you 60 bits.

And the maximum you can currently use is 85 bits encoded in 5 words,
with all the case toggling and specials:

$ for n in `seq 1 5`; do pwqgen random=85; done

Note that 7 words without case toggling and without separators would
only give you 84 bits.
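
The bit counts in the examples above follow a simple model: 12 bits per
word, 1 more bit per word when the case is randomized, and 4 bits per
random separator.  The Python sketch below reproduces them.  It is only a
model of the numbers in this message, not pwqgen's actual code, and the
function name and parameters are made up for illustration.

```python
def pwqgen_bits(words, toggle_case=False, random_separators=0):
    """Bits encoded under the model above (not pwqgen's real algorithm)."""
    bits = 12 * words               # 4096 = 2**12 choices per word
    if toggle_case:
        bits += words               # 1 bit per word for first-letter case
    bits += 4 * random_separators   # 16 = 2**4 choices per separator
    return bits


# (words, random case, random separators) for each example above
examples = [
    (2, False, 0), (2, True, 0), (2, True, 1),
    (3, False, 0), (3, True, 0), (3, True, 2), (3, True, 3),
    (4, True, 0), (4, True, 3), (5, True, 5),
]
print([pwqgen_bits(*e) for e in examples])
# prints: [24, 26, 30, 36, 39, 47, 51, 52, 64, 85]

# Plain words only, for comparison
print(pwqgen_bits(5), pwqgen_bits(7))
# prints: 60 84
```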

> diceware offers high entropy passphrases at a low entry cost for the
> user, but is a shorter 3 word pwqgen passphrase just as strong as a
> longer 6 word passphrase from diceware?  entropically they seem
> identical.

No, and no.

A 3 word pwqgen passphrase encodes 36 (no random toggling and
separators) to 51 bits (with random toggling, separators, and a trailing
character).  By default, it's 47.

A 6 word Diceware passphrase encodes ~77 bits.
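
The Diceware figure comes from the list size: each word is one of 7776
(6 to the power of 5, i.e. five dice rolls), chosen uniformly.  A short
Python sketch of the comparison, with variable names chosen only for
illustration:

```python
import math

DICEWARE_WORDS = 7776                        # 6**5, five dice rolls per word

bits_per_word = math.log2(DICEWARE_WORDS)    # about 12.9 bits
six_words = 6 * bits_per_word                # about 77.5 bits

# Diceware words needed to reach pwqgen's default of 47 bits
words_for_47 = math.ceil(47 / bits_per_word)

print(round(bits_per_word, 2), round(six_words, 1), words_for_47)
```

Under these numbers, a 4 word Diceware passphrase (about 51.7 bits) is
roughly comparable to the 3 word pwqgen passphrases described above.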

Do you find 6 word Diceware easier to memorize and quick enough to type?
If so, go for it (or use fewer words, see below).

> pwqgwen offers greater possibility of acceptance from legacy password
> systems that take fewer than 30 characters, but increases the potential
> that a character might be suspect or unsupported.  Diceware in turn can
> be adulterated with a case, numeric, or special as needed, but might see
> length issues.


> pwqgen states its capable of
> 24-85 for entropy.  diceware seems to appreciate ~77 bits of entropy.


> ive been testing entropy from this page:
> and here:
> its worth noting rumkins calculation for entropy seems a little high...a
> 77 bit entropy phrase at diceware will yield a 200 entropy phrase, for
> example...I wonder too what the appropriate entropy calculation is?

You can't measure entropy of one password just by looking at it, nor by
having a program look at it.  Entropy of a random variable is a property
of its distribution, not a property of any one value of it.

Any attempts to estimate entropy by looking at a password are thus at
best trying to make assumptions about the distribution, and those won't
perfectly match distribution of passwords in the real world, nor that of
a particular password generator.

Moreover, Shannon entropy is not an appropriate metric for password
strength (even if we could measure it accurately), except for passwords
that came from a uniform distribution.  So generated passwords are
pretty much the only case where this metric is applicable, and for those
we're lucky to be able to calculate the entropy from the generator's
design and parameters - like we did above.  These are the numbers you
should use if you assume that your adversary knows (or can guess with
almost no effort) that you use a password generator and with what word
list and parameters.
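
To illustrate both points, here is a hedged Python sketch.  It computes
Shannon entropy from a distribution (never from a single password), and
contrasts a uniform generator with a made-up skewed distribution where one
password is chosen half of the time.  The helper names and the skewed
example are hypothetical.

```python
import math


def shannon_entropy(groups):
    """Entropy in bits; groups is a list of (probability, count) pairs."""
    return -sum(n * p * math.log2(p) for p, n in groups)


def min_entropy(groups):
    """-log2 of the most likely value: what the first guess is up against."""
    return -math.log2(max(p for p, _ in groups))


# Uniform generator with 2**47 equally likely outputs (pwqgen's default size)
uniform = [(2.0 ** -47, 2 ** 47)]

# Made-up skewed case: one password with probability 1/2, and 2**20 others
# sharing the remaining 1/2 equally
skewed = [(0.5, 1), (2.0 ** -21, 2 ** 20)]

print(shannon_entropy(uniform), min_entropy(uniform))   # 47.0 47.0
print(shannon_entropy(skewed), min_entropy(skewed))     # 11.0 1.0
```

In the skewed case the Shannon entropy is 11 bits, yet a single guess
succeeds half of the time, which is why the metric is only meaningful for
uniformly generated passwords.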

Also note that for most use cases, going for pwqgen's maximum or
Diceware's recommended 6 words is overkill.  For example, there's little
point in doing that for online service passwords if you use a unique
password per service and plan on changing the password should you become
aware that it leaked.  It makes more sense to do it for data encryption,
such as on your PGP or SSH key or on your encrypted filesystem.  Modern
implementations of those are switching to modern KDFs with high enough
parameters, which should let you use simpler passphrases (albeit perhaps
not as simple as those I'd recommend you use as unique passwords for
online services).  However, you'd need to consider what KDF and with what
parameters your software uses (or else use a passphrase that is 1-2 words
longer).  Here's a relevant answer I gave a few days ago:

I hope this helps, and I'm sorry if it's more than you wanted to know.

