passwords - Re: GMOs And Passwords

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <739a8936-87b9-b767-afe9-153a2090ec14@sinon.org>
Date: Thu, 25 Aug 2016 08:46:37 +0200
From: Yoha <yoha@...on.org>
To: passwords@...ts.openwall.com
Subject: Re: GMOs And Passwords


> A password does not have to be random!!!
Yes, they do. I think you meant "generated by computers".

> A password have to be UNKNOWN and UNOBTAINABLE for the attacker.
> (it is not equivalent to randomness).
That's pretty much the definition of random in cryptography. If you do
not have a background in cryptography, do consider that it the intuitive
concept of “randomness” does not map well to a mathematical definition.
This article is a pretty good summary:
https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator
.

> Look, since we know that humans are destined to fail in creating a
> random password, it should be obvious that asking them to fail is a
> stupid move.
Agreed, in the ideal case, humans should be be the one generating the
passwords. They should only select a generation process.

> I suggest making DEEPLY PERSONAL passwords.
> You can ask your users to use a bit of memories that they know they
> never shared with anyone; write a sentence about it, add some flavour;
> Job's done.
This is terrible advice, unless your goal is to crack the users' passwords.

> Actually, people already feel about the passwords as very personal
> stuff. Look, how many people use birthdays as their passwords -- this
> is clearly an attempt of PERSONALIZATION. This is a total failure
> because the birthdays are NOT personal information. Funny as it is, in
> this birthday password situation the actual mistake is on the birthday
> side not the password side. People understand passwords, people
> misunderstand birthdays! And it is a solvable problem, it is very easy
> to tell people, you are on the right path, just make it way further.
It reads like an “is-ought” fallacy, but I see your point. Correct my if
I misinterpret: people want passwords they can memorize, and you should
take that into account. Then everyone here agree with you; the
disagreement is about how to conciliate this goal with that of making
passwords unguessable.

I feel that your model is mostly intuitive, and you disregard the actual
reasons for which we want computer-generated passwords. I have written
of summary of those reason. If you are already familiar with the
concept, it might be useful for someone else, and it is also an occasion
for me to organize my thoughts.

---

First, the setup. We consider two types of attacks at once, because they
share similar patterns: in an *online attack*, an *adversary* is trying
to access an user's account (for instance) by guessing their password;
in an *offline attack*, an adversary has gotten access to discriminating
information on the password (i.e. a cryptographic digest), and is trying
to find which passwords maps to this information, knowing the mapping
process.

First, we should note that an online attack is, in practise, supposed to
be strictly harder than an offline attack (we can use guess rate limits
on the server's side). For the purpose of password generation, we can
therefore focus on the second attack model. Second, we must realize that
we cannot attain information theoretic security
<https://en.wikipedia.org/wiki/Information-theoretic_security>, since
most information is leaked (through the digest); to deter an attacker,
we will have to rely on computational security
<https://en.wikipedia.org/wiki/Computational_hardness_assumption>. In
other words, given an infinite amount of computational power, the
adversary /will/ succeed; our only hope is to assume they have a limited
amount of available computational power, and make it so that guessing
the password will take a long time.

Since the adversary will try various guesses until they find the correct
password, we have two approaches to protect the user:

  * make each guess costly
  * make the attack require many guesses

The first approach is what key derivation functions (PBKDF2, bcrypt,
scrypt, argon2) aim for. This is very effective at improving security,
but is limited by the requirement of not deterring normal uses (real
user logging in): it it takes 100 ms to test a guess, but that the
password is obvious, the adversary will still take only 100 ms to crack it.

---

So, now, we want the attack to require many guesses. Obviously, we
cannot ensure that criterion with probability 1 (the adversary could
always find the password at their first guess). Hence, we need to refine
our model to allow for a probabilistic defense. First, we better give
the adversary as much plausible superpowers as possible, so that our
model encompasses most scenarios. For this, we let the attacker know
pretty much everything about how the password was generated (by a
computer or by a human), except the password itself.

There is definitely some fuzzy limit in our definition: assuming perfect
simulation of a deterministic universe (not even required if generation
process use no quantum source of entropy¹), the adversary could simulate
the whole generation process with the correct initial conditions and
derive what password was generated. Realistically, though, we assume the
adversary is somewhat more limited than that (there are some debates
about this, when designing true random generators like /dev/random).

¹ in the absolute, most generation processes do use some quantum entropy
anyway (noise in network activity, /probably/ some stuff in neuronal
communication)

Now, what does it mean, to know the generation process, from the point
of view of the adversary? Well, now, they can use their knowledge of the
generation process to construct the probabilistic distribution of all
password, and optimize their search to minimize the expected guessing
time, by choosing the best possible order. More specifically, they will
try to guess in order decreasing likelyhood (first "123456", then
"password", ... /then "Z///暗号/7bc_*ō[…]//")/.

Since the goal of the adversary is to /minimize/ their expected guessing
time, and our goal is to /maximize/ it, you can think of it as
[minimax](https://en.wikipedia.org/wiki/Minimax). Now, it turns out that
the solution is to have a uniform distribution. That way, the order of
guesses do not effect the expected guessing time, meaning that the
adversary cannot reduce it.

Since constructing and comparing probabilistic distributions is not
always very practical (especially distributions on all strings of length
up to 64 characters
<https://nakedsecurity.sophos.com/2016/08/18/nists-new-password-rules-what-you-need-to-know/>),
we'd like a simpler metric. This is exactly what entropy, in the context
of information theory
<https://en.wikipedia.org/wiki/Entropy_%28information_theory%29>, is.
This is very convenient: is h is the entropy, an optimal adversary will
have an expected guessing time of 2^h (for b = 2). Our goal is thus:

    create a generation process whose entropy is maximum

All of this was only to give us a security metric. Of course, we do need
to take other constraints into account (ease of memorizing).

---

Now, remember that we want a secure generation process for
easy-to-memorize passwords. For the security, the passwords should be as
long as possible, and generated as uniformly as possible; for the ease
of memorizing, they should be as relatable as possible to the user. Now,
the easiest and most generic solution to this to select a dictionary of
common words in the language of the users, and then pick some uniformly
at random. However, as long as we check the entropy of the process, a
few tweaks are possible:

  * letting the user choose among several generated passwords (reroll),
    sacrificing just a few bits of the entropy
  * letting the user choose among several dictionaries (cities, sport
    teams, colors, presidents, numbers etc)
  * using a user-specific dictionary (however, such a dictionary will
    probably be pretty short, so passwords should be longer)

Just keep in mind that the adversary does know the dictionary, or the
preference of the user among several passwords.

Finally, regarding human-generated passwords, we can measure their
security very easily: let a human generate a probabilistic distribution,
and compute its entropy; repeat for many people. Well, it turns out that
we really suck at generating entropy. A tempting solutions would be to
let the user choose a long passphrase (a sentence), but it's not
automatically better. In the computer-generated case, we can make it so
that each word is generated independently from the others (or, at least,
know exactly their correlation, and adjust entropy); in the
human-generated case, words will be highly correlated, and the entropy
will probably grow /very/ slowly. For instance, some users are very
likely to use famous quotes, which are definitely at the very left of
the probability distribution for any decent adversary.


---


With this in mind, we should be able to have users pick decently secure
passwords, while avoiding horrendous policies of the like of
lowercase+uppercase+digit+non-alphanumeric.


Content of type "text/html" skipped
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.