passwords - Re: Don't Scratch Your Entropy

Message-ID: <CAKws9z2YU7371hEPA9xH=XLVh_x2bpRC3yKidtD9sBqPpQQ03Q@mail.gmail.com>
Date: Sat, 9 Jul 2016 14:09:57 -0400
From: Scott Arciszewski <scott@...agonie.com>
To: passwords@...ts.openwall.com
Subject: Re: Don't Scratch Your Entropy

Spot on. Entropy must describe the password pool your password exists in,
not the password itself.

An example most developers might be more familiar with: if you generate a
1000-character random printable-ASCII password, but use an LCG or MT19937
seeded with a 32-bit value to generate it, the maximum entropy you'll enjoy
is 32 bits. Naive entropy estimates (e.g. lg(95^1000), or 6569.8556 bits)
offer no insight into the resistance of your password to being guessed.
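
A minimal Python sketch of that point, assuming the generator is seeded
with a 32-bit integer (the seed width, the helper name password_from_seed
and the constants are illustrative assumptions, not anyone's real code).
CPython's random.Random is MT19937, so the whole 1000-character password
is a deterministic function of the seed:

```python
import math
import random
import string

# The 95 printable ASCII characters (space through tilde).
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "

SEED_BITS = 32  # assumption: the PRNG is seeded with a 32-bit value


def password_from_seed(seed: int, length: int = 1000) -> str:
    """Derive a password from a seed with a non-cryptographic PRNG.

    Illustration only: never generate real passwords this way.
    """
    rng = random.Random(seed)  # MT19937 in CPython
    return "".join(rng.choice(ALPHABET) for _ in range(length))


# Naive estimate: pretends every character is independent and uniform.
naive_bits = 1000 * math.log2(len(ALPHABET))

# The pool holds at most 2**SEED_BITS distinct passwords, one per seed,
# so the entropy of the generating procedure cannot exceed SEED_BITS.
effective_bits = min(naive_bits, SEED_BITS)

if __name__ == "__main__":
    print(f"naive estimate : {naive_bits:.4f} bits")
    print(f"upper bound    : {effective_bits} bits")
    # The same seed always reproduces the same password.
    print(password_from_seed(12345) == password_from_seed(12345))
```

An attacker who knows the procedure only has to enumerate seeds, not
1000-character strings. For real passwords, a CSPRNG such as Python's
secrets module is the usual choice.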

> The entropy is a function of a distribution of a random value.

Correct.
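
To make that concrete, here is a small Python sketch of Shannon entropy
in bits (log base 2), applied to the three distributions discussed in the
quoted message: a single fixed password, a uniform pick from four
candidates, and a uniform pick from two. The function and variable names
are illustrative and do not come from any library:

```python
import math


def shannon_entropy_bits(probs):
    """H = -sum(p_i * log2(p_i)), computed as sum(p_i * log2(1 / p_i))."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing (the limit of p * log p is 0).
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


single_password = [1.0]      # one known password: distribution {1}
four_way_pool = [0.25] * 4   # two fair coin tosses, four candidates
two_way_pool = [0.5] * 2     # one fair coin toss, two candidates

if __name__ == "__main__":
    for name, dist in [
        ("single password", single_password),
        ("pool of four", four_way_pool),
        ("pool of two", two_way_pool),
    ]:
        print(f"{name}: {shannon_entropy_bits(dist)} bits")
```

The function only ever sees a probability vector. It never sees the
passwords themselves, which is exactly why the same string can sit in two
pools with two different entropies.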

> (a) your password's entropy is 0

Mathematically, yes. In real terms, I would say it's a meaningless value to
measure, sort of like asking what the voltage is at a single point without
a second point to compare the potential against.

> (b) every "security expert" pronouncing "entropy", without defining the
> distribution or at very least the pool of candidate passwords, is a brain
> dead buffoon.

That's a bit harsh.

----

Do you know of any particularly egregious sources of misinformation
off-hand for this issue? If they're on Stack Exchange, for example, I can
make clarifying edits.

Scott Arciszewski
Chief Development Officer
Paragon Initiative Enterprises <https://paragonie.com>

On Sat, Jul 9, 2016 at 12:00 PM, e@...tmx.net <e@...tmx.net> wrote:

> I have a strong conviction that 99% of "security experts" do not know the
> definition of entropy. This conviction will certainly seem wildly
> deranged to you, unless you know the definition in question. So, let's
> begin with the definition, by the book.
>
> H = -sum(p_i * log(p_i))
>
> This is a function of the probability vector P = {..., p_i, ...} that
> represents a distribution of a random variable. Entropy is a characteristic
> of a distribution of a random variable. No more and no less.
>
> Let us find the entropy of your password. Your password's distribution
> vector is {1}, therefore your password's entropy is:
>
> H = -(1 * log(1)) = 0
>
> Your password's entropy is ZERO. Try log(1) in different bases on
> different computers if you are unsure.
>
> A sophisticated reader may ask: "What if we apply entropy to the password
> creation procedure?" It is doable in a seemingly reasonable way. We can model
> any password creation procedure as a random choice from a pool of candidate
> passwords, then characterize the password distribution over this pool with
> the entropy. The resulting number will tell us how much information our
> procedure represents. So what? Is this number of any use in the context of
> "password security"?
>
> Security experts usually jump in here and claim that this number
> represents the strength of the produced password. For the sake of
> argument, let's accept this claim and construct a password creation
> procedure as follows:
> the password pool is {"123", "password", "gtfr3467ujhbvcddgy6r5ddsefvvs",
> "###"},
> we toss two coins and pick one of these four according to the coin toss
> outcome.
>
> Given that the coin tosses produce uniformly distributed outcomes, the
> entropy of this procedure (in bits, i.e. with log base 2) is:
> H1 = -(1/4) * log2(1/4) * 4 = 2
>
> Now, according to mainstream computer "science" (as dictated by the NIST
> recommendations), we must label all our passwords with this entropy value:
> "123" has the entropy-based strength 2
> "password" has the entropy-based strength 2
> "gtfr3467ujhbvcddgy6r5ddsefvvs" has the entropy-based strength 2
> "###" has the entropy-based strength 2.
>
> This looks somewhat counterintuitive, and not at all like what you are
> used to thinking of when "entropy" is pronounced by a respectable "expert"
> with a straight face.
>
> Furthermore, we can define another password creation procedure:
> toss one coin and pick from the pool
> {"123","gtfr3467ujhbvcddgy6r5ddsefvvs"}.
> The entropy of this procedure is 1 (half the previous value).
> Therefore:
> the password "123" has the entropy-based strength 1.
>
> The very same password "123" that also has the strength 2. A password has
> two different strengths simultaneously. If we understand "strength" as
> the likelihood of being guessed by the attacker, then a single password
> cannot have two different values, because the password alone is the input
> argument for the hypothetical attack, not the password creation procedure.
>
> Thus, accepting the premise "the password creation entropy characterizes
> a produced password", we end up with a contradiction. Entropy is thus
> shown not to be a function of a password. However, in a slightly saner
> world I could have skipped this lengthy demonstration altogether. Entropy
> is simply defined as a function of a random distribution -- who would have
> thought that it is also NOT a function of anything else!
>
> But I am not a champion of taking the longer route to obvious conclusions.
> Matt Weir has conducted a meticulous experiment with leaked passwords to
> make the statement: "entropy based password strength measures do not
> provide any actionable information to the defender", and also: "there is no
> way to convert the notion of Shannon entropy into the guessing entropy of
> password creation policies". In other words, he gave us experimental
> evidence that entropy is irrelevant to the password strength problem.
> Of course it is irrelevant! This irrelevance is plainly written in the
> entropy definition. Matt, you could have just read the definition and said:
> "corollary, dear 'experts', don't scratch your entropy". Nevertheless,
> these experimental results are of great value for humanity, and I am glad
> we have them; the more evidence the better. In this world of imbeciles,
> even the most obvious facts require tons of "proofs", since the
> "experts" do not get along with mathematical logic very well.
>
> Still, there is more to the topic! Not only is the entropy of an accurate
> password creation model irrelevant to the problem of password strength,
> but the model itself is not possible in real-life use cases. What
> distribution are you going to apply to human-created passwords, given that
> (a) humans are incapable of randomization and (b) the pool of passwords
> they choose from is not accessible to us, not even by vivisection of the
> brain? This fact makes entropy even worse than irrelevant; it makes
> entropy ARBITRARY -- whatever distribution we assume for a human-created
> password, it is inevitably baseless, arbitrary garbage.
>
> Let's recap:
>
> The entropy is a function of a distribution of a random value.
>
> Corollary:
>
> (a) your password's entropy is 0
>
> (b) every "security expert" pronouncing "entropy", without defining the
> distribution or at very least the pool of candidate passwords, is a brain
> dead buffoon.
>
