Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Wed, 22 Jan 2014 05:58:17 +0400
From: Solar Designer <>
Subject: Re: scrypt parallelism vs. area-time vs. TMTO resistance

On Thu, Dec 19, 2013 at 06:08:53AM +0400, Solar Designer wrote:
> Setting scrypt's p > 1 introduces parallelism at (almost) the highest
> level.  This has the advantage of needing to synchronize the threads
> just once (before the final PBKDF2), but it results in much lower
> area-time cost for the attacker than we could achieve by having the
> parallelism at a slightly lower level, inside SMix.
> Specifically, in SMix's first loop, we may have the threads each write
> to its own portion of V, and in SMix's second loop we may have them all
> read from the entire V.  This requires that we synchronize the threads
> just one more time: between the two loops in SMix.
> For classic scrypt, the running time is 2*N (assuming that we have
> enough computing resources to make full use of the parallelism, and
> omitting r from this discussion for now).  For the revision proposed
> above, the running time is only 2*N/p: the first loop fills all N
> elements of V in N/p time, and the second loop performs a total of N
> lookups (as usual) and accesses 63% of V elements (as usual) in N/p time.
> Now given the same p and the same target running time, we can set N to
> be p times higher.
> If for classic scrypt the area-time cost of attack is
> O(old_N^2 * p * r^y) [1], then for the proposed revision it is
> O(new_N^2 * r^y).

I was wrong.  I guess I temporarily forgot that area*time is not the
same as memory*computation.  The attacker cares about how much memory is
tied up for how long.  When we increase parallelism at any level within
a hash computation, it may be made use of to reduce the total running
time (the "how long"), even if the number of memory lookups and crypto
primitive computations stays the same.  For high N, small p, and an
attacker with ASICs, the area cost of extra crypto cores is negligible.

Thus, for that scenario, the proposed revision's AT cost is only
O(new_N^2 / p * r^y).  There was no "/ p" for classic scrypt's SMix AT
cost because there's no parallelism in classic scrypt at that level.
There was "* p" for classic scrypt's AT cost because either that much
more memory is tied up for the same duration or the original (for p=1)
amount of memory is tied up for p times longer (because that smaller
memory allocation has to be used and reused sequentially, regardless of
how many more free crypto cores the attacker could have), or it can be
some inbetween combination.

> If we maintain the same running time, then
> new_N = old_N * p, giving the new area-time cost O(old_N^2 * p^2 * r^y).

With the above correction, this changes to O(old_N^2 * p * r^y), which
is the same as classic scrypt's.

> This is a major improvement: we went from p to p^2 for a factor in the
> attacker's cost

Unfortunately, we did not, at least not for the most capable attackers
with ASICs.

However, the change may still be helpful in that it limits the
attacker's flexibility.  Only attackers with sufficient supply of free
crypto cores maintain the classic scrypt's attack cost, and even those
attackers are forced to keep more memory per (faster) instance to
achieve good efficiency (unlike with classic scrypt, where they have
more of a choice).  This more-memory-per-scrypt-instance setup might
turn out to be slightly less cost-effective than the smaller-memory
setup (classic scrypt's), such as because of increased average distance
between crypto cores and the now-larger memory and because of extra
routing and logic required to deliver data to any of the cores rather
than only to a core tied to a specific memory block.  If so, there's an
attack cost increase by a small factor.  And if not, there's no attack
cost decrease (since that sort of setup was possible anyway).

> While p > 1 is currently most relevant to (offline) uses of the KDF, we
> may also look at what happens on authentication servers doing password
> hashing and needing to achieve a certain throughput.  With classic
> scrypt, going from p = 1 (with N and r tuned to achieve the desired
> throughput) to p > 1 (with N decreased accordingly) weakens the hashes
> (in area-time terms) by a factor of p.  With the proposed revision, this
> (theoretically) does not affect security, so the decision may be made
> based on other criteria (including possibly providing lower latency for
> the users by making full use of the available computing resources even
> when the server is under less than the maximum supported load).  This
> may be most relevant for possible defensive use of GPUs and such, where
> the amount of parallelism available from concurrent authentication
> attempts may be way below what's needed to optimally utilize the device,
> even on busy authentication servers (need tens of thousands of
> concurrent threads for current GPUs, and this keeps growing).

Unfortunately, the same AT cost reduction by a factor of p happens for
the proposed revision.  Thus, as long as we're maximizing the die area
via the memory portion (rather than via the crypto cores), the defender
should prefer relatively few fast cores (to keep N or N/p high) over
many slower cores (which would require lower N or N/p, for the classic
and revised scrypt, respectively).  If we're considering attackers with
ASICs, then it's only when die area consumed by the crypto cores
themselves would be comparable to that consumed by memory that it
becomes reasonable to go for using many cores per (possibly revised)
scrypt instance instead of running many independent parallel instances
(if we have the parallelism from concurrent authentication attempts).

> Sounds great so far?  It does to me.

It was too good to be true, even.

Didn't anyone spot the error?


Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.