john-dev - Re: Handling of hashes with different iteration counts

Follow @Openwall on Twitter for new release announcements and other news

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20140411082605.GA17157@openwall.com>
Date: Fri, 11 Apr 2014 12:26:05 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Handling of hashes with different iteration counts

Frank,

On Fri, Apr 11, 2014 at 12:02:24AM +0200, Frank Dittrich wrote:
> So far, most formats just had a tunable parameter "iteration count".
> Now I am dealing with django-scrypt.
> It has the tunable cost parameters known from scrypt:
>   N - the CPU cost, must be a power of 2, defaults to 2^15
>   r - the memory cost, defaults to 8

It's wrong to call scrypt's N "the CPU cost" and r "the memory cost".
Either of them affects both the CPU time cost and the memory cost, in
roughly the same way.  The difference between them is in how they affect
the memory access pattern: higher r increases the size of individual
accesses (more cache lines are accessed sequentially if r is higher),
whereas higher N increases the number of individual random accesses.

>   p - the parallelization parameter, defaults to 1

Right.

> If p is 1, then N*r = m_cost = N*r*p = t_cost. This doesn't seem to be
> very useful.

Yet it makes sense.

> For now (not committed yet, and thus no pull request), I decided to
> report N*p as t_cost and r*p as m_cost.
> (As long as p == 1, just N and r are reported; p > 1 influences both
> t_cost and m_cost. Does that make sense?)

No, this makes no sense to me.

With scrypt, the only way to tune t_cost without affecting m_cost is via p.
That's the reality.  What you're doing is reusing my suggested t_cost
and m_cost names to mean something totally different.  That's confusing.
Please don't do that!

BTW, in yescrypt, I've introduced t - a way to tune t_cost without
affecting either m_cost or p.  So with yescrypt you'd have:

m_cost = N * r
t_cost = N * r * f(t) * ((flags & YESCRYPT_PARALLEL_SMIX) ? 1 : p)

whereas with classic scrypt it is:

m_cost = N * r
t_cost = N * r * p

(This is assuming that we're not making use of p as parallelism, since
we have enough parallelism from multiple candidate passwords.)

Typically, we'll see p=1 and f(t)=1, though.  In other words, with
typical uses the defender maximizes the memory usage for the time
allotted to compute a hash, because this maximizes the area-time.
So typically m_cost and t_cost calculated as above will be equal (or
numerically proportional, if we introduce scaling to some actual units
of memory and time, respectively).

I understand we might want to group hashes not only by their full m_cost
and t_cost, but also e.g. by N and r individually, as the differences in
memory access pattern may turn into speed differences (this is why these
settings are separately tunable, after all).  However, this simply does
not fit into the m_cost and t_cost model.  If you want to include such
support, you need to include it explicitly: as ability to group hashes
by individual hash type specific parameters.  For yescrypt, you would
then also need to support grouping separately by p, t, and flags.  Do we
really want to introduce that support proactively?

Thanks,

Alexander

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.