Message-ID: <20110712081648.GA15804@openwall.com>
Date: Tue, 12 Jul 2011 12:16:48 +0400
From: Solar Designer <solar@...nwall.com>
To: crypt-dev@...ts.openwall.com
Subject: Re: Some thoughts on hard to compute hashes

Hi Anthony,

I'm sorry for not commenting on this sooner.

On Fri, Jul 01, 2011 at 06:36:11PM -0400, Anthony Ferrara wrote:
> So that started me thinking.  What can a general purpose CPU do
> efficiently that neither an ASIC nor a GPU could do in an efficient
> manner (at a reasonable cost at least)?  And the answer that I came up
> with was branching logic.  Modern CPUs have lots of die space devoted
> to branch prediction and prefetching.  Therefore, a CPU should be
> more efficient (from my limited view at least) at running a hashing
> algorithm with significant branching than either a GPU or an
> inexpensive ASIC (as the intelligent branching logic adds complexity
> and significant numbers of transistors to the design).

You're right about GPUs, but probably mostly not about ASICs.  I'd
expect an ASIC not to have to implement the equivalent of a CPU's
branch predictor/prefetcher, but instead just have a hard-coded state
machine specific to your algorithm.  On the other hand, if your code
is large and its different portions are dissimilar, then you might
force an optimal ASIC to effectively be a special-purpose CPU.  The
specific example you provided doesn't do that, though.

> Just wondering if this idea is interesting enough to do some testing
> and modeling of, or if it's been done before and is not worth it (or
> it's not worth it for other reasons).

It's been known before that branching is more unfriendly to stream
processors such as those in GPUs than it is to CPUs.  I don't know
whether it's been proposed for a KDF or a password hashing method
before.  My concern is that branching is somewhat unfriendly to CPUs
as well, so we might not be making optimal use of the CPUs' execution
units (leaving them idle while code is being fetched into L1 cache).

If we consider attacks with GPUs only (and not ASICs), then we could
fit all the code in L1 cache and still achieve GPU-unfriendliness with
branches.  We'd have to include some amount of parallelism, though
(enough for CPUs), and we'd rely on the CPU's branch predictor to be
no worse than some minimum we design for.  Also, we'd need to keep the
number of different branch targets small enough that we don't fill up
the Pentium 4 trace cache or the like.

Overall, yes, this makes sense.  We've been considering another way of
achieving GPU-unfriendliness so far - arbitrary memory accesses, like
in Blowfish.  I think the two approaches may be combined.

Alexander
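
P.S. To make the combined approach more concrete, here's a rough C
sketch - a toy for illustration only, not a proposed KDF; the S-box
size, the constants, and the per-case transforms are placeholders I
just made up.  It mixes data-dependent branching with Blowfish-style
arbitrary S-box lookups, and it runs two independent chains per round
so there's a little instruction-level parallelism for the CPU:

#include <stdint.h>

#define SBOX_WORDS 1024               /* 4 KB: fits in L1 data cache */

/* Assumed to be filled from the salt/password by setup code not
 * shown here. */
static uint32_t sbox[SBOX_WORDS];

static uint32_t round_step(uint32_t x, uint32_t y)
{
    /* Data-dependent branch: which transform runs depends on the
     * evolving state, so a SIMT stream processor sees divergent
     * paths, while a CPU's branch predictor copes well enough. */
    switch (x & 3) {
    case 0:
        x = (x << 7) | (x >> 25);               /* rotate left 7 */
        x += sbox[y & (SBOX_WORDS - 1)];        /* arbitrary lookup */
        break;
    case 1:
        x ^= sbox[(x >> 8) & (SBOX_WORDS - 1)];
        x *= 0x9E3779B1u;                       /* odd multiplier */
        break;
    case 2:
        x = ~x + (y << 3);
        x ^= sbox[(x ^ y) & (SBOX_WORDS - 1)];
        break;
    default:
        x -= sbox[(y >> 16) & (SBOX_WORDS - 1)];
        x = (x >> 11) | (x << 21);              /* rotate right 11 */
        break;
    }
    return x;
}

uint32_t toy_hash(uint32_t a, uint32_t b, unsigned long rounds)
{
    while (rounds--) {
        /* Both calls read only the old (a, b), so they're
         * independent: two chains of instruction-level parallelism
         * to keep a CPU's execution units busy. */
        uint32_t na = round_step(a, b);
        uint32_t nb = round_step(b, a);
        a = na;
        b = nb;
    }
    return a ^ b;
}

In a real design the S-box would be key-dependent like Blowfish's, and
we'd want more than four branch targets - though few enough to stay
within the trace cache, per the above.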