Message-ID: <20121007195453.GC5182@openwall.com>
Date: Sun, 7 Oct 2012 23:54:53 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Password hashing at scale (for Internet companies with millions of users) - YaC 2012 slides

On Fri, Oct 05, 2012 at 06:46:09PM +0530, Sayantan Datta wrote:
> I don't understand how the lack of parallelism can affect the defenders? Or
> maybe I don't understand how the defense mechanism works!!

When the hashing method lacks parallelism and you don't have any
parallelism externally either (you have only one password to hash with
one salt value - to authenticate the one user currently logging in),
you (the defender) might not be able to use your CPU fully.  However,
the attacker will be able to use an equivalent CPU more fully, by
bringing parallelism from the high level (multiple candidate passwords
to test) down to the instruction level.

For example, with PBKDF2-HMAC-SHA-1 a defender might spend 100 ms to
validate a password, with SHA-1 implemented using 32-bit integer
instructions only (SIMD can't be used because of the lack of
parallelism).  An attacker with the same type of CPU, however, will be
able to test e.g. 16 passwords per 100 ms on the same quad-core CPU, by
using four 32-bit elements per 128-bit SIMD vector and all four cores
(see the first sketch in the P.S. below).  The actual speedup might be
somewhat different, since the CPU might have a different number of
32-bit ALUs than vector units per core, and the throughput of these
might differ too.  (In the attacker's optimized code, the extra
execution units are taken advantage of by mixing instructions that
correspond to different instances of the hash function being computed,
and/or by running multiple threads per core if the CPU supports SMT.)

Now, the defender might in fact have some parallelism, due to multiple
nearly concurrent authentication attempts (for the same or for
different target accounts).  In practice, this lets defenders benefit
from multiple CPU cores and SMT, but usually not from SIMD, nor from
other optimizations to be made within one thread (such as the
instruction mixing I mentioned).  Those optimizations are tricky and
risky for defenders, and they require synchronization of different
authentication attempts (delaying the processing of some in order to
process them in parallel with others).  It is a lot better to design
the hashing method such that it has sufficient parallelism in it, and
to make use of that parallelism for defense without having to combine
the processing of different authentication attempts at a low level (the
second sketch in the P.S. shows one toy way to do that).

With possible use of GPUs for defense, the parallelism available from
nearly concurrent authentication attempts is simply insufficient even
at the busiest authentication servers - e.g., a server might receive
"only" 500 authentication attempts per 100 ms, but efficient use of a
GPU may require thousands of work-items.

I hope this is clearer now.

Alexander
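
P.S. To make the SIMD point concrete, here's a rough hypothetical
fragment (SSE2 intrinsics; an illustration I'm adding here, not code
taken from JtR or any actual cracker).  Each 32-bit lane of the 128-bit
vectors holds the state of a different candidate password's SHA-1, so
every instruction below advances four candidates at once - that's the
attacker's 4x per core, before the four cores even come into play:

#include <emmintrin.h> /* SSE2 intrinsics */

/* Rotate each of the four 32-bit lanes left by n bits. */
static inline __m128i rotl32x4(__m128i x, int n)
{
    return _mm_or_si128(_mm_slli_epi32(x, n),
        _mm_srli_epi32(x, 32 - n));
}

/* One SHA-1 round step (rounds 0..19) for four independent hashes at
 * once: returns t = ROTL(a,5) + f(b,c,d) + e + k + w, computed
 * lane-wise; the caller does the usual a/b/c/d/e state rotation.
 * w holds the four message words (one per candidate), k is the round
 * constant broadcast to all lanes. */
static inline __m128i sha1_step_ch(__m128i a, __m128i b, __m128i c,
    __m128i d, __m128i e, __m128i w, __m128i k)
{
    /* f = (b & c) | (~b & d) */
    __m128i f = _mm_or_si128(_mm_and_si128(b, c),
        _mm_andnot_si128(b, d));
    return _mm_add_epi32(e, _mm_add_epi32(rotl32x4(a, 5),
        _mm_add_epi32(f, _mm_add_epi32(w, k))));
}

The defender, with just one password and one salt, has nothing to put
into the other three lanes.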
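
And here's a toy example of building parallelism into the hashing
method itself, so the defender can use it for a single authentication
attempt.  This is only a sketch of the idea - the lane-tagged salts,
the XOR combiner, and the parameters are made up for this example, and
error handling is omitted; it is not a scheme I'm proposing for actual
use:

#include <pthread.h>
#include <string.h>
#include <openssl/evp.h>

#define P     4      /* parallelism built into the hash */
#define ITER  25000  /* per-lane iterations; total cost is P * ITER */
#define OUTSZ 20

struct lane {
    const char *pass;
    unsigned char salt[17]; /* 16-byte salt plus a lane tag byte */
    unsigned char out[OUTSZ];
};

static void *lane_hash(void *arg)
{
    struct lane *l = arg;
    PKCS5_PBKDF2_HMAC_SHA1(l->pass, (int)strlen(l->pass),
        l->salt, sizeof(l->salt), ITER, OUTSZ, l->out);
    return NULL;
}

/* P independent PBKDF2 instances, distinguished only by the lane tag
 * appended to the salt, run in P threads; their outputs are XORed into
 * the final hash.  A defender with P idle cores (or P SIMD lanes) can
 * now use them even when hashing a single password - the same
 * parallelism an attacker would extract anyway from multiple
 * candidates. */
void parallel_hash(const char *pass, const unsigned char salt[16],
    unsigned char out[OUTSZ])
{
    struct lane lanes[P];
    pthread_t tid[P];
    int i, j;

    for (i = 0; i < P; i++) {
        lanes[i].pass = pass;
        memcpy(lanes[i].salt, salt, 16);
        lanes[i].salt[16] = (unsigned char)i; /* the lane tag */
        pthread_create(&tid[i], NULL, lane_hash, &lanes[i]);
    }

    memset(out, 0, OUTSZ);
    for (i = 0; i < P; i++) {
        pthread_join(tid[i], NULL);
        for (j = 0; j < OUTSZ; j++)
            out[j] ^= lanes[i].out[j]; /* combine the lanes */
    }
}

The defender's total cost per hash stays at P * ITER iterations, but
the wall-clock latency drops by up to P when the cores are otherwise
idle - which is exactly the use of parallelism for defense described
above, with no need to combine different authentication attempts.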