|
Message-ID: <CA+TsHUA0Vr0qAgh1meOgJ-BQ+fOPwJvK9puYnsGszeSKE6GO5A@mail.gmail.com> Date: Mon, 8 Oct 2012 06:44:03 +0530 From: Sayantan Datta <std2048@...il.com> To: john-users@...ts.openwall.com Subject: Re: Password hashing at scale (for Internet companies with millions of users) - YaC 2012 slides On Mon, Oct 8, 2012 at 1:24 AM, Solar Designer <solar@...nwall.com> wrote: > On Fri, Oct 05, 2012 at 06:46:09PM +0530, Sayantan Datta wrote: > > I don't understand how the lack of parallelism can affect the defenders? > Or > > maybe I don't understand how the defense mechanism works!! > > When the hashing method lacks parallelism and you don't have any > parallelism externally either (you only have one password to hash with > one salt value - to authenticate the one user currently logging in), you > (the defender) might not be able to use your CPU fully. However, the > attacker will be able to use an equivalent CPU more fully, by bringing > parallelism from high level (multiple candidate passwords to test) down > to instruction level. > > For example, with PBKDF2-HMAC-SHA-1, a defender might spend 100 ms to > validate a password - with SHA-1 implemented using 32-bit integer > instructions only (can't use SIMD because of the lack of parallelism). > However, an attacker having the same type of CPU will be able to test > e.g. 16 passwords per 100 ms on the same quad-core CPU - using four > 32-bit elements per 128-bit SIMD vector and four cores. > So the lack of parallelism affects the defenders by not letting them fully utilize the cpu resources(vector units). > > The actual speedup might be somewhat different since the CPU might have > a different number of 32-bit ALUs than vector units per core, and the > throughput of these might be different too. (In terms of attacker's > code optimization, the availability of extra execution units is taken > advantage of by mixing instructions corresponding to different > instances of the hash function being computed, or/and by running > multiple threads per core if the CPU supports SMT.) > By ALUs you mean the scalar units. Right? Maybe we could mix bcrypt with md5 and get awesome resource utilization...at least on paper. > Now, the defender might in fact have some parallelism, due to multiple > nearly concurrent authentication attempts (for the same or different > target accounts). In practice, this lets defenders benefit from > multiple CPU cores and SMT, but usually not from SIMD, nor from other > optimizations to be made within one thread (such as the instruction > mixing I mentioned). Those optimizations are tricky and risky for > defenders, and they require synchronization of different authentication > attempts (delaying processing of some in order to process them in > parallel with others). It is a lot better to design the hashing method > such that it has sufficient parallelism in it, and make use of that > parallelism for defense without having to combine processing for > different authentication attempts at a low level. > Why would the optimizations become risky for defenders? I mean , does/could those optimizations you mentioned give the attacker any added advantage? > > With possible use of GPUs for defense, the parallelism available from > nearly concurrent authentication attempts is simply insufficient even at > the busiest authentication servers - e.g., a server might receive "only" > 500 authentication attempts per 100 ms, but efficient use of a GPU may > require thousands of work-items. > > I hope this is clearer now. > > Alexander > Regards, Sayantan
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.