|
Message-ID: <20150401072909.GA8621@openwall.com> Date: Wed, 1 Apr 2015 10:29:09 +0300 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: [GSoC] John the Ripper support for PHC finalists Hi Agnieszka, Thank you for your work on this project so far! On Tue, Mar 31, 2015 at 04:34:58PM +0200, Agnieszka Bielec wrote: > POMELO will never be fast on GPU because __local and __private memory > are limited It is expected that 7 out of 9 PHC finalists are GPU-unfriendly. However, we need to try anyway, and determine those PHC finalists' cost settings where attacks on them with GPUs and CPUs become same speed per-chip. With even lower cost settings (likely way lower than those PHC finalists are normally intended to be used with), GPUs may win. Where this threshold is might affect which PHC finalists are eventually selected as winners, so the sooner we obtain such results, the better. For example, a PHC finalist with a CPU vs. GPU parity threshold at 1 MB is worse (in this respect) than one with the threshold at 10 KB. Use of large amounts of memory, beyond the local memory size, does not necessarily prevent this threshold from being at multi-megabyte levels, as seen for scrypt here: http://www.openwall.com/lists/crypt-dev/2014/03/13/1 On the other hand, GPU resistance may happen to be achieved even at very low memory usage, such as bcrypt's 4 KB: http://www.openwall.com/presentations/Passwords14-Energy-Efficient-Cracking/slide-44.html http://www.openwall.com/presentations/Passwords14-Energy-Efficient-Cracking/slide-45.html As you can see, for scrypt with r=1 the threshold appears to be around 16 MB. That's with a TMTO factor of 11 (the attacker gets to tune the TMTO factor optimally for themselves). If a PHC finalist that you experiment with is not susceptible to efficient TMTO, which they try not to be, we might get something like 16/11 = ~1.5 MB for the threshold _if_ the scheme is otherwise scrypt-like. Since there are many other differences from scrypt, beyond TMTO resistance, it is expected that actual thresholds will be at very different levels, hopefully lower than this ~1.5 MB estimate. Another reason why the ~1.5 MB estimate would not actually hold true even for an scrypt-like with TMTO resistance is that when TMTO is exploited (such as in the 16 MB benchmark mentioned above) the attacker pays for the reduction in memory needs by extra computation. So when TMTO is not exploited (such as because it can't reasonably be), we get to do less computation. This suggests that the without-TMTO threshold could as well be somewhat higher than ~1.5 MB. Your task is to produce optimized code and test inputs that will let us determine exactly where this threshold is, for each PHC finalist. BTW, note that it is quite possible that different optimizations (possibly even entirely different OpenCL kernels) might be needed for different settings. For example, with scrypt, besides tuning of the TMTO factor, whether you'd optimally keep X (a buffer holding the current block) in local or global memory might vary by the value of r (for r=1 it's optimal to keep X in local memory, but starting with some threshold this is probably no longer so as it starts to limit the concurrency too much). In fact, this same issue will likely arise when you approach implementing yescrypt. It might happen that an 8th finalist, Makwa, is also GPU-unfriendly at some settings. If so, that would be an interesting and useful result. Makwa is not explicitly intended to be GPU-unfriendly. And in case you were wondering, the PHC finalist that is expected to be GPU-friendly is Parallel. As currently defined, it uses SHA-512, which is not the best fit for GPUs, but it's by far not the worst either. Thanks again, Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.