Message-ID: <20150425123413.GA20496@openwall.com>
Date: Sat, 25 Apr 2015 15:34:13 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] JtR SIMD support enhancements

Lei,

On Thu, Apr 23, 2015 at 11:35:44PM +0800, Lei Zhang wrote:
> Please see the attachment for a full report.

Thanks!

This shows a mix of decent speeds (descrypt, md5crypt) and worse speeds
and even ridiculously low speeds.

None of these speeds are particularly high, though. Modern GPUs
generally do better (e.g., the md5crypt speed is on par with that of
GPUs from a few years back). But we don't have the full set of formats
supported on GPUs yet, so this may be useful - e.g., for things such as
SunMD5.

The numerically higher speeds - those in the millions of c/s - are the
ridiculously low ones, because those are fast hashes that are meant to
reach a billion c/s with proper code. But that was to be expected,
because of how we've implemented OpenMP so far. We have the same
problem on CPU; it's just even more profound on MIC due to the
differences in architecture emphasizing the effect of Amdahl's law.

You may experiment with --fork=240 and length locked e.g. to 8 chars.
The cumulative speeds for the fast hashes should be a lot higher then,
possibly reaching a billion c/s for e.g. NTLM and raw MD4.

> I did tune a bunch of OMP_SCALEs. Some them are too big by default and would drain MIC's memory if not tuned. There're just too many formats there to do a thorough check. So I just picked out some formats that have too big a OMP_SCALE (e.g. > 4096), and experimentally tuned it one by one.

You need to also tune them for best performance. In fact, tuning of
the OMP_SCALE factors and the interleave factors should be done
together.

> Benchmarking: NT [MD4 32/64]... DONE
> Raw:	3509K c/s real, 3509K c/s virtual

No MIC code for it yet?

> Benchmarking: sha256crypt, crypt(3) $5$ (rounds=5000) [SHA256 512/512 MIC 16x]... (240xOMP) DONE
> Speed for cost 1 (iteration count) of 5000
> Raw:	23141 c/s real, 111 c/s virtual
>
> Benchmarking: sha512crypt, crypt(3) $6$ (rounds=5000) [SHA512 512/512 MIC 8x]... (240xOMP) DONE
> Speed for cost 1 (iteration count) of 5000
> Raw:	6168 c/s real, 25.7 c/s virtual

> Benchmarking: Drupal7, $S$ (x16385) [SHA512 512/512 MIC 8x]... (240xOMP) DONE
> Speed for cost 1 (iteration count) of 16384
> Raw:	2039 c/s real, 8.5 c/s virtual

These are not ridiculous, but they are not great either. There ought
to be room for improvement here, such as through interleaving. And
these are actually relevant to be run on MIC.

> Benchmarking: nt2, NT [MD4 512/512 MIC 16x]... DONE
> Raw:	4907K c/s real, 4907K c/s virtual

Very little improvement relative to the "NT" format. I expected more
of a difference. Perhaps this will be seen with --fork=240. I guess
the SIMD instructions have higher latency, so impact the case of
running only one thread/core more. Need to run 4 threads/core here.

> Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 128/128 MIC 16x1]... (240xOMP) DONE
> Raw:	17976 c/s real, 75.5 c/s virtual

This is very poor speed. Needs to be investigated.

> Benchmarking: SunMD5 [MD5 512/512 MIC 16x]... DONE
> Speed for cost 1 (iteration count) of 5000
> Raw:	75.0 c/s real, 75.0 c/s virtual

No OpenMP for it yet - we should add that. Want to work on this? Not
only for MIC, but in general.

Meanwhile, it'd be curious to see how it performs with --fork=240.
There's no GPU alternative to this yet, as far as I'm aware, so it's
relevant.

> All 298 formats passed self-tests!

Cool.

Alexander
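
The Amdahl's law remark in the message can be made concrete with a small sketch. It uses the textbook formula S(n) = 1 / (s + (1 - s) / n), where s is the fraction of the work that stays serial and n is the number of threads. The 1% serial fraction and the 8-thread CPU below are hypothetical values chosen only for illustration, not measurements of JtR; the 240 threads match the "240xOMP" shown in the benchmark output.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law for a given serial fraction."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be within [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# Hypothetical: 1% of the per-batch work is serial (an assumption, not a
# measured JtR figure).
SERIAL_FRACTION = 0.01

if __name__ == "__main__":
    # 8 threads stands in for a hypothetical CPU; 240 is the MIC thread count.
    for workers in (8, 240):
        speedup = amdahl_speedup(SERIAL_FRACTION, workers)
        efficiency = speedup / workers
        print(f"{workers:>3} threads: speedup {speedup:6.2f}x, "
              f"efficiency {efficiency:6.1%}")
```

Under these assumed numbers the same serial fraction costs little at 8 threads (about 7.5x) but caps 240 threads at about 71x, which is one way to read why the effect is more profound on MIC than on CPU.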