Message-Id: <BA7E1376-5788-45DD-B851-8BFD9349785A@gmail.com>
Date: Sat, 25 Apr 2015 23:10:05 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] JtR SIMD support enhancements

> On Apr 25, 2015, at 8:34 PM, Solar Designer <solar@...nwall.com> wrote
>
> On Thu, Apr 23, 2015 at 11:35:44PM +0800, Lei Zhang wrote:
>> Please see the attachment for a full report.
>
> Thanks!  This shows a mix of decent speeds (descrypt, md5crypt) and
> worse speeds and even ridiculously low speeds.  None of these speeds
> are particularly high, though.  Modern GPUs generally do better (e.g.,
> the md5crypt speed is on par with that of GPUs from a few years back).
> But we don't have the full set of formats supported on GPUs yet, so this
> may be useful - e.g., for things such as SunMD5.
>
> The numerically higher speeds - those in the millions of c/s - are the
> ridiculously low ones, because those are fast hashes that are meant to
> reach a billion c/s with proper code.  But that was to be expected,
> because of how we've implemented OpenMP so far.  We have the same
> problem on CPU; it's just even more profound on MIC due to the
> differences in architecture emphasizing the effect of Amdahl's law.
> You may experiment with --fork=240 and length locked e.g. to 8 chars.
> The cumulative speeds for the fast hashes should be a lot higher then,
> possibly reaching a billion c/s for e.g. NTLM and raw MD4.
>
>> I did tune a bunch of OMP_SCALEs. Some of them are too big by default and would drain MIC's memory if not tuned. There're just too many formats there to do a thorough check. So I just picked out some formats that have too big an OMP_SCALE (e.g. > 4096), and experimentally tuned it one by one.
>
> You need to also tune them for best performance.  In fact, tuning of the
> OMP_SCALE factors and the interleave factors should be done together.
>
>> Benchmarking: NT [MD4 32/64]... DONE
>> Raw:	3509K c/s real, 3509K c/s virtual
>
> No MIC code for it yet?
>
>> Benchmarking: sha256crypt, crypt(3) $5$ (rounds=5000) [SHA256 512/512 MIC 16x]... (240xOMP) DONE
>> Speed for cost 1 (iteration count) of 5000
>> Raw:	23141 c/s real, 111 c/s virtual
>>
>> Benchmarking: sha512crypt, crypt(3) $6$ (rounds=5000) [SHA512 512/512 MIC 8x]... (240xOMP) DONE
>> Speed for cost 1 (iteration count) of 5000
>> Raw:	6168 c/s real, 25.7 c/s virtual
>
>> Benchmarking: Drupal7, $S$ (x16385) [SHA512 512/512 MIC 8x]... (240xOMP) DONE
>> Speed for cost 1 (iteration count) of 16384
>> Raw:	2039 c/s real, 8.5 c/s virtual
>
> These are not ridiculous, but they are not great either.  There ought to
> be room for improvement here, such as through interleaving.  And these
> are actually relevant to be run on MIC.
>
>> Benchmarking: nt2, NT [MD4 512/512 MIC 16x]... DONE
>> Raw:	4907K c/s real, 4907K c/s virtual
>
> Very little improvement relative to the "NT" format.  I expected more of
> a difference.  Perhaps this will be seen with --fork=240.  I guess the
> SIMD instructions have higher latency, so impact the case of running
> only one thread/core more.  Need to run 4 threads/core here.
>
>> Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 128/128 MIC 16x1]... (240xOMP) DONE
>> Raw:	17976 c/s real, 75.5 c/s virtual
>
> This is very poor speed.  Needs to be investigated.

It'll take me some time walking through all of the above. I'll report back later.

>> Benchmarking: SunMD5 [MD5 512/512 MIC 16x]... DONE
>> Speed for cost 1 (iteration count) of 5000
>> Raw:	75.0 c/s real, 75.0 c/s virtual
>
> No OpenMP for it yet - we should add that.  Want to work on this?  Not
> only for MIC, but in general.

Sure, I can work on this.


Lei
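
To illustrate the Amdahl's law point in the quoted reply (why the fast hashes scale so poorly across 240 OpenMP threads), here is a minimal sketch of the textbook formula, speedup = 1 / ((1 - p) + p / n). The function name and the parallel fractions below are hypothetical and chosen only for illustration; they are not measurements from the benchmark report.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical parallel fractions (assumptions, not measured values).
# 8 threads stands in for a typical CPU, 240 for the MIC run above.
for p in (0.90, 0.99, 0.999):
    cpu = amdahl_speedup(p, 8)
    mic = amdahl_speedup(p, 240)
    print(f"p={p}: 8 threads -> {cpu:.1f}x, 240 threads -> {mic:.1f}x")
```

Under these assumed fractions, a serial share that costs little at 8 threads caps the gain well below 240x at 240 threads. That is consistent with the suggestion to try --fork=240, where each process runs independently instead of sharing one OpenMP-parallelized loop.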