Message-ID: <20120501061416.GA10734@openwall.com>
Date: Tue, 1 May 2012 10:14:16 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Lukas - status report #2

Lukas -

On Tue, May 01, 2012 at 06:13:01AM +0200, Lukas Odzioba wrote:
> I would be more happy to see 80-90k, previously (just pmk calculation
> - most time consuming) we had 90% of hashcat's speed. For now
> difference will be ever worst for super fast gpus and slow cpu.
> Besides cpu side code utilizes only 1 core. Do you have any ideas to
> get around it other than MPI? On the other side we could move all code
> to second kernel gpu.

So you invoke the SHA-1 compression function about 20 times per key in
wpapsk_postprocess(). This is about 1/400 of total, yet it causes
significant slowdown when your GPU code is optimized, your CPU code is
not optimized, and you run these sequentially rather than in parallel.

Besides the current "just use OpenMP" hack, you can try these
approaches:

1. Include wpapsk_postprocess() into your GPU kernel. I don't see why
you're mentioning a second kernel. You already happen to have
wpapsk_kernel.cl separate from pbkdf2_kernel.cl (even though I think we
could have a shared PBKDF2 with HMAC-SHA1 kernel, if it were not for
this new WPA specific detail). Ditto for CUDA.

2. Interleave the GPU and CPU code invocations by invoking the GPU
kernel multiple times from a single crypt_all() call, for different
subsets of the total set of keys. The first GPU kernel invocation won't
overlap with any on-CPU work, and the last invocation of the on-CPU
postprocessing won't overlap with any on-GPU work - but the rest will.
So you'll need to keep the number of chunks large enough (e.g., 10) -
and have it tunable. (Maybe we need to enhance the formats interface to
allow for async processing across crypt_all() call boundaries.)

3. Optimize the on-CPU postprocessing. Replace the calls into OpenSSL
with uses of our SSE2+ intrinsics implementations of SHA-1 and MD5.
Implement HMAC on your own.

#2 and #3 above may be combined. But #1 is probably better. #2 alone
is probably the easiest to implement now. You may keep the OpenMP stuff
too.

Alexander
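
As a rough illustration of why approach #2 wants a fairly large chunk
count, here is a minimal cost-model sketch. It is not code from the
John the Ripper tree: the function names are hypothetical, and it
assumes that GPU and CPU work both split evenly across chunks and that
per-launch overhead is negligible (in practice that overhead is one
reason to keep the chunk count tunable).

```c
#include <stdio.h>

/*
 * Hypothetical cost model for approach #2 (not John the Ripper code).
 * gpu_total and cpu_total are the times needed for ALL keys of one
 * crypt_all() call. Assumptions: work splits evenly across chunks and
 * kernel launch / transfer overhead is ignored.
 */
double sequential_time(double gpu_total, double cpu_total)
{
    /* GPU kernel first, then all CPU postprocessing: no overlap. */
    return gpu_total + cpu_total;
}

double interleaved_time(double gpu_total, double cpu_total, int chunks)
{
    double g = gpu_total / chunks; /* GPU time per chunk */
    double c = cpu_total / chunks; /* CPU postprocessing time per chunk */
    double step = g > c ? g : c;   /* an overlapped step lasts as long as
                                      the slower of the two sides */

    /*
     * The first GPU chunk runs alone, then (chunks - 1) steps overlap
     * GPU chunk i+1 with CPU postprocessing of chunk i, and finally the
     * last CPU chunk runs alone.
     */
    return g + (chunks - 1) * step + c;
}

/* Print the modeled time for a few chunk counts. */
void print_table(double gpu_total, double cpu_total)
{
    static const int counts[] = { 1, 2, 5, 10, 20 };
    size_t i;

    printf("sequential: %.3f\n", sequential_time(gpu_total, cpu_total));
    for (i = 0; i < sizeof(counts) / sizeof(counts[0]); i++)
        printf("chunks=%2d: %.3f\n", counts[i],
               interleaved_time(gpu_total, cpu_total, counts[i]));
}
```

Under this model, with gpu_total = 1.0 and cpu_total = 0.5, a single
chunk costs 1.5 (same as running sequentially), while 10 chunks cost
about 1.05: only the first GPU chunk and the last CPU chunk remain
non-overlapped.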