Message-ID: <b22ffe515d839935ce076432d3c7fa41@smtp.hushmail.com>
Date: Thu, 8 Nov 2012 20:09:54 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Split kernel for OpenCL WPA-PSK

On 8 Nov, 2012, at 19:12 , magnum <john.magnum@...hmail.com> wrote:
> Using device 0: Tahiti
> Local worksize (LWS) 192, Global worksize (GWS) 196608
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
> Raw: 66197 c/s real, 137970 c/s virtual
>
> This code too does over 2.1 billion SHA1/second, but CPU post-processing nearly halves the speed (without OMP). So I'm in the process of moving all of that post-processing to GPU. It's just a couple HMACs more, so I hope to exceed 120K c/s with that in place.

Lol, while digging into that post-processing, I found out that the (CPU side) prf_512() function of wpapsk.h did four times more work than needed. It produced an 80-byte key of which only 16 bytes were needed. Just with this fix, the Tahiti figure went up another 35%:

Using device 0: Tahiti
Local worksize (LWS) 256, Global worksize (GWS) 262144
Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw: 89164 c/s real, 296207 c/s virtual

This will affect CUDA too. Still, I'm proceeding with implementing all of that post-processing on GPU.

magnum
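
The saving described above can be sketched as follows. This is a minimal illustration in Python, not the C code from wpapsk.h: the function names prf_512_full and prf_kck_only are hypothetical, and the input layout (label, a zero byte, the data, then a one-byte counter) is assumed from the IEEE 802.11i PRF definition rather than taken from the source. Each HMAC-SHA1 call yields 20 bytes, so four calls give 80 bytes, while the first 16 bytes come entirely from the first call.

```python
import hmac
import hashlib

# Assumption: the label used by the 802.11i pairwise key expansion.
LABEL = b"Pairwise key expansion"

def prf_512_full(pmk: bytes, data: bytes) -> bytes:
    """Four HMAC-SHA1 calls, 20 bytes each, 80 bytes in total."""
    out = b""
    for counter in range(4):
        msg = LABEL + b"\x00" + data + bytes([counter])
        out += hmac.new(pmk, msg, hashlib.sha1).digest()
    return out

def prf_kck_only(pmk: bytes, data: bytes) -> bytes:
    """One HMAC-SHA1 call (counter 0), truncated to the 16 bytes needed."""
    msg = LABEL + b"\x00" + data + bytes([0])
    return hmac.new(pmk, msg, hashlib.sha1).digest()[:16]
```

Under these assumptions the first 16 bytes of the full output equal the short version's output, so dropping the other three HMAC calls changes nothing for a caller that only uses those 16 bytes.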