Message-ID: <20150425192015.GC21470@openwall.com>
Date: Sat, 25 Apr 2015 22:20:15 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

Agnieszka,

On Wed, Apr 15, 2015 at 04:41:33PM +0200, Agnieszka Bielec wrote:
> I've modified my previous algorithm and now it computes one hash on 4 GPU units.
> it's better coalescing now

"4 GPU units" sounded weird and confusing to me. I finally took a look at commit d259f3be880e9f55e2424d9819cf56e2987302fe. It appears that you're trying to simulate 4x SIMD by separate work-items. Does this work reliably? I think you'd need a barrier before you can combine the 4 intermediate results into one hash output, and you don't appear to have that. I am actually surprised this works at all for you. Am I missing something?

If you put a hash of a known password in a text file, and put the password somewhere down a wordlist, and run "john" with pomelo-opencl against that, will it crack the hash? I expect it won't, since the code looks broken to me.

You don't actually have any test vectors in opencl_pomelo_fmt_plug.c, so when you --test you don't actually test, right? :-( You got to fix that, or you might have major bugs go unnoticed for months while you "optimize".

I think you should use a vector data type instead of the separate work-items, much like the AVX2 code on CPU uses __m256i and doesn't rely on the compiler's auto-vectorization. Granted, auto-vectorization is much more common with OpenCL than with C, but you can help the compiler by doing a part of it explicitly anyway (and then it'd be the compiler's job to combine these narrow SIMD portions into possibly wider SIMD that the hardware might need). I think you need ulong4 there:

https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/vectorDataTypes.html

Also, I think you're over-using "unsigned long", such as for loop variables. This may have performance impact on GPUs. You should prefer "uint" except where a wider type is actually needed.

You may also try "#pragma unroll" and "#pragma unroll N" before loops, see e.g. cryptsha512_kernel_GPU.cl for examples.

Alexander
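
As a rough illustration of the explicit-vector suggestion above, here is a minimal sketch in plain C rather than OpenCL C. Everything in it is an assumption made for illustration: the `lane4` struct merely stands in for OpenCL's built-in `ulong4`, the names `rotl4`, `toy_round`, `fold4` and `toy_self_test` are hypothetical, and the mixing step is made up and is not POMELO. It shows one "work-item" holding all four 64-bit lanes itself, so combining them needs no barrier, with 32-bit loop counters and a self-test against a scalar reference in the spirit of having real test vectors.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C stand-in for OpenCL's built-in ulong4 (hypothetical type name). */
typedef struct { uint64_t s[4]; } lane4;

/* Rotate every 64-bit lane left by r bits; r must be in 1..63. */
static lane4 rotl4(lane4 v, unsigned r)
{
    lane4 out;
    assert(r > 0 && r < 64);
    for (unsigned i = 0; i < 4; i++)   /* 32-bit counter, not unsigned long */
        out.s[i] = (v.s[i] << r) | (v.s[i] >> (64 - r));
    return out;
}

/* Lane-wise addition (wraps modulo 2^64). */
static lane4 add4(lane4 a, lane4 b)
{
    lane4 out;
    for (unsigned i = 0; i < 4; i++)
        out.s[i] = a.s[i] + b.s[i];
    return out;
}

/* Lane-wise XOR. */
static lane4 xor4(lane4 a, lane4 b)
{
    lane4 out;
    for (unsigned i = 0; i < 4; i++)
        out.s[i] = a.s[i] ^ b.s[i];
    return out;
}

/* Made-up mixing loop for illustration only; this is NOT POMELO. */
static lane4 toy_round(lane4 state, lane4 key, unsigned rounds)
{
    for (unsigned i = 0; i < rounds; i++) {
        state = add4(state, key);
        state = xor4(state, rotl4(state, 17));
    }
    return state;
}

/* Combine the four lanes into one word. All lanes live in the same
 * "work-item" here, so no cross-work-item barrier is involved. */
static uint64_t fold4(lane4 v)
{
    return v.s[0] ^ v.s[1] ^ v.s[2] ^ v.s[3];
}

/* Scalar reference for a single lane, used only by the self-test. */
static uint64_t toy_round_scalar(uint64_t state, uint64_t key, unsigned rounds)
{
    for (unsigned i = 0; i < rounds; i++) {
        state += key;
        state ^= (state << 17) | (state >> (64 - 17));
    }
    return state;
}

/* Returns 1 if the 4-lane code agrees with the scalar reference. */
static int toy_self_test(void)
{
    lane4 st  = {{1, 2, 3, 4}};
    lane4 key = {{5, 6, 7, 8}};
    uint64_t expect = 0;
    for (unsigned i = 0; i < 4; i++)
        expect ^= toy_round_scalar(st.s[i], key.s[i], 8);
    return fold4(toy_round(st, key, 8)) == expect;
}
```

In an actual kernel the per-lane loops would disappear, since operators on `ulong4` already act lane-wise; the helper functions here exist only because standard C has no such type.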