Message-ID: <CAKGDhHUG8dzXfSxLwFGBo61qhzoawz=uixKEwn_bfiDoY_G7jQ@mail.gmail.com>
Date: Sun, 26 Apr 2015 15:35:02 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

My code is on the "vectors" branch.

2015-04-26 15:28 GMT+02:00 Agnieszka Bielec <bielecagnieszka8@...il.com>:
> 2015-04-25 22:28 GMT+02:00 Agnieszka Bielec <bielecagnieszka8@...il.com>:
>> 2015-04-25 21:20 GMT+02:00 Solar Designer <solar@...nwall.com>:
>>> I think you should use a vector data type instead of the separate
>>> work-items, much like the AVX2 code on CPU uses __m256i and doesn't
>>> rely on the compiler's auto-vectorization. Granted, auto-vectorization
>>> is much more common with OpenCL than with C, but you can help the
>>> compiler by doing a part of it explicitly anyway (and then it'd be the
>>> compiler's job to combine these narrow SIMD portions into possibly wider
>>> SIMD that the hardware might need). I think you need ulong4 there:
>>>
>>> https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/vectorDataTypes.html
>>
>> Sorry, I didn't test this. I have now noticed that in function H,
>> index_global and index_local are always divisible by 4, so I can use
>> this with coalescing.
>
> Results with the version using vectors:
>
> [a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=1
> Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]... Device 1: Tahiti [AMD Radeon HD 7900 Series]
> Local worksize (LWS) 64, global worksize (GWS) 4096
> DONE
> Speed for cost 1 (N) of 2, cost 2 (r) of 2
> Raw: 87487 c/s real, 9011K c/s virtual
>
> [a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=5
> Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]... Device 5: GeForce GTX TITAN
> Local worksize (LWS) 64, global worksize (GWS) 4096
> DONE
> Speed for cost 1 (N) of 2, cost 2 (r) of 2
> Raw: 59650 c/s real, 59650 c/s virtual
>
> Results with the previous version:
>
> [a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=5
> Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]... Device 5: GeForce GTX TITAN
> Local worksize (LWS) 64, global worksize (GWS) 8192
> DONE
> Speed for cost 1 (N) of 2, cost 2 (r) of 2
> Raw: 82671 c/s real, 82671 c/s virtual
>
> [a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=1
> Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]... Device 1: Tahiti [AMD Radeon HD 7900 Series]
> Local worksize (LWS) 64, global worksize (GWS) 2048
> DONE
> Speed for cost 1 (N) of 2, cost 2 (r) of 2
> Raw: 77053 c/s real, 3891K c/s virtual