Message-ID: <CAKGDhHV8XSpJiiQr0sz=5_A0=0jSbah3VyaQMuX9bmXebh3HvA@mail.gmail.com>
Date: Sun, 26 Apr 2015 15:28:37 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

2015-04-25 22:28 GMT+02:00 Agnieszka Bielec <bielecagnieszka8@...il.com>:
> 2015-04-25 21:20 GMT+02:00 Solar Designer <solar@...nwall.com>:
>> I think you should use a vector data type instead of the separate
>> work-items, much like the AVX2 code on CPU uses __m256i and doesn't
>> rely on the compiler's auto-vectorization. Granted, auto-vectorization
>> is much more common with OpenCL than with C, but you can help the
>> compiler by doing a part of it explicitly anyway (and then it'd be the
>> compiler's job to combine these narrow SIMD portions into possibly wider
>> SIMD that the hardware might need). I think you need ulong4 there:
>>
>> https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/vectorDataTypes.html
>
> sorry, I didn't test this. I have noticed now that in function H
> index_global and index_local are always divisible by 4 so I can use
> this with coalescing

Results with the vector version:

[a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=1
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]...
Device 1: Tahiti [AMD Radeon HD 7900 Series]
Local worksize (LWS) 64, global worksize (GWS) 4096
DONE
Speed for cost 1 (N) of 2, cost 2 (r) of 2
Raw: 87487 c/s real, 9011K c/s virtual

[a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=5
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]...
Device 5: GeForce GTX TITAN
Local worksize (LWS) 64, global worksize (GWS) 4096
DONE
Speed for cost 1 (N) of 2, cost 2 (r) of 2
Raw: 59650 c/s real, 59650 c/s virtual

Results with the previous version:

[a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=5
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]...
Device 5: GeForce GTX TITAN
Local worksize (LWS) 64, global worksize (GWS) 8192
DONE
Speed for cost 1 (N) of 2, cost 2 (r) of 2
Raw: 82671 c/s real, 82671 c/s virtual

[a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=1
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient, development use only)]...
Device 1: Tahiti [AMD Radeon HD 7900 Series]
Local worksize (LWS) 64, global worksize (GWS) 2048
DONE
Speed for cost 1 (N) of 2, cost 2 (r) of 2
Raw: 77053 c/s real, 3891K c/s virtual
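
As a rough reading of the numbers above, the relative change in real c/s between the two versions can be worked out per device. The sketch below only redoes the arithmetic on the figures already printed in the benchmark output; the dictionary layout and the function name relative_change are illustrative, not part of John the Ripper. Note that the auto-tuned GWS differs between the runs (4096 vs 2048 on Tahiti, 4096 vs 8192 on the TITAN), so this is not a strictly like-for-like comparison.

```python
# Real c/s figures copied from the benchmark output above.
# The structure and names here are illustrative only.
results = {
    "Tahiti (dev 1)": {"vector": 87487, "previous": 77053},
    "GTX TITAN (dev 5)": {"vector": 59650, "previous": 82671},
}


def relative_change(new, old):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


for device, speeds in results.items():
    change = relative_change(speeds["vector"], speeds["previous"])
    print(f"{device}: {change:+.1f}% real c/s with vectors")
```

On these figures the vector version comes out about 13.5% faster on Tahiti and about 27.8% slower on the GTX TITAN.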