Message-ID: <CAKGDhHUM3AA_zfnbuQ3LS40r8xov3sD+qdBsq31xCKY2C-uKng@mail.gmail.com>
Date: Mon, 6 Jul 2015 16:56:11 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Lyra2 on GPU

2015-07-05 9:53 GMT+02:00 Solar Designer <solar@...nwall.com>:
> Agnieszka,
>
> On Sat, Jul 04, 2015 at 02:04:26AM +0200, Agnieszka Bielec wrote:
>> my optimizations are based on transfer one table to local memory and
>> copying small portions of global memory into local buffers, I didn't
>> saw any sense i coalescing and I didn't tried it
>
> Please also try going in the opposite direction: keep more stuff in
> global memory, reduce use of local memory per instance to the point
> where you can use a lot higher GWS - like 20480 (10x higher than what's
> auto-tuned now) or even higher.  This may result in a speedup through
> hiding of global memory access latencies due to the greater concurrency.

That's my first version. I'm including results for costs 16 16, 1 20
and 1 28. Benchmarking doesn't work well in my old version, so I'm
setting GWS manually. Note that I get CL_INVALID_BUFFER_SIZE for
GWS=8192 and cost 16 16: that's 3 GB.

I said that I'm using local memory, but I meant __private; sorry if
that caused confusion.

[a@...er run]$ ./john --test --format=lyra2-opencl --cost=16:16,16:16
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 384.00 kB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 16, cost 2 (m) of 16, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    1932 c/s real, 51200 c/s virtual

[a@...er run]$ GWS=1024 ./john --test --format=lyra2-old-pencl --cost=16:16,16:16
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 384.00 kB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 16, cost 2 (m) of 16, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    769 c/s real, 34133 c/s virtual

GWS=8192 ./john --test --format=lyra2-old-pencl --cost=16:16,16:16
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 384.00 kB
OpenCL error (CL_INVALID_BUFFER_SIZE) in file (opencl_lyra2_old_fmt_plug.c) at line (170) - (Error creating device buffer)

[a@...er run]$ ./john --test --format=lyra2-opencl --cost=1:1,20:20
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 480.00 kB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 20, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    9660 c/s real, 78769 c/s virtual

[a@...er run]$ ./john --test --format=lyra2-old-pencl --cost=1:1,20:20
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 480.00 kB
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 20, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    1969 c/s real, 51200 c/s virtual

[a@...er run]$ GWS=512 ./john --test --format=lyra2-old-pencl --cost=1:1,20:20
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 480.00 kB
Local worksize (LWS) 64, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 20, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    3318 c/s real, 51200 c/s virtual

[a@...er run]$ GWS=1024 ./john --test --format=lyra2-old-pencl --cost=1:1,20:20
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 480.00 kB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 20, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    3938 c/s real, 51200 c/s virtual

[a@...er run]$ GWS=2048 ./john --test --format=lyra2-old-pencl --cost=1:1,20:20
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 480.00 kB
Local worksize (LWS) 64, global worksize (GWS) 2048
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 20, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    2178 c/s real, 51200 c/s virtual

[a@...er run]$ ./john --test --format=lyra2-opencl --cost=1:1,28:28
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 672.00 kB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 28, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    7123 c/s real, 51200 c/s virtual

[a@...er run]$ GWS=1024 ./john --test --format=lyra2-old-pencl --cost=1:1,28:28
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 672.00 kB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 28, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    2718 c/s real, 51200 c/s virtual
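
For reference, the 3 GB figure behind the CL_INVALID_BUFFER_SIZE failure can be reproduced with a few lines of arithmetic. This is only a rough sketch, not code from the format: it assumes that all per-hash matrices are allocated as one device buffer, that "kB" in the benchmark output means 1024 bytes, and that the allocation limit is whatever the device reports (for example through CL_DEVICE_MAX_MEM_ALLOC_SIZE). The helper names and the 1 GiB limit used in the demo are hypothetical, chosen for illustration.

```python
KIB = 1024  # assumption: "kB" in the benchmark output means 1024 bytes


def total_buffer_bytes(gws: int, kib_per_hash: int) -> int:
    """Bytes needed if every work-item's matrix lives in one device buffer."""
    return gws * kib_per_hash * KIB


def max_gws(limit_bytes: int, kib_per_hash: int, lws: int = 64) -> int:
    """Largest GWS that is a multiple of LWS and fits under limit_bytes."""
    per_hash = kib_per_hash * KIB
    return (limit_bytes // per_hash) // lws * lws


if __name__ == "__main__":
    # "memory per hash" values reported above for m = 16, 20 and 28
    for kib in (384, 480, 672):
        for gws in (1024, 8192, 20480):
            size = total_buffer_bytes(gws, kib)
            print(f"{kib} kB/hash, GWS={gws}: {size / 2**30:.2f} GiB")

    # Hypothetical 1 GiB single-allocation limit, for illustration only;
    # the real value has to be queried from the device.
    print("max GWS at 384 kB/hash:", max_gws(2**30, 384))
```

Under these assumptions, GWS=8192 at 384 kB per hash comes to exactly 3 GiB, and GWS=20480 would need 7.5 GiB in a single buffer, so a much higher GWS at these cost settings would require splitting the allocation or a smaller per-hash footprint.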