Message-ID: <CAKGDhHUiN_2Xnr=Ee9XGyV9rFjOuo-rhg0-bFXJb4bRkMSy=2w@mail.gmail.com>
Date: Sun, 31 May 2015 13:39:43 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Parallel in OpenCL

I was having problems on all cards after removing the "add 0"
instructions (sometimes we need to call the function with this
optimization and sometimes the normal function). I unrolled the loops
manually to do this, but then the code size increased and the results
were worse.

I created 4 split kernels and I am getting better speed on my laptop
and on --dev=5, but I still have a problem with AMD GCN, which has a
smaller code cache (32 KB). Also, instructions for GCN can take more
space. And the speed on my laptop is strangely fast.

I'm stuck with AMD GCN.

Results:

none@...e ~/Desktop/parallel/run $ ./john --test --format=parallel-opencl
Device 0: GeForce GTX 960M
Local worksize (LWS) 64, global worksize (GWS) 32768
Benchmarking: Parallel-opencl [SHA-512 OpenCL]... DONE
Speed for cost 1 (s) of 0, cost 2 (p) of 0
Many salts:     37236 c/s real, 37236 c/s virtual
Only one salt:  37236 c/s real, 37236 c/s virtual

[a@...er run]$ ./john --test --format=parallel-opencl --dev=5
Device 5: GeForce GTX TITAN
Local worksize (LWS) 64, global worksize (GWS) 32768
Benchmarking: Parallel-opencl [SHA-512 OpenCL]... DONE
Speed for cost 1 (s) of 0, cost 2 (p) of 0
Many salts:     40206 c/s real, 40454 c/s virtual
Only one salt:  40206 c/s real, 40454 c/s virtual

GCN without "add 0" optimization

[a@...er run]$ ./john --test --format=parallel-opencl --dev=1
Device 1: Tahiti [AMD Radeon HD 7900 Series]
Building the kernel, this could take a while
Build log: LOOP UNROLL: pragma unroll (line 109)
    Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 663)
    Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 660)
    Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 219)
    Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 281)
    Unrolled as requested!
Local worksize (LWS) 64, global worksize (GWS) 16384
Benchmarking: Parallel-opencl [SHA-512 OpenCL]...
Speed for cost 1 (s) of 0, cost 2 (p) of 0
Many salts:     45093 c/s real, 4915K c/s virtual
Only one salt:  45093 c/s real, 4915K c/s virtual

   Num:    Value   Size Type    Bind   Vis      Ndx Name
     0: 00000000      0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000    186 OBJECT  LOCAL  DEFAULT    5 __OpenCL_compile_options
     2: 00000000    640 OBJECT  LOCAL  DEFAULT    6 __OpenCL_0_global
     3: 00000280    559 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
     4: 00000000  40490 FUNC    LOCAL  DEFAULT    7 __OpenCL_parallel_kernel_
     5: 000004af     32 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
     6: 000004cf    619 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
     7: 00009e2a  25454 FUNC    LOCAL  DEFAULT    7 __OpenCL_parallel_kernel_
     8: 0000073a     32 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
     9: 0000075a    633 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
    10: 00010198  43430 FUNC    LOCAL  DEFAULT    7 __OpenCL_parallel_kernel_
    11: 000009d3     32 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
    12: 000009f3    623 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
    13: 0001ab3e  43170 FUNC    LOCAL  DEFAULT    7 __OpenCL_parallel_kernel_
    14: 00000c62     32 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_

GCN with unrolling one loop

[a@...er run]$ ./john --test --format=parallel-opencl --dev=1
Device 1: Tahiti [AMD Radeon HD 7900 Series]
Building the kernel, this could take a while
Build log: LOOP UNROLL: pragma unroll (line 109)
    Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 663)
    Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 660)
    Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 219)
    Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 281)
    Unrolled as requested!
Local worksize (LWS) 64, global worksize (GWS) 16384
Benchmarking: Parallel-opencl [SHA-512 OpenCL]... DONE
Speed for cost 1 (s) of 0, cost 2 (p) of 0
Many salts:     27536 c/s real, 3276K c/s virtual
Only one salt:  27536 c/s real, 3276K c/s virtual

     0: 00000000      0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000    186 OBJECT  LOCAL  DEFAULT    5 __OpenCL_compile_options
     2: 00000000    640 OBJECT  LOCAL  DEFAULT    6 __OpenCL_0_global
     3: 00000280    559 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
     4: 00000000  40490 FUNC    LOCAL  DEFAULT    7 __OpenCL_parallel_kernel_
     5: 000004af     32 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
     6: 000004cf    619 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
     7: 00009e2a  56690 FUNC    LOCAL  DEFAULT    7 __OpenCL_parallel_kernel_
     8: 0000073a     32 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
     9: 0000075a    633 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
    10: 00017b9c  43430 FUNC    LOCAL  DEFAULT    7 __OpenCL_parallel_kernel_
    11: 000009d3     32 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
    12: 000009f3    623 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
    13: 00022542  43170 FUNC    LOCAL  DEFAULT    7 __OpenCL_parallel_kernel_
    14: 00000c62     32 OBJECT  LOCAL  DEFAULT    6 __OpenCL_parallel_kernel_
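
To make the code-size concern concrete, the sketch below compares the four FUNC sizes from each symbol table above against the 32 KB code cache mentioned in the message. It assumes that the Size column of each FUNC entry is the compiled size of one split kernel in bytes, which the message does not state outright. The names `ICACHE_BYTES`, `func_sizes` and `over_cache` are hypothetical and exist only for this illustration.

```python
# Code cache size quoted in the message (32 KB), in bytes.
ICACHE_BYTES = 32 * 1024

# Size column of the four FUNC symbols, copied from the two symbol tables.
# Assumption: each value is the machine-code size of one split kernel.
func_sizes = {
    "without add-0 optimization": [40490, 25454, 43430, 43170],
    "one loop unrolled": [40490, 56690, 43430, 43170],
}


def over_cache(sizes, cache=ICACHE_BYTES):
    """Return (kernel index, bytes over the cache) for kernels that do not fit."""
    return [(i, size - cache) for i, size in enumerate(sizes) if size > cache]


for label, sizes in func_sizes.items():
    too_big = over_cache(sizes)
    print(f"{label}: {len(too_big)} of {len(sizes)} kernels exceed {ICACHE_BYTES} bytes")
    for index, excess in too_big:
        print(f"  kernel {index}: {sizes[index]} bytes, {excess} bytes over")
```

Under that assumption, only the second kernel (25454 bytes) fits in 32 KB before unrolling, and none of the four fits after it grows to 56690 bytes. This is consistent with the drop from 45093 c/s to 27536 c/s, although the numbers alone do not prove that cache misses are the cause.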