Message-ID: <CAKGDhHUiN_2Xnr=Ee9XGyV9rFjOuo-rhg0-bFXJb4bRkMSy=2w@mail.gmail.com>
Date: Sun, 31 May 2015 13:39:43 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Parallel in OpenCL
I had problems on all cards after removing the "add 0" instructions
(sometimes we need to call the function with this optimization and
sometimes the normal function).
I unrolled the loops manually to do this, but then the code size
increased and the results were worse.
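To make the "add 0" idea concrete, here is a minimal sketch in Python rather than OpenCL. It assumes, and this is only an assumption, that the removed instructions are additions of SHA-512 message words that are known to be zero at some call sites. The function names are hypothetical and are not taken from the kernel source.

```python
MASK64 = (1 << 64) - 1  # SHA-512 operates on 64-bit words


def rotr(x, n):
    """Rotate a 64-bit word right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def sigma0(x):
    """SHA-512 small sigma0."""
    return rotr(x, 1) ^ rotr(x, 8) ^ (x >> 7)


def sigma1(x):
    """SHA-512 small sigma1."""
    return rotr(x, 19) ^ rotr(x, 61) ^ (x >> 6)


def schedule_word(w2, w7, w15, w16):
    """Generic message-schedule step: all four terms are added."""
    return (sigma1(w2) + w7 + sigma0(w15) + w16) & MASK64


def schedule_word_zero_w7_w16(w2, w15):
    """Hypothetical specialized step: valid only when w7 and w16 are zero,
    so the two "add 0" operations are simply left out."""
    return (sigma1(w2) + sigma0(w15)) & MASK64


if __name__ == "__main__":
    a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
    # Same result whenever the dropped operands really are zero
    print(schedule_word(a, 0, b, 0) == schedule_word_zero_w7_w16(a, b))
```

The specialized variant is only safe at call sites where the operands are guaranteed to be zero, which is why both versions have to exist side by side and why the code grows when each is inlined or unrolled.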
I created 4 split kernels and I am getting better speed on my laptop
and on --dev=5, but I still have a problem with AMD GCN, which has a
smaller code cache (32 KB). Instructions for GCN can also take up more
space. Also, the speed on my laptop is strangely high.
I'm stuck with AMD GCN.
Results:
none@...e ~/Desktop/parallel/run $ ./john --test --format=parallel-opencl
Device 0: GeForce GTX 960M
Local worksize (LWS) 64, global worksize (GWS) 32768
Benchmarking: Parallel-opencl [SHA-512 OpenCL]...
DONE
Speed for cost 1 (s) of 0, cost 2 (p) of 0
Many salts: 37236 c/s real, 37236 c/s virtual
Only one salt: 37236 c/s real, 37236 c/s virtual
[a@...er run]$ ./john --test --format=parallel-opencl --dev=5
Device 5: GeForce GTX TITAN
Local worksize (LWS) 64, global worksize (GWS) 32768
Benchmarking: Parallel-opencl [SHA-512 OpenCL]...
DONE
Speed for cost 1 (s) of 0, cost 2 (p) of 0
Many salts: 40206 c/s real, 40454 c/s virtual
Only one salt: 40206 c/s real, 40454 c/s virtual
GCN without "add 0" optimization
[a@...er run]$ ./john --test --format=parallel-opencl --dev=1
Device 1: Tahiti [AMD Radeon HD 7900 Series]
Building the kernel, this could take a while
Build log: LOOP UNROLL: pragma unroll (line 109)
Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 663)
Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 660)
Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 219)
Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 281)
Unrolled as requested!
Local worksize (LWS) 64, global worksize (GWS) 16384
Benchmarking: Parallel-opencl [SHA-512 OpenCL]...
Speed for cost 1 (s) of 0, cost 2 (p) of 0
Many salts: 45093 c/s real, 4915K c/s virtual
Only one salt: 45093 c/s real, 4915K c/s virtual
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 186 OBJECT LOCAL DEFAULT 5 __OpenCL_compile_options
2: 00000000 640 OBJECT LOCAL DEFAULT 6 __OpenCL_0_global
3: 00000280 559 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
4: 00000000 40490 FUNC LOCAL DEFAULT 7 __OpenCL_parallel_kernel_
5: 000004af 32 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
6: 000004cf 619 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
7: 00009e2a 25454 FUNC LOCAL DEFAULT 7 __OpenCL_parallel_kernel_
8: 0000073a 32 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
9: 0000075a 633 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
10: 00010198 43430 FUNC LOCAL DEFAULT 7 __OpenCL_parallel_kernel_
11: 000009d3 32 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
12: 000009f3 623 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
13: 0001ab3e 43170 FUNC LOCAL DEFAULT 7 __OpenCL_parallel_kernel_
14: 00000c62 32 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
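As a rough check of the code-size concern, the sketch below compares the FUNC sizes from the symbol table above against the 32 KB code cache figure mentioned earlier. It assumes the Size column is the machine-code size in bytes and that each kernel is judged against the cache on its own; the helper names are made up for illustration.

```python
CODE_CACHE_BYTES = 32 * 1024  # 32 KB figure quoted in the message

# FUNC rows copied from the symbol table above
SYMBOL_ROWS = """\
4: 00000000 40490 FUNC LOCAL DEFAULT 7 __OpenCL_parallel_kernel_
7: 00009e2a 25454 FUNC LOCAL DEFAULT 7 __OpenCL_parallel_kernel_
10: 00010198 43430 FUNC LOCAL DEFAULT 7 __OpenCL_parallel_kernel_
13: 0001ab3e 43170 FUNC LOCAL DEFAULT 7 __OpenCL_parallel_kernel_
"""


def func_sizes(table):
    """Return {symbol number: size in bytes} for every FUNC row."""
    sizes = {}
    for line in table.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "FUNC":
            sizes[int(fields[0].rstrip(":"))] = int(fields[2])
    return sizes


def oversized(sizes, limit=CODE_CACHE_BYTES):
    """Return {symbol number: bytes over the limit} for kernels that exceed it."""
    return {num: size - limit for num, size in sizes.items() if size > limit}


if __name__ == "__main__":
    for num, excess in oversized(func_sizes(SYMBOL_ROWS)).items():
        print(f"symbol {num}: {excess} bytes over {CODE_CACHE_BYTES}")
```

Under these assumptions, three of the four split kernels are still larger than 32 KB, and only the kernel at symbol 7 fits.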
GCN with unrolling one loop
[a@...er run]$ ./john --test --format=parallel-opencl --dev=1
Device 1: Tahiti [AMD Radeon HD 7900 Series]
Building the kernel, this could take a while
Build log: LOOP UNROLL: pragma unroll (line 109)
Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 663)
Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 660)
Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 219)
Not unrolled because pragma requests no unroll
LOOP UNROLL: pragma unroll (line 281)
Unrolled as requested!
Local worksize (LWS) 64, global worksize (GWS) 16384
Benchmarking: Parallel-opencl [SHA-512 OpenCL]...
DONE
Speed for cost 1 (s) of 0, cost 2 (p) of 0
Many salts: 27536 c/s real, 3276K c/s virtual
Only one salt: 27536 c/s real, 3276K c/s virtual
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 186 OBJECT LOCAL DEFAULT 5 __OpenCL_compile_options
2: 00000000 640 OBJECT LOCAL DEFAULT 6 __OpenCL_0_global
3: 00000280 559 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
4: 00000000 40490 FUNC LOCAL DEFAULT 7 __OpenCL_parallel_kernel_
5: 000004af 32 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
6: 000004cf 619 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
7: 00009e2a 56690 FUNC LOCAL DEFAULT 7 __OpenCL_parallel_kernel_
8: 0000073a 32 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
9: 0000075a 633 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
10: 00017b9c 43430 FUNC LOCAL DEFAULT 7 __OpenCL_parallel_kernel_
11: 000009d3 32 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
12: 000009f3 623 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
13: 00022542 43170 FUNC LOCAL DEFAULT 7 __OpenCL_parallel_kernel_
14: 00000c62 32 OBJECT LOCAL DEFAULT 6 __OpenCL_parallel_kernel_
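The two GCN runs above can be put side by side with a little arithmetic. This sketch only uses numbers already shown in the output: the size of the FUNC at symbol 7 and the "real" c/s figure. The dictionary keys are illustrative names, not anything from the tool.

```python
# Figures copied from the two Tahiti runs above
BASELINE = {"func7_bytes": 25454, "cs_real": 45093}  # without "add 0" optimization
UNROLLED = {"func7_bytes": 56690, "cs_real": 27536}  # with one loop unrolled


def ratio(after, before):
    """Plain ratio of two measurements."""
    return after / before


extra_bytes = UNROLLED["func7_bytes"] - BASELINE["func7_bytes"]
size_growth = ratio(UNROLLED["func7_bytes"], BASELINE["func7_bytes"])
speed_ratio = ratio(UNROLLED["cs_real"], BASELINE["cs_real"])

if __name__ == "__main__":
    print(f"kernel 7 grew by {extra_bytes} bytes ({size_growth:.2f}x)")
    print(f"speed is {speed_ratio:.2f}x of the baseline")
```

The kernel at symbol 7 goes from fitting in 32 KB to well over it, and the extra bytes match the shift in the Value offsets of the symbols that follow it, while the measured speed drops to roughly 0.61 of the baseline.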