Date: Fri, 21 Aug 2015 17:40:42 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-20 22:34 GMT+02:00 Solar Designer <solar@...nwall.com>:
> Also, we're optimizing this blindfolded, and that's wrong.  We should be
> reviewing the generated code.  You may patch common-opencl.c:
> opencl_build_kernel_opt() to invoke opencl_build() like this:
>
> opencl_build(sequential_id, opts, 1, "kernel.out");
>
> instead of the current:
>
> opencl_build(sequential_id, opts, 0, NULL);
>
> Then when targeting NVIDIA cards it dumps PTX assembly to the filename
> specified there.  It looks something like this, just much larger:
>
> http://arrayfire.com/demystifying-ptx-code/
>
> You could start by experimenting with a much simpler than Argon2 yet in
> some ways similar kernel: implement some trivial operation like XOR on
> different vector widths and see whether/how this changes the assembly.
> Then make it slightly less trivial (just enough to prevent the compiler
> from optimizing things out) and add uses of private or local memory,
> and see if you can make it run faster by using wider vectors per the
> same private or local memory usage.
>

I tested (only on the 960M):
- copying memory from __private to __private
- copying memory from __global to __private
- XORing __private tables with __private tables

using ulong, ulong2, ulong4, and ulong8 (I was getting an empty kernel
with ulong16).

In the generated PTX code, ulong4 and ulong8 were changed to ulong2,
something like this (ulong4):

ld.global.v2.u64 {%rd73, %rd74}, [%rd926+8000];
ld.global.v2.u64 {%rd77, %rd78}, [%rd926+8016];
st.local.v2.u64 [%rd937+208], {%rd77, %rd78};
st.local.v2.u64 [%rd937+192], {%rd73, %rd74};

I was getting the best speed with ulong (except when copying from global
to private).

Speeds:

xoring:

//1
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 2, cost 2 (m) of 2
Many salts:     1312 c/s real, 1312 c/s virtual
Only one salt:  1301 c/s real, 1312 c/s virtual

//2
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 2, cost 2 (m) of 2
Many salts:     590 c/s real, 590 c/s virtual
Only one salt:  595 c/s real, 595 c/s virtual

//4
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 2, cost 2 (m) of 2
Many salts:     914 c/s real, 914 c/s virtual
Only one salt:  906 c/s real, 898 c/s virtual

//8
Speed for cost 1 (t) of 2, cost 2 (m) of 2
Many salts:     738 c/s real, 731 c/s virtual
Only one salt:  738 c/s real, 738 c/s virtual

copying from global:

//1
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 2, cost 2 (m) of 2
Many salts:     853 c/s real, 860 c/s virtual
Only one salt:  860 c/s real, 860 c/s virtual

//2
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 2, cost 2 (m) of 2
Many salts:     1174 c/s real, 1185 c/s virtual
Only one salt:  1174 c/s real, 1163 c/s virtual

//4,8
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 2, cost 2 (m) of 2
Many salts:     1122 c/s real, 1113 c/s virtual
Only one salt:  1132 c/s real, 1132 c/s virtual

copying from private:

//1
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 2, cost 2 (m) of 2
Many salts:     2258 c/s real, 2236 c/s virtual
Only one salt:  2258 c/s real, 2258 c/s virtual

//2
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 2, cost 2 (m) of 2
Many salts:     685 c/s real, 679 c/s virtual
Only one salt:  685 c/s real, 691 c/s virtual

//4
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 2, cost 2 (m) of 2
Many salts:     1153 c/s real, 1142 c/s virtual
Only one salt:  1163 c/s real, 1163 c/s virtual

//8
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 2, cost 2 (m) of 2
Many salts:     1796 c/s real, 1796 c/s virtual
Only one salt:  1812 c/s real, 1812 c/s virtual
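
For readers who want to repeat this kind of experiment, here is a minimal sketch of how one toy kernel could be built at several vector widths while keeping the private memory footprint the same. It is not the code that produced the numbers above. The names xor_test, VTYPE, COUNT, ROUNDS, and TOTAL_WORDS, and the two helper functions, are all hypothetical, and the kernel body is only one possible "slightly less trivial" XOR loop.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: ulong words of private data per work-item, held constant
 * across vector widths so that only the access width changes. */
#define TOTAL_WORDS 32u

/* OpenCL C source of the toy kernel.  VTYPE and COUNT are expected to come
 * from the build options; every name in here is made up for illustration. */
static const char xor_test_src[] =
    "#define ROUNDS 1000\n"
    "__kernel void xor_test(__global const VTYPE *in, __global VTYPE *out)\n"
    "{\n"
    "    VTYPE a[COUNT], b[COUNT];              /* private arrays */\n"
    "    size_t base = get_global_id(0) * COUNT;\n"
    "    for (uint i = 0; i < COUNT; i++) {     /* __global -> __private */\n"
    "        a[i] = in[base + i];\n"
    "        b[i] = ~a[i];\n"
    "    }\n"
    "    for (uint r = 0; r < ROUNDS; r++)      /* __private ^ __private */\n"
    "        for (uint i = 0; i < COUNT; i++) {\n"
    "            a[i] ^= b[(i + r) % COUNT];\n"
    "            b[i] += a[i];                  /* data-dependent update */\n"
    "        }\n"
    "    for (uint i = 0; i < COUNT; i++)       /* write results back */\n"
    "        out[base + i] = a[i];\n"
    "}\n";

/* Returns the kernel source so a host program can hand it to the compiler. */
const char *xor_test_source(void)
{
    return xor_test_src;
}

/* Writes build options such as "-DVTYPE=ulong4 -DCOUNT=8" into buf and
 * returns the snprintf() result.  Width 16 is left out because the message
 * above reports an empty kernel for ulong16. */
int xor_test_build_opts(char *buf, size_t len, unsigned width)
{
    assert(width == 1 || width == 2 || width == 4 || width == 8);
    if (width == 1)
        return snprintf(buf, len, "-DVTYPE=ulong -DCOUNT=%u", TOTAL_WORDS);
    return snprintf(buf, len, "-DVTYPE=ulong%u -DCOUNT=%u",
                    width, TOTAL_WORDS / width);
}
```

The idea would be to pass each options string to the OpenCL compiler (for example as the opts argument of the opencl_build() call quoted above, assuming it is forwarded unchanged), dump the PTX for each width, and compare the ld/st instructions across the dumps.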