Message-ID: <20150823080224.GA16570@openwall.com>
Date: Sun, 23 Aug 2015 11:02:24 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

On Sun, Aug 23, 2015 at 10:21:53AM +0300, Solar Designer wrote:
> For starters, we should try OpenCL's rotate() and see if it translates
> to decent PTX code these days.  Right now, opencl_blake2.h: rotr64()
> does not yet use rotate(), while opencl_blake2-round-no-msg.h does.  We
> should switch both to use the same approach, at least to make reviewing
> the generated PTX code easier.

The body of rotr64() should be:

    return rotate(w, (ulong)(64 - c));

(or we can turn it into a macro, not to rely on the inlining).

Unfortunately, when we're dealing with 64-bit types, the generated PTX
code includes extra mov's:

    {
    .reg .b32 %dummy;
    mov.b64 {%r15,%dummy}, %rd82;
    }
    {
    .reg .b32 %dummy;
    mov.b64 {%dummy,%r16}, %rd82;
    }
    shf.r.wrap.b32 %r17, %r16, %r15, 24;
    shf.r.wrap.b32 %r18, %r15, %r16, 24;

These are simply to extract the 32-bit halves as needed for the shf
instructions.  The mov's should be gone and proper registers substituted
right into the shf instructions in the final ISA code - however, I am
not sure this is what is actually happening (depends on how good the
translator from PTX to native ISA is).

I think this also serves to illustrate why working with 32-bit values or
vector elements at OpenCL source level is a safer bet... although then
we'd need to find and use the right intrinsics for funnel shift in
OpenCL.  AMD has it as amd_bitalign(), but I don't know if NVIDIA has an
equivalent now (maybe the same funnel shift intrinsics names as they use
in CUDA?)

Alexander
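
For reference, here is a rough host-side sketch in plain C (not OpenCL,
and not code from the JtR tree) of the two formulations discussed above:
a 64-bit rotate right written as a rotate left by 64 - c, and the same
rotation assembled from two 32-bit funnel shifts in the way the quoted
PTX does it.  The helper names rotl64(), shf_r_wrap() and
rotr64_via_halves() are made up for illustration, and shf_r_wrap() is
only my reading of the shf.r.wrap.b32 semantics.  The split version as
written covers rotate counts below 32 only.

```c
#include <stdint.h>

/* Rotate left: a portable stand-in for OpenCL's rotate() on ulong. */
static uint64_t rotl64(uint64_t w, unsigned c)
{
    c &= 63;
    return (w << c) | (w >> ((64 - c) & 63));
}

/* Rotate right expressed through rotate left by (64 - c). */
static uint64_t rotr64(uint64_t w, unsigned c)
{
    return rotl64(w, 64 - c);
}

/*
 * Assumed model of PTX shf.r.wrap.b32 d, a, b, c:
 * take the 64-bit value b:a (b is the high half, a is the low half),
 * shift it right by (c & 31), and keep the low 32 bits.
 */
static uint32_t shf_r_wrap(uint32_t a, uint32_t b, unsigned c)
{
    uint64_t v = ((uint64_t)b << 32) | a;
    return (uint32_t)(v >> (c & 31));
}

/* 64-bit rotate right built from two funnel shifts; assumes c < 32. */
static uint64_t rotr64_via_halves(uint64_t w, unsigned c)
{
    uint32_t lo = (uint32_t)w;          /* like %r15 in the PTX above */
    uint32_t hi = (uint32_t)(w >> 32);  /* like %r16 */
    uint32_t new_hi = shf_r_wrap(hi, lo, c);
    uint32_t new_lo = shf_r_wrap(lo, hi, c);
    return ((uint64_t)new_hi << 32) | new_lo;
}
```

Comparing rotr64() and rotr64_via_halves() over a few inputs and counts
such as 24 and 16 is a quick way to sanity-check any 32-bit rewrite
before looking at the generated PTX.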