Message-ID: <20150823084034.GA16651@openwall.com>
Date: Sun, 23 Aug 2015 11:40:35 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

On Sun, Aug 23, 2015 at 11:02:24AM +0300, Solar Designer wrote:
> Unfortunately, when we're dealing with 64-bit types, the generated PTX
> code includes extra mov's:
>
>     {
>     .reg .b32 %dummy;
>     mov.b64 {%r15,%dummy}, %rd82;
>     }
>     {
>     .reg .b32 %dummy;
>     mov.b64 {%dummy,%r16}, %rd82;
>     }
>     shf.r.wrap.b32 %r17, %r16, %r15, 24;
>     shf.r.wrap.b32 %r18, %r15, %r16, 24;
>
> These are simply to extract the 32-bit halves as needed for the shf
> instructions.  The mov's should be gone and proper registers
> substituted right into the shf instructions in the final ISA code -
> however, I am not sure this is what is actually happening (depends on
> how good the translator from PTX to native ISA is).

I passed the PTX code through "ptxas --gpu-name sm_35" and nvdisasm, and
it looks OK in this respect:

    /*28d0*/ LOP.XOR R26, R248, R12;
    /*28d8*/ LOP.XOR R36, R32, R13;
    /*28e0*/ LOP.XOR R37, R25, R31;
    /*28e8*/ IADD.X R239, R24, R27;
    /*28f0*/ LDL.64 R24, [R141+0x58];
    /*28f8*/ IADD R35.CC, R232, c[0x3][0x0];
    /*2908*/ SHF.R.W R242, R26, 0x18, R36;
    /*2910*/ LOP.XOR R7, R7, c[0x3][0x34];
    /*2918*/ SHF.R.W R39, R36, 0x18, R26;

Here we can see R26 and R36 come directly from LOP.XOR, without MOV.

Also interesting is LDL.64.  I guess it loads two adjacent registers
(R24 and R25 in this example), which under the SIMT model are 32-bit
elements in two different hardware SIMD registers.

Alexander
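
For illustration only, here is a minimal host-side C sketch of what the two
quoted shf.r.wrap.b32 instructions compute together: a 64-bit rotate right
by 24, built from the two 32-bit halves. The function names
(shf_r_wrap_b32, rotr64_ref, rotr64_24_shf, check_rotr64_24) are
hypothetical and not from the original code, and the funnel-shift model
follows the documented PTX semantics rather than anything measured on a
GPU.

```c
#include <assert.h>
#include <stdint.h>

/* Model of PTX "shf.r.wrap.b32 d, a, b, c": treat b:a as a 64-bit value
   (b is the high half, a is the low half), shift it right by c mod 32,
   and keep the low 32 bits. */
static uint32_t shf_r_wrap_b32(uint32_t a, uint32_t b, uint32_t c)
{
    uint64_t pair = ((uint64_t)b << 32) | a;
    return (uint32_t)(pair >> (c & 31));
}

/* Reference 64-bit rotate right, valid for 0 < n < 64. */
static uint64_t rotr64_ref(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}

/* Rotate right by 24 using two funnel shifts, mirroring the quoted PTX.
   Variable names follow the PTX registers for readability. */
static uint64_t rotr64_24_shf(uint64_t x)
{
    uint32_t r15 = (uint32_t)x;          /* low half:  mov.b64 {%r15,%dummy} */
    uint32_t r16 = (uint32_t)(x >> 32);  /* high half: mov.b64 {%dummy,%r16} */
    uint32_t r17 = shf_r_wrap_b32(r16, r15, 24);  /* high half of the result */
    uint32_t r18 = shf_r_wrap_b32(r15, r16, 24);  /* low half of the result */
    return ((uint64_t)r17 << 32) | r18;
}

/* Self-check: the funnel-shift version must match the reference rotate. */
static void check_rotr64_24(uint64_t x)
{
    assert(rotr64_24_shf(x) == rotr64_ref(x, 24));
}
```

Calling check_rotr64_24() on a few 64-bit values is enough to confirm that
the operand order in the two shf instructions (hi, lo and then lo, hi)
yields the high and low halves of the rotated value, respectively.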