Message-ID: <20150823084034.GA16651@openwall.com>
Date: Sun, 23 Aug 2015 11:40:35 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU
On Sun, Aug 23, 2015 at 11:02:24AM +0300, Solar Designer wrote:
> Unfortunately, when we're dealing with 64-bit types, the generated PTX
> code includes extra mov's:
>
> {
> .reg .b32 %dummy;
> mov.b64 {%r15,%dummy}, %rd82;
> }
> {
> .reg .b32 %dummy;
> mov.b64 {%dummy,%r16}, %rd82;
> }
> shf.r.wrap.b32 %r17, %r16, %r15, 24;
> shf.r.wrap.b32 %r18, %r15, %r16, 24;
>
> These are simply to extract the 32-bit halves as needed for the shf
> instructions. The mov's should be gone and proper registers
> substituted right into the shf instructions in the final ISA code -
> however, I am not sure this is what is actually happening (depends on
> how good the translator from PTX to native ISA is).

I passed the PTX code through "ptxas --gpu-name sm_35" and nvdisasm, and
it looks OK in this respect:

/*28d0*/ LOP.XOR R26, R248, R12;
/*28d8*/ LOP.XOR R36, R32, R13;
/*28e0*/ LOP.XOR R37, R25, R31;
/*28e8*/ IADD.X R239, R24, R27;
/*28f0*/ LDL.64 R24, [R141+0x58];
/*28f8*/ IADD R35.CC, R232, c[0x3][0x0];
/*2908*/ SHF.R.W R242, R26, 0x18, R36;
/*2910*/ LOP.XOR R7, R7, c[0x3][0x34];
/*2918*/ SHF.R.W R39, R36, 0x18, R26;
Here we can see R26 and R36 come directly from LOP.XOR, without MOV.
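
For reference, here is a minimal C sketch of why the two shf instructions on the 32-bit halves amount to a 64-bit rotate right by 24. It is not taken from the actual kernel: the function names shf_r_wrap and rotr64_via_shf are made up for illustration, and the emulation follows my reading of the PTX documentation for shf.r.wrap.b32 (operand a holds bits 31:0, operand b holds bits 63:32, and the low 32 bits of the shifted value are returned).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper emulating PTX "shf.r.wrap.b32 d, a, b, c":
 * concatenate b:a into a 64-bit value (b is the upper half), shift it
 * right by (c & 31) and keep the low 32 bits. */
static uint32_t shf_r_wrap(uint32_t a, uint32_t b, uint32_t c)
{
    uint64_t pair = ((uint64_t)b << 32) | a;
    return (uint32_t)(pair >> (c & 31));
}

/* 64-bit rotate right by n, built from two 32-bit funnel shifts.
 * Only valid for 0 < n < 32 in this form; for larger n the two halves
 * would have to be swapped first. */
static uint64_t rotr64_via_shf(uint64_t x, uint32_t n)
{
    assert(n > 0 && n < 32);

    uint32_t lo = (uint32_t)x;         /* like %r15 in the quoted PTX */
    uint32_t hi = (uint32_t)(x >> 32); /* like %r16 in the quoted PTX */

    /* Same operand order as "shf.r.wrap.b32 %r17, %r16, %r15, 24" */
    uint32_t new_hi = shf_r_wrap(hi, lo, n);
    /* Same operand order as "shf.r.wrap.b32 %r18, %r15, %r16, 24" */
    uint32_t new_lo = shf_r_wrap(lo, hi, n);

    return ((uint64_t)new_hi << 32) | new_lo;
}
```

Calling rotr64_via_shf(x, 24) should give the same result as (x >> 24) | (x << 40), so the only per-half work left is the two funnel shifts, which is consistent with the SHF.R.W pair in the disassembly above.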
Also interesting is LDL.64. I guess it loads two adjacent registers
(R24 and R25 in this example), which under the SIMT model are 32-bit
elements in two different hardware SIMD registers.
Alexander