|
Message-ID: <CABob6irwDj=CMKA729AkxDLwDx-_zLZyXKLDMKAFv4i3htq=gQ@mail.gmail.com> Date: Tue, 2 Jun 2015 02:38:22 +0200 From: Lukas Odzioba <lukas.odzioba@...il.com> To: john-dev@...ts.openwall.com Subject: Re: PHC: Parallel in OpenCL 2015-05-31 13:39 GMT+02:00 Agnieszka Bielec <bielecagnieszka8@...il.com>: > [a@...er run]$ ./john --test --format=parallel-opencl --dev=1 > Local worksize (LWS) 64, global worksize (GWS) 16384 > Benchmarking: Parallel-opencl [SHA-512 OpenCL]... > Many salts: 27536 c/s real, 3276K c/s virtual diff --git a/src/opencl/parallel_kernel.cl b/src/opencl/parallel_kernel.cl index 1a18e14..b50c1e2 100755 --- a/src/opencl/parallel_kernel.cl +++ b/src/opencl/parallel_kernel.cl @@ -512,19 +512,18 @@ inline void hash(void *message, unsigned int length, void *out, unsigned int out m_messageLengthLo += (LENGTH); \ } -#define SIMPLE(tmpJ,key) { \ - for(k=0;k<8;k++) \ - ((unsigned char*)m_block)[k]=((unsigned char*) (tmpJ))[k];\ -\ - for(k=0;k<(HASH_LENGTH);k++) \ - ((unsigned char*)m_block)[8+k]=((unsigned char*) (key))[k];\ -\ - ((unsigned char*) m_block)[72] = 0x80; \ - for(i=0;i<55;i++) \ - ((unsigned char*) m_block)[73+i]=0; \ -\ - m_block[15] = 576; \ - sha512Block_Z(m_block, m_state); \ +#define SIMPLE(j,key) { \ + m_block[0]=SWAP_ENDIAN_64(j);\ + for(k=0;k<(HASH_LENGTH/8);k++) \ + m_block[1+k]=key[k];\ + m_block[9]=0x80;\ + m_block[10]=0;\ + m_block[11]=0;\ + m_block[12]=0;\ + m_block[13]=0;\ + m_block[14]=0;\ + m_block[15] = 576;\ + sha512Block_Z(m_block, m_state);\ } //to do: why change 55 to 10, makes slower speed from 32k to 24 k? @@ -645,9 +644,8 @@ __kernel void parallel_kernel_loop(__global const uchar * in, for (j = 0; j < parallelLoops; j++) { // work ^= hash(WRITE_BIG_ENDIAN_64(j) || key) - tmpJ = SWAP_ENDIAN_64(j); - SIMPLE(&tmpJ, key); + SIMPLE(j, key); for (k = 0; k < HASH_LENGTH / sizeof(unsigned long); k++) { work[k] ^= SWAP_ENDIAN_64(m_state[k]);//to do: test on another GPUs ~ 27536->28743 Free 1k c/s, ideally we would get rid of constants like m_block[i] = 0 and assume such values in SHA512_Z body, which means less memory ops and less alu ops. Copying key[] seems to be unnecesary also, just introduce references to key in SHA512 and forget about garbage in m_block table, iirc it will be overwritten later with the proper value. Slowly step by step make it simple not every change will give you thousands of c/s right away, but each will likely add something to the end result. Compiler can be good at optimizing unoptimal code, but it will not do the job for us. Please keep attention to the details. Thanks, Lukas
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.