Message-ID: <7b43b6f9e32f47d3e3473317462fa7c7@smtp.hushmail.com>
Date: Fri, 17 Jul 2015 21:09:15 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Lyra2 on GPU

On 2015-07-17 20:41, magnum wrote:
> On 2015-07-17 20:03, Agnieszka Bielec wrote:
>> 2015-07-17 18:29 GMT+02:00 magnum <john.magnum@...hmail.com>:
>>> I tried building your code but it's broken for OSX:
>>
>> OS X doesn't have pthread_barrier_t but I had problems with speed on
>> lyra2-lm on super when I was using barriers in openmp, and only this
>> change made that the speed is normal, I will decide what I will do
>> with this error, if you want to test my code I recommend you to switch
>> to commit e6a532b40e4c98418913075b5407e50765f2298a because my newest
>> commit works on super on both cards but in my laptop doesn't work when
>> LWS=GWS (cmp_all(1) failed) and I don't know if this is bug in my code
>> or somewhere else. and to make my code compiling it's enough to remove
>> files whose name begin with "Lyra2"
>
> I'll try that.
>
> Perhaps you can use the pthread barrier stuff "#ifndef __APPLE__", with a
> fallback to OpenMP barriers. Or we could check for it in autoconf.

There are several other problems. The yescrypt-opencl format uses
"ulong", which doesn't exist (on the host side) here. You need to use
cl_ulong or uint64_t instead. I also see lots of uses of "long". That
should never be used in host code, since it may end up as 32-bit. You
probably want int64_t or cl_long for those.
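
A rough sketch of what I mean for the host side. The struct and field
names are hypothetical, not taken from the yescrypt-opencl code, and I
use the <stdint.h> types here so it builds without the OpenCL headers
(cl_ulong/cl_long from those headers would do the same job):

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical host-side mirror of a struct shared with a kernel.
 * Names are illustrative only, not taken from the yescrypt-opencl code. */
struct host_params_sketch {
	uint64_t n_blocks; /* kernel side: ulong (64-bit in OpenCL C) */
	int64_t  offset;   /* kernel side: long  (64-bit in OpenCL C) */
};

/* Width of the host's plain 'long' in bits: 64 on LP64 systems,
 * but 32 on e.g. 64-bit Windows, which is why it is avoided above. */
static unsigned int host_long_bits(void)
{
	return (unsigned int)(sizeof(long) * CHAR_BIT);
}

/* Sanity check that the host layout has the size the kernel expects. */
static void check_host_layout(void)
{
	assert(sizeof(uint64_t) == 8 && sizeof(int64_t) == 8);
	assert(sizeof(struct host_params_sketch) == 16);
}
```

Calling check_host_layout() once at format init would catch a size
mismatch early instead of producing wrong hashes later.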

The kernel build produces a boatload of warnings that you need to fix:

--8<------8<------8<------8<------8<------8<------8<----

$ ../run/john -test -form:lyra2-opencl -dev=2
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use 
only)]... Device 2: GeForce GT 650M
Build log: <program source>:49:6: warning: no previous prototype for 
function 'lyra2_initState'
void lyra2_initState(__global ulong * state)
      ^
<program source>:109:6: warning: no previous prototype for function 
'lyra2_absorbInput'
void lyra2_absorbInput(__global ulong * memMatrixGPU,
      ^
<program source>:215:16: warning: comparison of integers of different 
signs: 'int' and 'unsigned int'
         for (i = 0; i < nBlocksInput * BLOCK_LEN_BLAKE2_SAFE_BYTES; i++) {
              ~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<program source>:220:2: warning: comparison of integers of different 
signs: 'int' and 'uint' (aka 'unsigned int')
         memcpy(ptrByte, ptrByteSource, inlen);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<program source>:47:32: note: expanded from macro 'memcpy'
#define memcpy(dst, src, size) gmemcpy(dst, src, size)
                                ^~~~~~~~~~~~~~~~~~~~~~~
<program source>:43:16: note: expanded from macro 'gmemcpy'
     for(mi=0;mi<(size);mi++)            \
              ~~^
<program source>:227:2: warning: comparison of integers of different 
signs: 'int' and 'uint' (aka 'unsigned int')
         memcpy(ptrByte, ptrByteSource, saltlen);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<program source>:47:32: note: expanded from macro 'memcpy'
#define memcpy(dst, src, size) gmemcpy(dst, src, size)
                                ^~~~~~~~~~~~~~~~~~~~~~~
<program source>:43:16: note: expanded from macro 'gmemcpy'
     for(mi=0;mi<(size);mi++)            \
              ~~^
<program source>:231:2: warning: comparison of integers of different 
signs: 'int' and 'unsigned int'
         glmemcpy(ptrByte, &kLen, sizeof(int));
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<program source>:38:16: note: expanded from macro 'glmemcpy'
     for(mi=0;mi<(size);mi++)            \
              ~~^
<program source>:233:2: warning: comparison of integers of different 
signs: 'int' and 'unsigned int'
         glmemcpy(ptrByte, &inlen, sizeof(int));
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<program source>:38:16: note: expanded from macro 'glmemcpy'
     for(mi=0;mi<(size);mi++)            \
              ~~^
<program source>:235:2: warning: comparison of integers of different 
signs: 'int' and 'unsigned int'
         glmemcpy(ptrByte, &saltlen, sizeof(int));
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<program source>:38:16: note: expanded from macro 'glmemcpy'
     for(mi=0;mi<(size);mi++)            \
              ~~^
<program source>:237:2: warning: comparison of integers of different 
signs: 'int' and 'unsigned int'
         memcpy(ptrByte, &(salt->t_cost), sizeof(int));
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<program source>:47:32: note: expanded from macro 'memcpy'
#define memcpy(dst, src, size) gmemcpy(dst, src, size)
                                ^~~~~~~~~~~~~~~~~~~~~~~
<program source>:43:16: note: expanded from macro 'gmemcpy'
     for(mi=0;mi<(size);mi++)            \
              ~~^
<program source>:239:2: warning: comparison of integers of different 
signs: 'int' and 'unsigned int'
         memcpy(ptrByte, &(salt->m_cost), sizeof(int));
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<program source>:47:32: note: expanded from macro 'memcpy'
#define memcpy(dst, src, size) gmemcpy(dst, src, size)
                                ^~~~~~~~~~~~~~~~~~~~~~~
<program source>:43:16: note: expanded from macro 'gmemcpy'
     for(mi=0;mi<(size);mi++)            \
              ~~^
<program source>:241:2: warning: comparison of integers of different 
signs: 'int' and 'unsigned int'
         memcpy(ptrByte, &(salt->nCols), sizeof(int));
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<program source>:47:32: note: expanded from macro 'memcpy'
#define memcpy(dst, src, size) gmemcpy(dst, src, size)
                                ^~~~~~~~~~~~~~~~~~~~~~~
<program source>:43:16: note: expanded from macro 'gmemcpy'
     for(mi=0;mi<(size);mi++)            \
              ~~^
<program source>:247:3: warning: comparison of integers of different 
signs: 'int' and 'unsigned int'
                 glmemcpy(ptrByte, &nPARALLEL, sizeof(int));
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<program source>:38:16: note: expanded from macro 'glmemcpy'
     for(mi=0;mi<(size);mi++)            \
              ~~^
<program source>:251:3: warning: comparison of integers of different 
signs: 'int' and 'unsigned int'
                 glmemcpy(ptrByte, &thread, sizeof(int));
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<program source>:38:16: note: expanded from macro 'glmemcpy'
     for(mi=0;mi<(size);mi++)            \
              ~~^
<program source>:300:16: warning: comparison of integers of different 
signs: 'int' and 'uint' (aka 'unsigned int')
         for (i = 0; i < N_COLS; i++) {
              ~ ^ ~~~~~~
<program source>:349:16: warning: comparison of integers of different 
signs: 'int' and 'uint' (aka 'unsigned int')
         for (i = 0; i < N_COLS; i++) {
              ~ ^ ~~~~~~
<program source>:388:16: warning: comparison of integers of different 
signs: 'int' and 'uint' (aka 'unsigned int')
         for (i = 0; i < N_COLS; i++) {
              ~ ^ ~~~~~~
<program source>:424:6: warning: no previous prototype for function 
'reducedDuplexRowFilling'
void reducedDuplexRowFilling(ulong * state,
      ^
<program source>:462:16: warning: comparison of integers of different 
signs: 'int' and 'uint' (aka 'unsigned int')
         for (i = 0; i < N_COLS; i++) {
              ~ ^ ~~~~~~
<program source>:518:6: warning: no previous prototype for function 
'reducedDuplexRowWanderingParallel'
void reducedDuplexRowWanderingParallel(__global ulong * memMatrixGPU,
      ^
<program source>:553:16: warning: comparison of integers of different 
signs: 'int' and 'uint' (aka 'unsigned int')
         for (i = 0; i < N_COLS; i++) {
              ~ ^ ~~~~~~
<program source>:601:6: warning: no previous prototype for function 
'absorbRandomColumn'
void absorbRandomColumn(__global ulong * in, ulong * state,
      ^
<program source>:633:6: warning: no previous prototype for function 
'wanderingPhaseGPU2'
void wanderingPhaseGPU2(__global ulong * memMatrixGPU,
      ^
<program source>:778:16: warning: comparison of integers of different 
signs: 'unsigned int' and 'int'
         for (i = 0; i < fullBlocks; i++) {
              ~ ^ ~~~~~~~~~~
<program source>:788:6: warning: no previous prototype for function 
'reducedDuplexRowFilling_P1'
void reducedDuplexRowFilling_P1(ulong * state,
      ^
<program source>:819:16: warning: comparison of integers of different 
signs: 'int' and 'uint' (aka 'unsigned int')
         for (i = 0; i < N_COLS; i++) {
              ~ ^ ~~~~~~
<program source>:875:6: warning: no previous prototype for function 
'reducedDuplexRowWandering_P1'
void reducedDuplexRowWandering_P1(__global ulong * memMatrixGPU,
      ^
<program source>:897:16: warning: comparison of integers of different 
signs: 'int' and 'uint' (aka 'unsigned int')
         for (i = 0; i < N_COLS; i++) {
              ~ ^ ~~~~~~
<program source>:941:6: warning: no previous prototype for function 
'wanderingPhaseGPU2_P1'
void wanderingPhaseGPU2_P1(__global ulong * memMatrixGPU,
      ^
<program source>:1046:16: warning: comparison of integers of different 
signs: 'unsigned int' and 'int'
         for (i = 0; i < fullBlocks; i++) {
              ~ ^ ~~~~~~~~~~

memory per hash : 384.00 kB

--8<------8<------8<------8<------8<------8<------8<----

The "no previous prototype" warnings can be avoided by always putting
"static" or "inline" before *all* non-kernel functions.
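
A minimal sketch of the pattern, written as plain C so it can be
compiled on its own. The function names are hypothetical and not from
the Lyra2 kernel. The unsigned loop counters are my assumption about
how to quiet the "different signs" warnings as well, provided the
bounds really are unsigned:

```c
#include <stdint.h>

/* Non-kernel helper: 'static' gives it internal linkage, so the
 * compiler no longer asks for a previous prototype. */
static void copy_bytes_sketch(unsigned char *dst, const unsigned char *src,
                              unsigned int size)
{
	unsigned int mi; /* unsigned like 'size', so no sign-compare warning */

	for (mi = 0; mi < size; mi++)
		dst[mi] = src[mi];
}

/* 'static inline' works the same way for small helpers. */
static inline uint64_t sum_cols_sketch(const uint64_t *row,
                                       unsigned int n_cols)
{
	uint64_t sum = 0;
	unsigned int i; /* matches the unsigned bound */

	for (i = 0; i < n_cols; i++)
		sum += row[i];
	return sum;
}
```

In the kernel source the same idea would apply to the helper functions
and to the counters declared inside the memcpy-style macros.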

The auto-tune never seems to finish on my GT650M. On my even weaker
Intel HD4000, it just segfaults. On the CPU device, it fails at
cmp_all(1).


The pomelo-opencl format also produces a few warnings you need to fix, 
but works on the nvidia:

Benchmarking: pomelo-opencl [OpenCL (inefficient, development use 
only)]... Device 2: GeForce GT 650M
Build log: <program source>:283:16: warning: unused variable 'random_number'
         unsigned long random_number, index_global, index_local;
                ^
<program source>:283:31: warning: unused variable 'index_global'
         unsigned long random_number, index_global, index_local;
                               ^
<program source>:283:45: warning: unused variable 'index_local'
         unsigned long random_number, index_global, index_local;
                                             ^
<program source>:279:19: warning: unused variable 'j'
         unsigned long i, j, y, from=loop->from;
                   ^
<program source>:495:16: warning: unused variable 'random_number'
         unsigned long random_number, index_global, index_local;
                ^
<program source>:495:45: warning: unused variable 'index_local'
         unsigned long random_number, index_global, index_local;
                                             ^
<program source>:495:31: warning: unused variable 'index_global'
         unsigned long random_number, index_global, index_local;
                               ^
<program source>:491:19: warning: unused variable 'j'
         unsigned long i, j, y;
                   ^

memory per hash : 256.00 kB
DONE
Speed for cost 1 (t) of 2, cost 2 (m) of 2
Many salts:	6400 c/s real, 95085 c/s virtual
Only one salt:	6462 c/s real, 83200 c/s virtual

It passes the self-test on the CPU device too, but on the Intel HD4000
it fails at cmp_all(3).

magnum

