Message-ID: <CAKGDhHVMfonhrZ5fxVA2w2hZ3zWGsbyFkHycQXQmiq8L=dQ4TQ@mail.gmail.com>
Date: Fri, 8 May 2015 21:25:16 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

2015-05-07 16:09 GMT+02:00 Solar Designer <solar@...nwall.com>:
> Agnieszka,
>
> On Thu, May 07, 2015 at 03:30:43PM +0200, Agnieszka Bielec wrote:
>> 2015-05-05 20:00 GMT+02:00 Solar Designer <solar@...nwall.com>:
>> > On Mon, May 04, 2015 at 01:18:46AM +0200, Agnieszka Bielec wrote:
>> >> 2015-04-27 3:50 GMT+02:00 Solar Designer <solar@...nwall.com>:
>> >> >
>> >> > BTW, bumping into total GPU global memory size may be realistic with
>> >> > these memory-hard hashes. Our TITAN's 6 GB was the performance
>> >> > limiting factor in some of the benchmarks here:
>> >> > http://www.openwall.com/lists/crypt-dev/2014/03/13/1
>> >>
>> >> I use only 128MB
>> >
>> > What happens if you increase GWS further? Does performance drop? What
>> > if you manually increase GWS even further? It might happen that the
>> > auto-tuning finds a local minimum, whereas a higher GWS is optimal.
>>
>> the speed drops significantly when I make gws x2 bigger
>
> Can you try making it bigger yet anyway? This probably won't help, but
> it may be worth trying.
>

I tested 2x, 4x and 8x, and the bigger the GWS, the worse the results.
For 16x I get CL_INVALID_BUFFER_SIZE and CL_MEM_OBJECT_ALLOCATION_FAILURE.

>
>> > Also, I notice there are some if/else in G and H macros. Are they
>> > removed during loop unrolling, or do they translate to exec masks in the
>> > generated code?
>>
>> I cached values from memory into variables and I must check if
>> i0==index_global and i0==index_local, it's faster with this. In F all
>> workitems execute the same if-else branch but not in H. I didn't
>> disassemble the code yet. I doubt
>
> I don't understand.
> What exactly have you cached?
>
> Do you expect the "i0==index_local" and "i0==index_global" conditions to
> often be true, or are these rare special cases?

These are special cases.

> I'd expect the latter, but I don't see the purpose.

The original code looks like:

S[i0]=some operations
S[x]=some operations on S[i0]

and again

S[i0]=some operations

and so on.

I cache the values in variables (v, v1) at the beginning of function F and
save them at the end of the H or G functions. I must use a different set of
instructions when the address of the first value is equal to that of the
second one.

>
>> >> and the gws number with the memory usage were the same, I can nothing
>> >> to do with this bottleneck
>> >>
>> >> but If I remove everything from the code, GWS also doesn't differ
>> >
>> > "Everything"?
>>
>> if I change my function into pomelo_crypt_kernel(args...) { nothing }
>> but sorry, this was a false positive, If i set manually gws in this
>> case everything looks normal
>
> Does this suggest that GWS auto-tuning does not work correctly?

Maybe, but this is a special case. It's also possible that I set the
auto-tune variables wrong for the "empty" crypt_kernel. It happens both
when I allocate the buffer as usual and when I remove it, but I didn't
change the auto-tune options.

>
>> > AMD GCN (dev=0 and dev=1 in super) has 64 KB of local memory per CU.
>> > See http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf
>> > slide 10.
>>
>> I checked local memory size using this code
>>
>> clGetDeviceInfo(devices[gpu_id],CL_DEVICE_LOCAL_MEM_SIZE,sizeof(cl_ulong),&local_memory_size,NULL);
>> printf("mamy %llu\n",(unsigned long long) local_memory_size);
>>
>> and I was getting 48 and 32 KB
>
> Which devices do these correspond to?

32768 for --dev=1 and 49152 for --dev=5
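
The caching scheme described above (keep S[i0] and S[x] in variables, write
them back at the end, and take a separate path when both indices refer to
the same element) can be sketched in plain C roughly as follows. This is
only an illustration under assumptions, not the actual POMELO kernel code:
mix(), update_reference() and update_cached() are hypothetical names, mix()
is a stand-in for "some operations", and the indices are kept fixed for
simplicity, whereas in the real kernel they change between steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mixing step; stands in for "some operations". */
static uint64_t mix(uint64_t a, uint64_t b)
{
    return ((a << 7) | (a >> 57)) ^ (b + 0x9E3779B97F4A7C15ULL);
}

/* Reference version: every read and write goes to the state array S. */
static void update_reference(uint64_t *S, size_t n, size_t i0, size_t x,
                             int rounds)
{
    assert(i0 < n && x < n);
    for (int r = 0; r < rounds; r++) {
        S[i0] = mix(S[i0], (uint64_t)r);
        S[x] = mix(S[x], S[i0]);
    }
}

/*
 * Cached version: S[i0] and S[x] are held in local variables (v, v1) and
 * stored once at the end. When i0 == x both names refer to one element,
 * so a single variable must be used, otherwise v1 would go stale.
 */
static void update_cached(uint64_t *S, size_t n, size_t i0, size_t x,
                          int rounds)
{
    assert(i0 < n && x < n);
    if (i0 == x) {
        /* Aliased case: one cached value serves both roles. */
        uint64_t v = S[i0];
        for (int r = 0; r < rounds; r++) {
            v = mix(v, (uint64_t)r);
            v = mix(v, v);
        }
        S[i0] = v;
    } else {
        /* Distinct elements: two independent cached values. */
        uint64_t v = S[i0];
        uint64_t v1 = S[x];
        for (int r = 0; r < rounds; r++) {
            v = mix(v, (uint64_t)r);
            v1 = mix(v1, v);
        }
        S[i0] = v;
        S[x] = v1;
    }
}
```

Calling both functions on identical copies of a small array, once with
i0 != x and once with i0 == x, should leave the two arrays equal. On a GPU
the i0 == x branch is where work-items can diverge, which is presumably
what the question about exec masks is getting at.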