Message-ID: <20150507140947.GB11074@openwall.com>
Date: Thu, 7 May 2015 17:09:47 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

Agnieszka,

On Thu, May 07, 2015 at 03:30:43PM +0200, Agnieszka Bielec wrote:
> 2015-05-05 20:00 GMT+02:00 Solar Designer <solar@...nwall.com>:
> > On Mon, May 04, 2015 at 01:18:46AM +0200, Agnieszka Bielec wrote:
> >> 2015-04-27 3:50 GMT+02:00 Solar Designer <solar@...nwall.com>:
> >>
> >> > BTW, bumping into total GPU global memory size may be realistic with
> >> > these memory-hard hashes. Our TITAN's 6 GB was the performance
> >> > limiting factor in some of the benchmarks here:
> >> > http://www.openwall.com/lists/crypt-dev/2014/03/13/1
> >>
> >> I use only 128MB
> >
> > What happens if you increase GWS further? Does performance drop? What
> > if you manually increase GWS even further? It might happen that the
> > auto-tuning finds a local minimum, whereas a higher GWS is optimal.
>
> the speed drops significantly when I make gws x2 bigger

Can you try making it bigger yet anyway? This probably won't help, but
it may be worth trying.

> > BTW, can you explain why sMAP is as it is?
> >
> > #define sMAP(X) ((X)*GID+gid4)
> >
> > where:
> >
> > gid = get_global_id(0);
> > GID = get_global_size(0);
> > gid4 = gid * 4;
>
> sMAP, MAP and cMAP stands for coalescing (IIRC it's around 5 (maybe
> more) times faster with it)

Ideally, you'd literally explain these definitions in source code
comments.

> > Also, I notice there are some if/else in G and H macros. Are they
> > removed during loop unrolling, or do they translate to exec masks in the
> > generated code?
>
> I cached values from memory into variables and I must check if
> i0==index_global and i0==index_local, it's faster with this. In F all
> workitems execute the same if-else branch but not in H. I didn't
> disassemble the code yet. I doubt

I don't understand. What exactly have you cached?
Do you expect the "i0==index_local" and "i0==index_global" conditions to
often be true, or are these rare special cases? I'd expect the latter,
but I don't see the purpose.

> >> and the gws number with the memory usage were the same, I can nothing
> >> to do with this bottleneck
> >>
> >> but If I remove everything from the code, GWS also doesn't differ
> >
> > "Everything"?
>
> if I change my function into pomelo_crypt_kernel(args...) { nothing }
> but sorry, this was a false positive, If i set manually gws in this
> case everything looks normal

Does this suggest that GWS auto-tuning does not work correctly?

> > AMD GCN (dev=0 and dev=1 in super) has 64 KB of local memory per CU.
> > See http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf
> > slide 10.
>
> I checked local memory size using this code
>
> clGetDeviceInfo(devices[gpu_id],CL_DEVICE_LOCAL_MEM_SIZE,sizeof(cl_ulong),&local_memory_size,NULL);
> printf("mamy %llu\n",(unsigned long long) local_memory_size);
>
> and I was getting 48 and 32 KB

Which devices do these correspond to?

Alexander
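
As a possible starting point for the source code comments requested above,
here is a minimal sketch in plain C of the arithmetic behind the quoted
sMAP definition. It is only an illustration, not the kernel's actual code:
gid and GID are passed as ordinary parameters instead of coming from
get_global_id(0) and get_global_size(0), and the helper smap_layout_ok()
is hypothetical. It also assumes something the thread does not state:
that X advances in steps of 4 and each access touches the 4 consecutive
elements starting at sMAP(X). Under that assumption, neighbouring
work-items land on neighbouring 4-element chunks, which is the usual
shape of a coalesced layout.

```c
#include <assert.h>
#include <stdlib.h>

/* Same arithmetic as the quoted macro: sMAP(X) = X*GID + gid4, gid4 = gid*4.
 * gid and GID are explicit parameters because no OpenCL runtime is used. */
static size_t smap(size_t x, size_t gid, size_t GID)
{
    return x * GID + gid * 4;
}

/* Hypothetical check, under assumptions not confirmed in the thread:
 * each work-item owns `words` elements (a multiple of 4), X advances in
 * steps of 4, and each access touches smap(X) .. smap(X)+3.
 * Returns 1 if every slot of the shared buffer is hit exactly once. */
static int smap_layout_ok(size_t GID, size_t words)
{
    assert(words % 4 == 0);

    size_t total = GID * words;
    if (total == 0)
        return 1;               /* nothing to map, trivially fine */

    unsigned char *seen = calloc(total, 1);
    if (!seen)
        return 0;               /* allocation failure: report "not ok" */

    int ok = 1;
    for (size_t gid = 0; gid < GID; gid++) {
        for (size_t x = 0; x < words; x += 4) {
            for (size_t k = 0; k < 4; k++) {
                size_t idx = smap(x, gid, GID) + k;
                /* out of range or already used means the layout overlaps */
                if (idx >= total || seen[idx]++)
                    ok = 0;
            }
        }
    }

    free(seen);
    return ok;
}
```

With GID = 8, for example, smap(4, 1, 8) and smap(4, 2, 8) differ by
exactly 4 elements, so work-items 1 and 2 read adjacent chunks for the
same X. Whether this matches how MAP and cMAP are used in the kernel is
something only the source can confirm.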