Message-ID: <20150816215059.GA27142@openwall.com>
Date: Mon, 17 Aug 2015 00:51:00 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

On Sun, Aug 16, 2015 at 10:27:27PM +0200, Agnieszka Bielec wrote:
> 2015-08-16 16:09 GMT+02:00 Solar Designer <solar@...nwall.com>:
> > bi+i is used to index an array of 16-byte elements, so it needs to be
> > multiplied by 16 each time (unless the compiler manages to optimize
> > this, perhaps much like you had done manually in the first version).
>
> if something is not supported why I have on my laptop the opposite of
> this slowdown on AMD?

It is possible that index scaling by 16 is not supported on AMD GCN but is supported on NVIDIA Maxwell (although I doubt it); you'd need to check the corresponding ISA manuals and/or the generated GPU ISA code. It is also possible that one compiler happens to handle this better than the other, optimizing out the need to scale the index. Finally, it is possible that extra instructions for the scaling by 16 are generated for either GPU, but on one of them they actually end up helping, e.g. by avoiding a stall elsewhere.

(It does sometimes happen that even a NOP introduced into code speeds it up. In fact, some compilers occasionally insert NOPs into the code they generate; I've recently seen that in code that icc generates for MIC. Usually this is done to make the next instruction more likely to be issued to a specific execution unit, which may in turn benefit a later sequence of instructions by changing which execution units are busy vs. available at the time that sequence starts.)

> none@...e ~/Desktop/r/run $ ./john --test --format=argon2d-opencl
> Benchmarking: argon2d-opencl [Blake2 OpenCL]...
> memory per hash : 1.46 MB
> Device 0: GeForce GTX 960M
> using different password for benchmarking
> DONE
> Speed for cost 1 (t) of 1, cost 2 (m) of 1500, cost 3 (l) of 1
> Many salts: 4114 c/s real, 4114 c/s virtual
> Only one salt: 4114 c/s real, 4151 c/s virtual

BTW, these are impressively good speeds for your small GPU. We need to get a Titan X, and it'll outperform a CPU significantly.

What speeds are you getting on well's CPU for Argon2d at these settings, with memory (de)allocation out of the loop as we had for the Lyra2 and yescrypt benchmarks?

Also, please set m=1536, so we'd have exactly 1.5 MiB (m is in KiB, and 1500 KiB is only about 1.46 MiB, hence the "1.46 MB" above).

Thanks,

Alexander