Message-ID: <CAKGDhHU266RPNVDaP=2RcJfdDUPJ2zoORweRsF9qkaDDVTW=8Q@mail.gmail.com>
Date: Wed, 19 Aug 2015 19:39:24 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-19 19:12 GMT+02:00 Agnieszka Bielec <bielecagnieszka8@...il.com>:
> 2015-08-19 6:10 GMT+02:00 Solar Designer <solar@...nwall.com>:
>> (just to illustrate the problem of slow integer division on GPUs).
>>
>> Before you spend a lot of time on this, I suggest that you replace this
>> modulo operation with something simpler (and wrong), yet in some ways
>> similar, e.g.:
>>
>> static inline uint32_t wrap(uint64_t x, uint32_t n)
>> {
>>     uint64_t a = (x + n) & (n - 1);
>>     uint64_t b = x & n;
>>     uint64_t c = (x << 1) & n;
>>     return ((a << 1) + b + c) >> 2;
>> }
>>
>> (and its OpenCL equivalent, with proper data types). Of course, this
>> revision of Argon2 won't match Argon2's normal test vectors, but you
>> should be able to see roughly what performance you could get if you
>> later optimize the division.
>
> it's slower with wrap instead of %
> I just changed x % y to number 5 and I gained speed only on my 960m
> from 1861 to 1878 (argon2i). I will check again % after another
> optimizations

I checked also argon2d just in case and I have more speedup here

normal code

none@...e ~/Desktop/r/run $ GWS=512 ./john --test --format=argon2d-opencl
Benchmarking: argon2d-opencl [Blake2 OpenCL]... memory per hash : 1.50 MB
Device 0: GeForce GTX 960M
using different password for benchmarking
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 1536, cost 3 (l) of 1
Many salts:     3976 c/s real, 3938 c/s virtual
Only one salt:  3976 c/s real, 4015 c/s virtual

with 5

none@...e ~/Desktop/r/run $ GWS=512 ./john --test --format=argon2d-opencl --skip-self-test
Benchmarking: argon2d-opencl [Blake2 OpenCL]... memory per hash : 1.50 MB
Device 0: GeForce GTX 960M
using different password for benchmarking
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 1536, cost 3 (l) of 1
Many salts:     4055 c/s real, 4055 c/s virtual
Only one salt:  4114 c/s real, 4151 c/s virtual

with wrap()

none@...e ~/Desktop/r/run $ GWS=512 ./john --test --format=argon2d-opencl --skip-self-test
Benchmarking: argon2d-opencl [Blake2 OpenCL]... memory per hash : 1.50 MB
Device 0: GeForce GTX 960M
using different password for benchmarking
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 1536, cost 3 (l) of 1
Many salts:     3976 c/s real, 3976 c/s virtual
Only one salt:  4015 c/s real, 4015 c/s virtual

maybe it's just usual coincidence