Message-ID: <CAKGDhHV2gzyvFcC8+X0iB+ZjM51haPpg+x_UaBrCTp5G7t5Hbg@mail.gmail.com>
Date: Wed, 19 Aug 2015 19:12:55 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-19 6:10 GMT+02:00 Solar Designer <solar@...nwall.com>:
> (just to illustrate the problem of slow integer division on GPUs).
>
> Before you spend a lot of time on this, I suggest that you replace this
> modulo operation with something simpler (and wrong), yet in some ways
> similar, e.g.:
>
> static inline uint32_t wrap(uint64_t x, uint32_t n)
> {
> 	uint64_t a = (x + n) & (n - 1);
> 	uint64_t b = x & n;
> 	uint64_t c = (x << 1) & n;
> 	return ((a << 1) + b + c) >> 2;
> }
>
> (and its OpenCL equivalent, with proper data types).  Of course, this
> revision of Argon2 won't match Argon2's normal test vectors, but you
> should be able to see roughly what performance you could get if you
> later optimize the division.

It's slower with wrap() instead of %.

I just replaced x % y with the constant 5, and I gained speed only on my
960M, from 1861 to 1878 (argon2i).

I will check % again after further optimizations.