john-dev - Re: PHC: Argon2 on GPU

Message-ID: <CAKGDhHV2gzyvFcC8+X0iB+ZjM51haPpg+x_UaBrCTp5G7t5Hbg@mail.gmail.com>
Date: Wed, 19 Aug 2015 19:12:55 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-19 6:10 GMT+02:00 Solar Designer <solar@...nwall.com>:
> (just to illustrate the problem of slow integer division on GPUs).
>
> Before you spend a lot of time on this, I suggest that you replace this
> modulo operation with something simpler (and wrong), yet in some ways
> similar, e.g.:
>
> static inline uint32_t wrap(uint64_t x, uint32_t n)
> {
>         uint64_t a = (x + n) & (n - 1);
>         uint64_t b = x & n;
>         uint64_t c = (x << 1) & n;
>         return ((a << 1) + b + c) >> 2;
> }
>
> (and its OpenCL equivalent, with proper data types).  Of course, this
> revision of Argon2 won't match Argon2's normal test vectors, but you
> should be able to see roughly what performance you could get if you
> later optimize the division.

It's slower with wrap() instead of %.
I just replaced x % y with the constant 5 and gained speed only on my
960M, from 1861 to 1878 (argon2i). I will check % again after other
optimizations.
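
For reference, a minimal C sketch of the two index computations being compared above: the exact modulo and the quoted wrap() stand-in. The helper names exact_index() and wrap_in_range() are made up for illustration and do not come from the Argon2 or John the Ripper sources. The sketch only illustrates that wrap() keeps its result below n without a division, while not matching x % n. It says nothing about GPU speed.

```c
#include <assert.h>
#include <stdint.h>

/* Exact reduction: needs a 64-bit integer division/modulo. */
static inline uint32_t exact_index(uint64_t x, uint32_t n)
{
        return (uint32_t)(x % n);
}

/* Stand-in from the quoted message: only AND, shift and add.
 * Deliberately NOT equal to x % n. */
static inline uint32_t wrap(uint64_t x, uint32_t n)
{
        uint64_t a = (x + n) & (n - 1);
        uint64_t b = x & n;
        uint64_t c = (x << 1) & n;
        return (uint32_t)(((a << 1) + b + c) >> 2);
}

/* Hypothetical helper: returns 1 if wrap(x, n) < n for every x in
 * [0, limit), 0 otherwise. n must be nonzero. */
int wrap_in_range(uint32_t n, uint64_t limit)
{
        assert(n != 0);
        for (uint64_t x = 0; x < limit; x++)
                if (wrap(x, n) >= n)
                        return 0;
        return 1;
}
```

For example, wrap(5, 8) gives 4 where 5 % 8 gives 5, so the results will not match the normal test vectors, as the quoted message says.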
