Message-ID: <CAKGDhHWP3cUzVKNnRQcs7M+znHfvNTKhdEJwQrA1gn4auKd_bA@mail.gmail.com>
Date: Fri, 14 Aug 2015 20:40:28 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-14 20:11 GMT+02:00 Solar Designer <solar@...nwall.com>:
> On Fri, Aug 14, 2015 at 08:01:31PM +0200, Agnieszka Bielec wrote:
>> 2015-08-14 19:06 GMT+02:00 Solar Designer <solar@...nwall.com>:
>> > On Fri, Aug 14, 2015 at 07:02:39PM +0200, Agnieszka Bielec wrote:
>> >> ah, In this link is argon2d, it's faster than argon2i because t_cost
>> >> for argon2d is equal to 1, 3 for argon2i
>> >
>> > Sure, but IIRC on other benchmarks you posted there was only a small
>> > difference in performance for 2i at t=3 and 2d at t=1. Also, this
>> > doesn't explain the ~10x worse performance we're seeing for 2i now.
>>
>> where do you see ~10x batter performance than now with the same costs?
>
> Not the same, but I meant this:
>
> http://www.openwall.com/lists/john-dev/2015/08/14/42
>
> [a@...er run]$ ./john --test --format=argon2i-opencl --v=4
> Benchmarking: argon2i-opencl [Blake2 OpenCL]...
> memory per hash : 1.46 MB
> Device 0: Tahiti [AMD Radeon HD 7900 Series]
> Calculating best global worksize (GWS); max. 1s single kernel invocation.
> gws:       256       387 c/s       387 rounds/s 659.830ms per crypt_all()!
> gws:       512       720 c/s       720 rounds/s 710.817ms per crypt_all()+
> gws:      1024      1305 c/s      1305 rounds/s 784.470ms per crypt_all()+
> Local worksize (LWS) 64, global worksize (GWS) 1024
> using different password for benchmarking
> DONE
> Speed for cost 1 (t) of 3, cost 2 (m) of 1500, cost 3 (l) of 1
> Many salts: 389 c/s real, 102400 c/s virtual
> Only one salt: 386 c/s real, 51200 c/s virtual
>
> vs. this:
>
> http://www.openwall.com/lists/john-dev/2015/08/12/11
>
> [a@...er run]$ ./john --test --format=argon2d-opencl --v=4
> Benchmarking: argon2d-opencl [Blake2 OpenCL]...
> memory per hash : 1.46 MB
> Device 0: Tahiti [AMD Radeon HD 7900 Series]
> Calculating best global worksize (GWS); max. 1s single kernel invocation.
> gws:       256       964 c/s       964 rounds/s 265.514ms per crypt_all()!
> gws:       512      1878 c/s      1878 rounds/s 272.497ms per crypt_all()+
> gws:      1024      3447 c/s      3447 rounds/s 297.022ms per crypt_all()+
> Local worksize (LWS) 64, global worksize (GWS) 1024
> using different password for benchmarking
> DONE
> Speed for cost 1 (t) of 1, cost 2 (m) of 1500, cost 3 (l) of 1
> Many salts: 2925 c/s real, 307200 c/s virtual
> Only one salt: 2898 c/s real, 307200 c/s virtual
>
> It's 2i at t=3 vs. 2d at t=1. I'd expect the former to be at most 3x
> slower (because of higher t), and in practice less than that due to 2i's
> predictable and coalescing-friendly access pattern.

I'm not sure you fully understood this post:

http://www.openwall.com/lists/john-dev/2015/08/14/42

On super, john auto-tunes GWS, finds that 1024 is the best, and prints:

Local worksize (LWS) 64, global worksize (GWS) 1024

but 256 is what actually gets set, so these computations are for GWS=256.
If you specify GWS=1024 yourself, the speed is better and really is the
speed for GWS=1024.

argon2d has a similar problem, but it's not as bad: there GWS isn't equal
to 256 but is somewhere between 512 and 1024.

This shows that there is a bug in auto-tune or in my configuration (but if
there is a bug in my configuration, there is also one in auto-tune: even if
I configured something wrong, john shouldn't report GWS=1024 when GWS=256).

I don't have this problem on my laptop, though (alternatively, it may just
be that the first call of crypt_all() is slower).
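
As a rough sanity check of the claim above, the figures from the two quoted benchmark runs can be compared directly. This is only a sketch: the numbers are copied from the john output quoted in this message, while the helper names `closest_gws` and `bracket` and the `RUNS` table are made up for illustration and are not part of john.

```python
# Auto-tune readings (GWS -> c/s) and the "Many salts" real c/s,
# copied from the two benchmark runs quoted in this message.
RUNS = {
    "argon2i t=3": {"tune": {256: 387, 512: 720, 1024: 1305}, "bench": 389},
    "argon2d t=1": {"tune": {256: 964, 512: 1878, 1024: 3447}, "bench": 2925},
}


def closest_gws(tune, bench):
    """Return the tuned GWS whose c/s is nearest to the benchmark speed."""
    return min(tune, key=lambda gws: abs(tune[gws] - bench))


def bracket(tune, bench):
    """Return the adjacent (lower, upper) GWS pair whose speeds enclose bench."""
    sizes = sorted(tune)
    for lower, upper in zip(sizes, sizes[1:]):
        if tune[lower] <= bench <= tune[upper]:
            return lower, upper
    return None


for name, run in RUNS.items():
    print(name,
          "closest tuned GWS:", closest_gws(run["tune"], run["bench"]),
          "bracket:", bracket(run["tune"], run["bench"]))

# Slowdown of 2i relative to 2d, first from the final benchmark lines,
# then from the auto-tune readings at the reported GWS of 1024.
bench_ratio = RUNS["argon2d t=1"]["bench"] / RUNS["argon2i t=3"]["bench"]
tune_ratio = RUNS["argon2d t=1"]["tune"][1024] / RUNS["argon2i t=3"]["tune"][1024]
print("benchmark ratio: %.2f, auto-tune ratio at GWS=1024: %.2f"
      % (bench_ratio, tune_ratio))
```

With these inputs the argon2i benchmark speed sits next to the GWS=256 reading, and the argon2d speed falls between the 512 and 1024 readings. The final benchmark lines differ by about 7.5x, whereas the auto-tune readings at GWS=1024 differ by about 2.6x, which is within the "at most 3x" expected from the higher t. This is consistent with the benchmark not running at the reported GWS, but it does not by itself rule out the slower-first-call explanation.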