Message-ID: <CAKGDhHUh+9Wo4zUDP1uG5FaDrbSZF_SvfbnvA+RCYO05hqF_Vg@mail.gmail.com>
Date: Sun, 16 Aug 2015 14:01:38 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-16 0:21 GMT+02:00 Agnieszka Bielec <bielecagnieszka8@...il.com>:
> I added to crypt_all() time measurement and here are results:
>
> [a@...er run]$ ./john --test --format=argon2i-opencl --v=4
> Benchmarking: argon2i-opencl [Blake2 OpenCL]...
> memory per hash : 1.46 MB
> Device 0: Tahiti [AMD Radeon HD 7900 Series]
> Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=138
> -DDEV_VER_MAJOR=1800 -DDEV_VER_MINOR=5 -D_OPENCL_COMPILER
> -DBINARY_SIZE=256 -DSALT_SIZE=64 -DPLAINTEXT_LENGTH=32
> Calculating best global worksize (GWS); max. 1s single kernel invocation.
> crypt all start, count=256, gws=256, lws=64
> crypt all end, time: 0.702250
> gws: 256 385 c/s 385 rounds/s 664.384ms per crypt_all()!
> crypt all start, count=512, gws=512, lws=64
> crypt all end, time: 0.738910
> gws: 512 719 c/s 719 rounds/s 711.666ms per crypt_all()+
> crypt all start, count=1024, gws=1024, lws=64
> crypt all end, time: 0.819439
> gws: 1024 1306 c/s 1306 rounds/s 783.545ms per crypt_all()+
> Local worksize (LWS) 64, global worksize (GWS) 1024
> crypt all start, count=1, gws=64, lws=64
> crypt all end, time: 0.982416
> crypt all start, count=2, gws=64, lws=64
> crypt all end, time: 0.642484
> crypt all start, count=3, gws=64, lws=64
> crypt all end, time: 0.675356
> crypt all start, count=4, gws=64, lws=64
> crypt all end, time: 0.677136
> crypt all start, count=5, gws=64, lws=64
> crypt all end, time: 0.057678
> crypt all start, count=7, gws=64, lws=64
> crypt all end, time: 0.057936
> crypt all start, count=10, gws=64, lws=64
> crypt all end, time: 0.042161
> crypt all start, count=14, gws=64, lws=64
> crypt all end, time: 0.054247
> crypt all start, count=1024, gws=1024, lws=64
> crypt all end, time: 2.615536
> using different password for benchmarking
> crypt all start, count=1024, gws=1024, lws=64
> crypt all end, time: 2.635043
> qqqqqqqqqqqqqqqqqqqqqqqqq
> real_time 263
> crypt all start, count=1024, gws=1024, lws=64
> crypt all end, time: 2.645786
> qqqqqqqqqqqqqqqqqqqqqqqqq
> real_time 265
> DONE
> Speed for cost 1 (t) of 3, cost 2 (m) of 1500, cost 3 (l) of 1
> ten int 1024
> clock : 263
> aaa Many salts: 389 c/s real, 102400 c/s virtual
> zzzzz Only one salt: 386 c/s real, 102400 c/s virtual
>
> [a@...er run]$ GWS=1024 ./john --test --format=argon2i-opencl --v=4
> Benchmarking: argon2i-opencl [Blake2 OpenCL]...
> memory per hash : 1.46 MB
> Device 0: Tahiti [AMD Radeon HD 7900 Series]
> Local worksize (LWS) 64, global worksize (GWS) 1024
> crypt all start, count=1, gws=64, lws=64
> crypt all end, time: 0.653867
> crypt all start, count=2, gws=64, lws=64
> crypt all end, time: 0.578068
> crypt all start, count=3, gws=64, lws=64
> crypt all end, time: 0.618967
> crypt all start, count=4, gws=64, lws=64
> crypt all end, time: 0.621076
> crypt all start, count=5, gws=64, lws=64
> crypt all end, time: 0.053851
> crypt all start, count=7, gws=64, lws=64
> crypt all end, time: 0.054477
> crypt all start, count=10, gws=64, lws=64
> crypt all end, time: 0.041921
> crypt all start, count=14, gws=64, lws=64
> crypt all end, time: 0.052137
> crypt all start, count=1024, gws=1024, lws=64
> crypt all end, time: 0.788093
> using different password for benchmarking
> crypt all start, count=1024, gws=1024, lws=64
> crypt all end, time: 0.788118
> crypt all start, count=1024, gws=1024, lws=64
> crypt all end, time: 0.789293
> qqqqqqqqqqqqqqqqqqqqqqqqq
> real_time 158
> crypt all start, count=1024, gws=1024, lws=64
> crypt all end, time: 0.788320
> crypt all start, count=1024, gws=1024, lws=64
> crypt all end, time: 0.787732
> qqqqqqqqqqqqqqqqqqqqqqqqq
> real_time 158
> DONE
> Speed for cost 1 (t) of 3, cost 2 (m) of 1500, cost 3 (l) of 1
> ten int 2048
> clock : 158
> aaa Many salts: 1296 c/s real, 204800 c/s virtual
> zzzzz Only one salt: 1296 c/s real, 102400 c/s virtual
>
>
> don't know how is this possible, this bug occurs only on super AMD
> (--dev=5 on super works after I cut plaintext length)
> also the same problem in cracking run - works faster when GWS=1024 is
> set, works slow when GWS is not set

I have now been digging into argon2d. I discovered that this bug first
appears after commit 9e96f452350c0f2cae32b38e4a4cd1f83d51a367. Before that
commit the code was:

    bi = prev_block_offset = ((prev_slice * lanes + pos.lane + 1) *
        segment_length - 1) * BLOCK_SIZE;
    for (i = 0; i < 64; i++) {
        prev_block[i] = *(__global ulong2 *) (&memory[bi]);
        bi += 16;
    }

The slowdown on AMD appeared when I changed this code to:

    bi = prev_block_offset = ((prev_slice * lanes + pos.lane + 1) *
        segment_length - 1) * BLOCK_SIZE / 16;
    for (i = 0; i < 64; i++) {
        prev_block[i] = ((__global ulong2*)memory)[bi+i];
    }

Does anyone see some logic here, or is this just a bug on AMD? I didn't
gain speed anywhere from similar changes, so I can simply revert them.
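For reference, the two loops above should read exactly the same bytes, so the difference on AMD would have to come from how the compiler handles the two addressing forms, not from the data accessed. The following is a minimal host-side C sketch of that equivalence, not the kernel code itself. It assumes BLOCK_SIZE is 1024 bytes, that memory is addressed as a byte array, and that the memory is laid out as 4 slices; ulong2_host and all function names here are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 1024u                        /* assumed block size in bytes */
#define VEC_BYTES 16u                           /* size of an OpenCL ulong2 */
#define VECS_PER_BLOCK (BLOCK_SIZE / VEC_BYTES) /* 64, the loop bound in the post */
#define SLICES 4u                               /* assumed number of slices */

/* Host stand-in for the OpenCL ulong2 vector type (hypothetical name). */
typedef struct { uint64_t x, y; } ulong2_host;

/* Byte offset of the last block of the previous segment, as in the post. */
static size_t prev_block_byte_offset(size_t prev_slice, size_t lanes,
                                     size_t lane, size_t segment_length)
{
    return ((prev_slice * lanes + lane + 1) * segment_length - 1) * BLOCK_SIZE;
}

/* Old form: keep a byte offset and advance it by 16 per iteration. */
static void load_by_byte_offset(const unsigned char *memory, size_t bi,
                                ulong2_host out[VECS_PER_BLOCK])
{
    size_t i;
    for (i = 0; i < VECS_PER_BLOCK; i++) {
        memcpy(&out[i], memory + bi, VEC_BYTES);
        bi += VEC_BYTES;
    }
}

/* New form: divide the offset by 16 once, then index 16-byte elements. */
static void load_by_vector_index(const unsigned char *memory, size_t bi_bytes,
                                 ulong2_host out[VECS_PER_BLOCK])
{
    size_t i, bi;
    assert(bi_bytes % VEC_BYTES == 0);  /* the division must be exact */
    bi = bi_bytes / VEC_BYTES;
    for (i = 0; i < VECS_PER_BLOCK; i++)
        memcpy(&out[i], memory + (bi + i) * VEC_BYTES, VEC_BYTES);
}

/* Returns 1 if both forms load identical data from a patterned buffer. */
static int forms_agree(size_t prev_slice, size_t lanes, size_t lane,
                       size_t segment_length)
{
    size_t total = SLICES * lanes * segment_length * BLOCK_SIZE;
    ulong2_host a[VECS_PER_BLOCK], b[VECS_PER_BLOCK];
    unsigned char *memory;
    size_t k, bi;
    int same;

    assert(prev_slice < SLICES && lane < lanes && segment_length >= 1);
    memory = malloc(total);
    if (memory == NULL)
        return 0;
    for (k = 0; k < total; k++)          /* non-trivial byte pattern */
        memory[k] = (unsigned char)(k * 31u + 7u);

    bi = prev_block_byte_offset(prev_slice, lanes, lane, segment_length);
    load_by_byte_offset(memory, bi, a);
    load_by_vector_index(memory, bi, b);
    same = memcmp(a, b, sizeof a) == 0;
    free(memory);
    return same;
}
```

Calling forms_agree(prev_slice, lanes, lane, segment_length) with in-range arguments compares the two loads. The check only shows that the rewrite is functionally equivalent whenever BLOCK_SIZE is a multiple of 16; it says nothing about why one form is slower on the Tahiti device.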