Message-ID: <CAKGDhHXw4u7kMSNVs+tg=Jk_4K9yPHSEmEgsnLKZF6JnkxjQaA@mail.gmail.com>
Date: Fri, 14 Aug 2015 16:44:09 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-14 16:37 GMT+02:00 Agnieszka Bielec <bielecagnieszka8@...il.com>:
> 2015-08-14 15:31 GMT+02:00 Solar Designer <solar@...nwall.com>:
>> On Thu, Aug 13, 2015 at 12:28:57AM +0200, magnum wrote:
>>> On 2015-08-12 23:51, Solar Designer wrote:
>>> >magnum, do you have an explanation why the best benchmark result during
>>> >auto-tuning is usually substantially different from the final benchmark
>>> >in most of Agnieszka's formats? I'm fine with eventually dismissing it
>>> >as "hard to achieve" and "cosmetic anyway", but I'd like to understand
>>> >the cause first. Thanks!
>>>
>>> Generally a mismatch could be caused by using different [cost] test
>>> vectors in auto-tune than the ones benchmarked, or auto-tune using just
>>> one repeated plaintext in a format where length matters for speed (eg.
>>> RAR), or something along those lines.
>>>
>>> Another reason would be incorrect setup of autotune for split kernels.
>>> For example, if auto-tune thinks we're going to call a split kernel 500
>>> times but the real run does it 1000 times, we'll see inflated figures
>>> from autotune.
>>>
>>> A third reason (seen in early WPA-PSK) is when crypt_all() does
>>> significant post-processing on CPU where auto-tune doesn't.
>>
>> At least the first reason you listed may likely result in suboptimal
>> auto-tuning. Perhaps it wouldn't with simple iterated schemes like
>> PBKDF2, but with memory-hard schemes like Argon2 the cost settings do
>> affect optimal LWS and GWS substantially.
>>
>> So we shouldn't dismiss this without understanding of what exactly is
>> going on in a given case.
>
>
> cracking mode on my laptop with argon2d showed that at the beginning the
> speed is the same as the one shown while computing GWS; after some time
> I get a speed closest to the one shown during --test, but it's not
> exactly the same.
>
> beginning
> 0g 0:00:00:05 13.67% 2/3 (ETA: 16:00:32) 0g/s 3922p/s 3922c/s 3922C/s
> GPU:56°C util:99% leugim..nolfet
>
> after 1 min
> 0g 0:00:03:25 3/3 0g/s 4067p/s 4067c/s 4067C/s GPU:77°C util:99% 213160..241144
>
> after 5 min
> 0g 0:00:07:40 3/3 0g/s 4083p/s 4083c/s 4083C/s GPU:78°C util:45%
> critas01..crachera
>
> --test
>
> Local worksize (LWS) 64, global worksize (GWS) 512
> using different password for benchmarking
> DONE
> Speed for cost 1 (t) of 1, cost 2 (m) of 1500, cost 3 (l) of 1
> Many salts: 4114 c/s real, 4077 c/s virtual
> Only one salt: 4114 c/s real, 4114 c/s virtual
>
> I don't see big differences with argon2i on my laptop
>
> on super:
>
> [a@...er run]$ ./john --test --format=argon2i-opencl --v=4
> Benchmarking: argon2i-opencl [Blake2 OpenCL]...
> memory per hash : 1.46 MB
> Device 0: Tahiti [AMD Radeon HD 7900 Series]
> Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=138
> -DDEV_VER_MAJOR=1800 -DDEV_VER_MINOR=5 -D_OPENCL_COMPILER
> -DBINARY_SIZE=256 -DSALT_SIZE=64 -DPLAINTEXT_LENGTH=32
> Calculating best global worksize (GWS); max. 1s single kernel invocation.
> gws: 256 385 c/s 385 rounds/s 663.846ms per crypt_all()!
> gws: 512 719 c/s 719 rounds/s 711.475ms per crypt_all()+
> gws: 1024 1298 c/s 1298 rounds/s 788.748ms per crypt_all()+
> Local worksize (LWS) 64, global worksize (GWS) 1024
> using different password for benchmarking
> DONE
> Speed for cost 1 (t) of 3, cost 2 (m) of 1500, cost 3 (l) of 1
> Many salts: 390 c/s real, 102400 c/s virtual
> Only one salt: 390 c/s real, 102400 c/s virtual
>
> cracking run shows
>
> Press 'q' or Ctrl-C to abort, almost any other key for status
> 0g 0:00:00:21 6.61% 2/3 (ETA: 17:03:04) 0g/s 385.3p/s 385.3c/s
> 385.3C/s fireballs..bens
> GPU 0 overheat (33816176°C, fan 0%), aborting job.
> 0g 0:00:00:21 6.61% 2/3 (ETA: 17:03:04) 0g/s 384.0p/s 384.0c/s
> 384.0C/s fireballs..bens
>
> so speeds reported by main --test are good

wtf?

[a@...er run]$ ./john --test --format=argon2i-opencl
Benchmarking: argon2i-opencl [Blake2 OpenCL]...
memory per hash : 1.46 MB
Device 0: Tahiti [AMD Radeon HD 7900 Series]
using different password for benchmarking
DONE
Speed for cost 1 (t) of 3, cost 2 (m) of 1500, cost 3 (l) of 1
Many salts: 423 c/s real, 102400 c/s virtual
Only one salt: 423 c/s real, 102400 c/s virtual

[a@...er run]$ ./john --test --format=argon2i-opencl --v=4
Benchmarking: argon2i-opencl [Blake2 OpenCL]...
memory per hash : 1.46 MB
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws: 256 387 c/s 387 rounds/s 659.830ms per crypt_all()!
gws: 512 720 c/s 720 rounds/s 710.817ms per crypt_all()+
gws: 1024 1305 c/s 1305 rounds/s 784.470ms per crypt_all()+
Local worksize (LWS) 64, global worksize (GWS) 1024
using different password for benchmarking
DONE
Speed for cost 1 (t) of 3, cost 2 (m) of 1500, cost 3 (l) of 1
Many salts: 389 c/s real, 102400 c/s virtual
Only one salt: 386 c/s real, 51200 c/s virtual

[a@...er run]$ GWS=1024 ./john --test --format=argon2i-opencl --v=4
Benchmarking: argon2i-opencl [Blake2 OpenCL]...
memory per hash : 1.46 MB
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Local worksize (LWS) 64, global worksize (GWS) 1024
using different password for benchmarking
DONE
Speed for cost 1 (t) of 3, cost 2 (m) of 1500, cost 3 (l) of 1
Many salts: 1296 c/s real, 204800 c/s virtual
Only one salt: 1304 c/s real, 204800 c/s virtual

this may have something in common with MEM_SIZE/4 (I have now removed the /4)
http://www.openwall.com/lists/john-dev/2015/08/06/22
sorry, couldn't find my original e-mail
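
Editor's sketch, not part of the original message: the auto-tune lines quoted above can be checked with simple arithmetic. The snippet assumes (this is not confirmed anywhere in the thread) that the printed c/s figure is GWS divided by the seconds spent in one crypt_all() call, truncated to an integer. The function name `candidates_per_second` and the `AUTOTUNE_ROWS` table are hypothetical helpers; the numbers are copied from the first `--v=4` run on super.

```python
# Sketch only. Assumption: c/s = GWS / (seconds per crypt_all()), truncated.
# This formula is inferred from the quoted output, not taken from the
# John the Ripper source code.

def candidates_per_second(gws: int, ms_per_crypt_all: float) -> int:
    """Candidates hashed per second if one crypt_all() call handles `gws` of them."""
    return int(gws / (ms_per_crypt_all / 1000.0))

# (GWS, ms per crypt_all(), c/s printed by auto-tune), first --v=4 run on super
AUTOTUNE_ROWS = [
    (256, 663.846, 385),
    (512, 711.475, 719),
    (1024, 788.748, 1298),
]

if __name__ == "__main__":
    for gws, ms, reported in AUTOTUNE_ROWS:
        print(f"gws={gws:5d} computed={candidates_per_second(gws, ms):5d} reported={reported:5d}")
```

Under that assumption the computed values agree with all three quoted rows. The final benchmark figure of about 390 c/s sits next to the GWS 256 row rather than the GWS 1024 row that auto-tune selected, while the run with GWS=1024 forced through the environment reports about 1300 c/s. The thread does not establish why.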