Message-ID: <20150726134018.GB1688@openwall.com>
Date: Sun, 26 Jul 2015 15:40:18 +0200
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Lyra2 vs yescrypt benchmarks 2

On Sun, Jul 26, 2015 at 03:15:43PM +0200, Agnieszka Bielec wrote:
> 2015-07-26 2:31 GMT+02:00 Solar Designer <solar@...nwall.com>:
> > On Sat, Jul 25, 2015 at 10:56:42PM +0200, Agnieszka Bielec wrote:
> >> a@...l:~/m/run$ ./john --test --format=lyra2
> >> Will run 8 OpenMP threads
> >> Benchmarking: Lyra2 [Blake2 AVX2]... (8xOMP) DONE
> >
> > Does this build actually use AVX2? If so, how much slower is an
> > AVX-only build?
>
> nope :<, my bad,

Ouch. I'll need to communicate a correction to the PHC community, then.

> I was thinking that it uses AVX2 because Lyra2 uses
> blake2b which has some instructions in SSE4_1

Huh?! Do you understand how SSE2, SSE4.1, AVX, and AVX2 correspond to
each other and in what ways they differ? Can you please explain your
understanding to me, so that I see if it's correct or where exactly it
is wrong. You sound confused pretty badly here. Perhaps not enough
assembly output reading on your part. ;-)

> #if defined(__SSE4_1__)
> #include "blake2b-load-sse41.h"
> #else
> #include "blake2b-load-sse2.h"
> #endif
>
> but now I see that these instructions are not coverable by Lyra2
> (because Lyra2 ' blake2b' uses another but similar to blake2b ROUND
> without LOAD_MSG_ ) I don't know if these rounds are the same, looks
> like different things
>
> round used by Lyra: ROUND_LYRA_SSE in file Sponge_sse.h
> original round: ROUND in file blake2b-round.h

What does any of this have to do with AVX2?

> >> Calculating best global worksize (GWS); max. 1s single kernel invocation.
> >> gws: 256 436 c/s 436 rounds/s 586.434ms per crypt_all()!
> >> gws: 512 832 c/s 832 rounds/s 615.005ms per crypt_all()+
> >> gws: 1024 1477 c/s 1477 rounds/s 693.232ms per crypt_all()+
> >> Local worksize (LWS) 64, global worksize (GWS) 1024
> >> DONE
> >> Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
> >> Raw: 1077 c/s real, 204800 c/s virtual
> >
> > Why are we getting, here and elsewhere, a higher c/s rate reported for
> > the optimal GWS during auto-tuning than we're getting during a
> > subsequent benchmark? Is this because auto-tuning is possibly run with
> > too few different passwords (just a guess)?
>
> the opposite, seems that Lyra2 is faster with different passwords,
> when I was testing Lyra I forgot to upload bench.c to server,

Please add some debugging output to your modified bench.c, e.g. telling
that random candidate passwords are being generated. This way, you'd
hopefully notice if/when you use a different than intended bench.c.

> after
> that I uploaded bench.c and tested the speed before that and after and
> somehow overlooked the difference but now I see, so Lyra2 should be
> faster
>
> these speeds returned by auto-tuning seems be the same to these
> returned by modified bench.c (for slow hashes, my bench.c uses rand
> which makes a difference at cracking faster hashes, I saw the
> difference at 150k/s)

I think your modified bench.c should be committed to your tree anyway.
Ideally, you'd have it skip the modifications when they are not needed
or when they are harmful. E.g., you may introduce a new format flag,
say call it FMT_RANDOM, and check for it in your formats that need it
for proper benchmarking. While we might not accept this exact change
into the main jumbo tree, I think you should have it in your tree
anyway, since it's easy to make and it will then be saving you time and
avoiding errors like "used a wrong bench.c".
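
To illustrate the FMT_RANDOM idea together with the debugging output
suggested earlier, here is a minimal sketch in plain C. Everything in it
is an assumption made for illustration: the flag's bit value,
struct sketch_format, make_candidate() and report_mode() are
hypothetical names, not existing John the Ripper code, and the real
bench.c and struct fmt_main are organized differently.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* HYPOTHETICAL flag: the name comes from the suggestion above, the bit
 * value is arbitrary and chosen only for this sketch. */
#define FMT_RANDOM 0x1u

/* Simplified stand-in for a format description (an assumption; the real
 * struct fmt_main carries far more than a label and flags). */
struct sketch_format {
	const char *label;
	unsigned int flags;
};

/* Fill buf with one benchmark candidate password.
 * Formats that set FMT_RANDOM get random lowercase letters; all others
 * get a fixed, index-derived candidate, so their benchmarks are not
 * slowed down by rand().
 * Returns 1 if the candidate is random, 0 if it is fixed. */
static int make_candidate(const struct sketch_format *fmt,
    unsigned int index, char *buf, size_t size)
{
	assert(buf != NULL && size >= 2);

	if (fmt->flags & FMT_RANDOM) {
		size_t i;
		for (i = 0; i + 1 < size; i++)
			buf[i] = (char)('a' + rand() % 26);
		buf[i] = '\0';
		return 1;
	}

	snprintf(buf, size, "pass%u", index);
	return 0;
}

/* One line of debugging output per benchmarked format, so that running
 * with an unintended bench.c is easy to notice. */
static void report_mode(const struct sketch_format *fmt, FILE *out)
{
	fprintf(out, "bench: %s candidates for %s\n",
	    (fmt->flags & FMT_RANDOM) ? "random" : "fixed", fmt->label);
}
```

A format that needs varied inputs for a meaningful benchmark would set
the flag in its flags field, and the benchmark loop would call
report_mode() once and make_candidate() per candidate. Keeping the
check on a per-format flag means fast hashes keep the unmodified code
path.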

As to your benchmark results, there are similar speed differences
between c/s rates reported during auto-tuning and during benchmarking
for yescrypt as well. How do you explain those?

Alexander