Message-ID: <20150826083319.GA4898@openwall.com>
Date: Wed, 26 Aug 2015 11:33:19 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: LWS and GWS auto-tuning

On Tue, Aug 25, 2015 at 08:36:44PM +0200, magnum wrote:
> Worst/best 10 for Tahiti (oldoffice failing):
> Ratio: 0.90959 real, 0.76098 virtual  descrypt-opencl, traditional crypt(3):Only one salt
> Ratio: 0.93767 real, 0.93587 virtual  strip-opencl, STRIP Password Manager:Raw
> Ratio: 0.94268 real, 1.06452 virtual  ssha-opencl, Netscape LDAP {SSHA}:Many salts
> Ratio: 0.94619 real, 1.00000 virtual  sha256crypt-opencl, crypt(3) $5$ (rounds=5000):Raw

Out of these, the first 3 remain the same for me, and sha256crypt-opencl
actually gets auto-tuned better now.

Old:

[solar@...er run]$ time ./john -test -form=sha256crypt-opencl -v=4
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Benchmarking: sha256crypt-opencl, crypt(3) $5$ (rounds=5000) [SHA256 OpenCL]...
Calculating best global worksize (GWS); max. 4s total for crypt_all()
gws:      1024      1889 c/s    9445000 rounds/s 541.875ms per crypt_all()!
gws:      2048      4931 c/s   24655000 rounds/s 415.304ms per crypt_all()!
gws:      4096      8352 c/s   41760000 rounds/s 490.413ms per crypt_all()+
gws:      8192     12427 c/s   62135000 rounds/s 659.195ms per crypt_all()+
gws:     16384     22380 c/s  111900000 rounds/s 732.058ms per crypt_all()+
gws:     32768     22443 c/s  112215000 rounds/s   1.459s per crypt_all()
gws:     65536     28263 c/s  141315000 rounds/s   2.318s per crypt_all()+
gws:    131072     29923 c/s  149615000 rounds/s   4.380s per crypt_all() - too slow
Max local worksize 256, Local worksize (LWS) 256, global worksize (GWS) 65536
DONE
Speed for cost 1 (iteration count) of 5000
Raw:    25700 c/s real, 2184K c/s virtual

real    0m12.138s
user    0m0.396s
sys     0m0.751s

New:

[solar@...er run]$ time ./john -test -form=sha256crypt-opencl -v=4
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Benchmarking: sha256crypt-opencl, crypt(3) $5$ (rounds=5000) [SHA256 OpenCL]...
Calculating best global worksize (GWS); max. 2s total for crypt_all()
gws:      1024      1788 c/s    8940000 rounds/s 572.529ms per crypt_all()!
gws:      2048      4928 c/s   24640000 rounds/s 415.537ms per crypt_all()!
gws:      4096      8350 c/s   41750000 rounds/s 490.537ms per crypt_all()+
gws:      8192     12418 c/s   62090000 rounds/s 659.676ms per crypt_all()+
gws:     16384     22401 c/s  112005000 rounds/s 731.391ms per crypt_all()+
gws:     32768     22392 c/s  111960000 rounds/s   1.463s per crypt_all()
gws:     65536     27499 c/s  137495000 rounds/s   2.383s per crypt_all() - too slow
Calculating best local worksize (LWS)
Testing LWS=64 GWS=16384 ... 50.580ms+
Testing LWS=128 GWS=16384 ... 50.796ms
Testing LWS=192 GWS=16320 ... 50.181ms+
Testing LWS=256 GWS=16384 ... 50.658ms
Calculating best global worksize (GWS); max. 4s total for crypt_all()
gws:      6144     10079 c/s   50395000 rounds/s 609.575ms per crypt_all()!
gws:     12288     18821 c/s   94105000 rounds/s 652.854ms per crypt_all()+
gws:     24576     29143 c/s  145715000 rounds/s 843.278ms per crypt_all()+
gws:     49152     30982 c/s  154910000 rounds/s   1.586s per crypt_all()+
gws:     98304     34272 c/s  171360000 rounds/s   2.868s per crypt_all()+
gws:    196608     36327 c/s  181635000 rounds/s   5.412s per crypt_all() - too slow
Local worksize (LWS) 192, global worksize (GWS) 98304
DONE
Speed for cost 1 (iteration count) of 5000
Raw:    27459 c/s real, 3276K c/s virtual

real    0m14.208s
user    0m0.475s
sys     0m0.865s

but the differences during LWS tuning might be too small to be reliable.
In fact, another run does show worse auto-tuning:

[solar@...er run]$ time ./john -test -form=sha256crypt-opencl -v=4
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Benchmarking: sha256crypt-opencl, crypt(3) $5$ (rounds=5000) [SHA256 OpenCL]...
Calculating best global worksize (GWS); max. 2s total for crypt_all()
gws:      1024      1993 c/s    9965000 rounds/s 513.606ms per crypt_all()!
gws:      2048      4918 c/s   24590000 rounds/s 416.360ms per crypt_all()!
gws:      4096      8346 c/s   41730000 rounds/s 490.734ms per crypt_all()+
gws:      8192     12419 c/s   62095000 rounds/s 659.625ms per crypt_all()+
gws:     16384     22365 c/s  111825000 rounds/s 732.555ms per crypt_all()+
gws:     32768     22631 c/s  113155000 rounds/s   1.447s per crypt_all()+
gws:     65536     28402 c/s  142010000 rounds/s   2.307s per crypt_all() - too slow
Calculating best local worksize (LWS)
Testing LWS=64 GWS=32768 ... 76.825ms+
Testing LWS=128 GWS=32768 ... 76.901ms
Testing LWS=192 GWS=32640 ... 80.392ms
Testing LWS=256 GWS=32768 ... 77.425ms
Calculating best global worksize (GWS); max. 4s total for crypt_all()
gws:      2048      5334 c/s   26670000 rounds/s 383.886ms per crypt_all()!
gws:      4096      8862 c/s   44310000 rounds/s 462.146ms per crypt_all()+
gws:      8192     12420 c/s   62100000 rounds/s 659.536ms per crypt_all()+
gws:     16384     22905 c/s  114525000 rounds/s 715.285ms per crypt_all()+
gws:     32768     24439 c/s  122195000 rounds/s   1.340s per crypt_all()+
gws:     65536     30948 c/s  154740000 rounds/s   2.117s per crypt_all()+
gws:    131072     34028 c/s  170140000 rounds/s   3.851s per crypt_all()+
gws:    262144     38105 c/s  190525000 rounds/s   6.879s per crypt_all() - too slow
Local worksize (LWS) 64, global worksize (GWS) 131072
DONE
Speed for cost 1 (iteration count) of 5000
Raw:    24317 c/s real, 4369K c/s virtual

real    0m18.643s
user    0m0.517s
sys     0m0.876s

I think starting with a better (queried) LWS for the first pass at GWS
tuning would prevent this.

BTW, why are the c/s rates reported during auto-tuning so different from
the final ones here?

Alexander
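
For what it's worth, the tuning-phase rates printed in these logs look consistent with c/s being the GWS divided by the time of one crypt_all() call. That is an inference from the printed figures only, not something confirmed from the John the Ripper source. Below is a minimal Python sketch that rechecks this against the "Old" run and also measures how far apart the LWS timings in the first "New" run are. The function names `tuning_rate` and `relative_spread` are hypothetical helpers made up for this illustration.

```python
# (GWS, seconds per crypt_all()) copied from the "Old" run's GWS tuning lines.
# Only the lines reported with millisecond precision are used.
OLD_RUN = [
    (1024, 0.541875),
    (2048, 0.415304),
    (4096, 0.490413),
    (8192, 0.659195),
    (16384, 0.732058),
]
# c/s values printed on those same log lines.
REPORTED_CS = [1889, 4931, 8352, 12427, 22380]

# LWS -> milliseconds, copied from the LWS pass of the first "New" run.
LWS_MS_FIRST_NEW_RUN = {64: 50.580, 128: 50.796, 192: 50.181, 256: 50.658}


def tuning_rate(gws, seconds):
    """Candidates per second implied by one crypt_all() call.

    Assumption: rate = GWS / duration. This matches the log figures,
    but it is not taken from the actual auto-tuning code.
    """
    return gws / seconds


def relative_spread(times):
    """Return (max - min) / min over a collection of timings."""
    lowest, highest = min(times), max(times)
    return (highest - lowest) / lowest


if __name__ == "__main__":
    for (gws, secs), reported in zip(OLD_RUN, REPORTED_CS):
        implied = tuning_rate(gws, secs)
        print(f"gws {gws:6d}: {implied:9.1f} c/s implied, {reported} reported")

    spread = relative_spread(LWS_MS_FIRST_NEW_RUN.values())
    print(f"LWS timing spread in first New run: {spread:.2%}")
```

The spread comes out at a little over 1%, which fits the concern above that the LWS differences may be too small to be reliable. The sketch does not explain the gap between the tuning-phase and final c/s figures; it only shows how the tuning-phase numbers appear to be derived.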