Message-ID: <20150826082411.GA4835@openwall.com>
Date: Wed, 26 Aug 2015 11:24:11 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: LWS and GWS auto-tuning

On Wed, Aug 26, 2015 at 11:06:17AM +0300, Solar Designer wrote:
> On Tue, Aug 25, 2015 at 08:36:44PM +0200, magnum wrote:
> > Worst/best 10 for Tahiti (oldoffice failing):
>
> > Ratio: 0.83409 real, 1.02941 virtual salted-sha1-opencl:Many salts
>
> This one doesn't auto-tune.  Gives same speeds to me.
>
> > Ratio: 0.85046 real, 0.85934 virtual sxc-opencl, StarOffice .sxc:Raw
> > Ratio: 0.90867 real, 0.78394 virtual blockchain-opencl, blockchain My
> > Wallet:Raw
>
> These two don't auto-tune.

I was wrong: they do.

> They use OpenMP.  With
> GOMP_CPU_AFFINITY=0-31, they give stable same speeds to me, regardless
> of recent changes (obviously).

Running these a few more times, I see that blockchain-opencl gives
unstable speeds.  There's also a weird discrepancy between c/s rates
printed during auto-tuning and the final benchmark:

Calculating best local worksize (LWS)
Testing LWS=64 GWS=524288 ... 84.860ms+
Testing LWS=128 GWS=524288 ... 84.612ms
Testing LWS=192 GWS=524160 ... 108.859ms
Testing LWS=256 GWS=524288 ... 84.863ms
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:      2048     4410969 c/s     4410969 rounds/s 464.297us per crypt_all()!
gws:      4096     6655747 c/s     6655747 rounds/s 615.408us per crypt_all()+
gws:      8192    15338669 c/s    15338669 rounds/s 534.075us per crypt_all()!
gws:     16384     1634042 c/s     1634042 rounds/s  10.026ms per crypt_all()
gws:     32768    20423261 c/s    20423261 rounds/s   1.604ms per crypt_all()+
gws:     65536    21218723 c/s    21218723 rounds/s   3.088ms per crypt_all()+
gws:    131072    21881532 c/s    21881532 rounds/s   5.990ms per crypt_all()+
gws:    262144    22232899 c/s    22232899 rounds/s  11.790ms per crypt_all()+
gws:    524288    22359039 c/s    22359039 rounds/s  23.448ms per crypt_all()
gws:   1048576    22350637 c/s    22350637 rounds/s  46.914ms per crypt_all()
gws:   2097152    22066990 c/s    22066990 rounds/s  95.035ms per crypt_all()
gws:   4194304    22258786 c/s    22258786 rounds/s 188.433ms per crypt_all()
gws:   8388608    22233414 c/s    22233414 rounds/s 377.297ms per crypt_all()
gws:  16777216    21051147 c/s    21051147 rounds/s 796.973ms per crypt_all()
Local worksize (LWS) 64, global worksize (GWS) 262144
DONE
Raw: 6488K c/s real, 204736 c/s virtual

The auto-tuned GWS usually varies between 262144, 524288, and 1048576,
and the final speeds from ~5000K to ~6500K even for the same GWS.

Why is the discrepancy between ~22M while benchmarking and ~6M finally?
Is this how split kernel should manifest itself here?  I think not.

Looks like an issue unrelated to recent changes.

Alexander
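
For reference, the arithmetic behind the discrepancy can be checked with a few lines of Python. This is only a sketch under stated assumptions: that the auto-tuning c/s column is GWS divided by the reported time per crypt_all(), that "6488K" means 6,488,000 c/s, and that the final benchmark also processes GWS candidates per crypt_all() call. The function and constant names are made up for illustration and are not part of John the Ripper.

```python
# Sanity check of the figures quoted in the log above.
# Assumptions (not stated in the post): auto-tuning c/s = GWS / time per
# crypt_all(), "6488K" = 6,488,000 c/s, and the final benchmark handles
# GWS candidates per crypt_all() call. All names here are hypothetical.

def rate(gws, seconds_per_call):
    """Candidates per second if one call processes `gws` candidates."""
    return gws / seconds_per_call

def implied_seconds(gws, rate_cps):
    """Time per call that a given c/s rate would imply for `gws` candidates."""
    return gws / rate_cps

# (gws, seconds per crypt_all(), c/s printed during auto-tuning), from the log
TUNING_ROWS = [
    (2048, 464.297e-6, 4410969),
    (262144, 11.790e-3, 22232899),
    (524288, 23.448e-3, 22359039),
    (16777216, 796.973e-3, 21051147),
]

FINAL_GWS = 262144      # GWS picked by auto-tuning in this run
TUNED_CPS = 22232899    # c/s printed for that GWS during auto-tuning
FINAL_CPS = 6488e3      # "Raw: 6488K c/s real"

def max_relative_error(rows):
    """Largest relative gap between printed c/s and gws / time."""
    return max(abs(rate(g, t) - cps) / cps for g, t, cps in rows)

if __name__ == "__main__":
    print("max relative error:", max_relative_error(TUNING_ROWS))
    print("tuning/final ratio:", TUNED_CPS / FINAL_CPS)
    print("implied final ms per call:",
          1e3 * implied_seconds(FINAL_GWS, FINAL_CPS))
```

If those assumptions hold, the printed auto-tuning rates match GWS divided by the reported time to within rounding, while the final figure corresponds to roughly 40 ms per 262144 candidates instead of the 11.790 ms measured during auto-tuning. The sketch says nothing about where the extra time goes.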