Message-ID: <7aa346d81ba44cf50f18cc986e8ce34b@smtp.hushmail.com>
Date: Fri, 28 Aug 2015 21:53:10 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: LWS and GWS auto-tuning

On 2015-08-28 11:18, Solar Designer wrote:
> Returning to the topic of auto-tuning, there's now some weirdness seen
> for md5crypt-opencl on Titan X:
(...)
> Notice how it was up to 3.46M during auto-tuning, but only 1.2M when
> benchmarked the tuned settings. And:

I made some changes to self-test and autotune key setting and it seems
more stable now. One problem was the autotune key fuzzing would
spuriously result in shorter keys (an xor resulting in 0), so formats
that are sensitive to varying lengths would be "hurt". While that
doesn't quite explain the things you saw fully, it seems quite good now:

$ ../run/john -stress-test -form:md5crypt-opencl -dev=4
Device 4: GeForce GTX TITAN X
Benchmarking: md5crypt-opencl, crypt(3) $1$ [MD5 OpenCL]... DONE
Raw:	3601K c/s real, 3601K c/s virtual

Benchmarking: md5crypt-opencl, crypt(3) $1$ [MD5 OpenCL]... DONE
Raw:	3613K c/s real, 3649K c/s virtual

Benchmarking: md5crypt-opencl, crypt(3) $1$ [MD5 OpenCL]... DONE
Raw:	3649K c/s real, 3613K c/s virtual

Benchmarking: md5crypt-opencl, crypt(3) $1$ [MD5 OpenCL]... DONE
Raw:	3465K c/s real, 3430K c/s virtual

Benchmarking: md5crypt-opencl, crypt(3) $1$ [MD5 OpenCL]... DONE
Raw:	3624K c/s real, 3589K c/s virtual

Benchmarking: md5crypt-opencl, crypt(3) $1$ [MD5 OpenCL]... DONE
Raw:	3391K c/s real, 3357K c/s virtual

Benchmarking: md5crypt-opencl, crypt(3) $1$ [MD5 OpenCL]... DONE
Raw:	3576K c/s real, 3612K c/s virtual

$ ../run/john -test -form:md5crypt-opencl -dev=4 -v:4
Device 4: GeForce GTX TITAN X
Benchmarking: md5crypt-opencl, crypt(3) $1$ [MD5 OpenCL]...
Options used: -I ../run/kernels -cl-mad-enable -DSM_MAJOR=5 -DSM_MINOR=2 -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=262162 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21 -D_OPENCL_COMPILER -DPLAINTEXT_LENGTH=15
Calculating best GWS for LWS=32; max. 250ms single kernel invocation.
gws:      1024      449198 c/s   449198000 rounds/s   2.279ms per crypt_all()!
gws:      2048      892620 c/s   892620000 rounds/s   2.294ms per crypt_all()+
gws:      4096     1686918 c/s  1686918000 rounds/s   2.428ms per crypt_all()+
gws:      8192     2730841 c/s  2730841000 rounds/s   2.999ms per crypt_all()+
gws:     16384     3258489 c/s  3258489000 rounds/s   5.028ms per crypt_all()+
gws:     32768     2326987 c/s  2326987000 rounds/s  14.081ms per crypt_all()
gws:     65536     2835324 c/s  2835324000 rounds/s  23.114ms per crypt_all()
Calculating best LWS for GWS=16384
Testing LWS=32 GWS=16384 ... 17.250ms+
Testing LWS=64 GWS=16384 ... 17.270ms
Testing LWS=96 GWS=16320 ... 17.203ms
Testing LWS=128 GWS=16384 ... 17.276ms
Testing LWS=160 GWS=16320 ... 18.320ms
Testing LWS=192 GWS=16320 ... 17.198ms+
Testing LWS=224 GWS=16352 ... 18.416ms
Testing LWS=256 GWS=16384 ... 17.128ms+
Testing LWS=288 GWS=16128 ... 18.256ms
Testing LWS=512 GWS=16384 ... 17.887ms
Testing LWS=1024 GWS=16384 ... 17.464ms
Calculating best GWS for LWS=256; max. 500ms single kernel invocation.
gws:      6144     2579293 c/s  2579293000 rounds/s   2.382ms per crypt_all()!
gws:     12288     3464203 c/s  3464203000 rounds/s   3.547ms per crypt_all()+
gws:     24576     2889977 c/s  2889977000 rounds/s   8.503ms per crypt_all()
gws:     49152     2170788 c/s  2170788000 rounds/s  22.642ms per crypt_all()
gws:     98304     2560407 c/s  2560407000 rounds/s  38.393ms per crypt_all()
gws:    196608     2462964 c/s  2462964000 rounds/s  79.825ms per crypt_all()
gws:    393216     2556987 c/s  2556987000 rounds/s 153.780ms per crypt_all()
gws:    786432     2562092 c/s  2562092000 rounds/s 306.949ms per crypt_all()
Local worksize (LWS) 256, global worksize (GWS) 12288
DONE
Raw:	3589K c/s real, 3624K c/s virtual

Speed matches autotune's (actually we get even better speed than predicted).
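
The c/s figures in the GWS listing are consistent with candidates per second being roughly GWS divided by the duration of one crypt_all() call. Below is a minimal Python sketch that recomputes them from the LWS=256 rows of the log. The function names are made up for illustration and this is not John the Ripper's autotune code; small differences come from the rounded millisecond values.

```python
# Illustrative only: recompute c/s from GWS and the reported crypt_all() time.
# Each row is (gws, reported c/s, ms per crypt_all()), copied from the
# "Calculating best GWS for LWS=256" part of the log.
ROWS = [
    (6144, 2579293, 2.382),
    (12288, 3464203, 3.547),
    (24576, 2889977, 8.503),
    (49152, 2170788, 22.642),
    (98304, 2560407, 38.393),
    (196608, 2462964, 79.825),
    (393216, 2556987, 153.780),
    (786432, 2562092, 306.949),
]


def estimated_cps(gws, ms_per_crypt_all):
    """Candidates per second, assuming one crypt_all() call handles gws keys."""
    return gws / (ms_per_crypt_all / 1000.0)


def best_row(rows):
    """Row with the highest reported c/s (a simplification of real tuning)."""
    return max(rows, key=lambda row: row[1])


for gws, reported, ms in ROWS:
    est = estimated_cps(gws, ms)
    print(f"gws {gws:>7}: reported {reported:>8} c/s, estimated {est:>10.0f} c/s")

print("best GWS by reported c/s:", best_row(ROWS)[0])
```

Picking the maximum reported c/s happens to agree with the GWS of 12288 chosen in the log; the real auto-tuner may weigh other factors, such as the kernel duration limit shown in the "max. 500ms" line.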

A similar problem in self-test was that indexes between tested keys
(indexes that core just skips) were filled with max-length plaintexts;
they are now filled with plaintexts of the same length as the first test
vector's plaintext. Max length would slow things down considerably for
formats like RAR or 7z, and to some extent md5crypt too. But it shouldn't
affect benchmark results, only the time spent self-testing.

I'm currently experimenting with sorting keys by length for RAR-opencl
and these issues made a lot of noise.

magnum
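
The fuzzing issue mentioned in this message (an xor resulting in 0, which shortens the key) can be illustrated with a small Python sketch. It assumes keys are treated as NUL-terminated C strings, so a byte that becomes 0 truncates the key. The function names, the mask, and the length-preserving variant are hypothetical and only show one possible way to avoid the effect; this is not the actual autotune code.

```python
def c_strlen(buf: bytes) -> int:
    """Length of buf when read as a C string: bytes before the first NUL."""
    return len(buf.split(b"\x00", 1)[0])


def fuzz_naive(key: bytes, mask: int) -> bytes:
    """XOR every byte with mask; any byte equal to mask turns into 0."""
    return bytes(b ^ mask for b in key)


def fuzz_keep_length(key: bytes, mask: int) -> bytes:
    """Like fuzz_naive, but keep the original byte if the XOR would give 0.

    Assumes the input key itself contains no NUL bytes.
    """
    return bytes((b ^ mask) or b for b in key)


key = b"password"
mask = ord("s")  # hypothetical mask that collides with a byte in the key

naive = fuzz_naive(key, mask)
kept = fuzz_keep_length(key, mask)

print("original length:        ", c_strlen(key))
print("naive fuzz length:      ", c_strlen(naive))
print("length-preserving fuzz: ", c_strlen(kept))
```

With this key and mask, the third byte of the naively fuzzed key becomes 0, so a length-sensitive format would see a 2-character key instead of an 8-character one.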