Message-ID: <002a01ce41c2$18a60f40$49f22dc0$@net>
Date: Thu, 25 Apr 2013 09:35:12 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: ICC performance regression

I have just used gcc to build the sse-intrinsics-32.S file, and the speed was almost identical to that of the older version made with icc. I simply used the exact same command line, but added -o sse-intrinsics-32.S -S to produce a .S file, and things worked.

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.7.2-2ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.7 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.7.2 (Ubuntu/Linaro 4.7.2-2ubuntu1)

Speed of the gcc-built .S file:

$ ../run/john -test=5 -form=dynamic
Benchmarking: dynamic_0: md5($p) (raw-md5) [128/128 SSE2 intrinsics 10x4x3]... DONE
Raw: 27669K c/s real, 27715K c/s virtual

Benchmarking: dynamic_1: md5($p.$s) (joomla) [128/128 SSE2 intrinsics 10x4x3]... DONE
Many salts: 16383K c/s real, 16394K c/s virtual
Only one salt: 12244K c/s real, 12259K c/s virtual

Benchmarking: dynamic_2: md5(md5($p)) (e107) [128/128 SSE2 intrinsics 10x4x3]... DONE
Raw: 14142K c/s real, 14150K c/s virtual

$ ../run/john -test=5 -form=md5
Benchmarking: crypt-MD5 [128/128 SSE2 intrinsics 12x]... DONE
Raw: 31925 c/s real, 31987 c/s virtual

Speed of the icc-built (older version) .S file:

$ ../run/john -test=5 -form=dynamic
Benchmarking: dynamic_0: md5($p) (raw-md5) [128/128 SSE2 intrinsics 10x4x3]... DONE
Raw: 27212K c/s real, 27294K c/s virtual

Benchmarking: dynamic_1: md5($p.$s) (joomla) [128/128 SSE2 intrinsics 10x4x3]... DONE
Many salts: 16273K c/s real, 16263K c/s virtual
Only one salt: 12295K c/s real, 12295K c/s virtual

Benchmarking: dynamic_2: md5(md5($p)) (e107) [128/128 SSE2 intrinsics 10x4x3]... DONE
Raw: 13753K c/s real, 14002K c/s virtual

Benchmarking: dynamic_3: md5(md5(md5($p))) [128/128 SSE2 intrinsics 10x4x3]... Wait...

Speed from unstable (where format md5 still works, using the older ICC *-32.S file):

$ ../run/john -test=5 -form=md5
Benchmarking: FreeBSD MD5 [128/128 SSE2 intrinsics 12x]... DONE
Raw: 31637 c/s real, 31744 c/s virtual

So instead of fighting to get an older ICC working properly, we might simply look at a 'current' gcc version.

One bad side effect is size. The older icc file was 359K (64-bit) and 394K (32-bit). The .S file I built (only the 32-bit version) required some hand patching: the perl script helped, but more code had to be cut, and some UNDERSCORES defines had to be added. That file, however, is 1156K, nearly three times the size of the 32-bit icc file. But this 'may' be an option (using a newer gcc version).

NOTE: on this same system (cygwin), if I do a make win32-cygwin-x86-sse2 (not the 'i' build), I get these timings, only about 70% as fast as the prebuilt .S file code:

$ ../run/john -test=5 -form=dynamic
Benchmarking: dynamic_0: md5($p) (raw-md5) [128/128 SSE2 intrinsics 10x4x3]... DONE
Raw: 20052K c/s real, 20014K c/s virtual

Benchmarking: dynamic_1: md5($p.$s) (joomla) [128/128 SSE2 intrinsics 10x4x3]... DONE
Many salts: 13214K c/s real, 13222K c/s virtual
Only one salt: 10312K c/s real, 10313K c/s virtual

Benchmarking: dynamic_2: md5(md5($p)) (e107) [128/128 SSE2 intrinsics 10x4x3]... DONE
Raw: 10121K c/s real, 10132K c/s virtual

$ ../run/john -test=5 -form=md5
Benchmarking: crypt-MD5 [128/128 SSE2 intrinsics 12x]... DONE
Raw: 21839 c/s real, 21841 c/s virtual

As for ICC, I did try a few other things, but I could not recover the speed loss. As magnum mentioned, it 'could' simply be that the PARA values need updating. However, at almost an hour of build time per change, it is not easy to do much testing.

Jim.

From: magnum
Sent: Thursday, April 25, 2013 2:10
>
>On 25 Apr, 2013, at 1:30 , Solar Designer <solar@...nwall.com> wrote:
>> On Thu, Apr 25, 2013 at 01:12:19AM +0200, magnum wrote:
>>> Old pre-built files, icc 12.1.4:
>> [...]
>>> Benchmarking: FreeBSD MD5 [128/128 SSE2 intrinsics 12x]... DONE
>>> Raw: 39204 c/s real, 39204 c/s virtual
>> [...]
>>> gcc 4.7.2, -native target:
>> [...]
>>> Benchmarking: crypt-MD5 [128/128 AVX intrinsics 12x]... DONE
>>> Raw: 36936 c/s real, 36936 c/s virtual
>>
>> This is pretty significant difference in favor of old icc, and not all
>> CPUs have AVX, so I think we should simply continue to use old icc to
>> prebuild the files.
>
> This requires someone having an older version. I haven't found one yet.
>
> Until now we have compared icc using -O3 (25 *minutes* compile time per
> file), to gcc using just -O2 (compiling in 3 seconds). I will try some
> different versions of icc as well as MD5_PARA values (very time
> consuming), but also different sets of options to gcc and see where we
> end up.
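The "almost identical" and "about 70% as fast" remarks in Jim's message can be rechecked from his own benchmark runs. The following is a minimal Python sketch, not part of JtR: the dictionary names (GCC_S, ICC_S, NO_PREBUILT) and the function speed_ratio are hypothetical labels introduced here, the figures are the "real" c/s values copied from the runs above, and the "K" suffix is dropped because only same-format ratios are taken. With these numbers the gcc-built .S lands within about 3% of the older icc one in either direction, while the non-prebuilt build ranges from roughly 68% to 84% of the gcc .S speed depending on the format.

```python
"""Relative throughput of the three builds benchmarked in this message.

Values are the "real" c/s figures from the quoted `john -test=5` runs.
The dynamic formats are in K c/s and crypt-MD5 in plain c/s; only ratios
within a single format are computed, so the units cancel.
"""

# Hypothetical labels for the quoted builds (not names used by JtR itself).
GCC_S = {  # gcc 4.7.2 built sse-intrinsics-32.S
    "dynamic_0": 27669,
    "dynamic_1 (many salts)": 16383,
    "dynamic_1 (one salt)": 12244,
    "dynamic_2": 14142,
    "crypt-MD5": 31925,
}
ICC_S = {  # older icc built .S (crypt-MD5 figure taken from the unstable tree)
    "dynamic_0": 27212,
    "dynamic_1 (many salts)": 16273,
    "dynamic_1 (one salt)": 12295,
    "dynamic_2": 13753,
    "crypt-MD5": 31637,
}
NO_PREBUILT = {  # make win32-cygwin-x86-sse2, no prebuilt .S file
    "dynamic_0": 20052,
    "dynamic_1 (many salts)": 13214,
    "dynamic_1 (one salt)": 10312,
    "dynamic_2": 10121,
    "crypt-MD5": 21839,
}


def speed_ratio(build, baseline):
    """Return {format: build_speed / baseline_speed} for formats in both."""
    return {fmt: build[fmt] / baseline[fmt] for fmt in build if fmt in baseline}


if __name__ == "__main__":
    for label, build, base in [
        ("gcc .S vs icc .S", GCC_S, ICC_S),
        ("no prebuilt .S vs gcc .S", NO_PREBUILT, GCC_S),
    ]:
        print(label)
        for fmt, r in speed_ratio(build, base).items():
            print(f"  {fmt:24s} {r:6.1%}")
    # Size of the 32-bit prebuilt file: gcc 1156K vs older icc 394K.
    print(f"32-bit .S size ratio: {1156 / 394:.2f}x")
```

The per-format breakdown shows why "about 70%" is a summary: dynamic_0, dynamic_2 and crypt-MD5 sit near 68% to 73%, while the salted dynamic_1 runs lose less.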