Message-ID: <mpro.m6bwqh00fh20g0jpv.taviso@cmpxchg8b.com>
Date: Thu, 28 Jun 2012 15:13:32 +0200
From: Tavis Ormandy <taviso@...xchg8b.com>
To: john-dev@...ts.openwall.com
Subject: Re: Re: Failed self test for raw-sha1-ng (linux-x86-sse2i OMP)

magnum <john.magnum@...hmail.com> wrote:

> On 2012-06-28 12:54, magnum wrote:
> > On 2012-06-28 12:18, Frank Dittrich wrote:
> > > Due to another recent change in raw-sha1-ng, I get a new warning when
> > > compiling with clang version 2.9: rawSHA1_ng_fmt.c:127:14: warning:
> > > unknown pragma ignored [-Wunknown-pragmas] # pragma GCC optimize 3
> > >
> > > I don't know whether more recent clang versions support this pragma,
> > > so I wouldn't disable it for clang. I just wanted to let you know.
> >
> > That pragma is not implemented in GCC versions earlier than 4.4 so we
> > can test __GNUC__ and __GNUC_MINOR__ - I'll have a look even though this
> > is 100% benign.
>
> Fixed. We are really nit-picking now :-)
>
> magnum
>

Indeed ;-)

I was actually going to use this for another tweak. If I prefetch the next
passwords with __builtin_prefetch, I can pull a little extra performance out
of my hottest loop, but I've found that gcc doesn't do too badly with just
-fprefetch-loop-arrays (a non-default option).

I think I can do _slightly_ better than gcc by hand, and maybe I could
perfect the code gcc generates with various --params, but having gcc do it
automatically is nice.

I was planning to just add #pragma GCC optimize "-fprefetch-loop-arrays".
Does that sound okay? I can wrap it in whatever __GNUC__ checks you like.

In fact, it made me curious whether there are any other free performance
wins, so I used the quick script below (you can grep ^Raw: | sort -g).

Example output:

$ bash chooseopts.sh linux-x86-64-native rawSHA1_ng_fmt.c raw-sha1-ng | tee log
Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 SSE4.1 intrinsics 4x]... DONE
Raw: 19482K c/s real, 19528K c/s virtual -fipa-type-escape
Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 SSE4.1 intrinsics 4x]... DONE
Raw: 19482K c/s real, 19528K c/s virtual -fno-ipa-type-escape
Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 SSE4.1 intrinsics 4x]... DONE
Raw: 19482K c/s real, 19528K c/s virtual -fivopts
Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 SSE4.1 intrinsics 4x]... DONE
Raw: 19391K c/s real, 19443K c/s virtual -fno-ivopts
....

$ grep ^Raw: log | sort -g | tail
Raw: 19538K c/s real, 19590K c/s virtual -funroll-all-loops
Raw: 19540K c/s real, 19572K c/s virtual -fno-align-jumps
Raw: 19573K c/s real, 19605K c/s virtual -minline-all-stringops
Raw: 19688K c/s real, 19721K c/s virtual -fprefetch-loop-arrays

The quickest win for me does seem to be -fprefetch-loop-arrays.

#!/bin/bash
#
# usage: cd src; bash chooseopts.sh target filename.c formatname
# e.g.
#   $ bash chooseopts.sh linux-x86-64-native rawSHA1_ng_fmt.c raw-sha1-ng
#
# note: gensub() below is a GNU awk extension.

declare -a optimizers=( $(gcc --help=optimizers | awk '/^[ ]*-f/ {printf "%s\n%s\n",$1,gensub(/^-f/,"-fno-",1,$1)}') );
declare -a scores
declare -ir testtime=30

# for every optimization option gcc reports it supports, build an object
# and benchmark it.
for ((i = 0; i < ${#optimizers[@]}; i++)); do
    # build a new john with this flag applied to this object file.
    if ! make ${1} MAKE="make -W ${2}" CC="gcc -frandom-seed=seed ${optimizers[i]}" &> /dev/null; then
        # code doesn't compile, skip it.
        continue
    fi

    # if it built, find checksum of this code.
    checksum="$(objdump -d ${2//.c/.o} | cksum | sed 's/ //g')"

    # if another flag generated the same code, we don't need to benchmark,
    # we can just re-use the results.
    if test -n "${scores[checksum]}"; then
        printf "optimizer %s code cached\n" ${optimizers[i]} 1>&2
    else
        # no luck, we need to run it.
        if ! results=$(../run/john --test=$testtime --format=${3}); then
            # crashes, or doesn't pass test.
            continue;
        fi

        # cache the scores.
        scores[checksum]="${results}"
    fi

    # output score.
    printf "%s %s\n" "${scores[checksum]}" "${optimizers[i]}"
done

With #pragma:

$ ../run/john --format=raw-sha1 -test=30
Benchmarking: Raw SHA-1 [128/128 SSE2 intrinsics 4x]... DONE
Raw: 14214K c/s real, 14271K c/s virtual

Without:

$ ../run/john --format=raw-sha1 -test=30
Benchmarking: Raw SHA-1 [128/128 SSE2 intrinsics 4x]... DONE
Raw: 14141K c/s real, 14178K c/s virtual

So... not earth-shattering, but worth a pragma imo.

Tavis.

--
-------------------------------------
taviso@...xchg8b.com | pgp encrypted mail preferred
-------------------------------------------------------
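For illustration, the two approaches discussed in this message (a version-guarded #pragma GCC optimize and a manual __builtin_prefetch in a hot loop) might look roughly like the sketch below. This is not the code from rawSHA1_ng_fmt.c: the function sum_with_prefetch, the constant PREFETCH_DISTANCE and the summing loop are hypothetical stand-ins, and the distance of 16 elements is an untuned guess.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * "#pragma GCC optimize" exists from GCC 4.4 onwards, as noted in the thread.
 * clang also defines __GNUC__ but does not implement the pragma, so it is
 * excluded explicitly here.
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
# pragma GCC optimize ("-fprefetch-loop-arrays")
#endif

/* Hypothetical tuning value: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/*
 * Hypothetical hot loop: sums an array while hinting the CPU to load
 * data that will be needed a few iterations later.
 */
uint32_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
#if defined(__GNUC__)
        /* rw = 0 (read), locality = 3 (keep in all cache levels). */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
#endif
        sum += data[i];
    }
    return sum;
}
```

Whether the manual prefetch or the compiler option wins would have to be measured per machine, for example with the same --test=30 benchmark used above.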