john-dev - Re: Re: Failed self test for raw-sha1-ng (linux-x86-sse2i OMP)

Message-ID: <mpro.m6bwqh00fh20g0jpv.taviso@cmpxchg8b.com>
Date: Thu, 28 Jun 2012 15:13:32 +0200
From: Tavis Ormandy <taviso@...xchg8b.com>
To: john-dev@...ts.openwall.com
Subject: Re: Re: Failed self test for raw-sha1-ng (linux-x86-sse2i OMP)

magnum <john.magnum@...hmail.com> wrote:

> On 2012-06-28 12:54, magnum wrote:
> > On 2012-06-28 12:18, Frank Dittrich wrote:
> > > Due to another recent change in raw-sha1-ng, I get a new warning when
> > > compiling with clang version 2.9: rawSHA1_ng_fmt.c:127:14: warning:
> > > unknown pragma ignored [-Wunknown-pragmas] # pragma GCC optimize 3
> > >
> > > I don't know whether more recent clang versions support this pragma,
> > > so I wouldn't disable it for clang. I just wanted to let you know.
> > 
> > That pragma is not implemented in GCC versions earlier than 4.4 so we
> > can test __GNUC__ and __GNUC_MINOR__ - I'll have a look even though this
> > is 100% benign.
> 
> Fixed. We are really nit-picking now :-)
> 
> magnum
> 

Indeed ;-)

I was actually going to use this for another tweak: if I prefetch the next
passwords with __builtin_prefetch, I can pull a little extra performance out
of my hottest loop. However, I've found that gcc doesn't do too badly with
just -fprefetch-loop-arrays (a non-default option).
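
For illustration only, manual prefetching in a loop might look roughly like
the sketch below. This is not the raw-sha1-ng loop: the function name
sum_with_prefetch and the PREFETCH_AHEAD distance are hypothetical, and the
right distance would have to be found by benchmarking.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance, in elements; would need tuning. */
#define PREFETCH_AHEAD 8

/* Sum 32-bit words, hinting that an element further ahead is read soon. */
uint32_t sum_with_prefetch(const uint32_t *buf, size_t n)
{
    uint32_t total = 0;

    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        /* rw = 0 (read), locality = 3 (keep in all cache levels). */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&buf[i + PREFETCH_AHEAD], 0, 3);
#endif
        total += buf[i];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same with or without it;
any gain would show up in the benchmark, not in the output.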

I think I can do _slightly_ better than gcc by hand, and maybe I could
perfect the code gcc generates with various --params, but having gcc do it
automatically is nice. I was planning to just add #pragma GCC optimize
"-fprefetch-loop-arrays".

Does that sound okay? I can wrap it in whatever __GNUC__ checks you like.
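
A guarded version could look something like the sketch below, assuming the
GCC >= 4.4 cutoff magnum mentioned for this pragma. The HAVE_OPTIMIZE_PRAGMA
macro and the xor_words function are hypothetical, there only to make the
example self-contained. Excluding clang explicitly is an assumption on my
part, based on the unknown-pragma warning Frank reported.

```c
#include <stddef.h>
#include <stdint.h>

/* Only use the pragma where gcc is expected to understand it (>= 4.4). */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
# pragma GCC optimize ("prefetch-loop-arrays")
# define HAVE_OPTIMIZE_PRAGMA 1   /* hypothetical marker macro */
#else
# define HAVE_OPTIMIZE_PRAGMA 0
#endif

/* Any function defined after the pragma is compiled with the option. */
uint32_t xor_words(const uint32_t *buf, size_t n)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < n; i++)
        acc ^= buf[i];
    return acc;
}
```

On compilers that fail the check, the code builds as before without the
option, so there should be no unknown-pragma warning.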

In fact, it made me curious whether there are any other free performance
wins, so I used the quick script below (you can grep ^Raw: | sort -g on its
output).

Example output:

$ bash chooseopts.sh linux-x86-64-native rawSHA1_ng_fmt.c raw-sha1-ng | tee log
Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 SSE4.1 intrinsics 4x]... DONE
Raw:	19482K c/s real, 19528K c/s virtual -fipa-type-escape
Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 SSE4.1 intrinsics 4x]... DONE
Raw:	19482K c/s real, 19528K c/s virtual -fno-ipa-type-escape
Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 SSE4.1 intrinsics 4x]... DONE
Raw:	19482K c/s real, 19528K c/s virtual -fivopts
Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 SSE4.1 intrinsics 4x]... DONE
Raw:	19391K c/s real, 19443K c/s virtual -fno-ivopts
....

$ grep ^Raw: log | sort -g | tail
Raw:	19538K c/s real, 19590K c/s virtual -funroll-all-loops
Raw:	19540K c/s real, 19572K c/s virtual -fno-align-jumps
Raw:	19573K c/s real, 19605K c/s virtual -minline-all-stringops
Raw:	19688K c/s real, 19721K c/s virtual -fprefetch-loop-arrays

The quickest win for me does seem to be -fprefetch-loop-arrays

#!/bin/bash
#
# usage: cd src; bash chooseopts.sh target filename.c formatname
# e.g.
#   $ bash chooseopts.sh linux-x86-64-native rawSHA1_ng_fmt.c raw-sha1-ng
#

declare -a  optimizers=(
    $(gcc --help=optimizers | awk '/^[ ]*-f/ {printf "%s\n%s\n",$1,gensub(/^-f/,"-fno-",1,$1)}')
);
declare -a  scores
declare -ir testtime=30

# for every optimization option gcc reports it supports, build an object
# and benchmark it.
for ((i = 0; i < ${#optimizers[@]}; i++)); do
    # build a new john with this flag applied to this object file.
    if ! make ${1} MAKE="make -W ${2}" CC="gcc -frandom-seed=seed ${optimizers[i]}" &> /dev/null; then
        # code doesn't compile, skip it.
        continue
    fi

    # if it built, find checksum of this code.
    checksum="$(objdump -d ${2//.c/.o} | cksum | sed 's/ //g')"

    # if another flag generated the same code, we don't need to benchmark,
    # we can just re-use the results.
    if test -n "${scores[checksum]}"; then
        printf "optimizer %s code cached\n" ${optimizers[i]} 1>&2
    else
        # no luck, we need to run it.
        if ! results=$(../run/john --test=$testtime --format=${3}); then
            # crashes, or doesn't pass test.
            continue;
        fi

        # cache the scores.
        scores[checksum]="${results}"
    fi

    # output score.
    printf "%s %s\n" "${scores[checksum]}" "${optimizers[i]}"
done


With #pragma:

$ ../run/john --format=raw-sha1 -test=30
Benchmarking: Raw SHA-1 [128/128 SSE2 intrinsics 4x]... DONE
Raw:	14214K c/s real, 14271K c/s virtual

Without:

$ ../run/john --format=raw-sha1 -test=30
Benchmarking: Raw SHA-1 [128/128 SSE2 intrinsics 4x]... DONE
Raw:	14141K c/s real, 14178K c/s virtual

So, not earth-shattering, but worth a pragma imo.

Tavis.


-- 
-------------------------------------
taviso@...xchg8b.com | pgp encrypted mail preferred