Message-ID: <20120926114522.GA4627@cmpxchg8b.com>
Date: Wed, 26 Sep 2012 13:45:22 +0200
From: Tavis Ormandy <taviso@...xchg8b.com>
To: john-dev@...ts.openwall.com
Subject: choosing code optimisations

Hey, just in case anybody else wants to try it with their formats, here is a
quick note on finding low-hanging performance wins with gcc. I've been using the
attached script to benchmark different optimisation options. It turns out
-fprefetch-loop-arrays is a pretty major win for raw-sha1-ng, so I include:

#pragma GCC optimize "-fprefetch-loop-arrays"
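
For readers without the attachment, a sweep of this kind could be sketched
roughly as below. This is not the attached chooseopts.sh, whose contents are
not shown here; the function names (invert_flags, candidate_flags, bench_flag,
sweep) and the OPT_LEVEL variable are made up for illustration, and the -O2
baseline is an assumption about the build. It relies on gcc's
"-Q --help=optimizers" listing and on GNU sed, and it is meant to be run from
john's src directory.

```bash
#!/bin/sh
# Hypothetical sketch of an optimisation-flag sweep. This is NOT the attached
# chooseopts.sh; every function name here is invented for illustration.

# Assumption: the build uses -O2; adjust to match the real Makefile.
OPT_LEVEL=${OPT_LEVEL:--O2}

# Read "gcc -Q --help=optimizers" style lines on stdin and print each on/off
# flag inverted: [enabled] flags become -fno-..., [disabled] flags stay -f...
# Flags that take a value (containing "=") are skipped.
invert_flags() {
    awk '$1 ~ /^-f/ && $1 !~ /=/ {
             if ($2 == "[enabled]")  { sub(/^-f/, "-fno-", $1); print $1 }
             if ($2 == "[disabled]") print $1
         }'
}

# List the flags worth trying for the assumed optimisation level.
candidate_flags() {
    gcc "$OPT_LEVEL" -Q --help=optimizers 2>/dev/null | invert_flags
}

# bench_flag <make-target> <source-file> <format> <flag>
# Prepend the pragma, rebuild, benchmark, print "Raw: ... <flag>", then
# restore the original source file.
bench_flag() {
    target=$1 src=$2 fmt=$3 flag=$4
    cp "$src" "$src.orig" || return 1
    sed -i "1i#pragma GCC optimize \"$flag\"" "$src"
    if make clean "$target" >/dev/null 2>&1; then
        result=$(../run/john --test --format="$fmt" | grep '^Raw:')
        [ -n "$result" ] && printf '%s %s\n' "$result" "$flag"
    fi
    mv "$src.orig" "$src"
}

# sweep <make-target> <source-file> <format>
sweep() {
    for flag in $(candidate_flags); do
        bench_flag "$1" "$2" "$3" "$flag"
    done
}

# Only start the sweep when called with the three expected arguments.
if [ $# -eq 3 ]; then
    sweep "$@"
fi
```

Saved under a hypothetical name such as sweep.sh, it would be invoked the same
way as the commands below, for example
"sh sweep.sh linux-x86-64-native rawMD5_fmt_plug.c raw-md5 | sort -g | tail".
Each flag costs a full rebuild plus a benchmark run, so a complete sweep takes
a while, and differences within run-to-run noise should be re-measured before
being trusted.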

Here is an example for raw-md5, a default build of jumbo7 with
linux-x86-64-native produces this:

$ ../run/john --test --format=raw-md5
Benchmarking: Raw MD5 [128/128 SSE2 intrinsics 4x]... DONE
Raw:    17693K c/s real, 17693K c/s virtual

$ bash ~/chooseopts.sh linux-x86-64-native rawMD5_fmt_plug.c raw-md5 | sort -g | tail
Raw:    17971K c/s real, 18025K c/s virtual -fno-regmove
Raw:    18033K c/s real, 18105K c/s virtual -fschedule-insns
Raw:    18039K c/s real, 18075K c/s virtual -falign-loops
Raw:    18052K c/s real, 18106K c/s virtual -fno-forward-propagate
Raw:    18072K c/s real, 18108K c/s virtual -fno-tree-loop-optimize
Raw:    18092K c/s real, 18146K c/s virtual -fno-guess-branch-probability
Raw:    18158K c/s real, 18194K c/s virtual -fno-reorder-blocks
Raw:    18168K c/s real, 18223K c/s virtual -fno-tree-ch
Raw:    18185K c/s real, 18221K c/s virtual -fno-if-conversion
Raw:    18516K c/s real, 18553K c/s virtual -fno-tree-ter

So -ftree-ter (enabled by default) sounds like it's hurting a lot...

$ sed -i '1i#pragma GCC optimize "-fno-tree-ter"' rawMD5_fmt_plug.c
$ make clean linux-x86-64-native
$ ../run/john --test --format=raw-md5
Benchmarking: Raw MD5 [128/128 SSE2 intrinsics 4x]... DONE
Raw:    18484K c/s real, 18484K c/s virtual

About 0.8M c/s (roughly 4.5%) for free. You may want to guard the pragma with
gcc major/minor version checks, but it seems too good to ignore. Here is the
default output for nt:

$ ../run/john --test --format=nt
Benchmarking: NT MD4 [128/128 X2 SSE2-16]... DONE
Raw:    37253K c/s real, 37253K c/s virtual

$ bash ~/chooseopts.sh linux-x86-64-native nt2_fmt_plug.c nt | sort -g | tail
Raw:    38596K c/s real, 38596K c/s virtual -fno-tree-loop-optimize
Raw:    38709K c/s real, 39100K c/s virtual -fno-strict-aliasing
Raw:    38822K c/s real, 39214K c/s virtual -fno-optimize-register-move
Raw:    38822K c/s real, 39214K c/s virtual -fno-regmove
Raw:    38846K c/s real, 39239K c/s virtual -fnon-call-exceptions
Raw:    38955K c/s real, 39348K c/s virtual -fno-reorder-blocks
Raw:    39007K c/s real, 39007K c/s virtual -fno-tree-ter
Raw:    39326K c/s real, 39326K c/s virtual -freorder-blocks-and-partition
Raw:    39343K c/s real, 39740K c/s virtual -fno-crossjumping
Raw:    39373K c/s real, 39373K c/s virtual -fno-guess-branch-probability

Unexpectedly, -fguess-branch-probability doesn't seem to help here:

$ sed -i '1i#pragma GCC optimize "-fno-guess-branch-probability"' nt2_fmt_plug.c
$ make clean linux-x86-64-native
$ ../run/john --test --format=nt
Benchmarking: NT MD4 [128/128 X2 SSE2-16]... DONE
Raw:    39428K c/s real, 39428K c/s virtual

A free 2.2M c/s improvement.
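
The gcc version guard suggested earlier for raw-md5 could be added with a
small helper instead of a bare sed one-liner. This is only a sketch: the
function name add_guarded_pragma is made up, and 4.4 is used as the minimum
version on the assumption that the pragma first appeared in that release.
Whether a given flag still helps on other gcc versions would need to be
re-benchmarked.

```bash
#!/bin/sh
# add_guarded_pragma <source-file> <flag>
# Prepend a "#pragma GCC optimize" line wrapped in a gcc version check.
# Hypothetical helper; the 4.4 threshold is an assumption (see above).
add_guarded_pragma() {
    src=$1 flag=$2
    tmp=$(mktemp) || return 1
    {
        printf '#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))\n'
        printf '#pragma GCC optimize "%s"\n' "$flag"
        printf '#endif\n'
        cat "$src"
    } > "$tmp" && cat "$tmp" > "$src"   # rewrite in place, keeping file mode
    rm -f "$tmp"
}
```

For example, "add_guarded_pragma rawMD5_fmt_plug.c -fno-tree-ter" followed by
"make clean linux-x86-64-native" gives the same build as the sed command above
on a recent gcc, while compilers that fail the version check simply skip the
pragma.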

Tavis.


-- 
-------------------------------------
taviso@...xchg8b.com | pgp encrypted mail preferred
-------------------------------------------------------

Download attachment "chooseopts.sh" of type "application/x-sh" (1419 bytes)
