john-users - RE: Contributing significant changes to the jumbo patch (mostly performance improvements)

Message-ID: <011301ca01a4$cb1e37e0$615aa7a0$@net>
Date: Fri, 10 Jul 2009 16:24:22 -0500
From: "Jim" <jfoug@....net>
To: <john-users@...ts.openwall.com>
Subject: RE: Contributing significant changes to the jumbo patch (mostly performance improvements)

Patch: john-1.7.3.1-all-5-several-performance-updates-1.diff

Things changed in this patch:

Memory file:  The wordlist file is preloaded and then read from
memory.  This gives a 25-50% improvement in speed.  Note that it
requires memory, so of course there are trade-offs.  By default,
all files under 5 million bytes are loaded.  This size is
controlled by the new option --mem-file-size=#, and the
--save-memory option has been changed so that, if it is present,
it shuts off the memory file.  The biggest gains come from
non-salted, fast algorithms (such as rawMD5), and from runs
using --rules.  When a memory file is used, only ONE read of the
file is done into a memory block, and the lines are then read
(possibly over and over again) from this block.  In the existing
file reading mode, there is a file IO call for each line.
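
To make the memory file idea concrete, here is a minimal sketch,
not the code from the patch.  The names mem_file, mem_file_load(),
mem_file_next_line() and mem_file_rewind() are hypothetical, and
the size check only assumes the "under 5 million bytes" rule
described above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; none of this is taken from the patch. */
struct mem_file {
    char *buf;    /* whole wordlist, NUL-terminated */
    size_t len;   /* number of bytes held in buf */
    size_t pos;   /* offset of the next unread line */
};

/*
 * Load the whole stream with a single fread().  Returns 0 on success,
 * or -1 if the file is not under max_size or on error, in which case
 * the caller would fall back to ordinary line-by-line file IO.
 */
int mem_file_load(struct mem_file *mf, FILE *fp, long max_size)
{
    long size;

    assert(mf != NULL && fp != NULL);
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0)
        return -1;
    if (size >= max_size)
        return -1;
    rewind(fp);

    mf->buf = malloc((size_t)size + 1);
    if (mf->buf == NULL)
        return -1;
    mf->len = fread(mf->buf, 1, (size_t)size, fp);  /* the ONE read */
    mf->buf[mf->len] = '\0';
    mf->pos = 0;
    return 0;
}

/* Copy the next line (without its newline) into out; 0 means no more lines. */
int mem_file_next_line(struct mem_file *mf, char *out, size_t out_size)
{
    const char *start, *end;
    size_t n;

    assert(mf != NULL && out != NULL && out_size > 0);
    if (mf->pos >= mf->len)
        return 0;
    start = mf->buf + mf->pos;
    end = memchr(start, '\n', mf->len - mf->pos);
    n = end ? (size_t)(end - start) : mf->len - mf->pos;
    mf->pos += n + (end ? 1 : 0);
    if (n >= out_size)
        n = out_size - 1;           /* truncate over-long lines */
    memcpy(out, start, n);
    out[n] = '\0';
    return 1;
}

/* Start over from the first line, e.g. for the next rule; no file IO. */
void mem_file_rewind(struct mem_file *mf)
{
    mf->pos = 0;
}
```

Each pass over the words (for example, one pass per rule) would
call mem_file_rewind() and then loop on mem_file_next_line(), so
no further file IO happens after the initial load.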

Larger hash sizes.  In prior versions, 4k elements was the largest
hash table size.  Now, 64k element and 1M element hash table
sizes have been added (5 total now).  This made a 3x to 4x
improvement in speed for a fast non-salted algorithm like rawMD5,
when working with a large set of user entries (such as 150k records).
At 150k, there was almost a 10x improvement in speed.   For slower
algorithms, or algorithms which have GOOD salts, this performance
gain will be little to none.  NOTE: at this time, only the rawMD5
and phpass formats actually implement the 4th and 5th hash levels.
The code automatically ignores these hash levels in the other
formats.  However, if there ARE other formats which could benefit,
all that has to be done is to add the 2 functions for each of the
hashing methods to the format, and then put the function pointers
into the format object.  Again, non-salted (or where the salt is
'broken'), fast formats, where you get a LARGE set of candidates
to test, are the ideal situation.  Other formats may or may not
need these extra functions added.
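
As a rough illustration of the extra hash levels, here is a sketch
under stated assumptions, not the code from the patch.  The names
fmt_sketch, binary_hash_3(), binary_hash_4() and pick_hash_level()
are hypothetical, the two smallest table sizes (16 and 256) are
assumed rather than taken from this message, and the rule used to
pick a level is only a guess at a reasonable policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEVELS 5

/* Table sizes per level; levels 3 and 4 are the new 64k and 1M tables.
 * The sizes for levels 0 and 1 are an assumption. */
static const unsigned long hash_size[HASH_LEVELS] = {
    0x10, 0x100, 0x1000, 0x10000, 0x100000
};

typedef int (*binary_hash_fn)(const void *binary);

/* Read the first 32 bits of a binary hash without alignment problems. */
static uint32_t first_word(const void *binary)
{
    uint32_t w;

    memcpy(&w, binary, sizeof w);
    return w;
}

/* The two added levels: same idea as the old ones, just wider masks. */
int binary_hash_3(const void *binary)
{
    return (int)(first_word(binary) & 0xFFFF);
}

int binary_hash_4(const void *binary)
{
    return (int)(first_word(binary) & 0xFFFFF);
}

/* Hypothetical stand-in for the format object's function pointers. */
struct fmt_sketch {
    binary_hash_fn binary_hash[HASH_LEVELS];  /* NULL = not implemented */
};

/*
 * Pick the smallest level whose table holds `count` entries, but never
 * go past the last level the format implements.  Formats that leave
 * levels 3 and 4 as NULL simply stay at level 2 (4k) or below.
 * Returns -1 if the format has no hash functions at all.
 */
int pick_hash_level(const struct fmt_sketch *fmt, unsigned long count)
{
    int level, best = -1;

    assert(fmt != NULL);
    for (level = 0; level < HASH_LEVELS; level++) {
        if (fmt->binary_hash[level] == NULL)
            break;                  /* ignore levels the format lacks */
        best = level;
        if (hash_size[level] >= count)
            break;                  /* table is already big enough */
    }
    return best;
}
```

With 150k loaded hashes, a format that fills in all five pointers
would end up on the 1M table in this sketch, while a format that
only has the original three stays on the 4k table.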

Added the ability to reduce the number of ftell() calls (an IO
performance hit) within wordfile.c.   This is controlled by a new
command option:  --fix-state-delay=#   The default is 0.  Again,
the big improvements (if any) would come from FAST non-salted
algorithms.  Setting it to 10 (or even 100) on an algorithm
processing 5 million passwords per second will not make crash
recovery any different, but will help to reduce the file IO
calls.  NOTE: if the program is running in memory file mode, it
has already done away with ALL calls to the ftell() function.
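
One way to picture the --fix-state-delay idea is the sketch below.
It is not the code from the patch: state_tracker and
fix_state_sketch() are hypothetical names, and the exact meaning
of the delay value (here: skip that many calls between two ftell()
calls) is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names; the real wordfile.c state handling may differ. */
struct state_tracker {
    FILE *fp;                   /* wordlist being read */
    long saved_pos;             /* last position recorded for crash recovery */
    unsigned delay;             /* calls to skip between ftell() calls; 0 = none */
    unsigned skipped;           /* calls skipped since the last ftell() */
    unsigned long ftell_calls;  /* counter, only here to show the effect */
};

/*
 * Called once per word read.  With delay == 0 every call records the
 * position, as before.  With delay == N (assumed semantics), N calls
 * are skipped and the next one records the position.
 */
void fix_state_sketch(struct state_tracker *st)
{
    assert(st != NULL && st->fp != NULL);
    if (st->skipped < st->delay) {
        st->skipped++;
        return;
    }
    st->skipped = 0;
    st->saved_pos = ftell(st->fp);
    st->ftell_calls++;
}
```

The recorded position can lag behind by at most `delay` words, which
at millions of candidates per second is a negligible amount of
repeated work after a restart.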

Some changes were made to some of the generic string hashing
functions.  The changes are small: the hashes run just as fast,
but the returned hash values are spread better.

The percentage done shown on screen has been enhanced to show
hundredths of a percent (so 63.42% is now shown instead of 63%).
This is not a performance change, but due to collisions, it was
included in this patch.  It should cause no problems; it is
pretty much out of the 'normal' runtime path of execution.

I am sure there is something else I am forgetting, but I need
to head home, or the wife will beat me, lol.   If there are 
other changes, I will post a follow up.

Jim.
