Message-ID: <20080203132508.GA4638@openwall.com>
Date: Sun, 3 Feb 2008 16:25:08 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re:  faster hash file loading

On Sat, Feb 02, 2008 at 06:32:44PM +0000, helleye wrote:
> I think that if loader.c used ifstream::getline,
> the loading might be a lot faster

I doubt it, although this is system-specific.

The loader actually does quite a lot of work - parsing the input lines,
validating and decoding hash encodings, combining hashes with matching
salts, eliminating duplicate hashes (when "single crack" mode is not to
be used), updating linked lists and hash tables, etc. - yet it is quite
fast, given this amount of work.  Typically, it can process hundreds of
thousands of input lines (or perhaps even millions on newer systems) in
under a minute.

If you really want to optimize the buffered file reads, rather than
actual processing of the data (which is what most processor time is
probably spent on), then the way to do so would be by using lower-level
C library functions in a way that you think is better suited to this
specific task and to your operating system.  For example, you can try
to use the read(2) syscall directly and implement your own buffered
input with no support for seeks and writes - and you'd use a larger
buffer (although you can also alter the buffer size with stdio).  Or you
could mmap(2) your file into the process address space, then use
madvise(2) with the MADV_SEQUENTIAL flag and have the loader go over the
address space range, avoiding the need for any explicit read buffer
(this approach is only available on some operating systems).

However, let me repeat: I don't expect any significant speedup from
this, and especially not consistent speedup across a wide range of
operating systems, their versions, and underlying hardware (cache sizes,
their relative speeds, etc. will affect optimal buffer size).

That said, I have not done any benchmarks of the loader on Windows,
which is what you appear to be using.  If you suspect stdio to be the
bottleneck, then one easy thing to try is to provide your own, much
larger buffer with setvbuf(3).  Please give this a try and post your
results here (specific load times before and after the change, as
well as what buffer size you found to be optimal for your system).
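
A minimal sketch of such an experiment, using standard C only: the
function below reads a file line by line through stdio, optionally with
a caller-sized buffer installed via setvbuf(3).  The function name and
the 4096-byte line buffer are assumptions made for illustration; this
is not how loader.c is written.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: count lines in a file using stdio.  If bufsize is
 * non-zero, a buffer of that size replaces the default stdio buffer;
 * bufsize == 0 keeps the default, which serves as the baseline.
 * Returns -1 if the file cannot be opened.
 */
long count_lines_stdio(const char *path, size_t bufsize)
{
	char line[4096];
	char *iobuf = NULL;
	long lines = 0;
	int partial = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	if (bufsize > 0) {
		iobuf = malloc(bufsize);
		/* setvbuf() must precede any other operation on the stream. */
		if (iobuf && setvbuf(f, iobuf, _IOFBF, bufsize) != 0) {
			free(iobuf);	/* fall back to the default buffer */
			iobuf = NULL;
		}
	}

	while (fgets(line, sizeof(line), f)) {
		size_t len = strlen(line);

		/* A chunk without '\n' is a long line or the file's tail. */
		partial = !(len > 0 && line[len - 1] == '\n');
		if (!partial)
			lines++;
	}
	if (partial)
		lines++;	/* final line lacked a trailing newline */

	fclose(f);
	free(iobuf);		/* only safe once the stream is closed */
	return lines;
}
```

Timing a call with bufsize set to 0 against calls with, say, 64 KB and
1 MB (arbitrary starting points, not recommendations) on the same file
would show whether the stdio buffer size matters at all on a given
system.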

> g++ -o john.exe DES_fmt.o ...
...
> DES_fmt.o: file not recognized: File format not recognized

Maybe you did not "make clean", resulting in a mixed build.

> Any clue how to use ifstream in the loader, please?

You really should not be doing this, and if you are - you're on your
own with it.

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: 5B341F15  fp: B3FB 63F4 D7A3 BCCC 6F6E  FC55 A2FC 027C 5B34 1F15
http://www.openwall.com - bringing security into open computing environments
