Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <56BE4064.4090209@openwall.net>
Date: Fri, 12 Feb 2016 14:28:20 -0600
From: jfoug <jfoug@...nwall.net>
To: john-dev@...ts.openwall.com
Subject: java KeyStore hash format

Different 'type' of format.  This format is 1% key and 99% salt 
overhead, so is VERY salt heavy.

Now the default behavior of jtr, is to load keys, and then do a 
set_salt() / crypt() / set_salt() / crypt ....  until all salts are handled.

When using CTX model, and working on 1 hash at a time, this makes not 
difference at all. However, when working with any HW which parallel 
groups candidates, then this method of setting keys, and then switching 
salts and calling crypt all until all salts are completed, would kill 
ANY benefit, since each salt switch would cause every candidate buffer 
to have to be fully rebuilt (1000's of characters each).

Now, I have come up with a method to handle this.  for SIMD, I prebuild 
a simd record for each salt (making SURE) that if the structure is 
already built (i.e. same salt), that I do not rebuild it again, but 
re-use it.  This simd structure has this layout

struct simd {
      simd *next;
     unsigned first_limb[#threads][21][1-sha1-simd buffer];
     unsigned *extra_limbs[21];
     unsigned hash0[5];
};

Note, that first limb is 'allocated' #threads deep (i.e. first_limb is a 
pointer).  The extra_limbs are allocated 1 at a time. They are all the 
SIMD buffers for each password length.  The 21 is the number of password 
lengths we handle in SIMD code.  We handle 4 to 24 byte passwords (21 
possible).   Then within the salt record returned to jtr, we place a 
pointer to this simd pre-built structure. We still require the 
'original' salt to contain the original data also. Even though the SIMD 
code will not use any of it at all, it will still be required for any 
passwords < 4 or > 24 characters long. Those will still use the CTX 
method to compute the hash results.

So this structure is allocated and setup at salt loading time.  Then at 
runtime, set_salt() simply sets up a pointer.

Then in crypt_all, we sort the passwords based upon length (other 
formats such as cryptsha256 has to do this for SIMD work also). Then 
when we run a block of work, we determine which thread we are, and ONLY 
use the first_limb[] data FOR that thread (since this first limb is 
read/write).  All passwords for the 'block' of work, will be of the same 
length.  So all work would be done in the same buffer, already properly 
setup for passwords of that length.

Then the code simply does a SIMD_SHA1 of that buffer, once filled, and 
then SIMD_SHA1-resume hashes of the 'extra_limb' that matches the proper 
password length.

Then any passwords left over ( < 4 or > 24 byte passwords), are done 
using CTX logic.

This code will be checked in soon.  I have it working, but need to clean 
things up a bit.  I was able to get good SIMD improvement (3x on a 
SIMD_COEF32 4 AVX box).

Note, this would have been easier to do, if the format was built only 
for handling a single salt at a time, OR if JtR had a salt based mode, 
where a salt was loaded, then every password was tested against that 
salt, then the next next salt was tested. But since we do not have a way 
to operate in that model, I had to engineer the logic to cache things so 
there would be no time spent in the salt swapping.


Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.