|
Message-ID: <56BE4064.4090209@openwall.net> Date: Fri, 12 Feb 2016 14:28:20 -0600 From: jfoug <jfoug@...nwall.net> To: john-dev@...ts.openwall.com Subject: java KeyStore hash format Different 'type' of format. This format is 1% key and 99% salt overhead, so is VERY salt heavy. Now the default behavior of jtr, is to load keys, and then do a set_salt() / crypt() / set_salt() / crypt .... until all salts are handled. When using CTX model, and working on 1 hash at a time, this makes not difference at all. However, when working with any HW which parallel groups candidates, then this method of setting keys, and then switching salts and calling crypt all until all salts are completed, would kill ANY benefit, since each salt switch would cause every candidate buffer to have to be fully rebuilt (1000's of characters each). Now, I have come up with a method to handle this. for SIMD, I prebuild a simd record for each salt (making SURE) that if the structure is already built (i.e. same salt), that I do not rebuild it again, but re-use it. This simd structure has this layout struct simd { simd *next; unsigned first_limb[#threads][21][1-sha1-simd buffer]; unsigned *extra_limbs[21]; unsigned hash0[5]; }; Note, that first limb is 'allocated' #threads deep (i.e. first_limb is a pointer). The extra_limbs are allocated 1 at a time. They are all the SIMD buffers for each password length. The 21 is the number of password lengths we handle in SIMD code. We handle 4 to 24 byte passwords (21 possible). Then within the salt record returned to jtr, we place a pointer to this simd pre-built structure. We still require the 'original' salt to contain the original data also. Even though the SIMD code will not use any of it at all, it will still be required for any passwords < 4 or > 24 characters long. Those will still use the CTX method to compute the hash results. So this structure is allocated and setup at salt loading time. Then at runtime, set_salt() simply sets up a pointer. Then in crypt_all, we sort the passwords based upon length (other formats such as cryptsha256 has to do this for SIMD work also). Then when we run a block of work, we determine which thread we are, and ONLY use the first_limb[] data FOR that thread (since this first limb is read/write). All passwords for the 'block' of work, will be of the same length. So all work would be done in the same buffer, already properly setup for passwords of that length. Then the code simply does a SIMD_SHA1 of that buffer, once filled, and then SIMD_SHA1-resume hashes of the 'extra_limb' that matches the proper password length. Then any passwords left over ( < 4 or > 24 byte passwords), are done using CTX logic. This code will be checked in soon. I have it working, but need to clean things up a bit. I was able to get good SIMD improvement (3x on a SIMD_COEF32 4 AVX box). Note, this would have been easier to do, if the format was built only for handling a single salt at a time, OR if JtR had a salt based mode, where a salt was loaded, then every password was tested against that salt, then the next next salt was tested. But since we do not have a way to operate in that model, I had to engineer the logic to cache things so there would be no time spent in the salt swapping.
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.