|
Message-ID: <20150913145939.GA2813@openwall.com> Date: Sun, 13 Sep 2015 17:59:39 +0300 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: get_hash*() might be removed with other format interface On Sun, Sep 13, 2015 at 05:17:35PM +0300, Aleksey Cherepanov wrote: > Consider format interface where crypt_all() fills an array of integers > just like get_hash_6() returns. So we could avoid MAX_KEYS_PER_CRYPT > function calls. But we would need to move body of get_hash_*() into > caller, so we would not reduce bodies. This is already possible in the current interface, by having crypt_all() make use of its optional capability to check against loaded hashes. The caller would still double-check, but only if crypt_all() returned that there might be matches. So what you suggest sounds like middle ground: include the bitmap index computations into crypt_all(), but don't necessarily include actual bitmap lookups in there. This may be shorter source code (given how many formats we have), but it's also less optimal (having to store those indices somewhere instead of using them right away). Maybe the source code complexity and duplication should be reduced in some other way, such as by offering a macro that crypt_all() could use to perform the bitmap lookups. Another aspect is that with our current naive use of OpenMP, moving the bitmap lookups into crypt_all() allows to have them in the same parallel region and in the same threads with the corresponding hash computations. In an OpenMP-enabled format, your "crypt_all() fills an array of integers" idea would still involve the overhead of transferring many of those integers between CPUs - well, unless we switch to program-level OpenMP support (where the parallel region would be at higher levels, and everything below it would be MT-safe) or some other way of managing multiple threads at that higher level, which we should want to anyway. > It would hit flexibility. For instance, dummy format uses different > ways to compute get_hash_*(). Hm, I guess it does not affect > performance at all, because only 1 get_hash_*() function is used > during the whole run of john, right? For a given salt, yes. > It would hit FMT_BS formats when bitmap is not in use. For that case, > a separate method might be introduced to fill the array separately. Yes, and IIRC bitslicing was the primary motivation behind having those separate get_hash*() functions. OTOH, there's no good reason to have separate binary_hash*() functions - this should go. Instead, we can have one binary_hash() function accepting a binary and a desired hash table index size. BTW, I think a few smallest hash table sizes (2 or 3) should go as well. They pre-date our use of bitmaps. Our current 4th size is 16-bit, and that's only an 8 KB bitmap, so fits typical L1 data cache. The hash table that follows (upon a hit) is larger, though, such as 128 KB on 64-bit with PASSWORD_HASH_SHR 2. For a salted hash, those might consume quite some RAM, so this may be a reason to keep size 3 as well (and only use it with --save-memory?) At the same time, I think a larger size (than our current largest) should be introduced. Currently, the largest is 27-bit. If we introduce 30-bit on top of it (128 MB bitmap) and keep PASSWORD_HASH_SHR at 2 as it currently is, this means the hash table size will be 1 GB on 32-bit systems (28-bit indices and 32-bit elements), so is still barely acceptable compatibility-wise. (I assume that malloc() of 2 GB or more might not work on 32-bit, and wouldn't leave enough address space in the process even if it works.) Another reasonable option would be to replace the current 27-bit with 28-bit, and introduce 32-bit. Then we'd need to use PASSWORD_HASH_SHR of at least 4 (probably 5 or 6 since 1.5 GB for bitmap + hash table is too much) on 32-bit (but can keep it at 2 on 64-bit if we prefer). All of this is if we don't go for any better solution yet, but merely tweak what we've got first, optimizing speed rather than memory usage. Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.