Message-ID: <20150620221414.GA10461@openwall.com>
Date: Sun, 21 Jun 2015 01:14:14 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: continuous feeding (for data driven branching)

On Fri, Jun 19, 2015 at 05:33:11PM +0300, Aleksey Cherepanov wrote:
> Just to document a discussion behind an idea...
> 
> When there is a data driven branching we may want to pack intermediate
> results into several queues. For example, sunmd5 does it after "coin
> flip".
> 
> Currently we get a lot of candidates, separate them into 2 packs and,
> oops, packs may be sub-optimal: for instance, we got 8 candidates and
> based on "coin flip" (or other factor), we got packs with intermediate
> results for 3 and 5 candidates while we would like to have 4 and 4 for
> our sse code.

This problem also exists in cases where the data-driven branching is
not in the original algorithm, but is acquired along with our
optimizations such as bitslicing.  This happens e.g. when we bitslice
an algorithm where processing varies by password length.  Once we do,
we have data-driven (actually password length driven) branching in our
attack-specific code.  Another example is tripcode, where descrypt
salts are generated from candidate passwords, whereas our bitslice DES
implementation assumes a fixed salt for all candidates.  Buffering and
grouping by password length, by tripcode descrypt salt, or by whatever
else is relevant works around this issue, but with the drawback you
describe.

> Continuous feeding: format reads N candidates, separate them and if it
> needs more input to fill up buffers optimally then it asks for them.

Yes, that's an obvious idea I had some years back... possibly as far
back as 1998, when thinking of whether to bitslice bsdicrypt's key
setup.  (I ended up not bitslicing it at the time, and it still is not -
only the main processing is bitsliced.  I am still considering changing
this, though, so that we'd drop the non-bitslice DES code from the JtR
tree.  We'd also need to deal with what'd become a similar issue in
AFS_fmt.c, though - grouping by length.  Right now, that format is not
even bitsliced at all for that reason, and it feels wasteful to spend
time on it when few people, if anyone at all, still need it.)

Another way to do the same is to make the multiple queues visible to
the cracking modes, which would then knowingly work on the multiple
queues.  This complicates the cracking modes' implementations, but may
help solve some of the new issues.  Progress and speed reporting could
then be 100% correct, even if too complicated for many users ("what are
all those queues I get separate reporting for?").  Crash recovery could
keep separate checkpoints for each queue, much like we do for separate
--fork'ed processes, or alternatively could record the fully processed
point only, like "single crack" mode does for its rules now.

> Problem 1: format's interface needs changes. Changes can be minimal:
> currently crypt_all() can hold back some candidates, use them on next
> call and report them as newly generated. The only changes needed: call
> crypt_all() in the end with count=0 as a signal for crypt_all() that
> it needs to flush all candidates at any cost because john finishes.
> (As a modification: call with count=0 only if the last call had count
> == MAX_KEYS_PER_CRYPT .)
> 
> Such change was proposed by Solar Designer

You and I were discussing multiple related issues at once at the time,
and having crypt_all() hold back and then seemingly generate new
candidates was your idea.
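To make that concrete, here is a rough sketch of such a crypt_all() -
not actual JtR code.  It assumes the usual per-format saved_key[] /
PLAINTEXT_LENGTH context, and classify(), process_group(), the queue
layout, and the constants are all invented for illustration:

    #define SIMD_WIDTH 4    /* keys hashed per SIMD call */
    #define QUEUE_MAX  1024 /* held-back candidates per queue, max */

    struct cand_queue {
        int count;
        /* copies, since saved_key[] gets refilled on the next call */
        char key[QUEUE_MAX][PLAINTEXT_LENGTH + 1];
    };

    static struct cand_queue queue[2]; /* e.g. one per "coin flip" outcome */

    /* hypothetical helpers, not real JtR functions */
    static int classify(const char *key);  /* the data-driven branch */
    static int process_group(struct cand_queue *q, int n,
        struct db_salt *salt);             /* hashes and dequeues n keys */

    static int crypt_all(int *pcount, struct db_salt *salt)
    {
        int i, q, done = 0;

        /* distribute the new candidates across the queues
           (no overflow handling in this sketch) */
        for (i = 0; i < *pcount; i++) {
            q = classify(saved_key[i]);
            strcpy(queue[q].key[queue[q].count++], saved_key[i]);
        }

        for (q = 0; q < 2; q++) {
            /* hash full SIMD-width groups only... */
            while (queue[q].count >= SIMD_WIDTH)
                done += process_group(&queue[q], SIMD_WIDTH, salt);
            /* ...unless count == 0 signals "flush at any cost" */
            if (!*pcount && queue[q].count)
                done += process_group(&queue[q], queue[q].count, salt);
        }

        return done;
    }

Note how the number returned need not match the number of keys john
just supplied - that mismatch is exactly what gets us into trouble
below.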
At the time, I only pointed out one problem, which you list above:
handling the very last crypt_all() call in a session, where nothing
must be held back and the queue must be flushed.  And I proposed that
workaround for it.

I've since recalled that this trick, like some other optimizations for
various things that I considered before, unfortunately breaks crash
recovery.  With the current formats interface, cracker.c and the
cracking modes will assume that crypt_all() has processed all of the
candidates it was called for, and JtR may record that in its .rec file.
Recovering that session may then effectively skip some candidates,
which were never fully processed, but were queued by crypt_all().

Unfortunately, this can't be fully solved by making that new "flush the
queue" crypt_all() call before process termination.  This would work
for intentionally aborted sessions where the user is willing to wait
for this last call to complete, but it won't work for crash recovery,
nor for impatient user aborts.

The only ways to achieve reliable crash recovery appear to be:

1. Maintain per-queue checkpoints.

-OR-

2. Record the fully processed point only.

-OR-

3. Flush all queues before each .rec file update (so e.g. every 10
minutes, and don't update the .rec file until this flushing is
complete).

I don't like any of these very much.

> and may be good for asynchronous calls too (to support FPGAs).

As to asynchronous processing, while your idea happens to be relevant
to that too, I'd rather implement separate support for async
crypt_all(), where recording the crash recovery position would be as
simple as staying one crypt_all() behind.  On my to-do list, the new
format method to achieve this is called set_block(), and it'd allow
crypt_all() to work on different input and output block indices.
(Block is a new concept to be introduced along with this change.)

As I had mentioned, I am also considering using this same blocks
concept for higher-level multi-threading than what we have now.  We
could have one block per thread and use the blocks for this per-thread
separation rather than for async processing.  Or we could have two
blocks total, and let our threads complete asynchronously to key
generation for the other block.  The former is more scalable; the
latter lets us keep the simpler sequential cracking mode
implementations.  With per-thread blocks, we'd also have the crash
recovery issue.

To be clear: the above paragraph is about different issues, not the
multi-queue thing that the rest of this posting is about.  I digressed.
But those other issues are subtly relevant, including bumping into the
same kind of crash recovery problem as soon as we let our multiple
threads, rather than multiple queues within a thread, get out of sync.
As soon as anything is kept out of sync beyond at most a small constant
number of crypt_all() calls (like 1 or 2), crash recovery becomes
significantly more difficult.  And if both per-thread queues and
multiple threads are out of sync, do we have to maintain N*M separate
checkpoints or something?..  It gets tricky and/or nasty.

> Problem 2: holding candidate (worst case: password) for too long: for
> instance, 1 our queue got 1 candidate that is password but we don't
> hash it because 1 is not optimal, all other candidates go to other
> queue by a coincidence, so format waits for the finish to hash that 1
> candidate and user instead of 5 minutes has to wait long
> hours/days/months/years. So format has to do something to not hold
> candidates more than N minutes.
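In terms of the hypothetical sketch earlier in this message, the crude
fix would be to also flush a partial group once it has waited through
too many crypt_all() calls.  MAX_QUEUE_AGE, age[], and flush_queues()
are again invented names; this would replace the flushing loop in that
sketch:

    #define MAX_QUEUE_AGE 8 /* crypt_all() calls a held-back key may wait */

    static unsigned int age[2]; /* calls since each queue was last emptied */

    static int flush_queues(int final, struct db_salt *salt)
    {
        int q, done = 0;

        for (q = 0; q < 2; q++) {
            if (queue[q].count)
                age[q]++;
            else
                age[q] = 0;

            while (queue[q].count >= SIMD_WIDTH)
                done += process_group(&queue[q], SIMD_WIDTH, salt);

            /* flush a partial group that has waited too long,
               or everything when john is finishing (final != 0) */
            if (queue[q].count &&
                (final || age[q] > MAX_QUEUE_AGE)) {
                done += process_group(&queue[q], queue[q].count, salt);
                age[q] = 0;
            }
        }

        return done;
    }

A real format would more likely bound this by wall-clock time or by the
number of keys seen, but the idea is the same.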
It's been a very long time since I wrote this, but it seems the
single.c equivalent to this problem and its workaround is this excerpt:

    if (keys->count && rule_number - keys->rule > (key_count << 1))
        if (single_process_buffer(salt))
            return 1;

Looks like processing is forced if there are any keys queued for the
current salt (keys->count is non-zero) and it's more than two
key_count's worth of rules behind the candidate password generator.
Optimally, exactly key_count candidates would be buffered for each salt
before processing (so we'd process when keys->count == key_count), but
as you can see we are also willing to process when this optimal point
isn't reached within 2*key_count rules (meaning that many rules
happened to reject their inputs, not producing any output).

Of course, the above excerpt is a workaround for a different variation
of the problem, but it's similar in concept.

> Problem 3: partial hashing and holding back would give inaccurate
> speed reports.

Yes, that too.  Personally, I find the problem with crash recovery
(which we didn't discuss, and which you didn't list) to be the worst of
the four(?)

Alexander