Date: Sun, 26 Nov 2023 00:16:30 +0100
From: Solar Designer <>
Subject: deviating from incremental


JtR's incremental mode generates strings consisting of likely
characters, where each character is determined by two preceding ones (so
we're working on trigrams), its position, and total string length.  This
is similar to Markov chains.
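To make the general shape of such a model concrete, here is a toy trigram sketch in Python.  This is only an illustration of the idea, not JtR's actual implementation: incremental mode also conditions on character position and total length, and it enumerates candidates in decreasing-likelihood order rather than sampling randomly.

```python
import random
from collections import defaultdict

# Toy trigram character model: each character is drawn conditioned on
# the two preceding ones.  (JtR's real incremental mode additionally
# conditions on position and total length, and enumerates candidates
# in decreasing-likelihood order instead of random sampling.)
def train(words):
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        padded = "\0\0" + w + "\n"  # \0 = start padding, \n = end marker
        for i in range(2, len(padded)):
            counts[padded[i - 2:i]][padded[i]] += 1
    return counts

def generate(counts, rng):
    out, ctx = "", "\0\0"
    while True:
        chars = list(counts[ctx])
        weights = [counts[ctx][c] for c in chars]
        c = rng.choices(chars, weights=weights)[0]
        if c == "\n":
            return out
        out += c
        ctx = ctx[1] + c

model = train(["password", "passion", "pass123"])
print(generate(model, random.Random(0)))
```

With a training set this tiny every trigram path reproduces a training word, but on a large training set the model freely recombines trigrams into strings it never saw, which is the point made below about training vs. memorizing.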

While each trigram individually likely follows the training set nearly
optimally, the same isn't necessarily true for the generated candidate
password as a whole (nor is this desirable, since we want training and
generating new candidates, rather than memorizing and recalling what
we already had in a wordlist).

Hypothesis: longer passwords tend to contain a few (or just one)
character(s) that would be outlier(s) within a trigram.  In other words,
what's statistically unlikely among 3 characters becomes statistically
likely to appear at least once (or perhaps exactly once) among, say, 8
characters.  (A related hypothesis, not to be tested here, is that the
same holds true for words in a passphrase.)
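A crude back-of-the-envelope calculation shows why the hypothesis is plausible.  Assuming (unrealistically, but good enough for intuition) that each position is an "outlier" independently with some small probability q, the chance of at least one outlier grows quickly with length:

```python
# Back-of-the-envelope check, assuming each position is an "outlier"
# independently with probability q.  Real passwords are not independent
# per position; the value of q here is purely hypothetical.
def p_at_least_one(q, n):
    return 1 - (1 - q) ** n

def p_exactly_one(q, n):
    return n * q * (1 - q) ** (n - 1)

q = 0.05  # hypothetical per-position outlier probability
print(round(p_at_least_one(q, 3), 2))  # 0.14: unlikely within one trigram
print(round(p_at_least_one(q, 8), 2))  # 0.34: much likelier among 8 chars
print(round(p_exactly_one(q, 8), 2))   # 0.28: "exactly once" is likely too
```

So what is improbable at trigram scale becomes quite probable, roughly once, at whole-password scale.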

I use a representative 1 million non-unique NT hash sample from HIBP v8
generated using my pwned-passwords-sampler (which virtually repeats each
hash according to its count, then takes a random sample).  There are
826973 unique hashes in the sample.
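The sampling scheme described above amounts to weighted sampling with replacement.  The sketch below is not the actual pwned-passwords-sampler code, just the equivalent operation:

```python
import random

# Equivalent of "virtually repeat each hash according to its count,
# then take a random sample": weighted sampling with replacement.
# (Not the actual pwned-passwords-sampler code.)
def sample_hashes(counted_hashes, k, rng):
    hashes = [h for h, _ in counted_hashes]
    weights = [c for _, c in counted_hashes]
    return rng.choices(hashes, weights=weights, k=k)

# Hypothetical counts, for illustration only.
data = [("hash_a", 1000), ("hash_b", 10), ("hash_c", 1)]
sample = sample_hashes(data, 100, random.Random(1))
print(sample.count("hash_a"))  # common hashes dominate, per their counts
```

This is why the resulting sample is non-unique: common passwords appear in it in proportion to their real-world frequency, which makes non-unique crack counts meaningful.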

First, test 1 billion candidates using incremental mode's default
charset (currently based on RockYou):

./john -form=nt -inc -len=8 -max-cand=1000000000 -verb=1 sample

This cracks 49141 unique hashes (john.pot lines), or 72660 non-unique
passwords (last line of "john --show").

Then save john.pot, and let it run for another 1 billion candidates:

./john -rest

This increases the total cracks to 55202 unique, 80373 non-unique.

Now revert john.pot to what it was after the first run (at 1 billion)
and try replacing the second run (second 1 billion candidates) with
possible alternatives.

One such alternative is overstriking each character in the
incremental-produced stream with each possible other printable ASCII
character.  With the 1 billion limit on candidates actually tested
against the hashes, this means that the input incremental-produced
stream is cut short at a much lower count.  In this test, we also start
that stream from the beginning rather than from the 1 billion mark.
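In Python rather than JtR's rule language, the overstrike expansion looks like this: for each candidate read from the incremental stream, replace each character position in turn with every other printable ASCII character.

```python
# "Overstrike" expansion: replace each position of the input word, in
# turn, with every other printable ASCII character.
PRINTABLE = [chr(c) for c in range(0x20, 0x7f)]  # 95 printable ASCII chars

def overstrike(word):
    for i, orig in enumerate(word):
        for c in PRINTABLE:
            if c != orig:
                yield word[:i] + c + word[i + 1:]

cands = list(overstrike("password"))
print(len(cands))  # 8 positions x 94 replacement chars = 752 candidates
```

Each 8-character input thus expands into 752 tested candidates, which is why 1 billion tested candidates consume only on the order of 1.3 million lines of the input stream.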

./john -se=inc -inc -len=8 -stdo | ./john -form=nt -pipe -ru=o1 -max-cand=1000000000 -verb=1 sample

The total cracks are now 60152 unique, 88598 non-unique.

That's already an improvement.  Maybe going for the full printable ASCII
takes the idea suboptimally far, so let's revert john.pot again and
instead try [a-z0-9]:

->\r[1-9A-ZZ] >\p[0-9A-Z] o\0[a-z0-9] Q

./john -se=inc -inc -len=8 -stdo | ./john -form=nt -pipe -ru=o1an -max-cand=1000000000 -verb=1 sample

The total cracks are now 65240 unique, 93976 non-unique.

A further improvement.

Notably, there's relatively little overlap in candidates tested and
passwords cracked by the 2nd billion of incremental vs. these
alternatives:

  candidates tested                       unique  non-unique
  2 billion incremental                    55202       80373
  2 billion incremental + 1 billion o1     65764       95123
  2 billion incremental + 1 billion o1an   69574       99003
  all of the above (4 billion total)       71330      101144
  3 billion incremental                    62751       89318
  4 billion incremental                    66833       94363

(The "all of the above" total includes significant candidate overlap
between o1 and o1an.)

So we seem to have an improvement.

Of course, the pipe approach is way too slow for NT hashes, but with a
much slower target this could make sense.  Also, we may want to think of
whether/how to implement such functionality within one invocation.
Right now, we can use a mask on top of incremental mode, but this is
only good enough to prepend or append a mask character (or several), not
to replace or insert characters inside a previous mode's candidate.

I did not run many other experiments nor tune the numbers (e.g., the 1
billion figure) to achieve this effect.  I did, however, try one other
thing - "-inc -len=7" with "-ru=i1" - that is, inserting into 7-character
incremental output instead of replacing in 8-character output.  With
full ASCII (I didn't try any other), this performed moderately worse
than the 2nd billion of incremental.  So the improvement might not be
consistent.  Yet it looks like the hypothesis at least has merit.
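For contrast with the overstrike sketch earlier, the i1-style expansion inserts rather than replaces, turning 7-character inputs into 8-character candidates:

```python
# "Insert" expansion: insert each printable ASCII character at every
# position of the input word (including both ends), lengthening it by
# one character.
PRINTABLE = [chr(c) for c in range(0x20, 0x7f)]  # 95 printable ASCII chars

def insert_each(word):
    for i in range(len(word) + 1):
        for c in PRINTABLE:
            yield word[:i] + c + word[i:]

cands = list(insert_each("passwrd"))  # 7-char input -> 8-char outputs
print(len(cands))  # 8 insertion points x 95 chars = 760 (incl. duplicates)
```

A 7-character input expands into 760 candidates (some duplicated when the inserted character equals a neighbor), a similar factor to the overstrike case.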

