Message-ID: <20240515215045.GA17357@openwall.com>
Date: Wed, 15 May 2024 23:50:45 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

On Wed, May 08, 2024 at 07:12:47AM -0400, Matt Weir wrote:
> First, Mask mode is only in the basic sense a Markov chain. With Hashcat
> at least, it's more letter frequency sorted (vs alphabetical sorted).
> Aka instead of 'abcdef...' it is 'aeionr...' BUT the order of the second
> character isn't influenced by the first character. I assume JtR is the
> same way but that's probably worth double-checking as I've certainly
> been wrong before.

Yes, I wouldn't call our mask mode a Markov chain at all. Not even
0th-order. That's because while we do the character frequency sorting,
we not only have no chains from one character to another (which could
have been 0th-order, or a degenerate case of a Markov chain), but we
also iterate the indices in simple positional numeral system order.
This is even dumber than JtR 1.0's incremental mode, which I do call
0th-order Markov. I was hoping hashcat's was better than that - is it
not?

We have Markov chains or the like in incremental mode and in Markov
mode. Externally, there's also OMEN.

> None of this answers your question though! I'm not aware of a way to
> bootstrap passphrase generation into existing JtR Markov modes due to
> the huge size of the character set. Aka you are looking at an
> 'alphabet' of 10k or more values.

I'm not that familiar with JtR Markov mode's internals (I didn't write
that code), but I am with incremental mode's (which I wrote). It is
possible to modify the code so that it would handle the much larger
"alphabet", but then, to keep memory requirements sane, we'd need to
reduce the number of conditions - in other words, the Markov chain
order. Incremental mode is currently like a 2nd-order Markov chain in
that it considers 2 preceding characters (it works on 3-grams).
However, it also considers the current length and character position,
making it kind of 4th-order. We could reduce some of this to allow the
larger "alphabet".

But I am lazy, and I'd like to try my other idea, which I mentioned in
the talk and in another message in here earlier today: mapping frequent
substrings (likely syllables) to/from an "alphabet" small enough to fit
the existing incremental and Markov modes, OMEN, and whatever NN-based
models people try. I think we need to do this anyway, to have another
baseline for comparisons and another stream of candidates that (within
a realistic number of initial candidates) will only partially overlap
with a future full per-word Markov chain's (and so will remain useful
as well).

> Now if you don't want to do conditional probability and do more
> 'individual words frequency sorted' like mask mode, that is a lot
> easier to do. I wouldn't be surprised if there is an external mode in
> JtR to do just this already.

This wasn't convenient to do from an external mode because of its too
limited interfacing with the rest of JtR. We offered two kinds of
external modes: standalone generators and filters. The former would
have to have the wordlist embedded in the code, which is cumbersome.
The latter can produce at most one output word per input word
(filtering or modifying it). This changed when JimF introduced hybrid
external modes in 2016 - those add two extra callbacks. So a candidate
passphrase generator as an external mode is more reasonably
implementable now, but it hasn't been done yet.

Alexander
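
A toy sketch of the mask mode distinction drawn above: one
frequency-sorted character set per position, iterated in plain
positional numeral system (odometer) order, with no conditioning of one
position on another. This is illustrative Python, not JtR's actual
code, and the character order shown is made up:

from itertools import product

# One frequency-sorted character set per position.  Position 2's
# order never depends on what position 1 emitted - there is no chain,
# just an odometer over fixed per-position orders.
charsets = ["aeionr", "aeionr", "aeionr"]

for combo in product(*charsets):  # rightmost position cycles fastest
    print("".join(combo))         # aaa, aae, aai, ..., rrr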
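
The "kind of 4th-order" description of incremental mode above can be
made concrete with a toy version of the statistic involved: character
frequency conditioned on the two preceding characters plus the current
length and position. This sketches the idea only, not the actual
incremental mode implementation or its data layout:

from collections import Counter, defaultdict

def train(words):
    # counts[(length, position, two preceding chars)] -> Counter of
    # the next character; 3-grams plus length and position context.
    counts = defaultdict(Counter)
    for w in words:
        for i, c in enumerate(w):
            counts[(len(w), i, w[max(0, i - 2):i])][c] += 1
    return counts

counts = train(["password", "passport", "passions"])
# Likely 5th characters of an 8-character word following "ss":
print(counts[(8, 4, "ss")].most_common())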
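
The substring-mapping idea could start out along these lines: count
frequent substrings (hopefully syllable-like), give each a single-byte
code outside the ASCII range, and encode training phrases into that
small "alphabet". Everything below - the greedy longest-match
tokenizer, the parameters, the names - is a guess at a first cut, not a
design:

from collections import Counter

def top_substrings(corpus, n=100, min_len=2, max_len=4):
    counts = Counter()
    for line in corpus:
        for size in range(min_len, max_len + 1):
            for i in range(len(line) - size + 1):
                counts[line[i:i + size]] += 1
    return [s for s, _ in counts.most_common(n)]

def encode(line, subs, codes):
    # Greedy longest-match tokenization into the small "alphabet".
    out, i = [], 0
    while i < len(line):
        for s in sorted(subs, key=len, reverse=True):
            if line.startswith(s, i):
                out.append(codes[s])
                i += len(s)
                break
        else:
            out.append(line[i])  # fall back to the raw character
            i += 1
    return "".join(out)

corpus = ["correct horse battery staple", "the quick brown fox"]
subs = top_substrings(corpus)
codes = {s: chr(0x80 + k) for k, s in enumerate(subs)}  # one byte each
decode = {v: k for k, v in codes.items()}

tokens = encode(corpus[0], subs, codes)
print(len(tokens), "tokens for", len(corpus[0]), "characters")
print("".join(decode.get(t, t) for t in tokens))  # round-trips

An existing small-alphabet engine (incremental, Markov, OMEN) would
then be trained on the token streams, and its output decoded back into
phrases by straight substitution.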
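
Until such an external mode exists, the "individual words frequency
sorted" approach from Matt's last quoted paragraph is easy enough to do
outside of john and pipe in via --stdin (e.g., python3 gen.py | john
--stdin hashes). The word list below is a placeholder; a real one would
come frequency-sorted from a phrase corpus:

from itertools import product

# Words most frequent first; like mask mode, the second word's order
# doesn't depend on the first - pure positional order, no chains.
words = ["the", "love", "my", "you", "password"]

for a, b in product(words, repeat=2):
    print(a + b)  # or a + " " + b, depending on the target phrases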