|
Message-ID: <CAJ9ii1HaA0M+FcWfe-iza8x5TrEx61oOmDz27=cUNa7R-Z3knA@mail.gmail.com> Date: Thu, 15 Feb 2018 10:17:58 -0500 From: Matt Weir <cweir@...edu> To: john-users@...ts.openwall.com Subject: Re: Markov Sampling Realized I had a typo in my genmkvpwd description. That would be for a max length 12, not 120. Should have picked my example level a bit differently ;p Matt On Thu, Feb 15, 2018 at 10:00 AM, Matt Weir <cweir@...edu> wrote: > This is for JtR's Markov mode. You can do similar options with more > advanced Markov methods like OMEN but I'm going to skip that for now. > > 1) Make sure you have your stats file trained already. The following > instructions will calculate the level based upon that. > 2) Open up the stats file. You'll see a lot of lines like: > > 97=proba1[32] > 51=proba2[32*256+35] > 44=proba2[32*256+38] > 51=proba2[32*256+39] > 40=proba2[32*256+48] > 35=proba2[32*256+49] > 37=proba2[32*256+50] > .... > > 3) The format is a bit wonky if you have never dug into it before. The > main thing to keep in mind is the level is the leftmost value. > 'proba1' means the probability for the first letter. 'proba2' means a > conditional probability for a letter following the previous letter. > > These examples taken from the code comments in my pcfg implementation. > Note when I'm using probability/level interchangeably, (not ideal I > know) : https://github.com/lakiw/pcfg_cracker/blob/master/python_pcfg_cracker/pcfg_trainer/markov.py > > # Format: > # Probability=proba1[ORD_REP1] > # Probability=proba2[ORD_REP1*256+ORD_REP2] > # > # Example: > # 27=proba1[97] //'a' has probability 27 > # 85=proba2[97*256+114] //'r' given 'a' has a probability of 85 > # 83=proba2[97*256+100] //'d' given 'a' has a probability of 83 > > 4) This is all based on the ASCII representation so 'a' = '97'. You > can ignore the *256. That's a holdover from the JtR implementation. I > suspect at one point there was a plan to support non-ASCII characters. > > 5) Now take the password you want to find the Markov level/probability > for. Let's assume it is 'cats'. You start out by looking up the proba1 > for [c]. That's your starting level. Then look up the prob2['c' + > 'a']. Add that level to your running total. Next look up proba2['a' + > t']. Add that to your running total. Finally look up proba2['t' + s']. > Add that to your running total and the sum of all of that is your > Markov level. > > 6) If you have a transition that isn't listed, (for example it uses a > non-ascii character) that guess will never be generated. > > 7) Use ./genmkvpwd to calculate how many password guesses would be > made for that level of the target password. While it won't tell you > where in that level the password would be guessed, you can probably > say it'll be (keysize for level)/2 and have it average out. With > Markov it isn't weighted to the front which is why cracking sessions > look a lot like a straight line with a few bumps in it. > > ./genmkvpwd statfile max_lvl [max_len] [start] [stop] > > Example (if you are looking up for level 120, max length 120) > ./genmkvpwd statfile 120 12 1 120 > > 8) Yeah you have to specify a maximum length for guesses and that does > impact the keysize. Also I can't remember if the results of genmkvpwd > are total for 0-level or just that level and you have to add all the > previous levels to get the guesses for it so you might need to play > with that. I think they are additive, (aka take the last value for the > total keysize), but I could be wrong as it's been a while since I > played with that. > > Good luck, and if this is confusing or didn't make sense let me know. > > Matt > > On Thu, Feb 15, 2018 at 9:10 AM, Matlink <matlink@...link.fr> wrote: >> I am interested by knowing how you calculate the level required to >> generate plaintext passwords (as you explain in 1)). >> >> >> Le 13/02/2018 à 19:16, Matt Weir a écrit : >>> So my initial reaction was that I'm doubtful printing out 1/10 of all >>> the password guesses will give you the end results you are looking >>> for. Here are a couple of other options: >>> >>> 1) You can always work backwards if your target plaintexts are known. >>> Aka calculate the level required to generate them, (that's easy to do. >>> Sum up all of the transitions based on your character file). Once you >>> have that, calculate how many guesses are required to get to that >>> level which the included tools in JtR will do for you. If you need >>> additional help on this approach I can write up something more >>> detailed. This is *super* fast and gives you a pretty good >>> approximation. >>> >>> 2) You can rely on JtR's own logging to see when passwords would be >>> cracked so that way you are not slowed down by piping the output to >>> stdout or using a 3rd party tool to figure out how many passwords you >>> cracked. If you want to go this route I'm sure the people on this list >>> can help out. >>> >>> There's other options as well I guess. You could always modify the >>> Markov code directly in JtR. If you want other examples of the Markov >>> code, I re-implemented it in Python for my PCFG cracker. You can see >>> the guess generation code at: >>> >>> https://github.com/lakiw/pcfg_cracker/blob/master/python_pcfg_cracker/pcfg_manager/markov_cracker.py >>> >>> Of course, my code is slow so that almost certainly is not the way to >>> go, but it may be useful as a reference. Long story short, if you let >>> us know more what you are trying to do I'm sure we can brainstorm some >>> better options. >>> >>> Matt >>> >>> >>> >>> On Tue, Feb 13, 2018 at 11:47 AM, Solar Designer <solar@...nwall.com> wrote: >>>> On Tue, Feb 13, 2018 at 04:44:03PM +0100, Matlink wrote: >>>>>> The pre-defined --external=Parallel mode will do what you ask for. >>>>>> You'll just need to customize the "node" and "total" numbers in its >>>>>> init() in john.conf. >>>>> Well, I guess it's only 'not printing' generated candidates? Does it >>>>> really speed up the process, since generating a password candidate is >>>>> more costly than printing it? >>>> It doesn't speed up the processing inside JtR; it actually adds extra >>>> processing. >>>> >>>>> Concretely, is --markov --stdout --external=Parallel with node 1/100, >>>>> 100 times faster than with node 1/1? >>>> No. It's probably roughly same speed: the external mode adds overhead >>>> internally to JtR, but then those skipped candidates don't need to be >>>> printed to the Unix pipe. >>>> >>>>>> However, note that "every 10th" doesn't necessarily produce a >>>>>> representative sample: the underlying cracking mode (in this case, >>>>>> Markov) might happen to have some periodicity in its output, and one of >>>>>> its period lengths might just happen to be a multiple of 10 or whatever. >>>>>> So ideally you'd want to randomize the order (if the order somehow >>>>>> doesn't matter for your research) over a larger number of candidate >>>>>> passwords - say, pass a million of them through GNU coreutils' shuf(1) - >>>>>> and then take every 10th out of that randomized list. >>>>> My issue is that I can't get the whole output because it is too costly >>>>> for me to gather them due to UNIX pipe. I would like to my >>>>> >>>>> john --stdout --markov --sample=100 | my_sublime_post-process >>>>> >>>>> be somewhat 100 times faster than >>>>> >>>>> john --stdout --markov --sample=1 | my_sublime_post-process >>>> You could use the built-in --node=1/100 feature, which probably will >>>> speed things up a lot, but then it almost certainly doesn't result in a >>>> representative sample - it's just a way to split the work between >>>> multiple nodes, without regard as to whether each node would get a >>>> representative sample and be expected to crack a similar percentage of >>>> real-world passwords that other nodes crack or not (so this probably >>>> won't be the case, making this approach unsuitable for use in research). >>>> >>>> The same applies to incremental mode. >>>> >>>>> Your solution requires to get the whole output of john and then >>>>> post-process it, but I can't find a satisfiable way to get its whole >>>>> output (since john is really fast to generate candidates). >>>> A question is whether you actually need to get this many candidates (or >>>> a sample from this many), or whether fewer would suffice. That depends >>>> on what your ultimate goal is. >>>> >>>> Alexander >> >> -- >> Matlink - Sysadmin matlink.fr >> Sortez couverts, chiffrez vos mails : https://café-vie-privée.fr/ >> XMPP/Jabber : matlink@...link.fr >> Clé publique PGP : 0x186BB3CA >> Empreinte Off-the-record : 572174BF 6983EA74 91417CA7 705ED899 DE9D05B2 >> >>
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.