Message-ID: <20090725212325.GB10716@openwall.com>
Date: Sun, 26 Jul 2009 01:23:25 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: how charset are made ?

On Sat, Jul 25, 2009 at 09:49:20PM +0200, websiteaccess wrote:
> I have generated my own alnum.chr charset (from a cracked password
> dico), with 200 000 words.

It is not clear where those 200k "words" came from - were all of them
real passwords?  Was it possible for the same "word" to occur more than
once (such as if it were a common password)?  If you were cracking
saltless hashes, then probably you'd only get one instance of each
password, even if it matched multiple hashes...

> I did a test :
>
> 1 - original JTR's charset (alnum)
> 2 - my charset (alnum)
>
> Original charset are at least 3 times faster to find plaintext ! the
> word was easy "test620"
>
> How do you explain that ? :-/

1. The supplied .chr files are fairly good. :-)  Processing the source
material (including rejecting some of it) to generate the supplied .chr
files involved quite some effort.

2. You need to do out-of-sample testing.  Pick two non-overlapping sets
of hashes.  Use the cracked passwords for one of the sets to generate a
.chr file.  Use the hashes from the other set to test efficiency of the
generated .chr file vs. the "corresponding" one supplied with JtR.  The
two sets don't need to be of equal size - you may well use 190,000 of
cracked passwords to generate a .chr file and another 10,000 hashes to
test its efficiency.

3. Time to crack a single hash is not of statistical significance.  You
need to run JtR on a large enough test sample - say, 10,000 hashes - and
see how many get cracked after 1 minute, 1 hour, 1 day with each of the
.chr files you test separately.

4. Another important test is to repeat #3 after having run "single
crack" and wordlist.  It is possible that one .chr file will perform
better "from scratch", but another will perform better after "single
crack" and wordlist.
The latter will likely be of more use in practice (because you'd get a
larger percentage of hashes cracked total - for all three cracking modes
combined).  So this is the case I've been optimizing a few things for.

> Is John build a charset based on words statitics ?

This is not a very specific question, so I can't answer it directly.
However, I can say that, yes, statistical information is collected and
saved in .chr files.  It does not include statistics on entire words
(except unintentionally in some rare special cases), but it includes
lists of characters sorted by their estimated probabilities (derived
from the numbers of occurrences) for a given length, position, and two
preceding characters.  So it can be said that indirectly .chr files
include character triplet statistics (separately for each password
length and starting position of the triplet).

Alexander

-- 
To unsubscribe, e-mail john-users-unsubscribe@...ts.openwall.com and reply
to the automated confirmation request that will be sent to you.
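
As a rough illustration of the statistics described in the message above
(characters ranked by number of occurrences for a given length, position,
and two preceding characters), here is a minimal Python sketch.  It is not
JtR's code and says nothing about the actual .chr file format; the function
names and the sample passwords are made up for the example, and ties are
broken alphabetically only to keep the output deterministic.

```python
from collections import Counter, defaultdict


def collect_stats(passwords):
    """Count characters keyed by (length, position, two preceding characters)."""
    stats = defaultdict(Counter)
    for pw in passwords:
        for pos, ch in enumerate(pw):
            # The context is the (up to) two characters before this position;
            # it is shorter than two characters at the start of the password.
            context = pw[max(0, pos - 2):pos]
            stats[(len(pw), pos, context)][ch] += 1
    return stats


def ranked_chars(stats, length, pos, context):
    """Characters seen for one key, most frequent first (ties alphabetical)."""
    counts = stats.get((length, pos, context), Counter())
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [ch for ch, _ in ordered]


# Hypothetical sample; a real run would use many thousands of cracked passwords.
sample = ["test620", "test123", "tent999", "pass620"]
stats = collect_stats(sample)

# Candidates for position 2 of a 7-character password after "te".
print(ranked_chars(stats, 7, 2, "te"))
```

Because each count is tied to a password length and a position, the same
triplet contributes to different lists when it occurs at a different offset
or in a password of a different length, which is the "separately for each
password length and starting position" point made in the message.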