Message-ID: <20091028234533.GA23548@openwall.com>
Date: Thu, 29 Oct 2009 02:45:33 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: wordlist generation

On Sat, Oct 24, 2009 at 02:43:59AM +0200, SL wrote:
> What is the recommended/preferrable method to convert an arbitrary
> text file (SQL dump, con-'cat'-enated HTML files, Wikipedia XML
> export, not a precompiled dictionary) into a (reasonably usable) john
> wordlist?
>
> cat $textfile | tr -s -c "[:alpha:]\-??????????????" "\n" | ./unique wordlist.lst
> kind of works, but I wonder if there are better ways?

You're on the right track.

When I need something like this, I generally try to combine several
approaches. Specifically, I pass the input files through several
different tr's, splitting up "words" on different characters - e.g., in
one of the invocations a dash will be a delimiter, but in another it
will be part of the target "word". When processing files of a known
format, such as SQL dumps, I may also use "sed" to extract and un-escape
the values - e.g., for proper handling of apostrophes and backslashes
embedded into the values vs. those added for the SQL dump.

Then the resulting stream is passed through "sort -u" or "sort | uniq"
(the standard Unix commands) or "unique" (the program included with
JtR). The latter tends to be quicker (because it does not need to do
any sorting), but when the input data was not sorted in a meaningful
way, it may be better to have the resulting wordlist sorted
alphabetically as that allows for some optimizations in JtR to work -
detecting effectively-duplicates when the hash type truncates passwords
at a certain length, as well as speeding up DES key setup. On the other
hand, if the hashes are fast to compute and you do not intend to be
applying plenty of rules to your wordlist, you may choose to save time
on generating the wordlist and use the quicker "unique".
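
The multi-pass idea above can be sketched roughly as follows. This is an
illustration only, not a script from the original post: the file names
sample.txt and wordlist.lst, the sample text, and the particular choice of
three delimiter sets are all assumptions. It assumes a tr that treats a
trailing "-" in a set as a literal dash (GNU tr does).

```shell
#!/bin/sh
# Hypothetical file names and sample text, for illustration only.
input=sample.txt
printf '%s\n' "The well-known pass-phrase isn't secret" \
              "well known users' data" > "$input"

{
    # Pass 1: letters only, so dash and apostrophe act as delimiters.
    tr -s -c '[:alpha:]' '\n' < "$input"
    # Pass 2: a dash is kept as part of the word.
    tr -s -c '[:alpha:]-' '\n' < "$input"
    # Pass 3: an apostrophe is kept as part of the word.
    tr -s -c "[:alpha:]'" '\n' < "$input"
} | grep -v '^$' | LC_ALL=C sort -u > wordlist.lst
# grep drops empty lines; sort -u removes duplicates across the passes
# and leaves the wordlist sorted.
```

If sorted output is not needed, the final "sort -u" stage could be swapped
for JtR's "unique" in the same way as in the quoted command
("... | ./unique wordlist.lst"), assuming the program has been built and
is in the current directory.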

BTW, "unique" can be made even quicker by increasing the values of
UNIQUE_HASH_LOG and UNIQUE_BUFFER_SIZE in params.h. The defaults are
rather conservative (using around 9 MB of RAM).

Alexander