Message-ID: <20180911154246.GA3070@openwall.com>
Date: Tue, 11 Sep 2018 17:42:46 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: good program for sorting large wordlists

Hi,

On Tue, Sep 11, 2018 at 05:19:18PM +0200, JohnyKrekan wrote:
> Hello, I would like to ask whether someone has experience with a good
> tool to sort large text files, with capabilities such as GNU sort. I am
> using it to sort wordlists, but when I tried to sort an 11 GB wordlist,
> it crashed while writing the final output file, after writing around
> 7 GB of data, and did not delete some temp files. Sorting a smaller
> (2 GB) wordlist took only about 15 minutes, while this 11 GB one took
> 4.5 hours (Intel Core i7 2.6 GHz, 12 GB RAM, SSD drives).

Most importantly, you usually do not need to "sort" - you just need to
eliminate duplicates. In fact, in many cases you'd prefer to eliminate
duplicates without sorting, in case your input list is ordered roughly
by non-increasing estimated probability of hitting a real password -
e.g., if it's produced by concatenating common/leaked password lists
first and other general wordlists next, and/or by pre-applying wordlist
rules (which their authors generally order so that better-performing
rules come first).

You can eliminate duplicates without sorting using JtR's bundled
"unique" program. In jumbo, running on a 64-bit platform, it will by
default use a memory buffer of 2 GB (the maximum it can use). It does
not use any temporary files (instead, it reads back the output file
multiple times if needed). You can use it e.g. like this:

./unique output.lst < input.lst

or:

cat ~/wordlists/* | ./unique output.lst

or:

cat ~/wordlists/common/* ~/wordlists/uncommon/* | ./unique output.lst

or:

./john -w=password.lst --rules=jumbo --stdout | ./unique output.lst

As to sorting, recent GNU sort from the coreutils package works well.
You'll want to use the "-S" option to let it use more RAM and fewer
temporary files, e.g. "-S 5G".
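[A minimal, self-contained illustration of the sort advice so far. The
file names here are placeholders, and the small "-S" value is just for
the demo; on a real multi-gigabyte wordlist you would use something like
"-S 5G" as suggested above.]

```shell
# Create a tiny demo wordlist with duplicate lines
# (a stand-in for a real multi-GB input.lst).
printf 'password\n123456\npassword\nletmein\n' > input.lst

# -u eliminates duplicate lines while sorting, and -S caps the
# in-memory sort buffer; raise it (e.g. to 5G) for large inputs.
sort -u -S 16M input.lst > output.lst

cat output.lst
```

Note that "sort -u" gives you sorting and duplicate elimination in one
pass, at the cost of losing the original probability-based ordering that
"unique" preserves.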
You can also use e.g. "--parallel=8".

As to it running out of space for the temporary files, perhaps you have
your /tmp on tmpfs, so in RAM+swap, and this might be too limiting. If
so, you may use the "-T" option, e.g. "-T /home/user/tmp", to let it use
your SSDs instead. Combine this with e.g. "-S 5G" to also use your RAM.

As to "it crashed while writing final output file after writing around
7 gb of data", did you possibly put the output file in /tmp as well?
Just don't do that.

I hope this helps.

Alexander