Message-ID: <20080425204839.GB10423@openwall.com>
Date: Sat, 26 Apr 2008 00:48:39 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: working with large files

On Fri, Apr 25, 2008 at 02:22:14PM +0200, Simon Marechal wrote:
> I noticed that john slows down a lot when working with unsalted
> large password files. It spends all its time at first in the functions that
> are called when a password is found:

You've raised two related, but distinct issues here:

1. Slowness with a lot of passwords per salt (or just with a lot of
passwords when there are no salts).

Originally, JtR was designed for salted hashes, and it assumed that salts
were being generated in a not-too-broken fashion. In practice, I saw Unix
password files with up to around 10,000 hashes for a given salt (on
systems that didn't generate salts properly). To deal with that, I
introduced hash tables of three sizes - 16, 256, and 4096 entries - and
one of these is chosen for any given salt depending on the number of
hashes with that salt. 4096 entries was about the right size for those
salts with 10,000 hashes.

However, support has been added for many saltless hash types since then
(especially with patches such as yours), and there are often a lot more
than 10,000 saltless hashes being loaded into JtR at once. I will likely
deal with that by introducing two additional hash table sizes - with
64 Ki and 1 Mi entries - or maybe by introducing arbitrary-size hash
tables (although the hash functions for those would be slower).
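To make the size selection concrete, here's a rough sketch of the idea -
the function name and thresholds below are made up for illustration, not
the actual JtR loader code - extended with the two larger sizes mentioned
above:

/*
 * Rough sketch only - made-up name and thresholds, not the actual
 * JtR loader code.  The idea: pick one of a few fixed per-salt hash
 * table sizes based on how many hashes were loaded for that salt, so
 * that small salts stay cache-friendly and huge (or saltless) sets
 * don't degenerate into long collision chains.
 */
static int pick_hash_size(int count)
{
	if (count >= 1000000) return 1 << 20;	/* 1 Mi entries (proposed) */
	if (count >= 100000) return 1 << 16;	/* 64 Ki entries (proposed) */
	if (count >= 1000) return 4096;	/* ~10,000 hashes/salt fit OK here */
	if (count >= 100) return 256;
	return 16;
}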
2. Slowness in processing of successful guesses.

It was my deliberate decision to focus my optimizations on algorithms and
code that significantly affect the overall performance of long-living
sessions. Even if it takes, say, as much as 1 second to process a
successful guess, and you have loaded, say, 90,000 password hashes into
JtR and 90% of those get cracked, that's at most 1 day of CPU time spent
on processing those guesses. This is a constant amount of time that
becomes almost negligible for sessions that run for weeks or months. In
practice, all of these numbers are almost always a lot better - it takes
maybe some milliseconds to process a guess, etc. Yes, things get worse as
the number of hashes per salt increases.

> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  53.99     35.69    35.69      1120     0.03     0.03  ldr_update_salt
>  26.55     53.24    17.55      1119     0.02     0.05  crk_process_guess
>  12.63     61.59     8.35    884044     0.00     0.00  ldr_load_pw_line
>   4.18     64.35     2.76 817642756     0.00     0.00  binary_hash3

Please note that you've been running JtR for just around 1 minute here,
yet you've got over a thousand passwords cracked. That's around 60 ms per
successful guess (including all processing and "overhead"). At this rate,
you can have JtR process around 1.5 million successful guesses per day
(or even more than that, since the number of remaining hashes will be
decreasing). If you let the same session run for an hour and look at the
gprof output, you'll most likely notice that other functions get to the
top.

> This really isn't cool. I already increased the size of the values
> returned by binary_hash and stuff like that. However, it would be really
> useful to be able to speed up the ldr_update_salt function. Has anybody
> a suggestion on how to do this in a clean way?

Well, what JtR does right now is to completely rebuild the per-salt hash
table each time there's a guess for that salt (and a hash gets removed).
The obvious optimization here is to update the existing per-salt hash
table instead. If there's demand for a speedup in this area, even though
it is negligible for long-living sessions, then perhaps this should be
done.

Alternatively, you could choose to not remove hashes for cracked
passwords at all. This means that there may be duplicate guesses if
duplicate candidate passwords are tried (but that's not much of a problem
since "john --show" will only report each guess once). It also means that
the reported effective c/s rate will be artificially higher - this will
not correspond to any meaningful speedup, and overall things may get
slower for long-living sessions because keeping more hashes in the tables
results in a higher number of collisions.

> PS: I couldn't give it much thought, but I guess that replacing
> ldr_update_salt by a piece of code in crk_process_guess that removes the
> found password in the next_hash chain would be enough to gain much
> speed. Would there be side effects?

This is what I meant by "updating the existing per-salt hash table"
above. There should be no side-effects if you do it right (and if the
hardware is not faulty). Also, don't forget to update the "count".
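In pointer terms, the in-place update amounts to something like the
sketch below. The struct layout and names here are simplified and made
up for illustration - the real JtR structures differ - but it shows the
unlink-and-decrement idea:

/*
 * Sketch of the "update in place" idea (simplified, made-up types):
 * walk the chain for the cracked hash's bucket, unlink the entry, and
 * decrement the salt's count, instead of rebuilding the whole per-salt
 * hash table.
 */
struct pw {
	struct pw *next_hash;	/* next entry in the same hash bucket */
	/* ... binary hash, login, etc. ... */
};

struct salt {
	struct pw **hash;	/* per-salt hash table (array of buckets) */
	int count;		/* number of not-yet-cracked hashes left */
	/* ... */
};

static void remove_cracked(struct salt *salt, unsigned int bucket,
    struct pw *cracked)
{
	struct pw **link = &salt->hash[bucket];

	while (*link) {
		if (*link == cracked) {
			*link = cracked->next_hash;	/* unlink from the chain */
			salt->count--;	/* don't forget the "count" */
			return;
		}
		link = &(*link)->next_hash;
	}
}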
Another change I had considered is to add code to switch to smaller hash
table sizes when a lot of passwords get cracked for a given salt. Right
now, the hash table sizes are only determined at startup. The intent of
this change would be to use processor caches more optimally.

Alexander

-- 
To unsubscribe, e-mail john-users-unsubscribe@...ts.openwall.com and
reply to the automated confirmation request that will be sent to you.