|
Message-ID: <20060601234313.GA11394@openwall.com> Date: Fri, 2 Jun 2006 03:43:13 +0400 From: Solar Designer <solar@...nwall.com> To: john-users@...ts.openwall.com Subject: Re: Re: beta MPI john patch against 1.7.2. I'll respond to several messages at once: On Thu, Jun 01, 2006 at 06:57:56PM +0200, Michal Luczaj wrote: > (let's blame mail system! ;) Yes, the mail system is to blame. I had this list configured with some MIME type filters and those were too strict. I've shortened the list of disallowed MIME types now. More importantly, with the appropriate permissions from the authors, I have placed all four John/MPI hacks into the contrib/ directory: ftp://ftp.openwall.com/pub/projects/john/contrib/mpi/ On Thu, Jun 01, 2006 at 05:34:45AM +0200, Otheus (aka Timothy J. Shelling) wrote: > Please let me know of any rejects while performing the patch. (Use -l to > ignore whitespace to avoid problems with bench.c.) Yes, there were rejects on the Makefile, even with "patch -l". I had to fix those manually and re-generate the patch before placing it for download. Please use the following command for generating patches for subsequent revisions of your code: TZ=UTC diff -urp john-1.7.2 john-1.7.2-mpi > john-1.7.2-mpi-tjs-2.diff The "-2" will be the patch revision number - you'll increment it each time you generate a new patch for the same base version of John. If you add any new files, you'll need to add the "-N" option to diff as well. Also, your john.conf has lots of unrelated changes. It is a good idea to revert those. The same suggestions apply to any other patches I place in contrib/ - which is why I am posting them to the mailing list. > Known bugs: > o john.pot gets littered with extraneous output, usually low integers > like "4" or "1" and a newline. Very strange. Still trying to track it down. My previous guess (from the private e-mail) proves correct - you do in fact close stdout and stderr in your patch. While you try to re-open stderr, the way you do it is not guaranteed to work right. And I don't see you re-opening stdout at all. So those numbers are some debugging output being printed to either stdout or stderr - perhaps they are MPI_rank's? > o internalized computing the checksum for ext_word. 15% increase in speed > (at least) > o use of external filter with internal checksum causes a slight (1-2%) > slowdown You can't reasonably speak of speedups/slowdowns by a certain percentage unless you mention the hash type and the number of different salts for which you did your benchmarks. On Wed, May 31, 2006 at 10:53:07PM +0200, Otheus (aka Timothy J. Shelling) wrote: > It occured to me, just before I looked at the "Pippin" MPI project, that in > both Ryan's implementation and in mine, once a particular hash is cracked by > a particular task, the other tasks are *still* comparing against the cracked > passwords. They are also generating hashes from salts that may be no longer > needed. [...] > Question: Where/how should the other tasks handle this event? I'm somewhat > lost on this, but my current guess is that for each cracked ciphertext > received via MPI, the task must find the right salt list and pw entry can > call crk_process_guess(). If there's a cleaner way, I'd love to know. Yes, crk_process_guess() is what you need. However, you'll need to modify it such that it accepts a boolean flag indicating whether to call log_guess(). The existing calls to crk_process_guess() would continue to log, whereas your added call, invoked on request from another task, would merely remove the cracked hash/salt. > Question: Where/how should the cracking task broadcast this event? It seems > pretty clear that this should be in crk_process_guess(). Of course, if my > guess above is true, the crk_process_guess must be called for every cracked > password received from the other tasks, then this would be split into two > functions right after the log() call. Actually, the added "if" statement would be right before the log_guess() call, as I have suggested above. You would need to broadcast the event right after the log_guess() - but still within the "if" block. > Alternatively, I could add a boolean > parameter to the function which would be use to skip re-broadcasting and > re-logging the cracked password. Exactly. On Wed, May 31, 2006 at 11:30:14PM +0200, Otheus (aka Timothy J. Shelling) wrote: > On synchronizing crypts/salts between all tasks, another thought occurred to > me. After every checkpoint, simply load in the john.pot file (assuming it's > not corrupted ;) ). Would that be a simpler, (though slightly less optimal), > approach? Yes, that would work - as long as you have that file shared between the tasks anyway. On Tue, May 30, 2006 at 02:28:08AM +0200, Otheus (aka Timothy J. Shelling) wrote: > >> o Recovery: Each task runs with its own session; the whole thing can > >> be re-started in recovery mode > > > >OK, that can work - but does it mean that there's a .rec file for each > >task? > > Yes. With more work, I could set it up so the root task keeps track of one > record file, but I'd need to understand your restore format/logic very very > well. Well, it'd be really tricky to have a single .rec file given the way you split the keyspace - unless that file would essentially combine the information from the current .rec files of each task - making the file fairly large. Unfortunately, John currently rewrites .rec files in-place, so if the file becomes larger than a filesystem block it is more likely to get corrupted. > Well, cluster programs generally assume equal-performing nodes.. It's rare > that they don't. At any rate, if you have a 10 cpu cluster, and half the > cpus are, let's say, only 66% the speed of the other half, then you would > assign particular tasks to particular node names (using a carefully > constructed machines files) and modify the external filter so that the > faster cpu/tasks get 33% more keys than the other. The filter logic would be > a tad bit more complex, but would only amount to a few cycles per key per > task. That's too tricky to setup for an end user - even for an experienced one. Besides, the performance of John might differ between tasks even on the same hardware due to different placement in caches and different physical page allocations. I've been trying to reduce cache aliasing conflicts in John by placing all frequently accessed data in sequential structs - and this works well most of the time - but there are still cases when performance is not stable. This mostly affects CPUs with direct-mapped caches such as Alphas and fast hashes such as LM ones. > > What happens if a node fails? > > If MPI is working properly, the whole job fails, and you would restart the > job in recovery mode. So that job would be somewhat behind the others - resulting in candidate passwords being tried in a non-optimal order. > If MPI implementation is buggy (like most are), you > could implement some kind of MPI_Barrier in the sig-alarm handler so that if > a node fails, everything stops. That's not great. > > What if you want to add/remove nodes? > > Removing nodes: I'm not sure ... I'd want to make sure that the removed > nodes weren't working on keys that no longer get generated after a restore. That's not really compatible with the way you split the keyspace and let each node proceed further into the keyspace on its own. > Adding nodes: I'd have to look at your restore file layout/logic to see if > restore files could be cloned as dummys for the new nodes. A restore, but > with more nodes, would work fine, since the external filter use the current > MPI job's Rank and Size. I don't think that hacking the .rec files is the right way to do this, and I also don't think that you can do it optimally. Yes, you can split a node's remaining keyspace in two and assign a half of it to the added node. But that would complicate the filter() and result in a portion of the keyspace being searched twice quicker than the rest of it - so the ordering of candidate passwords once again becomes non-optimal. Basically, you should be splitting the keyspace differently in order to implement this functionality. Speaking of the overhead: > >The impact is minor with slow hashes and multiple salts. The impact is > >major with fast saltless hashes. > What's an example of a fast, saltless hash? I'll benchmark it. LM hashes are good for this. -- Alexander Peslyak <solar at openwall.com> GPG key ID: B35D3598 fp: 6429 0D7E F130 C13E C929 6447 73C3 A290 B35D 3598 http://www.openwall.com - bringing security into open computing environments Was I helpful? Please give your feedback here: http://rate.affero.net/solar
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.