|
Message-ID: <CADkMHCnG9nnMcp7O29ff8CQeiNKq7x6Gg4=zPL0hcgvK1s6rzQ@mail.gmail.com> Date: Mon, 19 Dec 2011 08:48:34 -0700 From: RB <aoz.syn@...il.com> To: john-dev@...ts.openwall.com Subject: Re: DES BS + OMP improvements, MPI direction 2011/12/19 Solar Designer <solar@...nwall.com>: > On Sun, Dec 18, 2011 at 01:29:02AM -0700, RB wrote: >> On Sat, Dec 17, 2011 at 15:46, Solar Designer <solar@...nwall.com> wrote: >> > Are you suggesting that we'd apply MPI at the same low level where we >> > currently apply OpenMP? No, that doesn't sound like a good idea to me, >> > except maybe for the slowest hash/cipher types only (BF, MSCash2, RAR, >> > crypt when run against SHA-crypt), where it might be acceptable. >> >> Yes, it is what I'm suggesting. Obviously anything that completes in >> less than the RTT of the MPI environment would need to be handled >> differently (if at all), but the problem the MPI implementation has >> historically had is that it splits work at too high a level. Last I >> checked it was splitting work based on individual work steps - each >> thread took a given length x count and worked it by itself on that. >> For very fast hashes (like LM) I'm sure that's more acceptable, but >> for slower ones and with larger MPI implementations (hundreds to >> thousands of threads), the network very quickly gets to the point that >> every thread is working on an effectively impossible-to-solve step, >> which really isn't the end-user's intent of parallelization. > > I think you don't literally mean "every" above (as that would imply that > all easier "steps" were already completed, so everything is as desired), > but I get what you're referring to. For posterity's completeness, I meant that for a given run of X seconds, no given step/pass/stage (let's call it a 'set' unless you desire otherwise) that takes a single processor longer than X seconds to solve will finish, no matter the number of processor resources. This can be exacerbated in MPI environments with asymmetric processors (or even cooling, in the case of clock-boosting processors). It is often the case with MPI runs that, when aborted, many (if not all) of the processors have been working for a considerable length of time on a single set that, given the overall processing power available, "could" have been completed had all resources been focused there. It would be fascinating (but programmatically challenging) to have the implementation automatically switch approaches (or reduce working unit size) as timing for individual sets clears a given threshold. There will definitely have to be a balance, which history has shown you're certainly capable of finding and striking.
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.