|
Message-ID: <20150506193536.GD8201@openwall.com> Date: Wed, 6 May 2015 22:35:36 +0300 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: Adding OpenMP support to SunMD5 Lei, Frank - On Mon, May 04, 2015 at 11:42:04AM +0200, Frank Dittrich wrote: > On 05/04/2015 11:26 AM, Lei Zhang wrote: > > I tried parallelizing a inner loop in the hotspot loop of SunMD5, like this: > > > > for (round = 0; round < maxrounds; round++) { > > ... > > #pragma omp parallel for ... > > for (idx = 0; idx < count; ++idx) > > { ... } > > ... > > } > > May be I misunderstand what you are trying to do, but when implementing > OpemMP, you shouldn't try to parallelize the calculation of a single > hash for a given key. Frank is correct. However, SunMD5 is a special case, so general advice like that isn't straightforward to apply to it. In the SunMD5 format, we're already computing multiple hashes per crypt_all() call, but we can't just parallelize this with OpenMP without bringing even more parallelism to that level. The original SunMD5, as used defensively, used data-dependent branching. Normally, this would thwart SIMD. Normally, the "for each candidate" loop would be the outer one (it is a cracking-specific one), and the "for each round" loop the inner one (it is part of specification of a single instance of SunMD5, and thus is also present in defensive implementations). Jim managed to get around SunMD5's SIMD-unfriendliness, getting multiple instances of it to work well with SIMD. Jim's trick involves swapping the "for each candidate" and "for each round" loops. Now we have: for each round for each candidate do some processing end for end for These nested loops compute multiple hashes for the multiple candidates in parallel, making use of this parallelism to have enough work buffered to have nearly full SIMD vectors most of the time. To add OpenMP to this, we may introduce an extra outer loop: for each sub-group of candidates for each round for each candidate in the sub-group do some processing end for end for end for and in order to avoid reducing the parallelism in the previous inner loop, we'd need to multiply max_keys_per_crypt by at least the number of threads (times OMP_SCALE, which we'll tune; probably set it to 1 since this is a very slow hash). > May be you have a look at commit > 69deec0490ee3b793545617b72f53d23680dfe4b where magnum implemented OPenMP > for ssha512. Yes, something like this is needed, but applied to the new outer loop. Since we're already processing multiple keys per crypt_all(), no changes to things such as get_key(), etc. should be needed. Probably only init() and crypt_all() need changes. Thanks, Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.