|
Message-ID: <20111226181556.GB24951@openwall.com> Date: Mon, 26 Dec 2011 22:15:56 +0400 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: MD5 intrinsics compile-time condition On Sat, Dec 24, 2011 at 03:34:11AM +0100, magnum wrote: > On 12/23/2011 04:49 PM, Solar Designer wrote: > >Apparently, the condition that enables the use of intrinsics is not the > >same for md5 vs. dynamic_27 and 28, and apparently it is non-optimal for > >md5 for certain gcc version(s) (I guess Apple's gcc 4.2). > > You introduced it, on purpose :) It started here: > http://www.openwall.com/lists/john-dev/2011/06/08/13 Sure. My latest "complaint" is not about having this condition, but about it apparently being inconsistent for different formats representing the same hash type. > ...then it was tweaked over time (search list for MD5_in_sse_intrinsics) > and today it looks like this: > > #if !defined(MD5_in_sse_intrinsics) && defined(__GNUC__) && \ > (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 4)) && \ > !defined(USING_ICC_S_FILE) > #undef MD5_SSE_PARA > #endif > > I can't find any note of why/when it was changed from 4.0 to 4.4 but > j5c4 had 4.4. I changed it to 4.4 shortly before the contest because someone on john-users reported a performance regression with gcc 4.3.x when compiling with the intrinsics. > Anyways, I *guess* we can drop that whole test, and do someting like > this in the arch.h's: > > #elif defined(__GNUC__) && (__GNUC__ == 4 && __GNUC_MINOR__ == 5) > #define MD5_SSE_PARA 2 > #define MD5_N_STR "8x" > -#elif defined(__GNUC__) > +#elif defined(__GNUC__) && (__GNUC__ >= 4 || (__GNUC_MINOR__ == 4 && > __GNUC_MINOR__ > 5)) That's a weird check. > #define MD5_SSE_PARA 3 > #define MD5_N_STR "12x" > +#elif defined(__GNUC__) > +#define MD5_SSE_PARA 1 > +#define MD5_N_STR "4x" > #else > #define MD5_SSE_PARA 3 > #define MD5_N_STR "12x" > > The current code picks PARA 3 (12x) for any gcc other than 4.5. I > recently tweaked those tests after empirical tests with 4.4, 4.5 and 4.6 > (and clang and icc) - the versions that were available in my Ubuntu repo > at the time. I suppose PARA 1 (4x) would be the safe choice for any > untested version and it should always be faster than disabling SSE. Maybe. We need to test with pre-4.0 versions of gcc as well. > I can do this change, but I will probably not find time to actually test > it on ancient compilers. If someone else can produce test results for > para 1, 2 and 3 for versions of gcc older than 4.4 and running on intel, > we can put additional clauses for them instead. Otherwise this change > may be detrimental for other intrinsics formats with some versions of > gcc. The optimal para's for MD4 and SHA1 should ideally also be tested. > Also, all tests should be separate for 32-bit and 64-bit... Yes, this is a lot of testing. > Like I said in http://www.openwall.com/lists/john-dev/2011/12/11/4 the > optimal solution would be build-time checking. We could do that, but a drawback of build-time benchmarking is that builds may be inconsistent, such as depending on system load. I'd be happier to have this tested with a range of gcc versions for both 32- and 64-bit at least on a Core 2'ish CPU, so we'd have hard-coded settings for each range of gcc versions (similar to what we have now). > Here are some test > results that illustrates how important the PARA setting is (each figure > is geometrical mean for 10 runs iirc): Thanks! Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.