Message-ID: <4EF53A23.3000609@hushmail.com>
Date: Sat, 24 Dec 2011 03:34:11 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: MD5 intrinsics compile-time condition

On 12/23/2011 04:49 PM, Solar Designer wrote:
> Apparently, the condition that enables the use of intrinsics is not the
> same for md5 vs. dynamic_27 and 28, and apparently it is non-optimal for
> md5 for certain gcc version(s) (I guess Apple's gcc 4.2).

You introduced it, on purpose :) It started here:
http://www.openwall.com/lists/john-dev/2011/06/08/13

...then it was tweaked over time (search list for MD5_in_sse_intrinsics)
and today it looks like this:

#if !defined(MD5_in_sse_intrinsics) && defined(__GNUC__) && \
	(__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 4)) && \
	!defined(USING_ICC_S_FILE)
#undef MD5_SSE_PARA
#endif

I can't find any note of why/when it was changed from 4.0 to 4.4 but
j5c4 had 4.4.

Anyways, I *guess* we can drop that whole test, and do something like
this in the arch.h's:

 #elif defined(__GNUC__) && (__GNUC__ == 4 && __GNUC_MINOR__ == 5)
 #define MD5_SSE_PARA 2
 #define MD5_N_STR "8x"
-#elif defined(__GNUC__)
+#elif defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
 #define MD5_SSE_PARA 3
 #define MD5_N_STR "12x"
+#elif defined(__GNUC__)
+#define MD5_SSE_PARA 1
+#define MD5_N_STR "4x"
 #else
 #define MD5_SSE_PARA 3
 #define MD5_N_STR "12x"

The current code picks PARA 3 (12x) for any gcc other than 4.5. I
recently tweaked those tests after empirical tests with 4.4, 4.5 and 4.6
(and clang and icc) - the versions that were available in my Ubuntu repo
at the time. I suppose PARA 1 (4x) would be the safe choice for any
untested version and it should always be faster than disabling SSE.

I can do this change, but I will probably not find time to actually test
it on ancient compilers.
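The version mapping those arch.h clauses are aiming for can be sketched as an ordinary C function, which makes it easy to check what a given gcc version would get. This assumes the intent is PARA 2 for gcc 4.5, PARA 3 for 4.4 and for 4.6 or newer, and PARA 1 for anything older; `md5_sse_para_for_gcc` is a hypothetical name used only for illustration and is not part of the John the Ripper source.

```c
/*
 * Hypothetical helper (not in the JtR tree): mirrors the proposed
 * preprocessor clauses as a runtime function, so the mapping from a
 * gcc version to MD5_SSE_PARA can be checked for any major/minor pair.
 */
int md5_sse_para_for_gcc(int major, int minor)
{
    /* gcc 4.5 did best with PARA 2 ("8x") in the tests */
    if (major == 4 && minor == 5)
        return 2;

    /* gcc 4.4, and 4.6 or newer: PARA 3 ("12x") */
    if (major > 4 || (major == 4 && minor >= 4))
        return 3;

    /* older, untested gcc: PARA 1 ("4x") as the conservative choice */
    return 1;
}
```

With this reading, Apple's gcc 4.2 would land on PARA 1 rather than having the intrinsics disabled.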
If someone else can produce test results for para 1, 2 and 3 for
versions of gcc older than 4.4 and running on Intel, we can put
additional clauses for them instead. Otherwise this change may be
detrimental for other intrinsics formats with some versions of gcc.

The optimal para's for MD4 and SHA1 should ideally also be tested. Also,
all tests should be separate for 32-bit and 64-bit... Like I said in
http://www.openwall.com/lists/john-dev/2011/12/11/4 the optimal solution
would be build-time checking.

Here are some test results that illustrate how important the PARA
setting is (each figure is the geometric mean of 10 runs, iirc):

== icc_64_Q9550_md5 ==
PARA 3: 32058 real, 31994 virtual
PARA 4: 29443 real, 29443 virtual
PARA 2: 27142 real, 27142 virtual
PARA 5: 25399 real, 25399 virtual
PARA 1: 18265 real, 18265 virtual
PARA 6: 7013 real, 6985 virtual
PARA 7: 6231 real, 6231 virtual
PARA 8: 6043 real, 6031 virtual

== gcc-4.6_64_Q9550_md5 ==
PARA 4: 27445 real, 27554 virtual
PARA 3: 26783 real, 26836 virtual
PARA 2: 26080 real, 26080 virtual
PARA 1: 17294 real, 17363 virtual
PARA 5: 14320 real, 14291 virtual
PARA 6: 5877 real, 5889 virtual
PARA 7: 5262 real, 5273 virtual
PARA 8: 5031 real, 5031 virtual

== gcc-4.5_64_Q9550_md5 ==
PARA 2: 18528 real, 18528 virtual
PARA 3: 16480 real, 16513 virtual
PARA 4: 13638 real, 13638 virtual
PARA 1: 13273 real, 13300 virtual
PARA 6: 4416 real, 4389 virtual
PARA 5: 4308 real, 4317 virtual
PARA 8: 4063 real, 4087 virtual
PARA 7: 3910 real, 3902 virtual

== gcc-4.4_64_Q9550_md5 ==
PARA 3: 24996 real, 25147 virtual
PARA 2: 19603 real, 19642 virtual
PARA 4: 18221 real, 18221 virtual
PARA 1: 17014 real, 17048 virtual
PARA 5: 8023 real, 8023 virtual
PARA 6: 5480 real, 5458 virtual
PARA 7: 5253 real, 5232 virtual
PARA 8: 5067 real, 5047 virtual

Worse yet, the optimal setting for Intel is not optimal for AMD:

== gcc-4.4_64_AMD_md5 ==
PARA 5: 18543 real, 18506 virtual
PARA 4: 17961 real, 17961 virtual
PARA 6: 17851 real, 17851 virtual
PARA 7: 16945 real, 16979 virtual
PARA 3: 15523 real, 15523 virtual
PARA 8: 14860 real, 14890 virtual
PARA 2: 13455 real, 13455 virtual
PARA 1: 8779 real, 8779 virtual

Using Intel's best values on AMD is less detrimental than the other way
round (in general, a lower value is safer than a higher one).

Actually, the figures depend on the exact CPU model too (like Q9550 vs
P8600), but to a lesser degree than Intel vs AMD.

magnum
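
To put a number on the Intel vs AMD asymmetry, the sketch below takes the "real" figures from the two gcc-4.4 listings above and computes how much of the peak is lost when one CPU runs with the other's best PARA. The array and function names are hypothetical and only serve this illustration; the figures themselves are copied from the listings.

```c
#include <stddef.h>

/* "real" figures from the gcc-4.4 listings, indexed by PARA - 1 */
const int gcc44_intel_q9550_real[8] =
    {17014, 19603, 24996, 18221, 8023, 5480, 5253, 5067};
const int gcc44_amd_real[8] =
    {8779, 13455, 15523, 17961, 18543, 17851, 16945, 14860};

/* Hypothetical helper: PARA value (1..n) with the highest figure */
int best_para(const int *speed, size_t n)
{
    size_t best = 0;

    for (size_t i = 1; i < n; i++)
        if (speed[i] > speed[best])
            best = i;
    return (int)best + 1;
}

/* Hypothetical helper: percent of peak speed lost when using `para` */
double percent_lost(const int *speed, size_t n, int para)
{
    int peak = speed[best_para(speed, n) - 1];

    return 100.0 * (peak - speed[para - 1]) / peak;
}
```

With these figures, `percent_lost(gcc44_amd_real, 8, 3)` (Intel's best PARA on AMD) comes to about 16%, while `percent_lost(gcc44_intel_q9550_real, 8, 5)` (AMD's best PARA on Intel) comes to about 68%.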