Message-ID: <e1e6d5931a87053cc054bf726262cb88@smtp.hushmail.com>
Date: Fri, 05 Jun 2015 19:01:17 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics

On 2015-06-05 17:07, Lei Zhang wrote:
> Hi,
>
> I haven't got useful info from viewing the assembly yet. But I tried
> to collect some statistics using VTune.
>
> Running PBKDF2-HMAC-SHA256 with various interleaving factors,
> OpenMP-disabled, on a Linux VM (Ivy Bridge):
>
> [x1]
> Function                   CPU Time
> __memcpy_sse2_unaligned    0.094s
> memcpy                     0.080s
> cfg_get_section            0.060s
> pbkdf2_sha256_sse          0.036s
> _mm_xor_si128              0.020s
> [Others]                   1.140s
>
> [x2]
> Function                   CPU Time
> SSESHA256body              0.276s
> cfg_get_section            0.042s
> _mm_add_epi32              0.028s
> pbkdf2_sha256_sse          0.028s
> _mm_add_epi32              0.024s
> [Others]                   1.452s
>(...)

You should probably do much longer runs (eg --test=15 or more) to get
things like cfg_get_section completely out of the way.

> '__memcpy_sse2_unaligned' might imply some overhead incurred from
> unaligned memcpy, which is irrelevant to this topic though.

If this is still seen on longer runs, we should look into it.

Maybe we should try callgrind with very long test runs (--test=60 or
much more) and see if it can sample enough for kcachegrind to show some
info within the hash functions. This might even help see relations
between the source and the resulting assembler.

magnum