Message-Id: <DDD80B04-1A36-4020-A158-EAA783710103@gmail.com>
Date: Tue, 9 Jun 2015 00:24:49 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics

> On Jun 6, 2015, at 7:47 PM, Solar Designer <solar@...nwall.com> wrote:
>
> Your use of VTune appears to be similar to use of gprof. If you use
> VTune at all, I'd expect you to profile things such as cache misses and
> pipeline stalls, as well as utilization of the CPU's execution units.
> Things that only the CPU vendor's profiler is capable of.

I played with VTune for a while and gathered some more statistics. There are so many micro-architecture metrics that it's a bit overwhelming. I picked some metrics that VTune marked as non-optimal for some interleaving factors and show them here:

(figures marked with * are flagged as non-optimal by VTune; x1/2/3/4 denote interleaving factors)

Configuration: icc, non-OpenMP, Linux VM, --test=20 --format=pbkdf2-hmac-sha256

Filled Pipeline Slots -> Retirement
x1  0.613*
x2  0.658*
x3  0.620*
x4  0.592
(This metric represents a fraction of slots during which CPU was retiring uOps not originated from the Microcode Sequencer)

Unfilled Pipeline Slots -> Back-End Bound
x1  0.355*
x2  0.246*
x3  0.342*
x4  0.338*
(Identify slots where no uOps are delivered due to a lack of required resources for accepting more uOps in the back-end of pipeline)

Unfilled Pipeline Slots -> Front-End Bound -> Cache Misses
x1  0.004
x2  0.003
x3  0.024*
x4  0.018*
(A proportion of instruction fetches are missing in the instruction cache)

Full reports are attached, containing more detailed metrics. However, some near-zero figures might be inaccurate, because VTune reports that too few samples were collected for them.

Honestly, I don't have a very solid understanding of micro-architecture and couldn't interpret many of those metrics. Maybe you can get some hints out of them.

Lei

View attachment "report-x1.txt" of type "text/plain" (2123 bytes)
View attachment "report-x2.txt" of type "text/plain" (2123 bytes)
View attachment "report-x3.txt" of type "text/plain" (2122 bytes)
View attachment "report-x4.txt" of type "text/plain" (2123 bytes)
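
As a rough cross-check of the figures quoted in the message, the Python sketch below tabulates the Retirement and Back-End Bound values and computes what remains once both are subtracted from 1. It assumes that VTune's pipeline-slot fractions add up to 1 for a given run, so the remainder would cover everything else (for example Front-End Bound, Bad Speculation, and uOps from the Microcode Sequencer). The attached reports, not this estimate, hold the actual values. The names `remaining_slots` and `best_factor` are made up for illustration.

```python
# Figures copied from the message; keys are interleaving factors x1..x4.
retirement = {1: 0.613, 2: 0.658, 3: 0.620, 4: 0.592}
back_end_bound = {1: 0.355, 2: 0.246, 3: 0.342, 4: 0.338}


def remaining_slots(factor):
    """Share of slots not covered by the two quoted metrics.

    Assumption: all pipeline-slot categories sum to 1, so this is an
    estimate of the unquoted categories combined, not a VTune figure.
    """
    return round(1.0 - retirement[factor] - back_end_bound[factor], 3)


def best_factor(metric, higher_is_better):
    """Return the interleaving factor with the best value for a metric."""
    pick = max if higher_is_better else min
    return pick(metric, key=metric.get)


if __name__ == "__main__":
    for factor in sorted(retirement):
        print(f"x{factor}: retirement={retirement[factor]:.3f} "
              f"back-end={back_end_bound[factor]:.3f} "
              f"remainder={remaining_slots(factor):.3f}")
    print("highest retirement: x%d" % best_factor(retirement, True))
    print("lowest back-end bound: x%d" % best_factor(back_end_bound, False))
```

On these numbers x2 has both the highest retirement fraction and the lowest back-end bound fraction, which may be a useful starting point when reading the full reports.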