Message-ID: <20151019135102.GA14926@openwall.com>
Date: Mon, 19 Oct 2015 16:51:02 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: SMT (was: SHA-1 H())

Lei,

I just came across a recently posted article on this very topic:
performance scaling with POWER8's SMT (albeit in context of the
different reporting on AIX vs. Linux):

http://www.ibm.com/developerworks/library/l-processor-utilization-difference-aix-lop-trs/index.html

"Simultaneous multithreading (SMT) performance characterization shown in
Figure 6 is taken from the IBM POWER8 specification. This figure shows
that SMT8 provides 2.2 times better performance compared to single
threaded on POWER8."

The article also mentions that "a single-threaded application" run "on
an IBM POWER7 SMT4 system" "shows the core utilization as approximately
63% to 65%".

So the expected speedup when going from 1 thread/core to 8 threads/core
on POWER8 is 2.2 times, and the expected speedup when going from
1 thread/core to 4 threads/core on POWER7 is 1.5 to 1.6 times.

Of course, actual speedup will vary by application.

Alexander

P.S. I don't normally top-post, but it's one of those rare cases where I
find this appropriate - needing to quote a lot of context, yet not
wanting to keep it above the new content.  So here goes:

On Sat, Sep 12, 2015 at 12:57:45PM +0300, Solar Designer wrote:
> On Sat, Sep 12, 2015 at 04:53:42PM +0800, Lei Zhang wrote:
> > On my laptop, where each core supports 2 hardware threads, running 2 threads gets a 2x speedup compared to 1 thread on the same core.
>
> This happens, but it's not very common.  Usually, speedup from running 2
> threads/core is much less than 2x.
>
> > OTOH, each Power8 core supports up to 8 hardware threads, so I'd expect a higher speedup than just 2x.
>
> SMT isn't only a way to increase resource utilization of a core when
> running many threads.  It's also a way to achieve lower latency due to
> fewer context switches in server workloads (with lots of concurrent
> requests) and to allow CPU designers to use higher instruction latencies
> and achieve higher clock rate.  (Note that my two uses of the word
> latency in the previous sentence refer to totally different latencies:
> server response latency on the order of milliseconds may be improved,
> but instruction latency on the order of nanoseconds may be harmed at the
> same time.)  Our workload uses relatively low latency instructions:
> integer only, and with nearly 100% L1 cache hit rate.  Some other
> workloads like multiplication of large matrices (exceeding L1 data
> cache) might benefit from more hardware threads per core (or explicit
> interleaving, but that's uncommon in scientific workloads except through
> OpenCL and such), and that's also a reason for Power CPU designers to
> support and possibly optimize for more hardware threads per core.
>
> Finally, SMT provides middle ground between increasing the number of
> ISA-visible CPU registers (which is limited by instruction size and the
> number of register operands you can encode per instruction, as well as
> by the need to maintain compatibility) and increasing the number of
> rename registers.  With SMT, there are sort of more ISA-visible CPU
> registers: total across the many hardware threads.  Those registers are
> as good as ISA-visible ones for the purpose of replacing the need to
> interleave instructions within 1 thread, yet they don't bump into
> instruction size limitations.
>
> I expect that on a CPU with more than 2 hardware threads the speed
> growth with the increase of threads/core in use is spread over the 1 to
> max threads range.  So e.g. the speedup at only 2 threads on an 8
> hardware threads CPU may very well be less than the speedup at 2 threads
> on a 2 hardware threads CPU.  I don't necessarily expect that the
> speedup achieved at max threads is much or any greater than that
> achieved at 2 threads on a CPU where 2 is the max.  There's potential
> for it to be greater (in the sense that the thread count doesn't limit
> it to at most 2), but it might or might not be greater in practice.
>
> Alexander
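
For reference, the arithmetic behind the two speedup figures at the top of this message can be sketched as follows. This is only an illustration, not something taken from the article. It assumes that the reported core utilization of a lone thread can be inverted to estimate the speedup of a fully loaded core, which is the reading that reproduces the 1.5 to 1.6 times estimate for POWER7. The function names are made up for this sketch.

```python
# Rough arithmetic behind the SMT speedup figures quoted in this message.
# Assumption (not stated by the article): if one thread shows a core as
# fraction u utilized, a fully loaded core delivers about 1/u times the
# throughput of that single thread.

def expected_speedup(single_thread_utilization):
    """Estimated speedup of full SMT over 1 thread/core (hypothetical helper)."""
    return 1.0 / single_thread_utilization

def per_thread_speed(total_speedup, threads_per_core):
    """Speed of each hardware thread relative to a lone thread on the core,
    assuming the threads share the core evenly (hypothetical helper)."""
    return total_speedup / threads_per_core

# POWER7 SMT4: a single thread is reported as ~63% to 65% core utilization.
power7_low = expected_speedup(0.65)
power7_high = expected_speedup(0.63)

# POWER8 SMT8: 2.2x total at 8 threads/core, per the quoted figure.
power8_per_thread = per_thread_speed(2.2, 8)

print(f"POWER7 SMT4 speedup: {power7_low:.2f}x to {power7_high:.2f}x")
print(f"POWER8 SMT8 per-thread speed: {power8_per_thread:.3f} of single-threaded")
```

Under the same even-sharing assumption, each of the 8 threads on a fully loaded POWER8 core would run at roughly a quarter of single-threaded speed, so the total gain comes from running more threads rather than from any one thread running faster.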