Message-ID: <20100516160311.GA4295@openwall.com>
Date: Sun, 16 May 2010 20:03:11 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: OpenMP benchmarks on UltraSPARC T2

Hi,

For the curious, here are benchmarks of 1.7.5-omp-2 on UltraSPARC T2
(quad-core, 8 threads per core).

The system:

$ uname -a
SunOS host 5.10 Generic_142900-10 sun4v sparc SUNW,SPARC-Enterprise-T5120

"/usr/sbin/psrinfo -v" reports 32 "virtual processors", all of which are
"online". "/usr/platform/sun4v/sbin/prtdiag -v" also reports all 32,
but somehow only 28 of them are reported as "on-line". I did not look
into this discrepancy. Both report the clock rate as 1165 MHz.

The compiler:

$ cc -V
cc: Sun C 5.9 SunOS_sparc Patch 124867-14 2010/03/30

The default BF_mt of 24 (in BF_std.h) obviously would not use more than
24 threads, so I edited it to be 32. Then I built with:

gmake solaris-sparc64-cc -j32

One thread (but OpenMP-capable build):

$ ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    96.7 c/s real, 96.6 c/s virtual

4, 8, 16, and 32 threads:

$ OMP_NUM_THREADS=4 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    373 c/s real, 96.5 c/s virtual

$ OMP_NUM_THREADS=8 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    393 c/s real, 70.8 c/s virtual

$ OMP_NUM_THREADS=16 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    397 c/s real, 28.6 c/s virtual

$ OMP_NUM_THREADS=32 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    596 c/s real, 19.0 c/s virtual

This scales pretty well (considering that the CPU is only quad-core with
SMT, not 32-core indeed).

For comparison, a non-OpenMP build does:

Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    110 c/s real, 110 c/s virtual

So we're getting a 5.42x speedup by going with the OpenMP build and
running 32 threads. That's not bad for a quad-core with SMT.
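As a rough cross-check of the scaling figures quoted so far, the speedups can be recomputed from the "c/s real" numbers. This is only a sketch: the dictionary and the `speedup` helper are illustrative names, not part of John the Ripper, and the rates are simply copied from the benchmark output above.

```python
# "c/s real" figures copied from the benchmark output above,
# keyed by OMP_NUM_THREADS (the 1-thread entry is the OpenMP-capable build).
omp_real = {1: 96.7, 4: 373.0, 8: 393.0, 16: 397.0, 32: 596.0}

# Non-OpenMP build, single process.
non_omp_single = 110.0


def speedup(rate, baseline):
    """Ratio of a measured rate to a baseline rate (hypothetical helper)."""
    return rate / baseline


for threads, rate in sorted(omp_real.items()):
    vs_omp = speedup(rate, omp_real[1])
    vs_plain = speedup(rate, non_omp_single)
    print(f"{threads:2d} threads: {rate:6.1f} c/s, "
          f"{vs_omp:.2f}x vs 1 OpenMP thread, "
          f"{vs_plain:.2f}x vs non-OpenMP build")
```

The 32-thread row reproduces the 5.42x figure when the non-OpenMP build is taken as the baseline. Against the single-threaded OpenMP build the ratio comes out higher, because that baseline is slower.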

Surprisingly, running 32 separate instances of the non-OpenMP build
(started with a script at almost the same time) results in only 18.0 c/s
per process, or 576 c/s total. So the efficiency, measured in this way
(596 c/s for the OpenMP build vs. 576 c/s for the separate processes),
is 103%. Maybe the OpenMP build results in more efficient usage of the
shared caches (a few mostly-read-only data structures may be shared),
which more than compensates for the performance hit of the
multi-threaded code (the speed reduction from 110 c/s to 96.7 c/s for a
single thread).

I've also tried increasing BF_mt to 96. This resulted in the following
performance numbers for 32, 48, and 96 threads:

$ OMP_NUM_THREADS=32 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    601 c/s real, 19.0 c/s virtual

$ OMP_NUM_THREADS=48 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    601 c/s real, 19.0 c/s virtual

$ OMP_NUM_THREADS=96 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    602 c/s real, 19.0 c/s virtual

That's 104.5% efficiency, and 5.47x the speed of a single thread (taking
the non-OpenMP build's 110 c/s as the baseline).

Overall, this is not a fast machine indeed, but it is good for OpenMP
performance testing.

Alexander
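
For reference, the efficiency percentages quoted above can be reproduced with a few lines of arithmetic. This is a sketch under the assumption that "efficiency" means the OpenMP build's total rate divided by the combined rate of the 32 separate non-OpenMP processes; the `efficiency` helper and variable names are illustrative only.

```python
# Figures copied from the message above.
per_process = 18.0   # c/s per non-OpenMP process when 32 run at once
instances = 32
separate_total = per_process * instances  # combined rate of separate processes


def efficiency(omp_rate, separate_rate):
    """OpenMP total rate as a percentage of the separate-process total
    (assumed definition, see the note above)."""
    return 100.0 * omp_rate / separate_rate


runs = [
    ("BF_mt=32, 32 threads", 596.0),
    ("BF_mt=96, 96 threads", 602.0),
]

for label, rate in runs:
    print(f"{label}: {efficiency(rate, separate_total):.1f}% efficiency")
```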