Message-ID: <20130915020632.GA18666@openwall.com>
Date: Sun, 15 Sep 2013 06:06:32 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: GTX TITAN (was: new dev box wishes)

On Wed, Jun 26, 2013 at 02:56:44PM +0400, Solar Designer wrote:
> http://www.xbitlabs.com/articles/graphics/display/zotac-geforce-gtx-titan-amp-edition-6gb-sli.html#sect0
>
> Stock clocks:
> 837 MHz base, 876 MHz boost, 6008 MHz memory
> Vendor o/c:
> 902 MHz base, 954 MHz boost, 6608 MHz memory

I just ran the FlopsCL_src_linux.zip benchmark from here:

http://olab.is.s.u-tokyo.ac.jp/~kamil.rocki/projects.html

on our TITAN. It is in fact faster than a stock TITAN, exceeding the stock peak GFLOPS at single precision (should be 4500 GFLOPS for stock; we get 5000+ peak). However, as I was afraid, its double-precision performance is currently locked. Per some forum comments, it can be unlocked to full speed via nvidia-settings (this GPU needs to be added to xorg.conf first). Unfortunately, as expected, "nvidia-smi --gom=..." refused to work on this GPU (most nvidia-smi features work on TESLA cards only).

[float   ] Time: 0.085911s, 3199.58 GFLOP/s
[float2  ] Time: 0.156523s, 3512.29 GFLOP/s
[float4  ] Time: 0.219421s, 5010.96 GFLOP/s
[float8  ] Time: 0.472510s, 4653.92 GFLOP/s
[float16 ] Time: 0.885512s, 4966.67 GFLOP/s
[double  ] Time: 1.176065s, 233.73 GFLOP/s
[double2 ] Time: 2.352377s, 233.70 GFLOP/s
[double4 ] Time: 4.700395s, 233.92 GFLOP/s
[double8 ] Time: 9.401957s, 233.89 GFLOP/s
ERROR: clEnqueueNDRangeKernel failed, cl_out_of_resources
[double16] Time: 0.016776s, 262160.00 GFLOP/s

I don't know why double16 fails.
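For reference, the 4500 GFLOPS stock figure follows from the usual peak formula: total SPs times 2 FLOPs per clock (one fused multiply-add) times clock rate. Below is a minimal Python sketch of that arithmetic. It assumes, beyond what this message states, that a GTX TITAN has 14 multiprocessors (SMX) of 192 SPs each (2688 SPs) and that every SP completes one FMA per cycle; the compute-capability table and the function name `peak_gflops` are hypothetical, for illustration only.

```python
# Sketch: theoretical single-precision peak of a CUDA GPU, assuming each SP
# (CUDA core) completes one fused multiply-add (= 2 FLOPs) per clock cycle.

# SPs per multiprocessor by compute capability (illustrative, hypothetical table).
SPS_PER_MP = {
    (1, 0): 8, (1, 1): 8, (1, 2): 8, (1, 3): 8,
    (2, 0): 32, (2, 1): 48,
    (3, 0): 192, (3, 5): 192,  # Kepler: GK104 / GK110
}

FLOPS_PER_SP_PER_CLOCK = 2  # one FMA = one multiply + one add


def peak_gflops(mp_count, compute_capability, clock_mhz):
    """Return the theoretical single-precision peak in GFLOP/s."""
    total_sps = mp_count * SPS_PER_MP[compute_capability]
    # SPs * FLOPs/clock * MHz gives MFLOP/s; divide by 1000 for GFLOP/s
    return total_sps * FLOPS_PER_SP_PER_CLOCK * clock_mhz / 1000.0


if __name__ == "__main__":
    # Assumed GTX TITAN layout: 14 MPs at compute capability 3.5 (2688 SPs)
    for label, mhz in [("stock base", 837), ("stock boost", 876),
                       ("vendor o/c base", 902), ("vendor o/c boost", 954)]:
        print(f"{label:16s} {mhz} MHz: {peak_gflops(14, (3, 5), mhz):7.1f} GFLOP/s")
```

At the 837 MHz stock base clock this works out to about 4499.7 GFLOP/s, matching the 4500 figure; at the vendor o/c 954 MHz boost clock it gives about 5128.7 GFLOP/s, which is consistent with the 5000+ GFLOP/s measured with float4 above.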
FlopsCUDA_src_linux.zip, after I taught it that compute capability 3.5 corresponds to 192 SPs/MP (one line added to its table), gives:

----- Standard benchmark, sequential instructions are dependent -------------
[Device 0, GeForce GTX TITAN] Time: 0.042523 (s), Total FLOPs : 134217728000
[Device 0, GeForce GTX TITAN] Peak GFLOP/s: 5128.70, Actual GFLOP/s: 3156.4, 61.543% efficiency
----- Instruction-level parallelism (ILP): multiple independent instructions (i.e. used by Kepler's warp scheduler) ----
[Device 0, GeForce GTX TITAN] (ILP) Time: 0.122247 (s), Total FLOPs : 536870912000
[Device 0, GeForce GTX TITAN] (ILP) Peak GFLOP/s: 5128.70, Actual GFLOP/s: 4391.7, 85.630% efficiency

It calculates "Peak GFLOP/s" by querying the device for MP count, compute capability, and max boost clock rate (it gets 954 MHz here).

I guess these results may teach us something about optimization for this GPU (and other Kepler GPUs?): four-element vectors or (and?) interleaving of independent instructions give the best results.

As to double-precision performance, indeed it does not matter for JtR (at least currently), yet it may be relevant if we let other projects use our dev boxes as well.

Alexander