Message-ID: <20130915200808.GB21417@openwall.com>
Date: Mon, 16 Sep 2013 00:08:09 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: GTX TITAN (was: new dev box wishes)

On Sun, Sep 15, 2013 at 09:12:27AM -0700, Alain Espinosa wrote:
> On 9/14/13, Solar Designer <solar@...nwall.com> wrote:
> > I guess these results may teach us something about optimization for this
> > GPU (and other Kepler GPUs?) - four-element vectors or(/and?)
> > interleaving of independent instructions give best results.
>
> For a GT 630 (compute capability 2.1), using a vector of 3 elements for
> NTLM hashing gives a ~15-20% performance increase compared with a
> vector of 1 element. I think vectors of 2-3 elements are best because
> they reduce the number of registers while still providing sufficient
> parallelism, but I have not tested this assertion on a compute
> capability 3.5 GPU.

FWIW, I read yesterday that TITAN allows for 4x more registers per
thread than other GTX 7xx GPUs do. With CUDA, we can probably simulate
either behavior (to tune our code for the other GTX 7xx cards as well)
by adjusting the target arch, but I don't know whether we can do that
with NVIDIA's OpenCL too (without having an actual lower-than-TITAN
GTX 7xx card).

(And we're almost exclusively using OpenCL now, with our CUDA code
mostly abandoned - it works, but it lacks auto-tuning and we're not
optimizing it further.)

Alexander
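To make the "vector of N elements" idea discussed above concrete, here is a minimal CPU-side C sketch. It is not John the Ripper's OpenCL code; it runs only the 16 first-round steps of MD4 (the hash underlying NTLM), not a complete hash. The names LANES, round1_scalar, round1_lanes and lanes_match_scalar are hypothetical. Each step is applied to LANES independent candidates in lockstep, so the per-lane updates within a step have no dependencies on one another and can be interleaved by the compiler or hardware. The cost is LANES times as much live state (four 32-bit words per lane), which is the register-pressure trade-off raised in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical vector width: independent candidates per "thread".
   The thread compares widths of 1, 2-3 and 4. */
#define LANES 3

/* 32-bit left rotate; s is never 0 here, so there is no undefined shift. */
static uint32_t rotl32(uint32_t v, unsigned s) {
    return (v << s) | (v >> (32u - s));
}

/* MD4 round-1 boolean function F(x, y, z). */
static uint32_t md4_f(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) | (~x & z);
}

/* Scalar reference: MD4 round 1 (16 steps) on one 16-word block.
   This is NOT a complete MD4/NTLM computation. */
static void round1_scalar(const uint32_t x[16], uint32_t out[4]) {
    uint32_t a = 0x67452301u, b = 0xefcdab89u, c = 0x98badcfeu, d = 0x10325476u;
    for (int i = 0; i < 16; i += 4) {
        a = rotl32(a + md4_f(b, c, d) + x[i + 0], 3);
        d = rotl32(d + md4_f(a, b, c) + x[i + 1], 7);
        c = rotl32(c + md4_f(d, a, b) + x[i + 2], 11);
        b = rotl32(b + md4_f(c, d, a) + x[i + 3], 19);
    }
    out[0] = a; out[1] = b; out[2] = c; out[3] = d;
}

/* The same steps on LANES independent blocks in lockstep. Inside each
   inner loop the lanes do not depend on each other, which is what lets
   their instructions be interleaved. */
static void round1_lanes(uint32_t x[LANES][16], uint32_t out[LANES][4]) {
    uint32_t a[LANES], b[LANES], c[LANES], d[LANES];
    for (int l = 0; l < LANES; l++) {
        a[l] = 0x67452301u; b[l] = 0xefcdab89u;
        c[l] = 0x98badcfeu; d[l] = 0x10325476u;
    }
    for (int i = 0; i < 16; i += 4) {
        for (int l = 0; l < LANES; l++)
            a[l] = rotl32(a[l] + md4_f(b[l], c[l], d[l]) + x[l][i + 0], 3);
        for (int l = 0; l < LANES; l++)
            d[l] = rotl32(d[l] + md4_f(a[l], b[l], c[l]) + x[l][i + 1], 7);
        for (int l = 0; l < LANES; l++)
            c[l] = rotl32(c[l] + md4_f(d[l], a[l], b[l]) + x[l][i + 2], 11);
        for (int l = 0; l < LANES; l++)
            b[l] = rotl32(b[l] + md4_f(c[l], d[l], a[l]) + x[l][i + 3], 19);
    }
    for (int l = 0; l < LANES; l++) {
        out[l][0] = a[l]; out[l][1] = b[l]; out[l][2] = c[l]; out[l][3] = d[l];
    }
}

/* Self-check: every lane must match the scalar routine on its own input.
   Returns 1 on success, 0 on mismatch. */
int lanes_match_scalar(void) {
    uint32_t x[LANES][16], vec[LANES][4], ref[4];
    for (int l = 0; l < LANES; l++)
        for (int i = 0; i < 16; i++)
            x[l][i] = (uint32_t)l * 0x9e3779b9u + (uint32_t)i * 0x01000193u;
    round1_lanes(x, vec);
    for (int l = 0; l < LANES; l++) {
        round1_scalar(x[l], ref);
        for (int k = 0; k < 4; k++)
            if (vec[l][k] != ref[k])
                return 0;
    }
    return 1;
}
```

For example, `assert(lanes_match_scalar());` should hold for any LANES of 1 or more. Raising LANES exposes more independent work per thread but also grows the state kept live per thread, so on a GPU the best width depends on how many registers per thread the target allows.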