john-dev - Re: GTX TITAN (was: new dev box wishes)

Message-ID: <20130915200808.GB21417@openwall.com>
Date: Mon, 16 Sep 2013 00:08:09 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: GTX TITAN (was: new dev box wishes)

On Sun, Sep 15, 2013 at 09:12:27AM -0700, Alain Espinosa wrote:
> On 9/14/13, Solar Designer <solar@...nwall.com> wrote:
> > I guess these results may teach us something about optimization for this
> > GPU (and other Kepler GPUs?) - four-element vectors or(/and?)
> > interleaving of independent instructions give best results.
> 
> For a GT 630 (compute capability 2.1), using a vector of 3 elements for
> NTLM hashing gives a ~15-20% performance increase compared with a
> vector of 1 element. I think vectors of 2-3 elements are best because
> they reduce the number of registers while still providing sufficient
> parallelism, but I have not tested this assertion on a compute
> capability 3.5 GPU.

FWIW, I read yesterday that TITAN allows for 4x more registers per
thread than other GTX 7xx GPUs do.  With CUDA, we can probably simulate
either behavior (for tuning of our code for other GTX 7xx as well) by
adjusting the target arch, but I don't know if we can do that with
NVIDIA's OpenCL too (without having an actual lower-than-TITAN GTX 7xx
card).  (And we're almost exclusively using OpenCL now, with our CUDA
code mostly abandoned - it works, but it lacks auto-tuning and we're not
optimizing it further.)
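
To make the vector-width trade-off above concrete, here is a minimal
sketch in plain C (not code from the JtR tree).  It applies one MD4
round-1 step, the kind of operation NTLM hashing is built from, to VEC
independent candidates.  The names VEC, md4_f, rotl32 and md4_step_f
are made up for this illustration.  Each extra lane adds independent
instructions that can be interleaved, but it also adds one more copy of
the state and message word that has to live in registers.

```c
#include <stdint.h>

/* Hypothetical vector width: independent candidates per work-item. */
#define VEC 3

/* MD4 round-1 boolean function F(x,y,z) = (x & y) | (~x & z),
 * written in the usual three-operation form. */
uint32_t md4_f(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

/* Rotate left by s bits; s must be in 1..31. */
uint32_t rotl32(uint32_t v, unsigned s)
{
    return (v << s) | (v >> (32 - s));
}

/* One MD4 round-1 step, a = (a + F(b,c,d) + m) <<< s, applied to VEC
 * independent states.  The lanes do not depend on each other, so a
 * compiler is free to interleave their instructions once the loop is
 * unrolled.  Register pressure grows roughly linearly with VEC. */
void md4_step_f(uint32_t a[VEC], const uint32_t b[VEC],
                const uint32_t c[VEC], const uint32_t d[VEC],
                const uint32_t m[VEC], unsigned s)
{
    for (int i = 0; i < VEC; i++)
        a[i] = rotl32(a[i] + md4_f(b[i], c[i], d[i]) + m[i], s);
}
```

Rebuilding with VEC set to 1, 2 or 4 is a CPU-side analogue of the
vector widths compared above; on a GPU the best value would still have
to be measured per device.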

Alexander