john-dev - bf-opencl vectorization (was: bf-opencl fails self-test on CPU)

Message-ID: <20121018033243.GA15488@openwall.com>
Date: Thu, 18 Oct 2012 07:32:43 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: bf-opencl vectorization (was: bf-opencl fails self-test on CPU)

Sayantan -

On Tue, Oct 16, 2012 at 11:31:15PM +0530, Sayantan Datta wrote:
> On Mon, Aug 13, 2012 at 9:41 AM, Solar Designer <solar@...nwall.com> wrote:
> 
> > BTW, is there any way to target future Intel CPUs (those with AVX2)
> > with Intel's OpenCL SDK and see if this kernel would be vectorized then?
> > Of course, we won't be able to run it yet, except maybe on their SDE.
> 
> I was looking for opencl cpu optimizations targeting sse but couldn't get
> a proper answer. So should I try vectorizing the bf kernel for cpu? If I'm
> targeting sse then what should be the vector length?

You mean if you're targeting the future AVX2, right?  There's no gather
addressing in CPUs before AVX2 becomes available next year (or on Intel
SDE now).  I'd expect uint4 or uint8 to be the right vector type.  We'll
be limited by the 32 KB L1 data cache on the first CPUs to support AVX2,
so this means at most 8 bcrypt instances per CPU core.  AVX2 vectors are
256-bit, which also gives us 8 (as 32-bit lanes).  I don't know whether
two uint4's interleaved (by the compiler) or one uint8 would be more
efficient.  Also,
auto-vectorization of your existing bf-opencl code might just work (when
targeting AVX2).

Anyhow, even if you implement and try this now, you'd at best be able to
take a look at the generated code and test that it works in SDE (but is
very slow, as expected for emulation).  You will have little idea of how
fast it'd run on the actual CPUs.  So this is not worth a lot of your
time now.  I was merely curious whether the code, as-is, would be
getting auto-vectorized for AVX2 or not - in other words, whether we
have something to try as soon as suitable CPUs become available or not.

As to trying out explicit vectorization in bf-opencl on GPU, you may.
I doubt that we'll see much or any performance increase from this, but
feel free to try.

Thanks,

Alexander
