john-dev - Re: bf-opencl fails self-test on CPU

Message-ID: <20120811161744.GB1106@openwall.com>
Date: Sat, 11 Aug 2012 20:17:44 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bf-opencl fails self-test on CPU

On Sat, Aug 11, 2012 at 01:12:16PM +0530, Sayantan Datta wrote:
> On Sat, Aug 11, 2012 at 8:17 AM, Solar Designer <solar@...nwall.com> wrote:
> > Any idea why bf-opencl fails self-test on CPU (with AMD's SDK)?  Will it
> > succeed with some other settings in opencl_bf_std.h maybe?
> 
> I guess it is due to the lack of LDS on CPU.  I'm not sure though but I'll
> find out.

Yes, please.  I'd expect that if we're exhausting some resource, we'd
get a compile-time or a runtime error rather than a self-test failure.

> Also is it necessary to run the bf-opencl on CPU? We might need a
> little modified kernel for that.

Ideally, we should be able to run the exact same OpenCL code on CPU as
well, although this would not be expected to deliver optimal
performance.  We'd do it just for more extensive testing of the code.

We've already seen that e.g. uses of uninitialized array elements are
not always detected in individual builds/tests, so doing more kinds of
builds may be more likely to expose bugs.

We also need bf-opencl working on future Intel CPUs with AVX2 (where
this might be faster than the existing CPU/OpenMP code) and on Intel MIC
architecture coprocessors (there's no OpenCL for those yet, but it is
expected to become available).  In this context, it is good news that
bf-opencl works with Intel's SDK already (as per magnum's message).

So currently the problem is just with AMD's SDK when the target is CPU,
which is less relevant - yet it could possibly help find and fix a bug.

Then, we know what near-optimal performance on current CPUs is - so we
have this target for performance when optimizing your OpenCL code on CPU.
You may, for example, try the two hashes at a time approach (BF_X2 in
the C code) and see if it helps on CPU and/or GPU.  I'd expect that it'd
only help on CPU currently, but who knows.

> Here's one off-topic question:
> In an SSE2 (with OpenMP) build, the non-OpenCL CPU version of bf scores
> around 5300 c/s on FX-8120.  However, I found that using the same build on
> i5-2500K, the CPU version benches at around 3300 c/s.  Does that mean
> Bulldozer is really that much better than SB in this test?

Yes, Bulldozer is about the most suitable CPU for this task currently
(if we're talking stock clock rates), although faster Sandy Bridge CPUs
are not that much slower - e.g., Core i7-2600K (at stock clocks) with
Hyperthreading enabled does 4800 c/s.  I think i5-2500K lacks
Hyperthreading.  However, Sandy Bridge CPUs appear to have more
overclocking potential, so with a maximum stable overclock I'd expect
i7-2600K to outperform FX-8120 at this test.  My guess is that it'd
get to 6000+ c/s vs. overclocked FX-8120's 5650 c/s.  I don't know if
e.g. FX-8170 would be any faster than overclocked FX-8120 or not - I
suspect not.

6-core Intel CPUs are slightly faster (e.g., 10800 c/s on two E5-2630 at
stock clocks, meaning 5400 c/s per chip), but that's a different category.

With AVX2 and proper code for it (maybe OpenCL with auto-vectorization
by Intel's SDK, maybe intrinsics, maybe assembly), this should change,
making all of the speeds above appear low.

Alexander