Message-ID: <CA+TsHUDkQyJ-w288eL8o+MqoEmL7FOC8catUeVG2fq8-vXA1oA@mail.gmail.com>
Date: Sat, 11 Aug 2012 21:59:16 +0530
From: Sayantan Datta <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: bf-opencl fails self-test on CPU
On Sat, Aug 11, 2012 at 9:47 PM, Solar Designer <solar@...nwall.com> wrote:
> On Sat, Aug 11, 2012 at 01:12:16PM +0530, Sayantan Datta wrote:
> > On Sat, Aug 11, 2012 at 8:17 AM, Solar Designer <solar@...nwall.com> wrote:
> > > Any idea why bf-opencl fails self-test on CPU (with AMD's SDK)? Will it
> > > succeed with some other settings in opencl_bf_std.h maybe?
> >
> > I guess it is due to the lack of LDS on CPU. I'm not sure though but I'll
> > find out.
>
> Yes, please. I'd expect that if we're exhausting some resource, we'd
> get a compile-time or a runtime error rather than a self-test failure.
>
> > Also is it necessary to run the bf-opencl on CPU? We might need a
> > little modified kernel for that.
>
> Ideally, we should be able to run the exact same OpenCL code on CPU as
> well, although this would not be expected to deliver optimal
> performance. We'd do it just for more extensive testing of the code.
>
> We've already seen that e.g. uses of uninitialized array elements are
> not always detected in individual builds/tests, so doing more kinds of
> builds may be more likely to expose bugs.
>
> We also need bf-opencl working on future Intel CPUs with AVX2 (where
> this might be faster than the existing CPU/OpenMP code) and on Intel MIC
> architecture coprocessors (there's no OpenCL for those yet, but it is
> expected to become available). In this context, it is good news that
> bf-opencl works with Intel's SDK already (as per magnum's message).
>
> So currently the problem is just with AMD's SDK when the target is CPU,
> which is less relevant - yet it could possibly help us find and fix a bug.
>
> Then, we know what near-optimal performance on current CPUs is - so we
> have this target for performance when optimizing your OpenCL code on CPU.
> You may, for example, try the two hashes at a time approach (BF_X2 in
> the C code) and see if it helps on CPU and/or GPU. I'd expect that it'd
> only help on CPU currently, but who knows.
>
> > Here's one off-topic question:
> > In an SSE2 (with OpenMP) build, the non-OpenCL CPU version of bf scores
> > around 5300 c/s on an FX-8120. However, I found that the same build on an
> > i5-2500K benches at around 3300 c/s. Does that mean Bulldozer is really
> > that much better than SB in this test?
>
> Yes, Bulldozer is about the most suitable CPU for this task currently
> (if we're talking stock clock rates), although faster Sandy Bridge CPUs
> are not that much slower - e.g., Core i7-2600K (at stock clocks) with
> Hyperthreading enabled does 4800 c/s. I think i5-2500K lacks
> Hyperthreading. However, I think Sandy Bridge CPUs have more
> overclocking potential, so with a maximum stable overclock I think
> i7-2600K would outperform FX-8120 at this test. My guess is that it'd
> get to 6000+ c/s vs. overclocked FX-8120's 5650 c/s. I don't know if
> e.g. FX-8170 would be any faster than overclocked FX-8120 or not - I
> suspect not.
>
> 6-core Intel CPUs are slightly faster (e.g., 10800 c/s on two E5-2630 at
> stock clocks, meaning 5400 c/s per chip), but that's a different category.
>
> With AVX2 and proper code for it (maybe OpenCL with auto-vectorization
> by Intel's SDK, maybe intrinsics, maybe assembly), this should change,
> making all of the speeds above appear low.
>
> Alexander
>
I will make a CPU-optimized kernel targeting Intel CPUs.
Regards,
Sayantan
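
For reference, the "two hashes at a time" idea (BF_X2 in the C code) mentioned above can be sketched roughly as below. This is only a minimal illustration, not the actual JtR or bf-opencl code: struct bf_state, the function names, and the table contents (filled by a simple LCG instead of the real Blowfish constants and key setup) are all assumptions made for the example. The point is just that two independent instances, each with its own P-array and S-boxes, are advanced in the same loop, so the S-box lookups of one can overlap with those of the other.

```c
#include <stdint.h>

#define BF_ROUNDS 16

/* Hypothetical per-hash state: each candidate has its own tables. */
typedef struct {
    uint32_t P[BF_ROUNDS + 2];
    uint32_t S[4][256];
} bf_state;

/* Fill the tables with LCG output. Placeholder only: these are NOT
 * the real Blowfish initial values, and no key setup is performed. */
static void bf_fill(bf_state *s, uint32_t seed)
{
    uint32_t x = seed;
    for (int i = 0; i < BF_ROUNDS + 2; i++) {
        x = x * 1664525u + 1013904223u;
        s->P[i] = x;
    }
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 256; j++) {
            x = x * 1664525u + 1013904223u;
            s->S[i][j] = x;
        }
}

/* Blowfish-style round function: four S-box lookups per call. */
static uint32_t bf_f(const bf_state *s, uint32_t x)
{
    return ((s->S[0][x >> 24] + s->S[1][(x >> 16) & 0xff])
            ^ s->S[2][(x >> 8) & 0xff]) + s->S[3][x & 0xff];
}

/* One block, one instance. */
static void bf_encrypt_one(const bf_state *s, uint32_t *l, uint32_t *r)
{
    uint32_t L = *l, R = *r, t;
    for (int i = 0; i < BF_ROUNDS; i++) {
        L ^= s->P[i];
        R ^= bf_f(s, L);
        t = L; L = R; R = t;
    }
    t = L; L = R; R = t;            /* undo the final swap */
    R ^= s->P[BF_ROUNDS];
    L ^= s->P[BF_ROUNDS + 1];
    *l = L; *r = R;
}

/* Two independent instances interleaved in one loop. */
static void bf_encrypt_two(const bf_state *a, uint32_t *la, uint32_t *ra,
                           const bf_state *b, uint32_t *lb, uint32_t *rb)
{
    uint32_t L0 = *la, R0 = *ra, L1 = *lb, R1 = *rb, t;
    for (int i = 0; i < BF_ROUNDS; i++) {
        L0 ^= a->P[i];      L1 ^= b->P[i];
        R0 ^= bf_f(a, L0);  R1 ^= bf_f(b, L1);
        t = L0; L0 = R0; R0 = t;
        t = L1; L1 = R1; R1 = t;
    }
    t = L0; L0 = R0; R0 = t;
    t = L1; L1 = R1; R1 = t;
    R0 ^= a->P[BF_ROUNDS];      R1 ^= b->P[BF_ROUNDS];
    L0 ^= a->P[BF_ROUNDS + 1];  L1 ^= b->P[BF_ROUNDS + 1];
    *la = L0; *ra = R0; *lb = L1; *rb = R1;
}

/* Returns 1 if the interleaved path matches two sequential calls. */
static int bf_x2_matches_sequential(void)
{
    static bf_state a, b;
    uint32_t l0 = 1, r0 = 2, l1 = 3, r1 = 4;   /* sequential inputs */
    uint32_t m0 = 1, n0 = 2, m1 = 3, n1 = 4;   /* interleaved inputs */

    bf_fill(&a, 0x12345678u);
    bf_fill(&b, 0x9abcdef0u);
    bf_encrypt_one(&a, &l0, &r0);
    bf_encrypt_one(&b, &l1, &r1);
    bf_encrypt_two(&a, &m0, &n0, &b, &m1, &n1);
    return l0 == m0 && r0 == n0 && l1 == m1 && r1 == n1;
}
```

Calling bf_x2_matches_sequential() from a test harness checks that the interleaved loop computes the same result as two separate calls. Whether the interleaving actually helps on a given CPU or GPU would still have to be measured.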