john-dev - Re: [GSoC] John the Ripper support for PHC finalists


Message-ID: <20150406133654.GA12722@openwall.com>
Date: Mon, 6 Apr 2015 16:36:54 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

Hi Agnieszka,

Your e-mail quoting is still really weird.  It's mostly correct, but
just weird.  It looks like you're doing it manually.  Is this so?
Normally, a mail client (MUA) you'd use would take care of the (initial)
quoting for you (and then you just delete portions that you don't need
quoted in a particular reply).  For example, Mutt does it for me.

On Mon, Apr 06, 2015 at 02:40:45PM +0200, Agnieszka Bielec wrote:
> I'm including tests for __global memory and __private
> 
> I've added some printfs to know how many memory is used
> 
> http://pastebin.com/Rqe5yKsH

Please avoid using pastebin in your mailing list postings.  In this
case, the test output is small enough that you could attach it as a
text file to your e-mail instead.

> I'm wondering why on --dev=2 opencl using
> global memory was fast, ~ 150k

That's because --dev=2 (and =3) are the CPUs.  There's no easy way (nor
do we want it, most of the time) for an OpenCL driver to bypass the CPU
caches.  So when you run an OpenCL kernel on CPUs, there is not supposed
to be a (significant, if any at all) speed difference between local and
global memory (it's the same memory subsystem anyway, consisting of
caches and RAM).  Usually, this results in the same CPU instructions,
possibly with (unimportant) differences in specific memory addresses
(but the addresses are virtual anyway, and the memory is cacheable
anyway), in (non-)use of prefetch instructions, and in instruction
scheduling (in case the compiler has different expectations for
latencies depending on whether you specified something as being local or
global).  Chances are that those differences have small or negligible
effect on performance (and it is unclear in which direction).

Similarly, when you access global memory on GPUs, some limited in-GPU
caching may nevertheless go on.  It's just that on GPUs those caches are
separate from local memory, and GPUs' local memory may only be addressed
explicitly (so you need to explicitly use it from your OpenCL kernels),
whereas CPUs don't actually have (explicitly addressable) local memory
(they only have caches).
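
To illustrate the point about CPUs, here is a minimal sketch in plain C,
not code from the JtR tree.  It assumes a host build where the OpenCL
address space qualifiers are simply defined away, which roughly mimics
what a CPU OpenCL device ends up doing: "global" and "local" data both
live in ordinary cacheable memory.  The names sum_global, sum_staged, and
TILE are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: when built as ordinary C (not OpenCL C),
 * the address space qualifiers expand to nothing, so both kinds of
 * buffers are plain host memory behind the same caches. */
#ifndef __OPENCL_VERSION__
#define global
#define local
#endif

#define TILE 64 /* hypothetical tile size, chosen only for the example */

/* Variant 1: read every element straight from "global" memory. */
unsigned sum_global(const global unsigned *in, size_t n)
{
    unsigned acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += in[i];
    return acc;
}

/* Variant 2: explicitly copy into a "local" buffer first, then read
 * from that buffer.  On a GPU this explicit staging is the only way to
 * use local memory; on a CPU it is just another copy through the same
 * caches and RAM. */
unsigned sum_staged(const global unsigned *in, local unsigned *tile,
                    size_t n)
{
    unsigned acc = 0;
    assert(n <= TILE); /* the staging buffer holds at most TILE items */
    for (size_t i = 0; i < n; i++)
        tile[i] = in[i];
    for (size_t i = 0; i < n; i++)
        acc += tile[i];
    return acc;
}
```

Both variants return the same value for the same input.  On a CPU device
the only difference is the extra copy in the second one, so any speed
gap between them would not come from a faster kind of memory.  On a GPU,
the second pattern is how a kernel would opt in to local memory.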

I expect that others (magnum, Frank?) will reply to the rest of your
message.

Alexander
