john-dev - Re: PHC: Parallel in OpenCL

Message-ID: <CABob6ipRyB433=g6eR2spVib+KvkQeou=Npe6-g-A6vswfmM4w@mail.gmail.com>
Date: Tue, 2 Jun 2015 05:37:30 +0200
From: Lukas Odzioba <lukas.odzioba@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Parallel in OpenCL

2015-05-31 13:39 GMT+02:00 Agnieszka Bielec <bielecagnieszka8@...il.com>:
> I was having problems with all cards after removing "add 0"
> instructions. (sometimes we need to call function with this
> optimization and sometimes normal function)
> I unrolled loops manually to do this and then the size of code
> increased and results were worse.

How exactly worse?

> I created 4 split kernels and I am getting better speed on my laptop
> and on --dev=5. but I have still a problem with AMD GCN which has less
> code cache size - 32KB. also instructions for gcn can take more size.
> and the speed in my laptop is strangely fast

I just compared the code on the two branches, and I don't think what
you did is the proper way to do a split kernel...
I guess it should be clear to see using a profiler.

> GCN without "add 0" optimization
> Device 1: Tahiti [AMD Radeon HD 7900 Series]
> Many salts:     45093 c/s real, 4915K c/s virtual

> GCN with unrolling one loop
> Many salts:     27536 c/s real, 3276K c/s virtual

The result you are giving here is for the "add 0" optimization plus 4
kernels, and I guess the latter is the problem here.
This really confused me until I noticed a really big difference between
the two branches - not just one unrolled loop.
Please be more specific in the future, otherwise we will be wasting time.

Just by moving code around I was able to get 32k out of this, and most
of that should be applicable to the first result you mentioned.
You should clean up your code before moving forward. Just count the
number of tables you are dealing with; there is no need to copy data
back and forth from one to another. Also please review the included
header files: for example, there is no need to define SWAP_ENDIANNES_64
when we already have SWAP64, which is likely faster.
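
To illustrate why two byte-swap helpers can differ in cost, here is a
minimal sketch in plain C (not OpenCL, and not the actual definitions of
SWAP_ENDIANNES_64 or SWAP64 from the tree, which are not reproduced
here). Both function names are hypothetical. The first swaps byte by
byte with eight mask-and-shift terms; the second gets the same result in
three stages with fewer operations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte-by-byte swap, eight mask-and-shift terms. */
static uint64_t swap64_bytewise(uint64_t x)
{
    return ((x & 0x00000000000000ffULL) << 56) |
           ((x & 0x000000000000ff00ULL) << 40) |
           ((x & 0x0000000000ff0000ULL) << 24) |
           ((x & 0x00000000ff000000ULL) <<  8) |
           ((x & 0x000000ff00000000ULL) >>  8) |
           ((x & 0x0000ff0000000000ULL) >> 24) |
           ((x & 0x00ff000000000000ULL) >> 40) |
           ((x & 0xff00000000000000ULL) >> 56);
}

/* Hypothetical helper: staged swap. Exchange the 32-bit halves, then
 * the 16-bit words inside each half, then the bytes inside each word. */
static uint64_t swap64_staged(uint64_t x)
{
    x = (x << 32) | (x >> 32);
    x = ((x & 0x0000ffff0000ffffULL) << 16) |
        ((x >> 16) & 0x0000ffff0000ffffULL);
    x = ((x & 0x00ff00ff00ff00ffULL) << 8) |
        ((x >> 8) & 0x00ff00ff00ff00ffULL);
    return x;
}

/* Sanity check: both variants must agree and be self-inverse. */
static void swap64_selfcheck(void)
{
    const uint64_t v = 0x0102030405060708ULL;
    assert(swap64_bytewise(v) == swap64_staged(v));
    assert(swap64_staged(swap64_staged(v)) == v);
}
```

Which form is actually faster on a given GPU depends on the compiler
and the instruction set, so it is worth checking the generated code
before keeping a second definition around.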

Lukas
