Message-ID: <20120705050657.GB16484@openwall.com>
Date: Thu, 5 Jul 2012 09:06:57 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Sayantan:Weekly Report #11

On Thu, Jul 05, 2012 at 08:57:01AM +0530, SAYANTAN DATTA wrote:
> Achievements:
> 1. Successfully compiled test openCL programs from modified il.

Can you demonstrate this to us so that we better understand what exactly
you're doing and what it looks like?  Maybe create a wiki page like:

http://openwall.info/wiki/john/development/AMD-IL

(or suggest a more suitable name).  Include the tiny(?) test programs in
there, command-lines used to compile, sample output.  The style of that
wiki page can be similar to:

http://openwall.info/wiki/internal/gcc-local-build

> Integration
> of the binary generator into JtR is possible but I first need your
> permission as the generator requires libelf-devel as additional dependency.

Is there already a need to integrate the support for this into JtR?
In other words, what would your next steps be after such integration?
Do you know what specific IL-level changes you'd try?

> Priorities:
> 1. More focus on optimizing the opencl kernels.

OK.  Can you try to answer these questions?

Which execution units and how many per CU does your bf_kernel.cl (as
released in 1.7.9-jumbo-6) use on 7970?

Does it use scatter/gather addressing?  Or only gather?  Or neither?

How much LDS does it actually use?  Does it use any other memory type(s)
within the inner loop?

Before you started with bf_kernel.cl, I estimated that we might be able
to use up to 1/4th of 7970's computing resources yet fit in LDS.  Is
this what your code is trying to use or do you limit it to less, and why?

I recall that you wrote somewhere that you were only able to use 32 KB
of LDS per CU, not the full 64 KB.  (However, I can't find this now.)
Do I recall correctly, and if so why is that?
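
The arithmetic behind the 1/4th estimate and the 32 KB versus 64 KB
question can be sketched as follows.  This is only a rough illustration
under stated assumptions, not a description of what bf_kernel.cl
actually does: it assumes each bcrypt instance keeps only its four
Blowfish S-boxes (4 x 256 x 32-bit words = 4 KB) in LDS, with the
P-array held elsewhere, and that a 7970 CU has 64 lanes (4 SIMD units
of 16 lanes each).  The function names are made up for this example.

```python
# Rough LDS budget for bcrypt (Blowfish) on one compute unit.
# Assumption: only the four S-boxes live in LDS; the P-array is not counted.
SBOX_BYTES = 4 * 256 * 4  # four 256-entry tables of 32-bit words = 4096 bytes

# Assumption: 4 SIMD units x 16 lanes per CU on the 7970.
LANES_PER_CU = 64


def instances_per_cu(lds_bytes, per_instance_bytes=SBOX_BYTES):
    """How many bcrypt instances fit in the given amount of LDS."""
    return lds_bytes // per_instance_bytes


def lane_fraction(lds_bytes, lanes_per_cu=LANES_PER_CU):
    """Fraction of a CU's lanes kept busy with one instance per lane."""
    return instances_per_cu(lds_bytes) / lanes_per_cu


if __name__ == "__main__":
    for kb in (64, 32):
        n = instances_per_cu(kb * 1024)
        frac = lane_fraction(kb * 1024)
        print(f"{kb} KB LDS: {n} instances per CU, {frac:.1%} of lanes")
```

Under these assumptions the full 64 KB gives 16 instances per CU, or
1/4th of the lanes, while a 32 KB limit halves that to 8 instances, or
1/8th of the lanes.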

Besides all of the above, your idea to use local and global memory at
once is a good one.  If we don't achieve much with other optimizations,
perhaps we'll achieve something like 7000 c/s on 7970 by adding some use
of global memory in parallel with the local memory uses currently in
your kernel.

Thanks,

Alexander
