Message-ID: <20120620165956.GA21653@openwall.com>
Date: Wed, 20 Jun 2012 20:59:56 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bf-opencl

On Wed, Jun 20, 2012 at 05:44:50PM +0530, SAYANTAN DATTA wrote:
> But as far as I know, IL code is independent of the ASIC. So it shouldn't
> matter whether it's GCN or not. However, the ISA is ASIC-dependent. So, did
> you mean ISA instead of IL?

I meant programming in IL, or at least reading OpenCL-generated IL, with
the specific ISA and device in mind.  Ideally, we should also be skimming
the generated GCN code and be able to understand it.

For example, for bcrypt, we need to know which execution units are
actually in use (e.g., are the scalar units that are normally used for
control doing actual computation here?), whether scatter/gather
addressing from the SIMD units is in use, and which memory types and
regions are actually in use.  With pure OpenCL, we're kind of blind.
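
To make the scatter/gather concern concrete, here is a minimal C sketch
of the Blowfish F function that bcrypt evaluates in its inner loop.  The
name bf_f and the S[4][256] layout are illustrative assumptions, not
taken from the John the Ripper sources.  Every table index is derived
from the data being processed, so when many bcrypt instances run in SIMD
lanes, each lane needs its own address for each lookup:

```c
#include <stdint.h>

/*
 * Blowfish F function (hypothetical helper name and S-box layout).
 * All four S-box indices come from the input word x, so the memory
 * addresses are data-dependent and differ from one lane to the next.
 */
uint32_t bf_f(const uint32_t S[4][256], uint32_t x)
{
    uint32_t a = S[0][x >> 24];
    uint32_t b = S[1][(x >> 16) & 0xff];
    uint32_t c = S[2][(x >> 8) & 0xff];
    uint32_t d = S[3][x & 0xff];

    return ((a + b) ^ c) + d;
}
```

How the compiler maps these four lookups onto the device (which memory,
which addressing mode) is exactly what is hard to see from OpenCL source
alone.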

IL is mostly but not fully independent from the underlying ISA; some IL
instructions are documented to be specific to some GPU model ranges.

It's akin to use of intrinsics and OpenMP in C sources: the exact
instructions that are generated may vary (e.g., the same intrinsic may
produce SSE2 or AVX depending on compiler settings), we don't do
register allocation, and some intrinsics are specific to some CPU model
ranges.  Yet we happen to have enough control to achieve decent speed
when we review and benchmark the generated code and adjust our source.
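
As a rough illustration of this analogy (a sketch only, with made-up
function names xor_plain and xor_intrin), the same XOR loop can be
written in plain C, leaving vectorization to the compiler, or with SSE2
intrinsics, which request 128-bit operations explicitly while register
allocation and the exact instruction encoding remain up to the compiler:

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Plain C: whether this gets vectorized is up to the auto-vectorizer. */
void xor_plain(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] ^ b[i];
}

/*
 * Intrinsics: we ask for 128-bit loads, XOR, and stores explicitly, but
 * we still do not choose registers, and the compiler may emit either the
 * legacy SSE2 or the AVX (VEX) encoding depending on its settings.
 */
void xor_intrin(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(va, vb));
    }
#endif
    /* Scalar tail; also the whole loop when SSE2 is not available. */
    for (; i < n; i++)
        dst[i] = a[i] ^ b[i];
}
```

Both functions compute the same result; the difference is only in how
much of the code generation we pin down ourselves.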

Continuing this analogy, OpenCL is akin to C sources without intrinsics
and OpenMP, but with enabled auto-vectorization and auto-parallelization.
I had poor luck achieving decent performance in this way.  Of course,
OpenCL is more suitable for this than C, so better results are achieved,
yet specifying things more explicitly at the lower level may help -
especially when implementing things that don't fit the device perfectly
(e.g., OpenCL is fine for perfect matches like MD5, but we may need
something more explicit for poor matches like bcrypt on GPU).

> I think two or more john builds collided on the same card resulting in
> crash after 15 min.

No, I did not run a second instance of John, and no one else logged in.

> Otherwise the implementation is perfectly stable at stock
> clocks (I tested using pw-fake-unix hashes for 28 minutes).  And yes, the
> card is unstable above a 1225 MHz core clock.

OK.  So we have not triggered the same problem again yet.

Alexander
