Message-ID: <20120514025144.GA7952@openwall.com>
Date: Mon, 14 May 2012 06:51:44 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Blowfish (bcrypt) on CPU vs. GPU (FX-8120 vs. HD 7970)

On Mon, May 14, 2012 at 05:25:02AM +0400, Solar Designer wrote:
> I tried to estimate possible speeds for bcrypt (Eksblowfish) on GCN (HD
> 7970), using the known speeds on Bulldozer (FX-8120) as reference (to
> verify my math, as well as to see the possible speedup over CPU, if any).

To be clear: Blowfish encryption itself can be implemented efficiently
on GPUs.  That is, multiple data streams may be encrypted or decrypted
in parallel on a GPU, but only when they all use the same key.
This is described here:

http://researchweb.iiit.ac.in/~rishabh_m/gpu_crypto.pdf

It's Blowfish key setup that is far more difficult, because we have to
maintain a separate set of S-boxes per key.  Unfortunately, this is
precisely what cracking bcrypt hashes requires: each candidate password
goes through its own key setup.
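
To illustrate why the S-boxes have to be per key, here is a minimal
Python sketch of the Blowfish round function F and of the per-key state
size.  The names blowfish_f, placeholder_sboxes and state_bytes are made
up for this illustration, and the S-box contents are placeholders: they
are neither the real initial values nor the result of a real key setup.

```python
MASK32 = 0xFFFFFFFF

def blowfish_f(sboxes, x):
    """Blowfish round function F on a 32-bit word x.

    Each of the four bytes of x indexes a different 256-entry S-box,
    so every evaluation does four key-dependent table lookups.
    """
    a = (x >> 24) & 0xFF
    b = (x >> 16) & 0xFF
    c = (x >> 8) & 0xFF
    d = x & 0xFF
    t = (sboxes[0][a] + sboxes[1][b]) & MASK32   # addition mod 2^32
    t ^= sboxes[2][c]
    return (t + sboxes[3][d]) & MASK32           # addition mod 2^32

def placeholder_sboxes(seed):
    # ASSUMPTION: placeholder contents, only to show that each key
    # (here stood in for by "seed") owns a different set of tables.
    return [[(seed + i) & MASK32 for i in range(256)] for _ in range(4)]

def state_bytes():
    sbox_bytes = 4 * 256 * 4   # four S-boxes, 256 entries, 32 bits each
    p_array_bytes = 18 * 4     # eighteen 32-bit subkeys
    return sbox_bytes, p_array_bytes

if __name__ == "__main__":
    x = 0x01234567
    for seed in (1, 2):
        print(seed, hex(blowfish_f(placeholder_sboxes(seed), x)))
    print(state_bytes())
```

With one shared key, all parallel streams read the same four tables.
With one key per work-item, each work-item needs its own 4 KB of
tables, which is where the local memory limit comes from.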

> We can issue up to 5 instructions per cycle per CU - apparently, this
> maximum is reached with 1 scalar and 4 SIMD instructions.  With four
> 16-lane SIMD units, we'd normally have 64 work-items per wavefront, and
> we'd have at least 10 waves/SIMD, 40 waves/CU, 2560 work-items per CU
> (as per slide 11).  However, the 64 KB of LDS only lets us keep up to
> 16 sets of Blowfish S-boxes in it (we need 4 KB per set).  So maybe we
> can/should only run 16 work-items per CU, thus making use of only 1/4 of
> total available lanes and incurring stalls on data dependencies after
> high-latency instructions.
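
The arithmetic in the quoted paragraph can be restated as a short
Python sketch.  The hardware figures are the ones quoted above, taken
as given; the constant and function names are made up for this
illustration, and the "10 waves/SIMD" figure is the "at least" value
from the quote, not a measured one.

```python
# Figures taken from the quoted estimate (assumed, not measured here).
SIMD_UNITS_PER_CU = 4
LANES_PER_SIMD = 16
WORK_ITEMS_PER_WAVEFRONT = 64
WAVES_PER_SIMD = 10
LDS_BYTES_PER_CU = 64 * 1024

# One set of Blowfish S-boxes: 4 tables x 256 entries x 4 bytes.
SBOX_SET_BYTES = 4 * 256 * 4

def occupancy_estimate():
    lanes_per_cu = SIMD_UNITS_PER_CU * LANES_PER_SIMD
    waves_per_cu = WAVES_PER_SIMD * SIMD_UNITS_PER_CU
    work_items_per_cu = waves_per_cu * WORK_ITEMS_PER_WAVEFRONT
    # How many per-key S-box sets fit in one CU's local memory.
    sbox_sets_in_lds = LDS_BYTES_PER_CU // SBOX_SET_BYTES
    return {
        "lanes_per_cu": lanes_per_cu,
        "waves_per_cu": waves_per_cu,
        "work_items_per_cu": work_items_per_cu,
        "sbox_sets_in_lds": sbox_sets_in_lds,
        # Fraction of lanes busy if one work-item runs per S-box set.
        "lane_fraction": sbox_sets_in_lds / lanes_per_cu,
    }

if __name__ == "__main__":
    for name, value in occupancy_estimate().items():
        print(name, value)
```

The point of the comparison is the gap between the 2560 work-items a CU
could normally keep in flight and the 16 that fit when each needs its
own S-boxes in LDS.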

A relevant question and its answers:

http://devgurus.amd.com/thread/159171

Alexander
