john-dev - Re: Parallella: bcrypt

Message-ID: <20130526184250.GA22875@openwall.com>
Date: Sun, 26 May 2013 22:42:50 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Hi Katja,

On Sun, May 26, 2013 at 08:31:15PM +0200, Katja Malvoni wrote:
> I tested the performance of bcrypt on one Epiphany core.

Thank you!

> These are the results:
> 
> SIZE-OPTIMIZED bcrypt implementation compiled with -O1
>     Message from eCore 0x88a ( 2, 2): Result:
> "$2a$05$CCCCCCCCCCCCCCCCCCCCC.E5YPO9kmyuRGyh0XouQYb4YMJKvyOeW"#
>     Execution time - Epiphany: 50.024000 ms
[...]
> ORIGINAL bcrypt implementation compiled with -O1
>     Message from eCore 0x88a ( 2, 2): Result:
> "$2a$05$CCCCCCCCCCCCCCCCCCCCC.E5YPO9kmyuRGyh0XouQYb4YMJKvyOeW"#
>     Execution time - Epiphany: 47.794000 ms

These are roughly 5 times slower than they're "supposed" to be.  50 ms
means 20 c/s, the speed JtR achieved at bcrypt on Pentium 120 MHz when I
first implemented and optimized the assembly code for the Pentium.  Each
Epiphany core is similar to the original Pentium in terms of its
processing power per-MHz (also dual-issue, and the original Pentium
needed the loads to be done with separate instructions in a RISC-like
fashion for optimal performance).  However, the clock rate on the
Epiphany prototypes we're using is 600 MHz, which is 5 times higher.
So with optimal code, we should expect to get 10 ms and 100 c/s.
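
The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch of the reasoning, assuming equal per-MHz throughput between an Epiphany core and the original Pentium, as argued above. The variable and function names are made up for illustration and are not part of JtR or the Epiphany SDK.

```python
# Back-of-the-envelope check of the expected bcrypt speed on one Epiphany core.
# Assumption (from the text above): per-MHz throughput matches the original Pentium.

def hashes_per_second(ms_per_hash):
    """Convert a per-hash time in milliseconds to hashes (crypts) per second."""
    return 1000.0 / ms_per_hash

measured_ms = 50.0      # rounded from the reported 50.024 ms at cost 5 ($2a$05$)
pentium_mhz = 120.0     # Pentium clock rate at which JtR achieved about 20 c/s
epiphany_mhz = 600.0    # clock rate of the Epiphany prototypes

measured_cps = hashes_per_second(measured_ms)   # current speed in c/s
clock_ratio = epiphany_mhz / pentium_mhz        # how much faster the clock is
expected_cps = measured_cps * clock_ratio       # speed with optimal code
expected_ms = 1000.0 / expected_cps             # time per hash with optimal code

print(f"measured: {measured_cps:.0f} c/s, clock ratio: {clock_ratio:.0f}x")
print(f"expected: {expected_cps:.0f} c/s, {expected_ms:.0f} ms per hash")
```

The same helper can be applied to the other reported timings to compare builds on a c/s basis.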

This may require assembly programming, especially given that
e-gcc-generated code probably keeps the FPU in floating-point mode, so
we're effectively using the cores as single-issue.  (This problem did not
exist in the original Pentium since it had two integer ALUs separate
from the FPU... but this sort of design would lower the efficiency of
Epiphany cores.)

For now, can you try -O2 instead of -O1?

> ORIGINAL bcrypt implementation using legacy.ldf
>     Message from eCore 0x88a ( 2, 2): Result:
> "$2a$05$CCCCCCCCCCCCCCCCCCCCC.E5YPO9kmyuRGyh0XouQYb4YMJKvyOeW"#
>     Execution time - Epiphany: 40921.396000 ms

Ouch. :-)  Is this with both code and data in external RAM?

> I also ran the runtime self-test and it returned correct result.

Sounds good.  Obviously, we need to exclude this self-test for JtR
integration.  JtR performs its own self-test.

So your next steps may be:

1. Try -O2 and report the speed numbers here.

2. Use all Epiphany cores, not just 1.

3. Integrate with JtR.

Steps 2 and 3 may be combined.

Alexander
