Message-ID: <CA+EaD-Zi8NAYV6-1J0_rzAuSgw2yYJpiBbGUVDfzaPy=j84pvw@mail.gmail.com>
Date: Thu, 25 Jul 2013 13:26:19 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt
Hi Alexander,
On Thu, Jul 25, 2013 at 3:36 AM, Solar Designer <solar@...nwall.com> wrote:
> Hi Katja,
>
> On Wed, Jul 24, 2013 at 11:42:53PM +0200, Katja Malvoni wrote:
> > I made use of dual-issue, the speed I'm getting is 976 c/s when compiling
> > Epiphany code with -O2. If I compile with -O3 I get 979 c/s.
>
> This is nice. This is for 1 instance of bcrypt per core per invocation,
> right? I mean that there's no interleaving yet.
>
That's right, only one instance.
> Can you try interleaving two instances, perhaps with C code initially?
>
Ok, I will.
>
> > Code is in https://github.com/kmalvoni/JohnTheRipper/tree/master
>
> I took a look, and surprisingly (besides the pieces of inline asm) I
> noticed something unrelated: you seem to have inconsistent BF_binary
> sizes between Epiphany and host sides. I thought you had addressed that
> already? Maybe you forgot to commit? Also, your host side code only
> checks 32 bits of the computed hash value, whereas you could check 64
> bits just as easily (so you should).
>
I had problems with my local git repo and wasn't able to commit, so I
edited the files on GitHub online. That was a very bad idea... I forgot to
update the host code and the Makefile. I won't do this again, and I
apologize for the inconvenience.
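For illustration, the 32-bit versus 64-bit check on the host side could look roughly like the sketch below. It assumes BF_binary is 6 words of 32 bits on both sides; the function names cmp_32 and cmp_64 are hypothetical and are not taken from parallella_bf_fmt.c.

```c
#include <stdint.h>

typedef uint32_t BF_word;
typedef BF_word BF_binary[6];   /* assumed: same 6-word size on host and Epiphany */

/* Hypothetical helper: compares only the first 32 bits of the hash. */
static int cmp_32(const BF_word *computed, const BF_word *stored)
{
    return computed[0] == stored[0];
}

/* Hypothetical helper: compares the first 64 bits, at the cost of
 * one more word comparison. */
static int cmp_64(const BF_word *computed, const BF_word *stored)
{
    return computed[0] == stored[0] && computed[1] == stored[1];
}
```

Two hashes that agree in the first word but differ in the second pass cmp_32 and are rejected by cmp_64.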
On Thu, Jul 25, 2013 at 4:28 AM, Solar Designer <solar@...nwall.com> wrote:
> I checked out, built, and tried to test this version of code. The first
> hurdle was the 2 vs. 6 size BF_binary discrepancy. Because of it, the
> program would just get stuck all the time. Once I fixed it in my copy
> of parallella_bf_fmt.c, I am getting:
> solar@...aro-ubuntu-desktop:~/
>
> 2/JohnTheRipper/run$ ./parallella_john.sh -te -form=bcrypt-parallella
> Benchmarking: bcrypt-parallella, OpenBSD Blowfish ("$2a$05", 32
> iterations) [Parallella]... DONE
> Raw: 865 c/s real, 865 c/s virtual
>
> ... which is much less than what you said it would be.
>
> So perhaps you forgot to commit multiple changes?
This is because fast.ldf is used in the Makefile instead of internal.ldf.
Now everything should work.
On Thu, Jul 25, 2013 at 4:02 AM, Solar Designer <solar@...nwall.com> wrote:
> The code itself mostly looks good to me (including your delayed use of
> results from IMADD and IADD). Shouldn't you re-order these two, though? -
>
> | "eor %0, %0, r27\n" \
> | "eor r23, r22, r23\n" \
>
> because r22 is loaded sooner than r27? Well, maybe this makes no
> difference on the current chip, but it might if load's latency is
> increased in a future revision of Epiphany.
>
If I reorder them, then there is no 4-cycle separation between iadd r23,
r24, r23 and eor r23, r22, r23, and that's required for dual-issue. In that
case, the speed is 924 c/s.
> Now, here's an issue/bug in the above: you rely on registers being
> preserved across multiple pieces of inline asm, but gcc does not
> guarantee you that. Also, you don't declare which registers you
> clobber. To fix this, your BF_ROUND should not be the entire __asm__
> block, but rather just a portion of the string you put inside such
> block. The asm block itself, with proper confession on what registers
> you clobber, should be in the BF_encrypt function.
>
When I did that, e-gcc unnecessarily used one more register to store L, and
the register being used changed for every BF_ROUND. And then there were 16
unnecessary mov instructions. So I removed the clobbered registers list. I
have added it back now, and the speed drops from 976 c/s to 970 c/s.
On Thu, Jul 25, 2013 at 7:18 AM, Solar Designer <solar@...nwall.com> wrote:
> On Thu, Jul 25, 2013 at 06:02:52AM +0400, Solar Designer wrote:
> > | "ldr r27, [r45], 0x1\n" \
>
> I guess this is read from the P-box. You should be able to use ldrd
> here, and thus only have this instruction in every other round (a total
> of 9 instructions to read the 18 elements). Don't forget that ldrd
> needs an even-numbered first register.
>
This instruction ensures the 4-cycle separation between IADD r23, r24, r23
and EOR r23, r22, r23. If I remove it, I'll lose dual-issue in one round.
But I'll try to reorder instructions so that dual-issue stays.
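To make the ldrd idea concrete, here is a rough C model of reading the 18 P-box words with 9 paired loads instead of 18 single ones. It is only a sketch: toy_f is a hypothetical stand-in for the real Blowfish F function (which uses the S-boxes), load_pair models a doubleword load with memcpy, and none of these names come from the actual code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real Blowfish F function. */
static uint32_t toy_f(uint32_t x)
{
    return (x * 2654435761u) ^ (x >> 15);
}

typedef struct { uint32_t lo, hi; } pair32;

/* Models one doubleword load: fetches P[i] and P[i + 1] together.
 * i is expected to be even. */
static pair32 load_pair(const uint32_t *P, int i)
{
    pair32 p;
    memcpy(&p, &P[i], sizeof p);
    return p;
}

/* Reference: one P-box word fetched per use (18 loads). */
static void encrypt_single(const uint32_t P[18], uint32_t *L, uint32_t *R)
{
    uint32_t l = *L ^ P[0], r = *R;
    for (int i = 1; i <= 16; i += 2) {
        r ^= toy_f(l) ^ P[i];
        l ^= toy_f(r) ^ P[i + 1];
    }
    *L = r ^ P[17];
    *R = l;
}

/* Same computation with 9 paired loads, one in every other round. */
static void encrypt_paired(const uint32_t P[18], uint32_t *L, uint32_t *R)
{
    uint32_t l = *L, r = *R;
    pair32 p = load_pair(P, 0);        /* load 1 of 9: P[0], P[1] */
    l ^= p.lo;
    for (int i = 2; i <= 16; i += 2) {
        r ^= toy_f(l) ^ p.hi;          /* odd word from the previous load */
        p = load_pair(P, i);           /* loads 2..9 */
        l ^= toy_f(r) ^ p.lo;
    }
    *L = r ^ p.hi;                     /* p now holds P[16], P[17] */
    *R = l;
}
```

Both functions should give the same result for any P-box; the paired version only changes when the words are fetched, which is what would need to be reconciled with the dual-issue scheduling.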
Katja