Message-ID: <20130721002541.GA8765@openwall.com>
Date: Sun, 21 Jul 2013 04:25:41 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Katja,

On Wed, Jul 17, 2013 at 07:40:17PM +0200, Katja Malvoni wrote:
> On Wed, Jul 17, 2013 at 7:17 PM, Katja Malvoni <kmalvoni@...il.com> wrote:
> > I was surprised by that as well, I measured it again, 2.5 ms. I'm
> > measuring clock ticks with Epiphany timers (assuming 600 MHz), I start
> > timer after declarations in BF_crypt and I stop it before entering
> > do{ ... }while(--count);
> > I was passing pointers to shared buffer when calling BF_crypt(), I tried
> > copying data from shared buffer into local variables, it's even slower -
> > 809 c/s
>
> It comes from copying initial S box to data structure in BF_crypt() (takes
> around 2.3 ms). If S box is not copied then it must be transferred before
> each BF_crypt call. Initial S box is transferred when loading *.srec file.
> I think this is cheaper than transferring it from host to Epiphany per
> every computed hash.

Yes, I also think that copying within local memory is faster.

However, we may want to optimize the memcpy(). The existing
implementation is too generic - it doesn't use the 64-bit dual-register
load/store instructions, it has little unrolling, and it includes support
for sizes that are not a multiple of 4. (This is from a quick glance at
"e-objdump -d parallella_e_bcrypt.elf".)

Can you try creating a simpler specialized implementation instead, which
would use the ldrd/strd insns and greater unrolling (e.g., 32 ldrd/strd
pairs times 16 loop iterations, for a total of 512 x 64-bit data copies,
or 4096 bytes total)? Also, re-order the instructions such that the store
is not attempted immediately after its corresponding load, to hide the
load latency - e.g.:

        mov r8,16
start:
        ldrd r0,[r6]
        ldrd r2,[r6,+1]
        ldrd r4,[r6,+2]
        strd r0,[r7]
        strd r2,[r7,+1]
        strd r4,[r7,+2]
[... repeat 9 more times using cpp macros, supplying different offsets ...]
        ldrd r0,[r6,+30]
        ldrd r2,[r6,+31]
        strd r0,[r7,+30]
        strd r2,[r7,+31]
        add r6,r6,0x100
        add r7,r7,0x100
        sub r8,r8,1
        bne start

(totally untested!)

Alexander
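For comparison, the same copy pattern can be sketched in C. This is a hypothetical helper (copy_4k_aligned is not from the original post), assuming both buffers are 8-byte aligned and exactly 4096 bytes long, which matches bcrypt's four 256-entry S-boxes of 32-bit words. It mirrors the assembly loop: 32 64-bit loads/stores per iteration, 16 iterations, and loads grouped ahead of their stores via a cpp macro that supplies the offsets. A C compiler is free to reorder or merge these accesses, so whether e-gcc actually emits ldrd/strd in this order would need to be checked with e-objdump -d.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: copies exactly 4096 bytes between 8-byte-aligned,
 * non-overlapping buffers. Each iteration moves 32 x 64-bit words
 * (256 bytes = 0x100); 16 iterations give 512 words = 4096 bytes. */
#define COPY_WORDS_PER_ITER 32
#define COPY_ITERS 16

static_assert(COPY_WORDS_PER_ITER * COPY_ITERS * sizeof(uint64_t) == 4096,
              "32 words x 16 iterations x 8 bytes must equal 4096 bytes");

/* Load three words first, then store them, so no store directly follows
 * the load that feeds it (the compiler may still reschedule this). */
#define COPY3(o)                  \
    do {                          \
        uint64_t a = s[(o)];      \
        uint64_t b = s[(o) + 1];  \
        uint64_t c = s[(o) + 2];  \
        d[(o)] = a;               \
        d[(o) + 1] = b;           \
        d[(o) + 2] = c;           \
    } while (0)

static void copy_4k_aligned(void *restrict dst, const void *restrict src)
{
    uint64_t *restrict d = dst;
    const uint64_t *restrict s = src;
    int n;

    for (n = COPY_ITERS; n != 0; n--) {
        /* offsets 0..29, three words at a time */
        COPY3(0);  COPY3(3);  COPY3(6);  COPY3(9);  COPY3(12);
        COPY3(15); COPY3(18); COPY3(21); COPY3(24); COPY3(27);
        {
            /* final two words: offsets 30 and 31 */
            uint64_t a = s[30];
            uint64_t b = s[31];
            d[30] = a;
            d[31] = b;
        }
        s += COPY_WORDS_PER_ITER; /* advance both pointers by 0x100 bytes */
        d += COPY_WORDS_PER_ITER;
    }
}
```

In BF_crypt(), a call like this could stand in for the generic memcpy() of the initial S box into the per-hash state, provided the source table and destination array really are 4096 bytes and 8-byte aligned.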