Message-ID: <CA+EaD-akDrK35mf+hWSi-isSf4i0P-ZG-4Zaca0i5XXwNQTZiw@mail.gmail.com>
Date: Sun, 21 Jul 2013 15:40:30 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt
Hi Alexander,
Since the Epiphany code became much smaller when I integrated it with JtR, I
tried using internal.ldf instead of fast.ldf, and the code now fits in local
memory. Speed is 932 c/s (sometimes 934 c/s). The BF_fmt port was slow
because I wasn't copying the salt into local memory; once I did that, the
speed was 790 c/s. Then I switched to internal.ldf and got 932 c/s. If I try
to interleave two instances of bcrypt, the code no longer fits in local
memory. At the moment, interleaving two instances doesn't work - it fails the
self-test on get_hash[0](1). Should I pursue this approach further or not?
On Sun, Jul 21, 2013 at 2:25 AM, Solar Designer <solar@...nwall.com> wrote:
> Yes, I also think that copying within local memory is faster. However,
> we may want to optimize the memcpy(). The existing implementation is
> too generic - it doesn't use the 64-bit dual-register load/store
> instructions, it has little unrolling, and it includes support for sizes
> that are not a multiple of 4. (This is from a quick glance at
> "e-objdump -d parallella_e_bcrypt.elf".) Can you try creating a simpler
> specialized implementation instead, which would use the ldrd/strd insns
> and greater unrolling (e.g., 32 ldrd/strd pairs times 16 loop iterations,
> for a total of 512 x 64-bit data copies, or 4096 bytes total)? Also,
> re-order the instructions such that the store is not attempted
> immediately after its corresponding load, to hide its latency - e.g.:
> [...]
>
Ok, I'll try this.
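Maybe something along these lines, as a first step in C before going to
assembly (just an untested sketch - copy4k and the fixed 4096-byte size are
my placeholders, and I'm assuming e-gcc turns the aligned 64-bit accesses
into ldrd/strd):

#include <stdint.h>

/* Copy exactly 4096 bytes of 8-byte-aligned data within local memory:
 * 16 outer iterations, each moving 32 x 64-bit words (256 bytes). */
static void copy4k(uint64_t *dst, const uint64_t *src)
{
	int i, j;

	for (i = 0; i < 16; i++) {
		for (j = 0; j < 32; j += 4) {
			/* Load a group of four words before storing any of
			 * them, so a store is not issued right after its own
			 * load. */
			uint64_t t0 = src[j], t1 = src[j + 1];
			uint64_t t2 = src[j + 2], t3 = src[j + 3];
			dst[j] = t0;
			dst[j + 1] = t1;
			dst[j + 2] = t2;
			dst[j + 3] = t3;
		}
		src += 32;
		dst += 32;
	}
}

Grouping the loads ahead of the matching stores should put a few
instructions between each load and its own store; if the generated code
still isn't good, I'll rewrite the inner loop with explicit ldrd/strd.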
Katja