john-dev - Re: Parallella: bcrypt

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFYn=yCdux8DEDDtT+hTO0wDZrR0wSA2VBRegOxSc9M7=gZPJA@mail.gmail.com>
Date: Sat, 13 Jul 2013 14:47:59 -0400
From: Yaniv Sapir <yaniv@...pteva.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Katja,

For/While-loops are translated to B<cond> instructions. This means that for
every iteration (but the last one) you'll pay that penalty. This is why you
benefit form loop unrolling. However, the Epiphany has an undocumented
(yet) feature for hardware loops. This was described in a forum post from a
couple of weeks ago. Under certain limitations for those loops, the penalty
is eliminated. If you can find that post, you can implement that method (in
assembly, for now).


On Sat, Jul 13, 2013 at 2:41 PM, Katja Malvoni <kmalvoni@...il.com> wrote:

> Hi Yaniv, Alexander,
>
> My assembly code doesn't produce correct results but it has same
> execution time (20.7 ms for all cores and cost 5) as the one that produces
> correct result and it uses add instruction. When using fadd/iadd execution
> time is 22.6 ms.
> Except that, I don't see any other way to optimize BF_ROUND - C code is
> very optimal, almost every line corresponds to one instruction (two that
> don't are "tmp3 += BF_INDEX(ctx->s.S[0], tmp4);" and "R ^= ctx->s.P[N +
> 1];").
> In case of higher cost, my code is much slower (2403.999 ms vs 2182.041 msfor cost 12 and 9603.889 ms vs 8716.686
> ms for cost 14) - I didn't take care about pipeline structure and hazards
> and I don't think I'll be able to change order of instructions better than
> compiler did it.
>
> Yaniv, Epiphany Architecture Reference, p.69: "The branch prediction
> mechanism used by the CPU assumes that the branch was not taken. There is
> no penalty for branches not taken. For branches that are taken, there is a
> three-cycle constant penalty." - is this true for loops as well? If yes, in
> case of BF_encrypt this means penalty of (3 cycles * 1042) * 2^cost. Can
> that be avoided somehow?
>
> On Thu, Jun 27, 2013 at 5:08 PM, Solar Designer <solar@...nwall.com>wrote:
>
>> [...]
>>
>> We may try two things:
>>
>> 1. Interleave two instances of bcrypt.
>>
>> and/or
>>
>> 2. Rewrite this function in assembly.  The compiler-generated code does
>> look suboptimal.
>> [...]
>>
>
> I'll switch to other approach - interleaving two instances of bcrypt and
> I'll try to use Dual-Issue Scheduling Rules.
>
> Best regards,
>
> Katja
>



-- 
===========================================================
Yaniv Sapir
Adapteva Inc.
1666 Massachusetts Ave, Suite 14
Lexington, MA 02420
Phone: (781)-328-0513 (x104)
Email: yaniv@...pteva.com
Web: www.adapteva.com
============================================================
CONFIDENTIALITY NOTICE: This e-mail may contain information
that is confidential and proprietary to Adapteva, and Adapteva hereby
designates the information in this e-mail as confidential. The information
is
 intended only for the use of the individual or entity named above. If you
are
not the intended recipient, you are hereby notified that any disclosure,
copying,
distribution or use of any of the information contained in this
transmission is
strictly prohibited and that you should immediately destroy this e-mail and
its
contents and notify Adapteva.
==============================================================

Content of type "text/html" skipped
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.