Message-ID: <CAFYn=yDBtbXr7dd6njOd=b1iWec3DHejwo+YArufuP=zU_potA@mail.gmail.com>
Date: Thu, 27 Jun 2013 11:07:34 -0400
From: Yaniv Sapir <yaniv@...pteva.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt
Katja
Can you please post the following:
1. C source used to generate this assembly,
2. The compilation command,
3. The caller - what parameters you used in the function call.
The code itself looks OK on the surface - no immediate problems that I can
identify at a glance, but I really need to know how it was generated.
Using ADD or IADD by itself should not make a huge difference, but IADD may
leave room for some optimization.
Thanks,
Yaniv.
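To check a build for FPU-side integer adds (the IADD versus plain ADD question in the thread quoted below), one option is to tally mnemonics in the objdump -d listing instead of reading it by eye. The C sketch below does only that. The function name count_mnemonic is hypothetical, and the exact-token match assumes a listing format like the one quoted below, where operand fields such as "r1,r20,r1" and 4-digit hex encodings never equal a bare mnemonic.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count whitespace-separated tokens in an
 * objdump-style listing that are exactly equal to `mnem`
 * (e.g. "add" or "iadd"). Assumes operands are written without spaces
 * ("r1,r20,r1") so they never match a bare mnemonic.
 */
static size_t count_mnemonic(const char *listing, const char *mnem)
{
    const char *sep = " \t\r\n";
    size_t want = strlen(mnem);
    size_t count = 0;
    const char *p = listing;

    while (*p != '\0') {
        p += strspn(p, sep);           /* skip whitespace before a token */
        size_t len = strcspn(p, sep);  /* length of the next token */
        if (len == want && strncmp(p, mnem, len) == 0)
            count++;
        p += len;                      /* advance past the token */
    }
    return count;
}
```

Run over the full listing of the build described below, a small driver calling count_mnemonic(text, "iadd") and count_mnemonic(text, "add") would be expected to report zero "iadd" tokens and many "add" tokens, matching the observation that only ADD is used.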
On Thu, Jun 27, 2013 at 10:54 AM, Katja Malvoni <kmalvoni@...il.com> wrote:
> Hi Alexander,
>
> On Thu, Jun 27, 2013 at 4:40 PM, Solar Designer <solar@...nwall.com> wrote:
>
>> Katja,
>>
>> On Mon, Jun 24, 2013 at 04:54:45PM +0200, Katja Malvoni wrote:
>> > On Tue, May 28, 2013 at 1:58 AM, Solar Designer <solar@...nwall.com> wrote:
>> > > On Sun, May 26, 2013 at 07:37:55PM -0400, Yaniv Sapir wrote:
>> > > > -mfp-mode=int # this sets the FPU mode to integer. However, please
>> > > > make sure that the generated code does not re-program the CONFIG
>> > > > register before every integer operation
>> > >
>> > > Let's definitely try this. I was afraid we'd have to resort to assembly
>> > > code to use the FPU in integer mode - it's great news to me that we seem
>> > > not to have to.
>> >
>> > Unfortunately, this doesn't help much... Execution time with -O2 is
>> > 45.969000 ms and with -mfp-mode=int is 45.951000 ms. I checked the
>> > generated assembly code, and it seems that the CONFIG register isn't
>> > re-programmed before every integer operation.
>>
>> ... but are there uses of the IADD instruction (the one implemented on
>> the FPU) at all, or only plain ADD (the one implemented on IALU)?
>>
>
> In the whole disassembly, only ADD is used.
>
>
>>
>> Can you show us a piece of disassembly - e.g., for one Blowfish round?
>>
>>
> Here it is:
> 00000234 <_BF_encrypt>:
> 234: d54c 4400 ldr r22,[sp,+0x2]
> 238: a01b 4009 add r21,r0,72
> 23c: 1feb 4002 mov r16,0xff
> 240: 20ef 4002 mov r17,r0
> 244: 854c 2a00 ldr r12,[r17],+0x2
> 248: 860f 208a eor r12,r1,r12
> 24c: 920f 4406 lsr r20,r12,0x10
> 250: 510f 4406 lsr r18,r12,0x8
> 254: 330f 0406 lsr r1,r12,0x18
> 258: 905f 490a and r20,r20,r16
> 25c: 485f 490a and r18,r18,r16
> 260: 911b 4822 add r20,r20,274
> 264: 251b 0002 add r1,r1,18
> 268: 705f 450a and r19,r12,r16
> 26c: 905f 4806 lsl r20,r20,0x2
> 270: 2456 lsl r1,r1,0x2
> 272: 491b 4842 add r18,r18,530
> 276: 485f 4806 lsl r18,r18,0x2
> 27a: 6d1b 4862 add r19,r19,786
> 27e: 8249 4100 ldr r20,[r0,+r20]
> 282: 20c1 ldr r1,[r0,r1]
> 284: 6c5f 4806 lsl r19,r19,0x2
> 288: 4149 4100 ldr r18,[r0,+r18]
> 28c: 309f 080a add r1,r20,r1
> 290: 61c9 4100 ldr r19,[r0,+r19]
> 294: 250f 010a eor r1,r1,r18
> 298: 44cc 4900 ldr r18,[r17,-0x1]
> 29c: 259f 010a add r1,r1,r19
> 2a0: 250f 010a eor r1,r1,r18
> 2a4: 488a eor r2,r2,r1
> 2a6: 6a0f 4006 lsr r19,r2,0x10
> 2aa: 490f 4006 lsr r18,r2,0x8
> 2ae: 6c5f 490a and r19,r19,r16
> 2b2: 2b06 lsr r1,r2,0x18
> 2b4: 485f 490a and r18,r18,r16
> 2b8: 6d1b 4822 add r19,r19,274
> 2bc: 251b 0002 add r1,r1,18
> 2c0: 6c5f 4806 lsl r19,r19,0x2
> 2c4: 2456 lsl r1,r1,0x2
> 2c6: 491b 4842 add r18,r18,530
> 2ca: 485f 4806 lsl r18,r18,0x2
> 2ce: 61c9 4100 ldr r19,[r0,+r19]
> 2d2: 20c1 ldr r1,[r0,r1]
> 2d4: 4149 4100 ldr r18,[r0,+r18]
> 2d8: 2c9f 080a add r1,r19,r1
> 2dc: 250f 010a eor r1,r1,r18
> 2e0: 485f 410a and r18,r2,r16
> 2e4: 491b 4862 add r18,r18,786
> 2e8: 485f 4806 lsl r18,r18,0x2
> 2ec: 6149 4100 ldr r19,[r0,+r18]
> 2f0: 454c 4a00 ldr r18,[r17],+0x2
> 2f4: 259f 010a add r1,r1,r19
> 2f8: 250f 010a eor r1,r1,r18
> 2fc: 908f 240a eor r12,r12,r1
> 300: 26bf 090a sub r1,r17,r21
> 304: a410 bne 24c <_BF_encrypt+0x18>
> 306: 20cc 0002 ldr r1,[r0,+0x11]
> 30a: 8cdc 2000 str r12,[r3,+0x1]
> 30e: 288a eor r1,r2,r1
> 310: 2c54 str r1,[r3]
> 312: 6c1b 0001 add r3,r3,8
> 316: 59bf 080a sub r2,r22,r3
> 31a: 50ef 0402 mov r2,r12
> 31e: 9120 bgtu 240 <_BF_encrypt+0xc>
> 320: 04e2 mov r0,r1
> 322: 194f 0402 rts
> 326: 01a2 nop
>
>
> Execution time when using all 16 cores is 294.676000 ms
>
> Katja
>
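For readers comparing the listing with C: the quoted loop appears to implement the standard Blowfish F function, F(x) = ((S0[x >> 24] + S1[(x >> 16) & 0xff]) ^ S2[(x >> 8) & 0xff]) + S3[x & 0xff], two rounds per loop iteration, over a chain of blocks where each output is fed back in as the next input, as in bcrypt's key setup. The sketch below is a hypothetical reconstruction inferred from the disassembly (the register use, the 18/274/530/786 word offsets added before each "lsl ...,0x2", and the P[17] fix-up after the loop), not the source Katja compiled; the names bf_ctx, bf_f and bf_encrypt_chain are made up for illustration.

```c
#include <stdint.h>

#define BF_N 16

/*
 * Hypothetical layout inferred from the listing: P[] in words 0..17,
 * then S-boxes starting at words 18, 274, 530 and 786.
 */
typedef struct {
    uint32_t P[BF_N + 2];
    uint32_t S[4][256];
} bf_ctx;

/* Blowfish F: ((S0[a] + S1[b]) ^ S2[c]) + S3[d] */
static inline uint32_t bf_f(const bf_ctx *ctx, uint32_t x)
{
    uint32_t t = ctx->S[0][x >> 24] + ctx->S[1][(x >> 16) & 0xff];
    t ^= ctx->S[2][(x >> 8) & 0xff];
    return t + ctx->S[3][x & 0xff];
}

/*
 * Encrypt a chain of 64-bit blocks: each (L, R) output is stored at ptr
 * and reused as the next input, until ptr reaches end. Returns the last L.
 */
static uint32_t bf_encrypt_chain(const bf_ctx *ctx, uint32_t L, uint32_t R,
                                 uint32_t *ptr, const uint32_t *end)
{
    do {
        L ^= ctx->P[0];
        for (int i = 1; i <= BF_N; i += 2) {  /* two rounds per iteration */
            R ^= bf_f(ctx, L) ^ ctx->P[i];
            L ^= bf_f(ctx, R) ^ ctx->P[i + 1];
        }
        uint32_t tmp = R;                     /* final swap and P[17] */
        R = L;
        L = tmp ^ ctx->P[BF_N + 1];
        *ptr++ = L;
        *ptr++ = R;
    } while (ptr < end);
    return L;
}
```

The two additions inside bf_f are the ones the listing shows as plain ADD on the IALU. As we understand the Epiphany architecture, the core can issue an FPU instruction alongside an IALU one, so moving these adds to IADD (with the FPU in integer mode) could let them overlap with the IALU shifts and masks; that is presumably the "room for some optimization" mentioned above, not a measured result.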
--
===========================================================
Yaniv Sapir
Adapteva Inc.
1666 Massachusetts Ave, Suite 14
Lexington, MA 02420
Phone: (781)-328-0513 (x104)
Email: yaniv@...pteva.com
Web: www.adapteva.com
============================================================
CONFIDENTIALITY NOTICE: This e-mail may contain information
that is confidential and proprietary to Adapteva, and Adapteva hereby
designates the information in this e-mail as confidential. The information
is intended only for the use of the individual or entity named above. If you
are not the intended recipient, you are hereby notified that any disclosure,
copying, distribution or use of any of the information contained in this
transmission is strictly prohibited and that you should immediately destroy
this e-mail and its contents and notify Adapteva.
==============================================================