Message-ID: <20130729233800.GA12651@openwall.com>
Date: Tue, 30 Jul 2013 03:38:00 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Katja, Yaniv -

On Tue, Jul 30, 2013 at 12:44:09AM +0200, Katja Malvoni wrote:
> I moved to separate assembly file and my code from yesterday worked. I
> implemented whole BF_encrypt2() in assembly.
> There are no enough registers to preload both P arrays so I'm preloading
> only one.

How is that - not enough registers to preload both P arrays?  We got 64
registers and little demand for them other than for the two P's (need 36
for them).

> Speed is 1175 c/s.

Good, but it should be 1200 c/s with both P's preloaded. ;-)

> Code is in https://github.com/kmalvoni/JohnTheRipper/tree/master
>
> I left two rounds in one macro - since there needs to be 4 cycles between
> ALU following FPU instruction to have dual issue, with only one round it's
> not possible to have shift right by 22 on ALU and FPU for both instances
> and to use iadd at the end. With two rounds in one macro one shift right by
> 22 and one add are not parallelised for one macro.

I find the above description a bit confusing, but I understand the
general issue.  OK.

Have you tried replacing the right shift by 22 followed by AND with
right shift by 24 followed by IMUL?  (AND is non-free, whereas IMUL is
potentially free.)

> When I used r2 and r3 for L0 and R0, before preloading P array, speed was
> 1083 c/s. But when I changed to r48 and r56 it became 1136 c/s. I guess
> it's because with r2 and r3 some instructions were 16-bit.

That's puzzling.  Yaniv - is dual-issue (or something else) hampered by
having some 16-bit instructions inter-mixed with 32-bit ones?

Thanks,

Alexander
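
The shift replacement suggested above can be sanity-checked in plain C. This is only a sketch under stated assumptions: it assumes the AND mask is 0x3FC (so that the result is a byte offset into a 256-entry table of 32-bit words), and the function names are made up for illustration. It says nothing about instruction timing on the actual hardware; it only checks that the two forms compute the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Form 1 (assumed current code): shift right by 22, then AND with 0x3FC
 * to clear the two low bits.  Result: 4 * (top byte of x). */
static uint32_t offset_shift_and(uint32_t x)
{
    return (x >> 22) & 0x3FCu;
}

/* Form 2 (suggested): shift right by 24, then multiply by 4. */
static uint32_t offset_shift_mul(uint32_t x)
{
    return (x >> 24) * 4u;
}

/* Compare both forms over a pseudo-random sample of inputs and
 * return how many inputs gave different results. */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    uint32_t x = 0x243F6A88u; /* arbitrary starting value */
    int i;

    for (i = 0; i < 1000000; i++) {
        if (offset_shift_and(x) != offset_shift_mul(x))
            bad++;
        x = x * 1664525u + 1013904223u; /* simple LCG step */
    }
    return bad;
}

/* Spot-check the extreme top-byte values with assertions. */
static void check_edges(void)
{
    assert(offset_shift_and(0x00FFFFFFu) == offset_shift_mul(0x00FFFFFFu));
    assert(offset_shift_and(0xFFFFFFFFu) == offset_shift_mul(0xFFFFFFFFu));
}
```

Calling check_edges() and then count_mismatches() from a test program should report no differences, since both forms keep only bits 24..31 of x and scale them by 4.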