Message-ID: <20130730163429.GA18902@openwall.com>
Date: Tue, 30 Jul 2013 20:34:29 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Katja,

On Tue, Jul 30, 2013 at 06:19:57PM +0400, Solar Designer wrote:
> Here's another idea: replace the AND, not the right shift. You can
> replace one AND with two IMULs - e.g., to extract the byte at bit offset
> 16, you can IMUL by 0x100, then right shift by 24, then IMUL by 4 (to
> get the 8 data bits into bit offsets 2 to 9 as we need for a load). Can
> you have both IMULs for free with 2x interleave, or would you have to go
> for 3x? In the latter case, you wouldn't be able to preload one of
> three P arrays, which would defeat the purpose of this new trick for one
> of two byte extracts - but we'd nevertheless potentially save a cycle on
> the other byte extract.

In terms of register usage for the constants, at first it feels like
you'd need two more, for 0x100 and 0x10000. However, instead of 0x100
with IMUL you may reuse the existing 0xff constant (which you need for
the AND of byte 0) with IMADD. And if you replace all of the AND 0x3fc's
with this new approach, you would no longer need a register with the
0x3fc constant. Thus, this approach potentially needs no extra registers
for the constants.

Alexander