Message-ID: <20130730141957.GA17257@openwall.com>
Date: Tue, 30 Jul 2013 18:19:57 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Katja -

On Thu, Jul 25, 2013 at 08:06:48AM +0400, Solar Designer wrote:
> Maybe you'll come up with another clever/crazy idea on how to do right
> shifts with Epiphany's FPU instructions (like I mentioned, replacing one
> right shift with multiple FPU instructions is OK).

Here's another idea: replace the AND, not the right shift.  You can
replace one AND with two IMULs - e.g., to extract the byte at bit
offset 16, you can IMUL by 0x100, then right shift by 24, then IMUL by 4
(to get the 8 data bits into bit offsets 2 to 9 as we need for a load).

Can you have both IMULs for free with 2x interleave, or would you have
to go for 3x?  In the latter case, you wouldn't be able to preload one
of three P arrays, which would defeat the purpose of this new trick for
one of two byte extracts - but we'd nevertheless potentially save a
cycle on the other byte extract.

I think you can try using this trick with 2x interleave - perhaps it's
usable in some places, but maybe not in all (two IMULs means needing an
8-cycle gap between where the input became available and where you use
the result).

Alexander
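
For reference, the arithmetic of the AND-free byte extract described in the message can be sketched in plain C. This only illustrates why the two multiplications give the same result as shift-and-mask; it says nothing about Epiphany instruction scheduling or cycle counts. The function names are hypothetical, and the 32-bit wraparound of the multiply is modeled here with C's unsigned arithmetic, which is an assumption about how the hardware multiply behaves.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional form: shift the byte at bit offset 16 down, mask it with AND,
 * then scale by 4 so the 8 data bits sit at bit offsets 2 to 9. */
static uint32_t index_and(uint32_t x)
{
    return ((x >> 16) & 0xffu) << 2;
}

/* Form from the message: two multiplies and one right shift, no AND. */
static uint32_t index_imul(uint32_t x)
{
    /* Multiply by 0x100 modulo 2^32: bits 24..31 fall off the top and
     * bits 16..23 move up into bits 24..31. */
    uint32_t t = x * 0x100u;

    /* Logical right shift by 24 leaves just the wanted byte in bits 0..7. */
    t >>= 24;

    /* Multiply by 4 moves the byte into bit offsets 2 to 9. */
    return t * 4u;
}

/* Hypothetical helper: checks that both forms agree for one input. */
static void check_equivalence(uint32_t x)
{
    assert(index_and(x) == index_imul(x));
}
```

For example, with the input 0x12345678 the byte at bit offset 16 is 0x34, so both functions return 0x34 * 4 = 0xD0.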