|
Message-ID: <20110607010659.GA17482@openwall.com> Date: Tue, 7 Jun 2011 05:06:59 +0400 From: Solar Designer <solar@...nwall.com> To: crypt-dev@...ts.openwall.com Subject: Re: EskBlowFish with RAM results Yuri - On Mon, Jun 06, 2011 at 07:24:57PM -0300, Yuri Gonzaga wrote: > Yes. It is going attached right now. Even though I don't really know Verilog, I took a look at the code, thanks. It appears that you spend 5 clock cycles per Blowfish round: 1. L xor P[n] 2. read two S-box elements 3. process them, read two more S-box elements 4. process them 5. swap L and R, move to next round I think you could reduce this to three (while still using two read ports to BlockRAM only): 1. L xor P[n], read two S-box elements 2. process them, read two more S-box elements 3. process them, swap L and R, move to next round or to 4 per two rounds: 1. L xor P[n], read two S-box elements 2. process them, read two more S-box elements 3. process them, R xor P[n], read two S-box elements 4. process them, read two more S-box elements, move to next round I understand that you probably didn't optimize for speed yet, and maybe I am missing something. If it turns out that LUTs and not BlockRAMs are the scarce resource (that is, if we fail to reduce the LUT count needed substantially), then it'd make sense to waste half the BlockRAM space in order to use four read ports per EksBlowfish core. Then you should be able to do two rounds per two cycles. In fact, I am considering a larger EksBlowfish-like component, which would fully use the BlockRAMs in one clock cycle - simply make the S-boxes 9-to-36 (instead of Blowfish's 8-to-32). So if you test the above approach for the original EksBlowfish (which is directly useful e.g. for JtR), we may reuse it for ours as well (for our new hashing method). > I don't know. Maybe, next experimentations could answer this question. > I will try to reduce, but I don't know if I can achieve a factor of 10. The first thing to do is exclude all code that is not in the performance critical loop, as discussed before. > I think it is 4 because I am storing initial constants in a ROM and the > synthesizer implements this ROM with RAM. Yes, this must be the case. Thanks, Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.