|
Message-ID: <20110511190714.GA11467@openwall.com> Date: Wed, 11 May 2011 23:07:14 +0400 From: Solar Designer <solar@...nwall.com> To: crypt-dev@...ts.openwall.com Subject: Re: alternative approach Hi Yuri, Is my understanding of your reply correct that you agree it is a good idea to use the "alternative approach", but you propose that when doing so we limit ourselves to "well-known and acceptable hashes/ciphers" as our building blocks? On Wed, May 11, 2011 at 11:22:38AM -0300, Yuri Gonzaga wrote: > It is complicated to proposed a new method that doesn't use a combination of > well-known and acceptable hashes/ciphers because of the lack of > certification and acceptance since it take some time to happens, only after > cientific and comercial communities appreciation. > Thus, it is better to our project to use validated hashes/ciphers in order > to achieve acceptance. Right. However, most modern cryptographic primitives were designed for efficient software implementation (as well as hardware implementation). So I'm afraid there isn't a truly software-unfriendly modern hash/cipher that has received much scrutiny and government certification. Blowfish (and perhaps Twofish) might be the closest we can get - it has received a lot of scrutiny, but no government certification. It is friendly to 32-bit CPUs, but not to typical SIMD architectures (which lack vectorized array lookups, for good reasons). Thus, we may choose to use a component based on Blowfish (such as Eksblowfish or the full bcrypt) or even one based on our own Blowfish variation (more unfriendly to software implementation and requiring less logic in hardware), but include proof that cryptographic security of our entire KDF is not dependent on that of this component. That is, even if a gaping cryptographic weakness is found in this component, the worst impact would be faster testing of a candidate key in a brute-force attack - partially or fully bypassing computation of this "non-crypto" component. Of course, we wouldn't expect such bypass to be actually practical, which is why we'd include this component. (We may use a similar approach to gain acceptance for a variation of scrypt, formally relying on SHA-256 and PBKDF2 rather than on its Salsa20/8 component.) > Also, the time of GSoC will be better spent if we > focus in implement these blocks instead of proposing a new method. Maybe, or maybe not. We definitely need to have something usable and useful by the end of summer. We've already decided to include an FPGA implementation and an optional FPGA-based component in our KDF. No matter what we choose, if we implement this at all and it works, it will be "usable". As to it being "useful", it will need to be somehow better than what exists now, for some uses and in some aspects. I see two ways to make it useful in this sense: 1. Have our KDF, when running on one CPU plus one FPGA, be significantly more resistant against offline brute-force attacks by just CPUs (no FPGAs) and/or by GPUs, compared to an existing KDF implemented on a CPU only. By "significantly" I mean something like a 10x difference in attack speed (assuming fixed $ cost of the attacker's equipment), and preferably more than 10x. 2. Include and use a host-unreadable local parameter on the FPGA board. For this, we need to figure out if Pico Computing's boards are readily usable to achieve this task or not... If not yet, then maybe we won't have this by the end of summer (although it'd be a pity). If we're 100% confident that we'd achieve one of these by the end of summer, then the other is optional (for the summer). However, at least one of these is a must to achieve by the end of the summer. Now, while we could wish FPGAs were some magic high-performance beasts, definitely much faster than CPUs when properly programmed, they are not. If we pick a software-friendly cryptographic primitive, it could turn out that our optimized implementation of it on a typical server CPU achieves comparable performance to our optimized implementation on our FPGA board (one we could offer for installation into servers). Here are some performance numbers for DES, which was originally designed for hardware implementation, but then had efficient software implementations developed for it: http://www.sump.org/projects/password/ The performance number given for "AMD64 3000+" is incorrect, please disregard it. See the comments (some of them posted by me) for correct performance numbers for CPU implementations. This shows that XC3S1000-4 achieves a speed similar to a quad-core CPU of 2008. Core i7-2600 (this year's quad-core CPU) is twice faster - it does over 20M of DES-based crypt(3) keys per second with my current code. Thus, apparently XC3S5000-4, which is reported to theoretically achieve 45 MKeys/sec in a table on that web page, is twice faster than a modern quad-core CPU. However, for our project a mere 2x performance gain of FPGA vs. CPU is not good enough to achieve the #1 goal mentioned above. (I don't know how FPGAs on Pico Computing's boards meant for production use in servers compare to a XC3S5000-4. I'd appreciate info on that.) Overall, the above makes me think that we may in fact need to design and implement a software-unfriendly component. Definitely more than DES (which can be made to use arbitrarily wide SIMD vectors in software with a bitslice implementation), and perhaps more than Blowfish. With original Blowfish, we might achieve a 4x "boost" in software unfriendliness vs. DES (since it only uses 32-bit operations, whereas modern CPUs can do 128-bit almost as fast), which might give us a total "FPGA boost" vs. CPUs of 8x (relying on speeds for XC3S5000-4 vs. Core i7-2600 mentioned above). However, I'd prefer something better than 8x (although 8x is already reasonable for our project), and I am not certain of it (full/original Blowfish probably consumes more FPGA logic than DES does, which I did not take into account in these calculations, so we'd actually fit fewer Blowfish cores in our FPGA chip; also, there are differences in pipelining ability, which is nearly perfect for DES). > So, my suggestion is that you have to define 3 components: one for CPU, > other for GPU and another for FPGA. Maybe. I suggest that we omit a GPU-specific component for now, for a variety of reasons. > The result of each component will be built in a final result. For example, > do a XOR operation using 3 results Yes, however XOR only works well when the components' outputs are of the same size and all are "random enough" - otherwise the combined output might reveal some bits of individual outputs, which would allow for faster brute-force attacks (on those revealed outputs separately). Thus, we'd want to feed the components' outputs into a generic KDF (relatively fast). We may even include the original password and salt along with those component outputs. This gives us proof that the security of our entire construct depends on that final KDF only. > or use one output as input in the next component. This reduces parallelism for actual use while not affecting parallelism available for attacks. Not good. > Finally, as you are the expert in this area, I think you should establish > these details so I can go on with the project implementation. Yes, I intend to define the project more specifically, but I am trying to get you involved in the design decisions and to let you learn new things. Also, some design decisions depend on what they'd mean for implementation. So I might ask you to try implementing different things before we decide on which of those to use. Thanks, Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.