Message-ID: <20110405025216.GA16602@openwall.com>
Date: Tue, 5 Apr 2011 06:52:16 +0400
From: Solar Designer <solar@...nwall.com>
To: crypt-dev@...ts.openwall.com
Subject: project rationale

Hi,

I finally got around to describing the project in more detail.  I'll
over-quote a little bit for people who have joined after I posted the
below, then add new content below the quote:

On Mon, Mar 28, 2011 at 10:12:07PM +0400, Solar Designer wrote:
> This is our new mailing list for discussing the project described as a
> GSoC 2011 "idea" as follows:
>
> http://openwall.info/wiki/ideas
>
> # New password hashing method for servers (full scope exceeds one summer,
> so the task may be split in two or three)
>
> * New crypt(3) flavor using concepts of scrypt (not only iterations,
> but also parallelism and memory), optionally making use of AES-NI
> * GPU or/and FPGA accelerated password hashing on servers (to better
> compete with similarly accelerated or distributed offline password hash
> cracking), optional local parameterization on specific hardware
> (parameter unreadable from host OS)
>
> I suggest that we start by discussing "high-level" stuff (such as the
> "context" for this project: what already exists and why there's room for
> something new), then go into detail.

What we have:

Purposefully slow password hashing methods with configurable iteration
counts (which are stored along with the hashes).  As far as I'm aware,
this was first introduced with BSDI's extended DES-based crypt(3)
hashes, and on modern systems we use bcrypt and SHA-crypt instead.

We also have scrypt, which defines a key derivation function, but not a
crypt(3) salt and return value string syntax yet.  The scrypt KDF has
some advantages over bcrypt: it can be configured to use large amounts
of memory and/or to contain a lot of parallelism (for just one instance).
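[To make the stored-iteration-count convention concrete, here is an
illustrative Python sketch - not the proposed crypt(3) scheme, and
PBKDF2 is only a stand-in - that encodes the cost parameter and salt
alongside the hash, in the spirit of the "$id$rounds=...$salt$hash"
crypt(3) syntax:]

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 100_000) -> str:
    # The salt and the iteration count are stored along with the hash,
    # so the count can be raised for newly set/changed passwords without
    # breaking verification of older hashes.
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"{iterations}${salt.hex()}${dk.hex()}"

def verify_password(password: str, stored: str) -> bool:
    # Re-derive using the parameters embedded in the stored string.
    iterations, salt_hex, dk_hex = stored.split("$")
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(),
                             bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(dk.hex(), dk_hex)
```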
Why is parallelism in a password hashing method desirable:

Offline password cracking (against leaked or stolen hashes) has a lot of
potential for parallelism: different candidate passwords may be hashed
in parallel.  Thus, an attacker is generally able to utilize the
parallel processing capabilities of their hardware.  (There are some
exceptions to this - e.g., Blowfish's S-box lookups typically can't be
efficiently implemented on SIMD architectures - but an attacker can deal
with this by picking suitable hardware, such as an FPGA instead of a GPU.)

However, a server doing authentication might only have one instance of a
user's password to check at a given time.  If the hashing method lacks
parallelism, only a subset of the server CPUs' execution units will be
usable for the task.  The administrator has to set the iteration count
(for newly set/changed passwords) such that password hashes are computed
in reasonable time (say, in 10 ms).  If only 1/10th of the execution
units of a given CPU type are usable to hash a single password (because
of data dependencies), this means that an attacker can try candidate
passwords against hashes of this type on a comparable CPU 10 times
faster (by exploiting the extra parallelism available due to having
multiple candidate passwords).

(In practice, for bcrypt the speedup available only to password cracking
is currently around 80% on Intel Core 2'ish CPUs in 64-bit mode.  That's
for just one CPU core and without making use of SIMD extensions, because
Blowfish is not well suited for those, whereas an FPGA or ASIC
implementation "comparable in size" to the server's CPU could achieve a
lot of further speedup.)

By including a lot of inherent parallelism in one instance of a password
hash computation, we make this parallelism available to both password
validation on the server and password cracking.
Thus, we let the server administrator configure the parameters such that
hashes are still computed in reasonable time (say, the same 10 ms), but
this process makes more complete use of the server's resources.  (In
simple cases, this would not require additional configuration beyond
setting a higher iteration count, where each iteration would
automatically make use of a larger number of CPU execution units and/or
wider SIMD vectors to run faster.)  Then the attacker will be able to
check fewer candidate passwords in parallel (or will have to throw more
resources at the attack).

Why it may be desirable to use more memory:

Similarly to a hashing method that lacks inherent parallelism, one that
only needs a little bit of memory is not making use of all the server
resources that it reasonably could.  This allows for smaller and cheaper
specialized implementations (that don't include extra memory), thereby
letting an attacker build and run a larger number of those in parallel
(in an offline attack).  By allowing the administrator to configure
password hashes (for newly set/changed passwords) to require a
substantial amount of memory for their computation, we increase the cost
of each instance of a specialized implementation that an attacker would
use, thereby decreasing the number of those that will run in parallel.

Why not just use scrypt:

Actually, we might build upon scrypt.  This needs further research -
scalability to wider SIMD vectors, GPU implementations (to make use of
GPUs in servers doing authentication), FPGA implementations.  We could
see whether scrypt is usable as-is, whether it needs changes, or whether
we need something different/new.

Then, one common complaint regarding bcrypt was that it relied on a
cipher (Blowfish) that was not NIST-approved.  In fact, this was cited
as the primary reason to create SHA-crypt.
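[For reference, Python's hashlib already exposes the scrypt KDF, and its
cost parameters map directly onto the memory argument above: peak memory
is roughly 128 * r * N bytes per instance, so the administrator-chosen
parameters translate into a memory cost that an attacker's specialized
hardware must also pay.  A small sketch - the parameter values here are
illustrative, not recommendations:]

```python
import hashlib

def scrypt_hash(password: bytes, salt: bytes,
                n: int = 2**14, r: int = 8, p: int = 1) -> bytes:
    # Peak memory is roughly 128 * r * n bytes: with n=2**14 and r=8,
    # about 16 MiB per hash computation.  p adds parallelism without
    # raising the per-instance memory requirement (if run sequentially).
    return hashlib.scrypt(password, salt=salt, n=n, r=r, p=p,
                          maxmem=64 * 1024 * 1024, dklen=32)
```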
scrypt uses Salsa20, which is likely to result in similar difficulties
in getting it accepted, even though IIRC it also uses SHA-256, which we
can refer to.  So I think it could be better to design a similar KDF
based on a SHA-2 hash or on AES, although this is not trivial to do when
we want to have near-optimal implementations for very different machine
word and SIMD vector widths.

Local parameterization:

Offline attacks on stolen/leaked hashes may be defeated when hash
computation involves a local parameter (a secret value with high
guessing entropy, such that it is infeasible to brute-force it) and this
parameter is not stolen/leaked.  With a custom FPGA board for use in
servers, we may have a little bit of storage unreadable from the host
system, yet usable by the FPGA logic.  A company could program the board
with a certain local parameter value (and store a backup copy of this
value elsewhere, e.g. printed on paper and locked in a safe), then put
the board into their authentication server.

If the server is ever compromised over a network, the intruder would
only be able to capture plaintext passwords of users authenticating
until the compromise is detected and dealt with.  The intruder won't be
able to mount a successful offline attack on all of the hashes, which
would otherwise result in compromise of a much larger percentage of
accounts, longer recovery from the compromise, and/or in increased
burden on customer support (for having a larger number of accounts
locked and passwords changed in an emergency).

I think the above should be enough to get us started.  I'd appreciate
any comments and questions.

Thanks,

Alexander
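[As an appendix to the local-parameterization idea: a minimal
software-only sketch of mixing a secret local parameter into the hash.
In the actual design the value of LOCAL_PARAM would live inside the FPGA
board, unreadable from the host OS, rather than in host memory as here;
the function names and the HMAC-then-PBKDF2 construction are purely
illustrative:]

```python
import hashlib
import hmac
import os

# Stand-in for the secret stored on dedicated hardware; high guessing
# entropy, so it cannot be brute-forced if only hashes leak.
LOCAL_PARAM = os.urandom(32)

def peppered_hash(password: bytes, salt: bytes,
                  iterations: int = 20_000) -> bytes:
    # Mix the local parameter in via HMAC first; without this secret,
    # stolen salts and hashes alone do not permit an offline attack.
    keyed = hmac.new(LOCAL_PARAM, salt + password, hashlib.sha256).digest()
    return hashlib.pbkdf2_hmac("sha256", keyed, salt, iterations)
```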