|
Message-ID: <20201017211118.GA10877@openwall.com> Date: Sat, 17 Oct 2020 23:11:19 +0200 From: Solar Designer <solar@...nwall.com> To: john-users@...ts.openwall.com Subject: Re: how bitcoin2john works Hi Jack, On Sat, Oct 10, 2020 at 03:45:06PM +0000, Jack Anderson wrote: > I want to know how the output hash is reached by file analysis and calculation. How the wallet's password protection works: The wallet is a Berkeley database containing, among other things, the master key. When you have a password set, the master key is encrypted with a symmetric key derived from your password. In order to slow down password cracking, the symmetric key derivation method is deliberately computationally expensive, as specified with parameters also stored in the wallet. In order to prevent amortizing the cost of attacks on multiple wallets at once (have the cost proportional to the number of wallets, rather than only to the number of candidate passwords tested) and to prevent precomputation (require that the attack can only be started once the attacker has a copy of the wallet), a random input (so-called salt) to the key derivation method is used and is also stored in the wallet. How bitcoin2john works: bitcoin2john uses the Berkeley database library, accessing it via a Python module, to parse the wallet.dat file. It extracts the fields mentioned above (encrypted master key, salt, key derivation method and its parameters), makes sure they appear to be supported by our code, and if so encodes the needed information into the "hash" it outputs. Specifically, the "hash" output by current bitcoin2john includes the last two AES blocks of the encrypted master key, the salt, and the iteration count to the only currently supported key derivation method (one based on a large number of uses of SHA-512). Older versions of bitcoin2john included more data in the output "hash" (the full encrypted master key and two more fields), but we've since enhanced bitcoin2john not to include that extra data in order to make the output "hashes" less security-sensitive (enough for cracking the password, but not for much else). How password cracking works given the "hash": For each candidate password, we derive the corresponding symmetric key. We then use that symmetric key to decrypt the last AES block of the master key. We can do that without having to decrypt prior blocks (nor to have more than two) due to how CBC mode works. We then check the PKCS#7 padding on the decrypted block. Due to how supported master key sizes map onto whole numbers of AES blocks, the last AES block contains either 16 (is padding-only) or 8 bytes of padding. Either way, correct padding is an almost sure indicator that the password was correct. In order to utilize modern hardware's parallel processing capabilities, we buffer candidate passwords and test them (as above) in large batches. Terminology issue: You might have noticed that I put "hash" in quotes. That's because this isn't actually a hash, and we crack its password differently from how we crack passwords against hashes. In this case, we attempt decryption and check whether the decrypted data is correct-looking. For actual hashes, we attempt hashing and check whether the computed hashes match the hash being cracked. With JtR supporting so many non-hashes like this lately, maybe we need to adjust our wording to no longer refer to JtR inputs as hashes. Maybe call them targets, e.g. as in "Loaded X targets with Y different salts"? Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.