|
Message-ID: <541e07c58ced7b4c4af5328f03e123a0@smtp.hushmail.com> Date: Sun, 23 Feb 2014 23:55:30 +0100 From: magnum <john.magnum@...hmail.com> To: Jeff Keller <jakeller@...r.com>, john-users@...ts.openwall.com Subject: False positives essay :) This ended up long and might be of public interest so I cc the list: On 2014-02-23 22:40, Jeff Keller wrote: > Out of curiosity, how does it come up with false positives? > Isn’t there only *one* password that would work? It sure is, but in this case we have a hard time detecting when we got the right one. For login passwords it's simple: The hash is made my feeding the password through a one-way function F, that in it's crudest form is just a single SHA1, for example. So hash = SHA1("correct password") When we brute force, we know the hash but not the password. So we just do lots of hash2 = SHA1("password candidate") and compare our hash2 with the correct hash. When they match, we have found the correct password. Very simple, no false positives, no false negatives. Now, for disk or file encryption we don't have a hash! The DMG image was basically made like this when created: key = F("correct password") encrypted_data = AES(key, plain_data) We obviously do not know the key. We do have an encrypted block (eg. first or last block from filesystem, not a block from any particular file) from the DMG. The encryption is symmetrical so you decrypt using the same key. When we brute force, we do key = F("password candidate") decrypted_block = AES(key, encrypted_block) The thing is that if we use the wrong key [ie. derived from the wrong password], AES will NOT emit any "error" but a perfectly fine output block of random garbage. Herein lies the problem: How do we know the decrypted_block is correct or not? We don't have any correct data to compare with! So we are forced to check for "known plaintext" within the decrypted block. All false positives in this case came from a test for the string "Apple" within the decrypted block. While the chance for that string appearing in a specific spot in random data would be about one in a thousand billions, the chance for it appearing *anywhere* within a block of 8 KB is much greater. I'm not sure but I think it's in the order of one in 134 millions. We got 5-6 false positives in ~20 hours at ~5000 c/s which might indicate it's slightly more probable so I'm not sure about the exact figures. All other known-plain tests in the format are at least 8 bytes and that wont give any false positive in 20,000 years or so. The problem is we are not 100% sure we can do without the "Apple" test. We think we can, but we are not sure. If we drop that test and your decrypted block really has the "Apple" string but none of the other strings we test for, we'd end up with a false NEGATIVE instead, which as you can imagine is much, much worse. Shrug. magnum
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.