john-dev - Re: Implementing compatibility with the 3PC protocol

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 5 Sep 2023 21:22:01 +0200
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Implementing compatibility with the 3PC protocol

Hello Håvard,

Please see inline:

On Tue, Sep 05, 2023 at 09:35:09AM +0200, Håvard Heimli wrote:
> Let me preface this by saying I'm not sure whether this is the correct channel of communication, but I'm giving it a shot.

It's fine to have this in here.

> I am currently a student at the University of Oslo, working on my master's thesis focused on a novel protocol for privacy-preserving password cracking. The full article is available at Arxiv under the name: «Privacy-Preserving Password Cracking: How a Third Party Can Crack Our Password Hash Without Learning the Hash Value or the Cleartext»

This is https://arxiv.org/abs/2306.08740

> To ensure John's compatibility with this protocol, it is necessary to represent the "hash-to-be-cracked" as a vector. Instead of inputting a file containing various distinct hashes, the idea is to input a single vector. This vector should encompass multiple hashes that need to be cracked simultaneously.
> The vector itself is twice the length of the hash, with each pair of values within the vector corresponding to a digit in the hash. For instance, a vector like "34ABFF" would describe the hashes 3AF, 3BF, 4AF, and 4BF.

As I understand, each pair of digits define a min/max range for the
corresponding digit of the hash value.  This is in "Definition III.1
(Predicate Function)."

> As far as I'm aware the only way of doing this per now involves providing a file with all the hashes described by the vector as input.

Right.

> However, this approach becomes less scalable when the vector encompasses millions of hashes.

It can still work for millions of hashes, but it becomes infeasible for
one of the examples in the paper, which suggests using 10^70 hashes.

> My question to you is whether integrating this function into existing systems would be viable without significant modifications. While I lack experience with open-source projects, I do have a background in C programming. Any insights, tips, or advice you could provide would be greatly appreciated.

In general, no, this is not viable, because we generally do not deal
with full ASCII-encoded hashes during cracking (instead, decoding them
at loading time and filling out data structures such as bitmaps and hash
tables) and because the more direct comparison functions are per-format.

You can manage to efficiently integrate this into individual formats for
a proof-of-concept or a specialized use case, but not easily integrate
it into JtR as a whole.

For JtR as a whole, you could expand the vectors into the individual
hashes at loading time, but that would still be infeasible for cases
when the number of decoy hashes is very large (more than mere millions).

Overall, this isn't a good fit for our codebase and is conflicting with
our optimizations.

That said, this could be useful not only for privacy-preserving
cracking, but also in obscure cases where only partial hashes are
available or there's uncertainty about some portions.  As I heard, the
closed source MDXfind cracker is good at this, and could possibly be
easier to add support for such range vectors to if it were not closed
source or if its author chooses to do that.

Alexander
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.