john-dev - Re: Re: Aleksey's status report #10

Message-ID: <20120712114329.GA15020@debian>
Date: Thu, 12 Jul 2012 15:43:29 +0400
From: Aleksey Cherepanov <aleksey.4erepanov@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Re: Aleksey's status report #10

On Thu, Jul 12, 2012 at 01:16:16PM +0200, Frank Dittrich wrote:
> On 07/12/2012 12:53 PM, Aleksey Cherepanov wrote:
> > The opposite problem: how to find identical files? In general, before
> > adding a file we should compare it with all existing files in the
> > store. I think I will speed it up with an index file (consisting of
> > checksums) in the store (I will not push that file, to avoid
> > conflicts). Though there is a race condition: two users could add the
> > same file in parallel, so we could get two equal files. But it does
> > not really matter.
> 
> I am not a git expert, but:

I am not a git expert either.

> Can't you define a pre-commit hook which computes the sha1sum of a
> file and commits the file under this name instead of the name given by
> the user, then adds a line to your index file in a post-commit hook?
> Can't possible conflicts be resolved automatically if you keep that
> index file sorted?

It does not seem that conflicts could be resolved by sorting. We
touched on that when we talked about committing additions to a single
.pot file. git is not good at tracking an unordered file that only
grows.

Though if we store files named by checksums we do not need an index
file at all.
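
To illustrate why no index is needed, here is a minimal sketch of such
a store in Python. The names sha1_of_file and add_to_store and the
store layout (one flat directory) are assumptions made for this
example, not existing code. Checking for a duplicate becomes a single
lookup by name:

```python
import hashlib
import shutil
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file's content, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_store(src, store_dir):
    """Copy src into store_dir under its SHA-1 name.

    Returns (name, added); added is False if a file with that
    checksum was already in the store.
    """
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    name = sha1_of_file(src)
    target = store / name
    if target.exists():
        return name, False
    shutil.copyfile(src, target)
    return name, True
```

If two users add the same file in parallel, both end up with the same
file name, so the race no longer produces two copies.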

> But that would possibly require to rewrite attack descriptions in a
> similar way, so that they use the checksum instead of the user-supplied
> file name.
> And when you checkout the files, they could be renamed to the
> user-specified file name again.

I think renaming is not needed. We could just store two names: the
original name will be used in the properties of the attack, and the
checksum will be used to refer to the real file when the user runs
the attack.
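
A rough sketch of keeping both names side by side, in Python. The
StoredFile class, its fields and the describe helper are hypothetical
names chosen for illustration only:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StoredFile:
    original_name: str  # shown in the properties of the attack
    sha1: str           # name of the real file inside the store

    def path_in(self, store_dir):
        """Path of the real file, used when the user runs the attack."""
        return Path(store_dir) / self.sha1


def describe(ref):
    """Human-readable label: original name plus a short checksum prefix."""
    return f"{ref.original_name} ({ref.sha1[:8]})"
```

The user only ever sees original_name; the checksum is used when the
file has to be opened.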

> Am I missing something? Can this work? Does it make sense at all? How
> hard would it be to implement it in a way that works flawlessly?

I'd say that there could be two different files with the same sha1
checksum, but it does not seem very probable. After all, git itself
names its objects by a sha1 derived from their content. (Though it
could be useful (for my paranoid nerves) to compare files byte by
byte and yell if we have different files with the same sha1sum.)
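
The paranoid check could look roughly like this in Python, using
filecmp from the standard library. Sha1CollisionError and
verify_same_content are names made up for this sketch; it assumes
the caller has already found a stored file with the same sha1 name:

```python
import filecmp


class Sha1CollisionError(Exception):
    """Two files with the same sha1 name differ in content."""


def verify_same_content(new_file, stored_file):
    """Compare two files byte by byte and yell if they differ.

    shallow=False makes filecmp compare the actual content
    instead of trusting size and modification time alone.
    """
    if not filecmp.cmp(new_file, stored_file, shallow=False):
        raise Sha1CollisionError(
            f"{new_file} and {stored_file} have the same sha1 "
            "but different content"
        )
```

This costs one extra read of the file, and only when a file with that
checksum is already in the store.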

I guess keeping the original file names in the store only makes sense
if we want to inspect the store manually (for example, for debugging
during the contest), because original file names would be easier to
understand.

So original file names are a bit more convenient for investigation,
but they are hard to implement and/or slow. I'll store files renamed
to their sha1.

Thanks!

Regards,
Aleksey Cherepanov