john-dev - Re: Office >=2007 formats

Follow @Openwall on Twitter for new release announcements and other news

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <0589e07418809efed460b9279f0ba764@smtp.hushmail.com>
Date: Wed, 19 Sep 2012 01:59:09 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Office >=2007 formats

On 18 Sep, 2012, at 21:42 , magnum <john.magnum@...hmail.com> wrote:
> Office2007-opencl and office2010-opencl are now committed to magnum-jumbo. Both exceed 2 billion sha1/s on 7970. Both now use split kernels, so no single kernel duration should exceed 200 ms (by far) - I hope there will be no more ASIC hangs. Office2013-opencl is next.

The "exceed (by far)" was poor wording, I meant that durations are much shorter than 200 ms.

By the way, I have a slight problem with keeping the tally between calls. I use three kernels:

1. GenerateSHA1pwhash(). Initial hash from salt.password, output to global memory pwhash array (which is not mapped by host, it's just for inter-kernel communication).

2. Hash1k() does 1024 iterations of SHA-1. This is called eg. 49 times in a row for Office 2007. Its input *and* output is pwhash.

3. Final kernel, input from pwhash. This does the remaining < 1024 iterations and finish things off with some more code. Output to host buffer.

The pwhash array works perfectly for passing the hash between iterations, no barriers or host calls need to be made. Each thread reads and writes its own share of it. However, the Hash1k() kernel need to know what run it is at, because it inserts a "serial number" in the hash input. So I first added 4 bytes to the pwhash array and did something like this:

In end of GenerateSHA1pwhash():

	if (get_global_id(0) == 0)
		pwhash[get_global_size(0) * 5] = 0;

In start of Hash1k():

	base = pwhash[get_global_size(0) * 5];

In end of Hash1k():

	if (get_global_id(0) == 0)
		pwhash[get_global_size(0) * 5] += 1024;

This worked just fine... except it would intermittently fail. After a lot of debugging I could see that some threads would not be in phase, so they would read 1024 although it was their first call to Hash1k(). I tried the various barriers that OpenCL provides, and/or adding a clFinish() in host code, but nothing helped. So for now, the workaround is that all threads do the above, to their *own* counter element of the array. This works just fine but there's got to be a better way! I could pass it as a kernel argument but I'd like to keep such overhead at a minimum.

Any suggestions appreciated.

magnum

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.