Message-ID: <4F04C136.1070103@hushmail.com>
Date: Wed, 04 Jan 2012 22:14:30 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: SSE/intrinsics for sapB/sapG

On 12/31/2011 11:44 AM, magnum wrote:
> If we had a format that always needed n buffers we could have a GETPOS
> that actually spans n key buffers, and a crypt call (or macro) that do
> all of them. Then I think the fmt.c would not need to handle anything
> specially.

For sapG (intermediate key, up to 248 bytes = 4 limbs) I first tried to 
come up with a GETPOS that would do the job, but this did not work well. 
Here's how I did it instead:

1. Change GETPOS so we don't write past 63 bytes but start over from 0: 
Just change the "(i)&(0xffffffff-3))" to "(i)&60)".

2. Allocate a separate buffer for each 64-byte limb:

	unsigned char saved_key[4][80*4*NBKEYS];

3. This is a sample set_key loop:

	while((temp = *key++) && len < PLAINTEXT_LENGTH) {
		saved_key[len>>6][GETPOS(len, index)] = temp;
		len++;
	}

The [len>>6] places each character in the correct buffer; the rest is
just normal procedure.
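
Steps 1-3 can be put together in a small standalone sketch. Only the
"(i)&60" term, the saved_key declaration and the loop body come from
this post; the rest of the GETPOS macro is an assumption, reconstructed
to be consistent with the length-word indexing in step 4. MMX_COEF = 4,
NBKEYS = 12, MAX_KEY_LEN and store_key() are hypothetical names and
values picked for illustration. The cap of 247 is chosen so that the
key, the 0x80 byte and the 8-byte length still fit in four limbs.

```c
#include <assert.h>
#include <string.h>

#define MMX_COEF    4    /* assumed: four 32-bit lanes per SSE2 vector */
#define NBKEYS      12   /* assumed batch size */
#define MAX_KEY_LEN 247  /* hypothetical cap, see note above */

/* Assumed interleaved layout; only the "& 60" term is from the post.
 * Byte i wraps to offset 0 of the limb every 64 bytes. */
#define GETPOS(i, index) \
	(((index) & 3) * 4 + ((i) & 60) * MMX_COEF + \
	 (3 - ((i) & 3)) + ((index) >> 2) * 80 * MMX_COEF * 4)

/* One interleaved buffer per 64-byte limb */
static unsigned char saved_key[4][80 * 4 * NBKEYS];

/* Copy a NUL-terminated key into the per-limb buffers, return its length */
static int store_key(const char *key, int index)
{
	const unsigned char *p = (const unsigned char *)key;
	unsigned char temp;
	int len = 0;

	while (len < MAX_KEY_LEN && (temp = *p++)) {
		/* len >> 6 selects the limb, GETPOS the slot inside it */
		saved_key[len >> 6][GETPOS(len, index)] = temp;
		len++;
	}
	return len;
}
```

With this layout, byte 64 of a key lands in saved_key[1] at the same
offset that byte 0 has in saved_key[0].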

4. Save the length in the correct place:

	((unsigned int*)saved_key[(len+8)>>6])[15*MMX_COEF + (index&3) +
	    (index>>2)*80*MMX_COEF] = len << 3;

Here, the (len+8)>>6 will place this length word in the right buffer. 
Other than that, just as usual.
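
To see why (len+8)>>6 is the right limb: the final SHA-1 block has to
hold the 0x80 byte at offset len plus the 8-byte length field, so a
55-byte key still fits in limb 0 while a 56-byte key spills the length
into limb 1. A minimal sketch of the indexing follows. The helper names
(length_limb, length_word, store_length) and the key_words array are
hypothetical, MMX_COEF = 4 and NBKEYS = 12 are assumed, and the buffer
is declared as 32-bit words here to avoid the pointer cast.

```c
#include <assert.h>
#include <stdint.h>

#define MMX_COEF 4   /* assumed: four 32-bit lanes per SSE2 vector */
#define NBKEYS   12  /* assumed batch size */

/* Same storage as saved_key[4][80*4*NBKEYS], viewed as 32-bit words */
static uint32_t key_words[4][80 * NBKEYS];

/* Limb that receives the length word for a key of len bytes */
static int length_limb(int len)
{
	return (len + 8) >> 6;
}

/* Word 15 of the block, in lane (index & 3) of vector group (index >> 2) */
static int length_word(int index)
{
	return 15 * MMX_COEF + (index & 3) + (index >> 2) * 80 * MMX_COEF;
}

/* Store the message length in bits, as SHA-1 padding expects */
static void store_length(int len, int index)
{
	key_words[length_limb(len)][length_word(index)] = (uint32_t)len << 3;
}
```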

5. Now, everything is set. There's nothing more to it, except for this 
problem:

> But I guess the real problem is if *some* of the keys are shorter than
> 56 bytes and some of them are longer.

That is: If some - but not all - of the keys in a batch are done, they
will be trashed by the next call to SSESHA1body(). I currently solve
this by crypting to a temporary output buffer. If I know a particular
index is "done", I copy it to the final buffer with a small inline
function (in sapG it's called crypt_done()). I keep track of lengths in
an array, call crypt_done() for the indexes that are done, and (only) if
needed, call another crypt.
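
The control flow can be sketched in scalar code like this. It is not
the sapG code: toy_body() is a made-up stand-in for SSESHA1body() that
just updates every lane unconditionally, the state is one word per key
instead of a full SHA-1 state, and limbs_needed(), crypt_batch(),
tmp_out and crypt_out are hypothetical names. Only the idea (crypt to a
temporary buffer, copy finished indexes out with crypt_done(), crypt
again only if needed) is from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define NBKEYS 12  /* assumed batch size */

static uint32_t tmp_out[NBKEYS];   /* scratch state, overwritten each round */
static uint32_t crypt_out[NBKEYS]; /* final results */

/* Stand-in for the SIMD body: touches every lane, finished or not */
static void toy_body(uint32_t *state, int round)
{
	for (int i = 0; i < NBKEYS; i++)
		state[i] = state[i] * 31u + (uint32_t)round + 1u;
}

/* Copy one finished index from the scratch buffer to the final buffer */
static void crypt_done(int index)
{
	crypt_out[index] = tmp_out[index];
}

/* Number of 64-byte limbs a key of len bytes occupies after padding */
static int limbs_needed(int len)
{
	return ((len + 8) >> 6) + 1;
}

/* Crypt one batch; returns how many body calls were made */
static int crypt_batch(const int *lens)
{
	int max_limbs = 0, rounds = 0;

	for (int i = 0; i < NBKEYS; i++) {
		tmp_out[i] = 0;
		if (limbs_needed(lens[i]) > max_limbs)
			max_limbs = limbs_needed(lens[i]);
	}
	for (int round = 0; round < max_limbs; round++) {
		toy_body(tmp_out, round);
		rounds++;
		/* Save the keys that finished in this round before the
		 * next call overwrites their lanes */
		for (int i = 0; i < NBKEYS; i++)
			if (limbs_needed(lens[i]) == round + 1)
				crypt_done(i);
	}
	return rounds;
}
```

If every length in the batch is below 56, this makes a single body
call; one longer key is enough to trigger a second call for the whole
batch.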

No matter how we improve this, we will always have the problem that if
just one key out of NBKEYS (which is typically 12) needs another crypt,
we will get what could be viewed as a "12x slowdown" instead of the
"1/12x slowdown" that would happen with 1x code. But for sapG, this
does not seem to be much of a problem - nearly all keys need two crypts.

magnum
