Message-ID: <20150529075609.GB25177@openwall.com>
Date: Fri, 29 May 2015 10:56:09 +0300
From: Solar Designer <solar@...nwall.com>
To: Alain Espinosa <alainesp@...ta.cu>
Cc: john-dev@...ts.openwall.com
Subject: Re: bitslice SHA-256

On Fri, May 29, 2015 at 01:22:10AM -0400, Alain Espinosa wrote:
> ...I briefly experimented with merged ADDs in this md5slice.c revision
> 
> I will take a look.
> 
> ...add32c() is a 3-input ADD where one of the inputs is a constant
> 
> I checked this code looking for ways to reduce the instruction count of additions. If I understand it correctly, you use more than 5 for one add (more than 10 for two; if I recall correctly, you use 11).

My add32() appears to use 5 operations (not counting the loads and the store):

		a = *x++;
		b = *y++;
		*z++ = (p = a ^ b) ^ c;
		c = (p & c) | (a & b);
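A self-contained sketch of the loop these four lines presumably sit in,
for anyone who wants to run them.  Everything around the loop body is an
assumption, not code from md5slice.c: the slice type is uint64_t (64
independent additions at once), each 32-bit operand is stored as 32
slices with the least significant bit first, and to_slices(),
from_slices() and check_add32() are hypothetical helpers that exist only
to compare the result with ordinary uint32_t addition.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t slice_t; /* bit i of 64 independent lanes (assumption) */

/* z = x + y (mod 2^32) in every lane; each operand is 32 slices, LSB first */
static void add32(slice_t *z, const slice_t *x, const slice_t *y)
{
	slice_t a, b, p, c = 0; /* c: carry slice, zero before bit 0 */

	for (int i = 0; i < 32; i++) {
		a = *x++;
		b = *y++;
		*z++ = (p = a ^ b) ^ c;   /* sum bit: 2 XORs */
		c = (p & c) | (a & b);    /* carry out: 2 ANDs, 1 OR */
	}
}

/* Hypothetical helpers: transpose 64 plain uint32_t values to/from slices */
static void to_slices(slice_t s[32], const uint32_t v[64])
{
	for (int i = 0; i < 32; i++) {
		s[i] = 0;
		for (int j = 0; j < 64; j++)
			s[i] |= (slice_t)((v[j] >> i) & 1) << j;
	}
}

static void from_slices(uint32_t v[64], const slice_t s[32])
{
	for (int j = 0; j < 64; j++) {
		v[j] = 0;
		for (int i = 0; i < 32; i++)
			v[j] |= (uint32_t)((s[i] >> j) & 1) << i;
	}
}

/* Compare the bitsliced sum with ordinary uint32_t addition in all lanes */
static void check_add32(const uint32_t u[64], const uint32_t v[64])
{
	slice_t xs[32], ys[32], zs[32];
	uint32_t w[64];

	to_slices(xs, u);
	to_slices(ys, v);
	add32(zs, xs, ys);
	from_slices(w, zs);
	for (int j = 0; j < 64; j++)
		assert(w[j] == (uint32_t)(u[j] + v[j]));
}
```

Since the final carry is simply dropped after bit 31, the result is the
sum modulo 2^32, the same as uint32_t addition; check_add32() asserts
exactly that for every lane.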

But you're right: my add32c()'s code path for when the current constant
bit is 1 uses 11 operations (with XNOR) or 12 (without).  This feels
wrong, and there's got to be a way to optimize it to 10 or fewer within
the same instruction set.  Its code path for when the current constant
bit is 0 has only 7 operations, though, so this demonstrates how the
addition of a constant can be cheaper than that of a variable:

		a = *x++;
		b = *y++;
		if (c & 1) {
			*z++ = ~(a ^ b) ^ c1 ^ c2;
			c2 = (a & b & (p = c1 | c2)) | (c1 & c2 & (q = a | b));
			c1 = p | q;
		} else {
			*z++ = (q = (p = a ^ b) ^ c1) ^ c2;
			c1 = (p & c1) | (a & b);
			c2 &= q;
		}
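A runnable sketch of add32c() under similar assumptions (uint64_t
slices, least significant bit first, hypothetical helpers), with the
constant passed as a plain uint32_t and shifted right once per bit.
With three addends the carry into the next bit position can reach 2,
which is why it is kept as the sum c1 + c2 of two carry slices.  The
add32c_alt() function is not from the post; it is one possible
rearrangement of the 1-bit path.  It computes a + b + c1 exactly as the
0-bit path does (sum q, carry (p & c1) | (a & b)) and then adds c2 + 1,
whose sum bit is ~(q ^ c2) and whose carry is q | c2.  That comes to 7
operations with XNOR (for ~(q ^ c2)) or 8 without, and check_add32c()
compares both versions against plain arithmetic.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t slice_t; /* bit i of 64 independent lanes (assumption) */

/* z = x + y + k (mod 2^32) per lane, k a plain constant, LSB first.
   The carry into the next bit can reach 2, so it is kept as c1 + c2. */
static void add32c(slice_t *z, const slice_t *x, const slice_t *y, uint32_t k)
{
	slice_t a, b, p, q, c1 = 0, c2 = 0;

	for (int i = 0; i < 32; i++, k >>= 1) {
		a = *x++;
		b = *y++;
		if (k & 1) { /* constant bit 1: 11 ops with XNOR, 12 without */
			*z++ = ~(a ^ b) ^ c1 ^ c2;
			c2 = (a & b & (p = c1 | c2)) | (c1 & c2 & (q = a | b));
			c1 = p | q;
		} else { /* constant bit 0: 7 ops */
			*z++ = (q = (p = a ^ b) ^ c1) ^ c2;
			c1 = (p & c1) | (a & b);
			c2 &= q;
		}
	}
}

/* NOT from the post: a rearranged 1-bit path.  a + b + c1 is done as in
   the 0-bit path (sum q); then q + c2 + 1 has sum ~(q ^ c2), carry q | c2. */
static void add32c_alt(slice_t *z, const slice_t *x, const slice_t *y, uint32_t k)
{
	slice_t a, b, p, q, c1 = 0, c2 = 0;

	for (int i = 0; i < 32; i++, k >>= 1) {
		a = *x++;
		b = *y++;
		q = (p = a ^ b) ^ c1;          /* 2 ops, shared by both paths */
		c1 = (p & c1) | (a & b);       /* 3 ops, shared */
		if (k & 1) {
			*z++ = ~(q ^ c2);      /* 1 op with XNOR, 2 without */
			c2 |= q;               /* 1 op: 7 or 8 in total */
		} else {
			*z++ = q ^ c2;         /* 1 op */
			c2 &= q;               /* 1 op: 7 in total, as before */
		}
	}
}

/* Hypothetical helpers: transpose 64 plain uint32_t values to/from slices */
static void to_slices(slice_t s[32], const uint32_t v[64])
{
	for (int i = 0; i < 32; i++) {
		s[i] = 0;
		for (int j = 0; j < 64; j++)
			s[i] |= (slice_t)((v[j] >> i) & 1) << j;
	}
}

static void from_slices(uint32_t v[64], const slice_t s[32])
{
	for (int j = 0; j < 64; j++) {
		v[j] = 0;
		for (int i = 0; i < 32; i++)
			v[j] |= (uint32_t)((s[i] >> j) & 1) << i;
	}
}

/* Both versions must match plain uint32_t arithmetic in every lane */
static void check_add32c(const uint32_t u[64], const uint32_t v[64], uint32_t k)
{
	slice_t xs[32], ys[32], z1[32], z2[32];
	uint32_t w1[64], w2[64];

	to_slices(xs, u);
	to_slices(ys, v);
	add32c(z1, xs, ys, k);
	add32c_alt(z2, xs, ys, k);
	from_slices(w1, z1);
	from_slices(w2, z2);
	for (int j = 0; j < 64; j++) {
		assert(w1[j] == (uint32_t)(u[j] + v[j] + k));
		assert(w2[j] == w1[j]);
	}
}
```

Because the constant is fixed, the branch on k & 1 would presumably be
resolved at code generation time in an unrolled kernel, so only one of
the two paths is emitted per bit position.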

Alexander
