john-dev - Latin-1 to UTF-16 conversion (was Lei's GSoC progress)


Message-Id: <FDA6BA75-697E-4D59-9F58-17A6DB844713@gmail.com>
Date: Wed, 29 Jul 2015 11:35:22 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Latin-1 to UTF-16 conversion (was Lei's GSoC progress)


> On Jul 27, 2015, at 5:01 PM, magnum <john.magnum@...hmail.com> wrote:
> 
> On 2015-07-27 03:15, Lei Zhang wrote:
>> 
>> 1. The input key is appropriately padded in set_key() for the SIMD
>> SHA function, and key length is also determined in the process. What
>> do I do if the key is UTF16-encoded? In episerver(non-SIMD), it uses
>> enc_to_utf16() to convert the key and get its length. But each key is
>> not contiguously stored for the SIMD SHA function, thus
>> enc_to_utf16() won't be applicable.
> 
> So episerver is sha256($s.utf16($p)) or sha1($s.utf16($p)). The MSSQL formats are similar but append the salt instead of prepending it (actually that's more tricky to optimize since we can't keep the salt at a fixed position).
> 
> For fast formats like this, flat enc_to_utf16() is far too slow. You should convert right into SIMD buffer like in MSSQL05's set_key.
> 
> Then you would just store the (bit-)length in the Merkle-Damgard buffer and be done with it. You'd read it back in get_key when needed.
> 
> You don't need it for anything else: For best performance, you should write the salt right into the SIMD buffer in set_salt() (repeated across the whole vector width, of course). The set_key and get_key functions will know there's a fixed salt length of 16 (octets), so they can just start writing/reading after it, and write (read) the bit length with these extra 16 in mind. Then they'd write the Merkle-Damgard bit length field as 8 * (16 + keylen) with keylen counted in octets...
> 
> After all this, crypt_all() is simply just a matter of calling the SHA256 (or SHA1) function - the buffer is ready to use.
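
The length bookkeeping described above can be sketched for a single, non-interleaved 64-byte block. This is only an illustration under stated assumptions: the real formats interleave several keys in one SIMD buffer, and build_block() and read_keylen() are hypothetical names, not functions from the John the Ripper tree. The salt length of 16 octets and the bit length 8 * (16 + keylen) come from the quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SALT_LEN  16  /* fixed salt length in octets, as stated above */
#define BLOCK_LEN 64  /* one SHA-1/SHA-256 input block */

/* Hypothetical helper: lays out salt . utf16le(key) plus padding in one block.
 * Returns 0 on success, -1 if the message does not fit in a single block. */
static int build_block(uint8_t block[BLOCK_LEN], const uint8_t salt[SALT_LEN],
                       const char *latin1_key)
{
    size_t nchars = strlen(latin1_key);
    size_t keylen = 2 * nchars;             /* UTF-16 key length in octets */
    size_t msglen = SALT_LEN + keylen;
    uint64_t bits = 8 * (uint64_t)msglen;   /* 8 * (16 + keylen) */
    size_t i;

    if (msglen + 1 + 8 > BLOCK_LEN)         /* need room for 0x80 and length */
        return -1;

    memset(block, 0, BLOCK_LEN);
    memcpy(block, salt, SALT_LEN);
    for (i = 0; i < nchars; i++)            /* Latin-1 -> UTF-16LE: high octet stays 0 */
        block[SALT_LEN + 2 * i] = (uint8_t)latin1_key[i];
    block[msglen] = 0x80;                   /* padding marker */
    for (i = 0; i < 8; i++)                 /* big-endian 64-bit bit length */
        block[BLOCK_LEN - 1 - i] = (uint8_t)(bits >> (8 * i));
    return 0;
}

/* Hypothetical helper: recovers the key length in octets from the length field,
 * which is what get_key would do in this scheme. */
static size_t read_keylen(const uint8_t block[BLOCK_LEN])
{
    uint64_t bits = 0;
    size_t i;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | block[BLOCK_LEN - 8 + i];
    assert(bits / 8 >= SALT_LEN);
    return (size_t)(bits / 8) - SALT_LEN;
}
```

With this layout crypt_all() has nothing left to prepare, and get_key only needs the length field to know where the key ends.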

I looked at set_key() in mssql05 and nt2, both of which convert Latin-1 to UTF-16 directly into the SIMD key buffer. There are still some details I don't understand.

1. mssql05 uses SHA1 and nt2 uses MD4; both use the same padding scheme, except for the endianness of the length field at the tail of the block. But their conversion code differs,

e.g. in mssql05's set_key():
	*keybuf_word = JOHNSWAP((temp << 16) | temp2);
and in nt2:
	temp2 |= (temp << 16);
	*keybuf_word = temp2;

Why is there no endianness swapping in nt2?
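
For comparison, here is a minimal standalone sketch of the two packings. It relies on the fact that MD4 reads message words as little-endian while SHA-1 reads them as big-endian; that this fully explains the difference between the two formats is my assumption and is not confirmed in this thread. swap32() stands in for JOHNSWAP, and pack_le()/pack_be() are hypothetical names.

```c
#include <stdint.h>

/* Portable 32-bit byte swap; stands in for JOHNSWAP in this sketch. */
static uint32_t swap32(uint32_t x)
{
    return ((x >> 24) & 0x000000ffu) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Two Latin-1 characters become the UTF-16LE octets c0 00 c1 00.
 * Read as a little-endian word (MD4 convention) that is c0 | c1 << 16. */
static uint32_t pack_le(uint8_t c0, uint8_t c1)
{
    return (uint32_t)c0 | ((uint32_t)c1 << 16);
}

/* The same four octets read as a big-endian word (SHA-1 convention)
 * are the byte-swapped value of the little-endian word. */
static uint32_t pack_be(uint8_t c0, uint8_t c1)
{
    return swap32(pack_le(c0, c1));
}
```

For 'a' and 'b' this gives 0x00620061 for the little-endian word and 0x61006200 for the big-endian one, which would match nt2 storing the word as-is and mssql05 swapping it.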

2. In mssql05's set_key():
	unsigned int *keybuf_word = (unsigned int*)&saved_key[GETPOS(3, index)];

What's the purpose of the number 3 here? The salt is appended to the message in mssql05, so this is not reserving space for the salt. And the salt size is not 3 anyway.

BTW, there are so many hardcoded values in the SIMD buffer handling code. This would cause a lot of headaches for a newcomer...

3. I see that the values returned by get_salt() and get_binary() are sometimes endianness-swapped in a SIMD build and sometimes not. What's the reason for this?
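
To make the question concrete, here is a sketch of how such an index could work, assuming GETPOS maps byte i of key number index into an interleaved buffer and reverses the byte order inside each 32-bit word (as a big-endian hash on a little-endian host would need). The actual macro in mssql05 is not quoted in this message, so getpos(), COEF and BUF_WORDS below are hypothetical stand-ins.

```c
#include <stddef.h>

#define COEF      4   /* hypothetical: keys interleaved per SIMD vector */
#define BUF_WORDS 16  /* hypothetical: 32-bit words per key (one 64-byte block) */

/* Hypothetical stand-in for GETPOS: buffer offset of message byte i of key
 * number index, with bytes reversed inside each 32-bit word. */
static size_t getpos(size_t i, size_t index)
{
    return (index % COEF) * 4                      /* lane within the vector */
         + (i / 4) * 4 * COEF                      /* which 32-bit message word */
         + (3 - (i % 4))                           /* byte within word, reversed */
         + (index / COEF) * BUF_WORDS * 4 * COEF;  /* which group of COEF keys */
}
```

Under these assumptions getpos(3, index) is the lowest address of the first word of that key, so it would simply be a word-aligned base pointer rather than an offset of three bytes.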


Thanks,
Lei
