Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <55345CD2.8090709@openwall.com>
Date: Mon, 20 Apr 2015 04:56:34 +0300
From: Alexander Cherepanov <ch3root@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Format interface changes

On 2015-04-18 02:30, magnum wrote:
> On 2015-04-17 02:05, Alexander Cherepanov wrote:
>> On 2015-04-16 22:08, magnum wrote:
>>> I think it may be about time we change key index/count variables to
>>> unsigned int in Jumbo. Rationale:
>>> (...)
>
>> I generally agree that we should use unsigned more. I think it makes
>> many things easier. It's great that you even have hard data on speed
>> gain. Maybe change "char *ciphertext" to "unsigned char *ciphertext"?
>
> Yes, nearly all uses of char in JtR should be unsigned. It has
> irritating side effects though - like having to cast any strlen() or
> you'd get compiler warnings.

Yeah, we have two contradicting requirements.

Plain char is required by:
- string function like strlen, strtok etc.
Unsigned char is required by:
- array indexing;
- ctype function like isdigit.

I'm not yet sure which way is better. One of the benefits of plain char 
is that it's easy to convert it to unsigned char -- it's well-defined in 
the C standards while converting to potentially signed char is 
implementation-defined.

> BTW I hate that - when would that EVER be a
> valuable warning? IMO the few systems that has char default to unsigned
> has it right (as long as it's called "char" anyway) but there isn't much
> point in having any opinion about it :-)
>
>> Another couple of wild ideas:
>>
>> - add restrict to help compiler optimize code where we are sure that
>> parameters cannot overlap. Many parameters are of type char * and I
>> guess compiler cannot reason about its aliasing because char can be used
>> to access any data;
>
> WE can't reason about it either! Not in the interface.

On the contrary: not only we can reason about it, we have to declare it. 
As part of interface definition for format authors we have to specify if 
a function have to deal with overlapping arguments or we guarantee that 
arguments will not overlap. The same question about overlapping between 
arguments and globals. Every time memcpy is used instead of memmove in 
one of the discussed functions such a guarantee is assumed.

> It's a contract with the devil.

I don't think it's that bad:-)

> Besides, I see no place in the format interface where it
> would do any good.

Hm, maybe. I primarily thought about functions like this one:

char *(*split)(char *ciphertext, int index, struct fmt_main *self);

Hinting that *ciphertext cannot overlap with *self could be useful but 
it seems self is almost never used in such functions.

> We can still use it carefully without any change to the interface but
> it's more dangerous than you'd think - consider this patch to
> nt2_fmt_plug.c:
>
> diff --git a/src/nt2_fmt_plug.c b/src/nt2_fmt_plug.c
> --- a/src/nt2_fmt_plug.c
> +++ b/src/nt2_fmt_plug.c
> @@ -286,8 +286,8 @@
>   static void set_key(char *_key, int index)
>   {
>   #ifdef SIMD_COEF_32
> -       const unsigned char *key = (unsigned char*)_key;
> -       unsigned int *keybuf_word = buf_ptr[index];
> +       const unsigned char *__restrict key = (unsigned char*)_key;
> +       unsigned int *__restrict keybuf_word = buf_ptr[index];
>          unsigned int len, temp2;
>
>          len = 0;
> (...)
> 	((unsigned int *)saved_key)[14*SIMD_COEF_32 + (index&(SIMD_COEF_32-1))
> + index/SIMD_COEF_32*16*SIMD_COEF_32] = len << 4;
>
> It looks clever but does absolutely nothing for performance. However,
> we're actually breaking the contract because saved_key is aliased by
> buf_ptr. The bug may surface years later on some obscure compiler.
> Admittedly in this case I can't see how it would ever surface since we
> depend on "len", but this example shows how easy it is to miss. And
> anyway we ARE still breaking the contract.

1. Are you sure that we are breaking the contract in this case? 
saved_key is aliased by buf_ptr as an array but do they aliased as ints? 
If there is no an int which is accessed through both of them I think the 
contract is not broken. But I haven't check indexes to make sure which 
case we have here.

2. Just add braces around the use of the restricted pointers and end it 
before the use of saved_key.

3. Indeed, it's not clear that this particular fragment can benefit from 
restricts.

> We could try using it for a few helper functions though, like strzcpy()
> and strlen8(). But it wont cause any change to the format struct so
> that's a separate project

Sure.

> (we should only commit such changes once we
> see significant gain and/or are 101% sure we're not mistaken about
> aliasing - which means reviewing all callers and then review them again).

Take strnzcpy for example. Right now it works for overlapping strings 
only if src > dst. So:
- if it is supposed to work for any overlapping strings then it's broken 
right now;
- if it is supposed to work for overlapping strings with src > dst then 
restrict cannot be added;
- if it is not supposed to work for overlapping strings then restrict 
can be added but the code should reviewed for potentially overlapping 
arguments for this function.

BTW it's not evident that this function will benefit from restrict.

>> - is there need for parameters to be aligned? If many formats will
>> benefit from this we can enforce it at the level of the interface.
>
> We already do in salt/binary functions.

Then it's probably worth documenting by amending its parameters with 
something like "alignas(64)".

> While set_key() would benefit in
> some cases (actually very few - perhaps only raw-sha1-ng and only on
> older CPUs), it would slow down some modes more than it would boost
> others. Newish x86 can do unaligned load (even SIMD) with no performance
> hit. On a side note, MIC can't do it at all - Lei had to emulate that
> intrinsic. But I think that is a paren, I assume AVX512 will have it.

The C standards prohibit unaligned access so compilers are free to 
assume any access to be aligned. Don't know how practical this problem is.

-- 
Alexander Cherepanov

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.