|
Message-ID: <55345CD2.8090709@openwall.com> Date: Mon, 20 Apr 2015 04:56:34 +0300 From: Alexander Cherepanov <ch3root@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: Format interface changes On 2015-04-18 02:30, magnum wrote: > On 2015-04-17 02:05, Alexander Cherepanov wrote: >> On 2015-04-16 22:08, magnum wrote: >>> I think it may be about time we change key index/count variables to >>> unsigned int in Jumbo. Rationale: >>> (...) > >> I generally agree that we should use unsigned more. I think it makes >> many things easier. It's great that you even have hard data on speed >> gain. Maybe change "char *ciphertext" to "unsigned char *ciphertext"? > > Yes, nearly all uses of char in JtR should be unsigned. It has > irritating side effects though - like having to cast any strlen() or > you'd get compiler warnings. Yeah, we have two contradicting requirements. Plain char is required by: - string function like strlen, strtok etc. Unsigned char is required by: - array indexing; - ctype function like isdigit. I'm not yet sure which way is better. One of the benefits of plain char is that it's easy to convert it to unsigned char -- it's well-defined in the C standards while converting to potentially signed char is implementation-defined. > BTW I hate that - when would that EVER be a > valuable warning? IMO the few systems that has char default to unsigned > has it right (as long as it's called "char" anyway) but there isn't much > point in having any opinion about it :-) > >> Another couple of wild ideas: >> >> - add restrict to help compiler optimize code where we are sure that >> parameters cannot overlap. Many parameters are of type char * and I >> guess compiler cannot reason about its aliasing because char can be used >> to access any data; > > WE can't reason about it either! Not in the interface. On the contrary: not only we can reason about it, we have to declare it. As part of interface definition for format authors we have to specify if a function have to deal with overlapping arguments or we guarantee that arguments will not overlap. The same question about overlapping between arguments and globals. Every time memcpy is used instead of memmove in one of the discussed functions such a guarantee is assumed. > It's a contract with the devil. I don't think it's that bad:-) > Besides, I see no place in the format interface where it > would do any good. Hm, maybe. I primarily thought about functions like this one: char *(*split)(char *ciphertext, int index, struct fmt_main *self); Hinting that *ciphertext cannot overlap with *self could be useful but it seems self is almost never used in such functions. > We can still use it carefully without any change to the interface but > it's more dangerous than you'd think - consider this patch to > nt2_fmt_plug.c: > > diff --git a/src/nt2_fmt_plug.c b/src/nt2_fmt_plug.c > --- a/src/nt2_fmt_plug.c > +++ b/src/nt2_fmt_plug.c > @@ -286,8 +286,8 @@ > static void set_key(char *_key, int index) > { > #ifdef SIMD_COEF_32 > - const unsigned char *key = (unsigned char*)_key; > - unsigned int *keybuf_word = buf_ptr[index]; > + const unsigned char *__restrict key = (unsigned char*)_key; > + unsigned int *__restrict keybuf_word = buf_ptr[index]; > unsigned int len, temp2; > > len = 0; > (...) > ((unsigned int *)saved_key)[14*SIMD_COEF_32 + (index&(SIMD_COEF_32-1)) > + index/SIMD_COEF_32*16*SIMD_COEF_32] = len << 4; > > It looks clever but does absolutely nothing for performance. However, > we're actually breaking the contract because saved_key is aliased by > buf_ptr. The bug may surface years later on some obscure compiler. > Admittedly in this case I can't see how it would ever surface since we > depend on "len", but this example shows how easy it is to miss. And > anyway we ARE still breaking the contract. 1. Are you sure that we are breaking the contract in this case? saved_key is aliased by buf_ptr as an array but do they aliased as ints? If there is no an int which is accessed through both of them I think the contract is not broken. But I haven't check indexes to make sure which case we have here. 2. Just add braces around the use of the restricted pointers and end it before the use of saved_key. 3. Indeed, it's not clear that this particular fragment can benefit from restricts. > We could try using it for a few helper functions though, like strzcpy() > and strlen8(). But it wont cause any change to the format struct so > that's a separate project Sure. > (we should only commit such changes once we > see significant gain and/or are 101% sure we're not mistaken about > aliasing - which means reviewing all callers and then review them again). Take strnzcpy for example. Right now it works for overlapping strings only if src > dst. So: - if it is supposed to work for any overlapping strings then it's broken right now; - if it is supposed to work for overlapping strings with src > dst then restrict cannot be added; - if it is not supposed to work for overlapping strings then restrict can be added but the code should reviewed for potentially overlapping arguments for this function. BTW it's not evident that this function will benefit from restrict. >> - is there need for parameters to be aligned? If many formats will >> benefit from this we can enforce it at the level of the interface. > > We already do in salt/binary functions. Then it's probably worth documenting by amending its parameters with something like "alignas(64)". > While set_key() would benefit in > some cases (actually very few - perhaps only raw-sha1-ng and only on > older CPUs), it would slow down some modes more than it would boost > others. Newish x86 can do unaligned load (even SIMD) with no performance > hit. On a side note, MIC can't do it at all - Lei had to emulate that > intrinsic. But I think that is a paren, I assume AVX512 will have it. The C standards prohibit unaligned access so compilers are free to assume any access to be aligned. Don't know how practical this problem is. -- Alexander Cherepanov
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.