john-dev - Re: extend SIMD intrinsics


Message-ID: <5362d8d846e6539bdbf667ed353f1cdd@smtp.hushmail.com>
Date: Fri, 03 Jul 2015 11:10:51 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: extend SIMD intrinsics

On 2015-07-02 09:52, Lei Zhang wrote:
> I'm thinking about how to gracefully add AltiVec intrinsics into the
> existing code. Currently pseudo_intrinsics.h makes a good base for
> incorporating x86 intrinsics, but yet I think it has some problems
> that may hinder its extendability.

Yes, I expected this to happen (just guessing - I have no idea what
other archs' intrinsics look like).

> First, as for the existing x86 intrinsics, it's not clear which
> intrinsics are supported by all platforms. As the width of Intel's
> SIMD instructions advances, some instructions are deprecated while
> some new ones are added. If someone implements a new format and wants
> to use pseudo-intrinsics, he has to check each section of
> pseudo_intrinsics.h to make sure a specific intrinsic is available on
> his target platform. I think it would be better if we provide a minimum
> supported set, that is the set of intrinsics supported by ALL
> platforms. Usually this set contains the most-used operations, and
> should meet most demands. If someone wants to do further optimization,
> he could then dig into the comprehensive list of intrinsics supported
> by a platform, and choose what he needs.

I guess you mean e.g. the "common" stuff at the end of the file. We could
copy those macros into each section instead, which would be much easier to
follow - but it would sometimes mean verbatim copies of macros, with the
risk that an enhancement gets made in just one of potentially several places.
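
One way to keep a single copy of each derived macro while still letting an
arch section override it is an #ifndef guard in the common part. The
following is only a rough sketch of that layout, not the actual contents of
pseudo_intrinsics.h: the scalar vtype stand-in, the section comments and
rotl_demo() are assumptions made so the example compiles anywhere, and the
macro names are used purely for illustration.

```c
#include <stdint.h>

/* Scalar stand-in for a real SIMD vector type (assumption, for portability). */
typedef uint32_t vtype;

/* --- hypothetical per-architecture section: primitives only --- */
#define vor(a, b)          ((vtype)((a) | (b)))
#define vslli_epi32(a, n)  ((vtype)((a) << (n)))
#define vsrli_epi32(a, n)  ((vtype)((a) >> (n)))
/* An arch with a native rotate would define vroti_epi32 right here. */

/* --- common section: derived operations, written exactly once --- */
#ifndef vroti_epi32
/* Rotate left built from two shifts and an OR; valid for 0 < n < 32. */
#define vroti_epi32(a, n) \
	vor(vslli_epi32((a), (n)), vsrli_epi32((a), 32 - (n)))
#endif

/* Hypothetical helper that only exists to exercise the macro. */
static vtype rotl_demo(vtype v, int n)
{
	return vroti_epi32(v, n);
}
```

With this layout the set of names that survives the common section is what
every platform can rely on, and an enhancement to a derived macro is made in
one place only.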

> Second, the interfaces exposed by pseudo_intrinsics.h are width
> agnostic, but not platform agnostic enough. Currently they are too
> tightly bound to Intel's intrinsics set. Some of them are
> inconvenient or inefficient to implement with Power/ARM's native
> intrinsics. OTOH, this may not be a big issue, if non-x86 platforms
> are not our major concerns.

Can you see a way to make it better while still using pseudo-intrinsics?
What is the difference? Is it things like three-operand instructions?

The current sse-intrinsics.c uses only the pseudo-intrinsics and has
almost no alternative code paths. If we need to, we can add alternative
implementations depending on which (pseudo-)intrinsics are available, e.g.:

#ifdef v3xor
#define MD5_H(x, y, z) \
     tmp[i] = v3xor((x[i]), (y[i]), (z[i]));
#else
#define MD5_H(x, y, z) \
     tmp[i] = vxor((x[i]), (y[i])); \
     tmp[i] = vxor(tmp[i], (z[i]));
#endif

An alternative (for this example) might be to define a v3xor() even for
two-op archs, using a local tmp variable. But this might rely too much
on the optimizer to produce good code. Or not? I'm not sure.
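
A minimal sketch of that v3xor() fallback, to make the idea concrete. The
scalar vtype stand-in, the helper names v3xor_fallback() and md5_h_all(),
and the choice of a static inline function for the local temporary are all
assumptions for illustration, not existing code:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for a real SIMD vector type (assumption, for portability). */
typedef uint32_t vtype;

/* Two-operand XOR, as a two-op arch section might provide it. */
#define vxor(a, b) ((vtype)((a) ^ (b)))

/* No native three-operand XOR defined: emulate it with two vxor calls. */
#ifndef v3xor
static inline vtype v3xor_fallback(vtype a, vtype b, vtype c)
{
	vtype tmp = vxor(a, b);	/* the local temporary */

	return vxor(tmp, c);
}
#define v3xor(a, b, c) v3xor_fallback((a), (b), (c))
#endif

/* Since v3xor now always exists, MD5_H needs only one variant. */
#define MD5_H(x, y, z) \
	tmp[i] = v3xor((x[i]), (y[i]), (z[i]));

/* Hypothetical driver applying MD5_H to n interleaved vectors. */
static void md5_h_all(vtype *tmp, const vtype *x, const vtype *y,
                      const vtype *z, int n)
{
	int i;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		MD5_H(x, y, z)
	}
}
```

Whether the temporary costs anything with real vector types is the open
question; comparing the generated assembly of both variants on each target
compiler would settle it.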

magnum
