Message-ID: <5362d8d846e6539bdbf667ed353f1cdd@smtp.hushmail.com>
Date: Fri, 03 Jul 2015 11:10:51 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: extend SIMD intrinsics
On 2015-07-02 09:52, Lei Zhang wrote:
> I'm thinking about how to gracefully add AltiVec intrinsics into the
> existing code. Currently pseudo_intrinsics.h makes a good base for
> incorporating x86 intrinsics, but yet I think it has some problems
> that may hinder its extendability.
Yes, I expected this to happen (just from guessing - I have no idea what
other archs' intrinsics look like).
> First, as for the existing x86 intrinsics, it's not clear which
> intrinsics are supported by all platforms. As the width of Intel's
> SIMD instructions advances, some instructions are deprecated while
> some new ones are added. If someone implements a new format and wants
> to use pseudo-intrinsics, he has to check each section of
> pseudo_intrinsics.h to make sure a specific intrinsic is available on
> his target platform. I think it would be better if we provide a minimum
> supported set, that is the set of intrinsics supported by ALL
> platforms. Usually this set contains the most commonly used operations,
> and should meet most demands. If someone wants to do further optimization,
> he could then dig into the comprehensive list of intrinsics supported
> by a platform, and choose what he needs.
I guess you mean e.g. the "common" stuff at the end of the file. We could
copy those macros into each section instead, which would be much easier to
follow - but it would sometimes mean verbatim copies of macros, with the
risk that an enhancement gets made in just one of potentially several places.
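To illustrate the trade-off, here is a rough sketch (not the actual layout of
pseudo_intrinsics.h) of keeping a single shared fallback that an arch section
may override, instead of copying it into every section. The scalar vtype and
the vsel() name are hypothetical stand-ins so the sketch compiles anywhere:

```c
#include <stdint.h>

/* --- arch section (hypothetical scalar stand-in for a SIMD type) --- */
typedef uint32_t vtype;
#define vand(a, b) ((a) & (b))
#define vxor(a, b) ((a) ^ (b))
/* An arch with a native bit-select would "#define vsel ..." right here. */

/* --- common section, written once at the end of the header --- */
#ifndef vsel
/* Hypothetical helper: take bits of a where mask is set, else bits of b.
 * Built only from primitives that every arch section provides. */
#define vsel(mask, a, b) vxor((b), vand((mask), vxor((a), (b))))
#endif
```

With this shape an enhancement to the fallback lands in one place, at the
cost of the reader having to look in two places to see what a given
platform ends up with.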
> Second, the interfaces exposed by pseudo_intrinsics.h are width
> agnostic, but not platform agnostic enough. Currently they are too
> tightly bound to Intel's intrinsics set. Some of them are
> inconvenient or inefficient to implement with Power/ARM's native
> intrinsics. OTOH, this may not be a big issue, if non-x86 platforms
> are not our major concerns.
Can you see a way to make it better while still using pseudo-intrinsics?
What is the difference, is it things like three-operand instructions?
The current sse-intrinsics.c uses only the pseudo-intrinsics and is
almost free from alternative code paths. If we need to, we can have
alternative implementations depending on which (pseudo-)intrinsics are
available, e.g.:
#ifdef v3xor
#define MD5_H(x, y, z) \
	tmp[i] = v3xor((x[i]), (y[i]), (z[i]));
#else
#define MD5_H(x, y, z) \
	tmp[i] = vxor((x[i]), (y[i])); \
	tmp[i] = vxor(tmp[i], (z[i]));
#endif
An alternative (for this example) might be to define a v3xor() even for
two-op archs, using a local tmp variable. But that might rely too heavily
on the optimizer to produce good code. Or not? I'm not sure.
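A minimal sketch of that alternative, assuming a two-op arch: v3xor() is
provided as a fallback built on vxor() with a local temporary, so MD5_H needs
only one definition. The scalar vtype, the v3xor_fallback() name and the
md5_h_demo() helper are made up for illustration; whether a compiler
generates code as good as the hand-split version is exactly the open
question.

```c
#include <stdint.h>

/* Hypothetical scalar stand-ins; a real arch section would map these
 * to its native SIMD type and XOR intrinsic. */
typedef uint32_t vtype;
#define vxor(a, b) ((a) ^ (b))

/* Fallback for archs without a native three-operand XOR. */
#ifndef v3xor
static inline vtype v3xor_fallback(vtype a, vtype b, vtype c)
{
	vtype tmp = vxor(a, b);	/* local temporary, left to the optimizer */
	return vxor(tmp, c);
}
#define v3xor(a, b, c) v3xor_fallback((a), (b), (c))
#endif

/* Single definition, no #ifdef needed at the point of use. */
#define MD5_H(x, y, z) \
	tmp[i] = v3xor((x[i]), (y[i]), (z[i]));

/* Illustrative driver: apply MD5_H across n lanes. */
static void md5_h_demo(vtype *tmp, const vtype *x, const vtype *y,
                       const vtype *z, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		MD5_H(x, y, z)
	}
}
```

An arch with a real three-operand instruction would define v3xor in its own
section, and the #ifndef would then skip the fallback.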
magnum