Message-ID: <5362d8d846e6539bdbf667ed353f1cdd@smtp.hushmail.com>
Date: Fri, 03 Jul 2015 11:10:51 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: extend SIMD intrinsics

On 2015-07-02 09:52, Lei Zhang wrote:
> I'm thinking about how to gracefully add AltiVec intrinsics into the
> existing code. Currently pseudo_intrinsics.h makes a good base for
> incorporating x86 intrinsics, but I think it has some problems
> that may hinder its extensibility.

Yes, I expected this to happen (just guessing - I have no idea what 
other architectures' intrinsics look like).

> First, as for the existing x86 intrinsics, it's not clear which
> intrinsics are supported by all platforms. As the width of Intel's
> SIMD instructions advances, some instructions are deprecated while
> some new ones are added. If someone implements a new format and wants
> to use pseudo-intrinsics, he has to check each section of
> pseudo_intrinsics.h to make sure a specific intrinsic is available on
> his target platform. I think it would be better if we provide a minimum
> supported set, that is the set of intrinsics supported by ALL
> platforms. Usually this set contains the most used operations, and
> should meet most demands. If someone wants to do further optimization,
> he could then dig into the comprehensive list of intrinsics supported
> by a platform, and choose what he needs.

I guess you mean e.g. the "common" stuff at the end of the file. We could 
copy those macros to each section instead, which would be much easier to 
follow - but it would sometimes mean verbatim copies of macros, with the 
risk that an enhancement gets made in just one of several places.
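
One way to keep the shared macros in a single place is to have each section define only primitives and let the common part at the end build on them, skipping anything a section already provided. The following is a rough sketch in C, not the actual contents of pseudo_intrinsics.h: the scalar stand-in section, vloadu/vstoreu and rotl7_block() are made up for illustration, and the names vor, vslli_epi32, vsrli_epi32 and vroti_epi32 are assumed here rather than quoted from the header.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
/* --- x86 SSE2 section: primitives only --- */
#include <emmintrin.h>
typedef __m128i vtype;
#define vloadu(p)      _mm_loadu_si128((const __m128i *)(p))
#define vstoreu(p, v)  _mm_storeu_si128((__m128i *)(p), (v))
#define vor            _mm_or_si128
#define vslli_epi32    _mm_slli_epi32
#define vsrli_epi32    _mm_srli_epi32
#else
/* --- hypothetical scalar section, so the sketch builds anywhere --- */
typedef struct { uint32_t w[4]; } vtype;
static inline vtype vloadu(const void *p)
{ vtype v; memcpy(v.w, p, sizeof v.w); return v; }
static inline void vstoreu(void *p, vtype v)
{ memcpy(p, v.w, sizeof v.w); }
static inline vtype vor(vtype a, vtype b)
{ for (int i = 0; i < 4; i++) a.w[i] |= b.w[i]; return a; }
static inline vtype vslli_epi32(vtype a, int n)
{ for (int i = 0; i < 4; i++) a.w[i] <<= n; return a; }
static inline vtype vsrli_epi32(vtype a, int n)
{ for (int i = 0; i < 4; i++) a.w[i] >>= n; return a; }
#endif

/* --- common section: written once, used by every section above.
 * A section with a native rotate can define vroti_epi32 itself and
 * this fallback is skipped. Note that x is evaluated twice. --- */
#ifndef vroti_epi32
#define vroti_epi32(x, n) \
	vor(vslli_epi32((x), (n)), vsrli_epi32((x), 32 - (n)))
#endif

/* Rotate four 32-bit words left by 7 using only pseudo-intrinsics. */
static void rotl7_block(uint32_t out[4], const uint32_t in[4])
{
	vstoreu(out, vroti_epi32(vloadu(in), 7));
}
```

With this layout a fix to the rotate fallback lands in one place, at the cost that a reader still has to look at both the section and the common part.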

> Second, the interfaces exposed by pseudo_intrinsics.h are width
> agnostic, but not platform agnostic enough. Currently they are too
> tightly bound to Intel's intrinsics set. Some of them are
> inconvenient or inefficient to implement with Power/ARM's native
> intrinsics. OTOH, this may not be a big issue, if non-x86 platforms
> are not our major concerns.

Can you see a way to make it better while still using pseudo-intrinsics? 
What is the difference? Is it things like three-operand instructions?

The current sse-intrinsics.c uses only the pseudo-intrinsics and has 
almost no alternative code paths. If we need to, we can add alternative 
implementations depending on which (pseudo) intrinsics are available, e.g.

#ifdef v3xor
#define MD5_H(x, y, z) \
     tmp[i] = v3xor((x[i]), (y[i]), (z[i]));
#else
#define MD5_H(x, y, z) \
     tmp[i] = vxor((x[i]), (y[i])); \
     tmp[i] = vxor(tmp[i], (z[i]));
#endif

An alternative (for this example) might be to define a v3xor() even for 
two-op archs, using a local tmp variable. But the result might depend too 
much on the optimizer to end up good. Or not? I'm not sure.
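
For illustration, the v3xor() fallback could be as small as the sketch below. It nests two vxor() calls, so the temporary is implicit and left to the compiler; whether every target's optimizer handles that as well as a native three-operand instruction is exactly the open question, and nothing here answers it. The scalar vtype, VWIDTH and md5_h_all() are stand-ins invented so the sketch compiles anywhere.

```c
#include <stdint.h>

/* Hypothetical stand-in for a real vector type (__m128i etc.). */
#define VWIDTH 4
typedef struct { uint32_t w[VWIDTH]; } vtype;

/* Two-operand XOR, assumed to exist in every architecture section. */
static inline vtype vxor(vtype a, vtype b)
{
	for (int i = 0; i < VWIDTH; i++)
		a.w[i] ^= b.w[i];
	return a;
}

/* Fallback for archs without a native three-operand XOR.
 * An arch section that has one defines v3xor first and wins. */
#ifndef v3xor
#define v3xor(a, b, c) vxor(vxor((a), (b)), (c))
#endif

/* With the fallback in place, a single MD5_H definition is enough. */
#define MD5_H(x, y, z) \
	tmp[i] = v3xor((x[i]), (y[i]), (z[i]));

/* Made-up driver: applies MD5_H over n vectors, indexed by i. */
static void md5_h_all(vtype *tmp, const vtype *x, const vtype *y,
                      const vtype *z, int n)
{
	for (int i = 0; i < n; i++) {
		MD5_H(x, y, z)
	}
}
```

This removes the #ifdef from the format code, but moves the decision into the header, so the generated code would need checking per target before relying on it.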

magnum
