Message-ID: <5362d8d846e6539bdbf667ed353f1cdd@smtp.hushmail.com>
Date: Fri, 03 Jul 2015 11:10:51 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: extend SIMD intrinsics

On 2015-07-02 09:52, Lei Zhang wrote:
> I'm thinking about how to gracefully add AltiVec intrinsics into the
> existing code. Currently pseudo_intrinsics.h makes a good base for
> incorporating x86 intrinsics, but I think it has some problems
> that may hinder its extendability.

Yes, I expected this to happen (just from guessing - I have no idea
what other archs' intrinsics look like).

> First, as for the existing x86 intrinsics, it's not clear which
> intrinsics are supported by all platforms. As the width of Intel's
> SIMD instructions advances, some instructions are deprecated while
> some new ones are added. If someone implements a new format and wants
> to use pseudo-intrinsics, he has to check each section of
> pseudo_intrinsics.h to make sure a specific intrinsic is available on
> his target platform. I think it would be better if we provided a
> minimum supported set, that is, the set of intrinsics supported by ALL
> platforms. Usually this set contains the most used operations, and
> should meet most demands. If someone wants to do further optimization,
> he could then dig into the comprehensive list of intrinsics supported
> by a platform, and choose what he needs.

I guess you mean e.g. the "common" stuff at the end of the file. We
could copy those macros to each section instead, which would be much
easier to follow - but it would sometimes mean verbatim copies of
macros, with the risk that an enhancement gets made in just one of
potentially several places.

> Second, the interfaces exposed by pseudo_intrinsics.h are width
> agnostic, but not platform agnostic enough. Currently they are too
> tightly bound to Intel's intrinsics set. Some of them are
> inconvenient or inefficient to implement with Power/ARM's native
> intrinsics. OTOH, this may not be a big issue, if non-x86 platforms
> are not our major concerns.

Can you see a way to make it better while still using
pseudo-intrinsics? What is the difference, is it things like
three-operand instructions?

The current sse-intrinsics.c is just using the pseudo-intrinsics and is
almost free from alternatives. If we need to, we can have alternative
implementations depending on what (pseudo) intrinsics are available,
e.g.

    #ifdef v3xor
    #define MD5_H(x, y, z) \
        tmp[i] = v3xor((x[i]), (y[i]), (z[i]));
    #else
    #define MD5_H(x, y, z) \
        tmp[i] = vxor((x[i]), (y[i])); \
        tmp[i] = vxor(tmp[i], (z[i]));
    #endif

An alternative (for this example) might be to define a v3xor() even for
two-op archs, using a local tmp variable. But this might depend too
much on the optimizer to end up good. Or not? I'm not sure.

magnum
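
The last alternative above (a v3xor() that exists even on two-op archs) could be sketched roughly as follows. This is only an illustration under assumptions: vec4 is a hypothetical scalar stand-in for a real vector type, v3xor_fallback and md5_h_selfcheck are made-up names, and none of this is taken from pseudo_intrinsics.h. Whether the compiler folds the local temporary away is exactly the open question raised in the message, so the sketch shows the shape of the fallback and says nothing about its performance.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-lane stand-in for a SIMD vector type (not JtR's real vtype). */
typedef struct { uint32_t lane[4]; } vec4;

/* Two-operand XOR: the primitive every arch section is assumed to provide. */
static inline vec4 vxor(vec4 a, vec4 b)
{
    vec4 r;
    for (int k = 0; k < 4; k++)
        r.lane[k] = a.lane[k] ^ b.lane[k];
    return r;
}

/* Fallback, used only if no arch section has already defined v3xor as a macro. */
#ifndef v3xor
static inline vec4 v3xor_fallback(vec4 a, vec4 b, vec4 c)
{
    vec4 tmp = vxor(a, b);  /* the local temporary mentioned in the message */
    return vxor(tmp, c);
}
#define v3xor(a, b, c) v3xor_fallback((a), (b), (c))
#endif

/* With v3xor always available, MD5_H no longer needs its own #ifdef. */
#define MD5_H(x, y, z) \
    tmp[i] = v3xor((x[i]), (y[i]), (z[i]));

/* Small usage check: MD5_H must equal the lane-wise x ^ y ^ z. */
static void md5_h_selfcheck(void)
{
    vec4 x[1] = {{{0x0f0f0f0fu, 1u, 2u, 3u}}};
    vec4 y[1] = {{{0x00ff00ffu, 4u, 5u, 6u}}};
    vec4 z[1] = {{{0x12345678u, 7u, 8u, 9u}}};
    vec4 tmp[1];
    int i = 0;

    MD5_H(x, y, z)
    for (int k = 0; k < 4; k++)
        assert(tmp[i].lane[k] == (x[i].lane[k] ^ y[i].lane[k] ^ z[i].lane[k]));
}
```

An arch with a native three-operand XOR would define v3xor in its own section before this point, and the #ifndef would then skip the fallback.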