Message-Id: <4172ED7D-A458-456B-B610-9015019A94D0@gmail.com>
Date: Sun, 5 Jul 2015 11:01:48 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: extend SIMD intrinsics

I found some more problems.

Because x86's vector type is element-type-agnostic, some of its intrinsics are type-agnostic too, e.g. _mm_load_si128 and _mm_and_si128, which work on integer vectors of any element type. AltiVec, however, strictly distinguishes between vector types. To load a 'vector int' from memory, you must pass an 'int*' pointer to vec_ld(); a 'void*' or any other pointer type triggers a compile error. This means that, just as we split vtype into vtype32 and vtype64, we may have to split vload into vload_epi32 and vload_epi64, and likewise for the other intrinsics.
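
A minimal sketch of what such an element-type-aware interface could look like. The names vload_epi32/vload_epi64 and vtype32/vtype64 come from this thread; vstore_epi32, vstore_epi64, vand_epi32, vand_epi64, HAVE_VTYPE64 and the and_words helpers are hypothetical names used only for illustration, and none of this is the actual content of pseudo_intrinsics.h. The AltiVec branch covers 32-bit elements only, and the last branch is a plain scalar stand-in so the sketch builds on any target.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
/* x86: __m128i is element-type-agnostic, so both type names alias one
 * type and both sets of wrappers map to the same intrinsics. */
typedef __m128i vtype32;
typedef __m128i vtype64;
#define HAVE_VTYPE64 1 /* hypothetical feature macro */
static inline vtype32 vload_epi32(const uint32_t *p)
{ return _mm_load_si128((const __m128i *)p); }
static inline vtype64 vload_epi64(const uint64_t *p)
{ return _mm_load_si128((const __m128i *)p); }
static inline void vstore_epi32(uint32_t *p, vtype32 v)
{ _mm_store_si128((__m128i *)p, v); }
static inline void vstore_epi64(uint64_t *p, vtype64 v)
{ _mm_store_si128((__m128i *)p, v); }
static inline vtype32 vand_epi32(vtype32 a, vtype32 b)
{ return _mm_and_si128(a, b); }
static inline vtype64 vand_epi64(vtype64 a, vtype64 b)
{ return _mm_and_si128(a, b); }

#elif defined(__ALTIVEC__)
#include <altivec.h>
/* AltiVec: the pointer type passed to vec_ld() selects the vector type,
 * so the 32-bit wrapper must use a 32-bit element pointer. The 64-bit
 * wrappers are left out here (assumption: not available or emulated yet). */
typedef __vector unsigned int vtype32;
static inline vtype32 vload_epi32(const uint32_t *p)
{ return vec_ld(0, (const unsigned int *)p); }
static inline void vstore_epi32(uint32_t *p, vtype32 v)
{ vec_st(v, 0, (unsigned int *)p); }
static inline vtype32 vand_epi32(vtype32 a, vtype32 b)
{ return vec_and(a, b); }

#else
/* Scalar stand-in: two distinct struct types, so mixing them up is a
 * compile error, much like on AltiVec. */
typedef struct { uint32_t w[4]; } vtype32;
typedef struct { uint64_t w[2]; } vtype64;
#define HAVE_VTYPE64 1 /* hypothetical feature macro */
static inline vtype32 vload_epi32(const uint32_t *p)
{ vtype32 v; for (int i = 0; i < 4; i++) v.w[i] = p[i]; return v; }
static inline vtype64 vload_epi64(const uint64_t *p)
{ vtype64 v; for (int i = 0; i < 2; i++) v.w[i] = p[i]; return v; }
static inline void vstore_epi32(uint32_t *p, vtype32 v)
{ for (int i = 0; i < 4; i++) p[i] = v.w[i]; }
static inline void vstore_epi64(uint64_t *p, vtype64 v)
{ for (int i = 0; i < 2; i++) p[i] = v.w[i]; }
static inline vtype32 vand_epi32(vtype32 a, vtype32 b)
{ for (int i = 0; i < 4; i++) a.w[i] &= b.w[i]; return a; }
static inline vtype64 vand_epi64(vtype64 a, vtype64 b)
{ for (int i = 0; i < 2; i++) a.w[i] &= b.w[i]; return a; }
#endif

/* User code names the element width explicitly and never touches a raw
 * intrinsic. All pointers must be 16-byte aligned. */
static void and_words32(uint32_t *dst, const uint32_t *a, const uint32_t *b)
{
    vstore_epi32(dst, vand_epi32(vload_epi32(a), vload_epi32(b)));
}

#ifdef HAVE_VTYPE64
static void and_words64(uint64_t *dst, const uint64_t *a, const uint64_t *b)
{
    vstore_epi64(dst, vand_epi64(vload_epi64(a), vload_epi64(b)));
}
#endif
```

On x86 the typed wrappers cost nothing, since both widths collapse to the same intrinsic; on a strictly typed arch they are what lets the same user code compile.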

I took a look at ARM's NEON intrinsics, which, like AltiVec, also distinguish between vector types. So an element-type-aware interface of pseudo-intrinsics may be more generally applicable.

Another thing that bothers me is the use of SIMD_COEF_32/SIMD_COEF_64. On SIMD archs that support both int32 and int64 operations, SIMD_COEF_64 is always half of SIMD_COEF_32. I think the purpose of SIMD_COEF_64 is to cover SIMD archs that lack int64 support: there you can detect the absence of SIMD_COEF_64 and use int32 intrinsics only. Unfortunately, many formats in JtR simply check for SIMD_COEF_32 and then assume the host arch supports both int32 and int64 intrinsics. This has already caused me trouble. As AltiVec lacks some int64 intrinsics, I want to disable the use of int64 intrinsics for now and consider emulating them later, so I defined only SIMD_COEF_32 and not SIMD_COEF_64. However, many formats assume that SIMD_COEF_64 always accompanies SIMD_COEF_32, and give me a bunch of compile errors when SIMD_COEF_64 isn't there.

I think we should either require every SIMD-capable target to support both int32 and int64 operations (by emulation if necessary), or make it clear that int64 intrinsics (and functions that rely on them, like SSESHA512body) must not be used unless SIMD_COEF_64 is defined.
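
A rough sketch of the second option, assuming a target that defines SIMD_COEF_32 but not SIMD_COEF_64 (the situation described above for AltiVec). The FMT32_/FMT64_ macros and the two helper functions are hypothetical and only stand in for a format's own definitions; the point is that the 64-bit path tests SIMD_COEF_64 itself rather than inferring it from SIMD_COEF_32.

```c
/* Assumed target: 128-bit vectors with int32 support; int64 support is
 * disabled for now, so SIMD_COEF_64 is deliberately left undefined. */
#define SIMD_COEF_32 4
/* #define SIMD_COEF_64 2 */

/* Sanity checks: if both are defined, the 64-bit lane count should be
 * half the 32-bit one, and SIMD_COEF_64 alone makes no sense. */
#if defined(SIMD_COEF_64) && !defined(SIMD_COEF_32)
#error "SIMD_COEF_64 defined without SIMD_COEF_32"
#endif
#if defined(SIMD_COEF_64) && (SIMD_COEF_64 * 2 != SIMD_COEF_32)
#error "SIMD_COEF_64 is expected to be half of SIMD_COEF_32"
#endif

/* Hypothetical format built on 32-bit words: keyed on SIMD_COEF_32. */
#ifdef SIMD_COEF_32
#define FMT32_KEYS_PER_CRYPT SIMD_COEF_32
#else
#define FMT32_KEYS_PER_CRYPT 1
#endif

/* Hypothetical format built on 64-bit words: keyed on SIMD_COEF_64 only,
 * with a scalar fallback when it is absent. */
#ifdef SIMD_COEF_64
#define FMT64_KEYS_PER_CRYPT SIMD_COEF_64
#else
#define FMT64_KEYS_PER_CRYPT 1
#endif

static int fmt32_keys_per_crypt(void) { return FMT32_KEYS_PER_CRYPT; }
static int fmt64_keys_per_crypt(void) { return FMT64_KEYS_PER_CRYPT; }
```

With these definitions the 32-bit format still runs four lanes wide, while the 64-bit one falls back to a single scalar lane instead of failing to compile.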


Lei

> On Jul 4, 2015, at 5:48 AM, magnum <john.magnum@...hmail.com> wrote:
> 
> On 2015-07-03 14:13, Lei Zhang wrote:
>> I mean we should make it clear which intrinsics are supported
>> by all archs, e.g. a list like:
>> ------
>> vadd
>> vand
>> vload
>> vstore
>> vsll
>> vsrl
>> ...
>> ------
>> 
>> Those primitive intrinsics should be available in any decent SIMD
>> arch, and can be used portably.
>> 
>> The current situation is that, without such a list, we may risk
>> losing portability when writing intrinsics. Imagine that, when I
>> implement a format on an AVX2 laptop, I just look at the AVX2 section
>> in pseudo_intrinsics.h and find vloadu and vshuffle_epi8 to be in the
>> large list of supported intrinsics, so I add them to my code.
>> Unfortunately this code won't work when I port it to MIC, because
>> those two intrinsics are not in MIC's list.
> 
> Ah, yes. We have a common set, of which some are more or less emulated on some archs. That emulation may itself use intrinsics that are not among the common ones so can't be used outside that header file unless we add emulation for all other archs. I agree we should try to make it clear which ones are common.
> 
>> I think the easiest way to tackle this issue is to, for each arch,
>> split the list of supported intrinsics into two parts: one part
>> contains primitive intrinsics, and the other contains more "advanced"
>> intrinsics. The primitive intrinsics are the same for each
>> arch and are always portable. The "advanced" intrinsics can be used
>> for more optimized code, and need to be wrapped with #ifdefs in user
>> code.
> 
> Agreed.
> 
>> I think it's no problem using pseudo-intrinsics, but some interfaces
>> need redesigning. For example, x86's __m512i is
>> element-type-agnostic, but AltiVec is not. 'vector int' and 'vector
>> long' are different types in AltiVec, and cannot be used
>> interchangeably (without explicit casting). Currently we use type
>> (__m512i) pervasively in our code, and don't distinguish element
>> types when declaring variables. This already caused me headaches when
>> incorporating AltiVec intrinsics. Maybe we can define two different
>> types, e.g. vtype32 and vtype64.
> 
> OK, so we should refactor vtype to vtype32, and add a vtype64. On intel they will be the same. Sounds good to me.
> 
>> I just found that some formats still use raw x86
>> intrinsics and some use too advanced intrinsics to be found on
>> non-x86 archs. Those all need handling in order to support non-x86
>> intrinsics.
> 
> There are core files like DES, which use their own kind of pseudo-intrinsics I think. And there are a few formats in Jumbo that use their own stuff, be it pseudo or just stacks of ifdefs. I think it's GOST, Blake, Keccak, Scrypt and Pomelo. Pomelo will be replaced with Agnieszka's format sooner or later and perhaps Scrypt too. We'll see if you get time to have a look at Keccak & co but let's start with the ones that use pseudo_intrinsics.h.
> 
> magnum
> 
