Date: Mon, 7 Sep 2015 19:38:21 +0200
From: magnum <>
Subject: Intrinsics experiment with CPP macros

Solar, all,

A little experiment is currently in the cpp-intrinsics topic branch. 
Specifically 51f3fe6 for now.

Nothing changed for the formats. But e.g. SIMDmd4body() is now a 
function-like macro that lets the compiler optimize away some branching 
and makes the actual functions smaller (this could be taken further).
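
To illustrate the general idea (this is only a sketch, not the code in the branch): when the flags argument is a compile-time constant, a function-like macro can select a specialized function, and the compiler can drop the untaken branch entirely. All names below (DEMO_RELOAD, demo_body_first, demo_body_reload, DEMO_BODY) are hypothetical and the "hash" is a stand-in, not real MD4.

```c
#include <stdint.h>

/* Hypothetical flag, standing in for a bit in the SSEi flags. */
#define DEMO_RELOAD 0x1u

/* Hypothetical "single (or first) block" variant: fixed initial state. */
static uint32_t demo_body_first(const uint32_t *data)
{
	return 0x67452301u + data[0];
}

/* Hypothetical "reload" variant: initial state is read from memory. */
static uint32_t demo_body_reload(const uint32_t *data, const uint32_t *state)
{
	return state[0] + data[0];
}

/*
 * Function-like macro hiding the two variants from the caller. If
 * "flags" is a literal constant, the condition is known at compile
 * time and only one of the two calls needs to remain in the output.
 */
#define DEMO_BODY(data, state, flags) \
	(((flags) & DEMO_RELOAD) ? \
	 demo_body_reload((data), (state)) : \
	 demo_body_first((data)))
```

A caller would keep writing something like DEMO_BODY(data, state, DEMO_RELOAD), so nothing changes at the call site.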

Currently only MD4 & MD5 are done, and more could be done to them. Behind 
the curtain there are now two different functions: one for a single (or 
first) block and another for "reload". Also, the 
"flat to interleaved" is moved to a separate function and that is also 
hidden by PP macros (optimized away unless needed since SSEi_flags are a 

The boost seems to be 5-10% depending on format. Still, I'm not quite 
sure we want to walk this path at all.

One effect of this (version) is that we now have two copies of the core 
MD4(w, a, b, c, d) function inside simd-intrinsics.c. The good thing 
about that is the optimizer may do some good stuff with the beginning of 
the "non-reload" version, since a, b, c and d are now constants. But I'm 
not sure the optimizer actually manages to do that with intrinsics (for 
plain C or OpenCL I'm pretty sure it does).
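
A scalar sketch of what "good stuff" could mean here, using only the first MD4 step and the standard MD4 initial values from RFC 1320. The function names are made up for illustration. With constant a, b, c and d, the term a + F(b, c, d) can be folded at compile time. Whether a compiler does the same for broadcast SIMD vectors built with intrinsics is exactly the open question, and this sketch does not answer it.

```c
#include <stdint.h>

#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define MD4_F(x, y, z) (((x) & (y)) | (~(x) & (z)))

/* Standard MD4 initial values (RFC 1320). */
#define MD4_A 0x67452301u
#define MD4_B 0xefcdab89u
#define MD4_C 0x98badcfeu
#define MD4_D 0x10325476u

/* "Reload" flavour: the state comes from memory, nothing can be folded. */
static uint32_t md4_step1_reload(const uint32_t state[4], uint32_t w0)
{
	uint32_t a = state[0], b = state[1], c = state[2], d = state[3];

	return ROTL32(a + MD4_F(b, c, d) + w0, 3);
}

/* "Non-reload" flavour: a + F(b, c, d) is a compile-time constant. */
static uint32_t md4_step1_first(uint32_t w0)
{
	const uint32_t k = MD4_A + MD4_F(MD4_B, MD4_C, MD4_D);

	return ROTL32(k + w0, 3);
}
```

Both functions give the same result when the reload version is fed the initial values; the second one simply has less work left to do at run time.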

Perhaps we can do this in a different (and better) way, or perhaps this 
is fine. Or perhaps we should forget about this whole idea, for 
easier-to-follow code. I have no opinion yet, I'm just experimenting.

