Date: Sat, 5 Sep 2015 07:17:49 +0300
From: Solar Designer <>
Subject: Re: MD5 on XOP, NEON, AltiVec

On Sat, Sep 05, 2015 at 05:25:16AM +0300, Solar Designer wrote:
> Here's what we had last year:
> Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 8x]... (8xOMP) DONE
> Raw:    201472 c/s real, 25152 c/s virtual
> Here's what we have now:
> Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
> Raw:    150272 c/s real, 18784 c/s virtual

I sort of found it: somehow the code handling SSEi_FLAT_OUT, when
compiled in, changes the stack frame layout in such a way that
performance drops.  I haven't yet been able to tell why it drops.  The
offsets look properly aligned to me either way.
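
One way to double-check alignment at run time, rather than by reading
offsets in the generated code, is to test the address of a stack
temporary directly.  The following is only a minimal sketch of that
general technique, not code from the tree: is_aligned() and
demo_stack_buffer_aligned() are hypothetical names, the 16-byte figure
is an assumption matching 128-bit vectors, and _Alignas requires a C11
compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is a multiple of alignment, which must be a power of two. */
static int is_aligned(const void *p, size_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return ((uintptr_t)p & (alignment - 1)) == 0;
}

/*
 * Hypothetical stand-in for a function that keeps 128-bit temporaries
 * on its stack; it reports whether its own buffer is 16-byte aligned.
 */
static int demo_stack_buffer_aligned(void)
{
	_Alignas(16) unsigned char tmp[64];

	tmp[0] = 0;	/* touch the buffer so it is not flagged as unused */
	return is_aligned(tmp, 16);
}
```

A check like this only confirms alignment; it would not by itself
explain a slowdown that persists when the offsets are aligned.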

With SSEi_FLAT_OUT support #if 0'ed out, I get:

Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
Raw:    223232 c/s real, 27869 c/s virtual

And that's without the MD5_I change, which I haven't tried yet.
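
To put the three md5crypt "real" figures quoted in this message side by
side, the relative changes can be computed as below.  This is just
arithmetic on the numbers above; percent_change() is a hypothetical
helper name, not something from the tree.

```c
#include <assert.h>

/* Percent change from old_cps to new_cps; a positive result means faster. */
static double percent_change(double old_cps, double new_cps)
{
	assert(old_cps > 0.0);
	return (new_cps - old_cps) / old_cps * 100.0;
}
```

percent_change(201472, 150272) is about -25.4 (last year's 8x build vs.
the current 4x2 build), percent_change(150272, 223232) is about +48.6
(current build with vs. without SSEi_FLAT_OUT support), and
percent_change(201472, 223232) is about +10.8 (last year's build vs.
the current build without SSEi_FLAT_OUT support).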

For raw-md5, performance stays the same.  For raw-md4, commenting out
its SSEi_FLAT_OUT support results in a very slight regression, and its
speed is unexpectedly low either way (about the same as raw-md5's,
whereas it should be faster).  So there's yet another performance issue
for us to investigate for raw-md4 (and maybe other MD4-based formats).

As to commenting out SSEi_FLAT_OUT support for both MD5 and MD4, as far
as I can tell we are not currently using it for either.  (We're using it
for SHA*.)

