john-dev - Re: Re: mmap()

Message-ID: <a605aa1dd15b84b05e318a085a505255@smtp.hushmail.com>
Date: Mon, 28 Apr 2014 21:59:26 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Re: mmap()

On 2014-04-27 22:47, magnum wrote:
> I'm experimenting with using SSE *with* mmap (not Atom's code) but
> since most words are shorter than 16 bytes it seems to be better using
> 32-bit or even 8-bit stuff.

The mmap code is now committed to bleeding-jumbo. The "problem" with
SSE described above is gone: if the word is shorter than 16 bytes we
still copy 16 bytes, but then we leave the loop knowing where to put
the null byte. So it's now very fast for any length.
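
To illustrate the idea (this is only a rough sketch, not the code that was committed): copy 16 bytes unconditionally, look for the newline in that chunk, and when it's found, overwrite it with the null byte and leave the loop. The function name copy_word_sse2 is made up for this example, and it assumes that the source stays readable up to the end of the 16-byte chunk holding the newline and that the destination has room for the word length rounded up to a multiple of 16. __builtin_ctz is a gcc/clang builtin.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the john source.
 * Copies one newline-terminated word from src to dst, 16 bytes at a
 * time, and returns its length. The caller must guarantee that src is
 * readable up to the end of the 16-byte chunk containing the newline,
 * and that dst can hold the word length rounded up to 16 bytes.
 */
static size_t copy_word_sse2(char *dst, const char *src)
{
    const __m128i newline = _mm_set1_epi8('\n');
    size_t off = 0;

    for (;;) {
        /* Unaligned load and store: always move a full 16 bytes. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(src + off));
        _mm_storeu_si128((__m128i *)(dst + off), chunk);

        /* One bit per byte that equals '\n'. */
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, newline));
        if (mask) {
            /* Lowest set bit is the first newline in this chunk. */
            off += (size_t)__builtin_ctz((unsigned)mask);
            dst[off] = '\0';
            return off;
        }
        off += 16;
    }
}
```

A short word costs a single load, store and compare; the bytes copied past the newline are simply ignored once the null byte is in place.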

I have another problem though, and I'm asking for help or insight: The
SSE2 version is a fine boost on Linux; I've tested it on a couple of
Intels and an AMD. But when I run it on OSX with an i7 mobile, it
*halves* the speed. At first I thought it was poor handling of
unaligned SSE, but that seemed unlikely for this CPU. And now I've
booted Linux on the Macbook and could confirm the SSE code runs just
fine there, with a 6-7% boost over the 64-bit alternative code. In both
cases it was compiled with gcc 4.7-ish. How the heck can SSE intrinsics
end up that different? The OS should have absolutely nothing to do with
it!?

For now I've disabled the SSE2 code path for __APPLE__, but I really
think this is weird. I'll try peeking at the assembler output from the
compiler.
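
Roughly, a compile-time guard of this kind could look like the sketch below. USE_SSE2_COPY and copy_path_name are made-up names for this example and are not taken from the tree; __SSE2__ and __APPLE__ are the usual compiler-predefined macros.

```c
/* USE_SSE2_COPY is a hypothetical macro name, not one from the john source. */
#if defined(__SSE2__) && !defined(__APPLE__)
#define USE_SSE2_COPY 1
#else
#define USE_SSE2_COPY 0
#endif

/* Reports which copy path was selected at compile time. */
static const char *copy_path_name(void)
{
#if USE_SSE2_COPY
    return "sse2";
#else
    return "scalar";
#endif
}
```

With a guard like this, the OSX build falls back to the non-SSE path while other SSE2-capable builds are unaffected.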

magnum
