Message-Id: <CDA7A222-6B3E-46E0-A891-1A73BB186EF7@gmail.com>
Date: Sat, 9 May 2015 09:23:45 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Adding OpenMP support to SunMD5


> On May 9, 2015, at 8:08 AM, Solar Designer <solar@...nwall.com> wrote:
> 
> It fails on super:
> 
> [solar@...er src]$ ../run/john -te -form=sunmd5
> Will run 32 OpenMP threads
> Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (32xOMP) FAILED (cmp_all(7))

There seems to be a compatibility issue among different compilers' implementations of OpenMP. Initially I used the following OpenMP directive, which works fine with icc on my laptop:

#pragma omp parallel for default(none) private(idx) copyin(input_buf_big) \
	shared(saved_salt, data, constant_phrase, ngroups, group_sz)

But when I tried it on well, gcc failed to compile it, complaining that constant_phrase is already a constant and so does not need to be declared as shared. So I removed it, and then icc failed to compile... Magnum suggested I use the simplified form to avoid this issue:

#pragma omp parallel for copyin(input_buf_big)
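For context, copyin only applies to variables that are declared threadprivate: on entry to the parallel region, each thread's copy is initialized from the master thread's copy. Below is a minimal sketch of that mechanism, not the SunMD5 code; scratch and fill_from_scratch are hypothetical names, with scratch standing in for input_buf_big.

```c
/* Hypothetical per-thread scratch buffer, standing in for input_buf_big. */
static unsigned int scratch[4];
#ifdef _OPENMP
#pragma omp threadprivate(scratch)
#endif

/*
 * Fills scratch on the calling (master) thread, then lets every loop
 * iteration read its own thread's copy of it.
 */
void fill_from_scratch(unsigned int *out, int count)
{
	int i;

	for (i = 0; i < 4; i++)
		scratch[i] = 10u * (unsigned int)(i + 1);

#ifdef _OPENMP
	/* copyin: each thread's scratch starts as a copy of the master's. */
#pragma omp parallel for copyin(scratch)
#endif
	for (i = 0; i < count; i++)
		out[i] = scratch[i % 4] + (unsigned int)i;
}
```

Without copyin, the worker threads' copies of scratch would still hold their initial zeros, so the results would depend on which thread ran which iteration.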

It works both on my laptop and on well. I didn't know it failed on super until you pointed it out. I tried changing the OpenMP directive back to its longer form, and now it works on super:

[lei@...er src]$ ../run/john --test --format=sunmd5
Will run 32 OpenMP threads
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw:	5907 c/s real, 194 c/s virtual

The default gcc on super is 4.4.7, while I'm using gcc 4.9.2 on well, so I assume there are incompatibilities even between different versions of gcc. I'll see if I can rewrite the directive in some other form that all three compilers accept.
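One candidate form, sketched here under assumptions rather than taken from the actual patch: keep the short directive, so that no const variable ever appears in a shared clause, and declare the per-iteration variables inside the loop body, which makes them private without a private clause. The names phrase and process_all are hypothetical, and the sketch leaves out copyin. Whether this also behaves correctly with gcc 4.4.7 on super is untested.

```c
#include <stddef.h>

/* Hypothetical stand-in for constant_phrase; not the real SunMD5 data. */
static const unsigned char phrase[] = "example";

/* Writes one value per group into out[0..count-1]. */
void process_all(unsigned int *out, int count)
{
	int group;

#ifdef _OPENMP
	/* No default(none), so phrase needs no data-sharing clause at all. */
#pragma omp parallel for
#endif
	for (group = 0; group < count; group++) {
		/* Declared inside the loop body: private to each thread. */
		size_t idx;
		unsigned int sum = 0;

		for (idx = 0; idx < sizeof(phrase) - 1; idx++)
			sum += phrase[idx];
		out[group] = sum * (unsigned int)(group + 1);
	}
}
```

The trade-off is that dropping default(none) also drops the compiler's check that every variable has an explicit data-sharing attribute, so a variable declared outside the loop silently becomes shared.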

> and the single-thread performance is a bit lower than it was before.
> It was:
> 
> [solar@...er run]$ ./john -te -form=sunmd5
> Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... DONE
> Speed for cost 1 (iteration count) of 5000
> Raw:    538 c/s real, 538 c/s virtual

I'm not sure how much of a penalty the new outer loop introduces, even though it iterates only once in a non-OpenMP build. If necessary, I can use macros to disable the outer loop in non-OpenMP builds. I haven't done so because the code looks cleaner the way it is now.
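A minimal sketch of what disabling the outer loop with macros could look like, assuming that a non-OpenMP build only ever processes one group. The names do_group and crypt_groups are hypothetical and stand in for the real hashing code.

```c
/* Hypothetical per-group work; stands in for the real hashing step. */
static unsigned int do_group(int group)
{
	return 1000u + (unsigned int)group;
}

/*
 * With OpenMP, loops over all groups in parallel. Without it, the loop
 * header is compiled out and the body runs exactly once with group == 0
 * (assumes ngroups is 1 in that build).
 */
void crypt_groups(unsigned int *out, int ngroups)
{
	int group = 0;

#ifdef _OPENMP
#pragma omp parallel for
	for (group = 0; group < ngroups; group++)
#endif
	{
		out[group] = do_group(group);
	}
#ifndef _OPENMP
	(void)ngroups; /* unused in the single-threaded build */
#endif
}
```

This removes the loop counter and the comparison from the single-threaded path, at the cost of an extra #ifdef around the loop header.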


Lei
