Message-Id: <15A4931D-F31D-49A0-912B-E1044AE003EC@gmail.com>
Date: Fri, 17 Jul 2015 16:38:27 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics

I have now manually unrolled all 5 formats in sse-intrinsics.c and tested the newly interleaved code on my laptop and on MIC.

Here are the results (the best figure in each row is marked with **):

On my laptop (gcc-5, AVX, 4 HTs)

hash\para  |       1  |       2  |       3  |       4  |       5  |
-----------|----------|----------|----------|----------|----------|
md4        |   17688  |   28320  | **31884**|   31680  |   30613  |
md4-omp    |   43680  |   62080  | **63207**|   61184  |   59280  |
md5        |   12601  |   20464  | **22562**|   21696  |   21440  |
md5-omp    |   32544  | **43840**|   42960  |   41216  |   39360  |
sha1       | **10484**|   10000  |    8856  |    4048  |    3640  |
sha1-omp   | **19472**|   19296  |   16560  |   10201  |    8396  |
sha256     |  **4360**|    2241  |    1752  |    1679  |    1660  |
sha256-omp |  **8304**|    4800  |    4944  |    4499  |    4118  |
sha512     |  **1744**|     764  |     700  |     680  |     666  |
sha512-omp |  **3304**|    1984  |    1872  |    1819  |    1742  |

On MIC (icc-14)

hash\para  |       1  |       2  |       3  |       4  |       5  |
-----------|----------|----------|----------|----------|----------|
md4        |    5647  |    6337  |    6400  |    6337  |  **6415**|
md4-omp    |  669148  |**745411**|  492116  |  671067  |  608316  |
md5        |    4172  |    4988  |    5085  |    5019  |  **5098**|
md5-omp    |  519529  |**547485**|  508235  |  472615  |  456237  |
sha1       |  **2588**|    2299  |    1882  |    1267  |    1196  |
sha1-omp   |**286117**|  253514  |  193900  |  144905  |  129230  |
sha256     |  **1093**|     646  |     628  |     615  |     592  |
sha256-omp |**124235**|   81230  |   76075  |   77445  |   71775  |
sha512     |   **123**|     116  |     102  |      78  |      72  |
sha512-omp | **15567**|   14628  |   12613  |   12094  |   11707  |

The formats used here are the same as before, i.e. pbkdf2-*.

BTW, the way I unrolled the code was to turn something like

#define XXX_STEP(...) { \
  XXX_PARA_DO(i) { \
      ... \
  } \
}

into

#define XXX_STEP_0(...) { \
  i = 0; \
  ... \
}

#if SIMD_PARA_XXX > 1
#define XXX_STEP_1(...) { \
  i = 1; \
  ... \
}
#else
#define XXX_STEP_1(...)
#endif

(...)

#define XXX_STEP(...) { \
  XXX_STEP_0(...); \
  XXX_STEP_1(...); \
  XXX_STEP_2(...); \
  ... \
}
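
To make the pattern above concrete, here is a minimal compilable sketch. It is not the actual code from the tree: DEMO_PARA, DEMO_BODY, DEMO_STEP* and demo_forms_agree are hypothetical names, the step body is a scalar rotate-and-add standing in for the real SIMD intrinsics, and only factors up to 3 are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interleaving factor; stands in for SIMD_PARA_XXX. */
#ifndef DEMO_PARA
#define DEMO_PARA 3
#endif

#if DEMO_PARA < 1 || DEMO_PARA > 3
#error "this sketch only handles DEMO_PARA in 1..3"
#endif

/* Hypothetical per-lane work: rotate left by 3, then add k.
 * The real steps operate on SIMD vectors instead of uint32_t. */
#define DEMO_BODY(a, k, i) \
    ((a)[i] = (((a)[i] << 3) | ((a)[i] >> 29)) + (k))

/* Loop form: unrolling is left to the compiler. */
#define DEMO_STEP_LOOP(a, k) do { \
    unsigned int i_; \
    for (i_ = 0; i_ < DEMO_PARA; i_++) \
        DEMO_BODY(a, k, i_); \
} while (0)

/* Manually unrolled form: one macro per lane, with a constant index.
 * Lanes beyond DEMO_PARA expand to a no-op. */
#define DEMO_STEP_0(a, k) DEMO_BODY(a, k, 0)

#if DEMO_PARA > 1
#define DEMO_STEP_1(a, k) DEMO_BODY(a, k, 1)
#else
#define DEMO_STEP_1(a, k) ((void)0)
#endif

#if DEMO_PARA > 2
#define DEMO_STEP_2(a, k) DEMO_BODY(a, k, 2)
#else
#define DEMO_STEP_2(a, k) ((void)0)
#endif

#define DEMO_STEP(a, k) do { \
    DEMO_STEP_0(a, k); \
    DEMO_STEP_1(a, k); \
    DEMO_STEP_2(a, k); \
} while (0)

/* Returns 1 if the loop form and the unrolled form give the same
 * result for every lane, 0 otherwise. */
int demo_forms_agree(uint32_t seed, uint32_t k)
{
    uint32_t x[DEMO_PARA], y[DEMO_PARA];
    unsigned int i;

    for (i = 0; i < DEMO_PARA; i++)
        x[i] = y[i] = seed + i;

    DEMO_STEP_LOOP(x, k);
    DEMO_STEP(y, k);

    for (i = 0; i < DEMO_PARA; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}

/* Small self-check that can be called from a test driver. */
void demo_self_check(void)
{
    assert(demo_forms_agree(1u, 5u));
}
```

Building this with -DDEMO_PARA=1, 2 or 3 exercises the #if/#else branches; the two forms should always agree, so any difference between them would be in the generated code rather than in the results.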

Lei


> On Jul 14, 2015, at 12:11 PM, Lei Zhang <zhanglei.april@...il.com> wrote:
> 
> 
>> On Jun 23, 2015, at 2:03 AM, Solar Designer <solar@...nwall.com> wrote:
>> 
>> One thing that is clear is that non-fully-unrolled *_PARA_DO are not
>> acceptable.  If there are not enough registers for fully unrolling
>> these without incurring spilling, then the interleaving factor should be
>> smaller.  On MIC, there should be enough registers for the interleaving
>> factors considered above (up to 5x).
> 
> I just manually unrolled SHA256_STEP and SHA512_STEP respectively, and compared the performance with the auto-unrolled ones, using magnum's testpara.pl. The figures below are obtained on my laptop (formats are pbkdf2-*):
> 
> [auto]
> hash\para  |       1  |       2  |       3  |       4  |       5  |
> -----------|----------|----------|----------|----------|----------|
> sha256     |  **4020**|    3760  |    3924  |    3801  |    3940  |
> sha512     |  **1624**|    1092  |    1413  |    1409  |    1435  |
> 
> [manual]
> hash\para  |       1  |       2  |       3  |       4  |       5  |
> -----------|----------|----------|----------|----------|----------|
> sha256     |  **4144**|    1888  |    1817  |    1837  |    1821  |
> sha512     |  **1646**|     748  |     708  |     720  |     722  |
> 
> With manual unrolling, the performance degrades drastically from interleaving x1 to x2, but changes little at higher factors. BTW, I didn't change the original tmp arrays; only the loop is manually unrolled here.
> 
> 
> Lei
> 
