Message-Id: <15A4931D-F31D-49A0-912B-E1044AE003EC@gmail.com>
Date: Fri, 17 Jul 2015 16:38:27 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics

I have now manually unrolled all 5 formats in sse-intrinsics.c, and tested the newly interleaved code on my laptop and on MIC. Here are the results:

On my laptop (gcc-5, AVX, 4 HTs)

hash\para  |    1     |    2     |    3     |    4     |    5     |
-----------|----------|----------|----------|----------|----------|
md4        |   17688  |   28320  | **31884**|   31680  |   30613  |
md4-omp    |   43680  |   62080  | **63207**|   61184  |   59280  |
md5        |   12601  |   20464  | **22562**|   21696  |   21440  |
md5-omp    |   32544  | **43840**|   42960  |   41216  |   39360  |
sha1       | **10484**|   10000  |    8856  |    4048  |    3640  |
sha1-omp   | **19472**|   19296  |   16560  |   10201  |    8396  |
sha256     |  **4360**|    2241  |    1752  |    1679  |    1660  |
sha256-omp |  **8304**|    4800  |    4944  |    4499  |    4118  |
sha512     |  **1744**|     764  |     700  |     680  |     666  |
sha512-omp |  **3304**|    1984  |    1872  |    1819  |    1742  |

On MIC (icc-14)

hash\para  |    1     |    2     |    3     |    4     |    5     |
-----------|----------|----------|----------|----------|----------|
md4        |    5647  |    6337  |    6400  |    6337  |  **6415**|
md4-omp    |  669148  |**745411**|  492116  |  671067  |  608316  |
md5        |    4172  |    4988  |    5085  |    5019  |  **5098**|
md5-omp    |  519529  |**547485**|  508235  |  472615  |  456237  |
sha1       |  **2588**|    2299  |    1882  |    1267  |    1196  |
sha1-omp   |**286117**|  253514  |  193900  |  144905  |  129230  |
sha256     |  **1093**|     646  |     628  |     615  |     592  |
sha256-omp |**124235**|   81230  |   76075  |   77445  |   71775  |
sha512     |   **123**|     116  |     102  |      78  |      72  |
sha512-omp | **15567**|   14628  |   12613  |   12094  |   11707  |

The formats used here are the same as before, i.e. pbkdf2-*.

BTW, the way I unrolled the code is by turning something like

    #define XXX_STEP(...) { XXX_PARA_DO(i) { ... } }

into

    #define XXX_STEP_0(...) { i = 0; ... }
    #if SIMD_PARA_XXX > 1
    #define XXX_STEP_1(...) { i = 1; ... }
    #else
    #define XXX_STEP_1(...)
    #endif
    (...)
    #define XXX_STEP(...) { XXX_STEP_0(...); XXX_STEP_1(...); XXX_STEP_2(...); ... }


Lei

> On Jul 14, 2015, at 12:11 PM, Lei Zhang <zhanglei.april@...il.com> wrote:
>
>
>> On Jun 23, 2015, at 2:03 AM, Solar Designer <solar@...nwall.com> wrote:
>>
>> One thing that is clear is that non-fully-unrolled *_PARA_DO are not
>> acceptable.  If there are not enough registers for fully unrolling
>> these without incurring spilling, then the interleaving factor should be
>> smaller.  On MIC, there should be enough registers for the interleaving
>> factors considered above (up to 5x).
>
> I just manually unrolled SHA256_STEP and SHA512_STEP respectively, and compared the performance with the auto-unrolled ones, using magnum's testpara.pl. The figures below are obtained on my laptop (formats are pbkdf2-*):
>
> [auto]
> hash\para  |    1     |    2     |    3     |    4     |    5     |
> -----------|----------|----------|----------|----------|----------|
> sha256     |  **4020**|    3760  |    3924  |    3801  |    3940  |
> sha512     |  **1624**|    1092  |    1413  |    1409  |    1435  |
>
> [manual]
> hash\para  |    1     |    2     |    3     |    4     |    5     |
> -----------|----------|----------|----------|----------|----------|
> sha256     |  **4144**|    1888  |    1817  |    1837  |    1821  |
> sha512     |  **1646**|     748  |     708  |     720  |     722  |
>
> With manual unrolling, the performance degrades drastically from interleaving x1 to x2, but not so much upwards. BTW, I didn't change the original array tmps. Just the loop is manually unrolled here.
>
>
> Lei
>
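For readers who want to try the XXX_STEP_0 / XXX_STEP_1 pattern outside the John the Ripper tree, here is a minimal, self-contained sketch of it. Everything named DEMO_* is hypothetical and is not taken from the actual source file: DEMO_PARA stands in for SIMD_PARA_XXX, and the rotate-and-add body is only a placeholder for a real hash step. The sketch shows only that the manually unrolled macro computes the same values as the loop form. It says nothing about performance.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interleaving factor, standing in for SIMD_PARA_XXX. */
#ifndef DEMO_PARA
#define DEMO_PARA 3
#endif
#if DEMO_PARA < 1 || DEMO_PARA > 3
#error "this sketch only provides DEMO_STEP_0 .. DEMO_STEP_2"
#endif

/* Placeholder step body (rotate left by 3, add k); not a real hash round. */
#define DEMO_BODY(a, k, i) \
    (a)[(i)] = (((a)[(i)] << 3) | ((a)[(i)] >> 29)) + (uint32_t)(k)

/* Loop form: relies on the compiler to unroll the lane loop. */
#define DEMO_PARA_DO(i) for ((i) = 0; (i) < DEMO_PARA; (i)++)
#define DEMO_STEP_LOOP(a, k) { DEMO_PARA_DO(i) { DEMO_BODY(a, k, i); } }

/* Manually unrolled form: one macro per lane, empty when the lane is unused. */
#define DEMO_STEP_0(a, k) { i = 0; DEMO_BODY(a, k, i); }
#if DEMO_PARA > 1
#define DEMO_STEP_1(a, k) { i = 1; DEMO_BODY(a, k, i); }
#else
#define DEMO_STEP_1(a, k)
#endif
#if DEMO_PARA > 2
#define DEMO_STEP_2(a, k) { i = 2; DEMO_BODY(a, k, i); }
#else
#define DEMO_STEP_2(a, k)
#endif
#define DEMO_STEP(a, k) \
    { DEMO_STEP_0(a, k); DEMO_STEP_1(a, k); DEMO_STEP_2(a, k); }

/* Returns 1 if both forms produce identical lane values after 8 steps. */
int unrolled_matches_loop(void)
{
    uint32_t x[DEMO_PARA], y[DEMO_PARA];
    unsigned i, n;

    for (n = 0; n < DEMO_PARA; n++)
        x[n] = y[n] = 0x01234567u * (n + 1);

    for (n = 0; n < 8; n++) {
        DEMO_STEP_LOOP(x, n);   /* loop form */
        DEMO_STEP(y, n);        /* manually unrolled form */
    }

    for (n = 0; n < DEMO_PARA; n++)
        if (x[n] != y[n])
            return 0;
    return 1;
}

/* Convenience wrapper that aborts if the two forms ever disagree. */
void demo_self_check(void)
{
    assert(unrolled_matches_loop());
}
```

Calling demo_self_check() from a small main(), and rebuilding with -DDEMO_PARA=1 or -DDEMO_PARA=2, exercises the empty-macro branches as well.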