Message-ID: <CAKGDhHUKO77MOW86F_ZmY=OPN8LjWJdjY6PYCsT-ZiVBWxrLQQ@mail.gmail.com>
Date: Tue, 2 Jun 2015 19:26:52 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Lyra2 on CPU

2015-06-02 3:45 GMT+02:00 Solar Designer <solar@...nwall.com>:
> On Mon, Jun 01, 2015 at 11:57:34PM +0200, Agnieszka Bielec wrote:
>> Lyra 2 uses by default openmp threads in one hashing function.
>
> IIRC, their implementation uses pthreads directly, not via OpenMP.
> Do you know otherwise?

The source code I downloaded recently uses OpenMP.

>
>> nPARALLEL option determines how many omp threads are running. and if
>> nPARALLEL changes, output also changes.
>> nPARALLEL by default equals to 2
>
> What we need is an implementation of Lyra2 that would work for any
> thread-level parallelism setting _without_ necessarily creating any
> threads. In its threads-disabled mode, it would compute those threads'
> portions of work sequentially. This is much like Colin Percival's
> original implementation of scrypt works when called with p > 1.

Yes, I have also implemented Lyra2 for nPARALLEL > 1 without any threads
(version c, after removing threads). The results I included are for
nPARALLEL=2.

> I haven't looked at your code yet - I should.

I've just uploaded my versions to branches:
a) - "omp_nested"
b) - "lyra"
c) - "lyra_external_threads"
but my code contains warnings. I thought I would clean up the code after
we select the winner.

In version b), the function crypt_all may look unfamiliar. That is
because I had problems with barriers: all my threads were blocked after
reaching a barrier. In the for(i=0;i<2;i++) loop only two threads were
running in function LYRA2, but printf of omp_get_num_threads() was
returning 8. I also had problems with barriers on super.

>
>> a)
>> Will run 8 OpenMP threads
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... (8xOMP) DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts: 4896 c/s real, 848 c/s virtual
>> Only one salt: 5005 c/s real, 856 c/s virtual
>>
>> b)
>> Will run 8 OpenMP threads
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... (8xOMP) DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts: 6608 c/s real, 876 c/s virtual
>> Only one salt: 7120 c/s real, 935 c/s virtual
>>
>> c)
>> Will run 8 OpenMP threads
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... (8xOMP) DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts: 7032 c/s real, 943 c/s virtual
>> Only one salt: 7872 c/s real, 1035 c/s virtual
>>
>> without openmp)
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts: 2130 c/s real, 2152 c/s virtual
>> Only one salt: 2160 c/s real, 2138 c/s virtual
>>
>> I think that method b) is slower because we are using synchronization
>> many times and we have barriers for all omp threads.
>
> Maybe. You can test this hypothesis by benchmarking at higher cost
> settings and thus at lower c/s rates. At lower c/s rates, the
> synchronization overhead becomes relatively lower.

I took only the highest observed speeds from the tests:

; 8896/9144
        ~0.97287839020122484689
; 2312/2368
        ~0.97635135135135135135

> If confirmed, a way to reduce the overhead at higher c/s rates as well
> would be via computing larger batches of hashes per parallel for loop.
> This is what we normally call OMP_SCALE, but possibly applied at a
> slightly lower level in this case.

Lyra2 uses barriers within a single hash computation, so I'm not sure.
Maybe I don't understand your point.

>
>> I couldn't find a way how to do it for only x threads.
>
> What do you mean by x here?

Only nPARALLEL threads.

>
>> I am leaving to you to decide which method to implement to jtr.
>
> I think the order of our experiments should be as I outlined at the
> start of this reply.
>
> For nPARALLEL, make it a runtime parameter encoded with the hashes.
>
> What other options "like this" are there?

"where PARAMETERS can be:
    nCols = (number of columns, default is 256)
    nThreads = (number of threads, default is 2)
    nRoundsSponge = (number of Rounds performed for reduced sponge function [1 - 12], default is 1)
    bSponge = (number of sponge blocks, bitrate, 8 or 10 or 12, default is 12)
    sponge = (0, 1 or 2, default is 0) 0 means Blake2b, 1 means BlaMka and 2 means half-round BlaMka"

> While we're at it, have you moved the memory (de)allocation out of the
> cracking loop? And have you done it for POMELO too, as we had
> discussed - perhaps reusing the allocators from yescrypt? I don't
> recall you reporting on this, so perhaps this task slipped through the
> cracks? If so, can you please (re-)add it to your priorities?

Not yet for either. I will do it this week.

Thanks
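
Editor's sketch: the "threads-disabled mode" discussed above (version c) can be illustrated with a toy model. This is not Lyra2 and is not taken from any of the branches mentioned. The names mix, init_lanes, run_sequential and run_threaded are hypothetical, and the mixing step is an arbitrary stand-in for the sponge. The only assumption carried over from the thread is that each lane's work depends on other lanes' results from the previous barrier-delimited step. Under that assumption, looping over the lanes one step at a time gives the same output as running one thread per lane with a barrier after each step.

```python
import threading

MASK = (1 << 64) - 1  # keep values in 64 bits, like a C uint64_t


def mix(own, other, step):
    """Toy mixing step (hypothetical stand-in, NOT Lyra2's sponge)."""
    x = (own * 6364136223846793005 + other + step) & MASK
    return x ^ (x >> 29)


def init_lanes(seed, lanes):
    """Give every lane a distinct starting value derived from the seed."""
    return [(seed + lane) & MASK for lane in range(lanes)]


def run_sequential(seed, lanes, steps):
    """Threads-disabled mode: compute every lane's portion in a plain loop."""
    state = init_lanes(seed, lanes)
    for step in range(steps):
        # Building the whole new list before replacing `state` plays the
        # role of the barrier: no lane sees a value from the current step.
        state = [mix(state[i], state[(i + 1) % lanes], step)
                 for i in range(lanes)]
    return state


def run_threaded(seed, lanes, steps):
    """One thread per lane, synchronized by a barrier after every step."""
    history = [init_lanes(seed, lanes)] + [[0] * lanes for _ in range(steps)]
    barrier = threading.Barrier(lanes)

    def worker(lane):
        for step in range(steps):
            prev = history[step]  # complete: all lanes passed the last barrier
            history[step + 1][lane] = mix(prev[lane],
                                          prev[(lane + 1) % lanes], step)
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(lane,))
               for lane in range(lanes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return history[steps]


if __name__ == "__main__":
    # Both modes should agree for the same lane count.
    print(run_sequential(42, 2, 8) == run_threaded(42, 2, 8))
```

With a structure like this, the lane count stays an input to the computation (so it still affects the output), while the caller is free to spend its threads on separate candidates instead of inside one hash.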