Message-ID: <CAKGDhHUBjGfQXzPiMjtQaH4aU9x9-Wdg5eA=p31e=7LxbJRFMw@mail.gmail.com>
Date: Tue, 11 Aug 2015 08:00:54 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Agnieszka's weekly report #15

accomplishments:

- I made optimizations for argon2i and argon2d. argon2d is only slightly
faster, but I didn't run benchmarks because super wasn't idle. Maybe a slow
hash shouldn't be penalized much on GPU anyway, but it's better not to risk
it.

- In argon2i I applied memory coalescing and vectorization, and both gave
me better speed, but I have a problem on my own NVIDIA card. argon2i has
two ComputeBlock functions: the first works on private memory, while the
second first copies memory from __global to __private and then performs the
same computations as the first. Vectorizing the first function gave me
better speed on all GPUs, but when I also vectorized the second, I got
better speed on the GPUs on super and a strange slowdown on my own card.

- In argon2d, vectorizing ComputeBlock didn't improve speed, although
vectorizing the memory accesses did. I also tried coalescing in argon2d,
but it made the speed worse, so argon2d currently runs without coalescing;
it's easy to turn on/off.

- I don't know whether my optimizations are complete, because I couldn't
understand how the new addresses are computed in FillSegment().

priorities:

- check whether the slowdown on my laptop is caused by the size of the
kernel
- I will be working on makwa
- I wrote earlier on the ML about MEM_SIZE/4. Auto-tune does return GWS
correctly after this division, but I discovered some problems and will
investigate this more.