Message-ID: <CAKGDhHX1EVCOiE0sALrvOJcFHbttZMecmpBsHEd9hhq4e_QKUg@mail.gmail.com>
Date: Mon, 24 Aug 2015 01:52:35 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-23 8:15 GMT+02:00 Solar Designer <solar@...nwall.com>:
> While private memory might be larger and faster on specific devices, I
> think that not making any use of local memory is wasteful.  By using
> both private and local memory at once, we should be able to optimally
> pack more concurrent Argon2 instances per GPU and thereby hide more of
> the various latencies.

Why would we pack more Argon2 instances per GPU by using both types of
memory? I'm using only very small portions of private memory.

BTW, in my vectorized kernels, shuffling between two groups of Argon
rounds takes a very long time. So I grouped kernel instances into
groups of 4 and interleave their data in local memory, which lets me
avoid the shuffling. But on my laptop I get 3k c/s for LWS=8, so there
is no speedup (the bleeding-jumbo branch gives 4k). I think this is not
what you mean here, though.

I uploaded this to the interleaving4 branch (argon2d only).

I updated the vector8 branch and created vector16 some time ago.
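
A toy illustration of the interleaved layout mentioned above, as a sketch only. It models the shared local-memory buffer as a Python list and assumes that word w of instance i is stored at index w * 4 + i. The function names and this exact layout are assumptions made for illustration, not taken from the interleaving4 branch, and the sketch says nothing about performance.

```python
GROUP = 4  # kernel instances sharing one buffer (group size from the post)


def interleaved_index(instance, word, group=GROUP):
    """Index of `word` of `instance` in the shared buffer (assumed layout)."""
    return word * group + instance


def interleave(states):
    """Pack one word list per instance into a single interleaved buffer."""
    group = len(states)
    words = len(states[0])
    buf = [None] * (group * words)
    for inst, state in enumerate(states):
        for w, value in enumerate(state):
            buf[interleaved_index(inst, w, group)] = value
    return buf


def deinterleave(buf, group=GROUP):
    """Recover the per-instance word lists from the interleaved buffer."""
    return [buf[inst::group] for inst in range(group)]


# Hypothetical data: 4 instances with 3 words each, labelled by origin.
states = [[f"i{inst}w{w}" for w in range(3)] for inst in range(GROUP)]
buf = interleave(states)
print(buf[:GROUP])  # word 0 of all four instances sits side by side
```

With this layout, the same word of all four instances is adjacent in the buffer, so each instance reads its own data at a fixed stride rather than exchanging values with its neighbours.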