Message-ID: <CA+TsHUBxHaMfE9TYLQ6y2HNGHdrr05ABZ5YxkJANuHrQU6or7Q@mail.gmail.com>
Date: Thu, 18 Oct 2012 22:21:11 +0530
From: Sayantan Datta <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice DES on GPU

On Thu, Oct 18, 2012 at 11:47 AM, Solar Designer <solar@...nwall.com> wrote:

> It would imply something other than bitslicing for K as well.  And the
> S-boxes would be represented differently, too.  But I think you should
> stop thinking in this direction.  I don't expect there's another useful
> representation inbetween bitslicing and straightforward table lookups.
>

The primary reason for thinking in this direction is to decrease the load on
registers and local memory so that more wavefronts can be in flight. Also,
decreasing the size of B[] to 16 ints would almost ensure that it really
stays in private register space. Right now I doubt the B[] arrays are stored
in register address space, because each VGPR is only 256 bits wide. However,
that depends entirely on how the VGPRs are used. For SIMD execution I think
it is more logical to assume that each VGPR is loaded with data from 16
different kernel instances and not from only one. Still, if the array is not
in register space we could be losing a lot of performance. It might also be
possible to use 4 arrays of 16 ints to represent the B[] array of 64 ints,
but the indirect addressing through the 96-entry index array is causing a
problem.
       Since the 96-entry index array is almost constant, the indexing could
be done prior to execution. However, that might require a radical approach
such as pre-compiling the kernels manually.
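
To make the idea concrete, here is a rough sketch in Python of what "doing
the indexing prior to execution" could look like: a generator that turns each
indirect access into a literal bank and offset in one of four 16-int arrays.
Everything in it is an assumption for illustration only. The table from
make_index_table() is a stand-in and not the real 96-entry index array, and
the names B0..B3, K and t0..t95 are hypothetical, not what the kernel uses.

```python
# Hypothetical sketch: resolve a fixed index table when the kernel source is
# generated, so the emitted code contains only literal array indices.

def make_index_table(n=96, size=64):
    # Stand-in values only; NOT the real index array from the DES kernel.
    return [(7 * i + 3) % size for i in range(n)]

def emit_unrolled_xor(index_table, bank_size=16):
    """Return C-like source lines with every B access resolved to a literal."""
    lines = []
    for i, idx in enumerate(index_table):
        bank, off = divmod(idx, bank_size)  # which 16-int array, which slot
        lines.append(f"t{i} = B{bank}[{off}] ^ K[{i}];")
    return lines

def eval_indirect(B, K, index_table):
    # Reference behaviour: run-time indirect indexing into one 64-int array.
    return [B[idx] ^ K[i] for i, idx in enumerate(index_table)]

def eval_unrolled(B, K, index_table, banks=4, bank_size=16):
    # Same computation using four 16-int banks and pre-resolved positions.
    split = [B[b * bank_size:(b + 1) * bank_size] for b in range(banks)]
    out = []
    for i, idx in enumerate(index_table):
        bank, off = divmod(idx, bank_size)
        out.append(split[bank][off] ^ K[i])
    return out
```

Since the index array is only "almost" constant, the source would have to be
regenerated and recompiled whenever it changes, which is the cost of this
approach.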

Regards,
Sayantan
