Message-ID: <AANLkTimXEh4jZbVJmrZPP5mRFmUsnftsooX9vOfKDsyH@mail.gmail.com>
Date: Mon, 13 Sep 2010 10:58:34 -0700
From: Alain Espinosa <alainesp@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: NTLM and OpenCL 1.0

This is a more polished version. In my tests there is a speedup of
about 6x (not for all of John, only for the transfer and crack steps).
Erik may want to test this for a more realistic benchmark. John under
Cygwin still does not work, so I tested the code on Windows with a
simple program.

All tests were run on Windows 7 Ultimate 64-bit, Core 2 Duo T8100
2.1GHz, Nvidia Quadro FX 3600M. Keys of length 7. M = millions of
passwords/sec.

<<<Executing kernel only>>>
Time: 2075, c/s: 126.0 M
Execut kernel: 2028

How good is that? ighashgpu_v062 performs at 195 M. Taking into
account that this kernel accesses a lot of global memory and that the
work-item size is reduced (for low memory use), I think it is nearly
optimal.

<<<Executing nt_crypt_all_opencl>>>
Time: 5912, c/s: 44.0 M
Transfer keys: 2309, Bandwidth: 886MB/sec
Execut kernel: 1827
Transfer bbbs: 1729, Bandwidth: 592MB/sec

As you can see, the transfers affect performance a lot. I do not
know why the bandwidth is so bad when Nvidia says that up to 5GB/sec
can be reached. I think optimizations need to be made here to reduce
transfers or to use concurrent memory copy & execute.
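
To put numbers on how much the transfers cost, here is a minimal
sketch in C that computes what share of the nt_crypt_all_opencl run
goes to each phase, using the figures reported above. The struct and
function names are made up for illustration and are not part of the
patch. The timings are assumed to share one unit (probably
milliseconds).

```c
#include <assert.h>

/* Timings copied from the nt_crypt_all_opencl output above.
 * Assumption: all four values use the same time unit. */
typedef struct {
    double total;         /* "Time" */
    double transfer_keys; /* "Transfer keys" (host to device) */
    double kernel;        /* "Execut kernel" */
    double transfer_back; /* "Transfer bbbs" (device to host) */
} phase_times;

/* Fraction of the whole run spent copying data in either direction. */
double transfer_share(phase_times t)
{
    assert(t.total > 0.0);
    return (t.transfer_keys + t.transfer_back) / t.total;
}

/* Fraction of the whole run spent executing the kernel. */
double kernel_share(phase_times t)
{
    assert(t.total > 0.0);
    return t.kernel / t.total;
}
```

With the values above (5912, 2309, 1827, 1729), transfer_share comes
out at about 0.68 and kernel_share at about 0.31, so roughly two
thirds of the run is spent moving data.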

<<<Executing 2 instances at the same time>>>
// Instance 1-----------------------------------------------------
Time: 7270, c/s: 36.0 M
Transfer keys: 2106, Bandwidth: 972MB/sec
Execut kernel: 2612
Transfer bbbs: 2505, Bandwidth: 408MB/sec
// Instance 2-----------------------------------------------------
Time: 7223, c/s: 36.0 M
Transfer keys: 1653, Bandwidth: 1238MB/sec
Execut kernel: 2199
Transfer bbbs: 3340, Bandwidth: 306MB/sec
-------------------------------------------------------------------

As can be seen, running two instances at the same time can be used to
make optimal use of the GPU. Note that my GPU supports concurrent
memory copy & execute.
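
The gain from running two instances can be worked out from the c/s
figures above. This is a small sketch in C with hypothetical function
names. It assumes the two instances work on disjoint sets of keys, so
that their rates simply add up.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the c/s rates reported by instances that ran concurrently.
 * Assumption: the instances do not test the same candidates. */
double combined_rate(const double *rates, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += rates[i];
    return sum;
}

/* Ratio of the combined rate to the rate of one instance run alone. */
double gain_over_single(double combined, double single)
{
    assert(single > 0.0);
    return combined / single;
}
```

For the two instances above (36.0 M each) the combined rate is 72 M,
which is about 1.64 times the 44.0 M measured for a single instance.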

As can be seen in the kernel code, I use little- and big-endian
detection. Are there more types of "endianness" in general OpenCL
devices? The OpenCL specification only mentions these two.
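
For reference, OpenCL reports a device's byte order as a single
boolean (the CL_DEVICE_ENDIAN_LITTLE query of clGetDeviceInfo), and
OpenCL C defines __ENDIAN_LITTLE__ on little-endian devices. The
sketch below is plain host-side C and is not the code from the
attached patch. It shows one way to detect the host byte order and
swap 32-bit words only when host and device differ. The
device_is_little parameter is a hypothetical flag that the caller
would fill in from that query.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1); /* lowest-addressed byte of the word */
    return first == 1;
}

/* Reverse the byte order of a 32-bit word. */
uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Convert a host word to the byte order the device expects.
 * device_is_little must be 0 or 1 (hypothetical flag, see above). */
uint32_t host_to_device32(uint32_t x, int device_is_little)
{
    assert(device_is_little == 0 || device_is_little == 1);
    return host_is_little_endian() == device_is_little ? x : swap32(x);
}
```

When both sides have the same byte order the word is passed through
unchanged, so the common case (little-endian host and GPU) costs
nothing extra.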

saludos,
alain

Download attachment "john-1.7.6-jumbo-7-ntopencl-2.diff" of type "application/octet-stream" (26101 bytes)
