Message-ID: <07FC1EE7269740FDA0A1CA6D5B489DAC@D9VGLK61>
Date: Mon, 9 May 2011 07:28:10 -0500
From: "JimF" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: Re: John core change patch (and md5-gen, etc)

----- Original Message -----
From: "bartavelle"
Sent: Monday, May 09, 2011 5:07 AM

> How do you want to work from here ? Is that enough for you or do you
> want me to patch some more ? I believe a linux-x86-64-icc target would
> be handy.

I started in on this last night also, and did almost exactly the same things you did (commenting out blocks in md5_gen which simply are not ready, etc.) to get things running, except for the new change you made to add 'init' or not init. I think your init/non-init will be all that is required to get SSE and PARA SSE working for 2 block data.

The ugliness in 2 block data comes when you are 'close', and have longer PWs that push you over the limit. Then, if you have a couple of buffers under 56 bytes, and a couple at or just over, it gets ugly. In that case, SSE is pretty much out for that block, unless you can pull in data from other blocks, which becomes very tricky, and you end up losing ALL speed benefits.

It 'may' be possible to run these mixed blocks, harvesting the results for the COEF values that are shorter than 56 bytes, then running the 2nd loop of SSE to get the results for the other COEF values. That still takes 2 loops if any are over 55 bytes, but in the end all residues are correct, which is what matters.

The easiest is to simply use SSE for the cases where all items in the block are 55 bytes or less, or all items in the block are 56 to 119 bytes, while all other cases, where there are mixed sizes, get processed using MD5_go2 or openssl.

I will have to see how much of a change it is. There are only a few places within md5_gen where the actual crypt functions are called, and they are all exactly the same (except they take different input and output values).
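The "easiest" rule above can be sketched as a small classifier over the lengths in one SSE block. This is only an illustration, not md5_gen code: `choose_path`, the `block_path` enum and the two limit macros are hypothetical names. The limits themselves follow from MD5 padding, which needs 1 marker byte plus an 8 byte length field, so 64 - 9 = 55 bytes fit in one 64 byte block and 128 - 9 = 119 bytes fit in two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; none of these are md5_gen identifiers. */
enum block_path {
    PATH_SSE_1BLOCK,  /* every input is 55 bytes or less */
    PATH_SSE_2BLOCK,  /* every input is 56 to 119 bytes */
    PATH_GENERIC      /* mixed or oversized: scalar code (MD5_go2 or openssl in the message) */
};

#define MAX_1BLOCK_LEN 55   /* 64 - 1 padding marker byte - 8 length bytes */
#define MAX_2BLOCK_LEN 119  /* 128 - 1 padding marker byte - 8 length bytes */

/* Decide how one block of `count` inputs should be hashed, given their lengths. */
enum block_path choose_path(const size_t *lens, size_t count)
{
    size_t i, one_block = 0, two_block = 0;

    assert(lens != NULL || count == 0);

    for (i = 0; i < count; i++) {
        if (lens[i] <= MAX_1BLOCK_LEN)
            one_block++;
        else if (lens[i] <= MAX_2BLOCK_LEN)
            two_block++;
        else
            return PATH_GENERIC;      /* longer than two MD5 blocks */
    }
    if (two_block == 0)
        return PATH_SSE_1BLOCK;       /* all short: one SSE pass */
    if (one_block == 0)
        return PATH_SSE_2BLOCK;       /* all long: two SSE passes */
    return PATH_GENERIC;              /* mixed sizes: give up on SSE here */
}
```

With lengths {55, 56, 10, 70} the block is mixed, so `choose_path` returns `PATH_GENERIC`. The two-pass "harvest" idea from the message would replace that last return with a third SSE path, at the cost of always running two loops for such blocks.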
Once it is right in one of them, it becomes almost a cut and paste to get them all working properly.

The reason there are so many failures right now for PARA builds in md5_gen is that I made changes to the .S sse/mmx blocks, so that processing would fall through to the md5_go code if the data was too long. Prior versions simply bailed out for these formats totally if built for MMX instructions. The new version will do what work it can using mmx, and then fall back to generic code if the data is too long. However, that code was not put into the PARA blocks, nor were some md5_gen processing instructions that allow the format writer to tell the runtime to switch data back and forth from Any to SSE. Thus the PARA code does not know how to do this, and still has the old behavior of aborting out for formats that go over 55 bytes long.

As for the speed difference of using PARA vs non PARA on 64 bit gcc, it is much less beneficial now (at least in md5_gen), since I switched over to using the code from md5_std.c and use the MD5_X2 logic also. 64 bit gcc 'generic' is running as fast as 32 bit SSE2 builds (or close). Yes, PARA=2 for gcc does get a little bump over that, but not a significant amount. But I will continue forward getting the port right, and if PARA is the right choice for all x86 64 bit compilers, then it will be the choice provided.

As for x86-64.h, for the *_PARA_SSE selection, we should have that in #ifdef's. For gcc it is 2, but for icc it is 3 IIRC. What was clang set at?

Jim.
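A per-compiler #ifdef selection along the lines suggested for x86-64.h might look like the sketch below. The macro name `EXAMPLE_PARA_SSE` is hypothetical (the message only says "*_PARA_SSE"), the gcc and icc values are the ones quoted in the message, and the clang value is a placeholder, since the message leaves it as an open question. The helper function and the lanes macro are likewise illustrative assumptions.

```c
#include <assert.h>

/*
 * EXAMPLE_PARA_SSE is a hypothetical name, not the real x86-64.h macro.
 * icc (on Linux) and clang typically also define __GNUC__, so they are
 * tested before the plain gcc case.
 */
#if defined(__INTEL_COMPILER)
#define EXAMPLE_PARA_SSE 3   /* icc: 3, "IIRC" in the message */
#elif defined(__clang__)
#define EXAMPLE_PARA_SSE 1   /* placeholder only: the message does not give clang's value */
#elif defined(__GNUC__)
#define EXAMPLE_PARA_SSE 2   /* gcc: 2, per the message */
#else
#define EXAMPLE_PARA_SSE 1   /* assumption: no interleaving for unknown compilers */
#endif

/* Assumption: one 128-bit SSE register carries four 32-bit MD5 lanes. */
#define EXAMPLE_SSE_LANES 4

/* Number of candidates hashed per call under the assumptions above. */
int example_keys_per_crypt(void)
{
    assert(EXAMPLE_PARA_SSE >= 1);
    return EXAMPLE_SSE_LANES * EXAMPLE_PARA_SSE;
}
```

The order of the tests matters more than the values: putting the `__GNUC__` check first would silently give icc and clang the gcc setting.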