Message-ID: <FE2BC24A82354B5CAA14EF0DCDFB7591@ath64dual>
Date: Sat, 5 Sep 2009 21:11:29 -0500
From: "JimF" <jfoug@....net>
To: <john-users@...ts.openwall.com>
Subject: Re: Thoughts and questions on creation of a 'generic' MD5 hash set format (to handle 'all' of them)

Since the initial release, I have tightened up the code for the 'generic' md5 module, added some new functionality, and sped things up. I also fixed a nasty bug in the non-salted hashes, which caused them to behave in a 'salted' way and slowed them way down.

One of the main speedups was putting keys directly into the #1 input buffer instead of into an array of strings made to hold the keys. This can ONLY be done under a certain set of circumstances, and the code which loads the function 'primitives' into the array checks for all of these conditions. Doing this saves an extra buffer load (and avoids various other performance penalties). Things like md5($s.$p) could never load the keys directly into the input, since the keys are not the first part of the string anyway.

This speedup allows john to run md5($p) as fast as (actually faster than, on some systems) the original 'hand coded' raw-md5. It also provides about a 10% speedup (or 5% if 2 md5s are done, 3.3% if 3 md5s are done, etc.) for the formats where this 'option' is valid. About 1/2 of the formats worked with this improvement.

md5-gen now works with bench testing (-test), and a -subformat=md5_gen(#) command line option has also been added. Now, these work just fine:

  john -test
  john -test -format=md5-gen    (tests md5_gen(0) or md5($p) )
  john -test -format=md5-gen -subformat=md5_gen(4)    (tests the OSC or md5($s.$p) )

The phpass code was also added to md5-gen. This was done by making 2 primitives specifically for phpass (phpasssetup and phpasscrypt) and building 3 different base md5-gen functions. When md5-gen goes into phpass mode, it first attaches the 3 different base functions (salt, set_salt, salt_hash), and then runs.
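To make the 'array of primitives' idea concrete, here is a minimal sketch in C. Every name in it (primitive_fn, input1, prim_*, keys_can_go_direct, run_script) is hypothetical and not taken from the md5-gen patch. It only assembles the input string and does no MD5 at all, and the 'direct key load' check covers just the one condition mentioned above (the key must be the first thing in the buffer); the real code checks more than that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: none of these names come from the md5-gen patch. */
typedef void (*primitive_fn)(void);

char input1[128];            /* plays the role of the "#1 input buffer" */
const char *cur_key  = "";   /* current candidate password */
const char *cur_salt = "";   /* current salt */

/* Append a string to input1 without overflowing it. */
void append(const char *s)
{
    size_t used = strlen(input1);
    assert(used < sizeof input1);
    strncat(input1, s, sizeof input1 - used - 1);
}

/* The "primitives": tiny steps that a format script strings together. */
void prim_clean_input(void) { input1[0] = '\0'; }
void prim_append_keys(void) { append(cur_key); }
void prim_append_salt(void) { append(cur_salt); }

/* A format is a NULL-terminated list of primitives, run in order. */
const primitive_fn script_p[]  = { prim_clean_input, prim_append_keys, NULL };  /* like md5($p)    */
const primitive_fn script_sp[] = { prim_clean_input, prim_append_salt,
                                   prim_append_keys, NULL };                    /* like md5($s.$p) */

/* Assumed check: keys may be written straight into the input buffer only
 * when the key is the first thing the script appends after cleaning. */
int keys_can_go_direct(const primitive_fn *script)
{
    return script[0] == prim_clean_input && script[1] == prim_append_keys;
}

/* Run every primitive in the script, in order. */
void run_script(const primitive_fn *script)
{
    for (; *script != NULL; ++script)
        (*script)();
}
```

With cur_key and cur_salt set, run_script(script_sp) leaves the salt followed by the key in input1, and keys_can_go_direct() accepts script_p but rejects script_sp, which is the md5($s.$p) case described above.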
The phpasscrypt function is simply:

  void MD5GenBaseFunc__PHPassCrypt()
  {
      /* iteration count is encoded in the character at cursalt[8] */
      unsigned Lcount = 1 << atoi64[ARCH_INDEX(cursalt[8])];

      MD5GenBaseFunc__clean_input();
      MD5GenBaseFunc__append_salt();
      MD5GenBaseFunc__append_keys();
      MD5GenBaseFunc__crypt_to_input_raw();
      MD5GenBaseFunc__append_keys();

      while (--Lcount)  // note last crypt not done in this loop.
          MD5GenBaseFunc__crypt_to_input_raw_Overwrite_NoLen();

      // last crypt is done to the output buffer.
      MD5GenBaseFunc__crypt();
  }

That function is written to use only 'primitive' functions, just as it would if it had been loaded as an array of function pointers. The reason it was not is that the current language has no way to load variables; the only state is the count of keys (a global), 2 input 'buffers' and 2 output buffers. There are also no looping or logical 'constructs' within the simple language, so building a function like this was my best bet.

NOW, once that was done (the phpass hashes working), I wanted to see just what would be involved in hooking the md5-gen code up to other *_fmt.c files which use md5. I started by seeing what was needed to get this to work for the existing phpass_fmt code. I found that it was VERY easy. The code is written in C, but I simply had to step back a bit and ask myself what would be required to do this in C++ (a language I MUCH prefer over C).

In building a set of polymorphic classes in C++, one method is to build a good strong base class (like the md5-gen 'class'), and then derive classes from that, which define a small amount of code but use most of what was already coded in the base class.

Now, this is C, where we do not have language help to do this, so what I did was use the fmt_main structures. One of the key parts of fmt_main is the function pointers. I simply created a fmt_main for phpass, and set it to have an init and a valid function. Within the init, I actually 'do' something.
In there, I assign all of the other function pointers in the phpass fmt_main structure to point to the functions listed in the fmt_main of md5-gen (I built a function in md5-gen-fmt.c which does this 'linkage' for you). Then, besides valid and init, there are 2 other functions which are 'required' within the phpass_fmt code. These are salt and binary.

Now, within valid, all of the 'original' validation code is still there. However, the 'last' step I do is to call valid from the md5-gen code. To do that, I have to 'convert' from $P$9ssssssssxxxxxxxxxxxxxxxx format into md5_gen(17)xxxxxxxxxxxxxx$ssssssss9 format. I created a function that does this in a single sprintf step. Within salt and binary, the ONLY thing they do is convert from phpass syntax into the md5_gen(17) syntax and call the binary (or salt) function within md5-gen. That is all.

The phpass_fmt.c file is now very thin. It has a fmt_main structure that fills out very little at compile time. It then has a conversion function, and the valid/init/salt/binary functions.

We could easily do this for almost any md5() type format. Thus, running on a file full of 'raw-md5' hashes would work just fine (if raw-md5 'links' to md5-gen), and the end user will see no difference. However, the code actually DOING the work is localized in the md5-gen-fmt.c file, so when new hardware is coded for, that is the ONLY area which needs to be tweaked. So when, say, CUDA gets added to md5-gen, ALL formats that use those base objects will be CUDA sped up, without having to spend the time to port each and every format.

I hope to have a -v2 of the generic md5 patch out this weekend, since it is a 3 day weekend.

Jim.
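
The 'thin format' linkage described above can be sketched roughly as follows. This is only an illustration under stated assumptions: struct fmt_sketch is a simplified, hypothetical stand-in for John's fmt_main (which has more fields and different signatures), the base_* functions are placeholders for the md5-gen engine, and the 34-character length check stands in for the full original phpass validation. The conversion uses a single snprintf, in the spirit of the single sprintf step mentioned in the message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical method table; NOT the real fmt_main layout. */
struct fmt_sketch {
    void (*init)(void);
    int  (*valid)(const char *ciphertext);
    void (*crypt_all)(int count);
};

/* "Base class": placeholder bodies standing in for the md5-gen engine. */
int base_crypt_calls;
void base_init(void) { }
int  base_valid(const char *ct) { return strncmp(ct, "md5_gen(17)", 11) == 0; }
void base_crypt_all(int count) { base_crypt_calls += count; }

struct fmt_sketch fmt_base = { base_init, base_valid, base_crypt_all };

/* "$P$" + count char + 8 salt chars + hash
 *   becomes  "md5_gen(17)" + hash + "$" + 8 salt chars + count char. */
int phpass_to_md5gen(const char *in, char *out, size_t outlen)
{
    int n;
    assert(strlen(in) > 12);
    n = snprintf(out, outlen, "md5_gen(17)%s$%.8s%c", in + 12, in + 4, in[3]);
    return n > 0 && (size_t)n < outlen;
}

/* Thin valid: a simplified local check, a conversion, then defer to the base. */
int thin_valid(const char *ct)
{
    char conv[128];
    /* assumed check: 3-char tag + 1 count char + 8 salt chars + 22 hash chars */
    if (strncmp(ct, "$P$", 3) != 0 || strlen(ct) != 34)
        return 0;
    if (!phpass_to_md5gen(ct, conv, sizeof conv))
        return 0;
    return fmt_base.valid(conv);
}

void thin_init(void);

/* At compile time only init and valid are filled in. */
struct fmt_sketch fmt_thin = { thin_init, thin_valid, NULL };

/* At init time, "inherit" every other pointer from the base format. */
void thin_init(void)
{
    fmt_thin = fmt_base;          /* copy all base function pointers   */
    fmt_thin.init  = thin_init;   /* then restore the thin overrides   */
    fmt_thin.valid = thin_valid;
    fmt_base.init();
}
```

After fmt_thin.init() has run, a call through fmt_thin.crypt_all lands in the base implementation, so any speedup made there reaches the thin format without touching its file.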