Message-ID: <8D0A71B710B04F3FB112AC070162F376@ath64dual>
Date: Sat, 25 Jul 2009 20:06:02 -0500
From: "JimF" <jfoug@....net>
To: <john-users@...ts.openwall.com>
Subject: Re: Work (optimization) in progress, and some ideas

> 1. Cache the length of "in" (or the previous command's "out") in a
> variable across the rule commands in rules_apply(). This would
> eliminate the need for most strlen() "calls". Indeed, strlen() is done
> inline these days, but nevertheless the loops have a cost when the
> strings are of non-negligible length. Also, it may let some strcpy()'s
> and the like be replaced with memcpy(). Updating the cached length will
> usually not involve a strlen(). Instead, it would be done based on the
> previous length - e.g., 'd' and 'f' will double the previous length,
> whereas '$' and '^' will increase it by 1.

The length should never need to be computed, except once at the start.
Once you know the length of the starting string, you know what each
command does to it, so you always know the length of the string.

> 2. To make prepends and similar faster, reserve some space before the
> start of the input buffer (that is, point "in" to the middle of a
> buffer initially). Then prepends will be like "*(out = in - 1) = value;"
> when there's sufficient space in the buffer.

This is the big one I saw; the prepending code is hurt the most by the
current layout (but so are 'd' and other commands). I would allocate a
buffer 5x the length of the current 'max' buffer and initially copy the
string in 2/5 of the way. That way you have 2x buffer in front and 2x
buffer at the tail end. Then simply abort if any operation would grow
the string past roughly a 3x threshold (strings should not grow that far
today anyway - they would already be blowing buffers). If done like
this, you only need to keep track of the length of the string (per #1)
and a pointer to its first character.

I would also say that keeping the string null terminated throughout
would be a waste of time; simply null terminate it before returning it.
All work would then be done in place on the string: a prepend writes the
character and moves the pointer backwards, an append simply pushes bytes
onto the end, and both increase the length. If characters are removed,
I doubt it makes much difference in speed between in-place and double
buffering, but if everything else is done in place, removal should be
done in place too, so there is no back-and-forth buffer adjusting at the
end of the loop.
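Not actual john code - just a minimal sketch of the buffer layout and
length bookkeeping I have in mind. MAX_WORD_LEN and all of the names
here are made up, and the ~3x overflow checks are left out for brevity:

#include <string.h>

#define MAX_WORD_LEN 256               /* stand-in for the current max */

static char buf[5 * MAX_WORD_LEN];     /* 2x slack in front, more behind */
static char *word;                     /* points at the first character */
static int len;                        /* cached length, per #1 */

static void rule_start(const char *in)
{
    len = strlen(in);                  /* the only strlen() needed */
    word = buf + 2 * MAX_WORD_LEN;     /* start 2/5 of the way in */
    memcpy(word, in, len);
}

static void rule_append(char c)        /* '$' */
{
    word[len++] = c;
}

static void rule_prepend(char c)       /* '^' */
{
    *--word = c;
    len++;
}

static void rule_dup(void)             /* 'd': doubles the cached length */
{
    memcpy(word + len, word, len);
    len *= 2;
}

static char *rule_finish(void)
{
    word[len] = 0;                     /* null terminate only at the end */
    return word;
}

Prepends and appends become a pointer move plus a byte store, and 'd'
becomes a single memcpy() using the cached length.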
> 3. Don't copy the input "word" to "in" when that is possible to avoid
> without modifying "word" (tricky, the extra if's might outweigh the
> performance advantage for short words).

I saw some work here. You already have logic to swap between the "in"
and "out" views of the word, so it is likely not too bad.

> 4. Introduce string append/prepend/insert commands, like:
>
> A"string" - append
> B"string" - prepend
> CN"string" - insert in position N

These would be WONDERFUL additions. I have about 12000 rules, which
were gleaned from running cracked words against other dictionaries,
looking for prefix and postfix strings, sorting based upon how often
each prefix was seen, and then keeping the ones that were seen 3 or more
times (in sorted order, of course). I ended up with 12k rules.

I built another program to convert those prefix and postfix strings into
'proper' rules for john (knowing that a prefix has to be emitted in
reverse order, etc.), and its output alternated prefix and postfix rules
(there were more postfix rules, btw). That set of rules cracks a LOT of
passwords once the 'normal' john default rules have run, but it runs
quite a bit slower, due to so much string moving inside john. If we had
a string append command, that would speed things up a lot. Appending
with the existing rules means one trip through all of the rule logic for
each letter, versus a single trip for the whole string. No matter how
much the prepend and append logic for one letter is optimized, it will
not be faster than appending a whole word (string) at once.
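Internally, an append-string command could be as simple as something
like this (again only a sketch with made-up names, not actual john code,
and it assumes the cached length and slack buffer described above):

#include <string.h>

/* Apply an A"string" style append: one memcpy() for the whole literal
   instead of one pass through the rule engine per character. */
static int rule_append_string(char *word, int len, const char *str)
{
    int n = (int)strlen(str);       /* could be precomputed when the
                                       rule is parsed */
    memcpy(word + len, str, n);     /* caller guarantees buffer room */
    return len + n;                 /* new cached length */
}

So a rule like lA"ing" would lowercase the word and append "ing" in one
shot, where today it takes l$i$n$g - three separate append commands.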
> In case the double-quote character is to be included in the string, any
> other character may be used in its place, just like with the 's' command
> of ed/sed/vi/perl:

So your rule would be A!string!, where ! is any character not used
within the string? That type of logic works fine for me.

> A,string, - append a string that possibly contains the double-quote
> character, but does not contain the comma.

Well, I guess I should read ahead :)

> Unfortunately, the 'P' and 'I' command characters are already taken.

Do they have to be one letter? It could be something like $.!string!
and ^.!string! (again, ! being any character not used in the string).
This would require $\. or ^\. (or $[.] and ^[.] for the preprocessor) to
append or prepend just a period. This way the same 'prepend'/'append'
rule semantics are used, keeping the 'learning' of new rules to a
minimum. I am not sure whether the preprocessor would need to be
touched, as I don't think pre-generated strings would really benefit
from it.

> Finally, I feel that it may be appropriate to rework the rules syntax
> at some point - invent something more generic - but not now.

The rules take a little time to figure out (reading some examples
helps). When I am working on rules, I make a very small test file such
as:

1
123
pass
PASS
Pass
0pass

and then run my new rules on it with --stdout, to make sure that the
rejection logic and the rules themselves work the way I want them to.
It does take a little while to figure out the rules (which are VERY
powerful), but they are not overly complex.

Speaking of rules, I have found the rejection rules to be a little on
the slow side. Part of this is because a rejection rule has to operate
on each word, one at a time, and if you have many rules with the same
rejection rule, john does not know that the word was already rejected,
so it checks it again. I have found that if you are working on a
wordlist where you will run numerous rules that all share the same
rejection logic, and a lot of candidates get rejected by that logic,
then using grep, perl, and/or awk on the wordlist to do the rejection
before running john ends up being much faster. I am not sure we can
optimize john a whole lot for this type of situation - does anyone have
any other ideas? With the wordlist possibly being read into memory,
this could perhaps be improved if john can detect such cases.

Where I was doing this was with my 12000 prefix/append rules: doing
things to the existing word (UPCASE, lowcase, Case, etc.) and then
appending/prepending, or appending first and then doing the extra logic.
However, for ones like lowcase, I wanted to skip words that were already
all lowercase. But each rule had the exact same /?L or /?U or (?u type
rejection rule in it, depending upon which mangling was going to be
done. If only 5% of the words in a list have characters other than
lowercase, and you check each word for this 12000 times, that is a lot
of overhead. The difference was a 5 hour run versus a few minute run
when I first grepped out the words that the rejection rule would
eliminate. That is a pretty big difference.

Sorry for the ramble.

Jim