Message-ID: <00b701ccdd3a$d7b33770$8719a650$@net>
Date: Fri, 27 Jan 2012 15:30:07 -0600
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: unique using more than 2 GB RAM

> We should make "unique" able to use more than 2 GB of memory. As it
> turns out, "unique" at 2 GB is about twice as slow as "sort -u -S 14G"
> (on a 16 GB RAM machine), although of course this may vary by input
> data. Maybe "unique" should start using 40-bit offsets (good for up to
> 1 TB of RAM).

unique in JtR will almost always be slower than a sort | uniq type of
workflow. It is much harder to search and unique a very large working set
than to sort it and do a brain-dead search (a simple compare).

> I am concerned that it will become slightly less
> efficient (in terms of both speed and memory usage) when this new
> functionality is not being made use of, though.

However, if we did this with a --hugefile switch (or some switch, or set
of switches), then we could certainly make it faster on a large-memory
machine than it is today. Also, if we had a --tinyfile switch, we could
optimize the other way (making small files faster).

I do not remember whether we check the file length before computing the
hash table size (I am not next to the code right now). If we do, then a
--hugemem or --hugefile switch would simply tell the existing code that
it is OK to use a few larger structures. I really think that is about
all there should be to changing it.

But I agree, this is a pretty good 'wish list' item. I know there are
people with very dirty 100 GB wordlists. Even using the maximum memory
john's unique uses now, that means many passes through the data. The
fewer of those 'large' block re-runs, the faster it is overall. Remove
as much of that ^2 from the O(n^2) part as we can.

Jim.
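
For illustration, here is a minimal sketch (in C) of how 40-bit offsets
could be packed into a byte array, 5 bytes per slot, which is enough to
address up to 1 TB of buffer space. The helper names and layout are
assumptions made for this sketch, not the actual unique.c code:

#include <stdint.h>
#include <stddef.h>

/* Sketch only: pack a 40-bit offset into slot i, 5 bytes per slot,
   little-endian byte order. */
static void put_offset40(unsigned char *table, size_t i, uint64_t off)
{
	unsigned char *p = table + i * 5;
	p[0] = off & 0xff;
	p[1] = (off >> 8) & 0xff;
	p[2] = (off >> 16) & 0xff;
	p[3] = (off >> 24) & 0xff;
	p[4] = (off >> 32) & 0xff;
}

/* Read a 40-bit offset back from slot i. */
static uint64_t get_offset40(const unsigned char *table, size_t i)
{
	const unsigned char *p = table + i * 5;
	return (uint64_t)p[0] | ((uint64_t)p[1] << 8) |
	       ((uint64_t)p[2] << 16) | ((uint64_t)p[3] << 24) |
	       ((uint64_t)p[4] << 32);
}

And a hedged sketch of choosing a hash table size from the input file
length, along the lines of what a --hugefile switch might permit. The
pick_hash_size() helper and the 8-bytes-per-line guess are hypothetical;
the real code may size its table quite differently:

#include <stddef.h>
#include <sys/stat.h>

/* Sketch only: choose a power-of-two entry count from the file length,
   assuming roughly 8 bytes per input line, clamped to a caller-supplied
   maximum (what a --hugefile switch might raise). */
static size_t pick_hash_size(const char *path, size_t max_entries)
{
	struct stat st;
	size_t entries = (size_t)1 << 20;  /* fallback: 1M entries */

	if (stat(path, &st) == 0 && st.st_size > 0) {
		size_t estimate = (size_t)(st.st_size / 8);
		while (entries < estimate && entries < max_entries)
			entries <<= 1;
	}
	return entries;
}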