Message-Id: <CDC48B48-7FF5-42E2-A9E0-666FFFA9E939@gmail.com>
Date: Sat, 19 Sep 2020 18:44:17 -0700
From: Fred Wang <waffle.contest@...il.com>
To: john-dev@...ts.openwall.com
Subject: ring, etc

Thanks for looking at rling. I spent a couple of weeks looking at the rli program, and made a number of improvements.

I am concerned about a couple of things you said on Twitter, though. By default, rling uses a hash and always keeps input line order. In fact, I take great pains to ensure that. The number of threads in operation does not, in any way, affect line ordering. If you have found a case where you think it does, I would sure like to see it so it can be fixed.

rling -b changes the operation to a binary search, rather than a hash. This still will not change line order on output, unless the -s switch is also given (in which case, output is in lexically sorted order). It's quite fast at sorting, usually beating GNU sort by a large margin (often several times faster, depending on file size). In addition, it checks sort order on input, which means that rling -b -s on an already sorted file is blindingly fast (7.5 seconds to read and write a 1 billion line file on my development system, including the "sort").

rling -2 requires that all files are already sorted (and produces a proper error message if they aren't).

rling -f uses a file-based "virtual memory" system, and should be used as a last resort (on systems with limited memory and large files).

Would you be able to put your test file up somewhere, so I can snag it? I generated my 1 billion line file with a Perl script:

for ($x=0; $x<1000000000; $x++) {print "$x\n";}

Thanks!
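
As a rough illustration of the ordering behavior described above, here is a minimal Python sketch. It is not rling's actual code (rling is a separate program with its own implementation); the function names are hypothetical and chosen only for this example. It shows hash-based de-duplication that keeps first-seen input order, and a sort step that is skipped when the input is already in lexical order.

```python
def dedupe_keep_order(lines):
    """Drop duplicate lines, keeping the first occurrence in input order."""
    seen = set()          # hash-based membership test
    out = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            out.append(line)
    return out


def is_sorted(lines):
    """Return True if lines are already in non-decreasing lexical order."""
    return all(a <= b for a, b in zip(lines, lines[1:]))


def sort_unless_sorted(lines):
    """Sort lexically, but skip the sort when the input is already ordered."""
    return list(lines) if is_sorted(lines) else sorted(lines)
```

The single linear pass in is_sorted is what makes the already-sorted case cheap in this sketch: the full sort only runs when that check fails.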