Message-ID: <164a70f9994a98a504e01e6f26dd6be3@smtp.hushmail.com>
Date: Wed, 16 Sep 2015 02:43:42 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Judy array

On 2015-09-16 01:09, Solar Designer wrote:
> On Wed, Sep 16, 2015 at 12:43:44AM +0200, magnum wrote:
>> Also, I don't observe any gain from disabling mmap and only minimal gain
>> from using --mem=0 when mmap is enabled (I stopped using -mem after mmap
>> was implemented).
>
> What does it mean when with mmap I am getting "Each node loaded 1/8 of
> wordfile to memory (about 15 MB/node)"? Doesn't mmap imply that each
> node has the full wordlist mapped into its address space?
>
> In fact, without mmap I am getting "Each node loaded the whole wordfile
> to memory". Doesn't not using mmap enable easy and efficient loading of
> portions of the wordlist into each node's memory?
>
> This looks backwards to me. Can you explain?

It is backwards for sure. It grew organically. I'll be stating a few
obvious (for you) things below, to explain for a broader audience.

Before mmap and MPI/fork, we would either just fgetl() each line, or use
a memory buffer. The latter would load the whole file into a contiguous
buffer once, and then modify that buffer (e.g. replace \n with null).
Also, index pointers were set up to point to each word, so we could
immediately get word number 12345 using a pointer to it. This was mostly
meant for -rules but that initial load proved to be faster even without
rules IIRC. So far, things were pretty sane.

Then, with MPI, came some messy code that could do the above but only
for "my words" for a multi-node run. That was implemented on a leap-frog
(or should I say round-robin) basis, so we wouldn't end up with 200,000
short words for one node, and 40,000 long words for another. But it also
had to take into account edge cases like "just a few words, and a
humongous number of rules" or vice versa. From this point it went
downhill.
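
To make the two ideas above concrete (an indexed in-memory buffer, and a
round-robin split between nodes), here is a minimal sketch in Python. It
is not the wordlist.c code: the names build_index and words_for_node and
the 0-based node numbering are made up for illustration, and Python's
splitlines() stands in for the in-place "replace \n with null plus index
pointers" preparation.

```python
def build_index(data):
    """Split one contiguous buffer into words, addressable by word number.

    Stand-in for terminating each word in place and storing a pointer to it.
    """
    return data.splitlines()


def words_for_node(words, node, total_nodes):
    """Round-robin ("leap-frog") split: node k gets words k, k+n, k+2n, ...

    Interleaving keeps short and long words spread evenly across nodes,
    unlike handing each node one contiguous slice of the file.
    """
    return words[node::total_nodes]


if __name__ == "__main__":
    buf = b"password\n123456\nletmein\nqwerty\ndragon\n"
    index = build_index(buf)
    print(index[2])                     # random access to word number 2
    for node in range(2):
        print(node, words_for_node(index, node, 2))
```

The sketch ignores the edge cases mentioned above (very few words with a
huge number of rules, or vice versa), which is where the real code got
messy.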

Then I implemented mmap and dropped that other buffer for a while. The
beauty of mmap is it's shared between processes (and not just forks but
any processes that use the same files) and I was hoping to do without
the other buffer. But unfortunately our mapped memory is read-only... so
we can't prepare it and just point to ready-to-use words. Instead, I
implemented an "mgetl()" that works just like fgetl() but reads from the
mmap instead of the file. BTW it's SIMD capable (using our pseudo
intrinsics), pretty damn fast scanning for the next newline.

It's nearly as fast as the old mem buffer, much more straightforward and
potentially uses much less memory, BUT we can't suppress dupes. Loopback
mode *really* needs dupe suppression. So I re-enabled the simpler (whole
wordlist) version of the memory buffer on top of the mmap, but it's
really mostly meant for loopback.

Oh, and there are also encodings... if we do use the memory buffer and
need re-encoding, we obviously only do that once, when preparing.

I can't even remember all the details. This is by far the messiest
source file throughout the Jumbo tree. It's just that everything works
pretty well and pretty fast, so I'm a bit afraid of touching it.

But what we should do is completely separate loopback mode from wordlist
mode. Loopback mode should be its own code. Then we should simplify
wordlist mode, e.g. drop support for full dupe suppression and some
other crazy things.

BTW another idea is to load, prepare and index a (non-mmap) buffer
before forking.

If/when we're re-writing wordlist.c, we really should set the goals
beforehand... and stick to them.

magnum
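
For readers unfamiliar with the idea behind "mgetl()" described above,
here is a rough sketch of reading lines from a read-only mapping without
modifying it. It is written in Python for brevity and is only an
assumption about the general technique, not the actual implementation:
the signature, the (line, next_pos) return convention and the helper
read_words are hypothetical, and there is no SIMD here, just a plain
search for the next newline.

```python
import mmap
import os
import tempfile


def mgetl(mm, pos):
    """Return (line, next_pos) for the line starting at offset pos.

    The mapping is never written to; the line is copied out instead.
    Returns (None, pos) once pos is at or past the end of the mapping.
    """
    if pos >= len(mm):
        return None, pos
    end = mm.find(b"\n", pos)
    if end == -1:                # last line has no trailing newline
        end = len(mm)
    line = mm[pos:end]
    if line.endswith(b"\r"):     # tolerate CRLF line endings
        line = line[:-1]
    return line, end + 1


def read_words(path):
    """Collect every line of a non-empty file via a read-only mmap."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            words, pos = [], 0
            while True:
                line, pos = mgetl(mm, pos)
                if line is None:
                    return words
                words.append(line)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"alpha\nbeta\r\ngamma")
    os.close(fd)
    print(read_words(path))
    os.remove(path)
```

Because every call copies the line out of the mapping, nothing can be
prepared in place, which is the limitation described above: no ready-made
word pointers and no dupe suppression without a separate buffer.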