Message-ID: <20170620041429.zjmzwpeyycwwpcvr@voyager>
Date: Tue, 20 Jun 2017 06:14:29 +0200
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Subject: Re: Query regarding malloc if statement

On Mon, Jun 19, 2017 at 09:02:00PM +0000, Jamie Mccrae wrote:
> My understanding is that doing a read followed by a possible write is
> slower than always doing a write for the reason that upon doing a read
> the process will halt until the memory is brought into the CPU's cache
> which isn't a problem when just doing a write. I've just thrown
> together a simple application to test this (testing on a modern PC
> running alpine linux 64-bit in a virtualbox VM with 512MB RAM and 1 CPU
> core) with a normal musl library and a modified one whereby I've
> removed the 'if' check:
>

Woah, you're mixing up a few things here. A cache miss and a page fault
are two very different things. Besides, doesn't a cache miss on write
mean that a cache line for the write area has to be allocated first?

> #include <time.h>
> #include <stdlib.h>
> #include <stdio.h>
> #include <stdint.h>
>
> void TimedFunc()
> {
>     uint32_t loops = 64;
>     uint32_t *ptr;
>     while (loops > 0)
>     {
>         ptr = calloc(64, 2);
>         free(ptr);
>         --loops;
>     }
> }
>
> void main()
> {
>     clock_t stime, etime;
>     stime = clock();
>
>     uint32_t runs = 0;
>     while (runs < 16384)
>     {
>         TimedFunc();
>         ++runs;
>     }
>
>     etime = clock();
>     printf("%d loops in %d ms\r\n", runs, ((etime - stime) * 1000 / CLOCKS_PER_SEC));
> }
>

Hmm... looks about right (except for "void main", but let's not be
pedantic here). But, as I said, the whole thing only works if brk() is
disabled. If you don't want to recompile your kernel, you can use a
seccomp filter to disallow that system call. This forces musl to fall
back to allocating heap with mmap().

Also, you are allocating 128 bytes, which is too small to trigger the
effect. Try 100kB (if my maths did not fail me, for a 32-bit platform
the mmap threshold is at 112kB, and for a 64-bit platform it is twice
that, so 100kB is below the threshold on both).

>
> Results are 74-148ms for the normal library and 70-72ms when the if
> statement is removed (about twice as fast). I've also got an original
> raspberry pi with a single CPU and have alpine linux on that so I've
> performed the same test using 32 loops, calloc(32, 2) and 8192 loops
> instead and see a similar result although it's much closer 411-412ms
> for the normal library and 405-408ms when the if statement is removed.

Interesting. So the check appears not to be beneficial, time-wise, for
small allocations.

> Surely a page fault will occur when attempting to read memory not
> writing it, it doesn't need to bring the page into the cache if no read
> is taking place therefore a page fault will not occur?

No, not really. See, if Linux is doing the right thing, then it will
always have a zero page handy. If an application requests memory via
mmap() with anonymous pages, what Linux should do is write into the
CPU-facing bits of the page tables that the pages exist, all point to
the zero page, and are read-only. In the OS-facing bits, it needs to
record that those pages are copy-on-write, of course.

Then a read of those pages will return bytes from the zero page (so
always zero), and a write will cause a page fault. Linux will of course
handle that page fault by allocating a fresh physical page, copying the
zero page there, rewriting the page tables and invalidating the page
table cache before continuing the program.

Of course, I don't know if Linux really does that. It might just answer
a request for memory with completely inaccessible pages that cause a
fault as soon as they are accessed in any way. The interface would be
fulfilled either way.

Oh, and the CPU cache doesn't have anything to do with this. The page
fault mechanism is so slow that a cache miss or two make no odds here.

Ciao,
Markus
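
The threshold arithmetic mentioned in the message can be checked with a
few lines of C. This is only a sketch: the constants are an assumption
based on musl's malloc of that era (SIZE_ALIGN as 4*sizeof(size_t) and
MMAP_THRESHOLD as 0x1c00*SIZE_ALIGN), the function names are made up
for illustration, and per-chunk overhead is ignored.

```c
#include <stddef.h>

/* Assumption: mirrors the constants in musl's malloc of that era,
 * SIZE_ALIGN = 4*sizeof(size_t) and MMAP_THRESHOLD = 0x1c00*SIZE_ALIGN.
 * size_t_bytes is sizeof(size_t) on the target: 4 (32-bit) or 8 (64-bit). */
static size_t mmap_threshold(size_t size_t_bytes)
{
    size_t size_align = 4 * size_t_bytes;
    return 0x1c00 * size_align;
}

/* Hypothetical helper: nonzero if a calloc(nmemb, size) request stays
 * below the threshold. Chunk header overhead is deliberately ignored. */
static int below_threshold(size_t nmemb, size_t size, size_t size_t_bytes)
{
    return nmemb * size < mmap_threshold(size_t_bytes);
}
```

Under these assumptions mmap_threshold(4) is 112*1024 and
mmap_threshold(8) is 224*1024, so a 100kB request stays below the
threshold on both kinds of platform, as does the 128-byte calloc(64, 2)
from the original test.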
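
Which of the two behaviours described above a given kernel shows can be
probed from user space. The following is a rough sketch, not part of
the original test: it maps fresh anonymous pages, reads one byte per
page, then writes one byte per page, and records how many minor page
faults getrusage() reports for each pass. The names minor_faults and
touch_anon_pages are hypothetical, and the counts depend on the kernel
and on settings such as transparent huge pages, so no particular result
is claimed.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor (no-I/O) page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/* Map `pages` anonymous pages, read one byte from each, then write one
 * byte to each. Stores the minor-fault count of each pass and returns
 * the sum of the bytes read (0 for fresh anonymous memory), or -1 if
 * mmap() fails. */
static long touch_anon_pages(size_t pages, long *read_faults, long *write_faults)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * page;
    volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    long before = minor_faults();
    long sum = 0;
    for (size_t i = 0; i < pages; i++)
        sum += p[i * page];              /* read pass */
    long after_read = minor_faults();
    for (size_t i = 0; i < pages; i++)
        p[i * page] = 1;                 /* write pass */
    long after_write = minor_faults();

    munmap((void *)p, len);
    *read_faults = after_read - before;
    *write_faults = after_write - after_read;
    return sum;
}
```

Calling touch_anon_pages(25, &r, &w) from a small main() and printing r
and w with "%ld" covers roughly 100kB on a system with 4kB pages. The
returned sum should be 0 either way, since anonymous mappings are
zero-filled; only the split of faults between the two passes differs.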