Message-ID: <CAKbZUD3=7Mrfz9sCWO6EK0m9TwWvvDKtUAt73bJBSjoOsOZCwQ@mail.gmail.com>
Date: Thu, 29 Feb 2024 01:13:20 +0000
From: Pedro Falcato <pedro.falcato@...il.com>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] add memcmpeq: memcmp that returns length of first mismatch

On Thu, Feb 29, 2024 at 12:58 AM Robert Clausecker <fuz@....su> wrote:
>
> Greetings,
>
> On Thu, Feb 29, 2024 at 12:10:05AM +0000, Thorsten Glaser wrote:
> > Pedro Falcato dixit:
> >
> > >Small note: This isn't quite true for remotely modern x86, unaligned
> >
> > It’s very much true, e.g. it breaks atomicity (ok, not relevant
> > *here*, but in general).
> >
> > AIUI, even modern amd64 chips of all vendors are reverting to
> > optimising rep movsb/lodsb instead again, for stringops.
>
> That is not the case. REP MOVSB and friends have a high startup latency,
> so you only want to use them for large-ish blocks. Too large and all of
> the sudden AVX-512 is faster again. For small blocks however, you do not
> want to use this instruction. It's indeed much better to do a pair of
> overlapping stores. They do not perform crazy well, but it's still better
> than all alternatives.

This. Plus the "FSRM" (fast short rep movsb) stuff is all fugazi: I'm yet
to see a microarchitecture where my GPR-only memcpy (which
resembles/resembled mjg@'s FreeBSD kernel memcpy, which you might've read
before) doesn't completely beat out rep movsb under ~256 bytes.

Oh, and they're sometimes incredibly naive: Zen's rep movsb gets into an
awful fallback mode (which I suspect is either byte-by-byte or
word-by-word) if you give it an unaligned buffer; it never seems to
attempt to use wider stores.

Anyway, I digress; this is somewhat off-topic :)

--
Pedro
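The "pair of overlapping stores" idea for small blocks, done with general-purpose registers only, can be sketched roughly as follows. This is a hypothetical helper (`copy_small` is an invented name), not musl's, FreeBSD's, or Pedro's actual memcpy; it assumes a C99 compiler that lowers a fixed-size `memcpy` into a single unaligned load or store, as GCC and Clang typically do on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: copy n bytes (0 <= n <= 32) with no loop and no
   byte-by-byte tail. Each size class uses one chunk anchored at the start
   of the buffer and one anchored at the end; when n is not a multiple of
   the chunk size the two chunks overlap, and the shared bytes are simply
   written twice with the same value. A constant-size memcpy is the
   portable way to spell an unaligned load/store in C. */
static void copy_small(void *restrict dstv, const void *restrict srcv, size_t n)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;

    assert(n <= 32); /* precondition: larger sizes need a different path */

    if (n >= 16) {
        /* 16..32 bytes: first 16 and last 16, as two 8-byte words each */
        uint64_t a, b, c, d;
        memcpy(&a, src, 8);
        memcpy(&b, src + 8, 8);
        memcpy(&c, src + n - 16, 8);
        memcpy(&d, src + n - 8, 8);
        memcpy(dst, &a, 8);
        memcpy(dst + 8, &b, 8);
        memcpy(dst + n - 16, &c, 8);
        memcpy(dst + n - 8, &d, 8);
    } else if (n >= 8) {
        /* 8..15 bytes: two possibly overlapping 8-byte words */
        uint64_t a, b;
        memcpy(&a, src, 8);
        memcpy(&b, src + n - 8, 8);
        memcpy(dst, &a, 8);
        memcpy(dst + n - 8, &b, 8);
    } else if (n >= 4) {
        /* 4..7 bytes: two possibly overlapping 4-byte words */
        uint32_t a, b;
        memcpy(&a, src, 4);
        memcpy(&b, src + n - 4, 4);
        memcpy(dst, &a, 4);
        memcpy(dst + n - 4, &b, 4);
    } else if (n >= 2) {
        /* 2..3 bytes: two possibly overlapping 2-byte words */
        uint16_t a, b;
        memcpy(&a, src, 2);
        memcpy(&b, src + n - 2, 2);
        memcpy(dst, &a, 2);
        memcpy(dst + n - 2, &b, 2);
    } else if (n == 1) {
        dst[0] = src[0];
    }
    /* n == 0: nothing to do */
}
```

Every length from 0 to 32 is covered with at most four loads and four stores and no branches beyond the size dispatch; the price is that a few bytes in the middle get stored twice, which is harmless for a plain copy.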