Message-ID: <Zd_WjNcFLBWFkAIG@fuz.su>
Date: Thu, 29 Feb 2024 01:57:48 +0100
From: Robert Clausecker <fuz@....su>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] add memcmpeq: memcmp that returns length of first mismatch

Greetings,

On Thu, Feb 29, 2024 at 12:10:05AM +0000, Thorsten Glaser wrote:
> Pedro Falcato dixit:
>
> >Small note: This isn't quite true for remotely modern x86, unaligned
>
> It’s very much true, e.g. it breaks atomicity (ok, not relevant
> *here*, but in general).
>
> AIUI, even modern amd64 chips of all vendors are reverting to
> optimising rep movsb/lodsb instead again, for stringops.

That is not the case.  REP MOVSB and friends have a high startup
latency, so you only want to use them for large-ish blocks.  Too large
and all of a sudden AVX-512 is faster again.  For small blocks,
however, you do not want to use this instruction.  It's indeed much
better to do a pair of overlapping stores.  They do not perform crazy
well, but it's still better than all alternatives.

Also note that REP LODSB is pretty useless; did you perhaps mean
REP STOSB?

Source: have spent a good part of last year implementing <string.h>
in x86 assembly for FreeBSD's libc.

> Of course the status on other architectures should be sufficient to
> not use unaligned accesses.

Yours,
Robert Clausecker

-- 
()  ascii ribbon campaign - for an encoding-agnostic world
/\  - against html email  - against proprietary attachments
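
For illustration only, the "pair of overlapping stores" idea for small blocks can be sketched in portable C roughly as follows. This is not code from the message or from FreeBSD's libc; the function name copy_8_to_16 and the 8..16 byte size range are assumptions chosen for the example. The fixed-size memcpy calls stand in for unaligned 8-byte loads and stores, which compilers typically lower to single moves on x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes where 8 <= n <= 16.
 * Instead of branching on the exact length, load the first 8 bytes and
 * the last 8 bytes, then store both.  For n < 16 the two stores overlap
 * in the middle, which is harmless because both carry the same source
 * data for the overlapping bytes.
 */
static void copy_8_to_16(void *dst, const void *src, size_t n)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    uint64_t head, tail;

    /* Both loads happen before any store, so dst may overlap src. */
    memcpy(&head, s, 8);          /* bytes [0, 8)     */
    memcpy(&tail, s + n - 8, 8);  /* bytes [n - 8, n) */

    memcpy(d, &head, 8);
    memcpy(d + n - 8, &tail, 8);
}
```

A real implementation would typically dispatch on size classes (for example 1, 2..3, 4..7, 8..16, 16..32 bytes), use the same trick in each class, and reserve REP MOVSB or vector loops for larger blocks; where those thresholds lie depends on the microarchitecture.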