Date: Thu, 29 Feb 2024 01:13:20 +0000
From: Pedro Falcato <pedro.falcato@...il.com>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] add memcmpeq: memcmp that returns length of first mismatch

On Thu, Feb 29, 2024 at 12:58 AM Robert Clausecker <fuz@....su> wrote:
>
> Greetings,
>
> Am Thu, Feb 29, 2024 at 12:10:05AM +0000 schrieb Thorsten Glaser:
> > Pedro Falcato dixit:
> >
> > >Small note: This isn't quite true for remotely modern x86, unaligned
> >
> > It’s very much true, e.g. it breaks atomicity (ok, not relevant
> > *here*, but in general).
> >
> > AIUI, even modern amd64 chips of all vendors are reverting to
> > optimising rep movsb/lodsb instead again, for stringops.
>
> That is not the case.  REP MOVSB and friends have a high startup latency,
> so you only want to use them for large-ish blocks.  Too large and all of
> the sudden AVX-512 is faster again.  For small blocks however, you do not
> want to use this instruction.  It's indeed much better to do a pair of
> overlapping stores.  They do not perform crazy well, but it's still better
> than all alternatives.

This. Plus the "FSRM" (fast short rep movsb) stuff is all fugazi - I have
yet to see a microarchitecture where my GPR-only memcpy (which
resembles/resembled mjg@'s FreeBSD kernel memcpy, which you might've
read before) doesn't completely beat rep movsb below ~256 bytes.
Oh, and the implementations are sometimes incredibly naive - Zen's rep
movsb gets into an awful fallback mode (which I suspect is either
byte-by-byte or word-by-word) if you give it an unaligned buffer; it
doesn't seem to ever attempt to use wider stores.
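
For anyone who hasn't seen the overlapping-stores trick Robert mentions,
here is a minimal sketch in C. The name small_copy and the 16-byte cutoff
are made up for illustration and are not taken from musl or any other
libc. It assumes the compiler turns the fixed-size memcpy calls into
single unaligned loads/stores, which GCC and Clang typically do at -O2 on
x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes for 0 <= n <= 16 without a byte loop.
 * Each size class loads the first and last k bytes of the source and
 * stores them to the first and last k bytes of the destination. When
 * n < 2*k the two accesses overlap, which is harmless because both
 * carry the same source data for the overlapping bytes. */
static void small_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n >= 8) {                 /* 8..16 bytes: two 8-byte moves */
        uint64_t head, tail;
        memcpy(&head, s, 8);
        memcpy(&tail, s + n - 8, 8);
        memcpy(d, &head, 8);
        memcpy(d + n - 8, &tail, 8);
    } else if (n >= 4) {          /* 4..7 bytes: two 4-byte moves */
        uint32_t head, tail;
        memcpy(&head, s, 4);
        memcpy(&tail, s + n - 4, 4);
        memcpy(d, &head, 4);
        memcpy(d + n - 4, &tail, 4);
    } else if (n >= 2) {          /* 2..3 bytes: two 2-byte moves */
        uint16_t head, tail;
        memcpy(&head, s, 2);
        memcpy(&tail, s + n - 2, 2);
        memcpy(d, &head, 2);
        memcpy(d + n - 2, &tail, 2);
    } else if (n == 1) {          /* single byte */
        d[0] = s[0];
    }
    /* n == 0: nothing to do */
}
```

The point is that each size class costs a fixed two loads and two stores
with no data-dependent loop, so there is no startup cost comparable to
rep movsb. Larger sizes would need a loop or wider registers on top of
this; that part is left out here.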

Anyway, I digress, this is somewhat offtopic :)

-- 
Pedro
