Date: Thu, 29 Feb 2024 01:57:48 +0100
From: Robert Clausecker <fuz@....su>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] add memcmpeq: memcmp that returns length of first
 mismatch

Greetings,

On Thu, Feb 29, 2024 at 12:10:05AM +0000, Thorsten Glaser wrote:
> Pedro Falcato wrote:
> 
> >Small note: This isn't quite true for remotely modern x86, unaligned
> 
> It’s very much true, e.g. it breaks atomicity (ok, not relevant
> *here*, but in general).
> 
> AIUI, even modern amd64 chips of all vendors are reverting to
> optimising rep movsb/lodsb instead again, for stringops.

That is not the case.  REP MOVSB and friends have a high startup latency,
so you only want to use them for large-ish blocks.  Too large, and all of
a sudden AVX-512 is faster again.  For small blocks, however, you do not
want to use these instructions.  It is indeed much better to do a pair of
overlapping stores.  They do not perform spectacularly well, but they are
still better than all the alternatives.
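
To illustrate what a pair of overlapping stores means, here is a minimal
C sketch.  It is not the FreeBSD code; the function name copy_8_to_16 and
the 8..16 byte size class are assumptions chosen for illustration.  The
fixed-size memcpy calls stand in for unaligned 8-byte loads and stores,
which compilers typically turn into single moves on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes, where 8 <= n <= 16, using two
 * 8-byte moves that may overlap in the middle.  The first move covers
 * bytes [0, 8), the second covers bytes [n - 8, n).  For n < 16 the
 * two ranges overlap, and the overlapping bytes are simply written twice.
 */
static void copy_8_to_16(void *dst, const void *src, size_t n)
{
    uint64_t head, tail;

    assert(n >= 8 && n <= 16);

    /* Load both chunks before storing, so dst may overlap src. */
    memcpy(&head, src, 8);
    memcpy(&tail, (const unsigned char *)src + n - 8, 8);

    memcpy(dst, &head, 8);
    memcpy((unsigned char *)dst + n - 8, &tail, 8);
}
```

The point is that every length in the size class is handled by the same
branch-free sequence, with no byte loop and no REP prefix startup cost.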

Also note that REP LODSB is pretty useless; did you perhaps mean REP STOSB?

Source: I spent a good part of last year implementing <string.h> in
x86 assembly for FreeBSD's libc.

> Of course the status on other architectures should be sufficient to
> not use unaligned accesses.

Yours,
Robert Clausecker

-- 
()  ascii ribbon campaign - for an encoding-agnostic world
/\  - against html email  - against proprietary attachments
