Message-ID: <20200115192215.GJ30412@brightrain.aerifal.cx>
Date: Wed, 15 Jan 2020 14:22:15 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH 2/2] Add big-endian support to ARM assembler memcpy

On Wed, Jan 15, 2020 at 10:41:08AM -0800, Andre McCurdy wrote:
> On Wed, Jan 15, 2020 at 7:46 AM Rich Felker <dalias@...c.org> wrote:
> > On Fri, Sep 13, 2019 at 01:38:34PM -0700, Andre McCurdy wrote:
> > > On Fri, Sep 13, 2019 at 11:59 AM Rich Felker <dalias@...c.org> wrote:
> > > > On Fri, Sep 13, 2019 at 11:44:32AM -0700, Andre McCurdy wrote:
> > > > > Allow the existing ARM assembler memcpy implementation to be used for
> > > > > both big and little endian targets.
> > > >
> > > > Nice. I don't want to merge this just before release, but as long as
> > > > it looks ok I should be able to review and merge it afterward.
> > > >
> > > > Note that I'd really like to replace this giant file with C using
> > > > inline asm just for the inner block copies and C for all the flow
> > > > control, but I don't mind merging this first as long as it's correct.
> > >
> > > Sounds good. I'll wait for your feedback after the upcoming release.
> >
> > Sorry this dropped off my radar. I'd like to merge at least the thumb
> > part since it's simple enough to review quickly and users have
> > actually complained about memcpy being slow on armv7 with -mthumb as
> > default.
> 
> Interesting. I wonder what reference the musl C code was compared
> against. From my own benchmarking I didn't find the musl
> assembler to be much faster than the C code. There are armv6 and maybe
> early armv7 CPUs where explicit prefetch instructions make a huge
> difference (much more so than C -vs- assembler). Did the users who
> complained about musl memcpy() compare against a memcpy() which uses
> prefetch? For armv7 using NEON may help, although the latest armv7
> cores seem to perform very well with plain old C code too. There are
> lots of trade-offs, so it's impossible for a single implementation to
> be universally optimal. The "arm-mem" routines used on Raspberry Pi
> seem to be very fast on many targets, but unfortunately the armv6
> memcpy generates misaligned accesses, so it isn't suitable for armv5.
> 
>   https://github.com/bavison/arm-mem/
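To illustrate the prefetch point above, here is a minimal sketch of a word-at-a-time copy loop that issues a software prefetch a fixed distance ahead of the source pointer. It is not musl's memcpy and not the arm-mem code. It uses the GCC/Clang builtin `__builtin_prefetch`, which on ARM targets that support it typically becomes a `pld` hint. The function name `copy_words_prefetch` and the 64-byte `PREFETCH_AHEAD` distance are hypothetical choices, and the sketch assumes both buffers are 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance in bytes; the best value depends on the core. */
#define PREFETCH_AHEAD 64

/*
 * Sketch: copy n bytes from src to dest, assuming both pointers are
 * 4-byte aligned. Copies 16 bytes per iteration and hints the cache to
 * start fetching source data PREFETCH_AHEAD bytes ahead of the loads.
 */
static void *copy_words_prefetch(void *restrict dest,
                                 const void *restrict src, size_t n)
{
    uint32_t *d = dest;
    const uint32_t *s = src;

    while (n >= 16) {
        /* Only form prefetch addresses that stay inside the source buffer. */
        if (n > PREFETCH_AHEAD)
            __builtin_prefetch((const char *)s + PREFETCH_AHEAD);
        d[0] = s[0];
        d[1] = s[1];
        d[2] = s[2];
        d[3] = s[3];
        d += 4;
        s += 4;
        n -= 16;
    }

    /* Copy any remaining tail bytes one at a time. */
    unsigned char *db = (unsigned char *)d;
    const unsigned char *sb = (const unsigned char *)s;
    while (n--)
        *db++ = *sb++;
    return dest;
}
```

The prefetch is only a hint, so the copied data is identical with or without it; whether it helps depends on how well the core's hardware prefetcher already tracks streaming loads, which matches the thread's observation that the benefit varies between armv6, early armv7 and newer cores.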

I'm not sure of the details, but the comparison was just between the
armv6 version of Alpine and the armv7 version (so using musl's
memcpy_le.S vs memcpy.c).

Rich
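Byte order matters to a memcpy mainly in the misaligned case: when source and destination differ in alignment, a word-at-a-time copy loads aligned source words and splices each adjacent pair into one destination word with shifts, and the shift directions swap between little- and big-endian. musl's portable memcpy.c selects its shift directions from the byte order at compile time. The sketch below shows the splice in isolation. The helper name `splice_words` is hypothetical, and it relies on the GCC/Clang predefined macros `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__` rather than musl's internal headers.

```c
#include <stdint.h>

/*
 * Sketch: return the 32-bit word that begins `off` bytes (1..3) into the
 * aligned word `lo` and continues into the next aligned word `hi`. This
 * is the splice a word-at-a-time memcpy performs for a misaligned source.
 */
static uint32_t splice_words(uint32_t lo, uint32_t hi, unsigned off)
{
    unsigned sh = 8 * off;  /* off must be 1..3, so sh is 8..24 */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    /* Big endian: lower addresses hold the more significant bytes. */
    return (lo << sh) | (hi >> (32 - sh));
#else
    /* Little endian: lower addresses hold the less significant bytes. */
    return (lo >> sh) | (hi << (32 - sh));
#endif
}
```

On either byte order the result equals a plain 4-byte copy from the misaligned address. An assembler routine written for one byte order bakes one set of shift directions into its instructions, which is one place where code like memcpy_le.S would need changes to serve big-endian targets.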
