musl - Re: Thinking about release


Message-ID: <CAPfzE3YDFjqHxRaZFeiy0CvbYWYGKzgDGEp-71xSz-03GhNTxw@mail.gmail.com>
Date: Thu, 11 Jul 2013 10:44:16 +1200
From: Andre Renaud <andre@...ewatersys.com>
To: Andre Renaud <andre@...ewatersys.com>
Cc: musl@...ts.openwall.com
Subject: Re: Thinking about release

> This results in 95MB/s on my platform (up from 65MB/s for the existing
> memcpy.c, and down from 105MB/s with the asm optimised version). It is
> essentially identically readable to the existing memcpy.c. I'm not
> really familiar with any other cpu architectures, so I'm not sure if
> this would improve, or hurt, performance on other platforms.

Reviewing the assembler that is produced, it appears that GCC will
never generate an ldm/stm (load/store multiple) instruction that
transfers more than 4 registers, whereas the optimised assembler uses
ones that transfer 8 (i.e. 8 * 32-bit reads in a single instruction).
I've tried various tricks/optimisations with the C code, and can't
convince GCC to do more than 4. I assume that this probably accounts
for the remaining 10MB/s difference between these two variants.
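
For illustration only, here is a minimal sketch of the kind of C loop
under discussion: eight 32-bit loads followed by eight 32-bit stores
per iteration, which is the pattern one would hope a compiler could
merge into a single 8-register ldm/stm pair. This is not the actual
patch and not musl's memcpy.c. The name copy_words8 is hypothetical,
and the sketch assumes 4-byte-aligned, non-overlapping buffers that
are accessed as uint32_t. What a given GCC version emits for it will
depend on the version and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration: copy n bytes between
 * non-overlapping, 4-byte-aligned buffers, 8 words per iteration. */
static void copy_words8(void *restrict dst, const void *restrict src, size_t n)
{
    uint32_t *d = dst;
    const uint32_t *s = src;

    /* Assumption of this sketch: both pointers are word aligned. */
    assert(((uintptr_t)dst & 3) == 0 && ((uintptr_t)src & 3) == 0);

    for (; n >= 32; n -= 32, s += 8, d += 8) {
        /* Load all 8 words before storing any, so the loads and the
         * stores each form one contiguous run. */
        uint32_t w0 = s[0], w1 = s[1], w2 = s[2], w3 = s[3];
        uint32_t w4 = s[4], w5 = s[5], w6 = s[6], w7 = s[7];
        d[0] = w0; d[1] = w1; d[2] = w2; d[3] = w3;
        d[4] = w4; d[5] = w5; d[6] = w6; d[7] = w7;
    }

    /* Copy the remaining 0..31 bytes one at a time. */
    unsigned char *db = (unsigned char *)d;
    const unsigned char *sb = (const unsigned char *)s;
    while (n--)
        *db++ = *sb++;
}
```

Compiling something like this with -S and looking at the inner loop
is one way to see how many registers each ldm/stm ends up moving.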

Rich - do you have any comments on whether either the C or assembler
variants of memcpy might be suitable for inclusion in musl?

Regards,
Andre