Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20201003223209.10307-1-vda.linux@googlemail.com>
Date: Sun,  4 Oct 2020 00:32:09 +0200
From: Denys Vlasenko <vda.linux@...glemail.com>
To: Rich Felker <dalias@...c.org>
Cc: Denys Vlasenko <dvlasenk@...hat.com>,
	musl@...ts.openwall.com
Subject: [PATCH] x86/memset: avoid performing final store twice

From: Denys Vlasenko <dvlasenk@...hat.com>

For not very short NBYTES case:

To handle the tail alignment, the code performs a potentially
misaligned word store to fill the final 8 bytes of the buffer.
This is done even if the buffer's end is aligned.

Eventually code fills the rest of the buffer, which is a multiple
of 8 bytes now, with NBYTES / 8 aligned word stores.

However, this means that if NBYTES *was* divisible by 8,
we store last word too, again.

This patch decrements byte count before dividing it by 8,
making one less store in "NBYTES is divisible by 8" case,
and not changing anything in all other cases.

CC: Rich Felker <dalias@...c.org>
CC: musl@...ts.openwall.com
Signed-off-by: Denys Vlasenko <dvlasenk@...hat.com>
---
 src/string/i386/memset.s   | 7 ++++---
 src/string/x86_64/memset.s | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/string/i386/memset.s b/src/string/i386/memset.s
index d00422c4..b1c5c2f8 100644
--- a/src/string/i386/memset.s
+++ b/src/string/i386/memset.s
@@ -47,7 +47,7 @@ memset:
 	mov %edx,(-1-2-4-8-8)(%eax,%ecx)
 	mov %edx,(-1-2-4-8-4)(%eax,%ecx)
 
-1:	ret 	
+1:	ret
 
 2:	movzbl 8(%esp),%eax
 	mov %edi,12(%esp)
@@ -57,13 +57,14 @@ memset:
 	mov %eax,-4(%edi,%ecx)
 	jnz 2f
 
-1:	shr $2, %ecx
+1:	dec %ecx
+	shr $2, %ecx
 	rep
 	stosl
 	mov 4(%esp),%eax
 	mov 12(%esp),%edi
 	ret
-	
+
 2:	xor %edx,%edx
 	sub %edi,%edx
 	and $15,%edx
diff --git a/src/string/x86_64/memset.s b/src/string/x86_64/memset.s
index 2d3f5e52..85bb686c 100644
--- a/src/string/x86_64/memset.s
+++ b/src/string/x86_64/memset.s
@@ -53,7 +53,7 @@ memset:
 2:	test $15,%edi
 	mov %rdi,%r8
 	mov %rax,-8(%rdi,%rdx)
-	mov %rdx,%rcx
+	lea -1(%rdx),%rcx
 	jnz 2f
 
 1:	shr $3,%rcx
-- 
2.25.0

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.