Message-ID: <bc3d8d98e5d64f9e8407290c68b21ed2@AcuMS.aculab.com>
Date: Tue, 21 Apr 2020 15:31:08 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Adhemerval Zanella' <adhemerval.zanella@...aro.org>,
    Rich Felker <dalias@...c.org>
CC: 'Nicholas Piggin' <npiggin@...il.com>,
    "libc-dev@...ts.llvm.org" <libc-dev@...ts.llvm.org>,
    "libc-alpha@...rceware.org" <libc-alpha@...rceware.org>,
    "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
    "musl@...ts.openwall.com" <musl@...ts.openwall.com>
Subject: RE: Powerpc Linux 'scv' system call ABI proposal take 2

From: Adhemerval Zanella
> Sent: 21 April 2020 16:01
>
> On 21/04/2020 11:39, Rich Felker wrote:
> > On Tue, Apr 21, 2020 at 12:28:25PM +0000, David Laight wrote:
> >> From: Nicholas Piggin
> >>> Sent: 20 April 2020 02:10
> >> ...
> >>>>> Yes, but does it really matter to optimize this specific usage case
> >>>>> for size? glibc, for instance, tries to leverage the syscall mechanism
> >>>>> by adding some complex pre-processor asm directives. It optimizes
> >>>>> the syscall code size in most cases. For instance, kill in static case
> >>>>> generates on x86_64:
> >>>>>
> >>>>> 0000000000000000 <__kill>:
> >>>>>    0:   b8 3e 00 00 00          mov    $0x3e,%eax
> >>>>>    5:   0f 05                   syscall
> >>>>>    7:   48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax
> >>>>>    d:   0f 83 00 00 00 00       jae    13 <__kill+0x13>
> >>
> >> Hmmm... that cmp + jae is unnecessary here.
> >
> > It's not.. Rather the objdump was just mistakenly done without -r so
> > it looks like a nop jump rather than a conditional tail call to the
> > function that sets errno.
> >
>
> Indeed, the output with -r is:
>
> 0000000000000000 <__kill>:
>    0:   b8 3e 00 00 00          mov    $0x3e,%eax
>    5:   0f 05                   syscall
>    7:   48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax
>    d:   0f 83 00 00 00 00       jae    13 <__kill+0x13>
>                         f: R_X86_64_PLT32       __syscall_error-0x4
>   13:   c3                      retq

Yes, I probably should have remembered it looked like that :-)

...
> >> I also suspect it gets predicted very badly.
> >
> > I doubt that. This is a very standard idiom and the size of the offset
> > (which is necessarily 32-bit because it has a relocation on it) is
> > orthogonal to the condition on the jump.

Yes, it only gets mispredicted as badly as any other conditional jump.
I believe modern Intel x86 will randomly predict it taken (regardless
of the direction) and then hit a TLB fault on text.unlikely :-)

> > FWIW a syscall like kill takes global kernel-side locks to be able to
> > address a target process by pid, and the rate of meaningful calls you
> > can make to it is very low (since it's bounded by time for target
> > process to act on the signal). Trying to optimize it for speed is
> > pointless, and even size isn't important locally (although in
> > aggregate, lots of wasted small size can add up to more pages = more
> > TLB entries = ...).
>
> I agree and I would prefer to focus on code simplicity to have a
> platform neutral way to handle error and let the compiler optimize
> it than messy with assembly macros to squeeze this kind of
> micro-optimizations.

syscall entry does get micro-optimised.
Real speed-ups can probably be found by optimising other places.
I've a patch I need to resubmit that should improve the reading of
iov[] from user space.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
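
For readers following the quoted disassembly: the `cmp $0xfffffffffffff001,%rax` / `jae` pair is the usual Linux x86_64 check for an error return, where a raw result in the range [-4095, -1] encodes a negated errno value. A minimal C sketch of that check is below. The names `raw_ret_is_error` and `syscall_ret` are hypothetical stand-ins for illustration and are not glibc's or musl's actual internals.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Linux convention: a raw syscall return value in [-4095, -1] is -errno;
 * any other value is a successful result. */
#define MAX_ERRNO 4095UL

/* Same test as the quoted `cmp $0xfffffffffffff001,%rax; jae ...`:
 * a single unsigned comparison against (unsigned long)-4095. */
static bool raw_ret_is_error(unsigned long raw)
{
    return raw >= -MAX_ERRNO;
}

/* Hypothetical stand-in for the helper reached through the
 * R_X86_64_PLT32 relocation (__syscall_error): on an error value it
 * stores the positive code in errno and returns -1; otherwise it
 * passes the result through unchanged. */
static long syscall_ret(unsigned long raw)
{
    if (raw_ret_is_error(raw)) {
        int err = (int)-(long)raw;
        assert(err > 0 && err <= (int)MAX_ERRNO);
        errno = err;
        return -1;
    }
    return (long)raw;
}
```

Written this way the compiler is free to emit the compare and branch itself, which is roughly the platform-neutral approach argued for in the quoted text; whether it matches the hand-written assembly byte for byte depends on the compiler and flags.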