Message-ID: <20111104003731.GB8202@openwall.com>
Date: Fri, 4 Nov 2011 04:37:31 +0400
From: Solar Designer <solar@...nwall.com>
To: owl-dev@...ts.openwall.com
Subject: Re: %optflags for new gcc

On Wed, Oct 26, 2011 at 08:52:14AM +0400, Solar Designer wrote:
> We may want to adjust our %optflags now (in .rpmmacros). I am thinking
> of two changes:
>
> 1. Use -Os instead of -O2. BTW, the kernel already builds with -Os.
> In my (limited) testing, -Os results in significantly smaller code that
> is about as fast as -O2's (and is sometimes faster than -O2's).

I am not going to do this now. It is not obvious to me whether "-Os
-fomit-frame-pointer" or -O2 (with -fomit-frame-pointer implied) is more
appropriate for a distro.

> 2. Drop our -mpreferred-stack-boundary* options, which arguably break
> the ABI and are more likely to actually break things now that gcc
> generates SSE2 instructions (-msse2 is implied on x86_64).

I was almost going to do this, mostly out of concern that new glibc would
use SSE2 a lot and expect the stack to be properly aligned for that, but
then I found this in glibc-2.14's sysdeps/i386/Makefile:

# Most of the glibc routines don't ever call user defined callbacks
# nor use any FPU or SSE* and as such don't need bigger %esp alignment
# than 4 bytes.
# Lots of routines in math will use FPU, so make math subdir an exception
# here.
ifeq ($(subdir),math)
sysdep-CFLAGS += -mpreferred-stack-boundary=4
else
ifeq ($(subdir),csu)
sysdep-CFLAGS += -mpreferred-stack-boundary=4
else
sysdep-CFLAGS += -mpreferred-stack-boundary=2
# Likewise, any function which calls user callbacks
uses-callbacks += -mpreferred-stack-boundary=4
# Likewise, any stack alignment tests
stack-align-test-flags += -malign-double -mpreferred-stack-boundary=4
endif
endif

So glibc itself is aware that stack alignment has its performance cost
and it tries to avoid it.
And it uses its own string functions too, which seems to imply that those
should be designed to work with -mpreferred-stack-boundary=2 in the
caller as well. Indeed, the SSE2-using versions of strlen() and memset()
under sysdeps/i386/i686/multiarch/ appear not to make larger than 4-byte
accesses to the stack (they do to non-stack locations).

Thus, it appears that we can continue to use -mpreferred-stack-boundary=2
when building source code for our binaries without SSE2 enabled (no
-msse2, no -march=... that would imply SSE2).

As to libraries, things are trickier: some might have callbacks, and we
have no control over how third-party binaries are compiled (some may
expect their called back functions to have 16-byte stack alignment).
Luckily, glibc specifically appears to take care of this problem as
above, but for other libraries we have to choose the
-mpreferred-stack-boundary=... setting for them in their entirety (since
we can't afford to spend time on separating their callback-possible code
paths from others, and such activity on our part would be error-prone).

Most likely, I will keep our current -mpreferred-stack-boundary=2 in
%optflags_bin_i686, but drop our current -mpreferred-stack-boundary=3
from %optflags_lib_i686 (so it will default to "4" instead).

Other options I am considering:

3. -Wl,-z,relro

I am doing a test build now, no problems so far. This is so-called
"partial RELRO", which should have no performance impact. We may also do
"full RELRO" for specific programs that are security-critical but not
performance-critical (I am thinking network services). Many other
distros are using the same approach.

http://isisblogs.poly.edu/2011/06/01/relro-relocation-read-only/

4. -fstack-protector --param=ssp-buffer-size=2

Fedora is using "--param=ssp-buffer-size=4", the default is 8. In my
testing, going to 2 or even to 1 adds very few functions to those for
which checks are enabled, compared to the default of 8.
However, -fstack-protector-all would add a whole lot more of them. So
plain -fstack-protector with a very low setting of
--param=ssp-buffer-size seems like the best choice. The value 2
specifically represents the smallest reasonable NUL-terminated string
buffer, e.g. in code like:

	char s[2];
	sprintf(s, "%d", n);

where "n" is expected to be in the range of 0 to 9, but might not be.

Problems with this:

A. -fstack-protector needs to be passed at link time as well, and linking
must be done via gcc as well. Right now, this is not true for many of
our packages. In my test build with -fstack-protector on x86_64, 63 out
of 153 packages failed to build, with errors like:

sysutil.o: In function `vsf_sysutil_accept_timeout':
sysutil.c:(.text+0x1b1c): undefined reference to `__stack_chk_guard'
sysutil.c:(.text+0x1c67): undefined reference to `__stack_chk_guard'
sysutil.c:(.text+0x1cc6): undefined reference to `__stack_chk_fail'

These are vsftpd's, but other failed packages are similar. BTW, I don't
see how/whether Fedora's vsftpd.spec or patches solve this - since I
think they do use -fstack-protector, this is puzzling to me.

B. With our current gcc.spec, gcc doesn't let us use -fstack-protector
and -fPIE -pie at once:

/usr/bin/ld: /usr/lib64/gcc/x86_64-openwall-linux/4.6.2/../../../../lib64/libssp.a(ssp.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
/usr/lib64/gcc/x86_64-openwall-linux/4.6.2/../../../../lib64/libssp.a: could not read symbols: Bad value

5. PIE is another thing we may want to enable selectively (like "full
RELRO") or to let our advanced users enable fully (in their rebuild of
Owl), so it is a problem that we can't have it along with
-fstack-protector. Apparently, we need to have parts of gcc static
libraries built with -fPIC to solve this. Maybe we need to review
Fedora's gcc.spec for it. Vasiliy?

Another thing that is not clear to me is whether we should use -fPIE or
-fpie, which are similar but not exactly the same. I was not able to
quickly locate a reliable and sufficiently complete explanation of the
differences and intended uses. Here's what I found:

http://gcc.gnu.org/ml/gcc-patches/2003-06/msg00140.html
http://gcc.gnu.org/ml/gcc-help/2009-07/msg00348.html
http://blog.flameeyes.eu/2011/08/15/compilers-rant

BTW, in a few quick tests I ran, PIE has a more significant performance
impact than -fstack-protector does, even on x86_64 where I previously
read that PIE's performance impact was negligible (maybe it is for a
system overall, where most CPU time is spent running library code, which
is PIC anyway, but I measured on a standalone program build).

6. After we update glibc, we'll want to use -D_FORTIFY_SOURCE=2.

Somehow Fedora uses -Wp,-D_FORTIFY_SOURCE=2 - does this really matter or
are they trying to save some CPU cycles during builds?

       -Wp,option
           You can use -Wp,option to bypass the compiler driver and pass
           option directly through to the preprocessor. If option
           contains commas, it is split into multiple options at the
           commas. However, many options are modified, translated or
           interpreted by the compiler driver before being passed to the
           preprocessor, and -Wp forcibly bypasses this phase. The
           preprocessor's direct interface is undocumented and subject
           to change, so whenever possible you should avoid using -Wp
           and let the driver handle the options instead.

7. What's the deal with Fedora's use of -fexceptions? Why do they do it?

For reference, Fedora's gcc flags may be found here:

http://pkgs.fedoraproject.org/gitweb/?p=redhat-rpm-config.git;a=tree

I'd appreciate any comments.

Alexander