Message-ID: <20111104003731.GB8202@openwall.com>
Date: Fri, 4 Nov 2011 04:37:31 +0400
From: Solar Designer <solar@...nwall.com>
To: owl-dev@...ts.openwall.com
Subject: Re: %optflags for new gcc

On Wed, Oct 26, 2011 at 08:52:14AM +0400, Solar Designer wrote:
> We may want to adjust our %optflags now (in .rpmmacros).  I am thinking
> of two changes:
> 
> 1. Use -Os instead of -O2.  BTW, the kernel already builds with -Os.
> In my (limited) testing, -Os results in significantly smaller code that
> is about as fast as -O2's (and is sometimes faster than -O2's).

I am not going to do this now.  It is not obvious to me whether "-Os
-fomit-frame-pointer" or -O2 (with -fomit-frame-pointer implied) is more
appropriate for a distro.

> 2. Drop our -mpreferred-stack-boundary* options, which arguably break
> the ABI and are more likely to actually break things now that gcc
> generates SSE2 instructions (-msse2 is implied on x86_64).

I was almost going to do this, mostly out of concern that new glibc
would use SSE2 a lot and expect the stack to be properly aligned for
that, but then I found this in glibc-2.14's sysdeps/i386/Makefile:

# Most of the glibc routines don't ever call user defined callbacks
# nor use any FPU or SSE* and as such don't need bigger %esp alignment
# than 4 bytes.
# Lots of routines in math will use FPU, so make math subdir an exception
# here.
ifeq ($(subdir),math)
sysdep-CFLAGS += -mpreferred-stack-boundary=4
else
ifeq ($(subdir),csu)
sysdep-CFLAGS += -mpreferred-stack-boundary=4
else
sysdep-CFLAGS += -mpreferred-stack-boundary=2
# Likewise, any function which calls user callbacks
uses-callbacks += -mpreferred-stack-boundary=4
# Likewise, any stack alignment tests
stack-align-test-flags += -malign-double -mpreferred-stack-boundary=4
endif
endif

So glibc itself is aware that stack alignment has a performance cost
and tries to avoid it.  It also uses its own string functions, which
seems to imply that those should be designed to work with
-mpreferred-stack-boundary=2 in the caller as well.  Indeed, the
SSE2-using versions of strlen() and memset() under
sysdeps/i386/i686/multiarch/ appear not to make accesses wider than
4 bytes to the stack (although they do make wider accesses to non-stack
locations).

Thus, it appears that we can continue to use
-mpreferred-stack-boundary=2 when building our program binaries, as long
as SSE2 is not enabled for them (no -msse2, no -march=... that would
imply SSE2).
As to libraries, things are trickier: some might have callbacks, and we
have no control over how third-party binaries are compiled (some may
expect their callback functions to be invoked with 16-byte stack
alignment).
Luckily, glibc specifically appears to take care of this problem as
above, but for other libraries we have to choose the
-mpreferred-stack-boundary=... setting for them in their entirety (since
we can't afford to spend time on separating their callback-possible code
paths from others, and such activity on our part would be error-prone).

Most likely, I will keep our current -mpreferred-stack-boundary=2 in
%optflags_bin_i686, but drop our current -mpreferred-stack-boundary=3
from %optflags_lib_i686 (so it will default to "4" instead).
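
For reference, -mpreferred-stack-boundary=N asks gcc to keep the stack
aligned to 2^N bytes, so the values discussed above correspond to 4, 8,
and 16 bytes.  A minimal C sketch of that mapping follows.  The helper
names boundary_bytes() and is_aligned() are made up for illustration and
are not part of gcc or glibc:

```c
#include <assert.h>
#include <stdint.h>

/* Stack alignment in bytes requested by -mpreferred-stack-boundary=n,
 * i.e. 2 raised to the power n. */
static unsigned long boundary_bytes(unsigned int n)
{
    assert(n < 32); /* arbitrary sanity bound for this sketch only */
    return 1UL << n;
}

/* Nonzero if addr sits on a 2^n-byte boundary. */
static int is_aligned(uintptr_t addr, unsigned int n)
{
    return (addr & (boundary_bytes(n) - 1)) == 0;
}
```

For example, an address such as 0x1004 satisfies the "2" setting (4-byte
alignment) but not "3" or "4", which is why code expecting 16-byte
alignment for SSE2 spills may break when its caller was built with "2".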

Other options I am considering:

3. -Wl,-z,relro

I am doing a test build now, no problems so far.  This is so-called
"partial RELRO", which should have no performance impact.  We may also
do "full RELRO" for specific programs that are security-critical but not
performance-critical (I am thinking network services).  Many other
distros are using the same approach.

http://isisblogs.poly.edu/2011/06/01/relro-relocation-read-only/

4. -fstack-protector --param=ssp-buffer-size=2

Fedora is using "--param=ssp-buffer-size=4", the default is 8.  In my
testing, going to 2 or even to 1 adds very few functions to those for
which checks are enabled, compared to the default of 8.  However,
-fstack-protector-all would add a whole lot more of them.  So plain
-fstack-protector with a very low setting of --param=ssp-buffer-size
seems like the best choice.  The value 2 specifically represents the
smallest reasonable NUL-terminated string buffer, e.g. in code like:

char s[2];
sprintf(s, "%d", n);

where "n" is expected to be in the range of 0 to 9, but might not be.
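
To illustrate why such a buffer deserves a check, here is a minimal
sketch, assuming C99 snprintf() semantics, of how the required size
grows once "n" leaves the expected range, along with a bounded variant of
the same formatting.  decimal_size() and format_digit() are hypothetical
helper names, not something from our tree:

```c
#include <assert.h>
#include <stdio.h>

/* Bytes needed to hold the decimal form of n plus the terminating NUL. */
static int decimal_size(int n)
{
    /* With a zero size, C99 snprintf() writes nothing and returns the
     * length the output would have had. */
    int len = snprintf(NULL, 0, "%d", n);
    assert(len > 0);
    return len + 1;
}

/* Bounded replacement for sprintf(s, "%d", n).
 * Returns 0 if the whole number fit, -1 if it was truncated. */
static int format_digit(char *buf, size_t size, int n)
{
    int len = snprintf(buf, size, "%d", n);
    return (len >= 0 && (size_t)len < size) ? 0 : -1;
}
```

With "char s[2]", format_digit(s, sizeof(s), 7) fits, whereas n = 42
needs 3 bytes; plain sprintf() would write past the end of the buffer,
which is the case the stack protector check is meant to catch.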

Problems with this:

A. -fstack-protector needs to be passed at link time too, and linking
must be done via gcc.  Right now, this is not true for
many of our packages.  In my test build with -fstack-protector on
x86_64, 63 out of 153 packages failed to build, with errors like:

sysutil.o: In function `vsf_sysutil_accept_timeout':
sysutil.c:(.text+0x1b1c): undefined reference to `__stack_chk_guard'
sysutil.c:(.text+0x1c67): undefined reference to `__stack_chk_guard'
sysutil.c:(.text+0x1cc6): undefined reference to `__stack_chk_fail'

These errors are from vsftpd, but the other packages fail similarly.
BTW, I don't see how/whether Fedora's vsftpd.spec or patches solve this.
Since I think they do use -fstack-protector, this is puzzling to me.

B. With our current gcc.spec, gcc doesn't let us use -fstack-protector
and -fPIE -pie at once:

/usr/bin/ld: /usr/lib64/gcc/x86_64-openwall-linux/4.6.2/../../../../lib64/libssp.a(ssp.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
/usr/lib64/gcc/x86_64-openwall-linux/4.6.2/../../../../lib64/libssp.a: could not read symbols: Bad value

5. PIE is another thing we may want to enable selectively (like "full
RELRO") or to let our advanced users enable fully (in their rebuild of
Owl), so it is a problem that we can't have it along with
-fstack-protector.  Apparently, we need to have parts of gcc static
libraries built with -fPIC to solve this.  Maybe we need to review
Fedora's gcc.spec for it.  Vasiliy?

Another thing that is not clear to me is whether we should use -fPIE or
-fpie, which are similar but not exactly the same.  I was not able to
quickly locate a reliable and sufficiently complete explanation of the
differences and intended uses.  Here's what I found:

http://gcc.gnu.org/ml/gcc-patches/2003-06/msg00140.html
http://gcc.gnu.org/ml/gcc-help/2009-07/msg00348.html
http://blog.flameeyes.eu/2011/08/15/compilers-rant

BTW, in a few quick tests I ran, PIE has a more significant performance
impact than -fstack-protector does, even on x86_64 where I previously
read that PIE's performance impact was negligible (maybe it is for a
system overall, where most CPU time is spent running library code, which
is PIC anyway, but I measured on a standalone program build).
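
One way to at least confirm which of the two options a given object was
built with: gcc documents the predefined macro __PIE__ (and __pie__) as
1 for -fpie and 2 for -fPIE, mirroring __PIC__ for -fpic and -fPIC.  A
small sketch follows; pie_level() and pie_flag_name() are hypothetical
names chosen for this example:

```c
#include <assert.h>

/* 0 = not compiled as PIE, 1 = -fpie, 2 = -fPIE, going by the
 * compiler's predefined __PIE__ macro. */
static int pie_level(void)
{
#if defined(__PIE__)
    return __PIE__;
#else
    return 0;
#endif
}

/* Human-readable name of the option implied by pie_level(). */
static const char *pie_flag_name(void)
{
    static const char *const names[] = { "(none)", "-fpie", "-fPIE" };
    int level = pie_level();
    assert(level >= 0 && level <= 2);
    return names[level];
}
```

This only reports how the translation unit was compiled.  It says
nothing about whether the final link used -pie, nor about how the
generated code differs between the two options on a given target.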

6. After we update glibc, we'll want to use -D_FORTIFY_SOURCE=2.
For some reason, Fedora uses -Wp,-D_FORTIFY_SOURCE=2.  Does this really
matter, or are they trying to save some CPU cycles during builds?

       -Wp,option
           You can use -Wp,option to bypass the compiler driver and pass
           option directly through to the preprocessor.  If option contains
           commas, it is split into multiple options at the commas.  However,
           many options are modified, translated or interpreted by the
           compiler driver before being passed to the preprocessor, and -Wp
           forcibly bypasses this phase.  The preprocessor's direct interface
           is undocumented and subject to change, so whenever possible you
           should avoid using -Wp and let the driver handle the options
           instead.
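
Whichever way the definition reaches the preprocessor, the source should
see the same _FORTIFY_SOURCE macro.  A minimal sketch of checking this
from within a program follows.  The function names are hypothetical, and
the optimization check reflects my understanding that glibc only enables
the fortified wrappers when __OPTIMIZE__ is defined:

```c
#include <assert.h>

/* Value of _FORTIFY_SOURCE seen by this translation unit, or 0 if the
 * macro is not defined.  The "+ 0" keeps the expression valid even if
 * the macro happens to be defined as empty. */
static int fortify_level(void)
{
#if defined(_FORTIFY_SOURCE)
    return _FORTIFY_SOURCE + 0;
#else
    return 0;
#endif
}

/* 1 if the compiler was invoked with optimization enabled (-O1 and up,
 * including -Os), 0 otherwise. */
static int optimizing(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Assumption: fortification takes effect only when both hold. */
static int fortify_effective(void)
{
    int level = fortify_level();
    assert(level >= 0);
    return level > 0 && optimizing();
}
```

Building this once with -D_FORTIFY_SOURCE=2 and once with
-Wp,-D_FORTIFY_SOURCE=2 and comparing fortify_level() would be a quick
way to see whether the two spellings differ for our gcc.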

7. What's the deal with Fedora's use of -fexceptions?  Why do they do it?

For reference, Fedora's gcc flags may be found here:

http://pkgs.fedoraproject.org/gitweb/?p=redhat-rpm-config.git;a=tree

I'd appreciate any comments.

Alexander
