Message-ID: <20110724124034.GI132@brightrain.aerifal.cx>
Date: Sun, 24 Jul 2011 08:40:34 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: holywar: malloc() vs. OOM

On Sun, Jul 24, 2011 at 02:33:25PM +0400, Vasiliy Kulikov wrote:
> Rich,
> 
> This is more a question about your malloc() failure policy for musl than
> an actual proposal.
> 
> [...]
> 
> In theory, these are bugs of applications and not of libc, and they
> should be fully handled in programs, not in libc.  Period.
> 
> But looking at the problem from the pragmatic point of view we'll see
> that libc is actually the easiest place where the problem may be
> worked around (not fixed, surely).  The workaround would be simply
> raising SIGKILL if malloc() fails (either because of brk() or mmap()).
> For the rare programs craving to handle OOM such code should be used:

This is absolutely wrong and non-conformant. It will also ruin all
robust programs and result in massive data loss, deadlock with shared
locks due to failure to release locks before termination, and all
sorts of ills. It also creates trivial DoS opportunities; for example
you could kill a daemon that uses glob() simply by passing it a glob
expression that matches millions or billions of files. (It may be a
bad idea, from a load standpoint, to be using glob in a daemon, but it
should simply result in high load then failure, not crashing.)
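To illustrate "failure, not crashing", here is a minimal sketch of a caller that treats a glob() failure as an ordinary error. The helper name count_matches is made up for illustration; GLOB_NOSPACE and GLOB_NOMATCH are the standard POSIX return values.

```c
#include <glob.h>
#include <stddef.h>

/* Hypothetical helper: returns the number of paths matching pattern,
 * 0 if nothing matches, or -1 if glob() failed (for example with
 * GLOB_NOSPACE when it ran out of memory). */
long count_matches(const char *pattern)
{
    glob_t g = {0};  /* zeroed so globfree() is safe on every path */
    long n;
    int rc = glob(pattern, 0, NULL, &g);

    if (rc == 0)
        n = (long)g.gl_pathc;
    else if (rc == GLOB_NOMATCH)
        n = 0;
    else
        n = -1;      /* GLOB_NOSPACE or GLOB_ABORTED: report, do not die */

    globfree(&g);    /* releases any partial results as well */
    return n;
}
```

A daemon using something like this can log the -1 case and refuse the request while continuing to serve others.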

> #define _OOM_MAY_FAIL_
> #include <stdlib.h>
> 
> Then the workaround is disabled.

Being broken by default is not acceptable to me. The other way around
could be acceptable, but I'm very doubtful that it would fix any
real-world bugs. The modern mmap min address is very high, and it's
quite rare for apps to access the end of their allocation before the
beginning anyway. The only common situation I can think of where a
high offset might be accessed first is when calling glibc's memcpy,
which sometimes chooses to copy backwards. musl's
memcpy does not take this liberty, even if it might be faster in some
cases, for that very reason - it's dangerous to access high offsets
first if a program was not careful about checking the return value of
malloc.
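
As a rough illustration of the ordering argument (this is not musl's actual memcpy, just a plain byte loop under the name forward_copy, which is made up here), a strictly ascending copy touches offset 0 of the destination first:

```c
#include <stddef.h>

/* Illustrative only: copies n bytes in strictly ascending order, so
 * the first store goes to dest[0]. If dest is an unchecked NULL from
 * a failed malloc, that first store lands at address 0, which is
 * normally unmapped, instead of at some high offset that might be
 * valid memory. Note that an optimizing compiler may replace this
 * loop with a call to the real memcpy. */
void *forward_copy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dest;
}
```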

A better solution might be to have a gcc option to generate a read
from the base address the first time a function performs arithmetic on
a pointer it has not already checked. This is valid because the C
language does not allow pointer arithmetic to cross object boundaries,
and this approach could be made 100% correct rather than being a
heuristic that breaks correct applications. It would impose some
performance cost, but I doubt it would be high. (Note: Some special
handling might be required for "one past the end of an array" pointers
here. I'd have to think a bit longer to work out the details but I
think it's possible to handle them safely in a similar way.)
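
No such gcc option exists as far as this message says, so the following is only a hand-written emulation of what the generated code might look like. The names touch_base and fill_tail are hypothetical, and the sketch assumes the object is at least one byte long (it ignores the "one past the end" case mentioned above).

```c
#include <stddef.h>

/* Hypothetical stand-in for the compiler-inserted check: one volatile
 * read from the base address, so a NULL pointer faults at address 0
 * before any arithmetic on it is used. */
static inline void touch_base(const void *p)
{
    (void)*(const volatile unsigned char *)p;
}

/* Example function that would otherwise access a high offset first.
 * Assumes n >= 1. */
void fill_tail(unsigned char *buf, size_t n)
{
    touch_base(buf);     /* read from the base before buf + (n - 1) */
    buf[n - 1] = 0xff;
}
```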

> Probably I overestimate the importance of OOM errors, and (1) in
> particular.   However, I think it is worth discussing.

I don't think you overestimate the importance of OOM errors. Actually
the Linux desktop is full of OOM errors that ruin usability, like file
managers that hang the system for 5 minutes then crash if you navigate
to a directory with a 15000x15000 image file. Unfortunately I don't
think it's possible to fix at the libc level, and fixing the worst
issues (DoS from apps crashing when they should not crash) usually
involves both sanity-checking the size prior to calling malloc *and*
checking the return value of malloc...
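
A minimal sketch of doing both, for the image case. The 256 MiB cap (MAX_IMAGE_BYTES) and the function name alloc_pixels are assumptions made for this example, not anything from a real file manager.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed policy limit for this sketch; a real application would
 * pick its own bound. */
#define MAX_IMAGE_BYTES ((size_t)256 * 1024 * 1024)

/* Hypothetical helper: returns a pixel buffer, or NULL if the
 * dimensions are unreasonable or malloc fails. The divisions reject
 * oversized requests before any multiplication can overflow. */
unsigned char *alloc_pixels(size_t width, size_t height, size_t bytes_per_pixel)
{
    size_t row;

    if (width == 0 || height == 0 || bytes_per_pixel == 0)
        return NULL;
    if (width > MAX_IMAGE_BYTES / bytes_per_pixel)
        return NULL;
    row = width * bytes_per_pixel;
    if (height > MAX_IMAGE_BYTES / row)
        return NULL;

    return malloc(row * height);  /* may still be NULL; caller checks */
}
```

With these numbers a 15000x15000 image at 4 bytes per pixel (about 900 MB) is refused up front, and a smaller request that malloc cannot satisfy still comes back as NULL for the caller to handle.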

BTW great subject line! :-)

Rich
