Date: Thu, 16 Sep 2010 08:01:30 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc: Roland McGrath <roland@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, oss-security@...ts.openwall.com,
        Solar Designer <solar@...nwall.com>,
        Kees Cook <kees.cook@...onical.com>, Al Viro <viro@...iv.linux.org.uk>,
        Oleg Nesterov <oleg@...hat.com>, Neil Horman <nhorman@...driver.com>,
        linux-fsdevel@...r.kernel.org, pageexec@...email.hu,
        Brad Spengler <spender@...ecurity.net>, Eugene Teo <eugene@...hat.com>,
        KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [PATCH 2/2] execve: check the VM has enough memory at first

2010/9/15 KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>:
>
> Briefly says, to introduce new limit has bad benefit/risk balance. Sadly.

Well, I mostly agree. That said, I do think we could extend the
limiter some ways.

For example, I think the "stack limit / 4" is perfectly sane, but it
would make total sense to perhaps also take into account the AS and
RSS limits.
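
A minimal user-space sketch of that idea, assuming the AS and RSS limits simply act as extra upper bounds on the existing "stack limit / 4" value. The names arg_size_cap() and clamp_to() are hypothetical, and how the limits should really be combined is not settled in this thread.

```c
#include <sys/resource.h>

/* Hypothetical helper: lower 'cap' to 'limit' unless 'limit' is unlimited. */
static rlim_t clamp_to(rlim_t cap, rlim_t limit)
{
    if (limit == RLIM_INFINITY)
        return cap;
    if (cap == RLIM_INFINITY || limit < cap)
        return limit;
    return cap;
}

/*
 * Hypothetical cap on argument/stack size at exec time:
 * a quarter of the stack limit, further bounded by the
 * address-space and RSS limits (all values in bytes).
 */
rlim_t arg_size_cap(rlim_t stack, rlim_t as, rlim_t rss)
{
    /* Assumption: an unlimited stack rlimit means no cap from this term. */
    rlim_t cap = (stack == RLIM_INFINITY) ? RLIM_INFINITY : stack / 4;

    cap = clamp_to(cap, as);
    cap = clamp_to(cap, rss);
    return cap;
}
```

In a real program the three inputs would come from getrlimit() with RLIMIT_STACK, RLIMIT_AS and RLIMIT_RSS (the last one is not in POSIX, but Linux defines it).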

And I do think that your attempt to use __vm_enough_memory() was good.
It happens to be coded in a way that makes it useless for a one-pass
model, and some of what it does would be too expensive to do up-front
when you can't short-circuit it, but I do think that it would probably
be appropriate to at least try to take the _rough_ code there and use
it as a limit for maximum stack size too.

For example, we could have a function somewhat like

    unsigned long max_stack_size(void)
    {
        unsigned long allowed, used, limit;

        switch (sysctl_overcommit_memory) {
        case OVERCOMMIT_ALWAYS:
                allowed = ULONG_MAX;
                break;
        case OVERCOMMIT_GUESS:
                /* .. maybe we can come up with some upper bound here too .. */
                break;
        default:
                allowed = (totalram_pages - hugetlb_total_pages())
                        * sysctl_overcommit_ratio / 100;
                if (!cap_sys_admin)
                        allowed -= allowed / 32;
                allowed += total_swap_pages;
                /* Don't let a single process grow too big:
                   leave 3% of the size of this process for other processes */
                if (mm)
                        allowed -= mm->total_vm / 32;
                /* What is already committed to? */
                used = percpu_counter_read_positive(&vm_committed_as);
                if (used > allowed)
                        return 0;
                allowed -= used;
                break;
        }
        limit = ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur) / 4;
        if (allowed > limit)
                allowed = limit;
        return allowed;
    }

which we'd call once at the beginning of the execve(), and then
remember that result and use it instead of the current 'rlimit/4'
value.
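
To make the arithmetic easy to try outside the kernel, here is a user-space model of the same calculation. It is only a sketch under stated assumptions: struct vm_snapshot and max_stack_pages() are hypothetical names, the kernel globals are replaced by plain fields, OVERCOMMIT_GUESS is treated as unbounded because no bound has been proposed for it yet, and the stack rlimit is given in pages so that it is compared in the same unit as the page counts.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

enum overcommit_mode { OVERCOMMIT_GUESS, OVERCOMMIT_ALWAYS, OVERCOMMIT_NEVER };

/* Hypothetical snapshot of the values the kernel code reads from globals. */
struct vm_snapshot {
    enum overcommit_mode mode;
    unsigned long totalram_pages;
    unsigned long hugetlb_pages;
    unsigned long overcommit_ratio;   /* percent */
    unsigned long total_swap_pages;
    unsigned long committed_pages;    /* stands in for vm_committed_as */
    unsigned long mm_total_vm;        /* 0 when there is no mm */
    unsigned long stack_rlimit_pages; /* assumption: limit expressed in pages */
    bool cap_sys_admin;
};

/* Model of the proposed limit: min(overcommit headroom, stack limit / 4). */
unsigned long max_stack_pages(const struct vm_snapshot *s)
{
    unsigned long allowed, limit;

    assert(s != NULL);

    switch (s->mode) {
    case OVERCOMMIT_ALWAYS:
    case OVERCOMMIT_GUESS:  /* assumption: no bound defined, so unlimited */
        allowed = ULONG_MAX;
        break;
    default:
        allowed = (s->totalram_pages - s->hugetlb_pages)
                  * s->overcommit_ratio / 100;
        if (!s->cap_sys_admin)
            allowed -= allowed / 32;      /* keep about 3% for root */
        allowed += s->total_swap_pages;
        /* Leave 1/32 of this process's size for other processes. */
        if (s->mm_total_vm / 32 >= allowed)
            return 0;                     /* added guard against underflow */
        allowed -= s->mm_total_vm / 32;
        /* Subtract what is already committed. */
        if (s->committed_pages > allowed)
            return 0;
        allowed -= s->committed_pages;
        break;
    }

    limit = s->stack_rlimit_pages / 4;
    return allowed < limit ? allowed : limit;
}
```

With 1000 pages of RAM, a 50% ratio, 100 swap pages, 200 pages committed, a 320-page process and a 2048-page stack limit, the strict mode yields 375 pages, while OVERCOMMIT_ALWAYS falls back to the plain 2048 / 4 = 512.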

Now, admittedly the OVERCOMMIT_GUESS case is the interesting one, and
the one that is hard to write efficiently. But maybe we could make
'nr_free_pages()' cheap enough that doing that whole OVERCOMMIT_GUESS
"approximate free pages" thing from __vm_enough_memory would work out
too?

I dunno. It doesn't look hopeless.

                      Linus
