libc-coord - Re: Allocating for execve and related functions

Message-ID: <20240724183708.GF10433@brightrain.aerifal.cx>
Date: Wed, 24 Jul 2024 14:37:09 -0400
From: Rich Felker <dalias@...c.org>
To: libc-coord@...ts.openwall.com
Subject: Re: Allocating for execve and related functions

On Mon, Jul 22, 2024 at 08:37:04AM +0200, Florian Weimer wrote:
> In some cases, it is necessary to allocate before making an execve
> system call.  In execvp and similar functions, space for constructing
> the pathname is needed.

Assuming existence of a PATH_MAX, the constructed path can be assumed
to fit in automatic storage and doesn't require allocation.
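
A minimal sketch of that, for illustration only: one PATH entry is
joined with the file name in a PATH_MAX-sized automatic buffer, and
anything that does not fit is refused with ENAMETOOLONG. The helper
name join_path, the fallback PATH_MAX value, and the treatment of an
empty entry as "." are assumptions of this sketch, not any libc's
actual code.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* assumption for this sketch only */
#endif

/* Hypothetical helper: write "dir/file" into buf (PATH_MAX bytes).
 * Returns 0 on success, -1 with errno=ENAMETOOLONG if it cannot fit. */
static int join_path(char *buf, const char *dir, size_t dirlen,
                     const char *file)
{
    size_t filelen = strlen(file);

    /* Assumption: an empty PATH entry means the current directory. */
    if (dirlen == 0) {
        dir = ".";
        dirlen = 1;
    }

    /* Need dirlen + '/' + filelen + NUL <= PATH_MAX, without overflow. */
    if (filelen > PATH_MAX - 2 || dirlen > PATH_MAX - 2 - filelen) {
        errno = ENAMETOOLONG;
        return -1;
    }

    memcpy(buf, dir, dirlen);
    buf[dirlen] = '/';
    memcpy(buf + dirlen + 1, file, filelen + 1); /* copies the NUL too */
    return 0;
}
```

An execvp-style loop would declare "char buf[PATH_MAX];" on its own
stack, call this once per PATH entry, and pass buf to execve.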

If an implementation chooses not to have a PATH_MAX, that's a fun way
of shooting oneself in the foot in many, many places... but thankfully
it looks like there's a solution anyway (see below).

> For execl, the argument vector needs to be

The argument vector is just pointers, and these pointers were already
passed to execl on the stack, so the storage for them is at least *of
the same order* as the size the caller has already assumed the stack
to be (roughly 2x). I think this makes it fairly reasonable to
construct on-stack as a VLA, crashing with stack overflow if it
doesn't fit (since the execl call itself would already have crashed
similarly from passing too many args; you're just changing the
threshold within the same order of magnitude). In the real world, you
don't call execl with hundreds of arguments; Translation Limits
probably don't even consider that valid. You use one of the execv
forms if you need a large or variable-length argument vector; execl is
for small, fixed numbers of args.

> built.  Some functions have fallback to the shell for missing script
> interpreters, which also requires copying the argument vector.

This is the one case where allocation really is needed, I think. The
existing argument vector is not on the incoming stack and can't be
assumed to be tiny. If you have a low ARG_MAX (no contract to accept
or refuse larger ones), you could potentially assume
ARG_MAX/2*sizeof(void*) fits on stack and fail if argc exceeds
ARG_MAX/2, but even that is quite large for stack.
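
To make the bounded variant concrete, here is a sketch of copying the
vector for the shell fallback into caller-supplied storage and
refusing when it does not fit. The helper name build_sh_argv, the
"sh" argv[0], and the choice of E2BIG are assumptions of this sketch;
the capacity is left to the caller, which could derive it from a
bound like the one above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: fill out[] with { "sh", file, argv[1], ...,
 * NULL }. cap is the number of slots in out[]. Returns 0 on success,
 * or -1 with errno=E2BIG if the copy would not fit. */
static int build_sh_argv(char **out, size_t cap, const char *file,
                         char *const argv[])
{
    size_t argc = 0;
    while (argv[argc])
        argc++;

    /* Slots: "sh", file, argv[1..argc-1], and the terminator. */
    size_t need = (argc ? argc - 1 : 0) + 3;
    if (need > cap) {
        errno = E2BIG;
        return -1;
    }

    size_t n = 0;
    out[n++] = (char *)"sh";
    out[n++] = (char *)file;
    for (size_t i = 1; i < argc; i++)
        out[n++] = argv[i];
    out[n] = NULL;
    return 0;
}
```

Whether out[] lives on the stack or in a mapped block is the caller's
decision; the copy itself is the same either way.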

> Thread-safe environment access may require a copy of the environment
> vector.

I don't think this is a reasonable motivation. The environment
fundamentally cannot be made thread-safe to modify. The interfaces
don't admit doing that. And I don't think there's any reasonable way
you could make exec* obtain a lock to copy it while still being
AS-safe. At the very least you'd have to make all accesses to the
environment block and unblock signals to make the lock AS-safe, which
would be prohibitively slow for many real-world uses.

> The allocation needs to be performed in an async-signal-safe fashion,
> but that isn't the main problem.  In a vfork scenario, the allocation
> happens in the original process, and if execve is successful, any
> allocation leaks.
> 
> Has anyone found a way to work around this?  A single per-thread buffer
> again runs into signal safety issues.  Maybe a stack of buffers, and
> cleanup code in vfork for anything allocated in the new process?

If this needs to be supported, I think what you can do is have the
vfork asm tail-call in the parent to a cleanup function that inspects
TLS for a pointer to an allocation made by mmap in the child and
unmaps it if present. I don't see any need for "stack of buffers".
There's at most one block of data that needs to be freed: one
containing everything that had to be marshalled into the SYS_execve
or SYS_execveat syscall. Anything else allocated admits an opportunity
to free it before the child ceases to exist.
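
In C terms, the bookkeeping could look roughly like the sketch below.
All names are hypothetical, and it assumes that thread-local access
and mmap are usable from the contexts exec* can be called in, which
an implementation would have to guarantee itself. Because the vfork
child shares memory with the suspended parent thread, what the child
stores is visible to the parent once it resumes.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on some libcs */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical per-thread record of the single block that holds
 * everything marshalled for the exec syscall. */
static _Thread_local void *exec_block;
static _Thread_local size_t exec_block_len;

/* Would run inside exec*, possibly in a vfork child. */
static void *exec_block_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    exec_block = p;
    exec_block_len = len;
    return p;
}

/* What the vfork asm would tail-call in the parent; exec* would also
 * call it on the failure path, before returning to its caller. */
static void exec_block_cleanup(void)
{
    if (exec_block) {
        munmap(exec_block, exec_block_len);
        exec_block = NULL;
        exec_block_len = 0;
    }
}
```

The cleanup is a no-op when nothing was recorded, so the parent can
run it unconditionally after every vfork return.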

Rich