Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <56F34851-93B5-43D7-8968-4316F0F76157@gmail.com>
Date: Sat, 27 Apr 2019 17:51:17 -0500
From: Rodger Combs <rodger.combs@...il.com>
To: musl@...ts.openwall.com
Subject: Re: [PATCH 3/3] crt: add dcrt1, with support for locating the
 dynamic loader at runtime



> On Apr 27, 2019, at 12:19, Rich Felker <dalias@...c.org> wrote:
> 
> On Fri, Apr 26, 2019 at 08:13:29PM -0500, Rodger Combs wrote:
>> diff --git a/crt/dcrt1.c b/crt/dcrt1.c
>> new file mode 100644
>> index 0000000..47c6dc2
>> --- /dev/null
>> +++ b/crt/dcrt1.c
>> @@ -0,0 +1,362 @@
>> +#define SYSCALL_NO_TLS
>> +
>> +#include <elf.h>
>> +#include <errno.h>
>> +#include <fcntl.h>
>> +#include <features.h>
>> +#include <libgen.h>
>> +#include <sys/mman.h>
>> +#include <string.h>
>> +#include <unistd.h>
>> +#include "atomic.h"
>> +#include "dynlink.h"
>> +#include "syscall.h"
>> +
>> +extern weak hidden const size_t _DYNAMIC[];
>> +
>> +int main();
>> +weak void _init();
>> +weak void _fini();
>> +weak _Noreturn int __libc_start_main(int (*)(), int, char **,
>> +	void (*)(), void(*)(), void(*)());
>> +
>> +#define DLSTART_PROLOGUE __libc_start_main(main, argc, argv, _init, _fini, 0);
>> +
>> +#define START "_start"
>> +#define _dlstart_c _start_c
>> +#include "../ldso/dlstart.c"
>> +
>> +struct dso {
>> +	unsigned char *base;
>> +	struct fdpic_loadmap *loadmap;
>> +	size_t *dynv;
>> +	Phdr *phdr;
>> +	int phnum;
>> +	size_t phentsize;
>> +	unsigned char *map;
>> +	size_t map_len;
>> +};
> 
> There is no need for making a dummy strict dso or exposing struct dso
> to this file. The code to be taken/shared from map_library just needs
> to be refactored so that the part that writes things into struct dso
> remains in dynlink.c. Or, since we're punting on fdpic support for
> this for now, a simplified version of map_library without fdpic
> support (it gets quite simple then) could perhaps just be put here for
> now, with the intent of refactoring to unify later.

There are enough relevant values here that it seems to warrant using a struct (rather than a bunch of individual pointer args), and fairly little of map_library had to be disabled (basically just the stuff dealing with TLS and GNU relro/stack stuff) to avoid accesses to other members. I suppose those portions could be factored out fairly easily, and then this minimal struct could be made a member of the dso struct? That just seemed like it'd require a fair bit more change to the existing dynlink code than this.

> 
>> +#ifndef PAGE_SIZE
>> +#define PAGE_SIZE SYSCALL_MMAP2_UNIT
>> +#endif
>> [...]
>> +static inline int crt_mprotect(void *addr, size_t len, int prot)
>> +{
>> +	size_t start, end;
>> +	start = (size_t)addr & -PAGE_SIZE;
>> +	end = (size_t)((char *)addr + len + PAGE_SIZE-1) & -PAGE_SIZE;
>> +	return __syscall(SYS_mprotect, start, end-start, prot);
>> +}
>> +#define mprotect crt_mprotect
> 
> This is wrong. mprotect does not take units of SYSCALL_MMAP2_UNIT. You
> need to get the variable page size from auxv.

Ah, I was confused because the actual mprotect implementation uses PAGE_SIZE, which is defined in the public headers as a fixed value (though looking closer, I'm not sure if it's always the same as SYSCALL_MMAP2_UNIT anyway), but in the internal headers it's an access to the __libc struct. Sure.

> 
>> +static char *crt_getenv(const char *name, char **environ)
>> +{
>> +	size_t l = crt_strchrnul(name, '=') - name;
>> +	if (l && !name[l] && environ)
>> +		for (char **e = environ; *e; e++)
>> +			if (!crt_strncmp(name, *e, l) && l[*e] == '=')
>> +				return *e + l+1;
>> +	return 0;
>> +}
> 
> I really question the use of environment here at all, both from a
> standpoint of simplicity/least-surprise, and from a standpoint of
> intended usage case. If you're on a non-musl system, the environment
> is likely to contain settings that wouldn't be appropriate to musl
> anyway, meaing you probably need to remove them for this to work at
> all, and then if you really need to force a path other than the
> builtin relative one for a particular musl-linked binary, you might as
> well just invoke ldso as a command, making the override local to that
> invocation rather than inherited across exec of other programs.

My goal is to make locating the dynamic loader work the same way as locating any other library you load, including the ability to override that path via LD_LIBRARY_PATH for testing, interception, etc. Having a dedicated var for it means you can set it and have a program and all of its child processes use an alternate loader when testing something, or as a way to adjust the loader path after build (if patchelf or similar tools don't work well for you).

> 
> If you have compelling reasons you want to search the environment,
> let's discuss it, but if it's just gratuitous "maybe it will be
> useful", let's apply YAGNI.
> 
>> +static void get_rpaths(const char **rpath, const char **runpath, const size_t *dyn)
>> +{
>> +	/* DT_STRTAB is pre-relocated for us by dlstart */
>> +	const char *strings = (void*)dyn[DT_STRTAB];
>> +
>> +	if (dyn[0] & (1 << DT_RPATH))
>> +		*rpath = strings + dyn[DT_RPATH];
>> +	if (dyn[0] & (1 << DT_RUNPATH))
>> +		*runpath = strings + dyn[DT_RUNPATH];
>> +}
> 
> musl does not honor the legacy difference in search order between
> rpath and runpath (see dynlink.c), but if wwe don't search via
> environment here it doesn't matter anyway.

It was easy enough to implement, so I did ¯\_(ツ)_/¯ but if we explicitly don't want to implement the old behavior I can remove it.

> 
>> +static size_t find_linker(char *outbuf, size_t bufsize, const char *this_path, size_t thisl, const size_t *dyn, char **environ, int secure)
>> +{
>> +	const char *paths[3] = {0}; // rpath, envpath, runpath
>> +	size_t i;
>> +	int fd;
>> +
>> +	if (thisl)
>> +		thisl--;
>> +	while (thisl > 1 && this_path[thisl] == '/')
>> +		thisl--;
>> +	while (thisl > 0 && this_path[thisl] != '/')
>> +		thisl--;
>> +
>> +	if (!secure) {
>> +		const char *envpath = crt_getenv("LD_LOADER_PATH", environ);
>> +		if (envpath) {
>> +			size_t envlen = crt_strlen(envpath);
>> +			if (envlen < bufsize) {
>> +				crt_memcpy(outbuf, envpath, envlen + 1);
>> +				return envlen + 1;
>> +			}
>> +		}
>> +	}
>> +
>> +	get_rpaths(&paths[0], &paths[2], dyn);
>> +
>> +	paths[1] = secure ? NULL : crt_getenv("LD_LIBRARY_PATH", environ);
>> +
>> +	for (i = 0; i < 3; i++) {
>> +		int relative = 0;
>> +		const char *p = paths[i];
>> +		char *o = outbuf;
>> +		if (!p)
>> +			continue;
>> +		for (;;) {
>> +			if (!crt_strncmp(p, "$ORIGIN", 7) ||
>> +					!crt_strncmp(p, "${ORIGIN}", 9)) {
>> +				relative = 1;
>> +				if (o + thisl + 1 < outbuf + bufsize) {
>> +					crt_memcpy(o, this_path, thisl);
>> +					o += thisl;
>> +				} else {
>> +					o = outbuf + bufsize - 1;
>> +				}
>> +				p += (p[1] == '{' ? 9 : 7);
>> +			} else if (*p == ':' || !*p) {
>> +#define LDSO_FILENAME "ld-musl-" LDSO_ARCH ".so.1"
>> +				relative |= outbuf[0] != '/';
>> +				if ((!secure || !relative) && o + sizeof(LDSO_FILENAME) + 1 < outbuf + bufsize) {
>> +					*o++ = '/';
>> +					crt_memcpy(o, LDSO_FILENAME, sizeof(LDSO_FILENAME));
>> +					if (!access(outbuf, R_OK | X_OK))
>> +						return (o + sizeof(LDSO_FILENAME)) - outbuf;
>> +				}
>> +				if (!*p)
>> +					break;
>> +				relative = 0;
>> +				o = outbuf;
>> +				p++;
>> +			} else {
>> +				if (o < outbuf + bufsize)
>> +					*o++ = *p;
>> +				p++;
>> +			}
>> +		}
>> +	}
>> +
>> +	// Didn't find a usable loader anywhere, so try the hardcoded path :shrug:
>> +	crt_memcpy(outbuf, LDSO_PATHNAME, sizeof(LDSO_PATHNAME));
>> +	return sizeof(LDSO_PATHNAME);
>> +}
> 
> Including fallbacks that alter behavior is probably a bad idea
> (failing preferable). If the changes in your other patch to dynlink.c
> are needed to make it work with dcrt1, then it won't necessarily even
> work to load a (possibly older) system-wide musl ldso; however if
> that's the case I Think we should fix it.

This behavior is analogous to dynlink.c's behavior when it searches rpath for a file and can't find it there. It uses "/etc/ld-musl-" LDSO_ARCH ".path" and falls back on "/lib:/usr/local/lib:/usr/lib" if that's unavailable and I'm just using a single hardcoded path, but beyond that I don't see this as significantly conceptually different. It also allows executables built with this to work on systems with musl installed even without having the vendored copy on-hand, which makes it easier to deploy a single binary that can use a vendored copy if provided, but can be packaged without it for systems with musl.

The changes aren't needed for dcrt1 when the app is run directly; they're needed when running a dcrt app (or other static PIE app) via ldso on the command line.

> 
>> +static void final_start_c(long *p)
>> +{
>> +	int argc = p[0];
>> +	char **argv = (void *)(p+1);
>> +	__libc_start_main(main, argc, argv, _init, _fini, 0);
>> +}
>> +
>> +hidden _Noreturn void __dls2(unsigned char *base, size_t *p, size_t *dyn)
>> +{
>> +	int argc = p[0];
>> +	char **argv = (void *)(p+1);
>> +	int fd;
>> +	int secure;
>> +	int prot = PROT_READ;
>> +	struct dso loader;
>> +	Ehdr *loader_hdr;
>> +	Phdr *new_hdr;
>> +	void *entry;
>> +	char this_path[2*NAME_MAX+2] = {0};
>> +	size_t thisl;
>> +	char linker_path[2*NAME_MAX+2] = {0};
> 
> 2*NAME_MAX is a fairly short arbitrary limit. Why not PATH_MAX, or a
> VLA that adapts up to PATH_MAX (to avoid expanding stack when longer
> is not needed)?

I got this limit from load_library; if we think it should be larger, we should change it in both places.
I don't think there's much use in doing VLAs here (at least, not if they're going to have fixed maximums that are reasonably low), since we're going to CRTJMP out of this anyway.

> 
>> +	size_t linker_len;
>> +	size_t i;
>> +	size_t aux[AUX_CNT];
>> +	size_t *auxv;
>> +	char **environ = argv + argc + 1;
>> +
>> +	/* Find aux vector just past environ[] and use it to initialize
>> +	* global data that may be needed before we can make syscalls. */
>> +	for (i = argc + 1; argv[i]; i++);
>> +	auxv = (void *)(argv + i + 1);
>> +	decode_vec(auxv, aux, AUX_CNT);
>> +	secure = ((aux[0] & 0x7800) != 0x7800 || aux[AT_UID] != aux[AT_EUID]
>> +		|| aux[AT_GID] != aux[AT_EGID] || aux[AT_SECURE]);
> 
> At this point we can just abort if secure != 0. There is unbounded
> attack surface trying to load a (possibly relative) ldso with elevated
> privileges.

No more so than dynlink.c normally has when loading other SOs. Like there, I don't follow $ORIGIN in secure mode, and additionally here I don't handle relative-to-cwd paths in secure mode. I don't see a problem with allowing a load from an absolute rpath, or from the hardcoded path, using this mechanism, though.
Basically, I'm intending for this to be a feature that you could just turn on in your linker flags for everything you build, and get the functionality in the cases where you want it, at no significant cost in those where you don't.

> 
>> +	thisl = readlink("/proc/self/exe", this_path, sizeof this_path);
> 
> We might consider whether use of AT_EXECFN (possibly with additional
> resolution of symlinks etc) would be better. It would avoid dependency
> on /proc, making it possible to run dcrt1 binaries at early boot or
> inside chroots, but does have different semantics that might be less
> (or maybe more?) desirable.
> 

Hmmmm, so it's possible to get relative paths that way (e.g. ./test-prog), but I suppose that might be fine? But I don't know how useful this would really be, since dynlink.c depends on /proc for rpath anyway.

>> +	// Copy the program headers into an anonymous mapping
>> +	new_hdr = mmap(0, (aux[AT_PHENT] * (aux[AT_PHNUM] + 2) + linker_len + PAGE_SIZE - 1) & -PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>> +	if (map_library_failed(new_hdr))
>> +		goto error;
> 
> Can you remind us why patched program headers are needed? I think it
> was absence of PT_PHDR or something...

Yeah, the linker doesn't add PT_PHDR when we tell it not to set a dynamic loader, and dynlink needs it.

> 
>> +	// Point it back at the original kernel-provided base
>> +	new_hdr->p_type = PT_PHDR;
>> +	new_hdr->p_vaddr = (size_t)new_hdr - (size_t)base;
>> +
>> +	((Phdr*)((char*)new_hdr + aux[AT_PHENT]))->p_type = PT_INTERP;
>> +	((Phdr*)((char*)new_hdr + aux[AT_PHENT]))->p_vaddr = new_hdr->p_vaddr + aux[AT_PHENT] * (aux[AT_PHNUM] + 2);
> 
> But here you're also adding a PT_INTERP. What's the motivation for that?

It's pretty trivial to provide since we're already making a duplicate for PHDR, and it lets the loader know its own path, which might be useful for debug stuff at some point. Generally I'm trying to make this act as much like a kernel-loaded ldso as possible.

> 
>> +	crt_memcpy((char*)new_hdr + aux[AT_PHENT] * (aux[AT_PHNUM] + 2), linker_path, linker_len);
>> +
>> +	for (i = 0; i < aux[AT_PHNUM]; i++) {
>> +		Phdr *hdr = (void*)((char*)aux[AT_PHDR] + aux[AT_PHENT] * i);
>> +		Phdr *dst = (void*)((char*)new_hdr + aux[AT_PHENT] * (i + 2));
>> +		if (hdr->p_type == PT_PHDR || hdr->p_type == PT_INTERP) {
> 
> Isn't it impossible to have a PT_INTERP already here?

Normally, yeah, I'm just covering my bases so we behave predictably even if you do something weird.

> 
>> +			// Can't have a duplicate
>> +			dst->p_type = PT_NULL;
> 
> Is adding PT_NULL headers in the middle valid, or do they get
> interpreted as end of headers?

It's valid; they're just skipped over.

> 
>> +		} else {
>> +			crt_memcpy(dst, hdr, aux[AT_PHENT]);
>> +		}
>> +	}
>> +
>> +	if (mprotect(new_hdr, aux[AT_PHENT] * (aux[AT_PHNUM] + 2) + linker_len, PROT_READ))
>> +		goto error;
>> +
>> +	for (i=0; auxv[i]; i+=2) {
>> +		if (auxv[i] == AT_BASE)
>> +			auxv[i + 1] = (size_t)loader_hdr;
>> +		if (auxv[i] == AT_PHDR)
>> +			auxv[i + 1] = (size_t)new_hdr;
>> +		if (auxv[i] == AT_PHNUM)
>> +			auxv[i + 1] += 2;
>> +	}
>> +
>> +	entry = laddr(&loader, loader_hdr->e_entry);
>> +
>> +#ifndef LD_FDPIC
>> +	/* Undo the relocations performed by dlstart */
>> +
>> +	if (NEED_MIPS_GOT_RELOCS) {
>> +		const size_t *dynv = _DYNAMIC;
>> +		size_t local_cnt = 0;
>> +		size_t *got = (void *)(base + dyn[DT_PLTGOT]);
>> +		for (i=0; dynv[i]; i+=2) if (dynv[i]==DT_MIPS_LOCAL_GOTNO)
>> +			local_cnt = dynv[i+1];
>> +		for (i=0; i<local_cnt; i++) got[i] -= (size_t)base;
>> +	}
>> +
>> +	size_t *rel = (void *)((size_t)base+dyn[DT_REL]);
>> +	size_t rel_size = dyn[DT_RELSZ];
>> +	for (; rel_size; rel+=2, rel_size-=2*sizeof(size_t)) {
>> +		if (!IS_RELATIVE(rel[1], 0)) continue;
>> +		size_t *rel_addr = (void *)((size_t)base + rel[0]);
>> +		*rel_addr -= (size_t)base;
>> +	}
>> +#endif
>> +
>> +	CRTJMP(entry, argv - 1);
>> +
>> +error:
>> +	for(;;) a_crash();
>> +}
>> diff --git a/ldso/dlstart.c b/ldso/dlstart.c
>> index 20d50f2..7ff79d6 100644
>> --- a/ldso/dlstart.c
>> +++ b/ldso/dlstart.c
>> @@ -21,7 +21,7 @@
>> hidden void _dlstart_c(size_t *sp, size_t *dynv)
>> {
>> 	size_t i, aux[AUX_CNT], dyn[DYN_CNT];
>> -	size_t *rel, rel_size, base;
>> +	size_t *rel, rel_size, base, loader_phdr;
>> 
>> 	int argc = *sp;
>> 	char **argv = (void *)(sp+1);
>> @@ -41,6 +41,13 @@ hidden void _dlstart_c(size_t *sp, size_t *dynv)
>> 		 * space and moving the extra fdpic arguments to the stack
>> 		 * vector where they are easily accessible from C. */
>> 		segs = ((struct fdpic_loadmap *)(sp[-1] ? sp[-1] : sp[-2]))->segs;
>> +		if (aux[AT_BASE]) {
>> +			Ehdr *eh = (void*)aux[AT_BASE];
>> +			for (i = 0; eh->e_phoff - segs[i].p_vaddr >= segs[i].p_memsz; i++);
>> +			loader_phdr = (eh->e_phoff - segs[i].p_vaddr + segs[i].addr);
>> +		} else {
>> +			loader_phdr = aux[AT_PHDR];
>> +		}
>> 	} else {
>> 		/* If dynv is null, the entry point was started from loader
>> 		 * that is not fdpic-aware. We can assume normal fixed-
>> @@ -55,6 +62,7 @@ hidden void _dlstart_c(size_t *sp, size_t *dynv)
>> 		segs[0].p_memsz = -1;
>> 		Ehdr *eh = (void *)base;
>> 		Phdr *ph = (void *)(base + eh->e_phoff);
>> +		loader_phdr = (size_t)ph;
>> 		size_t phnum = eh->e_phnum;
>> 		size_t phent = eh->e_phentsize;
>> 		while (phnum-- && ph->p_type != PT_DYNAMIC)
>> @@ -69,13 +77,38 @@ hidden void _dlstart_c(size_t *sp, size_t *dynv)
>> 
>> #if DL_FDPIC
>> 	for (i=0; i<DYN_CNT; i++) {
>> -		if (i==DT_RELASZ || i==DT_RELSZ) continue;
>> +		if (i==DT_RELASZ || i==DT_RELSZ || i==DT_RPATH || i==DT_RUNPATH) continue;
>> 		if (!dyn[i]) continue;
>> 		for (j=0; dyn[i]-segs[j].p_vaddr >= segs[j].p_memsz; j++);
>> 		dyn[i] += segs[j].addr - segs[j].p_vaddr;
>> 	}
>> 	base = 0;
>> +#else
>> +	/* If the dynamic linker is invoked as a command, its load
>> +	 * address is not available in the aux vector. Instead, compute
>> +	 * the load address as the difference between &_DYNAMIC and the
>> +	 * virtual address in the PT_DYNAMIC program header. */
>> +	base = aux[AT_BASE];
>> +	if (!base) {
>> +		size_t phnum = aux[AT_PHNUM];
>> +		size_t phentsize = aux[AT_PHENT];
>> +		Phdr *ph = (void *)aux[AT_PHDR];
>> +		for (i=phnum; i--; ph = (void *)((char *)ph + phentsize)) {
>> +			if (ph->p_type == PT_DYNAMIC) {
>> +				base = (size_t)dynv - ph->p_vaddr;
>> +				break;
>> +			}
>> +		}
>> +	}
>> +	loader_phdr = base + ((Ehdr*)base)->e_phoff;
>> +#endif
>> 
>> +#ifdef DLSTART_PROLOGUE
>> +	if (aux[AT_PHDR] != loader_phdr)
>> +		DLSTART_PROLOGUE
>> +#endif
> 
> As mentioned before, this does not work. It puts a symbol reference
> into a function that is required to be pure PIC that can run prior to
> relocations. Having the reference in a branch that's not taken does
> not help.

Ah, alright.

> 
> Instead it should be something like (pseudocode):
> 
> 	if (relocs_already_done) goto done:
> 	// ...
> done:
> 	stage2_func dls2;
> 	GETFUNCSYM(&dls2, __dls2, base+dyn[DT_PLTGOT]);
> 	dls2((void *)base, sp);
> 	
> This does not require any DLSTART_PROLOGUE macro that's specific to
> dcrt1; the exact same thing works in rcrt1 (static pie) and will also
> make it safe to invoke static pie programs via ldso (seems pointless,
> but useful when you don't realize the binary is static).

Alright, simple enough; just added a static int to __dls2 that I set to 1 before CRTJMP-ing to ldso, and had this same aux[AT_PHDR] check in dlstart goto past the relocs. It still needs an ifdef, since otherwise it'd trigger in the loader itself.

> 
> The dcrt1 code in dls2 (which can vary between ldso, rcrt1, and dcrt1)
> then has to do its own test to tell which code path it needs to take,
> but that's fine. General principle here: repeating small amounts of
> work is better than increasing dependency between stages/layers. Same
> applies to passing dyn into dls2 -- doing so prevents the call from
> being a tail call, and keeps _dlstart_c's stack frame around despite
> being done with it.

Re-parsing dynv is simple enough, but I need to get at it somehow (this function gets it from _start). I suppose I could read the _DYNAMIC symbol in C if you prefer?

The stack frame sticking around doesn't really matter, since we're going to CRTJMP out of this anyway.

> 
>> +	/* For convenience in dcrt1, we relocate STRTAB here (as with FDPIC) */
>> +	dyn[DT_STRTAB] += base;
> 
> Just do it where you need it.

It's simple enough in the non-FDPIC case; I just did this here for consistency with FDPIC, where this is significantly more complex. I can drop it if we want to handle FDPIC differently long-term.

> 
>> 	/* MIPS uses an ugly packed form for GOT relocations. Since we
>> 	 * can't make function calls yet and the code is tiny anyway,
>> @@ -144,5 +163,5 @@ hidden void _dlstart_c(size_t *sp, size_t *dynv)
>> 
>> 	stage2_func dls2;
>> 	GETFUNCSYM(&dls2, __dls2, base+dyn[DT_PLTGOT]);
>> -	dls2((void *)base, sp);
>> +	dls2((void *)base, sp, dyn);
> 
> See above.


I'll hold off on re-sending this series until we've decided what to do about the questions above; once that's done, I'll apply whatever we decide on, and also write up some more useful commit messages.

Most of your comments from your other mail are addressed (or at least discussed) above; the one about rewriting auxv/phdrs is not:
- Both linux and freebsd's linux emu layer always insert entries for all the auxv entries we care about (their values just might be zero), so we don't need to worry about them potentially being missing (unless there's some other implementation we care about that I need to check; maybe Microsoft's?). The aux vector is on the stack, so we don't need to worry about it being read-only or something like that. glibc actually already rewrites values in there (when invoked directly), so there's precedent for doing so.
- We make an anonymous mapping in which to rewrite the program headers, rather than doing it in-place. The only particularly odd bit is that we take the difference between the effectively-random anonymous mapping address and the actual kernel-mapped base address, which may be negative (or very large). In practice, this works fine with musl's loader as-is (also happens to work with glibc's), and both do an unsigned subtract with no bounds check, so I think we're fine here as well.


Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.