Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200908192746.GA7854@voyager>
Date: Tue, 8 Sep 2020 21:27:46 +0200
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Subject: Re: realpath without procfs

On Tue, Sep 08, 2020 at 01:19:04PM -0400, Rich Felker wrote:
> Since it was raised yet again on #musl, I took some time to research
> justification for, and ease of implementing, realpath without procfs.
>

I do remember dietlibc's implementation of realpath(). But that has
serious side effects that make it not thread-safe. The basic idea they
had was to use chdir() and getcwd() to get the kernel to normalize the
paths without having to read it from procfs. Not needing procfs was one
of the design goals of that project, so that is why they implemented it
that way.

Unfortunately, in some cases chdir() is irreversible (e.g. deleted
working directory), and also, there is only one working directory per
process, so while this is going on, all other threads will have trouble
finding their files. Adding locking to prevent the other threads from
noticing this would be challenging, to say the least, if not outright
impossible. There are just so many places where the working directory
plays a role.

Oh, and one more side effect: While the working directory is switched
elsewhere, another process may unmount the volume containing the
original directory. You could open "." first, to prevent this, but that
adds another two syscalls overhead.

> - ttyname (important to things that use it)
>

I don't see much of an alternative to using procfs for that one. You
could probably search for device and inode of the fd among /dev/tty* and
/dev/pts/* but that seems like a hack. That should probably be at most a
fallback, if the normal way through /proc doesn't work.

> - dynamic linker identifying executable pathname
>

Well, Linux could just pass AT_EXECFN. But if it doesn't, unless they
want to add Solaris' getexecname() syscall, /proc/self/exe is the only
link to the executable file name.

> This is actually a lot less than I expected, and makes it reasonable
> to envision a path to eventually not needing procfs at all.
>
> So, I did the work to figure out what would be needed to write a
> procfs-free realpath, and it turns out that actually writing it was
> not any harder, so I did. Attached is a draft. It needs testing, and
> I'm undecided whether we should keep the old code and just use this as
> a fallback, or just replace it. (The old code has fixed 5-syscall
> overhead and ugly side effects on kernels too old to have O_PATH; new
> code needs one syscall per path component and might (?) have worse or
> different behavior under concurrent changes to the dir tree.)
>
> Some notes:
>
> - Attempts to support all pathnames where no intermediate exceeds
>   PATH_MAX.
>
> - Initial // is treated as special, but //. and //.. resolve to /
>
> - getcwd is expanded initially if pathname is relative. This might be
>   a bad choice since it causes failure whenever pwd is not
>   representable even if the symlink reached via a relative pathname
>   would take us to an absolute path that is representable.

I just checked, and glibc does the same thing. So at least you are in
good company with being unable to handle unreachable working directories
in realpath().

> We could
>   accumulate a relative path, including preserving .. components,
>   until the first absolute-target symlink, and only apply it by
>   prepending (and cancelling ..) at the end if no absolute-target
>   symlink was encountered, but that requires some rework to do.
>
> Thoughts?
>
> Rich

> #define _GNU_SOURCE
> #include <stdlib.h>
> #include <limits.h>
> #include <errno.h>
> #include <unistd.h>
> #include <string.h>
>
> char *realpath(const char *restrict filename, char *restrict resolved)
> {
> 	char output[PATH_MAX], stack[PATH_MAX];
> 	size_t p, q, l, cnt=0;
>
> 	l = strlen(filename);
> 	if (l > sizeof stack) goto toolong;

Shouldn't that be strnlen(), then?

> 	p = sizeof stack - l - 1;
> 	memcpy(stack+p, filename, l+1);
>
> 	if (stack[p] != '/') {
> 		if (getcwd(output, sizeof output) < 0) return 0;
> 		q = strlen(output);
> 	} else {
> 		q = 0;
> 	}
>
> 	while (stack[p]) {
> 		if (stack[p] == '/') {
> 			q=0;
> 			p++;
> 			/* Initial // is special. */
> 			if (stack[p] == '/' && stack[p+1] != '/') {

You already incremented p here. Did you want to test for "///"? The
comment indicated otherwise.

> 				output[q++] = '/';
> 			}
> 			while (stack[p] == '/') p++;
> 		}
> 		char *z = __strchrnul(stack+p, '/');
> 		l = z-(stack+p);
> 		if (l<=2 && stack[p]=='.' && stack[p+l-1]=='.') {
> 			if (l==2) {
> 				while(q>1 && output[q-1]!='/') q--;
> 				if (q>1) q--;
> 			}
> 			p += l;
> 			while (stack[p] == '/') p++;
> 			continue;
> 		}
> 		if (l==1 && stack[p]=='.')
> 		if (l+2 > sizeof output - q) goto toolong;

I believe you forgot to finish the first "if" line here. Also, you have
already handled the "." path at this point.

> 		output[q] = '/';
> 		memcpy(output+q+1, stack+p, l);
> 		output[q+1+l] = 0;
> 		p += l;
> 		ssize_t k = readlink(output, stack, p);
> 		if (k==-1) {
> 			if (errno == EINVAL) {
> 				q += 1+l;
> 				while (stack[p] == '/') p++;
> 				continue;
> 			}
> 			return 0;
> 		}
> 		if (k==p) goto toolong;
> 		if (++cnt == SYMLOOP_MAX) {
> 			errno = ELOOP;
> 			return 0;
> 		}
> 		p -= k;
> 		memmove(stack+p, stack, k);
> 	}
> 	if (!q) output[q++] = '/';
> 	output[q] = 0;
> 	return resolved ? strcpy(resolved, output) : strdup(output);
>
> toolong:
> 	errno = ENAMETOOLONG;
> 	return 0;
> }

Ciao,
Markus

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.