Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20200908201757.GX3265@brightrain.aerifal.cx>
Date: Tue, 8 Sep 2020 16:17:57 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: realpath without procfs

On Tue, Sep 08, 2020 at 09:27:46PM +0200, Markus Wichmann wrote:
> On Tue, Sep 08, 2020 at 01:19:04PM -0400, Rich Felker wrote:
> > Since it was raised yet again on #musl, I took some time to research
> > justification for, and ease of implementing, realpath without procfs.
> >
> 
> I do remember dietlibc's implementation of realpath(). But that has
> serious side effects that make it not thread-safe. The basic idea they
> had was to use chdir() and getcwd() to get the kernel to normalize the
> paths without having to read it from procfs. Not needing procfs was one
> of the design goals of that project, so that is why they implemented it
> that way.
> 
> Unfortunately, in some cases chdir() is irreversible (e.g. deleted
> working directory), and also, there is only one working directory per
> process, so while this is going on, all other threads will have trouble
> finding their files. Adding locking to prevent the other threads from
> noticing this would be challenging, to say the least, if not outright
> impossible. There are just so many places where the working directory
> plays a role.
> 
> Oh, and one more side effect: While the working directory is switched
> elsewhere, another process may unmount the volume containing the
> original directory. You could open "." first, to prevent this, but that
> adds another two syscalls overhead.

This can be solved by forking (or possibly just making a nonstandard
thread that unshares CLONE_FILES or whatever), but that's awful too.
The version here is far less expensive in runtime cost, and not even
large (just over 1k on i386).

> > - ttyname (important to things that use it)
> >
> 
> I don't see much of an alternative to using procfs for that one. You
> could probably search for device and inode of the fd among /dev/tty* and
> /dev/pts/* but that seems like a hack. That should probably be at most a
> fallback, if the normal way through /proc doesn't work.

For ptys, which are the most common, you can use TIOCGPTN. Otherwise
yes you can scan /dev, but that's a pain and may be very slow.

> > - dynamic linker identifying executable pathname
> >
> 
> Well, Linux could just pass AT_EXECFN. But if it doesn't, unless they
> want to add Solaris' getexecname() syscall, /proc/self/exe is the only
> link to the executable file name.

Linux does, but AT_EXECFN is different. It's the string passed to
SYS_execve, which has subtle differences (in particular it's not
realpath-resolved).

> > This is actually a lot less than I expected, and makes it reasonable
> > to envision a path to eventually not needing procfs at all.
> >
> > So, I did the work to figure out what would be needed to write a
> > procfs-free realpath, and it turns out that actually writing it was
> > not any harder, so I did. Attached is a draft. It needs testing, and
> > I'm undecided whether we should keep the old code and just use this as
> > a fallback, or just replace it. (The old code has fixed 5-syscall
> > overhead and ugly side effects on kernels too old to have O_PATH; new
> > code needs one syscall per path component and might (?) have worse or
> > different behavior under concurrent changes to the dir tree.)
> >
> > Some notes:
> >
> > - Attempts to support all pathnames where no intermediate exceeds
> >   PATH_MAX.
> >
> > - Initial // is treated as special, but //. and //.. resolve to /
> >
> > - getcwd is expanded initially if pathname is relative. This might be
> >   a bad choice since it causes failure whenever pwd is not
> >   representable even if the symlink reached via a relative pathname
> >   would take us to an absolute path that is representable.
> 
> I just checked, and glibc does the same thing. So at least you are in
> good company with being unable to handle unreachable working directories
> in realpath().

Thanks for checking that.

> > We could
> >   accumulate a relative path, including preserving .. components,
> >   until the first absolute-target symlink, and only apply it by
> >   prepending (and cancelling ..) at the end if no absolute-target
> >   symlink was encountered, but that requires some rework to do.
> >
> > Thoughts?
> >
> > Rich
> 
> > #define _GNU_SOURCE
> > #include <stdlib.h>
> > #include <limits.h>
> > #include <errno.h>
> > #include <unistd.h>
> > #include <string.h>
> >
> > char *realpath(const char *restrict filename, char *restrict resolved)
> > {
> > 	char output[PATH_MAX], stack[PATH_MAX];
> > 	size_t p, q, l, cnt=0;
> >
> > 	l = strlen(filename);
> > 	if (l > sizeof stack) goto toolong;
> 
> Shouldn't that be strnlen(), then?

Yes.

> > 	p = sizeof stack - l - 1;
> > 	memcpy(stack+p, filename, l+1);
> >
> > 	if (stack[p] != '/') {
> > 		if (getcwd(output, sizeof output) < 0) return 0;
> > 		q = strlen(output);
> > 	} else {
> > 		q = 0;
> > 	}
> >
> > 	while (stack[p]) {
> > 		if (stack[p] == '/') {
> > 			q=0;
> > 			p++;
> > 			/* Initial // is special. */
> > 			if (stack[p] == '/' && stack[p+1] != '/') {
> 
> You already incremented p here. Did you want to test for "///"? The
> comment indicated otherwise.

The second condition is !=, and it's checking for // that's not ///.
// is special; /// is equivalent to /.

> > 				output[q++] = '/';
> > 			}
> > 			while (stack[p] == '/') p++;
> > 		}
> > 		char *z = __strchrnul(stack+p, '/');
> > 		l = z-(stack+p);
> > 		if (l<=2 && stack[p]=='.' && stack[p+l-1]=='.') {
> > 			if (l==2) {
> > 				while(q>1 && output[q-1]!='/') q--;
> > 				if (q>1) q--;
> > 			}
> > 			p += l;
> > 			while (stack[p] == '/') p++;
> > 			continue;
> > 		}
> > 		if (l==1 && stack[p]=='.')
> > 		if (l+2 > sizeof output - q) goto toolong;
> 
> I believe you forgot to finish the first "if" line here. Also, you have
> already handled the "." path at this point.

Uhg, that second-to-last line was supposed to be removed. Thanks for
catching it -- it bypassed the bounds check. Fixed now in my draft.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.