musl - Re: [PATCH 1/1] ldso: continue searching if wrong architecture is found first

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGt4E5v9nVbtMJceSZ4Af04pYovgSM5=yGKzzuVNKa8qdkfg-Q@mail.gmail.com>
Date: Wed, 7 Feb 2024 12:59:37 -0800
From: Markus Mayer <mmayer@...adcom.com>
To: Musl Mailing List <musl@...ts.openwall.com>
Cc: Rich Felker <dalias@...c.org>
Subject: Re: [PATCH 1/1] ldso: continue searching if wrong architecture
 is found first

On Wed, 7 Feb 2024 at 09:30, Rich Felker <dalias@...c.org> wrote:
>
> On Tue, Feb 06, 2024 at 05:22:43PM -0800, Markus Mayer wrote:
> > When LD_LIBRARY_PATH is being used, the shared library loader may end up
> > finding a shared library with the correct name but from a wrong
> > architecture. This primarily happens when a 32-bit / 64-bit mismatch
> > occurs.
> >
> > Rather than giving up immediately and aborting, the shared library
> > loader should continue to look in all known locations for a matching
> > library that may work.
>
> I don't think the concept is objectionable, but the implementation
> looks wrong in at least two major ways:

I had a feeling that my approach might be too simplistic. At least it
served as a proof-of-concept.

> 1. It does not seem to continue the path search where it left off, but
>    just skips the rest of the LD_LIBRARY_PATH search on retry. Yes
>    this solves your problem of having a wrong-arch path element
>    present, but it breaks search of any correct-arch path elements
>    also present later there. This might not be a big deal, but it
>    strongly violates principle of least surprise, I think.

You are correct. The implementation does not deal with the remainder
of LD_LIBRARY_PATH and just searches system directories after it finds
an issue with a library in LD_LIBRARY_PATH. I can see how that could
be problematic in other scenarios.

> The bigger one is:
>
> 2. No consideration is made for the *cause of failure* when retrying.
>    Any in particular, lots of potential errors that were supposed to
>    be reportable errors now turn into vectors for loading the wrong
>    library, potentially under control of malicious inputs or
>    environmental factors (kernel resource exhaustion, etc.).

Okay, That makes a lot of sense. We definitely wouldn't want to
introduce potential instabilities or even create attack vectors.

> If "wrong library arch" is going to be a continue-path-search
> operation, it needs to be distinguishable (as "we successfully read
> the Ehdrs and determined the library is not suitable for this arch")
> from inconclusive errors (like out-of-memory, too many mmaps, etc.)

I did some more digging and found where and how it is failing in my
32-bit / 64-bit scenario. I am getting this output when I sprinkle
some debug messages throughout the code.

map_library, 647: fd=3, phsize=324108288, l=960
map_library, 652: fd=3, phsize=324108288, l=0
map_library, 816: errno=8

Of course the line numbers are off between the output and the actual sources.

 640         ssize_t l = read(fd, buf, sizeof buf);
 641         eh = buf;
 642         if (l<0) return 0;
 643         if (l<sizeof *eh || (eh->e_type != ET_DYN && eh->e_type
!= ET_EXEC))
 644                 goto noexec;
 645         phsize = eh->e_phentsize * eh->e_phnum;
> "Line 647" output
 646         if (phsize > sizeof buf - sizeof *eh) {
 647                 allocated_buf = malloc(phsize);
 648                 if (!allocated_buf) return 0;
 649                 l = pread(fd, allocated_buf, phsize, eh->e_phoff);
> "Line 652" output
 650                 if (l < 0) goto error;
 651                 if (l != phsize) goto noexec;
 652                 ph = ph0 = allocated_buf;
 653         } else if (eh->e_phoff + phsize > l) {
 ...
 803 error:
> "Line 816" output
 804         if (map!=MAP_FAILED) unmap_library(dso);
 805         free(allocated_buf);
 806         return 0;
 807 }

So it is finding out only rather late in the process that mapping the
library didn't work. By this point, it isn't easy to go back to
searching. Conversely, it knows right away when open() fails, so it
can easily keep looking.

> This probably means some minor refactoring is called for, replacing
> the open calls in path_open and the direct use of open for absolute
> paths by a new function like lib_open or something that would also
> place the start of the file into a caller-provided buffer and validate
> that it's usable before returning success, moving that logic out of
> map_library and letting map_library assume it already has the initial
> buffer contents available on entry.

I'll mull this one over a bit. I guess we'd want a way to treat "exec
format error" in the same way as open() failing, i.e. as "we didn't
find anything useful, keep on searching". So, we need to know sooner
whether or not mapping will work or not. If I am understanding your
proposal correctly, that is exactly what you are getting at, as well.

> If you think there are less invasive ways to do this, I'd be happy to
> hear other ideas too.

We'll see. :-)

Regards,
-Markus
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.