musl - Re: [PATCH 1/1] Ignore incorrect elf architecture libraries

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGt4E5tVSDAUvwribZftBsePeCkAxV_apJ+n-WR-HgQuvcfVTQ@mail.gmail.com>
Date: Thu, 26 Sep 2024 12:35:52 -0700
From: Markus Mayer <mmayer@...adcom.com>
To: Markus Mayer <mmayer@...adcom.com>, Musl Mailing List <musl@...ts.openwall.com>, 
	Colin Cross <ccross@...roid.com>, Colin Cross <ccross@...gle.com>
Subject: Re: [PATCH 1/1] Ignore incorrect elf architecture libraries

On Tue, 23 Jul 2024 at 14:08, Szabolcs Nagy <nsz@...t70.net> wrote:

Thanks for reviewing the proposed changes.

It's been a little while since the submission. I found some time to
work on this again, and I was able to incorporate most of the proposed
changes. Everything is still working as intended.

Also, Colin reached out to me following my re-submission of his
proposal. He told me he wasn't going to have time to work on it for
the foreseeable future and that I was welcome to continue the work and
to take over authorship in that case.

Here's a link to the current state of affairs.[1]

I am not posting the new patch yet and just posting a link, because
work isn't quite finished. Also, this makes it easier to continue the
already existing thread. I am providing my thoughts and responses
below.

> > Modify the loader to continue searching when a file is found that is
> > a valid elf file but for the an elf machine or class that differs from
> > that of the loader itself.
>
> fwiw i think the proposed feature is reasonable, i added review
> comments that may help inclusion into musl.
>
> i'd note that skipping wrong e_machine and e_ident[EI_CLASS] is
> required by the elf spec:
>
[...]
>
> if we do this change i think e_ident[EI_DATA] should be checked
> too for le/be and e_flags for abi variants like arm hf.

I agree with regards to EI_DATA. It makes a lot of sense checking that
as well while we are at it, and it is now already implemented in the
current revision of this proposed change.

Checking the e_flags also sounds very sensible, especially in the
context of ARM. However, I am not sure if it is feasible. The
specification says that e_flags are architecture-specific, so
interpreting and checking the flags would become a per-architecture
endeavour. That would mean the loader has to know a lot about the
different architectures and then decide at runtime which check to run.
Ideally, we would probably need conditional compilation to only build
in the checks for the architecture in question. Otherwise the code
gets bloated with routines that will never be called. It sounds rather
involved and a little messy. Also, there doesn't seem to be a
precedent for architecture-specific code in the loader.

To get an idea, see [2] for what readelf is getting up to in order to
parse the ELF flags for ARM. (Spoiler alert, it's over 200 lines of
code.)

That being said, if there is a non-messy way to implement checks for
ARM soft- and hard-float, I am all for it.

> e_type, e_version, EI_OSABI and EI_ABIVERSION are unlikely to be
> useful (the latter may be used in the future to bump musl abi)
> i don't have a strong opinion about these.

Too many checks might also lead to false positives, i.e. binaries
being rejected that would have run just fine. All it would take is for
some obscure ELF header fields to be set improperly at build time and
nobody notices, because the fields are ignored by almost everyone.
Alternatively, our checks could end up being too strict even when the
fields are all set correctly.

> https://www.sco.com/developers/gabi/latest/ch5.dynamic.html#shobj_dependencies
>
> > ---
> >  ldso/dynlink.c | 77 +++++++++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 61 insertions(+), 16 deletions(-)
> >
> > diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> > index 324aa85919f0..2f1ac7e9d089 100644
> > --- a/ldso/dynlink.c
> > +++ b/ldso/dynlink.c
> > @@ -68,6 +68,8 @@ struct dso {
> >       size_t *dynv;
> >       struct dso *next, *prev;
> >
> > +     int elfmachine;
> > +     int elfclass;
>
> i don't think we need this in every dso. i'd make them global
>
> e.g. ldso_ehdr (or just the ldso_e_machine and ldso_e_class)

Yes, I did that. At first I created individual, global variables, but
ultimately I decided to encapsulate them in a struct to make the code
a little cleaner and introduce only one global variable rather than
four. The struct now looks a bit like it is duplicating Ehdr, but
there are far fewer fields inside.

> >       Phdr *phdr;
> >       int phnum;
> >       size_t phentsize;
> > @@ -684,6 +686,19 @@ static void unmap_library(struct dso *dso)
> >       }
> >  }
> >
> > +static int verify_elf_magic(const Ehdr* eh) {
> > +     return eh->e_ident[0] == ELFMAG0 &&
> > +             eh->e_ident[1] == ELFMAG1 &&
> > +             eh->e_ident[2] == ELFMAG2 &&
> > +             eh->e_ident[3] == ELFMAG3;
> > +}
> > +
> > +/* Verifies that an elf header's machine and class match the loader */
> > +static int verify_elf_arch(const Ehdr* eh) {
> > +     return eh->e_machine == ldso.elfmachine &&
> > +             eh->e_ident[EI_CLASS] == ldso.elfclass;
> > +}
>
> i'd do more checks and drop the comment.

Done.

> > +
> >  static void *map_library(int fd, struct dso *dso)
> >  {
> >       Ehdr buf[(896+sizeof(Ehdr))/sizeof(Ehdr)];
> > @@ -706,6 +721,10 @@ static void *map_library(int fd, struct dso *dso)
> >       if (l<0) return 0;
> >       if (l<sizeof *eh || (eh->e_type != ET_DYN && eh->e_type != ET_EXEC))
> >               goto noexec;
> > +     if (!verify_elf_magic(eh)) goto noexec;
> > +     if (!verify_elf_arch(eh)) goto noexec;
>
> ok.
>
> > +     dso->elfmachine = eh->e_machine;
> > +     dso->elfclass = eh->e_ident[EI_CLASS];
>
> i don't think this is needed.

Indeed. Removed.

> >       phsize = eh->e_phentsize * eh->e_phnum;
> >       if (phsize > sizeof buf - sizeof *eh) {
> >               allocated_buf = malloc(phsize);
> > @@ -870,29 +889,53 @@ error:
> >       return 0;
> >  }
> >
> > -static int path_open(const char *name, const char *s, char *buf, size_t buf_size)
> > +static int path_open_library(const char *name, const char *s, char *buf, size_t buf_size)
>
> no need to rename.

Done.

> >  {
> >       size_t l;
> >       int fd;
> > +     const char *p;
> >       for (;;) {
> >               s += strspn(s, ":\n");
> > +             p = s;
> >               l = strcspn(s, ":\n");
>
> i would keep the orig logic and add new stuff to
>
>         if ((fd = ...) >= 0) {
>                 // here
>         } else switch (errno) {
>                 // this is orig logic
>         }
>
> then the diff is easier to review.

Done. However, there's an indentation change nonetheless, so the diff
still looks quite large. The unchanged "else" branch shows up in the
diff because of whitespace. Using "git diff -w" for this section helps
a bit.

> >               if (l-1 >= INT_MAX) return -1;
> > -             if (snprintf(buf, buf_size, "%.*s/%s", (int)l, s, name) < buf_size) {
> > -                     if ((fd = open(buf, O_RDONLY|O_CLOEXEC))>=0) return fd;
> > -                     switch (errno) {
> > -                     case ENOENT:
> > -                     case ENOTDIR:
> > -                     case EACCES:
> > -                     case ENAMETOOLONG:
> > -                             break;
> > -                     default:
> > -                             /* Any negative value but -1 will inhibit
> > -                              * futher path search. */
> > +             s += l;
> > +             if (snprintf(buf, buf_size, "%.*s/%s", (int)l, p, name) < buf_size) {
> > +                     fd = open(buf, O_RDONLY|O_CLOEXEC);
> > +                     if (fd < 0) {
> > +                             switch (errno) {
> > +                             case ENOENT:
> > +                             case ENOTDIR:
> > +                             case EACCES:
> > +                             case ENAMETOOLONG:
> > +                                     /* Keep searching in path list. */
> > +                                     continue;
> > +                             default:
> > +                                     /* Any negative value but -1 will
> > +                                      * inhibit further path search in
> > +                                      * load_library. */
> > +                                     return -2;
> > +                             }
> > +                     }
> > +                     Ehdr eh;
> > +                     ssize_t n = pread(fd, &eh, sizeof(eh), 0);
>
> i'd use read instead of pread

I tried. It doesn't work. The reason is that read() advances the file
pointer whereas pread() does not. In other words, pread() doesn't
affect subsequent read() calls, whereas using read() here throws off a
subsequent read that was expecting to read the beginning of the file.

I didn't spend the time to narrow down where exactly it is tripping up
when using read() here. However, I tried read() followed by lseek() to
reset the file pointer to the beginning to verify my theory. Doing
this works, and therefore proves that the advanced file pointer is
indeed the problem.

To summarize:

- using pread() works
- using read() fails
- using read() followed by lseek(fd, 0, SEEK_SET) works, but adds an
extra syscall

As a result, the code is continuing to use pread().

> this adds an extra syscall per dso that uses path lookup.
> this is the main cost of the patch, so i'd note this in
> the commit msg.

Done. For now, at least. I still need to address Rich's feedback about
the caller-provided buffer.

> > +                     /* If the elf file is invalid return -2 to inhibit
> > +                      * further path search in load_library. */
> > +                     if (n < 0 ||
> > +                         n != sizeof eh ||
> > +                         !verify_elf_magic(&eh)) {
> > +                             close(fd);
> >                               return -2;
> >                       }
>
> ok.
>
> > +                     /* If the elf file has a valid header but is for the
> > +                      * wrong architecture ignore it and keep searching the
> > +                      * path list. */
> > +                     if (!verify_elf_arch(&eh)) {
> > +                             close(fd);
> > +                             continue;
> > +                     }
> > +                     return fd;
>
>
> that's a weird way to end a loop. i'd do
>
>         if (verify_elf_arch(&eh))
>                 return fd;
>         close(fd);

Done.

Regards,
-Markus

[1] https://github.com/mmayer/musl-libc/compare/master...ld_so-squashed
[2] https://sourceware.org/git?p=binutils-gdb.git;a=blob;f=binutils/readelf.c;h=0f8dc1b9716ed5c0ba13ececfc012ed59f8ba270;hb=HEAD#l3511
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.