Message-ID: <20220917055807.GA2645@voyager>
Date: Sat, 17 Sep 2022 07:58:07 +0200
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Subject: Re: getrandom fallback - wrapper functions dilemma

On Fri, Sep 16, 2022 at 12:05:02PM -0600, Lance Fredrickson wrote:
> I'm using musl on an arm embedded router (netgear R7000) running an old
> kernel, 2.6.36.4. I compiled an application using the meson build system
> which does a check for the getentropy function which it does of course find
> in musl and ultimately the program aborts. I see getentropy uses getrandom
> which is a wrapper around the syscall which came around kernel version
> 3.17. In the mailing list I saw in one discussion way back about adding a
> fallback to getrandom, maybe after integrating arc4random which doesn't seem
> to have ever happened.
>
> I appreciate that musl strives for correctness, so what is the correct
> solution for this issue?
> I think meson checks for the function availability, but I'm not sure that it
> checks for valid output. Is this a meson issue?
>

I think the application that aborts if getentropy() fails is in the
wrong here. Well, possibly. It is possible that the application treats
kernel 3.17 as its minimum requirement. In that case, they are doing
fine (although then I would question why they detect the function
during configuration). If not, then aborting if a system call fails
seems like the wrong thing to do. The application should instead attempt
to fall back to a different method, e.g. opening /dev/urandom, trying
getauxval(AT_RANDOM) and seeding an RNG with it, or anything of the
sort.
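
As a rough illustration of the fallback described above, here is a
minimal sketch, assuming a Linux system where /dev/urandom is
readable. The names urandom_fill() and fill_random() are made up for
this example and come from neither musl nor the application in
question.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: fill buf from /dev/urandom.
 * Returns 0 on success, -1 on error. */
static int urandom_fill(void *buf, size_t len)
{
    unsigned char *p = buf;
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        if (n <= 0) {           /* real error or unexpected EOF */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    close(fd);
    return 0;
}

/* Hypothetical wrapper: try getentropy() first and fall back to
 * /dev/urandom instead of aborting when it fails. */
int fill_random(void *buf, size_t len)
{
    if (getentropy(buf, len) == 0)
        return 0;
    /* On kernels without the getrandom syscall, errno is ENOSYS here. */
    return urandom_fill(buf, len);
}
```

A caller would use fill_random(key, sizeof key) and treat only a -1
from that as a real failure.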

The fundamental disconnect here is that just because a function is
available doesn't mean it will succeed at run-time.

And no, the build system isn't doing anything wrong. The most it can do
is compile a test binary and if that worked, it has to be good enough.
In case of cross-compilation, it cannot run the binary. And anyway, the
build system is not necessarily the run-time system.

> Should a libc be compiling in syscalls and functions the running kernel
> can't support?

Yes. libc and kernel are always linked together dynamically through the
syscall interface. In general, libc cannot know what syscalls the kernel
will support. So musl uses the newest syscall interface and falls back
to older ones as necessary. In case of getentropy(), however, no
fallback was ever implemented.

> Help my lack of understanding but I think at least syscalls will return not
> supported right? So maybe the bigger issue are these syscall wrappers?

Yes, unsupported system calls will return failure with ENOSYS. And I
just checked getentropy(), and it too will report getrandom() failure.
So the application should see failure with errno set to ENOSYS and act
accordingly. And that doesn't mean abort.
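
For illustration, an application could probe for this at run time
roughly as follows. This is only a sketch: getrandom_supported() is a
hypothetical name, and it assumes a libc that ships <sys/random.h>.

```c
#include <errno.h>
#include <sys/random.h>

/* Hypothetical probe: returns 1 if the running kernel implements the
 * getrandom syscall, 0 if the call fails with ENOSYS. */
int getrandom_supported(void)
{
    unsigned char byte;

    /* GRND_NONBLOCK keeps the probe from blocking if the entropy
     * pool is not yet initialized. */
    if (getrandom(&byte, 1, GRND_NONBLOCK) >= 0)
        return 1;

    /* EAGAIN and other errors still mean the syscall exists;
     * only ENOSYS means the kernel does not have it. */
    return errno != ENOSYS;
}
```

The result can be checked once at startup to pick a code path, rather
than aborting on the first failed call.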

> I know that if down the road I try to run musl on another router, mipsel &
> kernel 2.6.22.19, I'm going to run into prlimit issues because prlimit came
> after this kernel version, but the prlimit function will be unconditionally
> compiled in. And it seems the autoconfs and cmakes and mesons are only
> really checking for the function availability and not so much if the syscall
> they're wrapping is actually going to work.
> getentropy is even more removed because it's a function that relies on a
> syscall wrapped in another function.
>

It is possible that lots of open source code out there is badly made. In
this case, they assume that libc defining a certain function means the
run-time kernel will also support the underlying system call, and
absolutely nothing can fail. But it is also illogical to have a
configure option for a function and then abort if it fails. I thought
the function was optional?

Your options are:
1) Bump up the kernel version. Apparently not an option for you.
2) Patch the application to deal with failures appropriately.
3) Patch musl to fall back on failure.

Whether to pursue 2 or 3 depends highly on the applications involved and
whether a change to the libc or the application is more appropriate. For
instance, instead of getentropy(), you can open /dev/urandom. Now, the
application might be the better place to contain that change, since it
can more easily manage the file descriptor life cycle. Changing it in
libc would mean you open the file on each call to getrandom() and close
it again at the end. Or else you use a static variable for the FD, and
then libc holds a descriptor the application does not know about, which
messes with the application in other ways.

> Or do the software authors and build systems need better syscall/function
> availability checks?
>

They need better run-time logic to deal with failures. Function
availability does not mean the function will succeed.

Ciao,
Markus
