Message-ID: <20220720015457.GC7074@brightrain.aerifal.cx>
Date: Tue, 19 Jul 2022 21:54:59 -0400
From: Rich Felker <dalias@...c.org>
To: "Nieminen, Jussi" <Jussi.Nieminen@...atrace.com>
Cc: "musl@...ts.openwall.com" <musl@...ts.openwall.com>
Subject: Re: Bug in getaddrinfo causing spurious returns with wrong
 error values

On Tue, Nov 23, 2021 at 02:47:49PM +0000, Nieminen, Jussi wrote:
> Hi,
> 
> I'm a developer from the performance monitoring company Dynatrace, and I've been
> recently investigating curious problems at our customers' environments where a
> call to musl's getaddrinfo appears to spuriously return ENOENT when called from
> a node.js application that is being monitored with the Dynatrace agent.
> 
> I managed to pinpoint the problem to the code that performs the AI_ADDRCONFIG
> check. If an address family that is not enabled on the host is specified, a call
> to "connect" in that code fails, the socket fd is closed, and the value of
> "errno" is then evaluated.
> 
> The problem is that the call to "close" can change the value of errno, which
> will break the switch-case that follows it. Especially if aio is used (which is
> the case when the Dynatrace agent is included in the application), the call to
> close will end up setting errno to ENOENT by default (even without a failure)
> within the "aio_cancel" function if an aio operation is active. In such a case
> getaddrinfo will then incorrectly return EAI_SYSTEM with errno set to ENOENT.
> 
> (After some error code translations within libuv, node.js will then print an
> error message claiming that getaddrinfo failed with ENOENT which is rather
> confusing.)
> 
> Even if aio is not used, the code might fail whenever "close" gets interrupted
> and returns with errno set to EINTR. As the return value of close is not
> checked, the errno might thus "silently" change before getting evaluated with
> the assumption that it still contains the value set when "connect" failed.
> 
> Below is a simple patch that should take care of this problem. Let me know if I
> can provide any more information or if there is anything else I can help with.
> 
> Thanks,
> Jussi
> 
> 
> -------------------------------------------------------------------------------
> diff --git a/src/network/getaddrinfo.c b/src/network/getaddrinfo.c
> index efaab306..71809856 100644
> --- a/src/network/getaddrinfo.c
> +++ b/src/network/getaddrinfo.c
> @@ -16,6 +16,7 @@ int getaddrinfo(const char *restrict host, const char *restrict serv, const stru
>         char canon[256], *outcanon;
>         int nservs, naddrs, nais, canon_len, i, j, k;
>         int family = AF_UNSPEC, flags = 0, proto = 0, socktype = 0;
> +       int saved_errno = 0;
>         struct aibuf *out;
> 
>         if (!host && !serv) return EAI_NONAME;
> @@ -66,11 +67,14 @@ int getaddrinfo(const char *restrict host, const char *restrict serv, const stru
>                                 pthread_setcancelstate(
>                                         PTHREAD_CANCEL_DISABLE, &cs);
>                                 int r = connect(s, ta[i], tl[i]);
> +                               /* The call to "close" might change errno, especially if aio is in use;
> +                                * save the value set by "connect" for the later comparison. */
> +                               if (r < 0) saved_errno = errno;
>                                 pthread_setcancelstate(cs, 0);
>                                 close(s);
>                                 if (!r) continue;
>                         }
> -                       switch (errno) {
> +                       switch (saved_errno) {
>                         case EADDRNOTAVAIL:
>                         case EAFNOSUPPORT:
>                         case EHOSTUNREACH:
> -------------------------------------------------------------------------------

A couple of minor problems with the patch:

- saved_errno is only assigned after a failed connect(), so if
  socket() itself fails, the switch never sees the errno from
  socket(). I'm not sure yet if that matters, but I think it may if
  IPv6 was disabled in a way that makes socket() fail.

- When EAI_SYSTEM is returned, the saved error is not restored into
  errno, so the caller cannot get the cause of the error if it was
  clobbered by close.

I'll work on a fixed version. I think the right thing to do is just
save/restore errno itself rather than switching on saved_errno.
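
For illustration only, here is a minimal sketch of the save/restore
idea. It is not the eventual musl change; close_keep_errno and
probe_connect are hypothetical names, and the probe is a simplified
stand-in for the AI_ADDRCONFIG check:

```c
#include <assert.h>   /* only needed for the usage checks described below */
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: close fd without disturbing the caller's errno. */
int close_keep_errno(int fd)
{
	int saved = errno;   /* value set by the earlier failing call */
	int r = close(fd);   /* may overwrite errno (EINTR, EBADF, ...) */
	errno = saved;       /* put the original cause back */
	return r;
}

/* Hypothetical probe: returns 0 if connect() succeeds, -1 otherwise.
 * On failure errno holds the value from socket() or connect(),
 * whichever failed, so the caller can switch on errno directly and
 * errno is still meaningful if EAI_SYSTEM is returned. */
int probe_connect(int family, const struct sockaddr *sa, socklen_t len)
{
	int s = socket(family, SOCK_DGRAM, IPPROTO_UDP);
	if (s < 0) return -1;          /* errno from socket() left intact */
	int r = connect(s, sa, len);
	close_keep_errno(s);           /* errno from connect() survives */
	return r;
}
```

Because errno itself is preserved, both failure paths (socket() and
connect()) feed the same switch, and no separate saved_errno variable
has to be carried to the point where the error is reported.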

Rich
