Message-ID: <2112011203500.21490@stax.localdomain>
Date: Wed, 1 Dec 2021 12:49:07 +0000 (GMT)
From: Mark Hills <mark@...x.org>
To: musl@...ts.openwall.com
Subject: DNS resolver fails prematurely when server reports failure?

With multiple DNS servers in /etc/resolv.conf, the docs [1] are clear:

  "musl's resolver queries them all in parallel and accepts whichever 
   response arrives first."

So a dual configuration is expected to give greater resiliency:

  nameserver 213.186.33.99  # OVH
  nameserver 1.1.1.1        # Cloudflare

However, 1.1.1.1 appears quite prone to returning SERVFAIL (possibly 
internal load shedding, though we are not making excessive DNS 
queries).

With glibc's cascading behaviour (or perhaps that of another OS), this 
may be dealt with by the client.

But if the wiki is read literally, and the first response received is 
"this server has failed", is a good response from another server then 
ignored?

And indeed this seems to be the behaviour we experience, as removing 
1.1.1.1 restored reliability.
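
To make the concern concrete, here is a minimal sketch of the two 
policies in question. It is not musl's code: the Reply type, the 
function names and the arrival times are all hypothetical, and it only 
models which reply gets picked once both servers have answered.

```python
from typing import List, NamedTuple, Optional

NOERROR = 0   # DNS RCODE 0 (RFC 1035)
SERVFAIL = 2  # DNS RCODE 2 (RFC 1035)


class Reply(NamedTuple):
    """Hypothetical record of one nameserver's answer."""
    server: str
    rcode: int
    arrival_ms: int  # made-up arrival time, for illustration only


def first_reply_wins(replies: List[Reply]) -> Optional[Reply]:
    """Literal reading of the wiki: accept whichever reply arrives first."""
    return min(replies, key=lambda r: r.arrival_ms, default=None)


def first_usable_reply(replies: List[Reply]) -> Optional[Reply]:
    """Alternative: skip SERVFAIL and take the earliest remaining reply."""
    usable = [r for r in replies if r.rcode != SERVFAIL]
    return min(usable, key=lambda r: r.arrival_ms, default=None)


# Assumed scenario: 1.1.1.1 answers quickly but with SERVFAIL,
# while the other server answers later with a good response.
replies = [
    Reply("1.1.1.1", SERVFAIL, 8),
    Reply("213.186.33.99", NOERROR, 21),
]

print(first_reply_wins(replies).server)
print(first_usable_reply(replies).server)
```

Under the first policy the early SERVFAIL is what the caller sees; under 
the second, the later good answer is used instead.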

I tried to confirm this in the source [2] but found I'd need more time to 
understand this code.

Also, diagnosis was made more difficult by a colleague diligently following 
the resolv.conf(5) man page on the host (installed via man-pages on Alpine 
Linux), which documents glibc's behaviour. Perhaps musl could/should provide 
its own, but I expect there is a policy for this and similar issues.

Thanks

[1] https://wiki.musl-libc.org/functional-differences-from-glibc.html
[2] https://git.musl-libc.org/cgit/musl/tree/src/network/lookup_name.c#n296

-- 
Mark
