Message-ID: <20211201152311.GF7074@brightrain.aerifal.cx>
Date: Wed, 1 Dec 2021 10:23:11 -0500
From: Rich Felker <dalias@...c.org>
To: Mark Hills <mark@...x.org>
Cc: musl@...ts.openwall.com
Subject: Re: DNS resolver fails prematurely when server reports
 failure?

On Wed, Dec 01, 2021 at 12:49:07PM +0000, Mark Hills wrote:
> With multiple DNS servers in /etc/resolv.conf, the docs [1] are clear:
> 
>   "musl's resolver queries them all in parallel and accepts whichever 
>    response arrives first."
> 
> So dual configuration is expected to give greater resiliency:
> 
>   nameserver 213.186.33.99  # OVH
>   nameserver 1.1.1.1        # Cloudflare
> 
> However, 1.1.1.1 appears quite prone to some kind of internal SERVFAIL 
> (possibly internal load shedding, though we are not making excessive DNS 
> queries).
> 
> With glibc's cascading behaviour (or perhaps another OS) this may be dealt 
> with by the client.
> 
> But if the wiki is read literally, then when the first response received 
> is "this server has failed", a good response from another server is ignored?

No. ServFail is an inconclusive response, treated basically the same
as if no packet had arrived at all. (Slight difference: it triggers
immediate retry up to a limited number of times.)
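
As a rough illustration of that rule (a sketch only, not musl's actual
code), the decision can be modelled on the 4-bit RCODE in the DNS
header. The helper names, the retry budget of 2, and the hand-built
packets are all assumptions made for the example:

```python
import struct

# RCODE values from RFC 1035, section 4.1.1
NOERROR, SERVFAIL, NXDOMAIN = 0, 2, 3

def rcode(packet: bytes) -> int:
    """Return the RCODE: the low 4 bits of the fourth header byte."""
    if len(packet) < 12:
        raise ValueError("a DNS header is 12 bytes")
    return packet[3] & 0x0F

def first_conclusive(responses, servfail_retries=2):
    """Walk (server, packet) pairs in arrival order.

    Returns (server, rcode, retried): the first conclusive reply and
    the list of servers that would have been re-queried on ServFail.
    The retry budget of 2 is an assumption for this sketch.
    """
    retried = []
    for server, packet in responses:
        code = rcode(packet)
        if code in (NOERROR, NXDOMAIN):
            return server, code, retried      # conclusive: accept it
        if code == SERVFAIL and servfail_retries > 0:
            servfail_retries -= 1
            retried.append(server)            # a real resolver would resend here
        # anything else is dropped, as if no packet had arrived
    return None, None, retried

def make_header(rcode_value: int, ancount: int = 0) -> bytes:
    """Hypothetical helper: build a bare 12-byte response header."""
    flags = 0x8180 | rcode_value              # QR=1, RD=1, RA=1, plus RCODE
    return struct.pack("!HHHHHH", 0x1234, flags, 1, ancount, 0, 0)

# ServFail from 1.1.1.1 arrives first, then a good answer from the other server.
arrivals = [
    ("1.1.1.1", make_header(SERVFAIL)),
    ("213.186.33.99", make_header(NOERROR, ancount=1)),
]
server, code, retried = first_conclusive(arrivals)
print(server, code, retried)
```

In this model the ServFail from the first server only costs a retry; the
later good answer is still the one that gets accepted.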

> And indeed this seems to be the behaviour we experience, as removing 
> 1.1.1.1 restored reliability.

Have you looked at a packet capture of what's happening? Likely
1.1.1.1 was returning a false conclusive result (NxDomain or NODATA)
rather than ServFail.
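
When reading a capture, the three cases can be told apart from the
response header alone. The following is a minimal sketch, assuming the
raw DNS payload has already been extracted from the capture; the
function names are made up for the example, and the NODATA test
(NOERROR with an empty answer section) is the simple form that ignores
CNAME chains:

```python
import struct

def classify(packet: bytes) -> str:
    """Classify a DNS response by RCODE and answer count (RFC 1035 header)."""
    if len(packet) < 12:
        raise ValueError("a DNS header is 12 bytes")
    _id, flags, _qdcount, ancount, _nscount, _arcount = struct.unpack(
        "!HHHHHH", packet[:12]
    )
    code = flags & 0x000F
    if code == 2:
        return "SERVFAIL"    # inconclusive
    if code == 3:
        return "NXDOMAIN"    # conclusive: the name does not exist
    if code == 0:
        # NOERROR with no answers is NODATA: the name exists, but has
        # no records of the requested type. Also conclusive.
        return "NODATA" if ancount == 0 else "ANSWER"
    return "OTHER"

def make_response(rcode_value: int, ancount: int = 0) -> bytes:
    """Hypothetical helper: a bare response header for trying classify()."""
    flags = 0x8180 | rcode_value              # QR=1, RD=1, RA=1, plus RCODE
    return struct.pack("!HHHHHH", 0x1234, flags, 1, ancount, 0, 0)

for sample in (make_response(2), make_response(3), make_response(0, 0)):
    print(classify(sample))
```

A reply that classifies as NXDOMAIN or NODATA would be accepted as the
final result, even if another server would have answered correctly a
moment later.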

Rich
