Date: Fri, 30 Mar 2018 14:19:48 -0500
From: William Pitcock <nenolod@...eferenced.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] resolver: only exit the search path loop there are
 a positive number of results given

Hello,

On Fri, Mar 30, 2018 at 2:14 PM, Rich Felker <dalias@...c.org> wrote:
> On Fri, Mar 30, 2018 at 06:52:25PM +0000, William Pitcock wrote:
>> In the event of no results being given by any of the lookup modules, EAI_NONAME will still
>> be thrown.
>>
>> This is intended to mitigate problems that occur when zones are hosted by weird DNS servers,
>> such as the one Cloudflare have implemented, and appear in the search path.
>> ---
>>  src/network/lookup_name.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
>> index 209c20f0..b068bb92 100644
>> --- a/src/network/lookup_name.c
>> +++ b/src/network/lookup_name.c
>> @@ -202,7 +202,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
>>                       memcpy(canon+l+1, p, z-p);
>>                       canon[z-p+1+l] = 0;
>>                       int cnt = name_from_dns(buf, canon, canon, family, &conf);
>> -                     if (cnt) return cnt;
>> +                     if (cnt > 0) return cnt;
>>               }
>>       }
>
> This patch is incorrect, and the reason should be an FAQ item if it's
> not already. Only a return value of 0 means that the requested name
> does not exist and that it's permissible to continue search. Other
> nonpositive return values indicate either that the name does exist but
> does not have a record of the requested type, or that a transient error
> occurred, making it impossible to determine whether the search can be
> continued and thus requiring the error to be reported to the caller.
> Anything else results in one or both of the following bugs:
>
> - Nondeterministically returning different results for the same query
>   depending on transient unavailability of the nameservers to answer
>   on time.
>
> - Returning inconsistent results (for different search components)
>   depending on whether AF_INET, AF_INET6, or AF_UNSPEC was requested.
>
> I'm aware that at least rancher-dns and Cloudflare's nameservers have
> had bugs related to this issue. I'm not sure what the status on
> getting them fixed is, and for Cloudflare I don't know exactly what it
> is they're doing wrong or why. But I do know the problem is that
> they're returning semantically incorrect dns replies.

Kubernetes imposes a default search path with the cluster domain last, so:

  - local.prod.svc.whatever
  - prod.svc.whatever
  - svc.whatever
  - yourdomain.com

The Cloudflare issue is that they send a SUCCESS code with 0 replies,
which causes musl to error when it hits yourdomain.com.
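
To illustrate the failure mode, here is a minimal sketch of a search
loop that only continues on a result of exactly 0. It is a simplified
model, not musl's actual code: mock_lookup(), search(), the result
codes, and the example name "example.org" are all hypothetical, and
it assumes the name is tried against each search domain before being
tried as given.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for this sketch; not musl's real values. */
enum { NAME_ABSENT = 0, ERR_NO_RECORDS = -1 };

/* Mock per-candidate lookup. Returns >0 for an address count, 0 when the
 * name does not exist, and <0 for any other failure. */
static int mock_lookup(const char *fqdn, int buggy_server)
{
	static const char zone[] = ".yourdomain.com";
	size_t n = strlen(fqdn), z = sizeof zone - 1;

	if (!strcmp(fqdn, "example.org")) return 1; /* the only real name */
	if (buggy_server && n > z && !strcmp(fqdn + n - z, zone))
		return ERR_NO_RECORDS; /* "success" with zero records */
	return NAME_ABSENT;
}

/* Search loop: keep going only while the result is exactly 0. */
static int search(const char *name, int buggy_server)
{
	static const char *const domains[] = {
		"local.prod.svc.whatever", "prod.svc.whatever",
		"svc.whatever", "yourdomain.com",
	};
	char fqdn[256];

	for (size_t i = 0; i < sizeof domains / sizeof *domains; i++) {
		snprintf(fqdn, sizeof fqdn, "%s.%s", name, domains[i]);
		int cnt = mock_lookup(fqdn, buggy_server);
		if (cnt) return cnt; /* stop on success or on any error */
	}
	return mock_lookup(name, buggy_server); /* finally, the name as given */
}
```

In this model search("example.org", 0) returns 1, because every
search-domain candidate reports 0 and the name as given is reached.
search("example.org", 1) returns ERR_NO_RECORDS, because the
yourdomain.com candidate reports an error instead of 0 and the loop
stops there.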

Do you have any suggestions on a mitigation which would be more
palatable?  We need to ship a mitigation for this in Alpine 3.8
regardless.  I would much rather carry a patch that is upstreamable,
but I am quite willing to carry one that isn't, in order to solve this
problem.

William
