musl - Re: TCP support in the stub resolver

Message-ID: <87a736y8nu.fsf@mid.deneb.enyo.de>
Date: Mon, 20 Apr 2020 08:26:45 +0200
From: Florian Weimer <fw@...eb.enyo.de>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: TCP support in the stub resolver

* Rich Felker:

> On Sun, Apr 19, 2020 at 10:12:56AM +0200, Florian Weimer wrote:
>> * Rich Felker:
>> 
>> >> No, you can reuse the connection for the second query (in most cases).
>> >> However, for maximum robustness, you should not send the second query
>> >> until the first response has arrived (no pipelining).  You may still
>> >> need a new connection for the second query if the TCP stream ends
>> >> without a response, though.
>> >
>> > That's why you need one per request -- so you can make them
>> > concurrently (can't assume pipelining).
>> 
>> Since the other query has likely already been cached in the recursive
>> resolver due to the UDP query (which is already in progress), the
>> second TCP query only saves one round-trip, I think.  Is that really
>> worth it?
>
> If the nameserver is not local, absolutely. A round trip can be over
> 500 ms.

Sure, but you have to put this into context.  In this situation, you
already need three round trips (UDP query, TCP handshake, TCP query).
A second TCP connection adds another handshake, which increases the
packet count quite noticeably.
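
For reference, the TCP query itself is the same message as the UDP
one, prefixed with a two-byte length in network byte order (RFC 1035,
section 4.2.2).  A minimal sketch of that framing follows; the name
dns_tcp_frame is made up for this example and is not musl's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of musl or any library.
 * Writes the 2-byte big-endian length prefix followed by the query
 * into out.  Returns the number of bytes written, or 0 if the query
 * is longer than 65535 bytes or does not fit into out. */
size_t dns_tcp_frame(unsigned char *out, size_t cap,
                     const unsigned char *query, size_t qlen)
{
    if (qlen > 0xffff || cap < qlen + 2)
        return 0;
    out[0] = (unsigned char)(qlen >> 8);
    out[1] = (unsigned char)(qlen & 0xff);
    memcpy(out + 2, query, qlen);
    return qlen + 2;
}
```

Building prefix and message in one buffer lets the caller hand both to
a single write, so the framing need not add packets of its own.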

>> >> Then it might be possible that no one will notice the missing TCP
>> >> fallback.
>> >
>> > Really almost no one has noticed it so far, and the places where it
>> > was noticed were buggy (IIRC Google or Cloudflare) nameservers that
>> > were sending an empty response on truncation rather than a properly
>> > truncated response, which seems to have since been fixed. (And in this
>> > case the fallback would have been a major performance hit, so it was
>> > nice that it was caught and fixed instead).
>> 
>> SPF lookups for various domains return other TXT records, which push
>> the size of the response over the limit.  There is no way to fix this
>> on the recursive resolver side because the TXT RRset is itself larger
>> than 512 bytes.
>> 
>> TXT RRsets for DKIM can also approach the limit, but I have not seen
>> them cross it.
>> 
>> This is just one application, receiving mail with some form of
>> authentication, that requires TCP fallback.  I'm sure there are other
>> applications.
>
> Yes. I don't claim there aren't potential cases where it's wanted,
> just that it hasn't come up aside from the buggy NS with empty TC
> response.

I don't quite understand why you keep claiming that.

For this TXT response, it's not a bug:

; <<>> DiG 9.11.5-P4-5.1-Debian <<>> +ignore +noedns ebay.com txt
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43378
;; flags: qr tc rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;ebay.com.			IN	TXT

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Apr 20 07:57:10 CEST 2020
;; MSG SIZE  rcvd: 26

Lack of users reporting this could just mean that there are no users
running mail servers that use SPF authentication with musl.

> Anything related to mail is a case where you really really should be
> running a local DNSSEC-validating nameserver, which adds to the appeal
> of just doing TCP to begin with (activated by use-vc) or not at all.

Always using TCP for what is essentially a fringe case (but
unfortunately one that is needed for correctness) seems very wasteful.

With a local DNS server, EDNS with a really large buffer size seems
much more attractive.  But for maximum compatibility, you will have to
rewrite the response to strip out the OPT record.
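
For reference, the query side of that is cheap: advertising a buffer
size means appending an 11-byte OPT pseudo-record (RFC 6891) and
bumping ARCOUNT.  A minimal sketch follows; the name dns_add_opt is
made up for this example, and it assumes the query carries no OPT
record yet.  Stripping OPT from the response is the harder part and is
not shown.

```c
#include <stddef.h>

/* Hypothetical helper, not part of musl or any library.
 * Appends an empty OPT pseudo-RR advertising udp_size to the query in
 * msg (len bytes used, cap bytes available) and increments ARCOUNT.
 * Assumes the query does not already contain an OPT record.
 * Returns the new message length, or 0 on error. */
size_t dns_add_opt(unsigned char *msg, size_t len, size_t cap,
                   unsigned udp_size)
{
    unsigned arcount;
    unsigned char *p;

    if (len < 12 || cap < len || cap - len < 11 || udp_size > 0xffff)
        return 0;
    arcount = (unsigned)msg[10] << 8 | msg[11];
    if (arcount == 0xffff)
        return 0;

    p = msg + len;
    p[0] = 0;                                /* NAME: root */
    p[1] = 0;  p[2] = 41;                    /* TYPE: OPT (41) */
    p[3] = (unsigned char)(udp_size >> 8);   /* CLASS: UDP payload size */
    p[4] = (unsigned char)(udp_size & 0xff);
    p[5] = 0;                                /* TTL: extended RCODE */
    p[6] = 0;                                /* TTL: EDNS version 0 */
    p[7] = 0;  p[8] = 0;                     /* TTL: flags (DO clear) */
    p[9] = 0;  p[10] = 0;                    /* RDLENGTH: no options */

    arcount++;
    msg[10] = (unsigned char)(arcount >> 8);
    msg[11] = (unsigned char)(arcount & 0xff);
    return len + 11;
}
```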