Message-ID: <20200417160726.GG11469@brightrain.aerifal.cx>
Date: Fri, 17 Apr 2020 12:07:26 -0400
From: Rich Felker <dalias@...c.org>
To: Florian Weimer <fw@...eb.enyo.de>
Cc: musl@...ts.openwall.com
Subject: Re: Outgoing DANE not working

On Fri, Apr 17, 2020 at 11:22:34AM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Wed, Apr 15, 2020 at 08:27:08PM +0200, Florian Weimer wrote:
> >> >> I don't understand your PTR example.  It seems such a fringe case that
> >> >> people produce larger PTR responses because they add all virtual hosts
> >> >> to the reverse DNS zone.  Sure, it happens, but not often.
> >> >
> >> > I think it's probably more a matter of the concurrent lookups from
> >> > multiple nameservers (e.g. local, ISP, and G/CF, where local has
> >> > fastest round-trip but not much in cache, G/CF has nearly everything
> >> > in cache but slowest round trip, and ISP is middle on both) than lack
> >> > of tcp fallback that makes netstat etc. so much faster.
> >> 
> >> The question is: Why would you get a TC bit response?  Is the musl
> >> resolver code triggering some anti-spoofing measure that tries to
> >> validate source addresses over TCP?  (I forgot about this aspect of
> >> DNS.  Ugh.)
> >
> > TC bit is for truncation, and means that the complete response would
> > have been larger than 512 bytes and was truncated to whatever number
> > of whole RRs fit in 512 bytes.
> 
> You mentioned that TC processing added observable latency to the
> netstat tool.  netstat performs PTR queries.  Non-DNSSEC responses to
> PTR queries are rarely larger than 512 bytes.  (The only exceptions I
> have seen occur when people list all their HTTP virtual hosts in PTR
> records, but again, that's very rare.)  Typically, they are less than
> 150 bytes.  Non-minimal responses can be larger, but the additional
> data is removed without setting the TC bit.
> 
> This is why something very odd must have happened during your test.
> One explanation would be a middlebox that injects TC responses to
> validate source addresses.

I think this was just a misunderstanding. What I said was that things
like netstat run a lot faster in practice with musl than with other
resolvers in a range of typical setups, that the two potential factors
are concurrent requests to multiple nameservers and the lack of TCP
fallback, and that I didn't have evidence for which one it was. It
sounds like you have a good argument that TCP fallback is not the
relevant factor here.
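
For context, the concurrent-query behavior I mean is roughly the
following (an untested sketch, not musl's actual code; the nameserver
list and the pre-built query packet are assumed to come from the
resolv.conf parsing elsewhere):

#include <poll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Send the same query to every configured nameserver at once over UDP
 * and take whichever reply arrives first.  A real implementation would
 * also match the query ID and check the reply's source address. */
static ssize_t query_all(const struct sockaddr_in *ns, int nns,
                         const unsigned char *q, size_t qlen,
                         unsigned char *ans, size_t anslen, int timeout_ms)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    for (int i = 0; i < nns; i++)
        sendto(fd, q, qlen, 0,
               (const struct sockaddr *)&ns[i], sizeof ns[i]);

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    ssize_t n = -1;
    if (poll(&pfd, 1, timeout_ms) > 0)
        n = recvfrom(fd, ans, anslen, 0, 0, 0);

    close(fd);
    return n;
}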

> >> > However it's not clear how "fallback to tcp" logic should interact
> >> > with such concurrent requests -- switch to tcp for everything and
> >> > just one nameserver as soon as we get any TC response?
> >> 
> >> It's TCP for this query only, not all subsequent queries.  It makes
> >> sense to query the name server that provided the TC response: It
> >> reduces latency because that server is more likely to have the large
> >> response in its cache.
> >
> > I'm not talking about future queries but other unfinished queries that
> > are part of the same operation (presently just concurrent A and AAAA
> > lookups).
> 
> If the second response has TC set (but not the first), you can keep
> the first response.  Re-querying both over TCP increases the
> likelihood that you get a response from the same cluster node (so more
> consistency), but you won't get that over UDP, ever, so I don't think
> it matters.
> 
> If the first response has TC set, you have an open TCP connection you
> could use for the second query as well.  Pipelining of DNS requests
> has compatibility issues because there is no application-layer
> connection teardown (an equivalent to HTTP's Connection: close).  If
> the server closes the connection after sending the response to the
> first query, without reading the second, this is a TCP data loss
> event, which results in an RST segment and, potentially, loss of the
> response to the first query.  Ideally, a client would wait for the
> second UDP response and the TCP response to arrive.  If the second UDP
> response is TC as well, the TCP query should be delayed until the
> first TCP response has come back.
> 
> (We should move this discussion someplace else.)

Yes. I just took postfix-users off the CC and added musl. Discussing
it further on postfix-users does not seem constructive, as the
arguments there are mostly ideological (about what the roles of
different components "should be") rather than practical (can we
reasonably improve behavior here?).

Indeed, it sounds like one TCP connection would be needed per request,
so any switchover would just be per-request.
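
Per-request, it would look something like the sketch below (again an
untested illustration, not actual musl code, assuming the UDP reply is
already in hand): if the UDP reply has the TC bit set, repeat just that
one query over a fresh TCP connection to the same server, with the
2-byte length prefix from RFC 1035.

#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* The TC (truncation) bit is 0x0200 in the DNS header flags word,
 * which occupies bytes 2-3 of the message. */
static int tc_set(const unsigned char *reply, size_t len)
{
    if (len < 4) return 0;
    return ((reply[2]<<8) | reply[3]) & 0x0200;
}

static ssize_t retry_over_tcp(const struct sockaddr_in *ns,
                              const unsigned char *q, size_t qlen,
                              unsigned char *ans, size_t anslen)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    if (connect(fd, (const struct sockaddr *)ns, sizeof *ns) < 0) {
        close(fd);
        return -1;
    }

    /* DNS over TCP prefixes each message with a 16-bit length. */
    unsigned char pfx[2] = { (qlen>>8) & 255, qlen & 255 };
    write(fd, pfx, 2);
    write(fd, q, qlen);

    unsigned char rl[2];
    if (read(fd, rl, 2) != 2) { close(fd); return -1; }
    size_t rlen = (rl[0]<<8) | rl[1];
    if (rlen > anslen) rlen = anslen;   /* clamp to caller's buffer */

    ssize_t got = 0;
    while ((size_t)got < rlen) {
        ssize_t r = read(fd, ans + got, rlen - got);
        if (r <= 0) break;
        got += r;
    }
    close(fd);
    return got;
}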

My leaning is probably not to do fallback at all (complex logic,
potential for unexpected slowness, not needed by the vast majority of
users) and just to add TCP support via the use-vc option for users who
really want complete replies. All of this would be contingent anyway on
making the internal mechanisms able to handle variable-size results
rather than a fixed 512 bytes, so it's not happening right away. Doing
it carelessly would create possibly dangerous bugs.
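
For reference, the glibc spelling of that option in resolv.conf is just
the following (what a hypothetical musl version would accept; the
nameserver address is a placeholder):

nameserver 192.0.2.1
options use-vc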

I'm still also somewhat of the opinion that users who want a resolver
library (res_* API) with lots of features should just link against
BIND's, but it would be nice not to have to do that.

Rich
