Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALATqH=0MkjYhTYcuMxpu+nKmQxn25N+gmQDha_E3EUdxnwWkQ@mail.gmail.com>
Date: Fri, 3 Jun 2022 00:28:40 +0200
From: Sascha Braun <sascha.braun.lpz@...glemail.com>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: Problably Issue in name_from_dns // __res_msend_rc (lookup_name.c)

Hi Rich,

I think I narrowed the thing down. Below is a dump of what happens in a
'normal' situation and what happened when the sporadic issue appeared.

Normal:
begin lookup3...
__syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
ecea010000010000000000000377777706676f6f676c6503636f6d0000010001
__syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
ecea010000010000000000000377777706676f6f676c6503636f6d0000010001
__syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
ecea010000010000000000000377777706676f6f676c6503636f6d0000010001
__syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001
__syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001
__syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001
__syscall_recv begin EP[0.0.0.0:0]
__syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53]
ecea818000010001000000000377777706676f6f676c6503636f6d0000010001c00c00010001000000390004d83ad4840000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
__syscall_recv begin EP[9.9.9.9:53]
__syscall_recv'd_internal [020000350909090900000000000000009.9.9.9:53]
ecea818000010001000000000377777706676f6f676c6503636f6d00001c0001c00c001c00010000003400102a0014504001080800000000000020040000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
connect:0200ffffd83ad4840000000000000000
connect:0a00ffff000000002a00145040010808000000000000200400000000

So we did sent the bytes
ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 and the
other sequence (hex encoded, 2 characters per byte) to each of the DNS and
we received back one 'short' and one 'long' reply from 9.9.9.9.
I guess the short one is IPv4, long one IPv6(?). That's the case with all
successful lookups, i.e. the 99% ok ones - (at least) one short - (at
least) one long.

Now the problematic one:

begin lookup1...
__syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
94d90100000100000000000003777777037765620264650000010001
__syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
94d90100000100000000000003777777037765620264650000010001
__syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
94d90100000100000000000003777777037765620264650000010001
__syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
94d901000001000000000000037777770377656202646500001c0001
__syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
94d901000001000000000000037777770377656202646500001c0001
__syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
94d901000001000000000000037777770377656202646500001c0001

__syscall_recv begin EP[0.0.0.0:0]
__syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53]
94d981800001000100010000037777770377656202646500001c0001c00c000500010000004e000f0377777708672d68612d776562c014c02c00060001000000340031036e733102706f0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093a800000003c000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
__syscall_recv begin EP[9.9.9.9:53]
__syscall_recv'd_internal [020000350909090900000000000000008.8.4.4:53]
94d981800001000100010000037777770377656202646500001c0001c00c0005000100000114000f0377777708672d68612d776562c014c02c000600010000002c0031036e733102706f0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093a800000003c000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
getaddrinfo1: Address in use

We received two 'long' responses, one from 9.9.9.9; one from 8.8.4.4

All occurrences of the problem show this constellation - two 'long'
responses received.
As a note, of course my implementation of recv returns the correct number
of bytes received. The zeros you see are only from the dump function, it's
dumping the 512 byte buffer.

I hope this is helpful in some manner.

I came across this, I seems glibc had a similar issue (I did not look
in-depth, just want to share the link)
https://bugzilla.redhat.com/show_bug.cgi?id=1044628
https://sourceware.org/legacy-ml/libc-alpha/2014-04/msg00321.html

Thanks
Sascha



Am Do., 2. Juni 2022 um 16:25 Uhr schrieb Rich Felker <dalias@...c.org>:

> On Wed, Jun 01, 2022 at 11:52:46PM +0200, Sascha Braun wrote:
> > I see - getaddrinfo always returns -2, EAI_NONAME in case of these rare
> > errors.
> >
> > The errno value (EAGAIN) came from my syscall, recvfrom, which does not
> > have data in these cases (all packets indicated by 'poll' were read
> > previously by recvfrom calls). (Btw., I did try to return 0 for these
> cases
> > in my recvfrom implementation, did not help either.)
>
> It's normal to see these EAGAINs. They are not errors. Returning 0
> would be wrong; a zero-length read means EOF not EAGAIN.
>
> I don't see an immediate cause visible for why you're getting
> EAI_NONAME. Note that your debug output does not seem to show the
> *length* recvfrom returned, so is it possible you're just showing old
> data that was already in the buffer, when recvfrom really
> malfunctioned and returned a short read? At the res_msend loop layer
> musl would possibly accept this (like you seem to be seeing) but then
> find no results later.
>
> Rich
>
>
>
> >
> > Thanks
> >
> > Am Mi., 1. Juni 2022 um 23:30 Uhr schrieb Rich Felker <dalias@...c.org>:
> >
> > > On Wed, Jun 01, 2022 at 10:35:52PM +0200, Sascha Braun wrote:
> > > > Hi Rich,
> > > >
> > > > Thanks for your time. The EADDRINUSE is in reality an EAGAIN, this
> is a
> > > > glitch in the Emscripten/WASI modification for musl (there are
> different
> > > > values for errnos per implementation):
> > > > __WASI_ERRNO_ADDRINUSE == -3 == EAI_AGAIN.
> > > >
> > > > My question currently breaks down to: Is it common for this
> > > implementation
> > > > of getaddrinfo to return EAGAIN sometimes (I experience 1:100), even
> if
> > > > answers from DNS servers came in?
> > > > If not, I need to investigate much further. I really could not find
> > > issues
> > > > with my sockets implementation, although your thought is of course
> very
> > > > reasonable.
> > >
> > > OK, looking at your code again, part of your problem is that you're
> > > misusing perror. You need to save the return value of getaddrinfo.
> > > Unless it's EAI_SYSTEM, errno is meaningless and you need to use
> > > gai_strerror to get a string for the error code (the value that was
> > > returned) rather than inspecting errno.
> > >
> > > If getaddrinfo actually returned EAI_AGAIN, this probably means you
> > > got an inconclusive result from one of the nameservers, probably
> > > ServFail. However if your debug output is packet dumps, I'm not seeing
> > > a ServFail there, and I'm not convinced it actually returned EAI_AGAIN
> > > since you're not saving the value to check it. The value that happens
> > > to be in errno is NOT the error getaddrinfo returned. Can you check
> > > what it's actually returning?
> > >
> > >
> > > > Am Mi., 1. Juni 2022 um 16:14 Uhr schrieb Rich Felker <
> dalias@...c.org>:
> > > >
> > > > > On Tue, May 31, 2022 at 12:24:19PM +0200, Sascha Braun wrote:
> > > > > > Hi everyone,
> > > > > >
> > > > > > I'm implementing a socket protocol with similar works to
> Emscripten
> > > - ok:
> > > > > >
> > > > > > Here's what I noticed
> > > > > >
> > > > > > - resolv.conf using multiple DNS servers
> > > > > > options timeout:22 attempts:5
> > > > > > nameserver 8.8.4.4
> > > > > > nameserver 208.67.222.222
> > > > > > nameserver 9.9.9.9
> > > > > > nameserver 1.1.1.1
> > > > > >
> > > > > > - getaddrinfo with no hints, so that IPV6 and IPV4 is resolved
> > > > > > void socketstest1_client_dnsonly(void) {
> > > > > >
> > > > > > struct addrinfo hints, * res;
> > > > > > memset(&hints, 0, sizeof(hints));
> > > > > > printf("begin lookup1...\n");
> > > > > > if (getaddrinfo("www.web.de", "80", &hints, &res) != 0) {
> > > > > > perror("getaddrinfo1");
> > > > > > }
> > > > > > printf("begin lookup2...\n");
> > > > > > if (getaddrinfo("www.google.de", "80", &hints, &res) != 0) {
> > > > > > perror("getaddrinfo2");
> > > > > > }
> > > > > > printf("begin lookup3...\n");
> > > > > > if (getaddrinfo("www.google.com", "80", &hints, &res) != 0) {
> > > > > > perror("getaddrinfo3");
> > > > > > }
> > > > > >
> > > > > > }
> > > > > >
> > > > > > When repeating socketstest1_client_dnsonly often, it appears that
> > > IPV6
> > > > > > answers for IPV4 requests to another server [or vice-versa]
> > > responses can
> > > > > > get mixed up and getaddrinfo reports sometimes only an error
> where
> > > none
> > > > > is.
> > > > > >
> > > > > > Below is a console dump of my test where you see the critical
> > > situation
> > > > > > (end)
> > > > > > 8.8.4.4:53 was queried, 9.9.9.9:53 responded for a (different IP
> > > Proto),
> > > > > > resulting in error getaddrinfo1: Address in use
> > > > > >
> > > > > > This happens in about 1/100 tests. When you specify IPV4 or IPV6
> in
> > > > > hints,
> > > > > > this issue does NOT show up.
> > > > >
> > > > > Can you provide more information on how to interpret the console
> dump,
> > > > > and in particular, which of your syscalls is returning the
> EADDRINUSE
> > > > > error? I'm pretty sure this is just a bug in your socket
> > > > > stack/emulation. It looks like it's coming from recvfrom, and
> > > > > EADDRINUSE is not a valid error for recvfrom to produce.
> > > > >
> > > > > Rich
> > > > >
> > >
>

Content of type "text/html" skipped

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.