Date: Sat, 17 Feb 2024 12:08:12 +0100
From: g1pi@...ero.it
To: musl@...ts.openwall.com
Subject: dns resolution failure in virtio-net guest


Hi all.

I stumbled on a weird instance of domain resolution failure in a
virtualization scenario involving a musl-based guest.  A little
investigation turned up results that are puzzling, at least to me.

This is the scenario:

Host:
- debian 12 x86_64
- kernel 6.1.0-18-amd64, qemu 7.2
- caching nameserver listening on 127.0.0.1

Guest:
- void linux x86_64
- kvm acceleration
- virtio netdev, configured in (default) user-mode
- kernel 6.1.71_1, musl-1.1.24_20
- /etc/resolv.conf:
    nameserver 10.0.2.2         # the caching DNS on the host
    nameserver 192.168.1.123    # non-existent

In this scenario, "getent hosts example.com" consistently fails.

The problem vanishes when I do any of these:
- strace the command (!)
- replace 10.0.2.2 with another working DNS server reached over a
  physical cable/wifi link (e.g. 192.168.1.1)
- remove the non-existent nameserver
- swap the nameservers in /etc/resolv.conf

I wrote a short test program (see below) that performs the same system
calls made by the musl resolver, and it turns out that

- when all sendto() calls are performed in short order, the (unique)
  response packet is never received

    $ ./a.out 10.0.2.2 192.168.1.123
    poll: 0 1 0
    recvfrom() -1
    recvfrom() -1

- if a short delay (16 msec) is inserted between the calls, all is fine

    $ ./a.out 10.0.2.2 delay 192.168.1.123
    poll: 1 1 1
    recvfrom() 45
    <response packet>
    recvfrom() -1

The program's output is the same in several guests with different
kernel/libc combinations (linux/glibc, linux/musl, freebsd, openbsd).
Only when the emulated netdev was switched from virtio to pcnet did
the problem go away.

My guess is that, when there is no delay between the sendto() calls,
the second one happens exactly while the kernel is receiving the
response packet, and that packet is silently dropped.  A short delay
before the second sendto(), or a random delay in the response (because
the working DNS server is "far away"), apparently avoids the issue.

I don't know what the UDP standard mandates, in particular what should
happen when a packet is received on a socket at the exact moment
another packet is sent out on the same socket.

If the kernel is allowed to drop the packet, then the musl resolver
could be modified to introduce some minimal delay between calls, at
least when retrying.
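
For instance, the query loop could pause briefly before every
transmission after the first.  What follows is only a sketch of the
idea, not musl's actual resolver code: the function, its structure and
the 2 ms figure are all invented here for illustration.

    #include <stddef.h>
    #include <time.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    /*
     * Send one query to each configured nameserver, pausing briefly
     * before every transmission after the first, so that the guest
     * kernel is less likely to be transmitting at the exact moment
     * the first reply arrives.  Purely illustrative.
     */
    static void send_queries(int sock, const char *query, size_t qlen,
                             const struct sockaddr_in *ns, int nns)
    {
        struct timespec gap = { 0, 2 * 1000 * 1000 };  /* 2 ms, arbitrary */
        int i;

        for (i = 0; i < nns; i++) {
            if (i > 0)
                nanosleep(&gap, NULL);  /* hypothetical inter-query delay */
            sendto(sock, query, qlen, MSG_NOSIGNAL,
                   (const struct sockaddr *) &ns[i], sizeof ns[i]);
        }
    }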

Otherwise, there could be a race condition in the network layer.
Perhaps in the host linux/kvm/qemu stack.  Perhaps in virtio-net,
since the problem shows up in guests with different kernels, and only
when they use virtio-net; but it might simply be that the other
emulated devices mask the issue by adding a little overhead.

Please CC me in replies.

Best regards,
        g.b.

===== cut here =====

    #include <stdio.h>
    #include <time.h>
    #include <poll.h>
    #include <assert.h>
    #include <string.h>

    #include <arpa/inet.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <sys/types.h>

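    /* print a buffer, escaping non-printable bytes as octal */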
    static void dump(const char *s, size_t len) {
        while (len--) {
            char t = *s++;
            if (' ' <= t && t <= '~' && t != '\\')
                printf("%c", t);
            else
                printf("\\%o", t & 0xff);
        }
        printf("\n");
    }

    int main(int argc, char *argv[]) {
        int sock, rv, n;
        const char req[] =
            "\202\254\1\0\0\1\0\0\0\0\0\0\7example\3com\0\0\1\0\1";
        struct timespec delay_l = { 1, 0 }; /* 1 sec */
        struct pollfd pfs;
        struct sockaddr_in me = { 0 };

        sock = socket(AF_INET, SOCK_DGRAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
                      IPPROTO_IP);
        assert(sock >= 0);

        me.sin_family = AF_INET;
        me.sin_port = 0;
        me.sin_addr.s_addr = inet_addr("0.0.0.0");
        rv = bind(sock, (struct sockaddr *) &me, sizeof me);
        assert(0 == rv);

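        /* arguments are nameserver addresses; the literal word "delay"
           inserts a ~16 ms pause instead of sending a query */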
        for (n = 1; n < argc; n++) {
            if (0 == strcmp("delay", argv[n])) {
                struct timespec delay_s = { 0, (1 << 24) }; /* ~ 16 msec */
                nanosleep(&delay_s, NULL);
            } else {
                struct sockaddr_in dst = { 0 };
                dst.sin_family = AF_INET;
                dst.sin_port = htons(53);
                dst.sin_addr.s_addr = inet_addr(argv[n]);
                rv = sendto(sock, req, sizeof req - 1, MSG_NOSIGNAL,
                            (struct sockaddr *) &dst, sizeof dst);
                assert(rv >= 0);
            }
        }

        nanosleep(&delay_l, NULL);
        pfs.fd = sock;
        pfs.events = POLLIN;
        rv = poll(&pfs, 1, 2000);
        printf("poll: %d %d %d\n", rv, pfs.events, pfs.revents);

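        /* one recvfrom() per query sent; the socket is non-blocking,
           so -1 (EAGAIN) means no reply was received */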
        for (n = 1; n < argc; n++) {
            char resp[4000];
            if (0 == strcmp("delay", argv[n]))
                continue;
            rv = recvfrom(sock, resp, sizeof resp, 0, NULL, NULL);
            printf("recvfrom() %d\n", rv);
            if (rv > 0)
                dump(resp, rv);
        }

        return 0;
    }

===== cut here =====
