Message-ID: <20150819015132.GP32742@brightrain.aerifal.cx>
Date: Tue, 18 Aug 2015 21:51:32 -0400
From: Rich Felker <dalias@...c.org>
To: Chuck Lever <chuck.lever@...cle.com>
Cc: musl@...ts.openwall.com
Subject: Re: Re: nfs-utils broken with musl: "select: Bad file
 descriptor"

On Tue, Aug 18, 2015 at 06:44:46PM -0700, Chuck Lever wrote:
> 
> On Aug 18, 2015, at 6:24 PM, Rich Felker <dalias@...c.org> wrote:
> 
> > On Tue, Aug 18, 2015 at 06:05:01PM -0700, Chuck Lever wrote:
> >>>> I think this call goes wrong:
> >>>>
> >>>> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/rmtcall.c;hb=HEAD#l56
> >>>>
> >>>> It loops for 100 iterations, and if getservbyport reports a named
> >>>> service for every port, it calls FD_SET(sockfd, &SVC_FDSET) with
> >>>> some random high sockfd (e.g. 105) that is already closed.
> >>>>
> >>>> ...so should getservbyport fail there?
> >>>>
> >>>> (According to strace, it tries ports 883 to 982.)
> >>> 
> >>> I think the application expects getservbyport to fail rather than
> >>> return a service entry whose name is just the decimal port number.
> >>> However, the code does appear to be written to handle the case where
> >>> all 100 iterations fail to get an anonymous port. The problem is that
> >>> when the loop stops by hitting the iteration count rather than exiting
> >>> via break, i has already been incremented past the last tmp_sockets
> >>> slot, so the close loop also closes the fd the code actually wants to
> >>> use, later causing EBADF. This is purely an application bug, but it
> >>> goes unnoticed whenever getservbyport fails somewhere along the way,
> >>> which is what the code expects in the usual case.
> >> 
> >> statd_get_socket() is hunting for a privileged source port that
> >> is not just unused at the moment, but that is also not going to be
> >> used by some other well-known service. This is a long-lived socket
> >> that statd uses to communicate with the kernel. It must use a
> >> privileged port.
> >> 
> >> If getservbyport(3) is returning something for every port that
> >> is tried, then statd_get_socket() will fail to find a usable
> >> port.
> >> 
> >> If it's returning 105, that suggests it has run out of retries.
> >> It should return -1 in this case. That is a logic bug.
> >> 
> >> But is it true that every port returned by bindresvport(3) is
> >> actually defined in /etc/services? Surely there is one open
> >> port that can be used. What port does bindresvport(3) start
> >> with?
> 
> > The logic bug is the count-down loop that closes all the temp sockets.
> > In the case where the loop terminates via break, it leaves the last
> > one open and only closes the extras. But in the case where the
> > loop terminates via the end condition in the for statement, the close
> > loop closes all the sockets, including the one it intends to use.
> 
> OK. Do you have a patch?

Something like this:

+        if (i == loopcnt) i--;
         while (--i >= 0)
                 close(tmp_sockets[i]);

I don't have a checkout of the source to make a proper diff, and I've
got too many things I'm trying to do at the moment already. But it
should be easy to make a real patch from what I wrote above.
 
> Still not clear why it would take 100 tries exactly.

Because getservbyport never fails on musl, every one of the 100 candidate
ports gets rejected, and that code path was never tested. We can (and
probably should) make it fail in musl when there's no named service to
go with the port, but that doesn't change the fact that this previously
untested code path in nfs-utils has a bug, and that bug is the source of
the EBADF error from select.

Rich
