musl - Re: [PATCH musl v2 3/3] mntent: fix parsing lines with optional fields

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220512181553.GK7074@brightrain.aerifal.cx>
Date: Thu, 12 May 2022 14:15:53 -0400
From: Rich Felker <dalias@...c.org>
To: Alyssa Ross <hi@...ssa.is>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH musl v2 3/3] mntent: fix parsing lines with
 optional fields

On Thu, May 12, 2022 at 06:02:35PM +0000, Alyssa Ross wrote:
> On Thu, May 12, 2022 at 10:08:37AM -0400, Rich Felker wrote:
> > On Thu, Jan 13, 2022 at 06:53:19PM +0000, Alyssa Ross wrote:
> > > Rich Felker <dalias@...c.org> writes:
> > >
> > > > On Thu, Jan 13, 2022 at 04:30:18PM +0000, Alyssa Ross wrote:
> > > >> Hi Rich, thanks for following up on this.
> > > >>
> > > >> Rich Felker <dalias@...c.org> writes:
> > > >>
> > > >> > On Mon, Sep 20, 2021 at 12:21:41AM -0400, Rich Felker wrote:
> > > >> >> On Wed, Sep 15, 2021 at 10:11:55PM +0000, Alyssa Ross wrote:
> > > >> >> > According to fstab(5), the last two fields are optional, but this
> > > >> >> > wasn't accepted by Musl.  After this change, only the first field is
> > > >> >> > required, which matches Glibc's behaviour.
> > > >> >> >
> > > >> >> > Using sscanf as before, it would have been impossible to differentiate
> > > >> >> > between 0 fields and 4 fields, because sscanf would have returned 0 in
> > > >> >> > both cases due to the use of assignment suppression and %n for the
> > > >> >> > string fields (which is important to avoid copying any strings).  So
> > > >> >> > instead, before calling sscanf, initialize every string to the empty
> > > >> >> > string, and then we can check which strings are empty afterwards to
> > > >> >> > know how many fields were matched.
> > > >> >> > ---
> > > >> >> >
> > > >> >> > We could also be stricter about it, and enforce that the first four
> > > >> >> > fields are present, since the man page says only the last two are
> > > >> >> > optional.  Doing that would be a simple change of checking for the
> > > >> >> > presence of mnt_opts instead of mnt_fsname at the end of my patch.
> > > >> >> >
> > > >> >> > v2: don't change n from int to size_t
> > > >> >> >
> > > >> >> >  src/misc/mntent.c | 18 +++++++++++++-----
> > > >> >> >  1 file changed, 13 insertions(+), 5 deletions(-)
> > > >> >> >
> > > >> >> > diff --git a/src/misc/mntent.c b/src/misc/mntent.c
> > > >> >> > index eabb8200..238a0efd 100644
> > > >> >> > --- a/src/misc/mntent.c
> > > >> >> > +++ b/src/misc/mntent.c
> > > >> >> > @@ -21,7 +21,8 @@ int endmntent(FILE *f)
> > > >> >> >
> > > >> >> >  struct mntent *getmntent_r(FILE *f, struct mntent *mnt, char *linebuf, int buflen)
> > > >> >> >  {
> > > >> >> > -	int cnt, n[8], use_internal = (linebuf == SENTINEL);
> > > >> >> > +	int n[8], use_internal = (linebuf == SENTINEL);
> > > >> >> > +	size_t len, i;
> > > >> >> >
> > > >> >> >  	mnt->mnt_freq = 0;
> > > >> >> >  	mnt->mnt_passno = 0;
> > > >> >> > @@ -39,10 +40,14 @@ struct mntent *getmntent_r(FILE *f, struct mntent *mnt, char *linebuf, int bufle
> > > >> >> >  			errno = ERANGE;
> > > >> >> >  			return 0;
> > > >> >> >  		}
> > > >> >> > -		cnt = sscanf(linebuf, " %n%*s%n %n%*s%n %n%*s%n %n%*s%n %d %d",
> > > >> >> > -			n, n+1, n+2, n+3, n+4, n+5, n+6, n+7,
> > > >> >> > -			&mnt->mnt_freq, &mnt->mnt_passno);
> > > >> >> > -	} while (cnt < 2 || linebuf[n[0]] == '#');
> > > >> >> > +
> > > >> >> > +		len = strlen(linebuf);
> > > >> >> > +		for (i = 0; i < sizeof n / sizeof *n; i++) n[i] = len;
> > > >> >> > +		if (sscanf(linebuf, " %n%*s%n %n%*s%n %n%*s%n %n%*s%n %d %d",
> > > >> >> > +			n, n+1, n+2, n+3, n+4, n+5, n+6, n+7,
> > > >> >> > +			&mnt->mnt_freq, &mnt->mnt_passno) == EOF && ferror(f))
> > > >> >> > +			return 0;
> > > >> >> > +	} while (linebuf[n[0]] == '#');
> > > >> >> >
> > > >> >> >  	linebuf[n[1]] = 0;
> > > >> >> >  	linebuf[n[3]] = 0;
> > > >> >> > @@ -54,6 +60,9 @@ struct mntent *getmntent_r(FILE *f, struct mntent *mnt, char *linebuf, int bufle
> > > >> >> >  	mnt->mnt_type = linebuf+n[4];
> > > >> >> >  	mnt->mnt_opts = linebuf+n[6];
> > > >> >> >
> > > >> >> > +	if (!*mnt->mnt_fsname)
> > > >> >> > +		return 0;
> > > >> >> > +
> > > >> >> >  	return mnt;
> > > >> >> >  }
> > > >> >>
> > > >> >> It looks like your patch changes the behavior for malformed lines from
> > > >> >> skipping them (and continuing to search for the next valid line) to
> > > >> >> returning 0. Is that intentional? Maybe it's better; I'm not sure. But
> > > >> >> won't it even cause blank lines to return 0?
> > > >>
> > > >> Because I only check for the first field being present, a lot of
> > > >> nonsensical lines will be accepted.
> > > >>
> > > >> As I said in the patch commentary:
> > > >>
> > > >> > After this change, only the first field is
> > > >> > required, which matches Glibc's behaviour.
> > > >> >
> > > >> > [snip]
> > > >> >
> > > >> > We could also be stricter about it, and enforce that the first four
> > > >> > fields are present, since the man page says only the last two are
> > > >> > optional.  Doing that would be a simple change of checking for the
> > > >> > presence of mnt_opts instead of mnt_fsname at the end of my patch.
> > > >>
> > > >> It probably would make more sense to check that the four fields the man
> > > >> pages implies are required are all there, by making the change I
> > > >> suggested in the commentary, at least until somebody complains about
> > > >> their two-field fstab being accepted by Glibc and not Musl.
> > > >>
> > > >> > Indeed it also seems to be skipping empty lines, contrary to what you
> > > >> > said in another message:
> > > >> >
> > > >> >>  • Empty lines should be skipped.
> > > >>
> > > >> Yes, it looks like I was mistaken before when I thought that Musl didn't
> > > >> properly handle comments and empty lines.  Looking back, it seems that
> > > >> the tests I was running were against mntent files with only four fields,
> > > >> so the parsing failures I was seeing were because of that, not because
> > > >> of an issue in the comment or empty line handling.
> > > >>
> > > >> My patch does (inadventently) change the behaviour of empty line
> > > >> handling.  We should leave the current behaviour of skipping over empty
> > > >> lines as is.  Suggested fix at the end of this message.
> > > >>
> > > >> > Do you have a preference on how to proceed? We could add back a
> > > >> > condition to the while loop, something like linebuf[n[0]]=='#' ||
> > > >> > n[6]==len (i.e. skip lines with too few fields, possibly using a
> > > >> > different number instead of 6 if more appropriate). Or we could do
> > > >> > what I suggested before:
> > > >> >
> > > >> >> A less invasive change might be adding "%1[ \t\n\v\f\r]" and a dummy
> > > >> >> char* argument to collect the value before the " %d %d". Then you can
> > > >> >> check for cnt<1. But I'm not sure even the 4th field should be
> > > >> >> mandatory. This same apprach could be used to make just 3 mandatory if
> > > >> >> desired though.
> > > >> >
> > > >> > Thoughts?
> > > >>
> > > >> I think it would clearer to have an explicit check that the last
> > > >> mandatory field is set, like I currently do with the mnt_fsname check at
> > > >> the end of the function.  I don't particularly mind how many fields are
> > > >> mandatory, as long as its four or fewer so Musl's behaviour follows the
> > > >> fstab format described in the man page.
> > > >>
> > > >> So overall my proposed revisions would be the following.  There's a
> > > >> change to move to the next line if the current one is empty, and a
> > > >> change to ensure the first four fields are all present.  (If you decide
> > > >> you'd like the fourth field to also be optional, we can just change
> > > >> mnt_opts to mnt_type in that check.)  It's been a long time since I last
> > > >> this code, btw, so I hope I'm not missing anything around the empty line
> > > >> check.  I'd be happy to put together a revised series, with these
> > > >> changes and a corresponding change to my libc-test patch, if you'd like.
> > > >>
> > > >> diff --git i/src/misc/mntent.c w/src/misc/mntent.c
> > > >> index 169e9789..7782cb10 100644
> > > >> --- i/src/misc/mntent.c
> > > >> +++ w/src/misc/mntent.c
> > > >> @@ -47,7 +47,7 @@ struct mntent *getmntent_r(FILE *f, struct mntent *mnt, char *linebuf, int bufle
> > > >>  			n, n+1, n+2, n+3, n+4, n+5, n+6, n+7,
> > > >>  			&mnt->mnt_freq, &mnt->mnt_passno) == EOF && ferror(f))
> > > >>  			return 0;
> > > >> -	} while (linebuf[n[0]] == '#');
> > > >> +	} while (linebuf[0] == '\n' || linebuf[n[0]] == '#');
> > > >>
> > > >>  	linebuf[n[1]] = 0;
> > > >>  	linebuf[n[3]] = 0;
> > > >> @@ -59,7 +59,7 @@ struct mntent *getmntent_r(FILE *f, struct mntent *mnt, char *linebuf, int bufle
> > > >>  	mnt->mnt_type = linebuf+n[4];
> > > >>  	mnt->mnt_opts = linebuf+n[6];
> > > >>
> > > >> -	if (!*mnt->mnt_fsname)
> > > >> +	if (!*mnt->mnt_opts)
> > > >>  		return 0;
> > > >>
> > > >>  	return mnt;
> > > >
> > > > What I still don't like here is that this changes the behavior on
> > > > something that's not a valid record (in whatever sense we define that)
> > > > from continuing the loop looking for the next one, to returning a null
> > > > pointer with no indication of what the error was. Treating empty lines
> > > > as an error (rather than continuing) was just one special case of
> > > > that. For an analogy, see how the pwd/grp functions work. Something
> > > > malformed in the file is just "not a record" and search continues for
> > > > the next valid record (if any) rather than giving the caller a
> > > > non-actionable (since this is assumed to be a sort of
> > > > trusted/authoritative data outside the application's control) error.
> >
> > Just getting back to this now after the release.. Apologies for the
> > long delay.
> >
> > > Okay, that makes sense.  So it seems like our choices when faced with an
> > > invalid record are:
> > >
> > >  • Skip it and move on to the next one, as you've proposed here; or
> > >
> > >  • Try to fill in as many fields of the mntent structure as possible,
> > >    and return successfully, like Glibc does.
> > >
> > > If we go with the first option, as you've proposed, do you think the
> > > difference in behaviour from Glibc would be an issue?
> >
> > I think the latter is better. If I remember correctly what we'd found,
> > glibc hardly has any condition on what's mandatory (just first field?
> > or first and second?) so I think the condition would be something
> > like:
> >
> > 		...
> > 	} while (linebuf[n[0]] == '#' || n[1]==len);
> >
> > or similar, possibly replacing 1 with 3 if 2 fields are mandatory,
> > etc.
> 
> I'm pretty sure it was just the first field that's mandatory with Glibc.

OK.

> > > (I'm mostly thinking of the case where a record doesn't have enough
> > > fields here.  There are also the cases where a record has too many
> > > fields, or when fields 5 and 6 are numeric.  I haven't looked into how
> > > Glibc handles those yet.)
> >
> > A few other changes I think should be made:
> >
> > - The two numeric fields should be read into temporaries initialized
> >   with default values so that it doesn't matter if they're read or
> >   not.
> 
> Does that make a difference?  We already explicitly zero them at the
> start of the function.  (Maybe I'm forgetting some scanf arcana — it's
> been a while since I've thought about this.)

Oh, I missed that. In that case I think it's okay as-is. As general
policy, I'd rather not modify the caller's object until we're
committed to storing a successful result in it, but changing that is
orthogonal to your patch and can be done separately if wanted.

> > - The n[] array should be changed to size_t[] and the %n's to %zn's.
> >   This should actually be done as a separate change, as it's a fix for
> >   a bug overlooked when 05973dc3bbc1aca9b3c8347de6879ed72147ab3b made
> >   the buffer length potentially longer than INT_MAX.
> 
> Yes.  IIRC I half did this in my original attempt at this patch, but
> didn't change the format specifiers.

Alright. It sounds like your patch is pretty much okay as-is except
for the added " || n[1]==len" in the while condition and removing the
(now unreachable) return-zero condition at the end. If that sounds
right and it's okay with you I'm happy to commit your existing patch
with those changes then make any other improvements myself or review
further patches from you separately if you like. What do you think?

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.