Message-ID: <87plo0tzyd.fsf@keithp.com>
Date: Tue, 15 Oct 2024 21:48:58 -0700
From: Keith Packard <keithp@...thp.com>
To: Alyssa Ross <hi@...ssa.is>, libc-coord@...ts.openwall.com
Subject: Re: sscanf("0x", "%x", &out)


> So, what's the right thing to do here?

Whoa, there's a nice corner of the spec. Let's see if my reading is at
all helpful.

Here's a bit of relevant text from fscanf (C17 7.21.6.2 paragraph 9):

"An input item is defined as the longest sequence of input characters
 which does not exceed any specified field width and which is, or is a
 prefix of, a matching input sequence."

The 'matching input sequence' wording refers to words in the strtoul
spec, as that's what all of the numeric conversions reference (C17
7.22.1.4 paragraph 3):

"If the value of base is between 2 and 36 (inclusive), the expected form
 of the subject sequence is a sequence of letters and digits
 representing an integer with the radix specified by base, optionally
 preceded by a plus or minus sign, but not including an integer
 suffix. The letters from a (or A) through z (or Z) are ascribed the
 values 10 through 35; only letters and digits whose ascribed values are
 less than that of base are permitted. If the value of base is 16, the
 characters 0x or 0X may optionally precede the sequence of letters and
 digits, following the sign if present."

It's the last sentence which seems a bit misleading to me, as it says
0x or 0X "may optionally precede the sequence of letters and digits". If
we read that as saying that the 'subject sequence' is only the letters
and digits, and not the preceding "0x" or "0X", then perhaps we're
supposed to ignore those for the purposes of computing the width.
However, the first paragraph of the strtoul spec seems unambiguous to me:

"decompose the input string into three parts: an initial, possibly
 empty, sequence of white-space characters (as specified by the isspace
 function), a subject sequence resembling an integer represented in some
 radix determined by the value of base, and a final string of one or
 more unrecognized characters"

So, the 'subject sequence' is all of the non white-space characters,
including the leading 0x/0X. Applying this to the fscanf wording above,
the field width test would include the leading 0x/0X, so a width-2
sequence from 0x01 would be "0x".

        unsigned char out;
        int ret, lim;

        ret = sscanf("0x01", "%2hhx%n", &out, &lim);

        ret == 1
        out == 0
        lim == 1

Now I wonder if any existing implementation agrees with this (I just
checked picolibc and it looks like it will set lim to 2 as it consumes
the 'x').

-- 
-keith
