Message-ID: <c2c7d1ce-1e0d-a504-f8be-313fe7385240@gmail.com>
Date: Mon, 10 Jul 2017 10:22:37 +0200
From: Bartosz Brachaczek <b.brachaczek@...il.com>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] handle whitespace before %% in scanf

Hello,

On 7/10/2017 4:00 AM, Rich Felker wrote:
> On Sun, Jul 09, 2017 at 11:00:18PM +0200, Bartosz Brachaczek wrote:
>> this is mandated by C and POSIX standards and is in accordance with
>    ^^^^
>> glibc behavior.
> 
> Can you explain exactly what "this" refers to?

Ah, poor wording choice on my part. Yes, I meant that %% consumes
whitespace. Shall I resend the patch with a reworded commit message if you
think it's otherwise good?

> It looks like you're claiming %% consumes space, which I can't find
> any support for in the C standard. Has this topic been discussed
> somewhere I should see?

Sorry, I didn't think this would be controversial. No prior discussion. 
Let me present my reasoning below.

The following paragraph in the description of the fscanf function in the 
C11 standard, §7.21.6.2, establishes that '%%' is a "conversion 
specification", where '%' is the "conversion specifier":

> The format shall be a multibyte character sequence, beginning and
> ending in its initial shift state. The format is composed of zero or
> more directives: one or more white-space characters, an ordinary
> multibyte character (neither '%' nor a white-space character), or a
> conversion specification. Each conversion specification is introduced
> by the character '%'. After the '%', the following appear in sequence:
> 
> -- . . .
> 
> -- A "conversion specifier" character that specifies the type of
>    conversion to be applied.

That '%' is a valid conversion specifier is established a few paragraphs 
below:

> The conversion specifiers and their meanings are:
> 
> . . .
> 
> '%'     Matches a single '%' character; no conversion or assignment
>         occurs. The complete conversion specification shall be '%%'.

Between the above paragraphs, there is a definition of how a conversion 
specification is executed:

> A directive that is a conversion specification defines a set of matching
> input sequences, as described below for each specifier. A conversion
> specification is executed in the following steps:
> 
> Input white-space characters (as specified by the 'isspace' function)
> are skipped, unless the specification includes a '[', 'c', or 'n'
> specifier.
> 
> . . .

From the above I conclude that all conversion specifications except
'%[', '%c', and '%n' skip leading input whitespace. This includes the '%%'
conversion specification.
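
The rule quoted above can be sketched as a small C predicate. This is an illustrative helper with a hypothetical name (skips_leading_space), not code from musl or glibc. It assumes a well-formed specification and locates the specifier by stepping over the optional '*', the field width and any length modifier:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: given one conversion specification starting at
 * '%', report whether C11 7.21.6.2 says input white space is skipped
 * before it is executed. Assumes the specification is well formed. */
static bool skips_leading_space(const char *spec)
{
    const char *p = spec + 1;                 /* step past the introducing '%' */

    if (*p == '*')                            /* assignment suppression */
        p++;
    while (isdigit((unsigned char)*p))        /* maximum field width */
        p++;
    while (*p && strchr("hljztL", *p))        /* length modifier(s) */
        p++;

    /* Only the '[', 'c' and 'n' specifiers are exempt; '%' is not. */
    return *p != '\0' && strchr("[cn", *p) == NULL;
}
```

Under this reading it returns true for "%%" and "%*5d", and false for "%[a-z]", "%5c" and "%lln".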

The above can be applied just as well to C99. However, C11 added a new 
example (still in §7.21.6.2) that seems to confirm my reading of the 
normative text:

> EXAMPLE 5 The call:
> 
>     #include <stdio.h>
>     /* ... */
>     int n, i;
>     n = sscanf("foo % bar 42", "foo%%bar%d", &i);
> 
> will assign to 'n' the value 1 and to 'i' the value 42 because input
> white-space characters are skipped for both the '%' and 'd' conversion
> specifiers.

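Now, the code in the example is clearly broken: once '%%' has matched the
'%', the literal 'b' in the format must match the next input character,
which is a space, so the call returns 0 rather than 1. Either the format
string should be "foo%% bar%d" or the input string should be "foo %bar 42".
Still, the explanation does imply that '%%' consumes whitespace.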

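To check that diagnosis of EXAMPLE 5 mechanically, here is a toy model of the directive rules restricted to white-space directives, ordinary characters, "%%" and "%d". The name toy_scan is hypothetical; this is not musl's or glibc's scanf, and it deliberately encodes the reading argued above, namely that "%%" skips leading white space like other conversion specifications:

```c
#include <ctype.h>

/* Hypothetical toy model of the C11 7.21.6.2 directive rules, limited to
 * white-space directives, ordinary characters, "%%" and "%d". Not a real
 * scanf: no field widths, no overflow checks, and no distinction between
 * input failure (EOF) and matching failure. 'out' must have room for one
 * int per "%d". Returns the number of values assigned. */
static int toy_scan(const char *in, const char *fmt, int *out)
{
    int assigned = 0;

    while (*fmt) {
        if (isspace((unsigned char)*fmt)) {
            /* White-space directive: skip any amount of input white space. */
            while (isspace((unsigned char)*fmt))
                fmt++;
            while (isspace((unsigned char)*in))
                in++;
            continue;
        }
        if (*fmt != '%') {
            /* Ordinary character: must equal the next input character. */
            if (*in != *fmt)
                return assigned;
            in++;
            fmt++;
            continue;
        }

        /* Conversion specification. Neither '%' nor 'd' is '[', 'c' or
         * 'n', so leading input white space is skipped first. */
        fmt++;
        while (isspace((unsigned char)*in))
            in++;

        if (*fmt == '%') {
            /* "%%" matches a single '%'; nothing is assigned. */
            if (*in != '%')
                return assigned;
            in++;
        } else if (*fmt == 'd') {
            int sign = 1, val = 0;
            if (*in == '+' || *in == '-')
                sign = (*in++ == '-') ? -1 : 1;
            if (!isdigit((unsigned char)*in))
                return assigned;          /* matching failure */
            while (isdigit((unsigned char)*in))
                val = val * 10 + (*in++ - '0');
            out[assigned++] = sign * val;
        } else {
            return assigned;              /* specifier outside this toy subset */
        }
        fmt++;
    }
    return assigned;
}
```

With it, toy_scan("foo % bar 42", "foo%%bar%d", &i) returns 0 because the literal 'b' meets a space, while both corrected variants, the format "foo%% bar%d" and the input "foo %bar 42", return 1 and store 42 in i.
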
Bartosz
