Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230602143759.GQ4163@brightrain.aerifal.cx>
Date: Fri, 2 Jun 2023 10:38:00 -0400
From: Rich Felker <dalias@...c.org>
To: Jens Gustedt <Jens.Gustedt@...ia.fr>
Cc: musl@...ts.openwall.com, jens.gustedt@...teo.eu
Subject: Re: [C23 printf 2/3] C23: implement the wN length specifiers
 for printf

On Tue, May 30, 2023 at 08:55:27AM +0200, Jens Gustedt wrote:
> These are mandatory for C23 and concern all types for which the
> platform has `int_leastN_t` and `uint_leastN_t`. For musl these types
> always coincide with `intN_t` and `uintN_t` and are always present for
> N equal 8, 16, 32 and 64.
> 
> They can be added for general use since all lowercase letters were
> previously reserved.
> 
> Nevertheless, users that use these modifiers will see a lot of
> warnings from compilers in the beginning. This is because the
> compilers have not yet integrated this form of a specifier into their
> correponding extensions (gcc attributes). So unfortunately also
> testing this feature may be a bit noisy for the moment.
> 
> The only architecture dependend choice is the type for N == 64, which
> may be `long` or `long long`. We just mimick the test that is done in
> other places to compare `UINTPTR_MAX` and `UINT64_MAX` to determine
> that.
> ---
>  src/stdio/vfprintf.c  | 28 +++++++++++++++++++++++++---
>  src/stdio/vfwprintf.c | 28 +++++++++++++++++++++++++---
>  2 files changed, 50 insertions(+), 6 deletions(-)
> 
> diff --git a/src/stdio/vfprintf.c b/src/stdio/vfprintf.c
> index cbc79783..265fb7ad 100644
> --- a/src/stdio/vfprintf.c
> +++ b/src/stdio/vfprintf.c
> @@ -33,7 +33,7 @@
>  
>  enum {
>  	BARE, LPRE, LLPRE, HPRE, HHPRE, BIGLPRE,
> -	ZTPRE, JPRE,
> +	ZTPRE, JPRE, WPRE,
>  	STOP,
>  	PTR, INT, UINT, ULLONG,
>  	LONG, ULONG,
> @@ -57,7 +57,7 @@ static const unsigned char states[]['z'-'A'+1] = {
>  		S('s') = PTR, S('S') = PTR, S('p') = UIPTR, S('n') = PTR,
>  		S('m') = NOARG,
>  		S('l') = LPRE, S('h') = HPRE, S('L') = BIGLPRE,
> -		S('z') = ZTPRE, S('j') = JPRE, S('t') = ZTPRE,
> +		S('z') = ZTPRE, S('j') = JPRE, S('t') = ZTPRE, S('w') = WPRE,
>  	}, { /* 1: l-prefixed */
>  		S('b') = ULONG, S('B') = ULONG,
>  		S('d') = LONG, S('i') = LONG,
> @@ -101,6 +101,12 @@ static const unsigned char states[]['z'-'A'+1] = {
>  		S('o') = UMAX, S('u') = UMAX,
>  		S('x') = UMAX, S('X') = UMAX,
>  		S('n') = PTR,
> +	}, { /* 8: w-prefixed */
> +		S('b') = UINT, S('B') = UINT,
> +		S('d') = INT, S('i') = INT,
> +		S('o') = UINT, S('u') = UINT,
> +		S('x') = UINT, S('X') = UINT,
> +		S('n') = PTR,
>  	}
>  };
>  
> @@ -447,7 +453,7 @@ static int printf_core(FILE *f, const char *fmt, va_list *ap, union arg *nl_arg,
>  	int w, p, xp;
>  	union arg arg;
>  	int argpos;
> -	unsigned st, ps;
> +	unsigned st, ps, width=0;
>  	int cnt=0, l=0;
>  	size_t i;
>  	char buf[sizeof(uintmax_t)*CHAR_BIT+3+LDBL_MANT_DIG/4];
> @@ -527,9 +533,25 @@ static int printf_core(FILE *f, const char *fmt, va_list *ap, union arg *nl_arg,
>  			if (OOB(*s)) goto inval;
>  			ps=st;
>  			st=states[st]S(*s++);
> + 			if (st == WPRE) {
> +				if (*s == '0') goto inval;
> +				width = getint(&s);
> +                        }
>  		} while (st-1<STOP);
>  		if (!st) goto inval;
>  
> +		if (ps == WPRE) switch (width) {
> +		case 8:   ps = HHPRE; st = (st == UINT) ? UCHAR  : ((st == INT) ? CHAR  : PTR); break;
> +		case 16:  ps = HPRE;  st = (st == UINT) ? USHORT : ((st == INT) ? SHORT : PTR); break;
> +		case 32:  ps = BARE;  break;
> +#if UINTPTR_MAX >= UINT64_MAX
> +		case 64:  ps = LPRE;  st = (st == UINT) ? ULONG  : ((st == INT) ? LONG  : PTR); break;
> +#else
> +		case 64:  ps = LLPRE; st = (st == UINT) ? ULLONG : ((st == INT) ? LLONG : PTR); break;
> +#endif

The logic here for whether w64 is LPRE or LLPRE is wrong. The
condition for whether [u]int64_t is long or long long is not whether
uintptr_t or long long would be too wide (they never are, in our ABIs)
but whether long is long enough. All our [u]intN_t are the lowest-rank
type of the desired size, and this seems to be consistent with what
other implementations like glibc do.

Short of a compelling reason not to, though, I plan to just go with
the fix-up of my v2 patch for this functionality. Its ordering avoids
duplication of logic (the ?: branches in the cases above) and the need
for maintaining extra side state (the width variable) thru the state
machine.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.