musl - Re: [PATCH v3] sysconf: add _SC_NPROCESSORS

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210710171137.GW13220@brightrain.aerifal.cx>
Date: Sat, 10 Jul 2021 13:11:38 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH v3] sysconf: add _SC_NPROCESSORS_CONF support

On Sat, Jul 10, 2021 at 07:31:57AM +0200, jason wrote:
> Here's a simpler bit of code to do the same thing.  The differences:
> - Doesn't bother using stdio.  /proc and /sys reads like this don't sleep
>   and so don't return short or EINTR or any of that hair.

Non-use of stdio may be better but also requires care. The code as
you've done it is not safe against cancellation (which it needs to
block).

> - Check for the terminating newline, and use it in the parsing loop
>   to trigger completing a range, just like ','.
> 
>   The previous code never actually ensured that the buffer was '\0'
>   terminated. :-(
> 
> - Use one "current number being parsed" variable rather than two.
>   Just copy n to prev_n on finding a '-'.  (Feel free to bikeshed the
>   variable names to "range_start" or something if you prefer.)
> 
> If you do want to use a FILE *, musl should probably have an internal
> API for reading into the stdio buffer and parsing straight from there,
> avoiding the separate stack buffer.  (Or even allocating the FILE on
> the stack and avoiding the malloc.)

It does. __fopen_rb_ca. This is probably the preferred way to do it.
You can then just consume bytes with getc_unlocked (which expands as a
macro within libc).

> This is probably far from the last small config file (/proc or /sys or
> perhaps something like /etc/hostname) that someone will want to access.
> 
> (You could even get ambitious and make it reallocate the buffer as
> required to get the whole file into memory

That's not a good thing.

> Probably something other
> code would be interested in, like timezone parsing.)

Time zones are mmapped and used in-place.

> #include <unistd.h>
> #include <fcntl.h>
> #include <errno.h>
> 
> long get_nrprocessors_conf1(void)
> {
> 	int fd = open("/sys/devices/system/cpu/possible",
> 			O_RDONLY | O_NOCTTY | O_CLOEXEC);
> 	if (fd < 0)
> 		goto failure;
> 
> 	char buffer[128];	/* Some reasonable size. */
> 	ssize_t ss = read(fd, buffer, sizeof buffer);
> 	/*
> 	 * Short reads, EINTR, etc. can't happen with /sys files.
> 	 * We could in theory preserve the read's errno in case the close
> 	 * overwrites it, but that seems excessive.
> 	 */
> 	if (close(fd) < 0 || ss < 0)
> 		goto failure;
> 	if (ss < 2 || buffer[ss-1] != '\n')
> 		goto parse_error;
> 	/*
> 	 * The possible CPUs are given as a comma-separated list of ranges,
> 	 * with each range being either a single decimal integer ("1") or
> 	 * a hyphen-separated range ("1-10").
> 	 *
> 	 * Q: How careful do we want the parsing to be?  For now, assume
> 	 * the kernel can be trusted not to give us "1-2-3" or ",," or
> 	 * other strangeness.
> 	 */
> 	char const *p = buffer;
> 	unsigned prev_n = -1u;
> 	unsigned n = 0;
> 	unsigned total = 0;
> 	for (;;) {
> 		char c = *p++;
> 		switch (c) {
> 		  case '0': case '1': case '2': case '3': case '4':
> 		  case '5': case '6': case '7': case '8': case '9':
> 			if (n > (-1u - 9)/10)
> 				goto parse_error;
> 			n = 10 * n + c - '0';
> 			break;
> 		  case '-':
> 			prev_n = n;
> 			n = 0;
> 			break;
> 		  case ',': case '\n':
> 			if (n > prev_n)
> 				total += n - prev_n;
> 			total++;
> 			if (c == '\n')
> 				return (long)total;	/* Success! */
> 			prev_n = -1u;
> 			n = 0;
> 			break;
> 		  default:
> 			goto parse_error;
> 		}
> 	}
> 
> parse_error:
> 	errno = ENOTSUP;	/* What's a good error to return? */

There is no need to account for data errors at all. If you can't trust
the kernel not to give you something invalid, you're running in a
compromised context where you can't do anything.

If this is really the right interface to get this data from, I would
hope there's some simpler way to do it, simplifying it down to assume
the data is well-formed (don't invoke UB if it's not, but returning an
utterly bogus value if it's not would be fine).

A couple minor style nits: no spaces for indentation or extra
indentation of case labels, and no need to cast in any place where
there's an implicit conversion that does the same (unsigned to long).

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.