Message-ID: <20190316022534.GN26605@port70.net>
Date: Sat, 16 Mar 2019 03:25:34 +0100
From: Szabolcs Nagy <nsz@...t70.net>
To: musl@...ts.openwall.com
Cc: Michael Jeanson <mjeanson@...icios.com>,
	Richard Purdie <richard.purdie@...uxfoundation.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Jonathan Rajotte-Julien <jonathan.rajotte-julien@...icios.com>
Subject: Re: sysconf(_SC_NPROCESSORS_CONF) returns the wrong value

* Jonathan Rajotte-Julien <jonathan.rajotte-julien@...icios.com> [2019-03-15 17:02:02 -0400]:
> We are currently in the process of making sure that lttng [1] (linux tracer) run
> smoothly on system using musl (Yocto, Alpine etc.). Most things work
> fine. Still, we currently have tests that are failing due to an issue regarding
> the reported number of configured processors on the system (__SC_NPROCESSORS_CONF).
> Note that users of LTTng are also affected by this if they chose to modify the
> sched affinity of their instrumented apps. This is relatively a big deal for us.
> 
> Long story short, we start an app with "taskset -c 0" and we need to allocate
> data structure internally but using the number of configured processors not the
> number of online processors. To do so we call sysconf(__SC_NPROCESSORS_CONF).
> Slight problem: the value returned is the _SC_NPROCESSORS_ONLN value instead of
> __SC_NPROCESSORS_CONF.
...
> A simple command line to show this:
> 
>   taskset -c 0 nproc --all
> 
> This is equivalent to asking sysconf(__SC_NPROCESSORS_CONF).

the right way to check the sysconf from a shell is getconf

on a glibc system
$ taskset -c 0 getconf -a |grep NPROC
_NPROCESSORS_CONF                  8
_NPROCESSORS_ONLN                  8

on musl
$ taskset -c 0 getconf -a |grep NPROC
_NPROCESSORS_CONF                  1
_NPROCESSORS_ONLN                  1

so both values differ between glibc and musl: under taskset musl
reports the affinity count for both. (plain nproc returns the
affinity number, *_ONLN is all the cpus that the kernel schedules
to, *_CONF includes offline cpus that may be hotplugged)
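
to take getconf out of the picture, the two values can be queried
from c directly. a minimal sketch; the struct and function names
are made up for illustration, only the sysconf calls are real:

```c
#include <stdio.h>
#include <unistd.h>

/* hypothetical holder for the two values discussed above */
struct cpu_counts {
    long conf; /* sysconf(_SC_NPROCESSORS_CONF), -1 on error */
    long onln; /* sysconf(_SC_NPROCESSORS_ONLN), -1 on error */
};

/* ask libc for both counts; the result depends on the libc,
   as the getconf output above shows */
struct cpu_counts query_cpu_counts(void)
{
    struct cpu_counts c;
    c.conf = sysconf(_SC_NPROCESSORS_CONF);
    c.onln = sysconf(_SC_NPROCESSORS_ONLN);
    return c;
}

/* print both values, one per line */
void print_cpu_counts(void)
{
    struct cpu_counts c = query_cpu_counts();
    printf("_NPROCESSORS_CONF %ld\n", c.conf);
    printf("_NPROCESSORS_ONLN %ld\n", c.onln);
}
```

calling print_cpu_counts() from main and running the binary with
and without "taskset -c 0" should show whether the libc in use
lets the affinity mask influence either value.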

these are documented linux extensions so i think musl should follow
the linux sysconf man page. (but the semantics are not entirely clear
e.g. there is /sys/devices/system/cpu/possible which can list more
cpus than echo /sys/devices/system/cpu/cpu[0-9]* |wc -w counts, which
is what glibc seems to be doing for *_CONF)
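
to see how the two sysfs sources compare on a given machine,
something like the following sketch could be used. it assumes that
the possible file holds a comma separated list of cpu numbers and
ranges such as "0-7" or "0,2-3", and all function names are made up
for illustration; this is not what glibc or musl actually do:

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* count cpus in a list such as "0-7" or "0,2-3" (assumed format);
   returns -1 on malformed input */
int count_cpulist(const char *s)
{
    int total = 0;
    while (*s && *s != '\n') {
        char *end;
        if (!isdigit((unsigned char)*s)) return -1;
        long lo = strtol(s, &end, 10);
        long hi = lo;
        if (*end == '-') {          /* range "lo-hi" */
            s = end + 1;
            if (!isdigit((unsigned char)*s)) return -1;
            hi = strtol(s, &end, 10);
        }
        if (hi < lo) return -1;
        total += (int)(hi - lo + 1);
        if (*end == ',') end++;     /* another item follows */
        else if (*end && *end != '\n') return -1;
        s = end;
    }
    return total;
}

/* cpus listed in /sys/devices/system/cpu/possible, -1 if unreadable */
int count_possible(void)
{
    char buf[4096];
    FILE *f = fopen("/sys/devices/system/cpu/possible", "r");
    if (!f) return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line ? count_cpulist(line) : -1;
}

/* entries matching cpu[0-9]* under /sys/devices/system/cpu,
   like the shell glob above; -1 if the directory is unavailable */
int count_cpu_dirs(void)
{
    DIR *d = opendir("/sys/devices/system/cpu");
    if (!d) return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (!strncmp(e->d_name, "cpu", 3) &&
            isdigit((unsigned char)e->d_name[3]))
            n++;
    closedir(d);
    return n;
}
```

both helpers return -1 when sysfs is not available, so a caller
has to decide what to fall back to in that case.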

i think we need to know why a process cares if musl returns
the wrong number, and what the valid uses of such a number are.
(there are heterogeneous systems like arm big-little, numa systems
with many sockets, containers, virtualization,.. how deep may a
user process need to go down in this rabbit hole?)

note that most of /sys/devices/system/cpu/* is documented under
Documentation/ABI/testing in linux, not in Documentation/ABI/stable,
and the format is not detailed. some apis (e.g. /proc/cpuinfo)
are known to be different on android (and grsec?) kernels, and the
filesystem may not be mounted during early boot or in chroots, so
sysfs parsing is only done when really necessary.
