musl - Re: C threads, v. 6.2

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140827234657.GW12888@brightrain.aerifal.cx>
Date: Wed, 27 Aug 2014 19:46:57 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: C threads, v. 6.2

On Thu, Aug 28, 2014 at 12:11:45AM +0200, Jens Gustedt wrote:
> diff --git a/arch/arm/bits/alltypes.h.in b/arch/arm/bits/alltypes.h.in
> index 183c4c4..333d43f 100644
> --- a/arch/arm/bits/alltypes.h.in
> +++ b/arch/arm/bits/alltypes.h.in
> @@ -19,7 +19,7 @@ TYPEDEF long time_t;
>  TYPEDEF long suseconds_t;
>  
>  TYPEDEF struct { union { int __i[9]; unsigned __s[9]; } __u; } pthread_attr_t;
> -TYPEDEF struct { union { int __i[6]; volatile void *volatile __p[6]; } __u; } pthread_mutex_t;
> -TYPEDEF struct { union { int __i[12]; void *__p[12]; } __u; } pthread_cond_t;
> +TYPEDEF struct { union { int __i[6]; volatile void *volatile __p[6]; } __u; } __pthread_mutex_t;
> +TYPEDEF struct { union { int __i[12]; void *__p[12]; } __u; } __pthread_cond_t;
> [...]
> +TYPEDEF __pthread_cond_t pthread_cond_t;
> +TYPEDEF __pthread_cond_t cnd_t;
> +TYPEDEF __pthread_mutex_t pthread_mutex_t;
> +TYPEDEF __pthread_mutex_t mtx_t;

This changes the C++ ABI by changing the tag for the POSIX types, so
it can't be done this way. We could return to some serious language
lawyering about whether it's valid to access members via the 'wrong'
struct type when we have two identical struct types, or just assume
that's safe for now and do it.

There's also the approach of just having them both be separate struct
types that contain, as their first member, the real struct; then
pointer casts are legal and the problem is solved via a trivial
one-line addition to each function along the lines of:

	struct __actual_mutex *m = (void *)incoming_m_arg;

but I'm okay with deferring a change to do this unless/until it's
deemed a real issue.

> diff --git a/include/threads.h b/include/threads.h
> new file mode 100644
> index 0000000..cc396c3
> --- /dev/null
> +++ b/include/threads.h
> @@ -0,0 +1,170 @@
> +#ifndef _THREADS_H
> +#define _THREADS_H
> +
> +/* This one is explicitly allowed to be included. */
> +#include <time.h>
> +
> +#ifdef __GNUC__
> +#define _Nonnull(...) __attribute__((__nonnull__(__VA_ARGS__)))
> +#else
> +#define _Nonnull(...)
> +#endif

If we want to use attributes like this, it's a separate change from
adding threads. But I'm fairly much against doing it anyway since the
compiler already knows how to make these warnings for standard
functions, without help from the headers.

> +#ifndef __THRD_EXPERIMENTAL
> +# define __THRD_EXPERIMENTAL 1
> +#endif
> +
> +  /* The following list of 9 integer constants makes up for the binary
> +     compatibility of this C thread implementation. You must never
> +     link code against versions of the C library that do not agree
> +     upon these ABI parameters.
> [...]

Getting this committed to musl, or at least making the first release
with it present, should be committing to the ABI, so I don't think we
need any ABI-check type stuff here. But I am curious how it works.
__THRD_ABI_CHECK doesn't seem to be used anywhere. Is the intent just
that you would put it in a program manually?

> +enum {
> +  thrd_success = __THRD_SUCCESS,
> +  thrd_busy = __THRD_BUSY,
> +  thrd_error = __THRD_ERROR,
> +  thrd_nomem = __THRD_NOMEM,
> +  thrd_timedout = __THRD_TIMEDOUT,
> +};
> +
> +enum {
> +  mtx_plain = __MTX_PLAIN,
> +  mtx_recursive = __MTX_RECURSIVE,
> +  // all mutexes are timed, here. so this is a no-op
> +  mtx_timed = __MTX_TIMED,
> +};

Without ABI checking, I think we can just put the values here where
they're immediately visible rather than going through an extra level
of indirection with the preprocessor.

> +#define thread_local _Thread_local

This needs to be omitted for C++11+, I think.

> diff --git a/include/time.h b/include/time.h
> index dc88070..13dccaa 100644
> --- a/include/time.h
> +++ b/include/time.h
> @@ -129,6 +129,19 @@ int stime(const time_t *);
>  time_t timegm(struct tm *);
>  #endif
>  
> +#if __STDC_VERSION__ >= 201112L
> +  /* Implementation specific choice: The epoch that the TIME_UTC clock
> +     is based upon is the Unix Epoch, that is a struct timespec
> +     represents the number of seconds that have elapsed since 00:00:00
> +     Coordinated Universal Time (UTC), Thursday, 1 January
> +     1970. Because of differences in leap seconds this is not
> +     completely equivalent to UTC. */
> +  /* Beware that the TIME_UTC constant itself per the standard must be
> +     greater than 0. */
> +#define TIME_UTC 1
> +int timespec_get(struct timespec *, int);
> +#endif
> +
>  #ifdef __cplusplus
>  }
>  #endif

So far, when it comes to C11 library functionality as opposed to
compiler features that might not be available without compiler
support, the approach we've taken in musl is to expose them
unconditionally. This makes it possible to use the library features
even on compilers without explicit C11 support. Arguments could be
made either way on this but I think we should be consistent. It might
also be more appropriate to factor out timespec_get as a separate
patch since it's nominally separate from threads, although I don't
care much either way. Presumably the main use of it is for arguments
to timedwait/timedlock operations.

> diff --git a/src/mman/mprotect.c b/src/mman/mprotect.c
> index f488486..535787b 100644
> --- a/src/mman/mprotect.c
> +++ b/src/mman/mprotect.c
> @@ -2,10 +2,12 @@
>  #include "libc.h"
>  #include "syscall.h"
>  
> -int mprotect(void *addr, size_t len, int prot)
> +int __mprotect(void *addr, size_t len, int prot)
>  {
>  	size_t start, end;
>  	start = (size_t)addr & -PAGE_SIZE;
>  	end = (size_t)((char *)addr + len + PAGE_SIZE-1) & -PAGE_SIZE;
>  	return syscall(SYS_mprotect, start, end-start, prot);
>  }
> +
> +weak_alias(__mprotect, mprotect);

Here it might be nicer to just use __syscall(SYS_mprotect...) directly
in pthread_create. We don't need the extra page alignment overhead
there, only for the public function, and __syscall is lighter than a
function call anyway.

> diff --git a/src/thread/call_once.c b/src/thread/call_once.c
> new file mode 100644
> index 0000000..742a707
> --- /dev/null
> +++ b/src/thread/call_once.c
> @@ -0,0 +1,9 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +int __pthread_once(once_flag *, void (*)(void));
> +
> +void (call_once)(once_flag *flag, void (*func)(void)) {
> +	__THRD_ABI_MARK;
> +	(void)__pthread_once(flag, func);
> +}

We have questions about cancellation and longjmp, but I think it's
okay to leave it like this for now and address those later.

> diff --git a/src/thread/cnd_broadcast.c b/src/thread/cnd_broadcast.c
> new file mode 100644
> index 0000000..b1db667
> --- /dev/null
> +++ b/src/thread/cnd_broadcast.c
> @@ -0,0 +1,9 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +int __pthread_cond_broadcast(pthread_cond_t *);
> +
> +int cnd_broadcast(cnd_t * cnd) {
> +	int ret = __pthread_cond_broadcast(cnd);
> +	return ret ? thrd_error : thrd_success;
> +}

This could/should actually be a separate function using
__private_signal from pthread_cond_timedwait.c, I think. There's no
reason to pull in the extra process-shared broadcast code from
pthread_cond_broadcast. Right now the latter is small anyway, but it
could possibly become larger if we try to fix some of the problems
with process-shared cond vars (which are hard to fix, unfortunately).

Also I'm fine with omitting the error conversion logic since the
called function is implemented so as never to return an error (and in
fact POSIX defines no errors, so it would have to be an additional
implementation-defined error, which musl strongly avoids). If you want
this to be more clear, a short comment should probably suffice without
dead code to convert error numbers that never occur.

> diff --git a/src/thread/cnd_destroy.c b/src/thread/cnd_destroy.c
> new file mode 100644
> index 0000000..11cfc19
> --- /dev/null
> +++ b/src/thread/cnd_destroy.c
> @@ -0,0 +1,17 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +int __pthread_cond_destroy(pthread_cond_t *);
> +
> +/* The behavior of cnd_destroy is undefined if cnd is still in
> +   use. The choice for pthread_cond_destroy in that situation is to
> +   wake up all users before destroying. I am not sure that we should
> +   do it like that here, too. Alternatives would be:
> +   - complain by using perror or equivalent
> +   - assert that there is no waiter
> +   - abort when there is a waiter
> +   - do nothing
> +   */
> +void (cnd_destroy)(cnd_t *cnd) {
> +	(void)__pthread_cond_destroy(cnd);
> +}

This can just be a NOP now. The non-NOP pthread_cond_destroy is only
for process-shared cond vars. Non-shared ones do not need any action
for destruction.

> diff --git a/src/thread/cnd_timedwait.c b/src/thread/cnd_timedwait.c
> new file mode 100644
> index 0000000..cce47a4
> --- /dev/null
> +++ b/src/thread/cnd_timedwait.c
> @@ -0,0 +1,14 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +int __pthread_cond_timedwait(pthread_cond_t *restrict c, pthread_mutex_t *restrict m, const struct timespec *restrict ts);
> +
> +int cnd_timedwait(cnd_t *restrict cond, mtx_t *restrict mutex, const struct timespec *restrict ts) {
> +	int ret = __pthread_cond_timedwait(cond, mutex, ts);
> +	switch (ret) {
> +	/* May also return EINVAL or EPERM. */
> +	default:        return thrd_error;
> +	case 0:         return thrd_success;
> +	case ETIMEDOUT: return thrd_timedout;
> +	}
> +}

C11 isn't clear on what happens when the timespec is non-normalized.
Should we still just treat this as an error, or attempt to normalize
it first? While EPERM also may happen, the cases that would lead to
this seem to be UB, but I'm fine with your above mapping of it to an
error.

> diff --git a/src/thread/mtx_init.c b/src/thread/mtx_init.c
> new file mode 100644
> index 0000000..827a56a
> --- /dev/null
> +++ b/src/thread/mtx_init.c
> @@ -0,0 +1,10 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +int mtx_init(mtx_t * m, int type)
> +{
> +	*m = (pthread_mutex_t) {
> +		._m_type = ((type&mtx_recursive) ? PTHREAD_MUTEX_RECURSIVE : 0),
> +	};
> +	return thrd_success;
> +}

I'm okay with a 0 here since we assume all over that {0} init for a
mutex is a default type mutex, but you might prefer something like
PTHREAD_MUTEX_NORMAL here instead of 0.

> diff --git a/src/thread/mtx_lock.c b/src/thread/mtx_lock.c
> new file mode 100644
> index 0000000..89730f1
> --- /dev/null
> +++ b/src/thread/mtx_lock.c
> @@ -0,0 +1,14 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +int mtx_lock(mtx_t *m)
> +{
> +	__THRD_ABI_MARK;
> +	if (m->_m_type == PTHREAD_MUTEX_NORMAL && !a_cas(&m->_m_lock, 0, EBUSY))
> +		return thrd_success;
> +	/* Calling mtx_timedlock with a null pointer is an
> +	   extension. Such a call is convenient, here, since it avoids
> +	   to repeat the case analysis that is already done for
> +	   mtx_timedlock. */
> +	return mtx_timedlock(m, 0);
> +}
> diff --git a/src/thread/mtx_timedlock.c b/src/thread/mtx_timedlock.c
> new file mode 100644
> index 0000000..c42a096
> --- /dev/null
> +++ b/src/thread/mtx_timedlock.c
> @@ -0,0 +1,19 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +int __pthread_mutex_timedlock(pthread_mutex_t *restrict m, const struct timespec *restrict ts);
> +
> +int mtx_timedlock(mtx_t *restrict mutex, const struct timespec *restrict ts) {

It might be worth duplicating the a_cas attempt here for normal-type;
I'm not sure.

> +	/* In the best of all worlds this is a tail call. */
> +	int ret = __pthread_mutex_timedlock(mutex, ts);
> +	switch (ret) {
> +	/* May also return EINVAL, EPERM, EDEADLK or EAGAIN. EAGAIN is
> +	   specially tricky since C11 doesn't define how many recursive
> +	   calls can be done. (this *isn't* the maximum amount of nested
> +	   calls!) This implementation here deals this with a counter and
> +	   detects overflow, so this is definitively UB. */
> +	default:        return thrd_error;
> +	case 0:         return thrd_success;
> +	case ETIMEDOUT: return thrd_timedout;
> +	}
> +}

I don't think EPERM is possible for locking. And EDEADLK can't happen
with mutex types used for C11. So I think only EINVAL (invalid
timespec) and EAGAIN (recursive lock count overflow) apply. The code
looks fine; this is just a comment on the comments.

> diff --git a/src/thread/mtx_trylock.c b/src/thread/mtx_trylock.c
> new file mode 100644
> index 0000000..5fd823a
> --- /dev/null
> +++ b/src/thread/mtx_trylock.c
> @@ -0,0 +1,22 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +int __pthread_mutex_trylock(pthread_mutex_t *restrict m);
> +
> +int mtx_trylock(mtx_t *restrict m) {
> +	if (m->_m_type == PTHREAD_MUTEX_NORMAL)
> +		return (a_cas(&m->_m_lock, 0, EBUSY) & EBUSY) ? thrd_busy : thrd_success;
> +
> +	/* In the best of all worlds this is a tail call. */
> +	int ret = __pthread_mutex_trylock(m);
> +	switch (ret) {
> +	/* In case of UB may also return EINVAL, EPERM, EDEADLK or EAGAIN. EAGAIN is
> +	   specially tricky since C11 doesn't define how many recursive
> +	   calls can be done. (this *isn't* the maximum amount of nested
> +	   calls!) This implementation here deals this with a counter and
> +	   detects overflow, so this is definitively UB. */
> +	default:    return thrd_error;
> +	case 0:     return thrd_success;
> +	case EBUSY: return thrd_busy;
> +	}
> +}

Same here.

> diff --git a/src/thread/mtx_unlock.c b/src/thread/mtx_unlock.c
> new file mode 100644
> index 0000000..d1721a6
> --- /dev/null
> +++ b/src/thread/mtx_unlock.c
> @@ -0,0 +1,11 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +int __pthread_mutex_unlock(pthread_mutex_t *);
> +
> +int (mtx_unlock)(mtx_t *mtx) {
> +	/* In the best of all worlds this is a tail call. */
> +	int ret = __pthread_mutex_unlock(mtx);
> +	/* In case of UB may also return EPERM. */
> +	return ret ? thrd_error : thrd_success;
> +}

For unlock, C11 makes it UB to unlock a mutex you don't own (even for
recursive type), so the tail call would be legal. I don't have a
strong opinion either way here.

> diff --git a/src/thread/pthread_cond_broadcast.c b/src/thread/pthread_cond_broadcast.c
> index 69f840f..f53b41d 100644
> --- a/src/thread/pthread_cond_broadcast.c
> +++ b/src/thread/pthread_cond_broadcast.c
> @@ -2,7 +2,7 @@
>  
>  int __private_cond_signal(pthread_cond_t *, int);
>  
> -int pthread_cond_broadcast(pthread_cond_t *c)
> +int __pthread_cond_broadcast(pthread_cond_t *c)
>  {
>  	if (!c->_c_shared) return __private_cond_signal(c, -1);
>  	if (!c->_c_waiters) return 0;
> @@ -10,3 +10,5 @@ int pthread_cond_broadcast(pthread_cond_t *c)
>  	__wake(&c->_c_seq, -1, 0);
>  	return 0;
>  }
> +
> +weak_alias(__pthread_cond_broadcast, pthread_cond_broadcast);

This can be avoided if the C11 function doesn't call here but calls
__private_cond_signal directly.

> diff --git a/src/thread/pthread_cond_destroy.c b/src/thread/pthread_cond_destroy.c
> index 8c55516..b2abeb8 100644
> --- a/src/thread/pthread_cond_destroy.c
> +++ b/src/thread/pthread_cond_destroy.c
> @@ -1,6 +1,6 @@
>  #include "pthread_impl.h"
>  
> -int pthread_cond_destroy(pthread_cond_t *c)
> +int __pthread_cond_destroy(pthread_cond_t *c)
>  {
>  	if (c->_c_shared && c->_c_waiters) {
>  		int cnt;
> @@ -12,3 +12,5 @@ int pthread_cond_destroy(pthread_cond_t *c)
>  	}
>  	return 0;
>  }
> +
> +weak_alias(__pthread_cond_destroy, pthread_cond_destroy);

This can be avoided since destruction for private cv is a NOP.

> diff --git a/src/thread/pthread_cond_signal.c b/src/thread/pthread_cond_signal.c
> index 119c00a..d1d3c77 100644
> --- a/src/thread/pthread_cond_signal.c
> +++ b/src/thread/pthread_cond_signal.c
> @@ -2,7 +2,7 @@
>  
>  int __private_cond_signal(pthread_cond_t *, int);
>  
> -int pthread_cond_signal(pthread_cond_t *c)
> +int __pthread_cond_signal(pthread_cond_t *c)
>  {
>  	if (!c->_c_shared) return __private_cond_signal(c, 1);
>  	if (!c->_c_waiters) return 0;
> @@ -10,3 +10,5 @@ int pthread_cond_signal(pthread_cond_t *c)
>  	__wake(&c->_c_seq, 1, 0);
>  	return 0;
>  }
> +
> +weak_alias(__pthread_cond_signal, pthread_cond_signal);

And likewise this can be avoided using __private_cond_signal directly.

> diff --git a/src/thread/pthread_cond_timedwait.c b/src/thread/pthread_cond_timedwait.c
> index 2d192b0..393d60e 100644
> --- a/src/thread/pthread_cond_timedwait.c
> +++ b/src/thread/pthread_cond_timedwait.c
> @@ -1,5 +1,9 @@
>  #include "pthread_impl.h"
>  
> +void __pthread_testcancel(void);
> +int __pthread_mutex_lock(pthread_mutex_t *);
> +int __pthread_mutex_unlock(pthread_mutex_t *m);
> +
>  /*
>   * struct waiter
>   *
> @@ -119,7 +123,7 @@ static void unwait(void *arg)
>  	}
>  }
>  
> -int pthread_cond_timedwait(pthread_cond_t *restrict c, pthread_mutex_t *restrict m, const struct timespec *restrict ts)
> +int __pthread_cond_timedwait(pthread_cond_t *restrict c, pthread_mutex_t *restrict m, const struct timespec *restrict ts)
>  {
>  	struct waiter node = { .cond = c, .mutex = m };
>  	int e, seq, *fut, clock = c->_c_clock;
> @@ -130,7 +134,7 @@ int pthread_cond_timedwait(pthread_cond_t *restrict c, pthread_mutex_t *restrict
>  	if (ts && ts->tv_nsec >= 1000000000UL)
>  		return EINVAL;
>  
> -	pthread_testcancel();
> +	__pthread_testcancel();
>  
>  	if (c->_c_shared) {
>  		node.shared = 1;
> @@ -151,7 +155,7 @@ int pthread_cond_timedwait(pthread_cond_t *restrict c, pthread_mutex_t *restrict
>  		unlock(&c->_c_lock);
>  	}
>  
> -	pthread_mutex_unlock(m);
> +	__pthread_mutex_unlock(m);
>  
>  	do e = __timedwait(fut, seq, clock, ts, unwait, &node, !node.shared);
>  	while (*fut==seq && (!e || e==EINTR));
> @@ -197,3 +201,5 @@ int __private_cond_signal(pthread_cond_t *c, int n)
>  
>  	return 0;
>  }
> +
> +weak_alias(__pthread_cond_timedwait, pthread_cond_timedwait);

All this looks necessary and okay.

> diff --git a/src/thread/pthread_cond_wait.c b/src/thread/pthread_cond_wait.c
> index 8735bf1..58656f7 100644
> --- a/src/thread/pthread_cond_wait.c
> +++ b/src/thread/pthread_cond_wait.c
> @@ -1,6 +1,8 @@
>  #include "pthread_impl.h"
>  
> +int __pthread_cond_timedwait(pthread_cond_t *restrict c, pthread_mutex_t *restrict m, const struct timespec *restrict ts);
> +
>  int pthread_cond_wait(pthread_cond_t *restrict c, pthread_mutex_t *restrict m)
>  {
> -	return pthread_cond_timedwait(c, m, 0);
> +	return __pthread_cond_timedwait(c, m, 0);
>  }

This is unnecessary.

> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
> index e441bda..193b295 100644
> --- a/src/thread/pthread_create.c
> +++ b/src/thread/pthread_create.c
> @@ -4,102 +4,21 @@
>  #include "libc.h"
>  #include <sys/mman.h>
>  #include <string.h>
> +#include <threads.h>
> +
> +void *__mmap(void *, size_t, int, int, int, off_t);
> +int __munmap(void *, size_t);
> +int __mprotect(void *, size_t, int);
> +void __thread_enable(void);
> +_Noreturn void __pthread_exit(void *);
> +void *__copy_tls(unsigned char *);
> +extern volatile size_t __pthread_tsd_size;
>  
>  static void dummy_0()
>  {
>  }
>  weak_alias(dummy_0, __acquire_ptc);
>  weak_alias(dummy_0, __release_ptc);
> -weak_alias(dummy_0, __pthread_tsd_run_dtors);
> -weak_alias(dummy_0, __do_private_robust_list);
> -weak_alias(dummy_0, __do_orphaned_stdio_locks);
> -
> -_Noreturn void pthread_exit(void *result)
> -{
> -	pthread_t self = __pthread_self();
> -	sigset_t set;
> -
> -	self->result = result;
> -
> -	while (self->cancelbuf) {
> -		void (*f)(void *) = self->cancelbuf->__f;
> -		void *x = self->cancelbuf->__x;
> -		self->cancelbuf = self->cancelbuf->__next;
> -		f(x);
> -	}
> -
> -	__pthread_tsd_run_dtors();
> -
> -	__lock(self->exitlock);
> -
> -	/* Mark this thread dead before decrementing count */
> -	__lock(self->killlock);
> -	self->dead = 1;
> -
> -	/* Block all signals before decrementing the live thread count.
> -	 * This is important to ensure that dynamically allocated TLS
> -	 * is not under-allocated/over-committed, and possibly for other
> -	 * reasons as well. */
> -	__block_all_sigs(&set);
> -
> -	/* Wait to unlock the kill lock, which governs functions like
> -	 * pthread_kill which target a thread id, until signals have
> -	 * been blocked. This precludes observation of the thread id
> -	 * as a live thread (with application code running in it) after
> -	 * the thread was reported dead by ESRCH being returned. */
> -	__unlock(self->killlock);
> -
> -	/* It's impossible to determine whether this is "the last thread"
> -	 * until performing the atomic decrement, since multiple threads
> -	 * could exit at the same time. For the last thread, revert the
> -	 * decrement and unblock signals to give the atexit handlers and
> -	 * stdio cleanup code a consistent state. */
> -	if (a_fetch_add(&libc.threads_minus_1, -1)==0) {
> -		libc.threads_minus_1 = 0;
> -		__restore_sigs(&set);
> -		exit(0);
> -	}
> -
> -	if (self->locale != &libc.global_locale) {
> -		a_dec(&libc.uselocale_cnt);
> -		if (self->locale->ctype_utf8)
> -			a_dec(&libc.bytelocale_cnt_minus_1);
> -	}
> -
> -	__do_private_robust_list();
> -	__do_orphaned_stdio_locks();
> -
> -	if (self->detached && self->map_base) {
> -		/* Detached threads must avoid the kernel clear_child_tid
> -		 * feature, since the virtual address will have been
> -		 * unmapped and possibly already reused by a new mapping
> -		 * at the time the kernel would perform the write. In
> -		 * the case of threads that started out detached, the
> -		 * initial clone flags are correct, but if the thread was
> -		 * detached later (== 2), we need to clear it here. */
> -		if (self->detached == 2) __syscall(SYS_set_tid_address, 0);
> -
> -		/* The following call unmaps the thread's stack mapping
> -		 * and then exits without touching the stack. */
> -		__unmapself(self->map_base, self->map_size);
> -	}
> -
> -	for (;;) __syscall(SYS_exit, 0);
> -}
> -
> -void __do_cleanup_push(struct __ptcb *cb)
> -{
> -	if (!libc.has_thread_pointer) return;
> -	struct pthread *self = __pthread_self();
> -	cb->__next = self->cancelbuf;
> -	self->cancelbuf = cb;
> -}
> -
> -void __do_cleanup_pop(struct __ptcb *cb)
> -{
> -	if (!libc.has_thread_pointer) return;
> -	__pthread_self()->cancelbuf = cb->__next;
> -}
>  
>  static int start(void *p)
>  {
> @@ -121,28 +40,10 @@ static int start(void *p)
>  
>  #define ROUND(x) (((x)+PAGE_SIZE-1)&-PAGE_SIZE)
>  
> -/* pthread_key_create.c overrides this */
> -static volatile size_t dummy = 0;
> -weak_alias(dummy, __pthread_tsd_size);
> -static void *dummy_tsd[1] = { 0 };
> -weak_alias(dummy_tsd, __pthread_tsd_main);
> -
> -static FILE *volatile dummy_file = 0;
> -weak_alias(dummy_file, __stdin_used);
> -weak_alias(dummy_file, __stdout_used);
> -weak_alias(dummy_file, __stderr_used);
> -
> -static void init_file_lock(FILE *f)
> -{
> -	if (f && f->lock<0) f->lock = 0;
> -}
> -
> -void *__copy_tls(unsigned char *);
> -
>  int pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict attrp, void *(*entry)(void *), void *restrict arg)
>  {
>  	int ret;
> -	size_t size, guard;
> +	size_t size, guard = 0;
>  	struct pthread *self, *new;
>  	unsigned char *map = 0, *stack = 0, *tsd = 0, *stack_limit;
>  	unsigned flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND
> @@ -153,16 +54,7 @@ int pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict attrp
>  
>  	if (!libc.can_do_threads) return ENOSYS;
>  	self = __pthread_self();
> -	if (!libc.threaded) {
> -		for (FILE *f=libc.ofl_head; f; f=f->next)
> -			init_file_lock(f);
> -		init_file_lock(__stdin_used);
> -		init_file_lock(__stdout_used);
> -		init_file_lock(__stderr_used);
> -		__syscall(SYS_rt_sigprocmask, SIG_UNBLOCK, SIGPT_SET, 0, _NSIG/8);
> -		self->tsd = (void **)__pthread_tsd_main;
> -		libc.threaded = 1;
> -	}
> +	if (!libc.threaded) __thread_enable();
>  	if (attrp) attr = *attrp;
>  
>  	__acquire_ptc();
> @@ -191,14 +83,14 @@ int pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict attrp
>  
>  	if (!tsd) {
>  		if (guard) {
> -			map = mmap(0, size, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0);
> +			map = __mmap(0, size, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0);
>  			if (map == MAP_FAILED) goto fail;
> -			if (mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)) {
> -				munmap(map, size);
> +			if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)) {
> +				__munmap(map, size);
>  				goto fail;
>  			}
>  		} else {
> -			map = mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
> +			map = __mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
>  			if (map == MAP_FAILED) goto fail;
>  		}
>  		tsd = map + size - __pthread_tsd_size;
> @@ -240,7 +132,7 @@ int pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict attrp
>  
>  	if (ret < 0) {
>  		a_dec(&libc.threads_minus_1);
> -		if (map) munmap(map, size);
> +		if (map) __munmap(map, size);
>  		return EAGAIN;
>  	}
>  

See comments below on thrd_create.

> diff --git a/src/thread/pthread_detach.c b/src/thread/pthread_detach.c
> index 651c38e..5ca7855 100644
> --- a/src/thread/pthread_detach.c
> +++ b/src/thread/pthread_detach.c
> @@ -1,11 +1,15 @@
>  #include "pthread_impl.h"
>  
> -int pthread_detach(pthread_t t)
> +int __pthread_join(pthread_t t, void **res);
> +
> +int __pthread_detach(pthread_t t)
>  {
>  	/* Cannot detach a thread that's already exiting */
>  	if (a_swap(t->exitlock, 1))
> -		return pthread_join(t, 0);
> +		return __pthread_join(t, 0);
>  	t->detached = 2;
>  	__unlock(t->exitlock);
>  	return 0;
>  }
> +
> +weak_alias(__pthread_detach, pthread_detach);

OK.

> diff --git a/src/thread/pthread_equal.c b/src/thread/pthread_equal.c
> index 3e3df4f..38fb45e 100644
> --- a/src/thread/pthread_equal.c
> +++ b/src/thread/pthread_equal.c
> @@ -1,4 +1,4 @@
> -#include <pthread.h>
> +#include "pthread_impl.h"
>  
>  int (pthread_equal)(pthread_t a, pthread_t b)
>  {

This does not seem like a useful change and just increases compile
time.

> diff --git a/src/thread/pthread_exit.c b/src/thread/pthread_exit.c
> new file mode 100644
> index 0000000..4f51269
> --- /dev/null
> +++ b/src/thread/pthread_exit.c
> @@ -0,0 +1,130 @@
> +#define _GNU_SOURCE
> +#include "pthread_impl.h"
> +#include "stdio_impl.h"
> +#include "libc.h"
> +#include <sys/mman.h>
> +#include <threads.h>
> +
> +static void dummy_0()
> +{
> +}
> +weak_alias(dummy_0, __pthread_tsd_run_dtors);
> +weak_alias(dummy_0, __do_private_robust_list);
> +weak_alias(dummy_0, __do_orphaned_stdio_locks);
> +
> +_Noreturn void __pthread_exit(void *result)
> +{
> +	pthread_t self = __pthread_self();
> +	sigset_t set;
> +
> +	self->result = result;
> +
> +	while (self->cancelbuf) {
> +		void (*f)(void *) = self->cancelbuf->__f;
> +		void *x = self->cancelbuf->__x;
> +		self->cancelbuf = self->cancelbuf->__next;
> +		f(x);
> +	}
> +
> +	__pthread_tsd_run_dtors();
> +
> +	__lock(self->exitlock);
> +
> +	/* Mark this thread dead before decrementing count */
> +	__lock(self->killlock);
> +	self->dead = 1;
> +
> +	/* Block all signals before decrementing the live thread count.
> +	 * This is important to ensure that dynamically allocated TLS
> +	 * is not under-allocated/over-committed, and possibly for other
> +	 * reasons as well. */
> +	__block_all_sigs(&set);
> +
> +	/* Wait to unlock the kill lock, which governs functions like
> +	 * pthread_kill which target a thread id, until signals have
> +	 * been blocked. This precludes observation of the thread id
> +	 * as a live thread (with application code running in it) after
> +	 * the thread was reported dead by ESRCH being returned. */
> +	__unlock(self->killlock);
> +
> +	/* It's impossible to determine whether this is "the last thread"
> +	 * until performing the atomic decrement, since multiple threads
> +	 * could exit at the same time. For the last thread, revert the
> +	 * decrement and unblock signals to give the atexit handlers and
> +	 * stdio cleanup code a consistent state. */
> +	if (a_fetch_add(&libc.threads_minus_1, -1)==0) {
> +		libc.threads_minus_1 = 0;
> +		__restore_sigs(&set);
> +		exit(0);
> +	}
> +
> +	if (self->locale != &libc.global_locale) {
> +		a_dec(&libc.uselocale_cnt);
> +		if (self->locale->ctype_utf8)
> +			a_dec(&libc.bytelocale_cnt_minus_1);
> +	}
> +
> +	__do_private_robust_list();
> +	__do_orphaned_stdio_locks();
> +
> +	if (self->detached && self->map_base) {
> +		/* Detached threads must avoid the kernel clear_child_tid
> +		 * feature, since the virtual address will have been
> +		 * unmapped and possibly already reused by a new mapping
> +		 * at the time the kernel would perform the write. In
> +		 * the case of threads that started out detached, the
> +		 * initial clone flags are correct, but if the thread was
> +		 * detached later (== 2), we need to clear it here. */
> +		if (self->detached == 2) __syscall(SYS_set_tid_address, 0);
> +
> +		/* The following call unmaps the thread's stack mapping
> +		 * and then exits without touching the stack. */
> +		__unmapself(self->map_base, self->map_size);
> +	}
> +
> +	for (;;) __syscall(SYS_exit, 0);
> +}
> +
> +void __do_cleanup_push(struct __ptcb *cb)
> +{
> +	if (!libc.has_thread_pointer) return;
> +	struct pthread *self = __pthread_self();
> +	cb->__next = self->cancelbuf;
> +	self->cancelbuf = cb;
> +}
> +
> +void __do_cleanup_pop(struct __ptcb *cb)
> +{
> +	if (!libc.has_thread_pointer) return;
> +	__pthread_self()->cancelbuf = cb->__next;
> +}
> +
> +/* pthread_key_create.c overrides this */
> +static volatile size_t dummy = 0;
> +weak_alias(dummy, __pthread_tsd_size);
> +static void *dummy_tsd[1] = { 0 };
> +weak_alias(dummy_tsd, __pthread_tsd_main);
> +
> +static FILE *volatile dummy_file = 0;
> +weak_alias(dummy_file, __stdin_used);
> +weak_alias(dummy_file, __stdout_used);
> +weak_alias(dummy_file, __stderr_used);
> +
> +static void init_file_lock(FILE *f)
> +{
> +	if (f && f->lock<0) f->lock = 0;
> +}
> +
> +void __thread_enable(void)
> +{
> +	for (FILE *f=libc.ofl_head; f; f=f->next)
> +		init_file_lock(f);
> +	init_file_lock(__stdin_used);
> +	init_file_lock(__stdout_used);
> +	init_file_lock(__stderr_used);
> +	__syscall(SYS_rt_sigprocmask, SIG_UNBLOCK, SIGPT_SET, 0, _NSIG/8);
> +	__pthread_self()->tsd = (void **)__pthread_tsd_main;
> +	libc.threaded = 1;
> +}
> +
> +weak_alias(__pthread_exit, pthread_exit);

See comments below on thrd_create.

> diff --git a/src/thread/pthread_getspecific.c b/src/thread/pthread_getspecific.c
> index b2a282c..bfc4294 100644
> --- a/src/thread/pthread_getspecific.c
> +++ b/src/thread/pthread_getspecific.c
> @@ -1,7 +1,10 @@
>  #include "pthread_impl.h"
>  
> -void *pthread_getspecific(pthread_key_t k)
> +static void *__pthread_getspecific(pthread_key_t k)
>  {
>  	struct pthread *self = __pthread_self();
>  	return self->tsd[k];
>  }
> +
> +weak_alias(__pthread_getspecific, pthread_getspecific);
> +weak_alias(__pthread_getspecific, tss_get);

Assuming we accept that multiple standard functions are allowed to
have the same address, I think this is probably okay. (IIRC there's a
DR that implies this is okay, and if so, we should probably apply it
to math later too.)


> diff --git a/src/thread/pthread_join.c b/src/thread/pthread_join.c
> index 719c91c..bca89aa 100644
> --- a/src/thread/pthread_join.c
> +++ b/src/thread/pthread_join.c
> @@ -1,15 +1,19 @@
>  #include "pthread_impl.h"
>  #include <sys/mman.h>
>  
> +int __munmap(void *, size_t);
> +
>  static void dummy(void *p)
>  {
>  }
>  
> -int pthread_join(pthread_t t, void **res)
> +int __pthread_join(pthread_t t, void **res)
>  {
>  	int tmp;
>  	while ((tmp = t->tid)) __timedwait(&t->tid, tmp, 0, 0, dummy, 0, 0);
>  	if (res) *res = t->result;
> -	if (t->map_base) munmap(t->map_base, t->map_size);
> +	if (t->map_base) __munmap(t->map_base, t->map_size);
>  	return 0;
>  }
> +
> +weak_alias(__pthread_join, pthread_join);

OK.

> diff --git a/src/thread/pthread_key_create.c b/src/thread/pthread_key_create.c
> index a9187f7..bfcd597 100644
> --- a/src/thread/pthread_key_create.c
> +++ b/src/thread/pthread_key_create.c
> @@ -9,7 +9,7 @@ static void nodtor(void *dummy)
>  {
>  }
>  
> -int pthread_key_create(pthread_key_t *k, void (*dtor)(void *))
> +int __pthread_key_create(pthread_key_t *k, void (*dtor)(void *))
>  {
>  	unsigned i = (uintptr_t)&k / 16 % PTHREAD_KEYS_MAX;
>  	unsigned j = i;
> @@ -31,7 +31,7 @@ int pthread_key_create(pthread_key_t *k, void (*dtor)(void *))
>  	return EAGAIN;
>  }
>  
> -int pthread_key_delete(pthread_key_t k)
> +int __pthread_key_delete(pthread_key_t k)
>  {
>  	keys[k] = 0;
>  	return 0;
> @@ -53,3 +53,6 @@ void __pthread_tsd_run_dtors()
>  		}
>  	}
>  }
> +
> +weak_alias(__pthread_key_delete, pthread_key_delete);
> +weak_alias(__pthread_key_create, pthread_key_create);

Looks ok.

> diff --git a/src/thread/pthread_mutex_lock.c b/src/thread/pthread_mutex_lock.c
> index 2a9a3aa..993f54c 100644
> --- a/src/thread/pthread_mutex_lock.c
> +++ b/src/thread/pthread_mutex_lock.c
> @@ -1,10 +1,14 @@
>  #include "pthread_impl.h"
>  
> -int pthread_mutex_lock(pthread_mutex_t *m)
> +int __pthread_mutex_timedlock(pthread_mutex_t *restrict m, const struct timespec *restrict at);
> +
> +int __pthread_mutex_lock(pthread_mutex_t *m)
>  {
>  	if ((m->_m_type&15) == PTHREAD_MUTEX_NORMAL
>  	    && !a_cas(&m->_m_lock, 0, EBUSY))
>  		return 0;
>  
> -	return pthread_mutex_timedlock(m, 0);
> +	return __pthread_mutex_timedlock(m, 0);
>  }
> +
> +weak_alias(__pthread_mutex_lock, pthread_mutex_lock);

OK.

> diff --git a/src/thread/pthread_mutex_timedlock.c b/src/thread/pthread_mutex_timedlock.c
> index ae883f9..16241ee 100644
> --- a/src/thread/pthread_mutex_timedlock.c
> +++ b/src/thread/pthread_mutex_timedlock.c
> @@ -1,6 +1,6 @@
>  #include "pthread_impl.h"
>  
> -int pthread_mutex_timedlock(pthread_mutex_t *restrict m, const struct timespec *restrict at)
> +int __pthread_mutex_timedlock(pthread_mutex_t *restrict m, const struct timespec *restrict at)
>  {
>  	if ((m->_m_type&15) == PTHREAD_MUTEX_NORMAL
>  	    && !a_cas(&m->_m_lock, 0, EBUSY))
> @@ -30,3 +30,5 @@ int pthread_mutex_timedlock(pthread_mutex_t *restrict m, const struct timespec *
>  	}
>  	return r;
>  }
> +
> +weak_alias(__pthread_mutex_timedlock, pthread_mutex_timedlock);

OK.

> diff --git a/src/thread/pthread_mutex_trylock.c b/src/thread/pthread_mutex_trylock.c
> index e851517..9be7930 100644
> --- a/src/thread/pthread_mutex_trylock.c
> +++ b/src/thread/pthread_mutex_trylock.c
> @@ -50,9 +50,11 @@ int __pthread_mutex_trylock_owner(pthread_mutex_t *m)
>  	return 0;
>  }
>  
> -int pthread_mutex_trylock(pthread_mutex_t *m)
> +static int __pthread_mutex_trylock(pthread_mutex_t *m)
>  {
>  	if ((m->_m_type&15) == PTHREAD_MUTEX_NORMAL)
>  		return a_cas(&m->_m_lock, 0, EBUSY) & EBUSY;
>  	return __pthread_mutex_trylock_owner(m);
>  }
> +
> +weak_alias(__pthread_mutex_trylock, pthread_mutex_trylock);

OK.

> diff --git a/src/thread/pthread_mutex_unlock.c b/src/thread/pthread_mutex_unlock.c
> index 46761d9..a7f39c7 100644
> --- a/src/thread/pthread_mutex_unlock.c
> +++ b/src/thread/pthread_mutex_unlock.c
> @@ -3,7 +3,7 @@
>  void __vm_lock_impl(int);
>  void __vm_unlock_impl(void);
>  
> -int pthread_mutex_unlock(pthread_mutex_t *m)
> +int __pthread_mutex_unlock(pthread_mutex_t *m)
>  {
>  	pthread_t self;
>  	int waiters = m->_m_waiters;
> @@ -36,3 +36,5 @@ int pthread_mutex_unlock(pthread_mutex_t *m)
>  		__wake(&m->_m_lock, 1, priv);
>  	return 0;
>  }
> +
> +weak_alias(__pthread_mutex_unlock, pthread_mutex_unlock);

OK.

> diff --git a/src/thread/pthread_once.c b/src/thread/pthread_once.c
> index 2eb0f93..05ebe69 100644
> --- a/src/thread/pthread_once.c
> +++ b/src/thread/pthread_once.c
> @@ -6,7 +6,7 @@ static void undo(void *control)
>  	__wake(control, 1, 1);
>  }
>  
> -int pthread_once(pthread_once_t *control, void (*init)(void))
> +int __pthread_once(pthread_once_t *control, void (*init)(void))
>  {
>  	static int waiters;
>  
> @@ -34,3 +34,5 @@ int pthread_once(pthread_once_t *control, void (*init)(void))
>  		return 0;
>  	}
>  }
> +
> +weak_alias(__pthread_once, pthread_once);

OK.

> diff --git a/src/thread/pthread_setcancelstate.c b/src/thread/pthread_setcancelstate.c
> index 060bcdc..2268217 100644
> --- a/src/thread/pthread_setcancelstate.c
> +++ b/src/thread/pthread_setcancelstate.c
> @@ -1,6 +1,6 @@
>  #include "pthread_impl.h"
>  
> -int pthread_setcancelstate(int new, int *old)
> +int __pthread_setcancelstate(int new, int *old)
>  {
>  	if (new > 1U) return EINVAL;
>  	if (!libc.has_thread_pointer) return ENOSYS;
> @@ -9,3 +9,5 @@ int pthread_setcancelstate(int new, int *old)
>  	self->canceldisable = new;
>  	return 0;
>  }
> +
> +weak_alias(__pthread_setcancelstate, pthread_setcancelstate);

Seems ok. Alternatively maybe we could have a simplfied inline version
of this (no error checking) in pthread_impl.h. But for now let's leave
it; I might do that, or it might even become unnecessary, along with
adding the new proposed cancellation mode later.

> diff --git a/src/thread/pthread_setcanceltype.c b/src/thread/pthread_setcanceltype.c
> index bf0a3f3..d493c1b 100644
> --- a/src/thread/pthread_setcanceltype.c
> +++ b/src/thread/pthread_setcanceltype.c
> @@ -1,11 +1,13 @@
>  #include "pthread_impl.h"
>  
> +void __pthread_testcancel(void);
> +
>  int pthread_setcanceltype(int new, int *old)
>  {
>  	struct pthread *self = __pthread_self();
>  	if (new > 1U) return EINVAL;
>  	if (old) *old = self->cancelasync;
>  	self->cancelasync = new;
> -	if (new) pthread_testcancel();
> +	if (new) __pthread_testcancel();
>  	return 0;
>  }

This change looks gratuitous. pthread_setcanceltype is in the POSIX
namespace anyway and isn't being changed for use in C11 (and it has no
use to C11).

> diff --git a/src/thread/pthread_setspecific.c b/src/thread/pthread_setspecific.c
> index 55e46a8..dd18b96 100644
> --- a/src/thread/pthread_setspecific.c
> +++ b/src/thread/pthread_setspecific.c
> @@ -1,6 +1,6 @@
>  #include "pthread_impl.h"
>  
> -int pthread_setspecific(pthread_key_t k, const void *x)
> +int __pthread_setspecific(pthread_key_t k, const void *x)
>  {
>  	struct pthread *self = __pthread_self();
>  	/* Avoid unnecessary COW */
> @@ -10,3 +10,5 @@ int pthread_setspecific(pthread_key_t k, const void *x)
>  	}
>  	return 0;
>  }
> +
> +weak_alias(__pthread_setspecific, pthread_setspecific);

It looks like you have a duplicate tss_set for this rather than using
the above, so either the above is a grauitous change, or the new
tss_set is duplicate code that should be removed. Or did I miss
something?

> diff --git a/src/thread/pthread_testcancel.c b/src/thread/pthread_testcancel.c
> index ba5f7c6..ee48e6d 100644
> --- a/src/thread/pthread_testcancel.c
> +++ b/src/thread/pthread_testcancel.c
> @@ -7,7 +7,9 @@ static void dummy()
>  
>  weak_alias(dummy, __testcancel);
>  
> -void pthread_testcancel()
> +void __pthread_testcancel()
>  {
>  	__testcancel();
>  }
> +
> +weak_alias(__pthread_testcancel, pthread_testcancel);

I think this is okay.

> diff --git a/src/thread/thrd_create.c b/src/thread/thrd_create.c
> new file mode 100644
> index 0000000..f72f992
> --- /dev/null
> +++ b/src/thread/thrd_create.c
> @@ -0,0 +1,98 @@
> +#define _GNU_SOURCE
> +#include "pthread_impl.h"
> +#include "stdio_impl.h"
> +#include "libc.h"
> +#include <sys/mman.h>
> +#include <threads.h>
> +
> +void *__mmap(void *, size_t, int, int, int, off_t);
> +int __munmap(void *, size_t);
> +int __mprotect(void *, size_t, int);
> +void __thread_enable(void);
> +_Noreturn void __pthread_exit(void *);
> +void *__copy_tls(unsigned char *);
> +extern volatile size_t __pthread_tsd_size;
> +
> +_Noreturn void thrd_exit(int result) {
> +	__pthread_exit((void*)(intptr_t)result);
> +}
> +
> +static void dummy_0()
> +{
> +}
> +weak_alias(dummy_0, __acquire_ptc);
> +weak_alias(dummy_0, __release_ptc);
> +
> +static int start(void *p)
> +{
> +	thrd_t self = p;
> +	int (*start)(void*) = (int(*)(void*)) self->start;
> +	thrd_exit(start(self->start_arg));
> +	return 0;
> +}
> +
> +#define ROUND(x) (((x)+PAGE_SIZE-1)&-PAGE_SIZE)
> +
> +int thrd_create(thrd_t *res, thrd_start_t entry, void *arg)
> +{
> +	int ret = -ENOMEM;
> +	size_t guard = ROUND(DEFAULT_GUARD_SIZE);
> +	size_t size = guard + ROUND(DEFAULT_STACK_SIZE + libc.tls_size +  __pthread_tsd_size);
> +	struct pthread *self, *new;
> +	unsigned char *map, *stack, *tsd, *stack_limit;
> +	unsigned flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND
> +		| CLONE_THREAD | CLONE_SYSVSEM | CLONE_SETTLS
> +		| CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID | CLONE_DETACHED;
> +
> +	if (!libc.can_do_threads) return thrd_error;
> +	self = __pthread_self();
> +	if (!libc.threaded) __thread_enable();
> +
> +	__acquire_ptc();
> +
> +	if (guard) {
> +		map = __mmap(0, size, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0);
> +		if (map == MAP_FAILED) goto CLEANUP;
> +		if (__mprotect(map+guard, size-guard, PROT_READ|PROT_WRITE)) {
> +			__munmap(map, size);
> +			goto CLEANUP;
> +		}
> +	} else {
> +		map = __mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
> +		if (map == MAP_FAILED) goto CLEANUP;
> +	}
> +	tsd = map + size - __pthread_tsd_size;
> +	stack = tsd - libc.tls_size;
> +	stack_limit = map + guard;
> +
> +	new = __copy_tls(tsd - libc.tls_size);
> +	new->map_base = map;
> +	new->map_size = size;
> +	new->stack = stack;
> +	new->stack_size = stack - stack_limit;
> +	new->start = (void *(*)(void*))entry;
> +	new->start_arg = arg;
> +	new->self = new;
> +	new->tsd = (void *)tsd;
> +	new->locale = &libc.global_locale;
> +	new->unblock_cancel = self->cancel;
> +	new->canary = self->canary;
> +
> +	a_inc(&libc.threads_minus_1);
> +	ret = __clone(start, stack, flags, new, &new->tid, TP_ADJ(new), &new->tid);
> +
> + CLEANUP:
> +	__release_ptc();
> +
> +	if (ret > 0) {
> +          *res = new;
> +          ret = thrd_success;
> +        } else if (ret == -ENOMEM) {
> +          ret = thrd_nomem;
> +        } else {
> +          a_dec(&libc.threads_minus_1);
> +          if (map) __munmap(map, size);
> +          ret = thrd_error;
> +	}
> +        return ret;
> +}

Minor indention/formatting issues. But my main concern is the amount
of code duplication, especially delicate implementation details where
it would be easy for them to get out of sync. To me that seems like a
bigger concern than thrd_create pulling in a few hundred bytes more
than it should (for the extra unused functionality in pthread_create).

Didn't we discuss before just passing a special attr pointer to
__pthread_create to effect C11-style creation?

> diff --git a/src/thread/thrd_current.c b/src/thread/thrd_current.c
> new file mode 100644
> index 0000000..1728535
> --- /dev/null
> +++ b/src/thread/thrd_current.c
> @@ -0,0 +1,11 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +/* Not all arch have __pthread_self as a symbol. On some this results
> +   in some magic. So this "call" to __pthread_self is not necessary a
> +   tail call. In particular, other than the appearance, thrd_current
> +   can not be implemented as a weak symbol. */
> +pthread_t thrd_current()
> +{
> +	return __pthread_self();
> +}

OK, but I'm not sure the detailed comment is needed. The exact same
code appears in pthread_self.c.

> diff --git a/src/thread/thrd_detach.c b/src/thread/thrd_detach.c
> new file mode 100644
> index 0000000..211b8fb
> --- /dev/null
> +++ b/src/thread/thrd_detach.c
> @@ -0,0 +1,18 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +int __munmap(void *, size_t);
> +
> +int thrd_detach(thrd_t t)
> +{
> +	/* Cannot detach a thread that's already exiting */
> +	if (a_swap(t->exitlock, 1)){
> +		int tmp;
> +		while ((tmp = t->tid)) __timedwait(&t->tid, tmp, 0, 0, 0, 0, 0);
> +		if (t->map_base) __munmap(t->map_base, t->map_size);
> +	} else {
> +		t->detached = 2;
> +		__unlock(t->exitlock);
> +	}
> +	return thrd_success;
> +}

You already moved pthread_detach to the protected namespace, so this
seems to be duplicate code. And it's a place that has implementation
details (magic numbers like 2, which you could argue shouldn't exist
;-) so I think it's best to avoid spreading these details out over
more places where they could go out of sync, no?

> diff --git a/src/thread/thrd_equal.c b/src/thread/thrd_equal.c
> new file mode 100644
> index 0000000..ac49a44
> --- /dev/null
> +++ b/src/thread/thrd_equal.c
> @@ -0,0 +1,6 @@
> +#include <threads.h>
> +
> +int (thrd_equal)(thrd_t a, thrd_t b)
> +{
> +	return a==b;
> +}

OK.

> diff --git a/src/thread/thrd_join.c b/src/thread/thrd_join.c
> new file mode 100644
> index 0000000..a8c7aed
> --- /dev/null
> +++ b/src/thread/thrd_join.c
> @@ -0,0 +1,16 @@
> +#include "pthread_impl.h"
> +#include <sys/mman.h>
> +#include <threads.h>
> +
> +int __munmap(void *, size_t);
> +
> +/* C11 threads cannot be canceled, so there is no need for a
> +   cancelation function pointer, here. */
> +int thrd_join(thrd_t t, int *res)
> +{
> +	int tmp;
> +	while ((tmp = t->tid)) __timedwait(&t->tid, tmp, 0, 0, 0, 0, 0);
> +	if (res) *res = (int)(intptr_t)t->result;
> +	if (t->map_base) __munmap(t->map_base, t->map_size);
> +	return thrd_success;
> +}

Likewise here there's dupliation.

> diff --git a/src/thread/tss_create.c b/src/thread/tss_create.c
> new file mode 100644
> index 0000000..0cbadc8
> --- /dev/null
> +++ b/src/thread/tss_create.c
> @@ -0,0 +1,13 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +int __pthread_key_create(pthread_key_t *k, void (*dtor)(void *));
> +
> +int tss_create(tss_t *tss, tss_dtor_t dtor)
> +{
> +	__THRD_ABI_MARK;
> +	/* Different error returns are possible. C glues them together into
> +	   just failure notification. Can't be optimized to a tail call,
> +	   unless thrd_error equals EAGAIN. */
> +	return __pthread_key_create(tss, dtor) ? thrd_error : thrd_success;
> +}

OK.

> diff --git a/src/thread/tss_delete.c b/src/thread/tss_delete.c
> new file mode 100644
> index 0000000..d50731f
> --- /dev/null
> +++ b/src/thread/tss_delete.c
> @@ -0,0 +1,8 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +int __pthread_key_delete(pthread_key_t k);
> +
> +void (tss_delete)(tss_t key) {
> +	(void)__pthread_key_delete(key);
> +}

This could probably be done with just an alias, but I'm fine with the
tail-call. It should be a very rarely-used function (ideally,
never-used) since it's rare for there to be any safe way to destroy a
key once you create it.

> diff --git a/src/thread/tss_set.c b/src/thread/tss_set.c
> new file mode 100644
> index 0000000..70c4fb7
> --- /dev/null
> +++ b/src/thread/tss_set.c
> @@ -0,0 +1,13 @@
> +#include "pthread_impl.h"
> +#include <threads.h>
> +
> +int tss_set(tss_t k, void *x)
> +{
> +	struct pthread *self = __pthread_self();
> +	/* Avoid unnecessary COW */
> +	if (self->tsd[k] != x) {
> +		self->tsd[k] = x;
> +		self->tsd_used = 1;
> +	}
> +	return thrd_success;
> +}

As mentioned before, this is duplicate, and it could probably just be
an alias.

> diff --git a/src/time/thrd_sleep.c b/src/time/thrd_sleep.c
> new file mode 100644
> index 0000000..6570b0c
> --- /dev/null
> +++ b/src/time/thrd_sleep.c
> @@ -0,0 +1,31 @@
> +#include <time.h>
> +#include <errno.h>
> +#include "syscall.h"
> +#include "libc.h"
> +#include "threads.h"
> +
> +/* Roughly __syscall already returns the right thing: 0 if all went
> +   well or a negative error indication, otherwise. */
> +int thrd_sleep(const struct timespec *req, struct timespec *rem)
> +{
> +  int ret = __syscall(SYS_nanosleep, req, rem);
> +  // compile time comparison, constant propagated
> +  if (EINTR != 1) {
> +    switch (ret) {
> +    case 0: return 0;
> +      /* error described by POSIX:                                    */
> +      /* EINTR  The nanosleep() function was interrupted by a signal. */
> +      /* The C11 wording is:                                          */
> +      /* -1 if it has been interrupted by a signal                    */
> +    case -EINTR:  return -1;
> +      /* error described by POSIX */
> +    case -EINVAL: return -2;
> +      /* described for linux */
> +    case -EFAULT: return -3;
> +      /* No other error returns are specified. */
> +    default: return INT_MIN;
> +    }
> +  }
> +  // potentially a tail call
> +  return ret;
> +}

__syscall is almost never a "tail call" because it's rarely a call at
all; usually, it's inline asm. :-) But okay. My preference would be
not to do fancy remapping of error codes (especially not when the
codes will differ by arch, since this just encourages programs to
assume they're fixed and then break on the arch where they're
different, if any).

Also indention is inconsistent with musl (tabs for indention, spaces
for alignment).

> diff --git a/src/time/timespec_get.c b/src/time/timespec_get.c
> new file mode 100644
> index 0000000..bf78e5a
> --- /dev/null
> +++ b/src/time/timespec_get.c
> @@ -0,0 +1,9 @@
> +#include <time.h>
> +
> +int __clock_gettime(clockid_t clk, struct timespec *ts);
> +
> +/* the base argument is simply ignored, there is no other implemented
> +   value than TIME_UTC. */
> +int timespec_get(struct timespec * ts, int base) {
> +  return __clock_gettime(CLOCK_REALTIME, ts) < 0 ? 0 : base;
> +}

As mentioned in another email (off-list, IIRC), I'd like to reject
unknown base values so that programs built against a newer musl that
has other clock bases, but which get an older musl at runtime, will
see an error rather than silently using the wrong clock. Does this
make sense?

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.