Message-ID: <20190917134422.aootviums4hdtell@zen.arangodb.com>
Date: Tue, 17 Sep 2019 15:44:22 +0200
From: Max Neunhoeffer <max@...ngodb.com>
To: musl@...ts.openwall.com
Subject: Bug report, concurrency issue on exception with gcc 8.3.0

Hello,

I am experiencing problems when linking a large multithreaded C++ application statically against libmusl. I am using Alpine Linux 3.10.1 and gcc 8.3.0 on x86_64. That is, I am using libmusl 1.1.22-r3 (Alpine Linux versioning) and gcc 8.3.0-r0.

Before going into details, here is an overview:

1. libgcc does not detect correctly that the application is multithreaded, since `pthread_cancel` is not linked into the executable. As a consequence, the lazy initialization of the data structures for stack unwinding (FDE tables) is executed without the protection of a mutex. Therefore, if the very first exception in the program happens to be thrown in two threads concurrently, the data structures can be corrupted, resulting in a busy loop after `main()` is finished.

2. If I make sure that I explicitly link in `pthread_cancel`, this problem is (almost certainly) gone. However, in certain scenarios this leads to a crash when the first exception is thrown.

I had first reported this problem to gcc as a bug against libgcc, but the gcc team denies responsibility, see [this bug report](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91737).

I have produced small sample programs to exhibit the problems. See below for a more detailed analysis of what happens.
For case 1:

------------------------ snip exceptioncollision.cpp ----------------------
#include <thread>
#include <atomic>
#include <chrono>

std::atomic<int> letsgo{0};

void waiter() {
  size_t count = 0;
  while (letsgo == 0) {
    ++count;
  }
  try {
    throw 42;
  } catch (int const& s) {
  }
}

int main(int, char*[]) {
#ifdef REPAIR
  try { throw 42; } catch (int const& i) {}
#endif
  std::thread t1(waiter);
  std::thread t2(waiter);
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  letsgo = 1;
  t1.join();
  t2.join();
  return 0;
}
------------------------ snip exceptioncollision.cpp ----------------------

Use Alpine Linux 3.10.1, for example in a Docker container, and compile as follows:

    g++ exceptioncollision.cpp -o exceptioncollision -O0 -Wall -std=c++14 -lpthread -static

Then execute the static executable multiple times:

    while true ; do ./exceptioncollision ; date ; done

after a few tries it will freeze.

For case 2:

----------------------------------- snip exceptionbang.cpp ---------------
#include <pthread.h>
//#include <iostream>

#ifdef REPAIR
void* g(void *p) {
  return p;
}

void f() {
  pthread_t t;
  pthread_create(&t, nullptr, g, nullptr);
  pthread_cancel(t);
  pthread_join(t, nullptr);
}
#endif

int main(int argc, char*[]) {
#ifdef REPAIR
  if (argc == -1) {
    f();
  }
#endif
  //std::cout << "Hello world!" << std::endl;
  try {
    throw 42;
  } catch(int const& i) {};
  return 0;
}
----------------------------------- snip exceptionbang.cpp ---------------

Use Alpine Linux 3.10.1, for example in a Docker container, and compile as follows:

    g++ exceptionbang.cpp -o exceptionbang -Wall -Wextra -O0 -g -std=c++14 -static -DREPAIR=1

Execute `./exceptionbang` and it will create a segmentation violation.

Curiously, if you uncomment the line

    //#include <iostream>

then more of static initialization code seems to be compiled in and all is well.
More detailed analysis of what is happening:

Let's look at case 1 first:

libgcc insists that it is a good idea to check for the presence of `pthread_cancel` to detect whether the application is multi-threaded. In my case, I do not explicitly use `pthread_cancel` and I am linking statically. Since `pthread_cancel` lives in its own compilation unit, the static linker does not pull it into the executable, and the libgcc runtime concludes that the program is single-threaded. As a consequence the mutex [here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L1045) is not actually used.

Therefore some code in `libgcc`, which is executed when an exception is first thrown in the life of the process ([see here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L1072)), is not thread-safe and corrupts the data structure `seen_objects`, turning a singly linked list into a circular one. This in the end leads to a busy loop [here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L221).

Now let's look at case 2:

I tried to "fix" this by using `pthread_cancel` explicitly. This is how I arrived at the second example program `exceptionbang.cpp`. Here, the detection succeeds and the program is recognized as multi-threaded. However, it crashes when the first exception is thrown. I do not understand the details, but it seems that the libgcc runtime code stumbles over some data structures which are not properly initialized. When the header `iostream` is included, some more code is compiled in which initializes the structures, and all is well.

Please let me know if you need any more information, and please Cc me in communication about this issue.

Cheers,
  Max.