Message-ID: <20190917134422.aootviums4hdtell@zen.arangodb.com>
Date: Tue, 17 Sep 2019 15:44:22 +0200
From: Max Neunhoeffer <max@...ngodb.com>
To: musl@...ts.openwall.com
Subject: Bug report, concurrency issue on exception with gcc 8.3.0

Hello,

I am experiencing problems when linking a large multithreaded C++ application
statically against musl libc. I am using Alpine Linux 3.10.1 and gcc 8.3.0
on x86_64, that is, musl 1.1.22-r3 (Alpine Linux versioning)
and gcc 8.3.0-r0.

Before going into details, here is an overview:

1. libgcc does not detect correctly that the application is multithreaded,
   since `pthread_cancel` is not linked into the executable.
   As a consequence, the lazy initialization of data structures for stack
   unwinding (FDE tables) is executed without protection of a mutex.
   Therefore, if the very first exception in the program happens to be
   thrown in two threads concurrently, the data structures can be corrupted,
   resulting in a busy loop after `main()` is finished.
2. If I make sure that I explicitly link in `pthread_cancel`, this problem
   is (almost certainly) gone. However, in certain scenarios this leads
   to a crash when the first exception is thrown.

I first reported this problem to gcc as a bug against libgcc, but the
gcc team denies responsibility; see
[this bug report](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91737).

I have produced small sample programs that exhibit the problems. A more
detailed analysis of what happens follows below.

For case 1:

------------------------ snip exceptioncollision.cpp ----------------------
#include <thread>
#include <atomic>
#include <chrono>
#include <cstddef>

std::atomic<int> letsgo{0};

void waiter() {
  size_t count = 0;
  while (letsgo == 0) {
    ++count;
  }
  try {
    throw 42;
  } catch (int const& s) {
  }
}

int main(int, char*[]) {
#ifdef REPAIR
  try { throw 42; } catch (int const& i) {}
#endif
  std::thread t1(waiter);
  std::thread t2(waiter);
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  letsgo = 1;
  t1.join();
  t2.join();
  return 0;
}
------------------------ snip exceptioncollision.cpp ----------------------

Use Alpine Linux 3.10.1, for example in a Docker container, and compile
as follows:

    g++ exceptioncollision.cpp -o exceptioncollision -O0 -Wall -std=c++14 -lpthread -static

Then execute the static executable multiple times:

    while true ; do ./exceptioncollision ; date ; done

After a few tries it will freeze.


For case 2:

----------------------------------- snip exceptionbang.cpp ---------------
#include <pthread.h>
//#include <iostream>

#ifdef REPAIR
void* g(void *p) {
  return p;
}

void f() {
  pthread_t t;
  pthread_create(&t, nullptr, g, nullptr);
  pthread_cancel(t);
  pthread_join(t, nullptr);
}
#endif

int main(int argc, char*[]) {
#ifdef REPAIR
  if (argc == -1) { f(); }
#endif
  //std::cout << "Hello world!" << std::endl;
  try { throw 42; } catch(int const& i) {};
  return 0;
}
----------------------------------- snip exceptionbang.cpp ---------------

Use Alpine Linux 3.10.1, for example in a Docker container, and compile
as follows:

    g++ exceptionbang.cpp -o exceptionbang -Wall -Wextra -O0 -g -std=c++14 -static -DREPAIR=1

Execute `./exceptionbang` and it will crash with a segmentation violation.

Curiously, if you uncomment the line

    //#include <iostream>

then more of the static initialization code seems to be compiled in and
all is well.

More detailed analysis of what is happening:

Let's look at case 1 first:

libgcc insists that it is a good idea to check for the presence of
`pthread_cancel` to detect if the application is multi-threaded. Therefore,
in my case, since I do not explicitly use `pthread_cancel` and am
linking statically, the libgcc runtime thinks that the program is
single-threaded (since `pthread_cancel` is in its own compilation
unit). As a consequence the mutex
[here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L1045) is not actually used.
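To illustrate the detection technique described above, here is a minimal
sketch of a weak-reference check. It is not libgcc's actual code: the names
`weak_pthread_cancel` and `threads_look_active` are made up for illustration,
and the sketch assumes GCC (or a compatible compiler) on an ELF target.

```cpp
#include <pthread.h>

// Hypothetical name: a weak reference to pthread_cancel. If nothing else in
// the link pulls in pthread_cancel, its address resolves to null instead of
// producing a link error.
static __typeof(pthread_cancel) weak_pthread_cancel
    __attribute__((__weakref__("pthread_cancel")));

// Hypothetical helper: treats "pthread_cancel is present in the executable"
// as a proxy for "the program is multi-threaded".
static bool threads_look_active() {
  return &weak_pthread_cancel != nullptr;
}
```

In a static link that never references `pthread_cancel` directly, I would
expect such a check to report false even though `std::thread` is in use,
which matches the behavior I describe for case 1.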

Therefore some code in `libgcc` that is executed when an exception is
first thrown in the life of the process ([see here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L1072))
is not thread-safe and corrupts the data structure `seen_objects`, turning
a singly linked list into a circular one.

This in the end leads to a busy loop [here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L221).
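How an unprotected list update can produce a circular list can be shown
with a simplified, single-threaded model. This is only a sketch under
assumptions, not the libgcc code: it assumes that two threads pick up the
same object from the unclassified list before either has unlinked it, and
that each then performs a sorted insert into the classified list. The names
`Object`, `insert_sorted`, `has_cycle` and `demo_double_insert` are
hypothetical.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-object record kept by the unwinder.
struct Object {
  std::uintptr_t pc_begin;
  Object* next;
};

// Sorted insert (descending pc_begin) into a singly linked list, no locking.
void insert_sorted(Object*& head, Object* ob) {
  Object** p = &head;
  while (*p != nullptr && (*p)->pc_begin >= ob->pc_begin) {
    p = &(*p)->next;
  }
  ob->next = *p;
  *p = ob;
}

// Floyd's cycle detection: true if following next pointers never terminates.
bool has_cycle(const Object* head) {
  const Object* slow = head;
  const Object* fast = head;
  while (fast != nullptr && fast->next != nullptr) {
    slow = slow->next;
    fast = fast->next->next;
    if (slow == fast) {
      return true;
    }
  }
  return false;
}

// Models the assumed interleaving: both "threads" insert the same object.
bool demo_double_insert() {
  Object ob{0x1000, nullptr};
  Object* seen = nullptr;
  insert_sorted(seen, &ob);  // "thread A" classifies ob
  insert_sorted(seen, &ob);  // "thread B" had read the same ob earlier
  return has_cycle(seen);    // ob.next now points at ob itself
}
```

In this model the second insert walks past `ob`, reaches its null `next`
pointer and stores `ob` there, so any later traversal of the list loops
forever.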


Now let's look at case 2:

I tried to "fix" this by using `pthread_cancel` explicitly. This is how
I arrived at the second example program `exceptionbang.cpp`. Here, libgcc
correctly detects a multi-threaded program. However, the program
crashes when the first exception is thrown. I do not understand the
details, but it seems that the libgcc runtime code stumbles over some
data structures which are not properly initialized. When including the
header `iostream`, some more code is compiled in which initializes the
structures and all is well.


Please let me know if you need any more information and please Cc me in
communication about this issue.

Cheers,
  Max.
