Tuesday, December 29, 2015

MySQL 5.7 InnoDB Versus the Thread Sanitizer

InnoDB and the Thread Sanitizer do not work well together in MySQL 5.7.  There are hundreds of possible data races reported by the Thread Sanitizer in the InnoDB code for the simplest case of initializing the MySQL server and shutting it down.  Luckily, most of these data races are benign.  The problem of finding an interesting data race in the InnoDB software is similar to finding evidence of a sub-atomic particle in a particle accelerator; many Thread Sanitizer events must be analyzed before something interesting is found.

One example of a benign data race is InnoDB's implementation of fuzzy counters.  A fuzzy counter is a counter whose value does not have to be precise.  Fuzzy counters are updated without any synchronization, so concurrent updates can be lost and the value is only approximate.  However, the fuzzy counter implementation uses the trick of spreading the values over multiple cache lines, so performance scales across multiple threads.  Unfortunately, fuzzy counters do not use C++ atomic operations, not even with relaxed memory ordering semantics, so the Thread Sanitizer is not given enough information to ignore them.  Luckily, a single Thread Sanitizer suppression can ignore them.
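To make the idea concrete, here is a minimal sketch of a counter that is spread over multiple cache lines and uses relaxed C++ atomics, so the Thread Sanitizer has nothing to report.  This is not InnoDB's implementation: the 'ShardedCounter' class, the 'count_from_threads' helper, the slot count, and the 64 byte cache line size are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sharded counter; NOT InnoDB's fuzzy counter implementation.
class ShardedCounter {
 public:
  // Add n to the slot selected by the calling thread's id.
  void add(std::uint64_t n = 1) {
    slots_[slot_index()].value.fetch_add(n, std::memory_order_relaxed);
  }

  // Sum all slots. While writers are running, the sum is only a snapshot.
  std::uint64_t get() const {
    std::uint64_t total = 0;
    for (const Slot& s : slots_) {
      total += s.value.load(std::memory_order_relaxed);
    }
    return total;
  }

 private:
  static constexpr std::size_t kSlots = 16;      // assumed slot count
  static constexpr std::size_t kCacheLine = 64;  // assumed cache line size

  // One slot per cache line, so threads on different slots do not share lines.
  struct alignas(kCacheLine) Slot {
    std::atomic<std::uint64_t> value{0};
  };

  static std::size_t slot_index() {
    return std::hash<std::thread::id>{}(std::this_thread::get_id()) % kSlots;
  }

  Slot slots_[kSlots];
};

// Hypothetical helper: increment one counter from several threads.
std::uint64_t count_from_threads(unsigned n_threads, unsigned per_thread) {
  ShardedCounter counter;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n_threads; ++t) {
    workers.emplace_back([&counter, per_thread] {
      for (unsigned i = 0; i < per_thread; ++i) counter.add();
    });
  }
  for (std::thread& w : workers) w.join();
  return counter.get();  // all writers have joined, so the sum is complete
}
```

Note the trade-off: the atomic read-modify-write costs more than the plain increment that a fuzzy counter uses, which is presumably why the fuzzy counters avoid it.  In exchange, no updates are lost and no suppression is needed.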

There are a lot of other counters in the InnoDB software that are not fuzzy counters and are not thread safe.  These counters must also be ignored for now.

Can we find interesting issues in InnoDB with the Thread Sanitizer by identifying and ignoring the benign data races?  Here are two interesting data races.

First, the Thread Sanitizer reports a data race on the 'buf_page_cleaner_is_active' global variable that occurs when InnoDB is initialized.  The initialization thread creates the page cleaner coordinator thread.  These two threads use a notification pattern in which the 'buf_page_cleaner_is_active' variable signals progress from the page cleaner coordinator thread back to the initialization thread.  Unfortunately, this racy notification pattern can be broken by aggressive compiler optimization and by memory ordering issues on some multi-core processors.  The MySQL server code should use C++ atomic variables or memory barriers to avoid the data races that occur when using the notification design pattern.  The MySQL server uses this pattern all over the place, so this is a potential bug whenever the server is built with aggressive compiler optimizations or runs on processors with weak memory ordering.
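A race-free version of the notification pattern can be sketched with a C++ atomic flag.  The names below ('worker_is_active', 'worker_state', 'worker_main', 'start_worker_and_wait') are hypothetical stand-ins and are not taken from the MySQL source; the sketch only shows the release/acquire pairing that makes the pattern well defined.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for a flag like 'buf_page_cleaner_is_active'.
std::atomic<bool> worker_is_active{false};

// Plain (non-atomic) state that the worker sets up before signalling.
int worker_state = 0;

void worker_main() {
  worker_state = 42;  // set up state first
  // The release store publishes everything written above to any thread
  // that later observes 'true' with an acquire load.
  worker_is_active.store(true, std::memory_order_release);
}

// Start the worker and wait until it reports that it is running.
int start_worker_and_wait() {
  worker_state = 0;
  worker_is_active.store(false, std::memory_order_relaxed);

  std::thread worker(worker_main);

  // The atomic load cannot be hoisted out of the loop by the compiler,
  // and the acquire ordering pairs with the worker's release store.
  while (!worker_is_active.load(std::memory_order_acquire)) {
    std::this_thread::yield();
  }

  int seen = worker_state;  // ordered after the worker's write, so no race
  worker.join();
  return seen;
}
```

With a plain 'bool' in place of the atomic, the compiler is allowed to read the flag once and spin forever, and a weakly ordered processor may let the waiter see the flag before it sees 'worker_state'.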

Next, the Thread Sanitizer reports a data race on the 'monitor_set_tbl' variable.  This data race is reported because the 'monitor_set_tbl' global variable is initialized by the InnoDB initialization thread AFTER its value is used by some InnoDB background threads that are created BEFORE the variable is initialized.  Since there is no synchronization between the threads with respect to the 'monitor_set_tbl' variable, the Thread Sanitizer reports the data race.  This is an obvious bug.
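One way to avoid this kind of race is to finish initializing the shared global before any thread that reads it is created, since thread creation itself orders the earlier writes before the new thread's reads.  The sketch below illustrates that ordering only; 'monitor_bits', 'init_monitor_bits', 'monitor_is_on', and 'start_background_thread' are hypothetical names, and the bitmap layout is an assumption rather than InnoDB's actual data structure.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for a global such as 'monitor_set_tbl',
// assumed here to be a bitmap with one bit per monitor.
constexpr std::size_t kWords = 4;
std::uint64_t monitor_bits[kWords];

void init_monitor_bits() {
  for (std::size_t i = 0; i < kWords; ++i) monitor_bits[i] = 0;
  monitor_bits[0] = 1;  // pretend that monitor 0 is enabled
}

bool monitor_is_on(std::size_t id) {
  return (monitor_bits[id / 64] >> (id % 64)) & 1u;
}

bool start_background_thread() {
  // 1. Initialize the shared global first.
  init_monitor_bits();

  // 2. Only then create the thread that reads it. Completion of the
  //    std::thread constructor synchronizes with the start of the thread
  //    function, so the reader sees the initialized bitmap.
  bool seen = false;
  std::thread background([&seen] { seen = monitor_is_on(0); });

  // 3. join() orders the background thread's write to 'seen' before our read.
  background.join();
  return seen;
}
```

If the initialization cannot be moved ahead of thread creation, the reads and writes would need a mutex or atomics instead.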

This blog post discussed how the Thread Sanitizer found data races in the InnoDB storage engine.  Many benign data races are reported because of performance counters in the storage engine.  Once these benign data races are filtered out, one can find interesting issues.  Two of these issues are discussed in this post, but there are lots of additional issues to analyze.  So far, I have only used the Thread Sanitizer with the MySQL server in the simplest scenario of MySQL instance initialization.  It will be interesting to run some more complex workloads like sysbench.

I look forward to the new year.

Software configuration:
MySQL 5.7
Thread Sanitizer in clang 3.8
Ubuntu 14.04
CMake 2.8.12.2

MySQL configuration:
cmake -DDISABLE_SHARED=ON -DWITH_TSAN=ON -DMUTEXTYPE=sys ...


Comments:

  1. MySQL has so many bandaids, you can't find the wounds.

  2. MySQL is much older than the tools used to find bugs in C++ programs. However, valgrind is supported by MySQL because it is just too easy to write buggy (leaky) sequential C++ programs. Data races in C++ programs are just as easy to write and an order of magnitude harder to find. So, it is imperative to develop concurrent C++ programs in concert with data race detectors like valgrind's helgrind and DRD tools and the Thread Sanitizer. It is not too late to apply modern tools to old software.

  3. WRT the thread sanitizer, MySQL bug #77866 should be opened. The thread sanitizer reports a data race on the os_thread_count variable. The assert 'ut_a(os_thread_count <= OS_THREAD_MAX_N)' should hold the 'thread_mutex'. See https://s3.amazonaws.com/prohaska7-pub/mysql_5710_tsan_innodb_os_thread_count_race.txt for details.

  4. Created MySQL bug https://bugs.mysql.com/bug.php?id=80530 to track the 'monitor_set_tbl' data race.
