Tuesday, December 29, 2015

MySQL 5.7 InnoDB Versus the Thread Sanitizer

InnoDB and the Thread Sanitizer do not work well together in MySQL 5.7.  There are hundreds of possible data races reported by the Thread Sanitizer in the InnoDB code for the simplest case of initializing the MySQL server and shutting it down.  Luckily, most of these data races are benign.  The problem of finding an interesting data race in the InnoDB software is similar to finding evidence of a sub-atomic particle in a particle accelerator; many Thread Sanitizer events must be analyzed before something interesting is found.

One example of a benign data race is InnoDB's implementation of fuzzy counters.  A fuzzy counter is a counter whose value does not have to be precise.  Since fuzzy counters are not thread safe, their value is not precise.  However, the fuzzy counter implementation uses the trick of spreading the values over multiple cache lines, so performance scales on multiple threads.  Unfortunately, fuzzy counters do not use C++ atomic operations, even with relaxed memory ordering semantics, so the Thread Sanitizer is not given enough information to ignore them.  Luckily, a single Thread Sanitizer suppression can ignore them.

There are a lot of other counters in the InnoDB software that are not fuzzy counters and are not thread safe.  These counters must also be ignored for now.
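Ignoring these counters amounts to a few lines in a Thread Sanitizer suppressions file passed in via TSAN_OPTIONS.  The function names below are placeholders; a real suppressions file must name the functions that appear in the actual Thread Sanitizer reports.

```
# tsan.supp -- the function names are placeholders for the counter
# routines that appear in the Thread Sanitizer reports
race:ib_counter_add
race:srv_mon_process_existing_counter
```

The file is then supplied at run time, for example: TSAN_OPTIONS="suppressions=tsan.supp" ./mysqld --initialize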

Can we find interesting issues in InnoDB with the Thread Sanitizer by identifying and ignoring the benign data races?  Here are two interesting data races.

First, the Thread Sanitizer reports a data race on the 'buf_page_cleaner_is_active' global variable that occurs when InnoDB is initialized.  The initialization thread creates the page cleaner coordinator thread.  These two threads use a notification pattern: the 'buf_page_cleaner_is_active' variable signals progress from the page cleaner coordinator thread back to the initialization thread.  Unfortunately, the racy notification pattern can be broken by aggressive compiler optimization and by memory ordering issues on some multi-core processors.  The MySQL server code should use C++ atomic variables or memory barriers to avoid the data races that occur when using this notification design pattern.  The MySQL server uses this pattern all over the place.  This is a potential bug with aggressive compiler optimizations when running on certain processors.

Next, the Thread Sanitizer reports a data race on the 'monitor_set_tbl' variable.  This data race is reported because the 'monitor_set_tbl' global variable is initialized by the InnoDB initialization thread AFTER some InnoDB background threads that use its value have already been created.  Since there is no synchronization between the threads WRT the 'monitor_set_tbl' variable, the Thread Sanitizer reports the data race.  This is an obvious bug.

This blog discussed how the Thread Sanitizer found data races in the InnoDB storage engine.  There are a lot of benign data races reported due to performance counters in the storage engine.  Once these benign data races are filtered out, one can find interesting issues.  Two of these issues are discussed in this blog, but there are lots of additional issues to analyze.  So far, I have only used the Thread Sanitizer with the MySQL server in the simplest scenario of MySQL instance initialization.  It will be interesting to run some more complex workloads like sysbench.

I look forward to the new year.

Software configuration:
MySQL 5.7
Thread Sanitizer in Clang 3.8
Ubuntu 14.04
CMake 2.8.12.2

MySQL configuration:
cmake -DDISABLE_SHARED=ON -DWITH_TSAN=ON -DMUTEXTYPE=sys ...


Monday, December 21, 2015

MySQL 5.7 Performance Schema Versus the Thread Sanitizer

This blog continues the investigation into possible data races in the MySQL server reported by the Thread Sanitizer.  The previous blog investigated several possible data races that were detected when the MySQL server is initialized.  This blog investigates three data races in the performance schema that are reported when the MySQL server is shut down.  The first data race is a racy increment of thread statistics.  The second data race is a problem that may leave the state of the performance schema memory allocator incorrect.  The final data race is in the lock free pinbox data structure.

Performance schema statistics aggregation

The Thread Sanitizer reported a data race when a thread exits and performance schema account statistics are aggregated.  The data race occurs when the 'PFS_account::m_disconnected_count' instance variable is incremented by a racy increment statement 'm_disconnected_count++'.  The increment is not thread safe as there is no associated lock or mutex.  Is a fuzzy counter good enough, or do we need an accurate counter?  Unfortunately, there is no comment or annotation regarding this variable, so the intention of the original programmer is unknown to me.  If an accurate counter is needed, then there are several techniques to implement an accurate counter that scales performance with multiple threads.

Performance schema buffer container

The Thread Sanitizer reported a couple of data races in the performance schema's object allocator.  One of the objectives of the performance schema is that it is malloc free, at least in its fast path (see comment in pfs.cc).  The performance schema allocates various objects from a pool of preallocated memory called a buffer container.  The buffer container's internal organization is an array of 'pages'.  Each 'page' is an array of objects.  The intent of this data structure is to avoid thread contention when objects are allocated and freed.

The data races are on the buffer container's 'page->m_full' and 'm_full' variables, which capture whether the container has free objects (m_full == false) or all of its objects are allocated (m_full == true).  The Thread Sanitizer reports data races in the deallocate function, which sets 'm_full = false' since it just freed an object.  This conflicts with the allocate function, which sets 'm_full = true' when there are no free objects.  Since the 'm_full' variables are racy, allocation of objects from the pool may fail when in fact there are free objects in the pool.

Performance schema pinbox

The Thread Sanitizer reported a data race in the performance schema's pinbox.  Is this a real race?  We need to understand the pinbox data structure and algorithm.  The pinbox is a non-blocking stack based allocator.  It is non-blocking since the performance schema is intended to be mutex free (see pfs.cc).

The data race is reported in the pinbox's stack push operation.  The push operation reads the current top of the stack, computes a new top of stack, and then uses a compare-and-swap to write the new top of stack.  If the compare-and-swap fails, then some other thread updated the stack top and the push operation in the current thread repeats.  The Thread Sanitizer reports a data race between the racy read of the top of stack and the write of the new top of stack.  A similar data race exists in the pinbox's stack pop operation since it also manipulates the top of stack.

The pinbox's stack push and pop operations look correct to me, so the report of the possible data race should be ignored.  It can be suppressed with a Thread Sanitizer suppression.  Alternatively, the top of stack could be implemented with a C++ atomic variable where the racy read is replaced with a relaxed memory order load.  This results in an equivalent algorithm, and it gives the Thread Sanitizer enough information to avoid reporting a data race.  In fact, the Boost lock-free stack does this.

Conclusion

Since debugging data races in concurrent programs is hard, I am a big fan of using tools like the Thread Sanitizer to identify possible concurrency problems.   Some changes to the MySQL server code, like the use of C++ atomic variables, will be needed to make effective use of the Thread Sanitizer.

Versions

MySQL 5.7.10
Thread Sanitizer in Clang 3.8
Ubuntu 14.04

Monday, December 7, 2015

MySQL 5.7 Initialization Versus the Thread Sanitizer

The MySQL server is a large multi-threaded program primarily written in the C++ programming language.  Unfortunately, data races and deadlocks in concurrent C++ programs are common.  Fortunately, the Thread Sanitizer can be used to find data races and deadlocks.  MySQL 5.7 does not support the Thread Sanitizer, but it is easy to add to the MySQL configuration using a simple build procedure.  When MySQL 5.7 is built with the Thread Sanitizer and database initialization is run, the Thread Sanitizer reports hundreds of possible data races and deadlocks in the MySQL server.

Database initialization occurs when the data directory is generated for a new MySQL instance.  This is done in MySQL 5.7 when running 'mysqld --initialize'.  Since the MySQL server (mysqld) is run to initialize the database and the MySQL server is compiled with the Thread Sanitizer, many possible data races and deadlocks are reported while the MySQL server is run.  This blog analyzes four of the issues: a problem with pthread barrier destruction, a potential lock deadlock, a data race on error reporting, and a data race on the global THD count variable.

The Thread Sanitizer reports a data race inside pthread barriers when a pthread barrier is destroyed.  MySQL 5.7 only has one use of pthread barriers which is to synchronize the main thread with the initialization of a background timer notification thread.  IMO, the Thread Sanitizer reports this race because many pthread barrier implementations have bugs WRT the barrier wait and barrier destroy operations.  MySQL can either choose to not use pthread barriers or suppress this error since the MySQL code using pthread barriers is correct.  The pthread barriers blog discusses this issue in more detail.

The Thread Sanitizer reports a possible deadlock with 3 locks.   This problem was also detected by helgrind and reported in MySQL bug 77872.  Variations of this 3 way locking deadlock have been detected in the field and reported to MySQL.  The Thread Sanitizer is faster than helgrind and it reports all of the thread stacks involved in the deadlock while helgrind does not.  These two Thread Sanitizer advantages should make debugging with the Thread Sanitizer feasible. 

The Thread Sanitizer reports a data race on the log_error_verbosity global variable between the bootstrap and signal handling threads.  These two threads run concurrently.  The bootstrap thread does a racy write of the log_error_verbosity variable while the signal handling thread does a racy read of the log_error_verbosity variable.  The Thread Sanitizer reports a data race since it cannot compute a happens-before relationship WRT the writes and reads of the log_error_verbosity variable done by these two threads.  This looks like a bug to me.

The Thread Sanitizer reports a data race on the global THD count when shutting down the MySQL server.  The MySQL server does a racy read of the global_thd_count variable without holding a lock, while most of the other reads and writes of the global_thd_count variable are done while holding a lock.  Since the MySQL code states that the global_thd_count variable is racy, this racy read is by design.  If MySQL used a relaxed memory order atomic variable to store the global_thd_count, then the Thread Sanitizer would not consider this a data race.  Otherwise, a Thread Sanitizer suppression can be used to ignore this issue.  It would be nice to annotate racy variables in the code itself rather than in a comment or in a data race suppression rule.

In conclusion, a configuration option that supports MySQL builds with the Thread Sanitizer was added to MySQL 5.7.  Hundreds of possible data races and deadlocks were reported by the Thread Sanitizer when running MySQL initialization.  This blog only examined four of the issues that illustrate the value of the tool.  The next blogs will look into the Thread Sanitizer issues WRT the performance schema and the InnoDB storage engine.

Tools

MySQL 5.7.9
Thread Sanitizer in Clang 3.8
Ubuntu 14.04