Monday, December 21, 2015

MySQL 5.7 Performance Schema Versus the Thread Sanitizer

This blog continues the investigation into possible data races in the MySQL server reported by the Thread Sanitizer.  The previous blog investigated several possible data races that were detected when the MySQL server is initialized.  This blog investigates three data races in the performance schema that are reported when the MySQL server is shut down.  The first data race is a racy increment of thread statistics.  The second data race is a problem that may leave the state of the performance schema memory allocator incorrect.  The final data race is in the lock free pinbox data structure.

Performance schema statistics aggregation

The Thread Sanitizer reported a data race when a thread exits and performance schema account statistics are aggregated.  The data race occurs when the 'PFS_account::m_disconnected_count' instance variable is incremented by an racy increment statement 'm_disconnected_count++'.  The increment is not thread safe as there is no associated lock or mutex.  Is a fuzzy counter good enough, or do we need an accurate counter?  Unfortunately, there is no comment or annotation regarding this variable, so the intention of the original programmer is unknown to me.   If an accurate counter is needed, then there are several techniques to implement an accurate counter that scales performance with multiple threads.

Performance schema buffer container

The Thread Sanitizer reported a couple of data races in the performance schema's object allocator.  One of the objectives of the performance schema is that it is malloc free as least in its fast path (see comment in pfs.cc).  The performance schema allocates various objects from a pool of preallocated memory called a buffer container.  The buffer container's internal organization is an array of 'pages'.  Each 'page' is an array of objects.  The intent of this data structure is to avoid thread contention when objects are allocated and freed.

The data races are on the buffer container's 'page->m_full' and 'm_full' variables, which attempt to capture whether or not the container has free objects (m_full == false) or all of the objects are allocated (m_full == true).   The Thread Sanitizer reports data races in the deallocate function which sets 'm_full = false' since it just freed an object.  This will conflict with the allocate function which sets 'm_full = true' when there are no free objects.  Since the 'm_full' variables are racy, allocation of objects from the pool may fail when in fact there are free objects in the pool.

Performance schema pinbox

The Thread Sanitizer reported a data race in the performance schema's pinbox.  Is this a real race?  We need to understand the pinbox data structure and algorithm.  The pinbox is a non-blocking stack based allocator.  It is non-blocking since the performance schema is intended to be mutex free (see pfs.cc)

The data race is reported in the pinbox's stack push operation.  The push operation reads the current top of the stack, computes a new top of stack, and then uses an compare and swap to write the new top of stack.  If the compare and swap fails, then some other thread updated the stack top and the push operation in the current thread repeats.  The Thread Sanitizer reports a data race between the racy read of the top of stack and the write of the new top of stack.  A similar data race exists in the pinbox's stack pop operation since it also manipulates the top of stack.

The pinbox's stack push and pop operations look correct to me, so the report of the possible data race should be ignored.  It can be suppressed with a Thread Sanitizer suppression.  Alternatively, the top of stack could be implemented with a C++ atomic variable where the racy read is replaced with a relaxed memory order load.  This results in a equivalent algorithm and it provides the Thread Sanitizer sufficient information to prohibit it from reporting a data race.  In fact, the boost lock free stack does this.

Conclusion

Since debugging data races in concurrent programs is hard, I am a big fan of using tools like the Thread Sanitizer to identify possible concurrency problems.   Some changes to the MySQL server code, like the use of C++ atomic variables, will be needed to make effective use of the Thread Sanitizer.

Versions

MySQL 5.7.10
Thread Sanitizer in Clang 3.8
Ubuntu 14.04

2 comments:

  1. The 'PFS_single_stat::aggregate' function maintains aggregate statistics of various counter including the min and max values. The computation in this function is not thread safe. The code does not say whether or not the statistics should be fuzzy. The Thread Sanitizer reports data races when computing these aggregate statistics. See https://s3.amazonaws.com/prohaska7-pub/mysql_5710_tsan_pfs_single_stat_aggregate.txt

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete