Monday, December 17, 2018

Hangs when using trylock reader writer lock functions


The pthreads reader writer locks in glibc have a couple of bugs that can lead to application hangs.  The trylock functions (pthread_rwlock_tryrdlock and pthread_rwlock_trywrlock) can cause other threads to hang indefinitely in the rdlock or wrlock functions even after the lock is no longer held.  If these hanging threads hold other locks, the application can deadlock and grind to a halt.  The problem was reported as glibc bug 23844 and has existed since glibc 2.25 was released.  The 2.25 version includes a new reader writer lock implementation which replaces large critical sections with atomic compare and exchange instructions, so the code paths for an uncontended lock should be very fast.  Unfortunately, the new implementation has a couple of bugs in the trylock functions that can hang the application and need to be fixed.

The trywrlock function is a non-blocking function that tries to grab a write lock on a reader writer lock.  Unfortunately, this function is missing logic that the wrlock function has when the phase of the reader writer lock transitions from read phase to write phase.  This missing logic is intended to synchronize the state of the lock with other reader threads so that when the write lock is released, blocked reader threads are awakened.  Since the logic is missing, the reader threads are not awakened.
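To make the trywrlock scenario concrete, here is a minimal sketch of the usage pattern: one thread takes the write lock with pthread_rwlock_trywrlock, several readers block in pthread_rwlock_rdlock, and the writer then releases the lock.  The names run_trywrlock_scenario, blocked_reader, and sleep_ms are made up for this illustration, and the 50 ms settling delay is an arbitrary assumption.  This is not a guaranteed reproducer, since the hang depends on a particular interleaving inside glibc.  It only shows where the readers would stay stuck, and it uses a timeout so the caller itself never hangs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Static storage so that a reader that never wakes up still refers to valid memory. */
static pthread_rwlock_t wr_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int readers_done;

/* Hypothetical helper: sleep for the given number of milliseconds. */
static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Reader thread: blocks in rdlock while the writer holds the lock. */
static void *blocked_reader(void *arg)
{
    (void)arg;
    pthread_rwlock_rdlock(&wr_lock);
    pthread_rwlock_unlock(&wr_lock);
    atomic_fetch_add(&readers_done, 1);
    return NULL;
}

/*
 * Take the write lock with trywrlock, let nreaders threads block in rdlock,
 * release the lock, then wait up to timeout_ms for the readers to finish.
 * Returns the number of readers that got through, or -1 on a setup failure.
 * A result smaller than nreaders means some readers were not awakened in time.
 */
int run_trywrlock_scenario(int nreaders, long timeout_ms)
{
    assert(nreaders > 0 && timeout_ms > 0);
    atomic_store(&readers_done, 0);

    /* The lock is idle here, so the non-blocking attempt is expected to succeed. */
    if (pthread_rwlock_trywrlock(&wr_lock) != 0)
        return -1;

    for (int i = 0; i < nreaders; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, blocked_reader, NULL) != 0) {
            pthread_rwlock_unlock(&wr_lock);
            return -1;
        }
        /* Detached, so a stuck reader never has to be joined. */
        pthread_detach(t);
    }

    sleep_ms(50);                       /* assumed long enough for readers to block */
    pthread_rwlock_unlock(&wr_lock);    /* blocked readers should be awakened here */

    for (long waited = 0;
         waited < timeout_ms && atomic_load(&readers_done) < nreaders;
         waited += 10)
        sleep_ms(10);

    return atomic_load(&readers_done);
}
```

Calling run_trywrlock_scenario(4, 2000) should return 4 when every blocked reader is awakened after the unlock.  A smaller value would point at readers that are still asleep in rdlock even though nobody holds the lock, which is the symptom described above.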

The tryrdlock function is a non-blocking function that tries to grab a read lock on a reader writer lock.  Unfortunately, this function is missing logic that the rdlock function has when the phase of the reader writer lock transitions from write phase to read phase.  In rdlock, the thread that makes this transition awakens other reader threads that may be waiting for the read phase.  When the thread executing tryrdlock happens to get there first and makes the transition itself, that logic is missing, so the waiting reader threads are not awakened.
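The tryrdlock case can be sketched the same way.  In this illustration a writer holds the lock, some readers block in pthread_rwlock_rdlock, and one extra thread keeps polling with pthread_rwlock_tryrdlock, so it races with the blocked readers when the write lock is released.  The names run_tryrdlock_scenario, waiting_reader, polling_reader, and pause_ms are made up for this sketch, and the delays are arbitrary assumptions.  As before, this shows the usage pattern rather than a reliable reproducer, because whether the polling thread is the one that switches the lock to read phase depends on timing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Static storage so that a reader that never wakes up still refers to valid memory. */
static pthread_rwlock_t rd_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int rd_done;

/* Hypothetical helper: sleep for the given number of milliseconds. */
static void pause_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Blocks in rdlock until the read phase begins. */
static void *waiting_reader(void *arg)
{
    (void)arg;
    pthread_rwlock_rdlock(&rd_lock);
    pthread_rwlock_unlock(&rd_lock);
    atomic_fetch_add(&rd_done, 1);
    return NULL;
}

/* Retries the non-blocking tryrdlock until it succeeds. */
static void *polling_reader(void *arg)
{
    (void)arg;
    while (pthread_rwlock_tryrdlock(&rd_lock) != 0)
        pause_ms(1);                    /* typically EBUSY while the writer holds the lock */
    pthread_rwlock_unlock(&rd_lock);
    atomic_fetch_add(&rd_done, 1);
    return NULL;
}

/*
 * Hold the write lock, start nblocking readers that block in rdlock plus one
 * thread that polls with tryrdlock, release the write lock, then wait up to
 * timeout_ms.  Returns how many of the nblocking + 1 threads acquired and
 * released a read lock, or -1 on a setup failure.
 */
int run_tryrdlock_scenario(int nblocking, long timeout_ms)
{
    assert(nblocking > 0 && timeout_ms > 0);
    atomic_store(&rd_done, 0);

    if (pthread_rwlock_wrlock(&rd_lock) != 0)
        return -1;

    for (int i = 0; i <= nblocking; i++) {
        pthread_t t;
        /* The last thread started is the tryrdlock poller. */
        void *(*fn)(void *) = (i < nblocking) ? waiting_reader : polling_reader;
        if (pthread_create(&t, NULL, fn, NULL) != 0) {
            pthread_rwlock_unlock(&rd_lock);
            return -1;
        }
        /* Detached, so a stuck reader never has to be joined. */
        pthread_detach(t);
    }

    pause_ms(50);                       /* assumed long enough for readers to block */
    pthread_rwlock_unlock(&rd_lock);    /* write phase ends, read phase can begin */

    for (long waited = 0;
         waited < timeout_ms && atomic_load(&rd_done) < nblocking + 1;
         waited += 10)
        pause_ms(10);

    return atomic_load(&rd_done);
}
```

With four blocking readers, run_tryrdlock_scenario(4, 2000) should return 5 when all of them are awakened.  If the polling thread gets its read lock but the count stays below 5, the blocked readers were left waiting for a read phase that has already started.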

There are patches to the trywrlock and tryrdlock functions in the glibc bug report that fix these bugs.  Hopefully, these patches will be included in a new glibc release soon.

The MySQL server uses reader writer locks for various purposes.  I wonder if it is affected by these reader writer lock bugs.