# Concurrent Disjoint Set Union

Author/​Artist:
Jayanti, Siddhartha [Browse]
Format:
Senior thesis
Language:
English
Disjoint set union is a classical problem in sequential data structures with a wide range of applications. The compressed trees algorithm with a good linking rule, such as linking by rank, or a good compaction rule, such as splitting, yields an amortized logarithmic time solution. In fact, using both rules together yields an optimal amortized inverse-Ackermann (almost constant) time solution to the problem. However, even constant time algorithms may be too slow for some huge practical instances of set union that arise in model checking, making concurrent set union algorithms paramount.A previous work by Anderson and Woll gave an algorithm for concurrent set union that uses the $$CAS{}$$(compare-and-swap) synchronization instruction, which is commonly supported on multi-processors. However, their algorithm's amortized work per operation on a set-union instance with $$n$$ nodes and $$p$$ processes was $$\Theta(\alpha(n,0) + p)$$. Unfortunately, the $$p$$-term implies that the work per operation grows {\em linearly} with the number of processes, which in effect means that using more processors does not reduce the time per operation over a sequential implementation.In this thesis we develop randomized and deterministic $$CAS{}$$-based algorithms for disjoint set union that reduce the concurrency overhead to a {\em logarithmic} term in $$p$$. Our strongest randomized algorithm has a worst-case work per operation of $$O(\log n)$$ with high probability and an expected work per operation of $$\Theta(\alpha(n, \frac{m}{np}) + \log (\frac{np}{m} + 1)),$$ where $$m$$ is the total number of operations performed by all processes.Our strongest deterministic algorithm uses the two-word compare-and-swap operation to guarantee a worst-case work per operation of $$O(\log n)$$ and an amortized work per operation of $$\Theta(\alpha(n, \frac{m}{np}) + \log (\frac{np}{m} + 1))$$. Finally, we establish that logarithmic concurrency overhead is inherent to all symmetric algorithms for the problem by proving that any symmetric algorithm for set union must perform at least $$\Omega(\alpha(n, \frac{m}{n}) + \log (\frac{np}{m} + 1))$$ amortized work per operation.