Saturday 9 October 2021

Query KD-trees 100x faster with this one weird trick!

Recently a GEOS patch was contributed to change the KdTree query implementation to use an explicit stack rather than recursion.  This has been ported to JTS as PR #779 (along with some refactoring).

The change was motivated by a QGIS issue in which a union of some large polygons caused a stack overflow during a KdTree query.  The reason is that the poor vertex alignment of the input polygons causes the union overlay process to invoke the SnappingNoder.  (For background about why this occurs see the post OverlayNG - Noding Strategies).  The snapping noder snaps vertices using a KdTree with a tolerance.  The polygon boundaries contain runs of coherent (nearly-monotonic) vertices (which is typical for high-resolution polygons delineating natural areas).  When these are loaded directly into a KdTree the tree can become unbalanced.  

An unbalanced tree has a relatively large depth for its size.  The diagrams below shows the graph of the KdTree for a polygon containing 10,552 vertices.  The unbalanced nature of the tree is evident from the few subtrees extending much deeper than the rest of the tree. The tree depth is 282, but most of that occurs in a single subtree.

An unbalanced KD-tree (size = 10,552,  depth = 282)

Left: full graph.     Right: close-up of graph structure

For large inputs querying such a deep subtree causes the recursive tree traversal to exceed the available call stack. Switching to an iterative implementation eliminates the stack overflow error, since the memory-based stack can grow to whatever size is needed.

However, stack overflow is not the only problem!  Unbalanced KD-trees also exhibit poor query performance.  This is because deep tree traversals make the search runtime more linear than logarithmic.   Increasing the size of the query stack increases robustness, but does not improve the performance problem.  The best solution is to build the KD-tree in a balanced way in the first place. 

One way to do this is to to break up monotonic runs by randomizing the order of vertex insertion.  This is actually implemented in the OverlayNG SnapRoundingNoder, as discussed in the post Randomization to the Rescue.  However, using this approach in the SnappingNoder would effectively query the tree twice for each vertex.  And in fact it's not necessary to randomize all the inserted points.  A better-balanced tree can be produced by "seeding" it with a small set of well-chosen points.  

But how can the points be chosen?   They could be selected randomly, but true randomness can be quite "clumpy".  Also, a non-deterministic algorithm is undesirable for repeatability.  Both of these issues can be avoided by using a low-discrepancy quasi-random sequence.  This concept has a fascinating theory, with many approaches available.  The simplest to implement is an additive recurrence sequence with a constant α (the notation {...} means "fractional value of"):

R(α) :  tn = { t0 + n * α },  n = 1,2,3,...    

If α is irrational the sequence never repeats (within the limits of finite-precision floating point).  The excellent article The Unreasonable Effectiveness of Quasirandom Sequences suggests that the lowest possible discrepancy occurs when α is the reciprocal of the Golden Ratio φ. 

1 / φ = 0.6180339887498949...

Implementing this is straightforward, and it is remarkably effective.  For the example above, the resulting KD-tree graph is significantly better-balanced, with lower depth (85 versus 282):

Better-balanced KD-tree (size = 10,552,  depth = 85)

Adding KD-tree seeding to the SnappingNoder gives a huge performance boost to the QGIS test case.  For the 526,466 input vertices the tree depth is reduced from 17,776 to 183 and the snapping time is improved by about 100 times, from ~80 s to ~800 ms!  (In this case the overall time for the union operation is reduced by only about 50%, since the remainder of the overlay process is time-consuming due to the large number of holes in the output.)

In GEOS the performance difference is smaller, but still quite significant.  The snapping time improves 14x, from 12.2 s to 0.82 s, with the union operation time decreasing by 40%, from 31 to 18 secs.

This improvement will be merged into JTS and GEOS soon.