Now that OverlayNG has landed on JTS master, it is getting attention from downstream projects interested in porting it or using it. One of the JTS ports is the Net Topology Suite (NTS) project, and it is very proactive about tracking the JTS codebase. Soon after the OverlayNG commit an NTS developer noticed an issue while running the performance tests in JTS: the InteriorPointInAreaPerfTest was now causing a StackOverflowError. This turned out to be caused by the change to GeometryPrecisionReducer to use OverlayNG with Snap-Rounding to perform robust precision reduction of polygons. Further investigation revealed that failure was occurring while querying the KdTree used in the HotPixelIndex in the SnapRoundingNoder. Moreover, in addition to the outright failure, even when queries did succeed they were taking an excessively long time for large inputs.
The reason for using a K-D tree as the search structure for HotPixels is that it supports two kinds of queries with a single structure:
- Lookup queries to find the HotPixel for a coordinate. These queries are performed while building the index incrementally. Hence a dynamic data structure like a K-D tree is required, rather than a static index such as STRtree.
- Range queries to find all HotPixels within a given extent. These queries are run after the index is loaded.
The JTS KdTree supports both of these queries more efficiently than QuadTree, which is the other dynamic index in JTS. However, K-D trees are somewhat notorious for becoming unbalanced if inserted points are coherent (i.e. contain monotonic runs in either ordinate dimension). This is exactly the situation which occurs in large, relatively smooth polygons - which is the test data used in InteriorPointInAreaPerfTest. An unbalanced tree is deeper than it would be if perfectly balanced. In extreme cases this leads to trees of very large depth relative to their size. This slows down query performance and, because the KdTree query implementation uses recursion, can also lead to stack overflow.
A perfectly balanced K-D tree, with depth = logN. In the worst case an unbalanced tree has only one node at each level (depth = N).
I considered a few alternatives to overcome this major blocker. One was to use a QuadTree, but as mentioned this would reduce performance. There are schemes to load a K-D tree in a balanced way, but they seemed complex and potentially non-performant.
It's always great when people who file issues are also able to provide code to fix the problem. And happily the NTS developer submitted a pull request with a very nice solution. He observed that while the K-D tree was being built incrementally, in fact the inserted points were all available beforehand. He proposed randomizing them before insertion, using the elegantly simple Fisher-Yates shuffle. This worked extremely well, providing a huge performance boost and eliminated the stack overflow error. Here's the relative performance times:
|Num Pts||Randomized||In Order|
|10K||126 ms||341 ms|
|20K||172 ms||1924 ms|
|50K||417 ms||12.3 s|
|100K||1030 ms||59 s|
|200K||1729 ms||240 s|
It's surprising to me that this performance issue hasn't shown up in the other two JTS uses of KdTree: SnappingNoder and ConstrainedDelaunayTriangulator. More investigation required!