Hard to believe that the JTS Topology Suite is almost 20 years old. That's 140 in dog years! Despite what they say about old dogs, one of the benefits of longevity is that you have the opportunity to learn a trick or two along the way. One of the key lessons learned after the initial release of JTS is that intersection (node) detection is a fundamental part of many spatial algorithms, and critical in terms of performance. This resulted in the development of the noding package to provide an API supporting many different kinds of intersection detection and insertion.
Prior to this, intersection detection was performed as part of the GeometryGraph framework, which combined it with topology graph formation and analysis. At the time this seemed like an elegant way to maximize code reuse across many JTS operations, including overlay, buffering, spatial predicates and validation. But as is often the case, there are significant costs to such general-purpose code:
- The overall codebase is substantially more complex
- A performance penalty is imposed on algorithms which don't require topology construction
- Algorithms are harder to read and understand
- The code is brittle, and so hard to modify
- Porting the code is more difficult
Another operation that relies on GeometryGraph is the IsSimpleOp class, which implements the OGC Simple Features isSimple predicate. The algorithm for isSimple essentially involves determining if the geometry linework contains a self-intersection. GeometryGraph is unnecessarily complex for this particular task, since there is no need to compute the entire topology graph in order to find a single self-intersection. Reworking the code to use the MCIndexNoder class in the noding API produces a much simpler and more performant implementation. I also took the opportunity to move the code to the operation.valid package, since the operations of isSimple and isValid are somewhat complementary.
Now, isSimple is probably the least-used OGC operation. Its only real use is to test for self-intersections in lines or collections of lines, and that is not a critical issue for many workflows. However, there is one situation where it is quite useful: testing that linear network datasets are "vector-clean" - i.e. contain LineStrings which touch only at their endpoints.
A linear network containing non-simple intersections (isSimple == false)
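As a rough illustration of why a full topology graph is not needed for this kind of check, here is a minimal Java sketch that tests whether a set of two-point line segments is "vector-clean" and stops at the first violation it finds. It is not the JTS code. The class and method names are made up for this example, every pair of segments is compared by brute force (where JTS uses an indexed noder such as MCIndexNoder), the arithmetic is plain floating point rather than robust predicates, and zero-length segments are assumed not to occur.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; hypothetical names, not the JTS implementation. */
class SimpleLineworkSketch {

    /** Sign of the cross product (b - a) x (c - a): 1 = left turn, -1 = right turn, 0 = collinear. */
    static int orient(double ax, double ay, double bx, double by, double cx, double cy) {
        return (int) Math.signum((bx - ax) * (cy - ay) - (by - ay) * (cx - ax));
    }

    /** True if point (px, py) lies on segment s = {x0, y0, x1, y1} but is not one of its endpoints. */
    static boolean strictlyInside(double px, double py, double[] s) {
        if (orient(s[0], s[1], s[2], s[3], px, py) != 0) return false;
        boolean inBox = px >= Math.min(s[0], s[2]) && px <= Math.max(s[0], s[2])
                     && py >= Math.min(s[1], s[3]) && py <= Math.max(s[1], s[3]);
        boolean isEndpoint = (px == s[0] && py == s[1]) || (px == s[2] && py == s[3]);
        return inBox && !isEndpoint;
    }

    /** True if s and t have the same two endpoints, in either order. */
    static boolean sameSegment(double[] s, double[] t) {
        boolean same = s[0] == t[0] && s[1] == t[1] && s[2] == t[2] && s[3] == t[3];
        boolean reversed = s[0] == t[2] && s[1] == t[3] && s[2] == t[0] && s[3] == t[1];
        return same || reversed;
    }

    /** True if s and t meet anywhere other than at a shared endpoint. */
    static boolean intersectsImproperly(double[] s, double[] t) {
        int o1 = orient(s[0], s[1], s[2], s[3], t[0], t[1]);
        int o2 = orient(s[0], s[1], s[2], s[3], t[2], t[3]);
        int o3 = orient(t[0], t[1], t[2], t[3], s[0], s[1]);
        int o4 = orient(t[0], t[1], t[2], t[3], s[2], s[3]);
        if (o1 * o2 < 0 && o3 * o4 < 0) return true; // the two interiors cross
        // Remaining cases: duplicate segments, T-junctions and collinear overlaps.
        return sameSegment(s, t)
            || strictlyInside(t[0], t[1], s) || strictlyInside(t[2], t[3], s)
            || strictlyInside(s[0], s[1], t) || strictlyInside(s[2], s[3], t);
    }

    /** Returns false as soon as one improper intersection is found; no graph is built. */
    static boolean isSimple(List<double[]> segments) {
        for (int i = 0; i < segments.size(); i++) {
            for (int j = i + 1; j < segments.size(); j++) {
                if (intersectsImproperly(segments.get(i), segments.get(j))) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<double[]> clean = Arrays.asList(
            new double[] {0, 0, 1, 0}, new double[] {1, 0, 1, 1}, new double[] {1, 0, 2, 0});
        List<double[]> crossing = Arrays.asList(
            new double[] {0, 0, 2, 2}, new double[] {0, 2, 2, 0});
        System.out.println(isSimple(clean));    // true: lines meet only at (1, 0)
        System.out.println(isSimple(crossing)); // false: the diagonals cross
    }
}
```

The early return in `isSimple` is the point of the sketch: answering the yes/no question only requires finding one offending pair, which is why an approach based on intersection detection alone can be much cheaper than one that builds topology first.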
To demonstrate the performance improvement, I'll use a dataset for Rivers of the US maintained by the US National Weather Service. It supplies two datasets: a full dataset of all rivers, and a subset of major rivers only. You might expect a hydrographic network to be "vector-clean", but in fact both of these datasets contain numerous instances of self-intersections and coincident linework.
Here are the results of running the isSimple predicate on the datasets. On the larger dataset the new implementation provides a 20x performance boost!
| Dataset | New time | Old time |
|---|---|---|
| Subset (909,865 pts) | 0.25 s | 1 s |
| Full (5,212,102 pts) | 2 s | 30 s |
Finding Non-Simple Locations
The new codebase made it easy to add a functionality enhancement that computes the locations of all places where lines self-intersect. This can be used for visual confirmation that the operation is working as expected, and to indicate places where data quality needs to be improved. Here are the non-simple intersection points found in the river network subset:
Closeups of some non-simple intersection locations:
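To make the difference between "is it simple?" and "where is it not simple?" concrete, here is a small Java sketch that keeps going after the first hit and collects every location where two segments meet at something other than a shared endpoint. Again, this is only an illustration under stated assumptions, not the JTS implementation: the names are hypothetical, segments are plain `{x0, y0, x1, y1}` arrays, pairs are compared by brute force, the arithmetic is non-robust floating point, and parallel or collinear pairs (including overlaps) are simply skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; hypothetical names, not the JTS implementation. */
class NonSimpleLocationsSketch {

    /** 2D cross product of vectors (ax, ay) and (bx, by). */
    static double cross(double ax, double ay, double bx, double by) {
        return ax * by - ay * bx;
    }

    /** Collects each distinct point where two segments meet other than at a shared endpoint. */
    static List<List<Double>> findNonSimpleLocations(List<double[]> segments) {
        Set<List<Double>> found = new LinkedHashSet<>();
        for (int i = 0; i < segments.size(); i++) {
            for (int j = i + 1; j < segments.size(); j++) {
                double[] a = segments.get(i);
                double[] b = segments.get(j);
                double rx = a[2] - a[0], ry = a[3] - a[1]; // direction of a
                double sx = b[2] - b[0], sy = b[3] - b[1]; // direction of b
                double denom = cross(rx, ry, sx, sy);
                if (denom == 0) continue; // parallel or collinear: not handled in this sketch
                double qpx = b[0] - a[0], qpy = b[1] - a[1];
                // Solve a0 + t * r = b0 + u * s for the parameters t and u.
                double t = cross(qpx, qpy, sx, sy) / denom;
                double u = cross(qpx, qpy, rx, ry) / denom;
                if (t < 0 || t > 1 || u < 0 || u > 1) continue; // no intersection
                boolean sharedEndpoint = (t == 0 || t == 1) && (u == 0 || u == 1);
                if (!sharedEndpoint) {
                    found.add(Arrays.asList(a[0] + t * rx, a[1] + t * ry));
                }
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        List<double[]> network = Arrays.asList(
            new double[] {0, 0, 2, 2},   // diagonal
            new double[] {0, 2, 2, 0},   // crosses the first diagonal
            new double[] {0, 0, 2, 0},   // touches both diagonals at their endpoints only
            new double[] {1, 0, 1, -1},  // ends in the middle of the previous segment
            new double[] {2, 2, 3, 2});  // touches the first diagonal at its endpoint
        System.out.println(findNonSimpleLocations(network));
    }
}
```

A list of points like this is what makes visual checking possible: each reported location can be plotted over the linework to see exactly where the data needs cleaning.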
IsSimpleOp is the easiest algorithm to convert over from using GeometryGraph. As such it serves as a good proof-of-viability, and establishes useful code patterns for further conversions.
Next up is to give IsValidOp the same treatment. This should provide similar benefits of simplicity and performance. And, as always, there is the job of porting the improved code to GEOS.