Thursday, 16 May 2024

JTS Topological Relationships - the Next Generation

The most fundamental and widely-used operations in the JTS Topology Suite are the ones that evaluate topological relationships between geometries.  JTS implements the Dimensionally-Extended 9 Intersection Model (DE-9IM), as defined in the OGC Simple Features specification, in the RelateOp API.  

DE-9IM matrix for overlapping polygons

The RelateOp algorithm was the very first one implemented during the initial JTS development, over 20 years ago.  At that time it was an appealing idea to implement a general-purpose topology framework (the GeometryGraph package), and use it to support topological predicates, overlay, and buffering.  However, some disadvantages of this approach have become evident over time:

  • the need to create a topological graph structure limits the ability to improve performance. This has led to the implementation of PreparedGeometry - but that adds further complexity to the codebase, and supports only a limited set of predicates.
  • a large number of code dependencies make it hard to fix problems and improve semantics
  • constructing a full topology graph increases exposure to geometric robustness errors
  • GeometryCollections are not supported (initially because the OGC did not define the semantics for this, and now because adding this capability is difficult)
There's an extensive list of issues relating (ha!) to RelateOp here.

The importance of this functionality is especially significant since the same algorithm is implemented in GEOS.  That codebase is used to evaluate spatial queries in popular spatial APIs such as Shapely and R-sf, and numerous systems such as PostGISDuckDBSpatialLiteQGIS, and GDAL (to name just a few).  It would not be surprising to learn that the RelateOp algorithm is executed billions of times per day across the world's CPUs.

During the subsequent years of working on JTS I realized that there was a better way to evaluate topological relationships.  It would required a ground-up rewrite, but would avoid the shortcomings of RelateOp and provide better performance and a more tractable codebase.  Thanks to my employer Crunchy Data I have finally been able to make this idea a reality.  Soon JTS will provide a new algorithm for topological relationships called RelateNG.

Key Features of RelateNG

The RelateNG algorithm incorporates a broad spectrum of improvements over RelateOp in the areas of functionality, robustness, and performance.  It provides the following features:

  • Efficient short-circuited evaluation of topological predicates (including matching custom DE-9IM matrix patterns)
  • Optimized repeated evaluation of predicates against a single geometry via cached spatial indexes (AKA "prepared mode")
  • Robust computation (only point-local geometry topology is computed, so invalid topology does not cause failures)
  • GeometryCollection inputs containing mixed types and overlapping polygons are supported, using union semantics.
  • Zero-length LineStrings are treated as being topologically identical to Points.
  • Support for BoundaryNodeRules.

Using the RelateNG API

The main entry point is the RelateNG class.  It supports evaluating topological relationships in three different ways:

  • Evaluating a standard OGC named boolean binary predicate, specified via a TopologyPredicate instance.  Standard predicates are obtained from the RelatePredicate factory functions intersects, contains, overlaps, etc.
  • Testing an arbitrary DE-9IM relationship by matching an intersection matrix pattern (e.g. "T**FF*FF*", which is the pattern for a relation called Contains Properly).
  • Computing the full value of a DE-9IM IntersectionMatrix.
RelateNG operations can be executed in two modes: stateless and prepared.  Stateless mode is provided by the relate static functions, which accept two input geometries and evaluate a predicate.  For prepared mode, a single geometry is provided to the prepare functions to create a RelateNG instance.  Then the evaluate instance methods can be used to evaluate spatial predicates on a sequence of geometries. The instance creates spatial indexes (lazily) to make topology evaluation much more efficient.  Note that even stateless mode can be significantly faster than the current implementation, due to short-circuiting and other internal heuristics.

Here is an example of matching an intersection matrix pattern, in stateless mode:
boolean isMatched = RelateNG.relate(geomA, geomB, "T**FF*FF*");
Here is an example of setting up a geometry in prepared mode, and evaluating a named predicate on it:
RelateNG rng = RelateNG.prepare(geomA);
for (Geometry geomB : geomSet) {
    boolean predValue = rng.evaluate(geomB, RelatePredicate.intersects());
}

Rolling It Out

It's exciting to launch a major improvement on such a core piece of spatial functionality.  The Crunchy spatial team will get busy on porting this algorithm to GEOS.  From there it should get extensive usage in downstream projects.  We're looking forward to hearing feedback from our own PostGIS clients as well as other users.  We're always happy to be able to reduce query times and equally importantly, carbon footprints.  

In further blog posts I'll describe the RelateNG algorithm design and provide some examples of performance metrics.

Future Ideas

The RelateNG implementation provides an excellent foundation to build out some interesting extensions to the fundamental DE-9IM concept.

Extended Patterns

The current DE-9IM pattern language is quite limited.  In fact, it's not even powerful enough to express the standard named predicates.  It could be improved by adding features like:

  • disjunctive combinations of patterns.  For example, touches is defined by  "FT******* | F**T***** | F***T****"
  • dimension guards to specify which dimensions a pattern applies to.  For example, overlaps is defined by "[0,2] T*T***T** | [1] 1*T***T**"
  • while we're at it, might as well support dotted notation and spaces for readability; e.g. "FT*.***.***"

Approximate Spatial Relationships

A requirement that comes up occasionally is to compute approximate spatial relationships between imprecise data using a distance tolerance.  This is useful in datasets where features don't match exactly due to data inaccuracy. Because RelateNG creates topology only in the neighbourhood of specified points, it should be possible to specify the neighbourhood size using a distance tolerance.  Essentially, vertices and line intersections will "snap together" to determine the effective topology.  Currently within-distance queries are often used to compute "approximate intersects".  Adding a distance tolerance capability to RelateNG will allow other kinds of approximate relationships to be evaluated as well.

Further Performance Optimizations?

A challenge with implementing algorithms over a wide variety of spatial types and use cases is how to provide general-purpose code which matches (or exceeds) the efficiency of more targeted implementations.  RelateNG analyzes the input geometries and the predicate under evaluation to tune strategies to reduce the amount of work needed to evaluate the DE-9IM.  It may be that profiling specific use cases reveals further hotspots in the code which can be improved by additional optimizations.  

Curve Support

GEOS has recently added support for representing geometries with curves.  The RelateNG design is modular enough that it should be possible to extend it to allow evaluating relationships for geometries with curves.  


2 comments:

  1. Is there a release containing these upgrades planned in the (near) future?

    ReplyDelete
  2. Yes, that's becoming a bit critical! I'll make it a priority.

    ReplyDelete