Monday 27 May 2024

RelateNG Performance

A previous post introduced a new algorithm in the JTS Topology Suite called RelateNG.  It computes topological relationships between geometries using the Dimensionally-Extended 9 Intersection Model  (DE-9IM) model.  

This algorithm is fundamental to a large proportion of spatial queries executed in numerous geospatial environments.  It would not be surprising to learn that the Relate algorithm is executed billions of times per day across the world's data centres.  Given this, performance is a critical metric.  

The original JTS RelateOp algorithm was hamstrung by its design, which required computing a full topology graph for each relationship evaluation.  The subsequent development of PreparedGeometry provided a significant performance boost for common spatial predicates such as intersects and covers.  A useful side-effect is that it is less susceptible to geometric robustness problems.  However, it has some drawbacks:

  • it only implements a small set of predicates
  • the design does not easily extend to more complex predicates
  • it does not support GeometryCollection inputs
  • it is a completely separate codebase to the general-purpose RelateOp, which increases maintenance effort and increases the risk of bugs and behaviour discrepancies
RelateNG solves all of these problems.  In addition, it provides much better performance for non-prepared (stateless) calls.  This is important for systems which do not execute spatial queries in a stateful way (due to architectural limitations, or simply lack of attention to performance), and thus cannot use prepared mode.

Performance Tests

Here are some examples of performance metrics.  
Note that the data used in the tests is sometimes subsetted to avoid skewing the results with many false results due to non-interaction between the input geometries.  This reflects the typical deployment of these predicates in systems which perform a primary extent filter (i.e. via a spatial index) before executing the full predicate.
Some of datasets are in geodetic coordinates, which strictly speaking JTS does not handle. But this does not matter for the purposes of performance testing.

Point / Polygon

This test queries a MultiPolygon of the outline of Europe (including larger islands) against a synthetic dataset of random points.  The test evaluates the intersects predicate, which is really the only one of interest in the Point / Polygon case.  

The test data metrics are:
  • Query MultiPolygon: 461 polygons, 22,675 vertices
  • Target Points: 10,000 points


OpIntersectsIntersects Prep
Relate61.7 s21 ms
RelateNG0.1 s19 ms

The result clearly shows the enormous improvement in the stateless case, since RelateNG does not build topology on the input polygon.  The results are very similar in the prepared case.  This is as expected, since both codebases run a simple Point-in-Polygon test using a cached spatial index.

Line / Line

This test uses a dataset of major rivers of the continental United States.  A single river mainstem is queried against a subset of river branches using the intersects and touches relationships, in stateless and prepared modes.

The test data metrics are:
  • Query Line: 6,975 vertices
  • Target Lines: 407 lines with 47,328 vertices

Timings (per 100 executions of the operation)

OpIntersectsIntersects PrepTouchesTouches Prep
Relate38.2 s133 ms36 sN/A
RelateNG1.18 s142 ms2.05 s2.03 s

This shows the much better performance of RelateNG.  RelateNG can evaluate touches in prepared mode, but the performance is similar to stateless mode, since currently indexes are not cached for Line/Line cases.  This could be improved in a future release.

Polygon / Polygon

This test uses data from two polygonal datasets:
The tests query an administrative unit polygon against a subset of Bedrock polygons which intersect its extent, using intersects and covers in stateless and prepared modes. 

The test data metrics are:
  • GADM unit: 4,017 vertices
  • Bedrock polygons: 4,318 polygon with 337,650 vertices

Timings (per 100 executions of the operation)

OpIntersectsIntersects PrepCoversCovers Prep
Relate61.7 s534 ms54.9 s842 ms
RelateNG5.8 s595 ms6.4 s943 ms

Polygon / Polygon - Custom Relate Pattern

The tests shows the ability of RelateNG to efficiently evaluate arbitrary Relate Intersection Matrix patterns.  In this case, the pattern used is F***0****, which corresponds to a relationship which could be called "point-touches": two geometries have boundaries which intersect in only one (or more) points (dimension 0), but whose interiors do not intersect.

This test uses data from The GADM Level 1 boundaries for Canada.  This has the advantage that Canada contains a rare example of 4 boundaries meeting at a single point (the jurisdictions of Saskatchewan, Manitoba, Northwest Territories, and Nunavut).

The test data metrics are:
  • GADM Canada Level 1: 13 polygons, 4,005,926 vertices


OpF***0****F***0**** Prep
Relate504 sN/A
RelateNG9.8 s6.6 s

As expected, for this test RelateNG is far more performant than Relate.  This is due to its ability to analyze Intersection Matrix patterns and perform only topology tests that are necessary, as well as not building a full topological structure of the inputs.  The test shows the effect of caching spatial indexes in prepared mode, although stateless mode is very efficient as well.

Analysis of Results

Clearly RelateNG is far more performant than Relate when running in non-prepared mode.  The PreparedGeometry implementation is slightly faster (which confirms the efficiency of its original design), but not by very much.  The difference may be a consequence of the general-purpose and thus more complex code and data structures in RelateNG. Closing this gap may be an area for future research.
One thing is certain: the RelateNG algorithm design provides much more scope for adding case-specific optimizations.  If you have a significant use case which could be improved by further targeted optimization improvements, let me know in the comments!
It will be interesting to re-evaluate these tests once RelateNG is ported to GEOS.  Sometimes (but not always) a C++ implementation can be significantly faster than Java, due to more opportunities for code and compiler optimization.

No comments: