Wednesday, 26 May 2021

JTS IsSimple gets simpler (and faster)

Hard to believe that the JTS Topology Suite is almost 20 years old.  That's 140 in dog years!  Despite what they say about old dogs, one of the benefits of longevity is that you have the opportunity to learn a trick or two along the way.  One of the key lessons learned after the initial release of JTS is that intersection (node) detection is a fundamental part of many spatial algorithms, and critical in terms of performance.  This resulted in the development of the noding package to provide an API supporting many different kinds of intersection detection and insertion.

Prior to this, intersection detection was performed as part of the GeometryGraph framework, which combined it with topology graph formation and analysis.  At the time this seemed like an elegant way to maximize code reuse across many JTS operations, including overlay, buffering, spatial predicates and validation.  But as often the case, there are significant costs to such general-purpose code:

  • The overall codebase is substantially more complex
  • A performance penalty is imposed on algorithms which don't require topology construction
  • Algorithms are harder to read and understand.  
  • The code is brittle, and so hard to modify
  • Porting the code is more difficult  
Because of this, a focus of JTS development is to free operations from their dependency on GeometryGraph - with the ultimate goal of expunging it from the JTS codebase.  A major step along this road was the rewrite of the overlay operations.

Another operation that relies on GeometryGraph is the IsSimpleOp class, which implements the OGC Simple Features isSimple predicate.  The algorithm for isSimple essentially involves determining if the geometry linework contains a self-intersection.  GeometryGraph is unnecessarily complex for this particular task, since there is no need to compute the entire topology graph in order to find a single self-intersection.  Reworking the code to use the the MCIndexNoder class in the noding API produces a much simpler and more performant implementation. I also took the opportunity to move the code to the operation.valid package, since the operations of isSimple and isValid are somewhat complementary.

Now, isSimple is probably the least-used OGC operation.  Its only real use is to test for self-intersections in lines or collections of lines, and that is not a critical issue for many workflows.  However, there is one situation where it is quite useful: testing that linear network datasets are "vector-clean" - i.e. contain LineStrings which touch only at their endpoints.

A linear network containing non-simple intersections (isSimple == false)

To demonstrate the performance improvement, I'll use a dataset for Rivers of the US maintained by the US National Weather Service.  It supplies two datasets: a full dataset of all rivers, and a subset of major rivers only.  You might expect a hydrographic network to be "vector-clean", but in fact both of these datasets contain numerous instances of self-intersections and coincident linework.

Here's the results of running the isSimple predicate on the datasets.  On the larger dataset the new implementation provides a 20x performance boost!

 Dataset New time  Old time 
  Subset  (909,865 pts)   0.25 s   1 s
  Full   (5,212,102 pts)  2 s  30 s

Finding Non-Simple Locations

The new codebase made it easy to add a functionality enhancement that computes the locations of all places where lines self-intersect.  This can be used to for visual confirmation that the operation is working as expected, and to indicate places where data quality needs to be improved.  Here's the non-simple intersection points found in the river network subset:

Closeups of some non-simple intersection locations:

IsSimpleOp is the easiest algorithm to convert over from using GeometryGraph.  As such it serves as a good proof-of-viability, and establishes useful code patterns for further conversions. 

Next up is to give IsValidOp the same treatment.  This should provide similar benefits of simplicity and performance.  And as always, porting the improved code to GEOS.

Monday, 3 May 2021

Fixing Invalid Geometry with JTS

TLDR: JTS can now fix invalid geometry!

The JTS Topology Suite implements the Geometry model defined in the OGC Simple Features specification.  An important part of the specification is the definition of what constitutes valid geometry.  These are defined by rules about the structural and geometric characteristics of geometry objects.  Some validity rules apply to all geometry; e.g. vertices must be defined by coordinates with finite numeric values (so that NaN and Inf ordinates are not valid).  In addition, each geometric subtype (Point, LineString, LinearRing, Polygon, and Multi-geometry collections) has its own specific rules for validity.

The rules for Polygons and MultiPolygons are by far the most restrictive.  They include the following constraints:

  1. Polygons rings must not self-intersect
  2. Rings may touch at only a finite number of points, and must not cross
  3. A Polygon interior must be connected (i.e. holes must not split a polygon into two parts)
  4. MultiPolygon elements may touch at only a finite number of points, and must not overlap

These rules guarantee that:

  • a given area is represented unambiguously by a polygonal geometry
  • algorithms operating on polygonal geometry can make assumptions which provide simpler implementation and more efficient processing

Valid polygonal geometry is well-behaved

Given the highly-constrained definition of polygonal validity, it is not uncommon that real-world datasets contain polygons which do not satisfy all the rules, and hence are invalid. This occurs for various reasons:

  • Data is captured using tools which do not check validity, or which use a looser or different definition than the OGC standard
  • Data is imported from systems with different polygonal models
  • Data is erroneous or inaccurate 

Because of this, JTS does not enforce validity on geometry creation, apart from a few simple structural constraints (such as rings having identical first and last points).  This allows invalid geometry to be represented as JTS geometry objects, and processed using JTS code.  Some kinds of spatial algorithms can execute correctly on invalid geometry (e.g. determining the convex hull).  But most algorithms require valid input in order to ensure correct results (e.g. the spatial predicates) or to avoid throwing exceptions (e.g. overlay operations).  So the main reason for representing invalid geometry is to allow validity to be tested, to take appropriate action on failure.

Often users would like "appropriate action" to be Just Make It Work.  This requires converting invalid geometry to be valid.  Many spatial systems provide a way to do this: 

But this has a been a conspicuous gap in the JTS API.  While it is possible to test for validity, there has never been a way to fix an invalid geometry.  To be fair, JTS has always had an unofficial way to make polygonal geometry valid.  This is the well-known trick of computing geometry.buffer(0), which creates a valid output which often is a good match to the input. This has worked as a stop-gap for years (in spite of an issue which caused some problems, now fixed - see the post Fixing Buffer for fixing Polygons). However, using buffer(0) on self-intersecting "figure-8" polygons produces a "lossy" result.  Specifically, it retains only the largest lobes of the input linework.  This is undesirable for some uses (although it is advantageous in other situations, such as trimming off small self-intersections after polygon simplification).

Buffer(0) of Figure-8 is lossy 

So, it's about time that JTS stepped up to provide a supported, guaranteed way of fixing invalid geometry. This should handle all geometry, although polygonal geometry repair is the most critical requirement.

This raises the question of what exactly the semantics of repairing polygons should be.  While validity is well-specified, there are no limits to the complexity of invalid polygons, and a variety of possible approaches to fixing them.  The most significant decision is how to determine the interior and exterior of a polygonal geometry with self-intersections or overlaps. (This is the classic "bow-tie" or "figure-8" - although self-intersecting polygons can be far more complex.) The question comes down to whether the geometry linework or structure is used to determine interior areas.  

If linework is used to create validity, to node the constituent linework to form a topologically-valid coverage.  This coverage is then scanned with an alternating even-odd strategy to assign areas as interior or exterior.  This may result in adjacent interior or exterior areas, in which case these are merged.

Alternatively, the structure of the polygonal geometry can be taken as determinative. The shell and hole rings are assumed to accurately specify the nature of the area they enclose (interior or exterior).  Likewise, the (potentially overlapping or adjacent) elements of a MultiPolygon are assumed to enclose interior area.  The repair operation processes each ring and polygon separately.  Holes are subtracted from shells.  Finally, if required the repaired polygons are unioned to form the valid result.  

PostGIS MakeValid and the ESRI Java Geometry API makeSimple both use the linework approach.  However, for some relatively simple invalid geometries this produces results which seem overly complex.  
Complex output from ST_MakeValid

For this reason I have implemented the structure-based approach in JTS.  It provides results that are closer to the existing buffer(0) technique (and conveniently allows using the existing buffer code).  This made it a relatively simple matter to implement the repair algorithm as the GeometryFixer class.

Here's some examples of how GeometryFixer works.  First, the example above, showing the (arguably) simpler result that arises from using the structure information:
Figure-8s are handled as desired (keeping all area):

Self-overlapping shells have all interior area preserved:
Of course, the GeometryFixer also handles simple fixes for all geometry types, such as removing invalid coordinates.

One further design decision is how to handle geometries which are invalid due to collapse (e.g. a line with a single point, or a ring which has only two unique vertices).  GeometryFixer provides an option to either remove collapses, or to return them as equivalent lower dimensional geometry. 

To see the full range of effects of GeometryFixer, the JTS TestBuilder can be used to view and run the GeometryFixer on the set of 75 test cases for invalid polygonal geometry in the file TestInvalidA.xml

It's been a long time coming, but finally JTS can function as a full-service repair shop for geometry, no matter how mangled it might be.