Monday, 3 May 2021

Fixing Invalid Geometry with JTS

TLDR: JTS can now fix invalid geometry!

The JTS Topology Suite implements the Geometry model defined in the OGC Simple Features specification.  An important part of the specification is the definition of what constitutes valid geometry.  These are defined by rules about the structural and geometric characteristics of geometry objects.  Some validity rules apply to all geometry; e.g. vertices must be defined by coordinates with finite numeric values (so that NaN and Inf ordinates are not valid).  In addition, each geometric subtype (Point, LineString, LinearRing, Polygon, and Multi-geometry collections) has its own specific rules for validity.

The rules for Polygons and MultiPolygons are by far the most restrictive.  They include the following constraints:

  1. Polygons rings must not self-intersect
  2. Rings may touch at only a finite number of points, and must not cross
  3. A Polygon interior must be connected (i.e. holes must not split a polygon into two parts)
  4. MultiPolygon elements may touch at only a finite number of points, and must not overlap

These rules guarantee that:

  • a given area is represented unambiguously by a polygonal geometry
  • algorithms operating on polygonal geometry can make assumptions which provide simpler implementation and more efficient processing

Valid polygonal geometry is well-behaved

Given the highly-constrained definition of polygonal validity, it is not uncommon that real-world datasets contain polygons which do not satisfy all the rules, and hence are invalid. This occurs for various reasons:

  • Data is captured using tools which do not check validity, or which use a looser or different definition than the OGC standard
  • Data is imported from systems with different polygonal models
  • Data is erroneous or inaccurate 

Because of this, JTS does not enforce validity on geometry creation, apart from a few simple structural constraints (such as rings having identical first and last points).  This allows invalid geometry to be represented as JTS geometry objects, and processed using JTS code.  Some kinds of spatial algorithms can execute correctly on invalid geometry (e.g. determining the convex hull).  But most algorithms require valid input in order to ensure correct results (e.g. the spatial predicates) or to avoid throwing exceptions (e.g. overlay operations).  So the main reason for representing invalid geometry is to allow validity to be tested, to take appropriate action on failure.

Often users would like "appropriate action" to be Just Make It Work.  This requires converting invalid geometry to be valid.  Many spatial systems provide a way to do this: 

But this has a been a conspicuous gap in the JTS API.  While it is possible to test for validity, there has never been a way to fix an invalid geometry.  To be fair, JTS has always had an unofficial way to make polygonal geometry valid.  This is the well-known trick of computing geometry.buffer(0), which creates a valid output which often is a good match to the input. This has worked as a stop-gap for years (in spite of an issue which caused some problems, now fixed - see the post Fixing Buffer for fixing Polygons). However, using buffer(0) on self-intersecting "figure-8" polygons produces a "lossy" result.  Specifically, it retains only the largest lobes of the input linework.  This is undesirable for some uses (although it is advantageous in other situations, such as trimming off small self-intersections after polygon simplification).

Buffer(0) of Figure-8 is lossy 

So, it's about time that JTS stepped up to provide a supported, guaranteed way of fixing invalid geometry. This should handle all geometry, although polygonal geometry repair is the most critical requirement.

This raises the question of what exactly the semantics of repairing polygons should be.  While validity is well-specified, there are no limits to the complexity of invalid polygons, and a variety of possible approaches to fixing them.  The most significant decision is how to determine the interior and exterior of a polygonal geometry with self-intersections or overlaps. (This is the classic "bow-tie" or "figure-8" - although self-intersecting polygons can be far more complex.) The question comes down to whether the geometry linework or structure is used to determine interior areas.  

If linework is used to create validity, to node the constituent linework to form a topologically-valid coverage.  This coverage is then scanned with an alternating even-odd strategy to assign areas as interior or exterior.  This may result in adjacent interior or exterior areas, in which case these are merged.

Alternatively, the structure of the polygonal geometry can be taken as determinative. The shell and hole rings are assumed to accurately specify the nature of the area they enclose (interior or exterior).  Likewise, the (potentially overlapping or adjacent) elements of a MultiPolygon are assumed to enclose interior area.  The repair operation processes each ring and polygon separately.  Holes are subtracted from shells.  Finally, if required the repaired polygons are unioned to form the valid result.  

PostGIS MakeValid and the ESRI Java Geometry API makeSimple both use the linework approach.  However, for some relatively simple invalid geometries this produces results which seem overly complex.  
Complex output from ST_MakeValid

For this reason I have implemented the structure-based approach in JTS.  It provides results that are closer to the existing buffer(0) technique (and conveniently allows using the existing buffer code).  This made it a relatively simple matter to implement the repair algorithm as the GeometryFixer class.

Here's some examples of how GeometryFixer works.  First, the example above, showing the (arguably) simpler result that arises from using the structure information:
Figure-8s are handled as desired (keeping all area):

Self-overlapping shells have all interior area preserved:
Of course, the GeometryFixer also handles simple fixes for all geometry types, such as removing invalid coordinates.

One further design decision is how to handle geometries which are invalid due to collapse (e.g. a line with a single point, or a ring which has only two unique vertices).  GeometryFixer provides an option to either remove collapses, or to return them as equivalent lower dimensional geometry. 

To see the full range of effects of GeometryFixer, the JTS TestBuilder can be used to view and run the GeometryFixer on the set of 75 test cases for invalid polygonal geometry in the file TestInvalidA.xml

It's been a long time coming, but finally JTS can function as a full-service repair shop for geometry, no matter how mangled it might be.