*TLDR: JTS can now fix invalid geometry!*

The **JTS Topology Suite** implements the Geometry model defined in the OGC Simple Features specification. An important part of the specification is the definition of what constitutes valid geometry. Validity is defined by rules about the structural and geometric characteristics of geometry objects. Some validity rules apply to all geometry; e.g. vertices must be defined by coordinates with finite numeric values (so that NaN and Inf ordinates are not valid). In addition, each geometric subtype (Point, LineString, LinearRing, Polygon, and Multi-geometry collections) has its own specific rules for validity.
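As a small illustration of the "finite coordinates" rule, here is a minimal sketch in plain Java. It does not use the JTS API; the class name `FiniteCoordinateCheck` and the `{x, y}` array representation are assumptions made only for this example.

```java
class FiniteCoordinateCheck {
    /** Returns true if every ordinate of every {x, y} pair is finite (not NaN or infinite). */
    static boolean hasFiniteCoordinates(double[][] coords) {
        for (double[] coord : coords) {
            for (double ordinate : coord) {
                if (!Double.isFinite(ordinate)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] ok = { {0, 0}, {10, 0}, {10, 10} };
        double[][] bad = { {0, 0}, {Double.NaN, 5}, {10, Double.POSITIVE_INFINITY} };
        System.out.println(hasFiniteCoordinates(ok));
        System.out.println(hasFiniteCoordinates(bad));
    }
}
```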

The rules for Polygons and MultiPolygons are by far the most restrictive. They include the following constraints:

- Polygon rings must not self-intersect
- Rings may touch at only a finite number of points, and must not cross
- A Polygon interior must be connected (i.e. holes must not split a polygon into two parts)
- MultiPolygon elements may touch at only a finite number of points, and must not overlap
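To make the first rule concrete, the following is a rough brute-force sketch of a ring self-intersection test. It uses `java.awt.geom.Line2D` from the Java standard library rather than JTS, and it is not how JTS checks validity. The class name `RingSelfIntersectionCheck` is hypothetical, rings are assumed to be `{x, y}` arrays with the first point repeated at the end, and overlaps between adjacent segments (such as spikes) are deliberately ignored.

```java
import java.awt.geom.Line2D;

class RingSelfIntersectionCheck {
    /**
     * Brute-force O(n^2) test: does any pair of non-adjacent segments intersect?
     * Adjacent segments always share a vertex, so they are skipped.
     */
    static boolean selfIntersects(double[][] ring) {
        int n = ring.length - 1; // number of segments in the closed ring
        for (int i = 0; i < n; i++) {
            for (int j = i + 2; j < n; j++) {
                // The first and last segments share the closing vertex, so they are adjacent.
                if (i == 0 && j == n - 1) {
                    continue;
                }
                if (Line2D.linesIntersect(
                        ring[i][0], ring[i][1], ring[i + 1][0], ring[i + 1][1],
                        ring[j][0], ring[j][1], ring[j + 1][0], ring[j + 1][1])) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        double[][] square = { {0, 0}, {10, 0}, {10, 10}, {0, 10}, {0, 0} };
        double[][] bowTie = { {0, 0}, {10, 10}, {10, 0}, {0, 10}, {0, 0} };
        System.out.println(selfIntersects(square));
        System.out.println(selfIntersects(bowTie));
    }
}
```

The square reports no self-intersection, while the bow-tie does, because its first and third segments cross at (5, 5).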

These rules guarantee that:

- a given area is represented unambiguously by a polygonal geometry
- algorithms operating on polygonal geometry can make assumptions which provide simpler implementation and more efficient processing

*Valid polygonal geometry is well-behaved*

Given the highly-constrained definition of polygonal validity, it is not uncommon that real-world datasets contain polygons which do not satisfy all the rules, and hence are invalid. This occurs for various reasons:

- Data is captured using tools which do not check validity, or which use a looser or different definition than the OGC standard
- Data is imported from systems with different polygonal models
- Data is erroneous or inaccurate

Because of this, JTS does not enforce validity on geometry creation, apart from a few simple structural constraints (such as rings having identical first and last points). This allows invalid geometry to be represented as JTS geometry objects, and processed using JTS code. Some kinds of spatial algorithms can execute correctly on invalid geometry (e.g. determining the convex hull). But most algorithms require valid input in order to ensure correct results (e.g. the spatial predicates) or to avoid throwing exceptions (e.g. overlay operations). So the main reason for representing invalid geometry is to allow validity to be tested, so that appropriate action can be taken on failure.

Often users would like "appropriate action" to be Just Make It Work. This requires converting invalid geometry to be valid. Many spatial systems provide a way to do this:

- PostGIS has the ST_MakeValid function (which is backed by an implementation in GEOS)
- QGIS has a Fix Geometries process
- OpenJUMP has MakeValidOp
- The ESRI Java Geometry API has an operation called (confusingly) makeSimple

But this has been a conspicuous gap in the JTS API. While it is possible to test for validity, there has never been a way to fix an invalid geometry. To be fair, JTS has always had an *unofficial* way to make polygonal geometry valid. This is the well-known trick of computing geometry.buffer(0), which creates a valid output which is often a good match to the input. This has worked as a stop-gap for years (in spite of an issue which caused some problems, now fixed - see the post *Fixing Buffer for fixing Polygons*). However, using buffer(0) on self-intersecting "figure-8" polygons produces a "lossy" result. Specifically, it retains only the largest lobes of the input linework. This is undesirable for some uses (although it is advantageous in other situations, such as trimming off small self-intersections after polygon simplification).

*Buffer(0) of Figure-8 is lossy*

So, it's about time that JTS stepped up to provide a supported, guaranteed way of fixing invalid geometry. This should handle all geometry, although polygonal geometry repair is the most critical requirement.

There are many kinds of *invalid* polygons, and a variety of possible approaches to fixing them. The most significant decision is how to determine the interior and exterior of a polygonal geometry with self-intersections or overlaps. (This is the classic "bow-tie" or "figure-8" - although self-intersecting polygons can be far more complex.) The question comes down to whether the geometry **linework** or **structure** is used to determine interior areas.

One option is to use the **linework** to create validity: the constituent linework is noded to form a topologically-valid coverage. This coverage is then scanned with an alternating even-odd strategy to assign areas as interior or exterior. This may result in adjacent interior or exterior areas, in which case these are merged.
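The even-odd assignment can be illustrated with the Java standard library's `java.awt.geom.Path2D`, which supports an even-odd winding rule. This is only a stand-in: it performs no noding or merging and is not JTS code. The class name `EvenOddSketch` and the sample rings are assumptions made for this sketch.

```java
import java.awt.geom.Path2D;

class EvenOddSketch {
    /** Builds a single path from all rings, classified with the even-odd rule. */
    static Path2D.Double evenOddPath(double[][][] rings) {
        Path2D.Double path = new Path2D.Double(Path2D.WIND_EVEN_ODD);
        for (double[][] ring : rings) {
            path.moveTo(ring[0][0], ring[0][1]);
            for (int i = 1; i < ring.length; i++) {
                path.lineTo(ring[i][0], ring[i][1]);
            }
            path.closePath();
        }
        return path;
    }

    /** Two overlapping squares, as might occur in an invalid MultiPolygon. */
    static double[][][] overlappingSquares() {
        return new double[][][] {
            { {0, 0}, {10, 0}, {10, 10}, {0, 10}, {0, 0} },
            { {5, 5}, {15, 5}, {15, 15}, {5, 15}, {5, 5} }
        };
    }

    public static void main(String[] args) {
        Path2D.Double path = evenOddPath(overlappingSquares());
        System.out.println(path.contains(2, 2));   // point covered by one ring
        System.out.println(path.contains(7, 7));   // point covered by both rings
        System.out.println(path.contains(12, 12)); // point covered by one ring
    }
}
```

Points covered by exactly one ring are classified as interior, while the point in the overlap (covered by two rings) is classified as exterior. In other words, under an even-odd interpretation the overlap of two elements becomes a hole.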

Alternatively, the **structure** of the polygonal geometry can be taken as determinative. The shell and hole rings are assumed to accurately specify the nature of the area they enclose (interior or exterior). Likewise, the (potentially overlapping or adjacent) elements of a MultiPolygon are assumed to enclose interior area. The repair operation processes each ring and polygon separately. Holes are subtracted from shells. Finally, if required, the repaired polygons are unioned to form the valid result.
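The sequence of steps (build each ring separately, subtract holes from shells, union the polygons) can be sketched with `java.awt.geom.Area` from the Java standard library. This is a loose analogy under stated assumptions, not the JTS implementation: `Area` stands in for a repaired ring, and the class name `StructureBasedRepairSketch`, its helper methods, and the sample coordinates are all hypothetical.

```java
import java.awt.geom.Area;
import java.awt.geom.Path2D;

class StructureBasedRepairSketch {
    /** Converts one ring to an Area; the Area constructor resolves any self-intersections. */
    static Area ringArea(double[][] ring) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(ring[0][0], ring[0][1]);
        for (int i = 1; i < ring.length; i++) {
            path.lineTo(ring[i][0], ring[i][1]);
        }
        path.closePath();
        return new Area(path);
    }

    /** Treats the shell as interior and subtracts each hole from it. */
    static Area polygonArea(double[][] shell, double[][]... holes) {
        Area area = ringArea(shell);
        for (double[][] hole : holes) {
            area.subtract(ringArea(hole));
        }
        return area;
    }

    /** Unions the separately processed polygons into a single result. */
    static Area union(Area... polygons) {
        Area result = new Area();
        for (Area polygon : polygons) {
            result.add(polygon);
        }
        return result;
    }

    /** Sample input: a square with a hole, overlapped by a second square. */
    static Area example() {
        double[][] shellA = { {0, 0}, {10, 0}, {10, 10}, {0, 10}, {0, 0} };
        double[][] holeA = { {1, 1}, {3, 1}, {3, 3}, {1, 3}, {1, 1} };
        double[][] shellB = { {5, 5}, {15, 5}, {15, 15}, {5, 15}, {5, 5} };
        return union(polygonArea(shellA, holeA), polygonArea(shellB));
    }

    public static void main(String[] args) {
        Area fixed = example();
        System.out.println(fixed.contains(7, 7));   // point in the overlap of the two shells
        System.out.println(fixed.contains(2, 2));   // point inside the hole
        System.out.println(fixed.contains(12, 12)); // point in the second shell only
    }
}
```

With this interpretation the overlap between the two shells stays interior, and the hole stays exterior, because each ring's role in the structure decides what it encloses.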

*Complex output from ST_MakeValid*

## 2 comments:

I think prepair should be listed as an option (full disclosure: this is my work).

We wrote a paper years ago comparing solutions for this, and implemented 2 solutions.

==> paper: http://3dgeoinfo.bk.tudelft.nl/hledoux/pdfs/14_cgeo_prepair.pdf

==> code: https://github.com/tudelft3d/prepair

(it's linking against CGAL <v5 at this moment, we're updating the code for CGAL v5+)

Yes, prepair is an interesting alternative. But this post is just intended as an introduction to the new JTS GeometryFixer class, not as an exhaustive survey of options for geometry repair.

It would be interesting to see a comparison of the GeometryFixer algorithm against prepair, now that it is ported to GEOS and (soon) exposed in PostGIS.
