TLDR: JTS can now fix invalid geometry!
The JTS Topology Suite implements the Geometry model defined in the OGC Simple Features specification. An important part of the specification is the definition of what constitutes valid geometry. Validity is defined by rules about the structural and geometric characteristics of geometry objects. Some validity rules apply to all geometry; e.g. vertices must be defined by coordinates with finite numeric values (so NaN and Inf ordinates are not valid). In addition, each geometric subtype (Point, LineString, LinearRing, Polygon, and Multi-geometry collections) has its own specific rules for validity.
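The coordinate rule can be seen directly: JTS will happily construct a Point with a NaN or infinite ordinate, and only validation rejects it. The sketch below assumes JTS 1.18 or later on the classpath (package `org.locationtech.jts`); the class and method names are hypothetical, while `Geometry.isValid()`, `IsValidOp` and `TopologyValidationError` are existing JTS API.

```java
import org.locationtech.jts.geom.Coordinate;
import org.locationtech.jts.geom.GeometryFactory;
import org.locationtech.jts.geom.Point;
import org.locationtech.jts.operation.valid.IsValidOp;
import org.locationtech.jts.operation.valid.TopologyValidationError;

public class CoordinateValidity {
    private static final GeometryFactory FACTORY = new GeometryFactory();

    /** Builds a Point from (x, y) and reports whether JTS considers it valid. */
    public static boolean isValidPoint(double x, double y) {
        Point p = FACTORY.createPoint(new Coordinate(x, y));
        return p.isValid();
    }

    /** Returns the validation error message, or null if the point is valid. */
    public static String validationMessage(double x, double y) {
        Point p = FACTORY.createPoint(new Coordinate(x, y));
        TopologyValidationError err = new IsValidOp(p).getValidationError();
        return err == null ? null : err.getMessage();
    }

    public static void main(String[] args) {
        // Construction succeeds for all three points; only validation rejects bad ordinates.
        System.out.println(isValidPoint(1.0, 2.0));                      // finite ordinates
        System.out.println(isValidPoint(Double.NaN, 2.0));               // NaN ordinate
        System.out.println(isValidPoint(Double.POSITIVE_INFINITY, 2.0)); // infinite ordinate
        System.out.println(validationMessage(Double.NaN, 2.0));
    }
}
```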
The rules for Polygons and MultiPolygons are by far the most restrictive. They include the following constraints:
- Polygon rings must not self-intersect
- Rings may touch at only a finite number of points, and must not cross
- A Polygon interior must be connected (i.e. holes must not split a polygon into two parts)
- MultiPolygon elements may touch at only a finite number of points, and must not overlap
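Each of these rules can be exercised with a small hand-made input. The WKT strings below are hypothetical test cases, one per rule, and the class name is illustrative; the sketch assumes JTS 1.18 or later and uses only `WKTReader`, `Geometry.isValid()` and `IsValidOp`. The exact wording of the error messages may vary between JTS versions, so only validity itself is relied on.

```java
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.io.ParseException;
import org.locationtech.jts.io.WKTReader;
import org.locationtech.jts.operation.valid.IsValidOp;
import org.locationtech.jts.operation.valid.TopologyValidationError;

public class PolygonRules {
    public static final String VALID_SQUARE =
        "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))";
    // Rule 1 violated: the ring crosses itself (a "bowtie").
    public static final String SELF_INTERSECTING_RING =
        "POLYGON ((0 0, 10 10, 10 0, 0 10, 0 0))";
    // Rule 2 violated: the hole crosses the shell along x = 10.
    public static final String CROSSING_RINGS =
        "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0), (5 2, 15 2, 15 8, 5 8, 5 2))";
    // Rule 3 violated: a diamond hole touching every shell side splits the interior into four triangles.
    public static final String DISCONNECTED_INTERIOR =
        "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0), (5 0, 10 5, 5 10, 0 5, 5 0))";
    // Rule 4 violated: the two elements overlap in the square (5 5)-(10 10).
    public static final String OVERLAPPING_ELEMENTS =
        "MULTIPOLYGON (((0 0, 10 0, 10 10, 0 10, 0 0)), ((5 5, 15 5, 15 15, 5 15, 5 5)))";

    /** Parses WKT, converting the checked ParseException into an unchecked one. */
    static Geometry read(String wkt) {
        try {
            return new WKTReader().read(wkt);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad WKT: " + wkt, e);
        }
    }

    public static boolean isValidWkt(String wkt) {
        return read(wkt).isValid();
    }

    /** Describes the first validity error found, or returns "valid". */
    public static String describe(String wkt) {
        TopologyValidationError err = new IsValidOp(read(wkt)).getValidationError();
        return err == null ? "valid" : err.getMessage() + " near " + err.getCoordinate();
    }

    public static void main(String[] args) {
        String[] cases = {VALID_SQUARE, SELF_INTERSECTING_RING, CROSSING_RINGS,
                          DISCONNECTED_INTERIOR, OVERLAPPING_ELEMENTS};
        for (String wkt : cases) {
            System.out.println(describe(wkt));
        }
    }
}
```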
These rules guarantee that:
- a given area is represented unambiguously by a polygonal geometry
- algorithms operating on polygonal geometry can make assumptions which provide simpler implementation and more efficient processing
Given the highly-constrained definition of polygonal validity, it is not uncommon that real-world datasets contain polygons which do not satisfy all the rules, and hence are invalid. This occurs for various reasons:
- Data is captured using tools which do not check validity, or which use a looser or different definition than the OGC standard
- Data is imported from systems with different polygonal models
- Data is erroneous or inaccurate
Because of this, JTS does not enforce validity on geometry creation, apart from a few simple structural constraints (such as rings having identical first and last points). This allows invalid geometry to be represented as JTS geometry objects, and processed using JTS code. Some kinds of spatial algorithms can execute correctly on invalid geometry (e.g. determining the convex hull). But most algorithms require valid input in order to ensure correct results (e.g. the spatial predicates) or to avoid throwing exceptions (e.g. overlay operations). So the main reason for representing invalid geometry is to allow validity to be tested, so that appropriate action can be taken on failure.
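The difference between structural constraints and validity can be sketched as follows, assuming JTS 1.18 or later (class and method names are hypothetical). An unclosed ring is rejected by the `GeometryFactory` with an `IllegalArgumentException`, whereas a closed but self-intersecting "bowtie" is constructed without complaint. Its convex hull is still computed correctly, but its area comes out as 0, because the two lobes have opposite orientations and their signed areas cancel: a small example of an algorithm giving a misleading answer on invalid input.

```java
import org.locationtech.jts.geom.Coordinate;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.geom.GeometryFactory;
import org.locationtech.jts.geom.Polygon;

public class LenientConstruction {
    private static final GeometryFactory FACTORY = new GeometryFactory();

    /** Returns true if JTS rejects a ring whose first and last points differ. */
    public static boolean unclosedRingRejected() {
        Coordinate[] open = {
            new Coordinate(0, 0), new Coordinate(10, 0),
            new Coordinate(10, 10), new Coordinate(0, 10) // first point != last point
        };
        try {
            FACTORY.createLinearRing(open);
            return false;
        } catch (IllegalArgumentException e) {
            return true; // structural constraint enforced at construction time
        }
    }

    /** A closed but self-intersecting "bowtie": constructible, yet invalid. */
    public static Polygon bowtie() {
        return FACTORY.createPolygon(new Coordinate[] {
            new Coordinate(0, 0), new Coordinate(10, 10), new Coordinate(10, 0),
            new Coordinate(0, 10), new Coordinate(0, 0)
        });
    }

    public static void main(String[] args) {
        System.out.println("unclosed ring rejected: " + unclosedRingRejected());

        Polygon p = bowtie();
        System.out.println("bowtie valid: " + p.isValid());

        // The convex hull does not depend on validity: a 10 x 10 square, area 100.
        Geometry hull = p.convexHull();
        System.out.println("hull area: " + hull.getArea());

        // Area is misleading here: the lobes' signed areas cancel to 0.
        System.out.println("bowtie area: " + p.getArea());

        // Test validity before running operations that need valid input (e.g. overlay).
        if (!p.isValid()) {
            System.out.println("skipping overlay on invalid input");
        }
    }
}
```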
Often users would like "appropriate action" to be Just Make It Work. This requires converting invalid geometry to be valid. Many spatial systems provide a way to do this.
But this has been a conspicuous gap in the JTS API. While it is possible to test for validity, there has never been a way to fix an invalid geometry. To be fair, JTS has always had an unofficial way to make polygonal geometry valid. This is the well-known trick of computing geometry.buffer(0), which produces valid output that is often a good match to the input. This has worked as a stop-gap for years (in spite of an issue which caused some problems, now fixed; see the post Fixing Buffer for fixing Polygons). However, using buffer(0) on self-intersecting "figure-8" polygons produces a "lossy" result. Specifically, it retains only the largest lobes of the input linework. This is undesirable for some uses (although it is advantageous in other situations, such as trimming off small self-intersections after polygon simplification).
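The lossiness of buffer(0) is easy to observe on a figure-8 whose two triangular lobes each have area 25. For comparison, the sketch also calls `org.locationtech.jts.geom.util.GeometryFixer.fix`, which is assumed to be available (it ships in recent JTS releases, from the 1.18.x line onward; check your version). The expectation encoded here is that buffer(0) returns less than the full area of 50, while the fixer keeps both lobes; the class and helper names are hypothetical.

```java
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.geom.util.GeometryFixer;
import org.locationtech.jts.io.ParseException;
import org.locationtech.jts.io.WKTReader;

public class FigureEightRepair {
    // Figure-8 ring: two triangular lobes of area 25 each, meeting at (5 5).
    public static final String FIGURE_8 = "POLYGON ((0 0, 10 10, 10 0, 0 10, 0 0))";

    /** Parses WKT, converting the checked ParseException into an unchecked one. */
    public static Geometry read(String wkt) {
        try {
            return new WKTReader().read(wkt);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad WKT: " + wkt, e);
        }
    }

    /** The long-standing unofficial trick: a zero-distance buffer. */
    public static Geometry bufferZero(Geometry g) {
        return g.buffer(0);
    }

    /** The supported repair (assumes a JTS version that includes GeometryFixer). */
    public static Geometry fix(Geometry g) {
        return GeometryFixer.fix(g);
    }

    public static void main(String[] args) {
        Geometry input = read(FIGURE_8);
        Geometry buffered = bufferZero(input);
        Geometry fixed = fix(input);
        // Both results are valid, but buffer(0) keeps only part of the figure-8's area.
        System.out.println("buffer(0): valid=" + buffered.isValid() + " area=" + buffered.getArea());
        System.out.println("fixed:     valid=" + fixed.isValid() + " area=" + fixed.getArea());
    }
}
```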
So, it's about time that JTS stepped up to provide a supported, guaranteed way of fixing invalid geometry. This should handle all geometry, although polygonal geometry repair is the most critical requirement.