The JTS Topology Suite has been rolling out the capability to manage polygonal coverages. It supports modelling polygonal coverages as arrays of discrete polygonal geometries. This is simple to work with and allows using all of the wide variety of JTS algorithms. Coverage topology allows operations such as CoverageSimplifier and CoverageUnion to be highly performant and effective.
The key requirement for coverage operations is that the input is a valid polygonal coverage. JTS provides the CoverageValidator to check a dataset to see if it is a valid coverage, and provide full information on the location of topology errors. However, this leaves the question of how to convert a dataset with errors into a valid coverage.
There are two options for fixing coverages: manual editing, and automated cleaning. CoverageValidator provides information to identify the location of all errors. However, manually fixing them is time-consuming, and difficult in environments which lack convenient editing tools. And of course it simply isn't feasible in automated workflows.
What users really want is automated coverage cleaning. This is a major pain point (witness the numerous questions about how to do this; for example, this and this and this). Clearly, cleaning coverages is considered critical!
There are several existing tools to do this:
- mapshaper is perhaps the most popular tool currently. It supports snapping to improve dataset quality, along with overlap and gap merging. It is written in Javascript, which provides surprisingly good performance, but makes it awkward to integrate in other tools.
- pprepair is an academically-advanced algorithm which uses triangulation as the basis for cleaning. It relies on CGAL, which limits appeal for integration
- GRASS v.clean is a long-standing cleaning tool with an extensive set of options for carrying out various kinds of cleaning
But in the JTS/GEOS ecosystem it's been a gap that needs filling (so to speak). So I'm excited to announce the the arrival of the JTS CoverageCleaner.
JTS CoverageCleaner class
The JTS CoverageCleaner class provides the essential capabilities of cleaning polygonal data to form a valid coverage:
- Snapping: snapping vertices and lines eliminates small discrepancies and narrow gaps, and improves the robustness of subsequent processing
- Overlap Merging: of two or more polygons are merged with a neighbouring polygon. The merge target can be chosen to be the neighbour with the longest shared border, the largest or smallest area, or the one with minimum input index (which allows priority-based merging).
- Gap Detection and Merging: Gaps are defined as enclosed empty areas with width less than a distance tolerance. Width is computed as the diameter of the Maximum Inscribed Circle of the gap polygon. (This is an improvement over other tools, which only offer cruder area or roundness-metric tests to identify gaps to be merged.) Gaps can be filled and merged with an adjacent polygon.
Other aspects of cleaning data are provided by separate JTS classes. Invalid polygonal geometries can be preprocessed with GeometryFixer to repair them. CoverageSimplifier can be used to reduce the vertex size of cleaned datasets. And a forthcoming coverage precision reducer will allow controlling the amount of precision maintained (which can reduce the space required for some storage formats).
Algorithm Description
- the SnappingNoder was developed for OverlayNG, and turned out to be ideal for eliminating minor errors and providing a robust basis for the rest of the process
- the Polygonizer allows fast recreation of a coverage topology from the noded linework
- Spatial indexes (STRtree and Quadtree) and the fast predicates of RelateNG provide performance for operating in the discrete geometric domain
- the new fast MaximumInscribedCircle.isRadiusWithin predicate allows using polygon width for gap detection
Input
Snapping and Noding
Polygonizing and Resultant Categorization
The noded linework is polygonized using the Polygonizer. This forms a clean topology of resultant polygons. Resultant polygons are associated with their parent input polygon(s) via their interior points and point-in-polygon tests. Resultants are categorized according to their number of parents:- Result area: one parent (which identifies the associated input polygon)
- Overlap: multiple parents
- Gap: no parents
Overlap Merging
Overlaps are always merged, since they violate coverage topology. They are merged with an adjacent result area chosen via a specified merge strategy. The strategy indicates whether the merge target is chosen to be the neighbour with the longest shared border, the largest or smallest area, or the one with minimum input index (which allows priority-based merging).Gap Detection and Merging
Gaps in the created coverage do not invalidate coverage topology, but they are often an artifact of invalidities, or may be errors in the original linework. For this reason a distance tolerance can be used to select gaps to be merged. Gaps which are narrower than the width tolerance are merged with the neighbour result area with longest shared border. The width of the gap polygon is determined by the diameter of the MaximumInscribedCircle. (This is an improvement over other tools, which only provide cruder area or roundness-metric tests to identify gaps.)Output
Examples
Example 1: Montana County Commissioner Districts
Example 2: Communes in France Department 71
Gap Merging Quality Issues
This is an example of a wide gap which is not merged, but which has narrow portions which could be "clipped off" and merged:
GEOS and PostGIS Ports
As usual, this work will be ported to GEOS soon, and then exposed in PostGIS as a new coverage function, It should also show up in downstream projects such as Shapely and hopefully QGIS.
Further Enhancements
Overlap and Gap Handling
There are other possibilities for controlling overlap merging and gap filling. Choosing merge targets could be determined by a priority value on each input area. Gaps could be determined by area or roundness measure criteria, as provided in other systems (although they seem less precise than using polygon width.)
Gap Partitioning and Spike Removal
Gore Removal
Gores are narrow "inlets" along the border of in a coverage. They have a similar shape to gaps, but since they are not closed they are not identified as gaps by the cleaning process. They could be identified by a different process, closed off, and then merged (ideally with partitioning as discussed above).
Coverage precision reduction
A common requirement in data management is to reduce the precision of data coordinates. For coverages this cannot be done piecewise, but must be performed as a global operation over the entire coverage, since rounding coordinates independently can create line intersections. The SnapRoundingNoder provides an effective way of doing this.