Monday, 13 March 2023

Simplifying Polygonal Coverages with JTS

A new capability for the JTS Topology Suite is operations to process Simple Polygonal Coverages.  A Simple Polygon Coverage is a set of edge-matched, non-overlapping polygonal geometries (which may be non-contiguous, and have holes).  Typically this is used to model an area in which every point has a value from some domain.  A classic example of a polygonal coverage is a set of administrative boundaries, such as those available from GADM or Natural Earth.

GADM polygonal coverage for France Level 1   ( 198,350 vertices)

The first coverage operations provided are:

  • Coverage Validation, to check if a set of polygons forms a topologically-valid coverage
  • Coverage Union, which takes advantage of coverage topology to provide a very fast union operation

Another operation on polygonal coverages is simplification.  Simplifying a coverage reduces the number of vertices it contains, while preserving the coverage topology.  Preserving topology means that the simplified polygons still form a valid coverage, and that polygons which had a common edge in the input coverage (i.e. which were adjacent) still share an edge in the simplified result.  Reducing dataset size via simplification can provide more efficient storage and download, and faster visualization at smaller scales.

France coverage simplified with tolerance 0.01   (7,918 vertices)
The decrease in resolution is hardly noticeable at this scale

Closeup of simplified coverage (tolerance = 0.01)

Simplification is perhaps the most requested algorithm for polygonal coverages (for example, see GIS StackExchange questions here, here and here.)  My colleague Paul Ramsey sometimes calls it the "killer app" for polygonal coverages.  Earlier this century there was no easily-available software implementing this capability.  Users often had to resort to the complex approach of extracting the linework from the dataset, dissolving it, simplifying the lines (with a tool which would not cause further overlaps), re-polygonizing, and re-attaching feature attribution to the result geometries. 

More recently tooling has emerged to provide this functionality.  Simplification is the raison-d'etre of the widely-used and cited MapShaper tool.  GRASS has the v.generalize module.  And good old OpenJUMP added Simplify Polygon Coverage a while back.  

JTS has provided the TopologyPreservingSimplifier algorithm for many years, but this only operates on Polygons and MultiPolygons, not on polygonal coverages.  (Attempting to use it on polygonal coverages can introduce gaps and overlaps, resulting in complaints like this.)  But coverage simplification has been lacking in JTS/GEOS - until now.

JTS Coverage Simplification

Recent work on Polygon Hulls provided ideas for an implementation of coverage simplification. The CoverageSimplifier class uses an area-based simplification approach similar to the well-known Visvalingam-Whyatt algorithm. This provides good results for simplifying areal features (as opposed to linear ones).  It's possible to use a Douglas-Peucker based approach as well, so this may be a future option.

The degree of simplification is determined by a tolerance value.  The value is equivalent roughly to the maximum distance a simplified edge can change (technically speaking, it is the square root of the area tolerance for the Visvalingam-Whyatt algorithm).  

The algorithm progressively simplifies all coverage edges, while ensuring that no edges cross another edge, or touch at endpoints.  This provides the maximum amount of simplification (up to the tolerance) while still maintaining the coverage topology.

The coverage of course should be valid according to the JTS CoverageValidator class.  Invalid coverages can still be simplified, but only edges with valid topology will have it maintained.

France coverage simplified with tolerance = 0.1   ( 1,928 vertices)

France coverage simplified with tolerance = 0.5    ( 1,527 vertices)
The coverage topology (adjacency relationship) is always preserved.


Inner Simplification

The implementation also provides the interesting option of Inner Simplification.  This mode simplifies only inner, shared edges, leaving the outer boundary edges unchanged.  This allows portions of a coverage to be simplified, by ensuring that the simplified polygons will fit exactly into the original coverage.  (This GIS-SE question is an example of how this can be used.)

Inner Simplification of France coverage

It would also be possible to provide Outer Simplification, where only outer edges are simplified.  It's not clear what the use case would be for this - if you have ideas, leave a comment!

GEOS and PostGIS

As usual, this algorithm will be ported to GEOS, from where it will be available to downstream projects such as PostGIS and Shapely.  

For PostGIS, the intention is to expose this as a window function (perhaps ST_CoverageSimplify).  That will make it easy to process a set of features (records) and maintain the attributes for each feature.



Monday, 6 March 2023

Fast Coverage Union in JTS

The next operation delivered in the build-out of Simple Polygonal Coverages in the JTS Topology Suite is Coverage Union. This is simply the topological union of a set of polygons in a polygonal coverage, producing one or more polygons as the result.  (This is sometimes called "dissolve" in the context of polygonal coverages.)


Union of polygons has long been available in JTS, most recently (and robustly) as the UnaryUnion capability of OverlayNG.  This makes use of the Cascaded Union technique to provide good performance for large sets of polygons.  But the constrained structure of polygonal coverages means unions can be computed much faster than even Cascaded Union.  Essentially, the duplicated inner edges of adjacent polygons are identified and discarded, leaving only the outer boundary of the unioned area.  Because a valid polygonal coverage has edges which match exactly, identifying duplicate segments can be done with a fast equality test.  Also, there is no need for computationally-expensive techniques to ensure geometric robustness.

This is now available in JTS as the CoverageUnion class.

Performance

To test the performance of Coverage Union we need some large clean polygonal coverages. These are nicely provided by the GADM repository of worldwide administrative areas.  

Here is some metrics comparing the performance of Coverage Union against OverlayNG Unary Union (which uses the Cascaded Union technique).  Coverage Union is much faster for all sizes of dataset.


DatasetPolygonsVerticesCoverage UnionOverlayNG UnionTimes Faster
France level 43,728407,1020.3 s8.5 s28 x
France level 536,612729,5731.07 s13.9 s13 x
Germany level 411,3022,162,1840.68 s27.3 s40 x

Union by Attribute

It's worth noting that unioning a set of polygons in a coverage leaves the boundary of the unioned area perfectly unchanged. So subsets of a coverage can be unioned, and the result still forms a valid polygonal coverage.  This provides a fast Union by Attribute capability, which is a common spatial requirement

GEOS and PostGIS

GEOS already supports a GEOSCoverageUnion operation.  At some point it would be nice to expose this in PostGIS, most likely as a new aggregate function (perhaps ST_UnionCoverage).


Friday, 27 January 2023

Alpha Shapes in JTS

Recently JTS gained the ability to compute Concave Hulls of point sets.  The algorithm used is based the Chi-shapes approach described by Duckham et al.   It works by eroding border triangles from the Delaunay Triangulation of the input points, in order of longest triangle edge length, down to a threshold length provided as a parameter value.  

Concave Hull of Ukraine (Edge-length = 10)

Alpha-Shapes

Another popular approach for computing concave hulls is the Alpha-shape construction originally proposed by Edelsbrunner et alPapers describing alpha-shapes tend to be lengthy and somewhat opaque theoretical expositions, but the actual algorithm is fairly simple.  It is also based on eroding Delaunay triangles, but uses the circumradius of the triangles as the criteria for removal.  Border triangles whose circumradius is greater than alpha are removed.  

Another way of understanding this is to imagine triangles on the border of the triangulation being "scooped out" by a disc of radius alpha:

Construction of an Alpha-shape via "scooping" with an alpha-radius disc

Given the similar basis of both algorithms, it was easy to generalize the JTS ConcaveHull class to be able to compute alpha-shapes as well.  This is now available in JTS via the ConcaveHull.alphaShape function.

Seeking Alpha

An issue that became apparent was: what is the definition of alpha?  There are at least three alternatives in use:

  • The original Edelsbrunner 1983 paper defines alpha as the inverse of the radius of the eroding disc.
  • In a later 1994 paper Edelsbrunner defines alpha as the radius of the eroding disc.  The same definition is used in his 2010 survey of alpha shape research.
  • The CGAL geometry library implements alpha to be the square of the radius of the eroding disc.  (The derivation of this definition is not obvious to me; perhaps it is intended to make the parameter have areal units?)

The simplest option is to define alpha to be the radius of the eroding disc.  This has the additional benefit of being congruent with the edge-length parameter.  For both parameters, 0 produces a result with maximum concaveness, and for values larger than some (data-dependent) value the result is the convex hull of the input.

Alpha-Shape VS Concave Hull

An obvious question is: how do alpha-shapes differ to edge-length-based concave hulls?  Having both in JTS makes it easy to compare the hulls on various datasets.  For comparison purposes the alpha and edge-length parameters are chosen to produce hulls with the same area (as close as possible).

Here's an illustration of an Alpha-shape and a Concave Hull generated for the same dataset, with essentially identical area.

Overall the shapes are fairly similar.  Alpha-shapes tend to follow the data points more closely in "shallow bays", whereas the concave hull tends to include them.  Conversely, the concave hull can sometimes erode deeper bays.  The effect of keeping shallow bays is to make the Concave Hull slightly smoother than the Alpha-shape.

Avoiding Cavities

However, there is one way in which alpha-shapes can be less well-formed than concave hulls.  The measurement of a circumradius is independent of the orientation of the triangle.  This means it is not sensitive to whether the triangle has a long or short edge on the border of the triangulation.  This can result in what I am calling "cavitation". Cavitation occurs where narrow triangles reaching deep into the interior of the dataset are removed. This in turn exposes more interior triangles to being eroded.  This is visible in the example below, where the Alpha-shape contains a large cavity which is obviously not desirable. 


Alpha-shape with a "cavity"

This is shown in more detail below.  The highlighted triangle is removed due to its large circumradius, even though it has a relatively small border edge.  This in turn exposes more triangles to being removed, forming the undesirable cavity.
Alpha-shape with Delaunay triangle forming a "cavity"

The edge-length metric does not have this problem, since it considers only the edges along the (current) border of the triangulation.  (In fact, the presence of the cavity in the example above means that the equal-area comparison strategy causes more erosion artifacts in the Concave Hull than might otherwise be present.)

For an even more egregious example of this issue, here is an Alpha-shape of the Ukraine dataset:
Alpha-shape of Ukraine (alpha = 7)

The effect of cavitation is obvious.  Although it can be eliminated by increasing the alpha radius, note that Crimean Peninsula is already less well-delineated than in the Concave Hull, and increasing alpha would make this worse. 

Recommendation

Given the risk of undesired cavitation, the safest option for computing Concave Hulls is to use the Edge-Length approach, rather than Alpha-shapes.   

Future Work

  • Best of Both? - perhaps there is a strategy which combines aspects of both Alpha-shapes and Concave Hulls, to erode shallow bays but avoid the phenomenon of cavitation?
  • Disconnected Hulls - the "classic" Alpha-shape definition allows disconnected MultiPolygons as a result.  Currently, the JTS algorithm always produces a result which is a connected single polygon.  This seems like the most commonly desired behaviour, but there is a possibility to extend the implementation to optionally allow a disconnected result .  CGAL provides a parameter to control the number of polygons in the result, which could be implemented as well.