Monday 13 March 2023

Simplifying Polygonal Coverages with JTS

A new capability for the JTS Topology Suite is operations to process Simple Polygonal Coverages.  A Simple Polygon Coverage is a set of edge-matched, non-overlapping polygonal geometries (which may be non-contiguous, and have holes).  Typically this is used to model an area in which every point has a value from some domain.  A classic example of a polygonal coverage is a set of administrative boundaries, such as those available from GADM or Natural Earth.

GADM polygonal coverage for France Level 1   ( 198,350 vertices)

The first coverage operations provided are:

  • Coverage Validation, to check if a set of polygons forms a topologically-valid coverage
  • Coverage Union, which takes advantage of coverage topology to provide a very fast union operation

Another operation on polygonal coverages is simplification.  Simplifying a coverage reduces the number of vertices it contains, while preserving the coverage topology.  Preserving topology means that the simplified polygons still form a valid coverage, and that polygons which had a common edge in the input coverage (i.e. which were adjacent) still share an edge in the simplified result.  Reducing dataset size via simplification can provide more efficient storage and download, and faster visualization at smaller scales.

France coverage simplified with tolerance 0.01   (7,918 vertices)
The decrease in resolution is hardly noticeable at this scale

Closeup of simplified coverage (tolerance = 0.01)

Simplification is perhaps the most requested algorithm for polygonal coverages (for example, see GIS StackExchange questions here, here and here.)  My colleague Paul Ramsey sometimes calls it the "killer app" for polygonal coverages.  Earlier this century there was no easily-available software implementing this capability.  Users often had to resort to the complex approach of extracting the linework from the dataset, dissolving it, simplifying the lines (with a tool which would not cause further overlaps), re-polygonizing, and re-attaching feature attribution to the result geometries. 

More recently tooling has emerged to provide this functionality.  Simplification is the raison-d'etre of the widely-used and cited MapShaper tool.  GRASS has the v.generalize module.  And good old OpenJUMP added Simplify Polygon Coverage a while back.  

JTS has provided the TopologyPreservingSimplifier algorithm for many years, but this only operates on Polygons and MultiPolygons, not on polygonal coverages.  (Attempting to use it on polygonal coverages can introduce gaps and overlaps, resulting in complaints like this.)  But coverage simplification has been lacking in JTS/GEOS - until now.

JTS Coverage Simplification

Recent work on Polygon Hulls provided ideas for an implementation of coverage simplification. The CoverageSimplifier class uses an area-based simplification approach similar to the well-known Visvalingam-Whyatt algorithm. This provides good results for simplifying areal features (as opposed to linear ones).  It's possible to use a Douglas-Peucker based approach as well, so this may be a future option.

The degree of simplification is determined by a tolerance value.  The value is equivalent roughly to the maximum distance a simplified edge can change (technically speaking, it is the square root of the area tolerance for the Visvalingam-Whyatt algorithm).  

The algorithm progressively simplifies all coverage edges, while ensuring that no edges cross another edge, or touch at endpoints.  This provides the maximum amount of simplification (up to the tolerance) while still maintaining the coverage topology.

The coverage of course should be valid according to the JTS CoverageValidator class.  Invalid coverages can still be simplified, but only edges with valid topology will have it maintained.

France coverage simplified with tolerance = 0.1   ( 1,928 vertices)

France coverage simplified with tolerance = 0.5    ( 1,527 vertices)
The coverage topology (adjacency relationship) is always preserved.


Inner Simplification

The implementation also provides the interesting option of Inner Simplification.  This mode simplifies only inner, shared edges, leaving the outer boundary edges unchanged.  This allows portions of a coverage to be simplified, by ensuring that the simplified polygons will fit exactly into the original coverage.  (This GIS-SE question is an example of how this can be used.)

Inner Simplification of France coverage

It would also be possible to provide Outer Simplification, where only outer edges are simplified.  It's not clear what the use case would be for this - if you have ideas, leave a comment!

GEOS and PostGIS

As usual, this algorithm will be ported to GEOS, from where it will be available to downstream projects such as PostGIS and Shapely.  

For PostGIS, the intention is to expose this as a window function (perhaps ST_CoverageSimplify).  That will make it easy to process a set of features (records) and maintain the attributes for each feature.



Monday 6 March 2023

Fast Coverage Union in JTS

The next operation delivered in the build-out of Simple Polygonal Coverages in the JTS Topology Suite is Coverage Union. This is simply the topological union of a set of polygons in a polygonal coverage, producing one or more polygons as the result.  (This is sometimes called "dissolve" in the context of polygonal coverages.)


Union of polygons has long been available in JTS, most recently (and robustly) as the UnaryUnion capability of OverlayNG.  This makes use of the Cascaded Union technique to provide good performance for large sets of polygons.  But the constrained structure of polygonal coverages means unions can be computed much faster than even Cascaded Union.  Essentially, the duplicated inner edges of adjacent polygons are identified and discarded, leaving only the outer boundary of the unioned area.  Because a valid polygonal coverage has edges which match exactly, identifying duplicate segments can be done with a fast equality test.  Also, there is no need for computationally-expensive techniques to ensure geometric robustness.

This is now available in JTS as the CoverageUnion class.

Performance

To test the performance of Coverage Union we need some large clean polygonal coverages. These are nicely provided by the GADM repository of worldwide administrative areas.  

Here is some metrics comparing the performance of Coverage Union against OverlayNG Unary Union (which uses the Cascaded Union technique).  Coverage Union is much faster for all sizes of dataset.


DatasetPolygonsVerticesCoverage UnionOverlayNG UnionTimes Faster
France level 43,728407,1020.3 s8.5 s28 x
France level 536,612729,5731.07 s13.9 s13 x
Germany level 411,3022,162,1840.68 s27.3 s40 x

Union by Attribute

It's worth noting that unioning a set of polygons in a coverage leaves the boundary of the unioned area perfectly unchanged. So subsets of a coverage can be unioned, and the result still forms a valid polygonal coverage.  This provides a fast Union by Attribute capability, which is a common spatial requirement

GEOS and PostGIS

GEOS already supports a GEOSCoverageUnion operation.  At some point it would be nice to expose this in PostGIS, most likely as a new aggregate function (perhaps ST_UnionCoverage).


Friday 27 January 2023

Alpha Shapes in JTS

Recently JTS gained the ability to compute Concave Hulls of point sets.  The algorithm used is based the Chi-shapes approach described by Duckham et al.   It works by eroding border triangles from the Delaunay Triangulation of the input points, in order of longest triangle edge length, down to a threshold length provided as a parameter value.  

Concave Hull of Ukraine (Edge-length = 10)

Alpha-Shapes

Another popular approach for computing concave hulls is the Alpha-shape construction originally proposed by Edelsbrunner et alPapers describing alpha-shapes tend to be lengthy and somewhat opaque theoretical expositions, but the actual algorithm is fairly simple.  It is also based on eroding Delaunay triangles, but uses the circumradius of the triangles as the criteria for removal.  Border triangles whose circumradius is greater than alpha are removed.  

Another way of understanding this is to imagine triangles on the border of the triangulation being "scooped out" by a disc of radius alpha:

Construction of an Alpha-shape via "scooping" with an alpha-radius disc

Given the similar basis of both algorithms, it was easy to generalize the JTS ConcaveHull class to be able to compute alpha-shapes as well.  This is now available in JTS via the ConcaveHull.alphaShape function.

Seeking Alpha

An issue that became apparent was: what is the definition of alpha?  There are at least three alternatives in use:

  • The original Edelsbrunner 1983 paper defines alpha as the inverse of the radius of the eroding disc.
  • In a later 1994 paper Edelsbrunner defines alpha as the radius of the eroding disc.  The same definition is used in his 2010 survey of alpha shape research.
  • The CGAL geometry library implements alpha to be the square of the radius of the eroding disc.  (The derivation of this definition is not obvious to me; perhaps it is intended to make the parameter have areal units?)

The simplest option is to define alpha to be the radius of the eroding disc.  This has the additional benefit of being congruent with the edge-length parameter.  For both parameters, 0 produces a result with maximum concaveness, and for values larger than some (data-dependent) value the result is the convex hull of the input.

Alpha-Shape VS Concave Hull

An obvious question is: how do alpha-shapes differ to edge-length-based concave hulls?  Having both in JTS makes it easy to compare the hulls on various datasets.  For comparison purposes the alpha and edge-length parameters are chosen to produce hulls with the same area (as close as possible).

Here's an illustration of an Alpha-shape and a Concave Hull generated for the same dataset, with essentially identical area.

Overall the shapes are fairly similar.  Alpha-shapes tend to follow the data points more closely in "shallow bays", whereas the concave hull tends to include them.  Conversely, the concave hull can sometimes erode deeper bays.  The effect of keeping shallow bays is to make the Concave Hull slightly smoother than the Alpha-shape.

Avoiding Cavities

However, there is one way in which alpha-shapes can be less well-formed than concave hulls.  The measurement of a circumradius is independent of the orientation of the triangle.  This means it is not sensitive to whether the triangle has a long or short edge on the border of the triangulation.  This can result in what I am calling "cavitation". Cavitation occurs where narrow triangles reaching deep into the interior of the dataset are removed. This in turn exposes more interior triangles to being eroded.  This is visible in the example below, where the Alpha-shape contains a large cavity which is obviously not desirable. 


Alpha-shape with a "cavity"

This is shown in more detail below.  The highlighted triangle is removed due to its large circumradius, even though it has a relatively small border edge.  This in turn exposes more triangles to being removed, forming the undesirable cavity.
Alpha-shape with Delaunay triangle forming a "cavity"

The edge-length metric does not have this problem, since it considers only the edges along the (current) border of the triangulation.  (In fact, the presence of the cavity in the example above means that the equal-area comparison strategy causes more erosion artifacts in the Concave Hull than might otherwise be present.)

For an even more egregious example of this issue, here is an Alpha-shape of the Ukraine dataset:
Alpha-shape of Ukraine (alpha = 7)

The effect of cavitation is obvious.  Although it can be eliminated by increasing the alpha radius, note that Crimean Peninsula is already less well-delineated than in the Concave Hull, and increasing alpha would make this worse. 

Recommendation

Given the risk of undesired cavitation, the safest option for computing Concave Hulls is to use the Edge-Length approach, rather than Alpha-shapes.   

Future Work

  • Best of Both? - perhaps there is a strategy which combines aspects of both Alpha-shapes and Concave Hulls, to erode shallow bays but avoid the phenomenon of cavitation?
  • Disconnected Hulls - the "classic" Alpha-shape definition allows disconnected MultiPolygons as a result.  Currently, the JTS algorithm always produces a result which is a connected single polygon.  This seems like the most commonly desired behaviour, but there is a possibility to extend the implementation to optionally allow a disconnected result .  CGAL provides a parameter to control the number of polygons in the result, which could be implemented as well.

Thursday 13 October 2022

Relational Properties of DE-9IM spatial predicates

There is an elegant mathematical theory of binary relations.  Homogeneous relations are an important subclass of binary relations in which both domains are the same.  A homogeneous relation R is a subset of all ordered pairs (x,y) with x and y elements of the domain.  This can be thought of as a boolean-valued function R(x,y), which is true if the pair has the relationship and false if not.

The restricted structure of homogeneous relations allows describing them by various properties, including:

Reflexive - R(x, x) is always true

Irreflexive - R(x, x) is never true

Symmetric - if R(x, y), then R(y, x)

Antisymmetric - if R(x, y) and R(y, x), then x = y

Transitive - if R(x, y) and R(y, z), then R(x, z)

The Dimensionally Extended 9-Intersection Model (DE-9IM) represents the topological relationship between two geometries.  

Various useful subsets of spatial relationships are specified by named spatial predicates.  These are the basis for spatial querying in many spatial systems including the JTS Topology Suite, GEOS and PostGIS.

The spatial predicates are homogeneous binary relations over the Geometry domain  They can thus be categorized in terms of the relational properties above.  The following table shows the properties of the standard predicates:

PredicateReflexive /
Irreflexive
Symmetric /
Antisymmetric
Transitive

EqualsRST
IntersectsRS-
DisjointRS-
ContainsRA-
WithinRA-
CoversRAT
CoveredByRAT
CrossesIS-
OverlapsIS-
TouchesIS-

Notes

  • Contains and Within are not Transitive because of the quirk that "Polygons do not contain their Boundary" (explained in this post).  A counterexample is a Polygon that contains a LineString lying in its boundary and interior, with the LineString containing a Point that lies in the Polygon boundary.  The Polygon does not contain the Point.  So Contains(poly, line) = true and Contains(line, pt) = true, but Contains(poly, pt) = false.   The predicates Covers and CoveredBy do not have this idiosyncrasy, and thus are transitive.

Wednesday 24 August 2022

Validating Polygonal Coverages in JTS

The previous post discussed polygonal coverages and outlined the plan to support them in the JTS Topology Suite.  This post presents the first step of the plan: algorithms to validate polygonal coverages.  This capability is essential, since coverage algorithms rely on valid input to provide correct results.  And as will be seen below, coverage validity is usually not obvious, and cannot be taken for granted.  

As described previously, a polygonal coverage is a set of polygons which fulfils a specific set of geometric conditions.  Specifically, a set of polygons is coverage-valid if and only if it satisfies the following conditions:

  • Non-Overlapping - polygon interiors do not intersect
  • Edge-Matched (also called Vector-Clean and Fully-Noded) - the shared boundaries of adjacent polygons has the same set of vertices in both polygons
The Non-Overlapping condition ensures that no point is covered by more than one polygon.  The Edge-Matched condition ensures that coverage topology is stable under transformations such as reprojection, simplification and precision reduction (since even if vertices are coincident with a line segment in the original dataset, this is very unlikely to be the case when the data is transformed).  
An invalid coverage which violates both (L) Non-Overlapping and (R) Edge-Matched conditions (note the different vertices in the shared boundary of the right-hand pair)

Note that these rules allow a polygonal coverage to cover disjoint areas.  They also allow internal gaps to occur between polygons.  Gaps may be intentional holes, or unwanted narrow gaps caused by mismatched boundaries of otherwise adjacent polygons.  The difference is purely one of size.  In the same way, unwanted narrow "gores" may occur in valid coverages.  Detecting undesirable gaps and gores will be discussed further in a subsequent post. 

Computing Coverage Validation

Coverage validity is a global property of a set of polygons, but it can be evaluated in a local and piecewise fashion.  To confirm a coverage is valid, it is sufficient to check every polygon against each adjacent (intersecting) polygon to determine if any of the following invalid situations occur:
  • Interiors Overlap:
    • the polygon linework crosses the boundary of the adjacent polygon
    • a polygon vertex lies within the adjacent polygon
    • the polygon is a duplicate of the adjacent polygon
  • Edges do not Match:
    • two segments in the boundaries of the polygons intersect and are collinear, but are not equal 
If neither of these situations are present, then the target polygon is coverage-valid with respect to the adjacent polygon.  If all polygons are coverage-valid against every adjacent polygon then the coverage as a whole is valid.

For a given polygon it is more efficient to check all adjacent polygons together, since this allows faster checking of valid polygon boundary segments.  When validation is used on datasets which are already clean, or mostly so, this improves the overall performance of the algorithm.  

Evaluating coverage validity in a piecewise way allows the validation process to be parallelized easily, and executed incrementally if required.

JTS Coverage Validation

Validation of a single coverage polygon is provided by the JTS CoveragePolygonValidator class.  If a polygon is coverage-invalid due to one or more of the above situations, the class computes the portion(s) of the polygon boundary which cause the failure(s).  This allows the locations and number of invalidities to be determined and visualized.

The class CoverageValidator computes coverage-validity for an entire set of polygons.  It reports the invalid locations for all polygons which are not coverage-valid (if any). 

Using spatial indexing makes checking coverage validity quite performant.  For example, a coverage containing 91,384 polygons with 10,474,336 vertices took only 6.4 seconds to validate.  In this case the coverage is nearly valid, since only one invalid polygon was found. The invalid boundary linework returned by CoverageValidator allows easily visualizing the location of the issue.

A polygonal dataset of 91,384 polygons, containing a single coverage-invalid polygon

The invalid polygon is a tiny sliver, with a single vertex lying a very small distance inside an adjacent polygon.  The discrepancy is only visible using the JTS TestBuilder Reveal Topology mode.

The size of the discrepancy is very small.  The vertex causing the overlap is only 0.0000000001 units away from being valid:

[921]  POLYGON(632)
[922:4]  POLYGON(5)
Ring-CW  Vert[921:0 514]  POINT ( 960703.3910000008 884733.1892000008 )
Ring-CW  Vert[922:4:0 3]  POINT ( 960703.3910000008 884733.1893000007 )

This illustrates the importance of having fast, robust automated validity checking for polygonal coverages, and providing information about the exact location of errors.

Real-world testing

With coverage validation now available in JTS, it's interesting to apply it to publicly available datasets which (should) have coverage topology.  It is surprising how many contain validity errors.  Here are a few examples:

SourceCity of Vancouver
DatasetZoning Districts


This dataset contains 1,498 polygons with 57,632 vertices. There are 379 errors identified, which mainly consist of very small discrepancies between vertices of adjacent polygons.
Example of a discrepant vertex in a polygon


SourceBritish Ordnance Survey OpenData
DatasetBoundary-Line
Fileunitary_electoral_division_region.shp



This dataset contains 1,178 polygons with 2,174,787 vertices. There are 51 errors identified, which mainly consist of slight discrepancies between vertices of adjacent polygons. (Note that this does not include gaps, which are not detected by CoverageValidator.  There are about 100 gaps in the dataset as well.)
An example of overlapping polygons in the Electoral Division dataset


SourceHamburg Open Data Platform
DatasetVerwaltungsEinheit (Administrative Units)


The dataset (slightly reduced) contains 7 polygons with 18,254 vertices.  Coverage validation produces 64 error locations.  The errors are generally small vertex discrepancies producing overlaps.  Gaps exist as well, but are not detected by the default CoverageValidator usage.
An example of overlapping polygons (and a gap) in the VerwaltungsEinheit dataset


As always, this code will be ported to GEOS.  A further goal is to provide this capability in PostGIS, since there are likely many datasets which could benefit from this checking.  The piecewise implementation of the algorithm should mesh well with the nature of SQL query execution.

And of course the next logical step is to provide the ability to fix errors detected by coverage validation.   This is a research project for the near term.

UPDATE: my colleague Paul Ramsey pointed out that he has already ported this code to GEOS.  Now for some performance testing!

Sunday 24 July 2022

Polygonal Coverages and Operations in JTS

An important concept in spatial data modelling is that of a coverage.  A coverage models a two-dimensional region in which every point has a value out of a range (which may be defined over one or a set of attributes).  Coverages can be represented in both of the main physical spatial data models: raster and vector.  In the raster data model a coverage is represented by a grid of cells with varying values.  In the vector data model a coverage is a set of non-overlapping polygons (which usually, but not always, cover a contiguous area).  

This post is about the vector data coverage model, which is termed (naturally) a polygonal coverage. These are used to model regions which are occupied by discrete sub-regions with varying sets of attribute values.  The sub-regions are modelled by simple polygons.  The coverage may contain gaps between polygons, and may cover multiple disjoint areas.  The essential characteristics are:

  • polygon interiors do not overlap
  • the common boundary of adjacent polygons has the same set of vertices in both polygons.

There are many types of data which are naturally modelled by polygonal coverages.  Classic examples include:

  • Man-made boundaries
    • parcel fabrics
    • political jurisdictions
  • Natural boundaries
    • vegetation cover
    • land use
A polygonal coverage of regions of France

Topological and Discrete Polygonal Coverages


There are two ways to represent polygonal coverages: as a topological data structure, or as a set of discrete polygons.  
A coverage topology consists of linked faces, edges and nodes. The edges between two nodes form the shared boundary between two faces.  The coverage polygons can be reconstituted from the edges delimiting each face.  
The discrete polygon representation is simpler, and aligns better with the OGC Simple Features model.  It is simply a collection of polygons which satisfy the coverage validity criteria given above.
Most common spatial data formats support only a discrete polygon model, and many coverage datasets are provided in this form.  However, the lack of inherent topology means that datasets must be carefully constructed to ensure they have valid coverage topology.  In fact, many available datasets contain coverage invalidities.  A current focus of JTS development is to provide algorithms to detect this situation and provide the locations where the polygons fail to form a valid coverage.

Polygonal Coverage Operations

Operations which can be performed on polygonal coverages include:

  • Validation - check that a set of discrete polygons forms a valid coverage
  • Gap Detection - check if a polygonal coverage contains narrow gaps (using a given distance tolerance)
  • Cleaning - fix errors such as gaps, overlaps and slivers in a polygonal dataset to ensure that it forms a clean, valid coverage
  • Simplification - simplify (generalize) polygon boundary linework, ensuring coverage topology is preserved
  • Precision Reduction - reduce precision of polygon coordinates, ensuring coverage topology is preserved
  • Union - merge all or portions of the coverage polygons into a single polygon (or multipolygon, if the input contains disjoint regions)
  • Overlay - compute the intersection of two coverages, producing a coverage of resultant polygons 

Implementing polygonal coverage operations is a current focus for development in the JTS Topology Suite.  Since most operations require a valid coverage as input, the first goal is to provide Coverage Validation.  Cleaning and Simplification are priority targets as well.  Coverage Union is already available, as is Overlay (in a slightly sub-optimal way).  In addition,  a Topology data structure will be provided to support the edge-node representation.  (Yes, the Topology Suite will provide topology at last!).  Stay tuned for further blog posts as functionality is rolled out.

As usual, the coverage algorithms developed in JTS will be ported to GEOS, and will thus be available to downstream projects like PostGIS.

Tuesday 21 June 2022

JTS 1.19 Released

JTS 1.19 has just been released!  There is a great deal of new, improved and fixed functionality in this release - see the GitHub release page or the Version History for full details.

This blog has several posts describing new functionality in JTS 1.19:

New Functionality

Improvements


Many of these improvements have been ported to GEOS, and will appear in the soon-to-appear version 3.11.  In turn this has provided the basis for new and enhanced functions in the next PostGIS release, and will likely be available in other platforms via the many GEOS bindings and applications.

Monday 30 May 2022

Algorithm for Concave Hull of Polygons

The previous post introduced the new ConcaveHullOfPolygons class in the JTS Topology Suite.  This allows computing a concave hull which is constrained by a set of polygonal geometries.  This supports use cases including:

  • generalization of groups of polygon
  • joining polygons
  • filling gaps between polygons

A concave hull of complex polygons

The algorithm developed for ConcaveHullOfPolygons is a novel one (as far as I know).  It uses several features recently developed for JTS, including a neat trick for constrained triangulation.  This post describes the algorithm in detail.

The construction of a concave hull for a set of polygons uses the same approach as the existing JTS ConcaveHull implementation.  The space to be filled by a concave hull is triangulated with a Delaunay triangulation.  Triangles are then "eroded" from the outside of the triangulation, until a criteria for termination is achieved.  A useful termination criteria is that of maximum outside edge length, specified as either an absolute length or a fraction of the range of edge lengths.

For a concave hull of points, the underlying triangulation is easily obtained via the Delaunay Triangulation of the point set.  However, for a concave hull of polygons the triangulation required is for the space between the constraint polygons.  A simple Delaunay triangulation of the polygon vertices will not suffice, because the triangulation may not respect the edges of the polygons.  

Delaunay Triangulation of polygon vertices crosses polygon edges

What is needed is a Constrained Delaunay Triangulation, with the edge segments of the polygons as constraints (i.e. the polygon edge segments are present as triangle edges, which ensures that other edges in the triangulation do not cross them).  There are several algorithms for Constrained Delaunay Triangulations - but a simpler alternative presented itself.  JTS recently added an algorithm for computing Delaunay Triangulations for polygons.  This algorithm supports triangulating polygons with holes (via hole joining).  So to generate a triangulation of the space between the input polygons, they can be inserted as holes in a larger "frame" polygon.  This can be triangulated, and then the frame triangles removed. Given a sufficiently large frame, this leaves the triangulation of the "fill" space between the polygons, out to their convex hull. 

Triangulation of frame with polygons as holes

The triangulation can then be eroded using similar logic to the non-constrained Concave Hull algorithm.  The implementations all use the JTS Tri data structure, so it is easy and efficient to share the triangulation model between them. 

Triangulation after removing frame and eroding triangles

The triangles that remain after erosion can be combined with the input polygons to provide the result concave hull.  The triangulation and the input polygons form a polygonal coverage, so the union can be computed very efficiently using the JTS CoverageUnion class.  If required, the fill area alone can be returned as a result, simply by omitting the input polygons from the union.

Concave Hull and Concave Fill

A useful option is to compute a "tight" concave hull to the outer boundary of the input polygons.  This is easily accomplished by removing triangles which touch only a single polygon.  

Concave Hull tight to outer edges


Concave Hull of complex polygons, tight to outer edges.

Like the Concave Hull of Points algorithm, holes are easily supported by allowing erosion of interior triangles.

Concave Hull of Polygons, allowing holes

The algorithm performance is determined by the cost of the initial polygon triangulation.  This is quite efficient, so the overall performance is very good.

As mentioned, this seems to be a new approach to this geometric problem.  The only comparable implementation I have found is the ArcGIS tool called Aggregate Polygons, which appears to provide similar functionality (including producing a tight outer boundary).  But of course algorithm details are not published and the code is not available.  It's much better to have an open source implementation, so it can be used in spatial tools like PostGIS, Shapely and QGIS (based on the port to GEOS).  Also, this provides the ability to add options and enhanced functionality for use cases which may emerge once this gets some real-world use.

Friday 20 May 2022

Concave Hulls of Polygons

A common spatial need is to compute a polygon which contains another set of polygons.  There are numerous use cases for this; for example:

  • Generalizing groups of building outlines (questions: 1, 2
  • Creating "district" polygons around block polygons (questions: 1)
  • Removing gaps between sets of polygons (questions: 1, 2, 3, 4)
  • Joining two polygons by filling the space between them (questions: 1, 2)     
This post describes a new approach for solving these problems, via an algorithm for computing a concave hull with polygonal constraints.  The algorithm builds on recent work on polygon triangulation in JTS, and uses a neat trick which I'll describe in a subsequent post.  

Approach: Convex Hull

The simplest way to compute an area enclosing a set of geometries is to compute their convex hull.  But the convex hull is a fairly coarse approximation of the area occupied by the polygons, and in most cases a better representation is required.  

Here's an example of gap removal between two polygons.  Obviously, the convex hull does not provide anything close to the desired result: 

Approach: Buffer and Unbuffer

A popular suggestion is to buffer the polygon set by a distance sufficient to "bridge the gaps", and then "un-buffer" the result inwards by the same (negative) distance.  But the buffer computation can "round off" corners, which usually produces a poor match to the input polygons.  It also fills in the outer boundary of the original polygons.

Approach: Concave Hull of Points

A more sophisticated approach is to use a concave hull algorithm.  But most (or all?) available concave hull algorithms use points as the input constraints. The vertices of the polygons could be used as the constraint points, but since the polygon boundaries are not respected, the computed hull may cross the polygon edges and hence not cover the polygons.  

Densifying the polygon boundaries helps, but introduces another problem - the computed hull can extend beyond the outer boundaries of individual polygons.  And it introduces new vertices not present in the original data.

Solution: Concave Hull of Polygons

What is needed is a concave hull algorithm that accepts polygons as constraints, and thus respects their boundaries.  The JTS Topology Suite now provides this capability in a class called ConcaveHullOfPolygons (not a cute name, but descriptive).  It provides exactly the solution desired for the gap removal example:  

The Concave Hull of Polygons API

Like concave hulls of point sets, concave hulls of polygons form a sequence of hulls, with the amount of concaveness determined by a numeric parameter. ConcaveHullOfPolygons uses the same parameters as the JTS ConcaveHull algorithm.  The control parameter determines the maximum line length in the triangulation underlying the hull.  This can be specified as an absolute length, or as a ratio between the longest and shortest lines.  

Further options are:

  • The computed hull can be kept "tight" to the outer boundaries of the individual polygons.  This allows filling gaps between polygons without distorting their original outer boundaries.  Otherwise, the concaveness of the outer boundary will be decreased to match the distance parameter specified (which may be desirable in some situations).
  • Holes can be allowed to be present in the computed hull
  • Instead of the hull, the fill area between the input polygons can be computed.  
As usual, this code will be ported to GEOS, and from there it can be exposed in the downstream libraries and projects.

Examples of Concave Hulls of Polygons

Here are examples of using ConcaveHullOfPolygons for the use cases above:

Example: Generalizing Building Groups

Using the "tight" option allows following the outer building outlines.


Example: Aggregating Block Polygons

The concave hull of a set of block polygons for an oceanside suburb.  Note how the "tight" option allows the hull to follow the convoluted, fine-grained coastline on the right side.

Example: Removing Gaps to Merge Polygons

Polygons separated by a narrow gap can be merged by computing their concave hull using a small distance and keeping the boundary tight.

Example: Fill Area Between Polygons

The "fill area" portion of the hull between two polygons can be computed as a separate polygon. This could be used to provide an "Extend to Meet" construction by unioning the fill polygon with one of the input polygons.  It can also be used to determine the "visible boundary", provided by the intersection of the fill polygon with the input polygon(s).



Wednesday 4 May 2022

Using Outer Hulls for Smoothing Vectorized Polygons

The electrons were hardly dry on the JTS Outer and Inner Polygon Hull post when another interesting use case popped up on GIS StackExchange.  The question was how to remove aliasing artifacts (AKA "jaggies") from polygons created by vectorizing raster data, with the condition that the result should contain the original polygon.  

A polygon for Vancouver Island vectorized from a coarse raster dataset.  Aliasing artifacts are obvious.

This problem is often handled by applying a simplification or smoothing process to the "jaggy" polygon boundary.  This works, as long as the process preserves polygonal topology (e.g. such as the JTS TopologyPreservingSimplifier). But generally this output of this process does not contain the input polygon, since the simplification/smoothing can alter the boundary inwards as well as outwards.  

Simplification using TopologyPreservingSimplifier with distance = 0.1.  Artifacts are removed, but the simplified polygon does not fully contain the original.

In contrast, the JTS Polygon Outer Hull algorithm is designed to do exactly what is required: it reduces the number of vertices, while guaranteeing that the input polygon is contained in the result.  It is essentially a simplification method which also preserves polygonal topology (using an area-based approach similar to the Visvalingham-Whyatt algorithm).

Outer Hull using vertex ratio = 0.25.  Artifacts are removed, and the original polygon is contained in the hull polygon.

Here's a real-world example, taken from the GADM dataset for administrative areas of Germany.  The coastline of the state of Mecklenburg-Vorpommern appears to have been derived from a raster, and thus exhibits aliasing artifacts.  Computing the outer hull with a fairly conservative parameter eliminates most of the artifacts, and ensures polygonal topology is preserved.

A portion of the coastline of Mecklenburg-Vorpommern showing aliasing artifacts.  The Outer Hull computed with vertex ratio = 0.25 eliminates most artifacts, and preserves the coastline topology.

Future Work

A potential issue for using Outer Hull as a smoothing technique is the choice of parameter value controlling the amount of change.  The algorithm provides two options: the ratio of reduction in the number of vertices, or the fraction of change in area allowed.  Both of these are scale-independent, and reflect natural goals for controlling simplification.  But neither relate directly to the goal of removing "stairstep" artifacts along the boundary.  This might be better specified via a distance-based parameter. The parameter value could then be determined based on the known artifact size (i.e. the resolution of the underlying grid).  Since the algorithm for Outer Hull is quite flexible, this should be feasible to implement. 

Monday 25 April 2022

Outer and Inner Polygon Hulls in JTS

The JTS Topology Suite recently gained the ability to compute concave hulls.  The Concave Hull algorithm computes a polygon enclosing a set of points using a parameter to determine the "tightness".  However, for polygonal inputs the computed concave hull is built only using the polygon vertices, and so does not always respect the polygon boundaries.  This means the concave hull may not contain the input polygon.

It would be useful to be able to compute the "outer hull" of a polygon.  This is a valid polygon formed by a subset of the vertices of the input polygon which fully contains the input polygon.  Vertices can be eliminated as long as the resulting boundary does not self-intersect, and does not cross into the original polygon.

An outer hull of a polygon representing Switzerland

As with point-set concave hulls, the vertex reduction is controlled by a numeric parameter. This creates a sequence of hulls of increasingly larger area with smaller vertex counts.  At an extreme value of the parameter, the outer hull is the same as the convex hull of the input.
A sequence of outer hulls of New Zealand's North Island

The outer hull concept extends to handle holes and MultiPolygons.  In all cases the hull boundaries are constructed so that they do not cross each other, thus ensuring the validity of the result.

An outer hull of a MultiPolygon for the coast of Denmark.  The hull polygons do not cross.

It's also possible to construct inner hulls of polygons, where the constructed hull is fully within the original polygon.

An inner hull of Switzerland

Inner hulls also support holes and MultiPolygons.  At an extreme value of the control parameter, holes become convex hulls, and a polygon shell reduces to a triangle (unless blocked by the presence of holes).

An inner hull of a lake with islands.  The island holes become convex hulls, and prevent the outer shell from reducing fully to a triangle

A hull can provide a significant reduction in the vertex size of a polygon for a minimal change in area. This could allow faster evaluation of spatial predicates, by pre-filtering with smaller hulls of polygons.

An outer hull of Brazil provides a 10x reduction in vertex size, with only ~1% change in area.

This has been on the JTS To-Do list for a while (I first proposed it back in 2009).  At that time it was presented as a way of simplifying polygonal geometry. Of course JTS has had the TopologyPreservingSimplifier for many years.  But it doesn't compute a strictly outer hull.  Also, it's based on Douglas-Peucker simplification, which isn't ideal for polygons.  

It seems there's quite a need for this functionality, as shown in these GIS-StackExchange posts (1, 2, 3, 4).  There's even existing implementations on Github: rdp-expansion-only and simplipy (both in Python) - but both of these sound like they have some significant issues. 

Recent JTS R&D (on concave hulls and polygon triangulation) has provided the basis for an effective, performant polygonal concave hull algorithm.  This is now released as the PolygonHullSimplifier class in JTS.

The PolygonHullSimplifier API

Polygon hulls have the following characteristics:

  • Hulls can be constructed for Polygons and MultiPolygons, including holes.
  • Hull geometries have the same structure as the input.  There is a one-to-one correspondence for  elements, shells and holes.
  • Hulls are valid polygonal geometries.
  • The hull vertices are a subset of the input vertices.

The PolygonHullSimplifier algorithm supports computing both outer and inner hulls. 

  • Outer hulls contain the input geometry.  Vertices forming concave corners (convex for holes) are removed.  The maximum outer hull is the convex hull(s) of the input polygon(s), with holes reduced to triangles.
  • Inner hulls are contained within the input geometry.  Vertices forming convex corners (concave for holes) are removed.   The minimum inner hull is a triangle contained in (each) polygon, with holes expanded to their convex hulls.  

The number of vertices removed is controlled by a numeric parameter.  Two different parameters are provided:

  • the Vertex Number Fraction specifies the desired result vertex count as a fraction of the number of input vertices.  The value 1 produces the original geometry.  Smaller values produce simpler hulls.  The value 0 produces the maximum outer or minimum inner hull.
  • the Area Delta Ratio specifies the desired maximum change in the ratio of the result area to the input area.  The value 0 produces the original geometry.  Larger values produce simpler hulls. 
Defining the parameters as ratios means they are independent of the size of the input geometry, and thus easier to specify for a range of inputs.  Both parameters are targets rather than absolutes; the validity constraint means the result hull may not attain the specified value in some cases. 

Algorithm Description

The algorithm removes vertices via "corner clipping".  Corners are triangles formed by three consecutive vertices in a (current) boundary ring of a polygon.  Corners are removed when they meet certain criteria.  For an outer hull, a corner can be removed if it is concave (for shell rings) or convex (for hole rings).  For an inner hull the removable corner orientations are reversed.  

In both variants, corners are removed only if the triangle they form does not contain other vertices of the (current) boundary rings.  This condition prevents self-intersections from occurring within or between rings. This ensures the resulting hull geometry is topologically valid.  Detecting triangle-vertex intersections is made performant by maintaining a spatial index on the vertices in the rings.  This is supported by an index structure called a VertexSequencePackedRtree.  This is a semi-static R-tree built on the list of vertices of each polygon boundary ring.  Vertex lists typically have a high degree of spatial coherency, so the constructed R-tree generally provides good space utilization.  It provides fast bounding-box search, and supports item removal (allowing the index to stay consistent as ring vertices are removed).

Corners that are candidates for removal are kept in a priority queue ordered by area.  Corners are removed in order of smallest area first.  This minimizes the amount of change for a given vertex count, and produces a better quality result.  Removing a corner may create new corners, which are inserted in the priority queue for processing.  Corners in the queue may be invalidated if one of the corner side vertices has previously been removed; invalid corners are discarded. 

This algorithm uses techniques originated for the Ear-Clipping approach used in the JTS PolgyonTriangulator implementation. It also has a similarity to the Visvalingham-Whyatt simplification algorithm.  But as far as I know using this approach for computing outer and inner hulls is novel. (After the fact I found a recent paper about a similar construction called a Shortcut Hull [Bonerath et al 2020], but it uses a different approach).

Further Work

It should be straightforward to use this same approach to implement a variant of Topology-Preserving Simplifier using the corner-area-removal approach (as in Visvalingham-Whyatt simplification).  The result would be a simplified, topologically-valid polygonal geometry.  The simplification parameter  limits the number of result vertices, or the net change in area.  The resulting shape would be a good approximation of the input, but is not necessarily be either wholly inside or outside.