Tuesday, 18 January 2022

Concave Hulls in JTS

A common spatial need is to find a polygon that accurately represents a set of points.  The convex hull of the points often does not provide this, since it can enclose large areas which contain no points.  What is required is a non-convex hull, often termed the concave hull.  

The Convex Hull and a Concave Hull of a point set

A concave hull is generally considered to have some or all of the following properties:

  • The hull is a simply connected polygon
  • It contains all the input points
  • The vertices in the hull polygon boundary are all input points
  • The hull may or may not contain holes
For a typical point set there are many polygons which meet these criteria, with varying degrees of concaveness.  Concave Hull algorithms provide a numeric parameter which controls the amount of concaveness in the result.  The nature of this parameter is particularly important, since it affects the ease-of-use in practical scenarios.  Ideally it has the following characteristics:

  • Simple geometric basis: this allows the user to understand the effect of the parameter and aids in determining an effective value
  • Scale-free (dimensionless): this allows a single parameter value to be effective on varying sizes of geometry, which is essential for batch or automated processing
  • Local (as opposed to global):  A local property (such as edge length) gives the algorithm latitude to determine the concave shape of the points. A global property (such as area) over-constrains the possible result. 
  • Monotonic area:  larger (or smaller) values produce a sequence of more concave areas
  • Monotonic containment :the sequence of hulls produced are topologically nested
  • Convex-bounded: an extremal value produces the convex hull

This is a well-studied problem, and many different approaches have been proposed.  Some notable ones are:

Of these, Delaunay Erosion (Chi-shapes) offers the best set of features.  It is straightforward to code and is performant.  It uses the control parameter of Edge Length Ratio, a fraction of the difference between the longest and shortest edges in the underlying Delaunay triangulation.  This is easy to reason about, since it is scale-free and corresponds to a simple property of the point set (that of distance between vertices).  It can be extended to support holes.  And it has a track record of use, notably in Oracle Spatial.  

ConcaveHull generated by Delaunay Erosion with Edge Length Ratio = 0.3

Recently the Park-Oh algorithm has become popular, thanks to a high-quality implementation in Concaveman project (which has spawned numerous ports).  However, it has some drawbacks.  It can't support holes (and likely not disconnected regions and discarding outlier points).  As the paper points out and experiment confirms, it produces rougher outlines than the Delaunay-based algorithm.  Finally, the control parameter for Delaunay Erosion has a simpler geometrical basis which makes it easier to use.

Given these considerations, the new JTS ConcaveHull algorithm utilizes Delaunay Erosion. The algorithm ensures that the computed hull is simply connected, and contains all the input points.  The Edge Length Ratio is used as the control parameter. A value of 1 produces the convex hull; 0 produces a concave hull of minimal size.  Alternatively the maximum edge length can be specified directly. This allows alternative strategies to determine an appropriate length value; for instance, another possibility is to use a fraction of the longest edge in the Minimum Spanning Tree of the input points.  

The recently-added Tri data structure provides a convenient basis for the implementation,.  It operates as follows:

  1. The Delaunay Triangulation of the input points is computed and represented as a set of of Tris
  2. The Tris on the border of the triangulation are inserted in a priority queue, sorted by longest boundary edge
  3. While the queue is non-empty, the head Tri is popped from the queue.  It is removed from the triangulation if it does not disconnect the area.  Insert new border Tris into the queue if they have a boundary edge length than the target length
  4. The Tris left in the triangulation form the area of the Concave Hull 

Thanks to the efficiency of the JTS Delaunay Triangulation the implementation is quite performant, approaching the performance of a Java port of Concaveman.  

Concave Hull of Ukraine dataset; Edge Length Ratio = 0.1

Optionally holes can be allowed to be present in the hull polygon (while maintaining a simply connected result).  A classic demonstration of this is to generate hulls for text font glyphs:


This algorithm is in the process of being ported to GEOS.  The intention is to use it to enhance the PostGIS ST_ConcaveHull function, which has known issues and has proven difficult to use.

Further Ideas

  • Disconnected Result - It is straightforward to extend the algorithm to allow a disconnected result (i.e. a MultiPolygon).  This could be provided as an option.
  • Outlier Points - It is also straightforward to support discarding "outlier" points.
  • Polygon Concave Hull - computing a concave "outer hull" for a polygon can be used to simplify the polygon while guaranteeing the hull contains the original polygon.  Additionally, an "inner hull" can be computed which is fully contained in the original.  The implementation of a Polygon Concave Hull algorithm is well under way and will be released in JTS soon. 



Monday, 3 January 2022

JTS Offset Curves

Offset curves (also known as parallel curves) are an oft-requested feature in JTS.  They are a natural extension to the concept of buffering, and are useful for things like placing labels along rivers.  As far as I know there is no hard-and-fast definition for how an offset curve should be constructed, but a reasonable semantic would seem to be "a line lying on one side of another line at a given offset distance"

Here's an image of offset curves on both sides of a river reach from a US rivers dataset:

GEOS Implementation

GEOS acquired an implementation to produce offset curves a few years ago.  However, it has some known bugs (such as producing disconnected curves, and including extraneous segments).  It also has the feature (or quirk?) of potentially producing a result with multiple disjoint curves for a single input line. 

The GEOS implementation is based on the concept that an offset curve is just a portion of the corresponding buffer boundary.  So to generate an offset curve the algorithm extracts a portion of the buffer polygon boundary.  The trick is deciding which portion!  

The GEOS implementation generates a raw offset curve (potentially full of self-intersections and unwanted linework) and then determines the intersection of that curve with the buffer boundary.  However, the use of intersection on linework is always tricky.  The buffer geometry is necessarily only a (close) approximation, and the buffer algorithm takes advantage of this to use various heuristics to improve the quality and robustness of the generated buffer linework.  This can cause the buffer linework to diverge from the raw offset curve.  The divergence makes the intersection result susceptible to errors caused by slight differences between the generated curves. The two issues above are caused by this limitation. 

JTS Implementation

Instead of using intersection, an effective technique is to match geometry linework using a distance tolerance.  This is the approach taken in the new JTS Offset Curve algorithm. The high-level design is

  1. The buffer is generated at the offset distance
  2. The raw offset curve is generated at the same distance
  3. The raw curve segments are matched to the buffer boundary using a distance tolerance
  4. The offset curve is the section of the buffer boundary between the first and last matching points.
To make the matching efficient a spatial index is created on the buffer curve segments.  

This algorithm provides the following semantics:
  • The offset curve of a line is a single LineString
  • The offset curve lies on one side of the input line (to the left if the offset distance is positive, and to the right if negative)
  • The offset curve has the same direction as the input line
  • The distance between the input line and the offset curve equals the offset distance (up to the limits of curve quantization and numerical precision)
This algorithm does a fine job at generating offset curves for typical simple linework.  The image below shows a family of offset curves on both sides of a convoluted line.  
This resolves both of the GEOS code issues.  It also supports parameters for join style (round, bevel, and mitre), number of quadrant segments (for round joins) and mitre limit:
Join Style = MITRE
Join Style = ROUND; Quadrant Segments = 2

There are also a few nuances required to handle some tricky situations; in particular, cases when the input line string curves back on itself and when it self-intersects.  These do not have a clear definition for what the result should be.  Moreover, the algorithm itself imposes some constraints on how these cases can be handled.  The images below show how the algorithm behaves in these cases.

"Loopbacks"produce offset curves representing only the "exposed" sections of the input linework: 
Offset Curves of a Line with "loop-backs"

For a self-intersecting input line, the offset curve starts at the beginning of the input line on the specified side, and continues only up to where the line side changes due to the crossing.  The length of the offset curve is also reduced by the requirement that it be no closer than specified offset distance to the input line: 
Offset Curves of a Line with a self-intersection

This algorithm is now in JTS, and has been ported to GEOS.