Some spatial use cases require identifying "narrow" or "skinny" polygons. A classic example is the process of cleaning polygonal coverages. Coverages (even clean ones) may contain narrow gaps between polygons which are unwanted. But they may also contain larger gaps which are valid. A geometric test is required to distinguish narrow polygons which are gaps (that can be merged) from wider ones (which should be retained).
Narrowness Measures
There are various criteria which can be used to determine if a polygon is "narrow". One approach is to use an estimate of circularity such as the Polsby-Popper measure. This is a measure based on the ratio of polygon area to the square of its perimeter. Polygons with a measure closer to 1 are more circular than ones with lower values (which are thus more likely to be narrow). This is easy to implement and fast to compute, but it has significant limitations:
- it is not obvious how to pick a measure value that determines "narrowness"
- it is based on shape-wide metrics (area and perimeter), so it may not accurately reflect the local properties of a shape (i.e. a polygon may have a low circularity measure and still contain a wide "bulge" in one place.)
- because it is a ratio it is size-independent, so very small gaps (which should be removed) may be compact according to the measure, and are thus retained. (Adding an area limit can mitigate this, but that introduces yet another difficult-to-choose parameter.)
Maximum Inscribed Circle isRadiusWithin
Really, the best way is to determine the narrowness of a polygon is to compute its actual width. An effective way to do this is to compute the diameter of the largest circle which lies within the polygon. This is known as the Maximum Inscribed Circle. The JTS Topology Suite provides the MaximumInscribedCircle class to compute the centre point and radius of the required circle (see the blog post). (The algorithm is also available in downstream projects including GEOS, PostGIS and Shapely.) The algorithm uses an iterative approximation, which currently requires specifying an approximation tolerance distance.
However, the real requirement is only to test whether the MIC radius is less than a given length value. The actual value of the radius is not required. Generally, an algorithm that tests for a value can be faster and more robust than one that computes the value explicitly. This is a perfect example of this principle. By formulating the function as a test against a provided radius length execution can be much faster, since computation can terminate as soon as it is known that the radius must be smaller or larger than the required length. (Determining that the MIC radius is larger than the limit is easy: an internal point is found with distance to the polygon boundary which is greater than the limit. The radius is known to be smaller than the limit once the grid size used to generate internal points is less than one-half the limit value. If neither of those situations occurs the approximation tolerance is used to terminate the computation.) This has recently been implemented as the MaximumInscribedCircle.isRadiusWithin predicate.
A further benefit of the isRadiusWithin implementation is that it removes the need to specify the approximation tolerance. The tolerance can be determined automatically, as a fraction of the specified radius limit. This is significant, because selecting an appropriate distance tolerance for a Maximum Inscribed Circle computation is difficult, since it has no relationship to the size of the input geometry (i.e. a geometry may have a large extent, but still be very narrow.) This is particularly challenging when a single tolerance must be chosen for a set of inputs of varying shape - precisely the use case for which this is being built.
Demonstration
As expected, evaluating the isRadiusWithin predicate is much faster than computing the actual value of the Maximum Inscribed Circle radius. For a demonstration we use this dataset of world country borders (exploded to treat each island as a separate polygon). This dataset has 8,393 polygons containing a total of 366,951 vertices.
Computing the Maximum Inscribed Circle radius for each polygon, with a tolerance of 0.00001 degrees, takes 2.68 seconds.
Evaluating the isRadiusWithin predicate, using a distance tolerance of 1 degree, executes in 0.034 seconds - almost 80 times faster.
Usage in Coverage Cleaning
This function is used in the CoverageCleaner class to provide a fast test for narrow polygons (as specified by the user using the gapWidth parameter). To see how this works on a real use case, consider a polygonal coverage dataset of communes in Departement 71 in France (Saône-et-Loire). The dataset has 564 polygons, with 612,238 vertices. As shown in a previous post on coverage cleaning, this dataset has highly non-aligned linework. This creates lots of gaps. The gaps can be found by cleaning the coverage with a gap tolerance distance of 0, which reveals 28,427 gaps, with 349,042 vertices.
Computing the Maximum Inscribed Circle radius explicitly for each gap takes 25.4 seconds. In contrast, using isRadiusWithin with a distance of 0.5 takes only 1.35 seconds - 18 times faster. Since the cleaning process itself takes only 4.9 seconds, the reduction in time for the gap merging phase is significant.
No comments:
Post a Comment