Tuesday, 2 February 2021

OverlayNG and Invalid Geometry

A recent blog post by Elephant Tamer gives a critical appraisal of the improvements to overlay processing shipped in PostGIS 3.1 with GEOS 3.9.  The author is disappointed that PostGIS still reports errors when overlay is used on invalid geometry.  However, this is based on a misunderstanding of the technology. 

GEOS 3.9 includes OverlayNG, ported from the JTS Topology Suite). It brings a major advance in overlay robustness, along with other improvements (described here and here).  Previously, robustness limitations in the overlay algorithm could sometimes cause errors even for inputs which were topologically valid.  This was doubly problematic because there was no fully effective way to process the input geometries to avoid the errors.  Now, OverlayNG solves this problem completely.  Valid inputs will always produce a valid and essentially correct(*) output.

(*) "Essentially" correct, because in order to achieve full robustness a snapping heuristic may be applied to the input geometry.  However, this is done with a very fine tolerance, so should not appreciably alter the output from the theoretically correct value.

But for invalid inputs, OverlayNG will still report errors.  The reason is that there is a wide variety of gruesome ways in which geometry can be invalid.  Automated handling of invalidity would involve expensive extra processing, and also require making assumptions about what area a corrupt geometry is intended to represent.  Rather than silently repairing invalid geometry and returning potentially incorrect results, the design decision is to report this situation as an error.

In fact, OverlayNG is able to handle "mildly" invalid polygons, as described in this post. This covers situations which are technically invalid according to the OGC SFS specification, but which still have well-defined topology.  This includes self-touching rings (sometimes called "inverted polygons" or "exverted holes"), and zero-width gores and spikes.

Taking a detailed look at the data used in the blog post, we can see these improvements at work.  The dataset is the ecology polygons obtained from the GDOS WFS server.  This contains 7662 geometries, of which 10 are invalid.  Using the old overlay algorithm, 9 of these invalid polygons cause TopologyException errors.  Using OverlayNG, only 4 of them cause errors. 

The polygons that can now be processed successfully are typical "OGC-invalid" situations, which do not materially affect the polygonal topology.  These include self-touching rings with pinch points:

and zero-width gores:


Of the cases that still cause errors, two are classic small bow-tie errors:


And two are wildly invalid self-crossing rings:



The last two are good examples of geometry which is so invalid that it is impossible to unambiguously decide what area is represented (although ST_MakeValid will happily grind them into something that is technically valid).

Ultimately it is the user's responsibility to ensure that geometries to be processed by overlay (and many other PostGIS functions) have valid topology (as reported by ST_IsValid).  Ideally this is done by correcting the data at source.  But it can also be done a posteriori in the database itself, by either the ST_MakeValid function, or the well-known buffer(0) trick. (Which to use is a topic for another blog post...)

One improvement that could be made is to check for input validity when OverlayNG throws an error.  Then PostGIS can report definitively that an overlay error is caused by invalid input.  If there is an overlay error that is not caused by invalidity, the PostGIS team wants to hear about it! 

And perhaps there is a case to be made for repairing invalid geometry automatically, even if the repair is suspect.  Possibly this could be invoked via a flag parameter on the overlay functions. More research is required - feedback is welcome!