Tuesday, 2 February 2021

OverlayNG and Invalid Geometry

A recent blog post by Elephant Tamer gives a critical appraisal of the improvements to overlay processing shipped in PostGIS 3.1 with GEOS 3.9.  The author is disappointed that PostGIS still reports errors when overlay is used on invalid geometry.  However, this is based on a misunderstanding of the technology. 

GEOS 3.9 includes OverlayNG, ported from the JTS Topology Suite). It brings a major advance in overlay robustness, along with other improvements (described here and here).  Previously, robustness limitations in the overlay algorithm could sometimes cause errors even for inputs which were topologically valid.  This was doubly problematic because there was no fully effective way to process the input geometries to avoid the errors.  Now, OverlayNG solves this problem completely.  Valid inputs will always produce a valid and essentially correct(*) output.

(*) "Essentially" correct, because in order to achieve full robustness a snapping heuristic may be applied to the input geometry.  However, this is done with a very fine tolerance, so should not appreciably alter the output from the theoretically correct value.

But for invalid inputs, OverlayNG will still report errors.  The reason is that there is a wide variety of gruesome ways in which geometry can be invalid.  Automated handling of invalidity would involve expensive extra processing, and also require making assumptions about what area a corrupt geometry is intended to represent.  Rather than silently repairing invalid geometry and returning potentially incorrect results, the design decision is to report this situation as an error.

In fact, OverlayNG is able to handle "mildly" invalid polygons, as described in this post. This covers situations which are technically invalid according to the OGC SFS specification, but which still have well-defined topology.  This includes self-touching rings (sometimes called "inverted polygons" or "exverted holes"), and zero-width gores and spikes.

Taking a detailed look at the data used in the blog post, we can see these improvements at work.  The dataset is the ecology polygons obtained from the GDOS WFS server.  This contains 7662 geometries, of which 10 are invalid.  Using the old overlay algorithm, 9 of these invalid polygons cause TopologyException errors.  Using OverlayNG, only 4 of them cause errors. 

The polygons that can now be processed successfully are typical "OGC-invalid" situations, which do not materially affect the polygonal topology.  These include self-touching rings with pinch points:

and zero-width gores:


Of the cases that still cause errors, two are classic small bow-tie errors:


And two are wildly invalid self-crossing rings:



The last two are good examples of geometry which is so invalid that it is impossible to unambiguously decide what area is represented (although ST_MakeValid will happily grind them into something that is technically valid).

Ultimately it is the user's responsibility to ensure that geometries to be processed by overlay (and many other PostGIS functions) have valid topology (as reported by ST_IsValid).  Ideally this is done by correcting the data at source.  But it can also be done a posteriori in the database itself, by either the ST_MakeValid function, or the well-known buffer(0) trick. (Which to use is a topic for another blog post...)

One improvement that could be made is to check for input validity when OverlayNG throws an error.  Then PostGIS can report definitively that an overlay error is caused by invalid input.  If there is an overlay error that is not caused by invalidity, the PostGIS team wants to hear about it! 

And perhaps there is a case to be made for repairing invalid geometry automatically, even if the repair is suspect.  Possibly this could be invoked via a flag parameter on the overlay functions. More research is required - feedback is welcome!

Wednesday, 13 January 2021

Using the GEOS geosop CLI

In a previous post I announced the new geosop command-line interface (CLI) for the GEOS geometry API.  This post provides examples of using geosop for various tasks. (Refer to the README for more information about the various options used.)

API Testing

geosop makes testing the GEOS library much easier.  Previously, testing the behaviour of the API usually required breaking out the C compiler (and updating autotools and cmake build files, and deciding whether to commit the new code for later use or throw it away, etc, etc).  Now testing is often just a matter of invoking a geosop operation on appropriate data, or at worst adding a few lines of code to the exiting framework.

For example, there is a long-standing issue with how GEOS handles number formatting in WKT output.  There are recent bug reports about this in GeoSwift and the Julia LibGEOS.  geosop makes it easy to run the test cases and see the less-than-desirable output:

geosop -a "POINT (654321.12 0.12)" -f wkt
POINT (654321.1199999999953434 0.1200000000000000)

geosop -a "POINT (-0.4225977234 46.3406448)" -f wkt
POINT (-0.4225977234000000 46.3406447999999997)

There's also an issue with precision handling.  To test this we added a --precision parameter to geosop. (This is the kind of rapid development enabled by having the CLI codebase co-resident with the API.)

geosop -a "POINT (654321.126 0.126)" --precision 2 -f wkt
POINT (6.5e+05 0.13)

Again we see undesirable behaviour. Using scientific notation for small numbers is unnecessary and difficult to read. And the precision value is determining the number of significant digits, not the number of decimal places as intended by the GEOS WKTWriter.setRoundingPrecision API.

These were all caused by using standard C/C++ numeric formatting, which is surprisingly limited and non-useful. After some fine work by Paul Ramsey to integrate the much better Ryu library, GEOS now has WKT output that is sensible and handles precision in a useful way.

By default, WKTWriter now nicely round-trips WKT text:

geosop -a "POINT (654321.12 0.12)" -f wkt
POINT (654321.12 0.12)

geosop -a "POINT (-0.4225977234 46.3406448)" -f wkt
POINT (-0.4225977234 46.3406448)

If WKTWriter.setRoundingPrecision or GEOSWKTWriter_setRoundingPrecision is called, the precision value applies to the decimal part of the number: 

geosop -a "POINT (654321.1234567 0.126)" --precision 0 -f wkt
POINT (654321 0)

geosop -a "POINT (654321.1234567 0.126)" --precision 2 -f wkt
POINT (654321.12 0.13)

geosop -a "POINT (654321.1234567 0.126)" --precision 4 -f wkt
POINT (654321.1235 0.126)

geosop -a "POINT (654321.1234567 0.126)" --precision 6 -f wkt
POINT (654321.123457 0.126)

Performance Testing

A key use case for geosop is to provide easy performance testing.  Performance of geometric operations is highly data-dependent.  It's useful to be able to run operations over different datasets and measure performance.  This allows detecting performance hotspots, and confirming the efficiency of algorithms.

As a simple example of performance testing, for many years GEOS has provided optimized spatial predicates using the concept of a prepared geometry.  Prepared geometry uses cached spatial indexes to dramatically improve performance for repeated spatial operations against a geometry.  Here is a performance comparison of the intersects spatial predicate in its basic and prepared form. 

geosop -a world.wkt -b world.wkt -t intersects
Ran 59,536 intersects ops ( 179,072,088 vertices)
  -- 16,726,188 usec    (GEOS 3.10.0dev)

geosop -a world.wkt -b world.wkt -t intersectsPrep
Ran 59,536 intersectsPrep ops ( 179,072,088 vertices)
  -- 1,278,348 usec    (GEOS 3.10.0dev)

The example of testing the world countries dataset against itself is artificial, but it shows off the dramatic 16x performance boost provided by using a prepared operation. 

Another interesting use is longitudinal testing of GEOS performance across different versions of the library.  For instance, here's a comparison of the performance of interiorPoint between GEOS 3.7 and 3.8.  The interiorPoint algorithm in GEOS 3.7 relied on overlay operations, which made it slow and sensitive to geometry invalidity. GEOS 3.8 included a major improvement which greatly improved the performance, and made the algorithm more robust.  Note that the dataset being used contains some (mildly) invalid geometries towards its end, which produces an error in GEOS 3.7 if the entire dataset is run.  The  --alimit 3800 option limits the number of geometries processed to avoid this issue.

GEOS 3.7

geosop -a ne_10m_admin_1_states_provinces.wkb --alimit 3800 -t interiorPoint
Ran 3,800 operations ( 1,154,703 vertices)  -- 2,452,540 usec  (GEOS 3.7)

GEOS 3.8

geosop -a ne_10m_admin_1_states_provinces.wkb --alimit 3800 -t interiorPoint
Ran 3,800 operations ( 1,154,703 vertices)  -- 35,665 usec  (GEOS 3.8)

The dramatic improvement in interiorPoint performance is clearly visible.

Geoprocessing

The raison d'etre of GEOS is to carry out geoprocessing.  Most users will likely do this using one of the numerous languages, applications and databases that include GEOS.  But it is still useful, convenient, and perhaps more performant to be able to process geometric data natively in GEOS.  

The design of geosop enables more complex geoprocessing via chaining operations together using shell pipes.  For example, here is a process which creates a Voronoi diagram of some points located in the British Isles, and then clips the Voronoi polygons to the outline of the islands.  This also shows a few more capabilities of geosop:
  • input can be supplied as WKT (or WKB) geometry literals on the command-line
  • input can be read from the standard input (here as WKB)
  • the output data is sent to the standard output, so can be directed into a file

geosop 
  -a "MULTIPOINT ((1342 1227.5), (1312 1246.5), (1330 1270), (1316.5 1306.5), (1301 1323), (1298.5 1356), (1247.5 1288.5), (1237 1260))" 
  -f wkb voronoi 
| geosop -a stdin.wkb -b uk.wkt -f wkt intersection 
> uk-vor.wkt





Tuesday, 12 January 2021

Introducing geosop - a CLI for GEOS

The GEOS geometry API is used by many, many projects to do their heavy geometric lifting.  But GEOS has always had a bit of a PR problem.  Most of those projects provide a more accessible interface to perform GEOS operations.  Some offer a high-level language like Python, R, or SQL (and these typically come with a REPL to make things even easier).  Or there are GUIs like QGIS, or a command-line interface (CLI) like GDAL/OGR.  

But you can't do much with GEOS on its own.  It is a C/C++ library, and to use it you need to break out the compiler and start cutting code. It's essentially "headless".  Even for GEOS developers, writing an entire C program just to try out a geometry operation on a dataset is painful, to say the least.  

There is the GEOS XMLTester utility, of course.  It processes carefully structured XML files, but that is hardly convenient.  (And in case this brings to mind a snide comment like "2001 called and wants its file format back", XML actually works very well in JTS and GEOS as a portable and readable format for geometry tests.  But I digress.)

JTS (on which GEOS is based) has the TestBuilder GUI, which works well for testing out and visualizing the results of JTS operations.  JTS also has a CLI called JtsOp. Writing a GUI for GEOS would be a tall order.  But a command-line interface (CLI) is much simpler to code, and has significant utility.  In fact there is an interesting project called geos-cli that provides a simple CLI for GEOS.  But it's ideal to have the CLI code as part of the GEOS project, since it ensures being up-to-date with the library code, and makes it easy to add operations to test new functionality.

This need has led to the development of geosop.  It is a CLI for GEOS which performs a range of useful tasks:

  • Run GEOS operations to confirm their semantics
  • Test the behaviour of GEOS on specific geometric data
  • Time the performance of operation execution
  • Profile GEOS code to find hotspots
  • Check memory usage characteristics of GEOS code
  • Generate spatial data for use in visualization or testing
  • Convert datasets between WKT and WKB

    geosop has the following capabilities:

    • Read WKT and WKB from files, standard input, or command-line literals
    • Execute GEOS operations on the list(s) of input geometries.  Binary operations are executed on every pair of input geometries (i.e. the cross join aka Cartesian product)
    • Output geometry results in WKT or WKB (or text, for non-geometric results)
    • Display the execution time of data input and operations
    • Display a full log of the command processing
    Here's a look at how it works.  

    geosop -h gives a list of the options and operations available:

    geosop - GEOS v. 3.10.0dev
    Executes GEOS geometry operations
    Usage:
      geosop [OPTION...] opName opArg

      -a arg               source for A geometries (WKT, WKB, file, stdin,
                           stdin.wkb)
      -b arg               source for B geometries (WKT, WKB, file, stdin,
                           stdin.wkb)
          --alimit arg     Limit number of A geometries read
      -c, --collect        Collect input into single geometry
      -e, --explode        Explode result
      -f, --format arg     Output format
      -h, --help           Print help
      -p, --precision arg  Sets number of decimal places in WKT output
      -r, --repeat arg     Repeat operation N times
      -t, --time           Print execution time
      -v, --verbose        Verbose output

    Operations:
      area A - computes area for geometry A
      boundary A - computes boundary for geometry A
      buffer A N - cmputes the buffer of geometry A
      centroid A - computes centroid for geometry A
      contains A B - tests if geometry A contains geometry B
      containsPrep A B - tests if geometry A contains geometry B, using PreparedGeometry
      containsProperlyPrep A B - tests if geometry A properly contains geometry B using PreparedGeometry
      convexHull A - computes convexHull for geometry A
      copy A - computes copy for geometry A
      covers A B - tests if geometry A covers geometry B
      coversPrep A B - tests if geometry A covers geometry B using PreparedGeometry
      difference A B - computes difference of geometry A from B
      differenceSR A B - computes difference of geometry A from B rounding to a precision scale factor
      distance A B - computes distance between geometry A and B
      distancePrep A B - computes distance between geometry A and B using PreparedGeometry
      envelope A - computes envelope for geometry A
      interiorPoint A - computes interiorPoint for geometry A
      intersection A B - computes intersection of geometry A and B
      intersectionSR A B - computes intersection of geometry A and B
      intersects A B - tests if geometry A and B intersect
      intersectsPrep A B - tests if geometry A intersects B using PreparedGeometry
      isValid A - tests if geometry A is valid
      length A - computes length for geometry A
      makeValid A - computes makeValid for geometry A
      nearestPoints A B - computes nearest points of geometry A and B
      nearestPointsPrep A B - computes nearest points of geometry A and B using PreparedGeometry
      polygonize A - computes polygonize for geometry A
      reducePrecision A N - reduces precision of geometry to a precision scale factor
      relate A B - computes DE-9IM matrix for geometry A and B
      symDifference A B - computes symmetric difference of geometry A and B
      symDifferenceSR A B - computes symmetric difference of geometry A and B
      unaryUnion A - computes unaryUnion for geometry A
      union A B - computes union of geometry A and B
      unionSR A B - computes union of geometry A and B

    Most GEOS operations are provided, and the list will be completed soon.

    Some examples of using geosop are below.
    • Compute the interior point for each country in a world polygons dataset, and output them as WKT:
    geosop -a world.wkt -f wkt interiorPoint



    • Determine the time required to compute buffers of distance 1 for each country in the world:
    geosop -a world.wkt --time buffer 1

    • Compute the union of all countries in Europe:
    geosop -a europe.wkb --collect -f wkb unaryUnion  


     The README gives many more examples of how to use the various command-line options.  In a subsequent post I'll give some demonstrations of using geosop for various tasks including GEOS testing, performance tuning, and geoprocessing.

    Future Work

    There's potential to make geosop even more useful: 
    • GeoJSON is a popular format for use in spatial toolchains.  Adding GeoJSON reading and writing would allow geosop to be more widely used for geo-processing.  
    • Adding SVG output would provide a way to visualize the results of GEOS operations.   
    • Improve support for performance testing by adding operations to generate various kinds of standard test datasets (such as point grids, polygon grids, and random point fields).  
    And of course, work will be ongoing to keep geosop up-to-date as new operations and functionality are added to GEOS.