Wednesday 13 January 2021

Using the GEOS geosop CLI

In a previous post I announced the new geosop command-line interface (CLI) for the GEOS geometry API.  This post provides examples of using geosop for various tasks. (Refer to the README for more information about the various options used.)

API Testing

geosop makes testing the GEOS library much easier.  Previously, testing the behaviour of the API usually required breaking out the C compiler (and updating autotools and cmake build files, and deciding whether to commit the new code for later use or throw it away, etc, etc).  Now testing is often just a matter of invoking a geosop operation on appropriate data, or at worst adding a few lines of code to the exiting framework.

For example, there is a long-standing issue with how GEOS handles number formatting in WKT output.  There are recent bug reports about this in GeoSwift and the Julia LibGEOS.  geosop makes it easy to run the test cases and see the less-than-desirable output:

geosop -a "POINT (654321.12 0.12)" -f wkt
POINT (654321.1199999999953434 0.1200000000000000)

geosop -a "POINT (-0.4225977234 46.3406448)" -f wkt
POINT (-0.4225977234000000 46.3406447999999997)

There's also an issue with precision handling.  To test this we added a --precision parameter to geosop. (This is the kind of rapid development enabled by having the CLI codebase co-resident with the API.)

geosop -a "POINT (654321.126 0.126)" --precision 2 -f wkt
POINT (6.5e+05 0.13)

Again we see undesirable behaviour. Using scientific notation for small numbers is unnecessary and difficult to read. And the precision value is determining the number of significant digits, not the number of decimal places as intended by the GEOS WKTWriter.setRoundingPrecision API.

These were all caused by using standard C/C++ numeric formatting, which is surprisingly limited and non-useful. After some fine work by Paul Ramsey to integrate the much better Ryu library, GEOS now has WKT output that is sensible and handles precision in a useful way.

By default, WKTWriter now nicely round-trips WKT text:

geosop -a "POINT (654321.12 0.12)" -f wkt
POINT (654321.12 0.12)

geosop -a "POINT (-0.4225977234 46.3406448)" -f wkt
POINT (-0.4225977234 46.3406448)

If WKTWriter.setRoundingPrecision or GEOSWKTWriter_setRoundingPrecision is called, the precision value applies to the decimal part of the number: 

geosop -a "POINT (654321.1234567 0.126)" --precision 0 -f wkt
POINT (654321 0)

geosop -a "POINT (654321.1234567 0.126)" --precision 2 -f wkt
POINT (654321.12 0.13)

geosop -a "POINT (654321.1234567 0.126)" --precision 4 -f wkt
POINT (654321.1235 0.126)

geosop -a "POINT (654321.1234567 0.126)" --precision 6 -f wkt
POINT (654321.123457 0.126)

Performance Testing

A key use case for geosop is to provide easy performance testing.  Performance of geometric operations is highly data-dependent.  It's useful to be able to run operations over different datasets and measure performance.  This allows detecting performance hotspots, and confirming the efficiency of algorithms.

As a simple example of performance testing, for many years GEOS has provided optimized spatial predicates using the concept of a prepared geometry.  Prepared geometry uses cached spatial indexes to dramatically improve performance for repeated spatial operations against a geometry.  Here is a performance comparison of the intersects spatial predicate in its basic and prepared form. 

geosop -a world.wkt -b world.wkt -t intersects
Ran 59,536 intersects ops ( 179,072,088 vertices)
  -- 16,726,188 usec    (GEOS 3.10.0dev)

geosop -a world.wkt -b world.wkt -t intersectsPrep
Ran 59,536 intersectsPrep ops ( 179,072,088 vertices)
  -- 1,278,348 usec    (GEOS 3.10.0dev)

The example of testing the world countries dataset against itself is artificial, but it shows off the dramatic 16x performance boost provided by using a prepared operation. 

Another interesting use is longitudinal testing of GEOS performance across different versions of the library.  For instance, here's a comparison of the performance of interiorPoint between GEOS 3.7 and 3.8.  The interiorPoint algorithm in GEOS 3.7 relied on overlay operations, which made it slow and sensitive to geometry invalidity. GEOS 3.8 included a major improvement which greatly improved the performance, and made the algorithm more robust.  Note that the dataset being used contains some (mildly) invalid geometries towards its end, which produces an error in GEOS 3.7 if the entire dataset is run.  The  --alimit 3800 option limits the number of geometries processed to avoid this issue.

GEOS 3.7

geosop -a ne_10m_admin_1_states_provinces.wkb --alimit 3800 -t interiorPoint
Ran 3,800 operations ( 1,154,703 vertices)  -- 2,452,540 usec  (GEOS 3.7)

GEOS 3.8

geosop -a ne_10m_admin_1_states_provinces.wkb --alimit 3800 -t interiorPoint
Ran 3,800 operations ( 1,154,703 vertices)  -- 35,665 usec  (GEOS 3.8)

The dramatic improvement in interiorPoint performance is clearly visible.

Geoprocessing

The raison d'etre of GEOS is to carry out geoprocessing.  Most users will likely do this using one of the numerous languages, applications and databases that include GEOS.  But it is still useful, convenient, and perhaps more performant to be able to process geometric data natively in GEOS.  

The design of geosop enables more complex geoprocessing via chaining operations together using shell pipes.  For example, here is a process which creates a Voronoi diagram of some points located in the British Isles, and then clips the Voronoi polygons to the outline of the islands.  This also shows a few more capabilities of geosop:
  • input can be supplied as WKT (or WKB) geometry literals on the command-line
  • input can be read from the standard input (here as WKB)
  • the output data is sent to the standard output, so can be directed into a file

geosop 
  -a "MULTIPOINT ((1342 1227.5), (1312 1246.5), (1330 1270), (1316.5 1306.5), (1301 1323), (1298.5 1356), (1247.5 1288.5), (1237 1260))" 
  -f wkb voronoi 
| geosop -a stdin.wkb -b uk.wkt -f wkt intersection 
> uk-vor.wkt





No comments: