Tuesday, 12 January 2021

Introducing geosop - a CLI for GEOS

The GEOS geometry API is used by many, many projects to do their heavy geometric lifting.  But GEOS has always had a bit of a PR problem.  Most of those projects provide a more accessible interface to perform GEOS operations.  Some offer a high-level language like Python, R, or SQL (and these typically come with a REPL to make things even easier).  Or there are GUIs like QGIS, or a command-line interface (CLI) like GDAL/OGR.  

But you can't do much with GEOS on its own.  It is a C/C++ library, and to use it you need to break out the compiler and start cutting code. It's essentially "headless".  Even for GEOS developers, writing an entire C program just to try out a geometry operation on a dataset is painful, to say the least.  

There is the GEOS XMLTester utility, of course.  It processes carefully structured XML files, but that is hardly convenient.  (And in case this brings to mind a snide comment like "2001 called and wants its file format back", XML actually works very well in JTS and GEOS as a portable and readable format for geometry tests.  But I digress.)

JTS (on which GEOS is based) has the TestBuilder GUI, which works well for testing out and visualizing the results of JTS operations.  JTS also has a CLI called JtsOp. Writing a GUI for GEOS would be a tall order.  But a command-line interface (CLI) is much simpler to code, and has significant utility.  In fact there is an interesting project called geos-cli that provides a simple CLI for GEOS.  But it's ideal to have the CLI code as part of the GEOS project, since it ensures being up-to-date with the library code, and makes it easy to add operations to test new functionality.

This need has led to the development of geosop.  It is a CLI for GEOS which performs a range of useful tasks:

  • Run GEOS operations to confirm their semantics
  • Test the behaviour of GEOS on specific geometric data
  • Time the performance of operation execution
  • Profile GEOS code to find hotspots
  • Check memory usage characteristics of GEOS code
  • Generate spatial data for use in visualization or testing
  • Convert datasets between WKT and WKB

    geosop has the following capabilities:

    • Read WKT and WKB from files, standard input, or command-line literals
    • Execute GEOS operations on the list(s) of input geometries.  Binary operations are executed on every pair of input geometries (i.e. the cross join aka Cartesian product)
    • Output geometry results in WKT or WKB (or text, for non-geometric results)
    • Display the execution time of data input and operations
    • Display a full log of the command processing
    Here's a look at how it works.  

    geosop -h gives a list of the options and operations available:

    geosop - GEOS v. 3.10.0dev
    Executes GEOS geometry operations
    Usage:
      geosop [OPTION...] opName opArg

      -a arg               source for A geometries (WKT, WKB, file, stdin,
                           stdin.wkb)
      -b arg               source for B geometries (WKT, WKB, file, stdin,
                           stdin.wkb)
          --alimit arg     Limit number of A geometries read
      -c, --collect        Collect input into single geometry
      -e, --explode        Explode result
      -f, --format arg     Output format
      -h, --help           Print help
      -p, --precision arg  Sets number of decimal places in WKT output
      -r, --repeat arg     Repeat operation N times
      -t, --time           Print execution time
      -v, --verbose        Verbose output

    Operations:
      area A - computes area for geometry A
      boundary A - computes boundary for geometry A
      buffer A N - cmputes the buffer of geometry A
      centroid A - computes centroid for geometry A
      contains A B - tests if geometry A contains geometry B
      containsPrep A B - tests if geometry A contains geometry B, using PreparedGeometry
      containsProperlyPrep A B - tests if geometry A properly contains geometry B using PreparedGeometry
      convexHull A - computes convexHull for geometry A
      copy A - computes copy for geometry A
      covers A B - tests if geometry A covers geometry B
      coversPrep A B - tests if geometry A covers geometry B using PreparedGeometry
      difference A B - computes difference of geometry A from B
      differenceSR A B - computes difference of geometry A from B rounding to a precision scale factor
      distance A B - computes distance between geometry A and B
      distancePrep A B - computes distance between geometry A and B using PreparedGeometry
      envelope A - computes envelope for geometry A
      interiorPoint A - computes interiorPoint for geometry A
      intersection A B - computes intersection of geometry A and B
      intersectionSR A B - computes intersection of geometry A and B
      intersects A B - tests if geometry A and B intersect
      intersectsPrep A B - tests if geometry A intersects B using PreparedGeometry
      isValid A - tests if geometry A is valid
      length A - computes length for geometry A
      makeValid A - computes makeValid for geometry A
      nearestPoints A B - computes nearest points of geometry A and B
      nearestPointsPrep A B - computes nearest points of geometry A and B using PreparedGeometry
      polygonize A - computes polygonize for geometry A
      reducePrecision A N - reduces precision of geometry to a precision scale factor
      relate A B - computes DE-9IM matrix for geometry A and B
      symDifference A B - computes symmetric difference of geometry A and B
      symDifferenceSR A B - computes symmetric difference of geometry A and B
      unaryUnion A - computes unaryUnion for geometry A
      union A B - computes union of geometry A and B
      unionSR A B - computes union of geometry A and B

    Most GEOS operations are provided, and the list will be completed soon.

    Some examples of using geosop are below.
    • Compute the interior point for each country in a world polygons dataset, and output them as WKT:
    geosop -a world.wkt -f wkt interiorPoint



    • Determine the time required to compute buffers of distance 1 for each country in the world:
    geosop -a world.wkt --time buffer 1

    • Compute the union of all countries in Europe:
    geosop -a europe.wkb --collect -f wkb unaryUnion  


     The README gives many more examples of how to use the various command-line options.  In a subsequent post I'll give some demonstrations of using geosop for various tasks including GEOS testing, performance tuning, and geoprocessing.

    Future Work

    There's potential to make geosop even more useful: 
    • GeoJSON is a popular format for use in spatial toolchains.  Adding GeoJSON reading and writing would allow geosop to be more widely used for geo-processing.  
    • Adding SVG output would provide a way to visualize the results of GEOS operations.   
    • Improve support for performance testing by adding operations to generate various kinds of standard test datasets (such as point grids, polygon grids, and random point fields).  
    And of course, work will be ongoing to keep geosop up-to-date as new operations and functionality are added to GEOS.


    No comments: