Thursday 21 February 2019

Better/Faster Interior Point for Polygons - now in GEOS

As a gentle introduction to GEOS development I took on the task of porting the recent improvements to the JTS InteriorPointArea algorithm.   This is a fairly small chunk o'code, but it touches most of the aspects of GEOS development process: build chain, debugging, unit tests, and infra (source control and build farm).  It also has the advantage that the JTS code was still hot off the keyboard and (almost) unreleased, so it was a chance to see if any cross-fertilization would blossom from working on the two projects jointly.

Skipping lightly over the details of my (somewhat painful) GEOS learning curve, I'm delighted to say that the code has landed in master and is basking in the green glow from the build bot badges.

And now the whole point of the exercise: how much better is the new code in GEOS? 

It exhibits the expected improvement in robustness, since a GEOS test which actually depended on a thrown TopologyException (due to the now-removed call to Geometry::intersection() ) had to be modified to handle a successful return.

Most importantly, there is a dramatic improvement in performance.  Here's some numbers from running the GEOS InteriorPointArea performance test:

Data size Time Time
OLD
Improvement Time
Centroid
100 .8 ms 86 ms x 100 1 ms
1000 6 ms 144 ms x 24 12 ms
10,000 55 ms 672 ms x 12 107 ms
100,000 508 ms 6,714 ms x 13 961 ms
1,000,000 5,143 ms 73,737 ms x 14 11,162 ms

Some observations:
  • The performance test uses synthetic data (sine stars).  Real-world datasets are likely to show significantly better times ( 80x better in some cases, based on JTS timings)
  • The largest improvement is for small geometries, which is nice since these are more common
  • InteriorPoint is now actually faster than the Centroid computation.  This is also good news, since users were often tempted to try and use centroids instead of interior points, despite the known issues.
Future Work

Running the identical performance test in JTS is still faster, by roughly 5x.  This may be due to the advantages of JIT compilation and memory management.  It may also indicate there is room for improvement by making GEOS smarter about data handling.

No comments: