The other day this badge popped up on my GitHub profile:
Sure enough, there is JTS in the list of dependencies provided by the Jet Propulsion Laboratory (JPL).
So take that, Elon - JTS got there first! With a lot of help from NASA, of course.
Because the shortest distance between two thoughts is a straight line
The other day this badge popped up on my GitHub profile:
Sure enough, there is JTS in the list of dependencies provided by the Jet Propulsion Laboratory (JPL).
So take that, Elon - JTS got there first! With a lot of help from NASA, of course.
A recent blog post by Elephant Tamer gives a critical appraisal of the improvements to overlay processing shipped in PostGIS 3.1 with GEOS 3.9. The author is disappointed that PostGIS still reports errors when overlay is used on invalid geometry. However, this is based on a misunderstanding of the technology.
GEOS 3.9 includes OverlayNG, ported from the JTS Topology Suite). It brings a major advance in overlay robustness, along with other improvements (described here and here). Previously, robustness limitations in the overlay algorithm could sometimes cause errors even for inputs which were topologically valid. This was doubly problematic because there was no fully effective way to process the input geometries to avoid the errors. Now, OverlayNG solves this problem completely. Valid inputs will always produce a valid and essentially correct(*) output.
(*) "Essentially" correct, because in order to achieve full robustness a snapping heuristic may be applied to the input geometry. However, this is done with a very fine tolerance, so should not appreciably alter the output from the theoretically correct value.
But for invalid inputs, OverlayNG will still report errors. The reason is that there is a wide variety of gruesome ways in which geometry can be invalid. Automated handling of invalidity would involve expensive extra processing, and also require making assumptions about what area a corrupt geometry is intended to represent. Rather than silently repairing invalid geometry and returning potentially incorrect results, the design decision is to report this situation as an error.
In fact, OverlayNG is able to handle "mildly" invalid polygons, as described in this post. This covers situations which are technically invalid according to the OGC SFS specification, but which still have well-defined topology. This includes self-touching rings (sometimes called "inverted polygons" or "exverted holes"), and zero-width gores and spikes.
Taking a detailed look at the data used in the blog post, we can see these improvements at work. The dataset is the ecology polygons obtained from the GDOS WFS server. This contains 7662 geometries, of which 10 are invalid. Using the old overlay algorithm, 9 of these invalid polygons cause TopologyException errors. Using OverlayNG, only 4 of them cause errors.
The polygons that can now be processed successfully are typical "OGC-invalid" situations, which do not materially affect the polygonal topology. These include self-touching rings with pinch points:
and zero-width gores:
Of the cases that still cause errors, two are classic small bow-tie errors:
And two are wildly invalid self-crossing rings:
The last two are good examples of geometry which is so invalid that it is impossible to unambiguously decide what area is represented (although ST_MakeValid will happily grind them into something that is technically valid).
Ultimately it is the user's responsibility to ensure that geometries to be processed by overlay (and many other PostGIS functions) have valid topology (as reported by ST_IsValid). Ideally this is done by correcting the data at source. But it can also be done a posteriori in the database itself, by either the ST_MakeValid function, or the well-known buffer(0) trick. (Which to use is a topic for another blog post...)
One improvement that could be made is to check for input validity when OverlayNG throws an error. Then PostGIS can report definitively that an overlay error is caused by invalid input. If there is an overlay error that is not caused by invalidity, the PostGIS team wants to hear about it!
And perhaps there is a case to be made for repairing invalid geometry automatically, even if the repair is suspect. Possibly this could be invoked via a flag parameter on the overlay functions. More research is required - feedback is welcome!
In a previous post I announced the new geosop command-line interface (CLI) for the GEOS geometry API. This post provides examples of using geosop for various tasks. (Refer to the README for more information about the various options used.)
The GEOS geometry API is used by many, many projects to do their heavy geometric lifting. But GEOS has always had a bit of a PR problem. Most of those projects provide a more accessible interface to perform GEOS operations. Some offer a high-level language like Python, R, or SQL (and these typically come with a REPL to make things even easier). Or there are GUIs like QGIS, or a command-line interface (CLI) like GDAL/OGR.
But you can't do much with GEOS on its own. It is a C/C++ library, and to use it you need to break out the compiler and start cutting code. It's essentially "headless". Even for GEOS developers, writing an entire C program just to try out a geometry operation on a dataset is painful, to say the least.
There is the GEOS XMLTester utility, of course. It processes carefully structured XML files, but that is hardly convenient. (And in case this brings to mind a snide comment like "2001 called and wants its file format back", XML actually works very well in JTS and GEOS as a portable and readable format for geometry tests. But I digress.)
JTS (on which GEOS is based) has the TestBuilder GUI, which works well for testing out and visualizing the results of JTS operations. JTS also has a CLI called JtsOp. Writing a GUI for GEOS would be a tall order. But a command-line interface (CLI) is much simpler to code, and has significant utility. In fact there is an interesting project called geos-cli that provides a simple CLI for GEOS. But it's ideal to have the CLI code as part of the GEOS project, since it ensures being up-to-date with the library code, and makes it easy to add operations to test new functionality.
This need has led to the development of geosop. It is a CLI for GEOS which performs a range of useful tasks:
geosop -a world.wkt -f wkt interiorPoint
geosop -a world.wkt --time buffer 1
geosop -a europe.wkb --collect -f wkb unaryUnion
The README gives many more examples of how to use the various command-line options. In a subsequent post I'll give some demonstrations of using geosop for various tasks including GEOS testing, performance tuning, and geoprocessing.
The OGC Simple Features specification implemented by JTS has strict rules about what constitutes a valid polygonal geometry. These include:
This problem has limped along for many years now, never being quite enough of a pain point to motivate the effort needed to find a fix (or to fund one!). And to be honest, the buffer code is some of the most complicated and delicate in JTS, and I was concerned about wading into it to add what seemed poised to be a fiddly correction.
But recently there has been renewed interest in providing a Make-Valid capability for JTS. This inspired me to revisit the usage of buffer(0), and think more deeply about ring orientation and its role in determining valid polygonal topology. And this led to discovering a surprisingly simple solution for the buffer issue.
The fix is to use an orientation test which takes into account the entire ring. This is provided by the Signed-Area Orientation test, implemented in Orientation.isCCWArea using the Shoelace Formula. This effectively determines orientation based on the largest area enclosed by the ring. This corresponds more closely to user expectation based on visual assessment. It also minimizes the change in area and (usually) extent.
And indeed it works:
It fixes the simplification issue nicely:
The fix consists of about 4 lines of actual code. To paraphrase a great orator, never in the history of JTS has so much benefit been given to so many by so few lines of code. Now buffer(0) can be recommended unreservedly as an effective, performant way to fix polygonal geometry. And all those helpful documentation pages can drop any qualifications they might have.
As usual, this fix will soon show up in GEOS, and from there in PostGIS and other downstream projects.
This isn't the end of the story. There are times when the effect of buffer(0) is not what is desired for fixing polygon topology. This is discussed nicely in this blog post. The ongoing research into Make Valid will explore alternatives and how to provide an API for them.
Now that OverlayNG has landed on JTS master, it is getting attention from downstream projects interested in porting it or using it. One of the JTS ports is the Net Topology Suite (NTS) project, and it is very proactive about tracking the JTS codebase. Soon after the OverlayNG commit an NTS developer noticed an issue while running the performance tests in JTS: the InteriorPointInAreaPerfTest was now causing a StackOverflowError. This turned out to be caused by the change to GeometryPrecisionReducer to use OverlayNG with Snap-Rounding to perform robust precision reduction of polygons. Further investigation revealed that failure was occurring while querying the KdTree used in the HotPixelIndex in the SnapRoundingNoder. Moreover, in addition to the outright failure, even when queries did succeed they were taking an excessively long time for large inputs.
The reason for using a K-D tree as the search structure for HotPixels is that it supports two kinds of queries with a single structure:
The JTS KdTree supports both of these queries more efficiently than QuadTree, which is the other dynamic index in JTS. However, K-D trees are somewhat notorious for becoming unbalanced if inserted points are coherent (i.e. contain monotonic runs in either ordinate dimension). This is exactly the situation which occurs in large, relatively smooth polygons - which is the test data used in InteriorPointInAreaPerfTest. An unbalanced tree is deeper than it would be if perfectly balanced. In extreme cases this leads to trees of very large depth relative to their size. This slows down query performance and, because the KdTree query implementation uses recursion, can also lead to stack overflow.
A perfectly balanced K-D tree, with depth = logN. In the worst case an unbalanced tree has only one node at each level (depth = N).
I considered a few alternatives to overcome this major blocker. One was to use a QuadTree, but as mentioned this would reduce performance. There are schemes to load a K-D tree in a balanced way, but they seemed complex and potentially non-performant.
It's always great when people who file issues are also able to provide code to fix the problem. And happily the NTS developer submitted a pull request with a very nice solution. He observed that while the K-D tree was being built incrementally, in fact the inserted points were all available beforehand. He proposed randomizing them before insertion, using the elegantly simple Fisher-Yates shuffle. This worked extremely well, providing a huge performance boost and eliminated the stack overflow error. Here's the relative performance times:
Num Pts | Randomized | In Order |
10K | 126 ms | 341 ms |
20K | 172 ms | 1924 ms |
50K | 417 ms | 12.3 s |
100K | 1030 ms | 59 s |
200K | 1729 ms | 240 s |
500K | 5354 ms | Overflow |
Once the solution was merged into JTS, my colleague Paul Ramsey quickly ported it to GEOS, so that PostGIS and all the other downstream clients of GEOS would not encounter this issue.
It's surprising to me that this performance issue hasn't shown up in the other two JTS uses of KdTree: SnappingNoder and ConstrainedDelaunayTriangulator. More investigation required!
I'm happy to say that OverlayNG has landed on the JTS master branch! This is the culmination of over a year of work, and even more years of planning and design. It will appear in the upcoming JTS 1.18 release this fall.
As described in previous posts OverlayNG brings substantial improvements to overlay computation in JTS:
All of these improvements are encapsulated in the new OverlayNGRobust class. It provides fully robust execution with excellent performance, via automated fallback through a series of increasingly robust noding strategies. This should solve a host of overlay issues reported over the years in various downstream projects such as GEOS, PostGIS, Shapely, R-sf and QGIS. (Many of these cases have been captured as XML tests for overlay robustness, to ensure that they are handled by the new code).
Initially the idea was to use OverlayNG as an opportunity to simplify and improve the semantics of overlay output, including:
In the end we decided that this change would have too much impact on existing tests and downstream code, so the default semantics are the same as the previous overlay implementation. However, the simplified semantics are available as a "strict" mode for overlay operations.
At the moment OverlayNG is not wired in to the JTS Geometry overlay operations, but is provided as a separate API. The plan is to provide a runtime switch to allow choosing which overlay code is used. This will allow testing in-place while avoiding potential impact on production systems. GeometryPrecisionReducer has been changed to use OverlayNG with Snap-Rounding, to provide more effective precision reduction.
GEOS has been tracking the OverlayNG codebase closely for a while now, which has been valuable to finalize the overlay semantics, and for finding and fixing issues. Having the code in JTS master gives the green light for downstream projects to do their own testing as well. There have been a few issues reported:
After years of designing and developing improvements to overlay, it's great to see OverlayNG make its debut. Hopefully this will be the end of issues involving the dreaded TopologyException. And the new design will make it easier to build other kinds of overlay operations, including things like fast line clipping, split by line, coverage overlay... so stay tuned!