In a previous post I described how the JTS Topology Suite operation IsSimpleOp has been completely rewritten to reduce code dependencies, improve performance, and provide a simpler, more understandable implementation. The post points out that the IsValidOp implementation would benefit from the same treatment. This work has now been carried out, with similar benefits achieved.
The original IsValidOp code used the GeometryGraph framework to represent the topology of geometries. This code reuse reduced development effort, but it carried some serious drawbacks:
- The GeometryGraph structure was used for many JTS operations, including overlay, buffer, and spatial predicate evaluation. This made the code complex, hard to understand, and difficult to maintain and enhance
- The GeometryGraph structure computes the full topology graph of the input geometry, in order to support constructive operations such as overlay and buffer. This makes it slower and more subject to robustness problems. Non-constructive operations such as IsValidOp and IsSimpleOp can compute topology "on-the-fly", which allows short-circuiting processing as soon as an invalidity is found.
- Coordinates must contain valid numbers
- LineStrings must not be zero-length
- LinearRings must be simple (have no self-intersections)
- Rings must be simple (have no self-intersections)
- Rings must not cross, and can touch only at discrete points
- Holes must lie inside their shell
- Holes must not lie inside another hole of their polygon
- The interiors of polygons must be connected
- In a MultiPolygon, the element interiors must not intersect (so polygons cannot overlap)
- In a MultiPolygon, the element boundaries may intersect only at a finite number of points
Validity Algorithm for Polygonal Geometry
- It supports short-circuiting when an invalid situation is found, to improve performance
- Spatial indexing is used in more places
- The code is simpler and more modular. This makes it easier to understand, and easier to adapt to new requirements (e.g. such as the ability to validate self-touching rings described below)
1. Check Ring Intersections
The first check determines if the rings in the geometry have any invalid intersections. This includes the cases of a ring crossing itself or another ring, a ring intersecting itself or another ring in a line segment (a collinear intersection), and a ring self-touching (which is invalid in the OGC polygon model). This check can use the robust and performant machinery in the JTS noding package. If an invalid intersection is found, the algorithm can return that result immediately.
This check is an essential precursor to all the remaining checks. If a geometry has no invalid intersections this confirms that its rings do not self-cross or partially overlap. This means that the results of orientation and point-in-polygon computations are reliable. It also indicates that the rings are properly nested (although it remains to be determined if the nesting of shells and holes is valid).
Intersection search is the most computationally expensive phase of the algorithm. Because of this, information about the locations where rings touch is saved for use in the final phase of checking connected interiors.
2. Check Shell and Hole nesting
- A hole must lie inside its shell
- A hole may not lie inside another hole.
- In MultiPolygons, a shell may not lie inside another shell, unless the inner shell lies in a hole of the outer one.
3. Check Connected Interior
The final check is that the interior of each polygon is connected. In a polygon with no self-touching rings, there are two ways that interior disconnection can happen:
- a chain of one or more touching holes touches the shell at both ends, thus splitting the polygon in two
- a chain of of two or more touching holes forms a loop, thus enclosing a portion of the polygon interior
Validity with Self-touching Rings
As expected, the new codebase provides better performance, due to simpler code and the ability to short-circuit when an invalidity is found. Here's some performance comparisons for various datasets:
|Data||Size||New (ms)||Old (ms)||Improvement|
|world||244 Polygons, 366K vertices||340||1250||3.7 x|
|invalid-polys||640 Polygons, 455K vertices||126||334||2.6 x|
|valid-polys||640 Polygons, 455K vertices||244||487||2 x|
|australia||1 MultiPolygon, 1,222K vertices||1321||69026||52 x|
- world is the standard dataset of world country outlines
- invalid-polys is a dataset of all invalid polygons. valid-polys is the same dataset processed by GeometryFixer. The timings show the effect of short-circuiting invalidity to improve performance.
- australia is a geometry of the Australian coastline and near-shore islands, with 6,697 polygon elements. The dramatic difference in performance reflects the addition of a spatial index for checking nested shells.
The improved IsValidOp will appear in JTS version 1.19. It has already been ported to GEOS, providing similar performance improvements. And since GEOS backs the PostGIS ST_IsValid function, that will become faster as well.
There is one remaining use of GeometryGraph in JTS for a non-constructive operation: the RelateOp class, used to compute all topological spatial predicates. Converting it should provide the same benefits of simpler code and improved performance. It should also improve robustness, and allow more lenient handling of invalid inputs. Watch this space!