A humble workhorse of geospatial processing is the ability to compute a point which is guaranteed to lie in the interior of a polygon. In the OGC
Simple Features for SQL specification (and hence in
PostGIS) this is known as
ST_PointOnSurface. In JTS/GEOS it is called
getInteriorPoint, for historical reasons [1].
Interior points for country boundaries
There are some important use cases for this capability:
- Constructing a "proxy point" for a polygon to use in drill-down spatial query. This has all kinds of applications:
- Cartographic rendering including:
- Creating leader lines
- Placing labels for polygons (for which it is a quick solution but not necessarily a quality one. More on this in a later post)
There is a
variety of ways that have been proposed to compute interior points: triangulation, sampling via random or grid points, medial axis transform, etc. These all involve trade-offs between location quality and performance [2]. JTS uses an approach which optimizes performance, by using a simple scan-line algorithm:
JTS Scan-Line Interior Point Algorithm
- Determine a Y-ordinate which is distinct to every polygon vertex Y-ordinate and close to the centre of the vertical extent
- Draw a scan line across the polygon and determine the segments of intersection
- Choose the interior point as the midpoint of the widest intersection segment
Locating a polygon interior point along a scan-line
The current code has been in the JTS codebase since the release of the very first version back in 2001. It is elegantly simple, but is quite non-optimal, since it uses the overlay
intersection algorithm. This is overkill for the computation of the scan-line intersection segments. It also has a couple of serious drawbacks: slow performance, and the requirement that the input polygon be valid. These are not just theoretical concerns. They have been
noticed in the user community, and have caused client projects to have to resort to awkward
workarounds. It's even documented as a
known limitation in PostGIS.
Thanks to
Crunchy Data recognizing the
importance of geospatial, we've been able to look into fixing this. It turns out that a relatively simple change makes a big improvement. The scan-line intersections can be computed via a linear-time scan of the polygon edges. This works even for invalid input (except for a few pathological situations).
Interior points of invalid polygons
(LH invalid polygon shows suboptimal point placement)
Best of all, it's much faster - providing performance comparable to the (less useful) centroid computation. The performance results speak for themselves:
Dataset |
# polys |
# points |
Time |
Prev
time |
Improvement |
World countries |
244
|
366,951 |
25 ms |
686 ms |
x 27
|
Land Cover |
64,090 |
366,951 |
78 ms |
6.35 s |
x 81
|
This has been
committed to JTS. It will be ported to
GEOS soon, and from there should show up in PostGIS (and other downstream projects like
QGIS,
Shapely,
GDAL, etc etc).
More Ideas
Some further improvements that could be investigated:
- Use the centroid to provide the Y-ordinate. This is probably better in some situations, and worse in others. But perhaps there's a fast way to choose the best one?
- Use multiple scan-lines (both vertical and horizontal)
- Provide better handling of short/zero-width scan-line intersections
- Support clipping the interior point to a rectangle. This would provide better results for cartographic labelling
[1] JTS was based on the original OGC
SFS specification (version 1.1). The spec actually does include a method
Surface.pointOnSurface. The reason for the different choice of name is lost to time, but possibly the reasoning was that the JTS function is more general, since it handles all types of geometry. (One of the design principles of JTS is Geometric Uniformity, AKA No Surprises. Wherever possible spatial operations are generalized to apply to all geometric types. There is almost always a sensible generalization that can be defined, and it's often quite useful.)
[1a] Also, the acronym IPA is much better than the one for PointOnSurface.
[2] Apparently Oracle has decided to provide blazingly fast performance by simply returning
a point on the boundary of the polygon. Thus proving the maxim that for every problem there is an solution which is simple, fast, and dead wrong.