Showing posts with label ogc simple features. Show all posts
Showing posts with label ogc simple features. Show all posts

Wednesday, August 8, 2007

PreparedGeometry - efficient batch spatial operations for JTS

A common usage pattern for JTS spatial operations (in particular, predicates such as intersects and contains, and overlay operations such as intersection and union) involves the operation being invoked between a single target geometry and a batch of test geometries. For instance, this occurs in queries in a spatial database such as PostGIS, where each geometry in the driving table interacts with a set of nearby geometries via the required operation. By taking advantage of the fact that the target geometry does not change, it is possible to significantly reduce the overall execution cost of the operation. Information such as an index on the line segments in the target geometry can be computed once and cached for reuse with all of the test geometries.

In order for this to work, the calling code must be able to indicate that the operation is being invoked in a batch setting. The standard JTS Geometry API doesn't allow differentiating this case, so the new concept of PreparedGeometry has been developed. (This name intentionally echoes the concept of prepared statement in SQL API's, which implements the same strategy.) Creating a PreparedGeometry on a target Geometry enables precomputation of various data structures, which then allows subsequent operations to be evaluated with maximum efficiency.

In addition to caching, it is also possible to optimize the evaluation of spatial predicates. Optimization takes the form of short-circuiting predicate evaluation when certain situations are detected. While not all OGC spatial predicates are amenable to optimization, fortunately the most commonly usd predicates, intersects and contains, can be optimized significantly. Examples of situations which can be optimized are:
  • For intersects, any intersection between line segments indicates a true result. If there is no intersection between any line segments of the operands, and the target geometry is polygonal, then one or more fast point-in-polygon tests can be performed to compute the result value.
  • Since it provides more information than intersects, the contains predicate is more complex to optimize. In the case where the target geometry is polygonal, evaluation can be short-circuited when the test geometry has no line segment intersection. In this case, containment can be tested by an appropriate set of point-in-polygon tests. It is also possible to short-circuit in some specific cases where a proper intersection is detected between two line segments. Both of these situations can be tested efficiently.
The performance gains from using PreparedGeometry in real-world situations can be dramatic - up to 40 times improvement in speed. This functionality will appear in the next release of JTS, and hopefully will make its way into GEOS and PostGIS as well.

Saturday, June 9, 2007

Quirks of the "Contains" Spatial Predicate

One of the most useful OGC standards is their specification of the Dimensionally-Extended 9 Intersection Model for spatial relationships. (To give credit where is it due, this was originally developed by Egenhofer, Clementini and others). In its most general form this model is quite complex, so to make it usable for mortal programmers, a set of commonly-useful "named predicates" have been specified. These include obvious relationships such as intersects, disjoint, touches, equals, and contains.

But are they really so obvious? In fact not - several of them have subtle aspects to their definition which are contrary to intuition.

In particular, "contains" (and its converse "within") has an aspect of its definition which may produce unexpected behaviour. This quirk can be expressed as "Polygons do not contain their boundary". More precisely, the definition of contains is:

Geometry A contains Geometry B iff no points of B lie in the exterior of A, and at least one point of the interior of B lies in the interior of A

That last clause causes the trap - because of it, a LineString which is completely contained in the boundary of a Polygon is not considered to be contained in that Polygon!

This behaviour could easily trip up someone who is simply trying to find all LineStrings which have no points outside a given Polygon. In fact, this is probably the most common usage of contains. For this reason it's useful to define another predicate called covers, which has the intuitively expected semantics:

Geometry A covers Geometry B iff no points of B lie in the exterior of A

Its converse coveredBy is also useful as well. It's a bit of a mystery why OGC did not define this predicate. In any case, Oracle Spatial does provide this predicate. I have added it to JTS as well.

There's a further bonus to defining the covers predicate. It is much easier to optimize the common use case:

covers(Rectangle, Geometry)

All that is necessary to determine this condition is to perform a simple bounding box comparison. This is not possible with contains, because even if the bounding box of Geometry is covered by the Rectangle, a further expensive operation is required to test if the Geometry lies wholly in the boundary of the Rectangle (in which case the predicate fails).

Covers "simplifies" the defintion of contains by making it more general (inclusive). There's another possibility as well, which is to simplify in the direction of being less inclusive. This would produce a predicate which might be called containsProperly, with the definition:

Geometry A containsProperly Geometry B iff all points of B lie in the interior of A

Interestingly, some recent work I'm doing indicates that this predicate might play a very useful role - but that's a story for another post.


References

DE-9IM: Clementini, Eliseo, Di Felice, and van Osstrom; "A Small Set of Formal Topological Relationships Suitable for End-User Interaction," D. Abel and B.C. Ooi (Ed.), Advances in Spatial Database—Third International Symposium. SSD '93. LNCS 692. Pp. 277-295.

9 Intersection model: M. J. Egenhofer and J. Herring; "Categorizing binary topological relationships between regions, lines, and points in geographic databases," Tech. Report, Department of Surveying Engineering, University of Maine, Orono, ME 1991.