Wednesday, 8 August 2007

PreparedGeometry - efficient batch spatial operations for JTS

A common usage pattern for JTS spatial operations (in particular, predicates such as intersects and contains, and overlay operations such as intersection and union) involves the operation being invoked between a single target geometry and a batch of test geometries. For instance, this occurs in queries in a spatial database such as PostGIS, where each geometry in the driving table interacts with a set of nearby geometries via the required operation. By taking advantage of the fact that the target geometry does not change, it is possible to significantly reduce the overall execution cost of the operation. Information such as an index on the line segments in the target geometry can be computed once and cached for reuse with all of the test geometries.

In order for this to work, the calling code must be able to indicate that the operation is being invoked in a batch setting. The standard JTS Geometry API doesn't allow differentiating this case, so the new concept of PreparedGeometry has been developed. (This name intentionally echoes the concept of prepared statement in SQL API's, which implements the same strategy.) Creating a PreparedGeometry on a target Geometry enables precomputation of various data structures, which then allows subsequent operations to be evaluated with maximum efficiency.

In addition to caching, it is also possible to optimize the evaluation of spatial predicates. Optimization takes the form of short-circuiting predicate evaluation when certain situations are detected. While not all OGC spatial predicates are amenable to optimization, fortunately the most commonly usd predicates, intersects and contains, can be optimized significantly. Examples of situations which can be optimized are:
  • For intersects, any intersection between line segments indicates a true result. If there is no intersection between any line segments of the operands, and the target geometry is polygonal, then one or more fast point-in-polygon tests can be performed to compute the result value.
  • Since it provides more information than intersects, the contains predicate is more complex to optimize. In the case where the target geometry is polygonal, evaluation can be short-circuited when the test geometry has no line segment intersection. In this case, containment can be tested by an appropriate set of point-in-polygon tests. It is also possible to short-circuit in some specific cases where a proper intersection is detected between two line segments. Both of these situations can be tested efficiently.
The performance gains from using PreparedGeometry in real-world situations can be dramatic - up to 40 times improvement in speed. This functionality will appear in the next release of JTS, and hopefully will make its way into GEOS and PostGIS as well.

4 comments:

Andrea Aime said...

Yeaaahh, finally bulk spatial operation at good speed. Hurray for Martin! :)
Can't wait to put my hands on it... time for GeoTools to have spatial operations too!
Oh boy, I'm so happy! :)

D_Guidi said...

Oh my god this seems to be an amazing feature for JTS :)

. said...

I like the concept... and indeed I'm looking forward the implementation, but I think the name is misleading. IMHO (and without fully knowing the details and scope of the feature), PreparedOperation would a better name. See, every time you explain what a "PreparedGeometry" is, the word "operation" is used anyway. Following your SQL API analogy, PreparedStatement comes to be a better name than PreparedRow :)

Dr JTS said...

Ricardo, that's a good point that SQL statements correspond more to operations than geometry. However, the implementation of PreparedGeometry really adds information to the Geometry, not the individual operations. And a single PreparedGeometry supports several operations, so I definitely need a class that contains the prepared data and hosts the operation methods.

Hopefully this will make sense when you see the implementation.