Thursday, 21 June 2007

Packed 1-dimensional R-Tree for Geometric Algorithms





Some geometry algorithms depend on searching a fixed set of 1-dimensional intervals for the ones which contain a given value. An example is Point-In-Polygon testing using the stabbing line method. In cases where the algorithm needs to be run multiple times for different input values, the performance can be improved if a suitable data structure is used to increase the efficiency of search.

Here's an idea for an index which allows fast lookup of 1-dimensional intervals. It is basically a 1-dimensional R-tree, with a packing scheme which allows easy creation of the index for a fixed set of intervals. The index has the following charateristics:
  • It indexes 1-dimensional intervals over some ordered set of values (for geometry, usually floating-point numbers)
  • It indexes a static, pre-known set of items. Once built the index cannot be modified.
  • Queries can be by either range or single value (stabbing queries), and return the set of all items which intersect the query value
  • One optional parameter is the bucket size
The tree contains two types of nodes: leaf nodes, which contains the stored intervals (and any additional user-specific information), and interior nodes, which contain an interval and references to two child nodes which lie in that interval.

To build the tree:
  • sort all the intervals to be indexed by their midpoint. These form the leaf nodes of the tree.
  • Create new interior nodes for every adjacent pair of nodes, assigning the new node to have an interval which spans the intervals of its children. If there is only one child available, do not create a new node for it.
  • Recursively repeat the previous step, until a single node is created. This is the root node of the tree.
Searching the tree is simple:
  • traverse the tree in depth-first order, pruning branches which have intervals which do not intersect the query value.
A nice aspect of this data structure is that it is simple to build. It should be possible to implement the structure using two arrays, one for leaf nodes and one for interior nodes. This should provide a pretty efficient and straightforward implementation in C.

The interior node definition can easily be generalized to allow n intervals per node (the bucket size). Of course, there's a trade-off between increasing fan-out and decreasing selectivity. It's not obvious to me where the sweet spot is.

Really this is a 1-dimensional version of Leutenegger & Edgington's STR tree. JTS even contains an implementation of this, using the generalized STRtree classes already in JTS. The novelty here is the exploration of just how simple the implementation of this structure can be. A simple implementation should be faster to build and query, as well - this is a subject for some performance testing.

There are several (lots?) of other data structures for efficiently querying sets of intervals. The segment tree and interval tree are probably the most well-known. Both of these structures are more complex to understand and implement, I believe. In addition, I don't think that they are amenable to an implementation based solely on two simple arrays.

5 comments:

Bill Thorp said...

I *think* I was just talking to Sean Gillies about this very same idea. However, when I explain it, I sounded like a stoner, rather than an engineer.

Bill Thorp said...

Well, actually my idea as a little different... storing geometry permanently as trees. Then it'd have intrinsic tree pruning for fast searching, but also brain-dead same-SRS rasterization into tilesets.

Dr JTS said...

Storing geometry as a tree is an interesting idea. The 1D-R-tree isn't going to do this for you of course, but there are lots of other options. Hanan Samet's bible-of-spatial-indexes I think talks about that a bit.

Seems to me that this would be heavily dependent on use case. There's a tradeoff between complexity of storage and the efficiency gain (as always)

Bill Thorp said...

Hunh. Apparent Hanan Samet et al wrote "Using linear quadtrees to store vector data" when I was six. Maybe its time for some graduate studies. :)

Dr JTS said...

Hard to get really new ideas in this world....

But you could always *implement* something cool - that's much rarer than just writing something up!