Thursday 13 September 2007

Geodetic data in PostGIS - Spherical indexing schemes

Here at Refractions we're starting to think about how we can provide better support in PostGIS for geodetic data. Geodetic data is data which is defined in a true spherical coordinate system (in particular, of course, the surface of the Earth).

Currently PostGIS provides some geodetic-aware functions (such as distance between two geodetic points). But it's current spatial data model is fundamentally planar, so there are definite limitations to modelling geodetic data (e.g. such as the notorious "line crossing the Date Line" problem). As PostGIS gets used for larger datasets and more ambitious projects, the utility of having a full-function geodetic data model is becoming increasingly obvious.

Handling geodetic data in a correct and efficient way presents quite a few challenges. A major one is: how can geodetic geometry be spatially indexed? Conventional spatial indexes (such as 2D R-trees) all rely on geometry being embedded in a planar space. They don't handle data which can "wrap around", as can occur in a spherical space.

There have been various clever proposals for spherical indexing strategies. Some prominent ones are listed below:
  • Hierarchical Triangular Mesh - this is essentially a "quad-tree for the sphere". It has a lot of appeal for use with point data, since it provides a hierarchical key which can be indexed using a conventional B-tree index. (It was co-developed by the late, great Jim Gray in order to index astronomical data). The mathematics to determine the index key for a non-point object would seem to be somewhat complicated. It also seems like HTM would suffer from the usual disadvantage of quadtrees of not being very self-tuning. Another disadvantage from the PostGIS point of view is that this would likely be a brand new index type (i.e. lots of difficult code to write)
  • Hipparchus Voronoi-based index. This index can be thought of as a fixed-grid index using a custom Voronoi cell coverage for the globe. IBM's DB/2 Geodetic extension uses this scheme. I must say that this concept, while ingenious, seems a bit baroque to me. This index has the usual disadvantage of fixed-grid indexes of not handling widely-varying data sizes well. And it also requires extremely complex cell coverage structures, which have to be selected specifically for the expected data composition. DB/2 supplies 13 different ones based on various data densities (G7 industrial output, anyone?). I'm not sure what you are supposed to do if your data has some other density distribution - it doesn't seem very feasible to make your own Voronoi grid. And what if you don't know your data distribution, or it changes over time?

  • 3D Bounding Box - this is the approach used by the pgSphere project. It's pretty straightforward. The key concept is to embed the sphere in 3-space, so that it is possible to compute 3D bounding boxes for geometries on the embedded sphere. The bounding boxes can then be indexed using a 3D R-tree (exactly analogous to a 2D R-tree spatial index). The GIST index supplied with PostgreSQL can be customized to provide the required 3D R-tree. One possible issue is that apparently R-trees become "less effective" in higher-dimensional spaces. It remains to be seen whether this is truly a serious problem.
Out of these options, it seems like the 3D Bounding Box approach is the most straightforward scheme. There's some challenges in developing the mathematics required (at least, for a planar/linear guy like me), but I'm hopeful that we can deal with these issues and arrive at a efficient, maintainable solution.

8 comments:

Ashton said...

It'd be great to see increased spherical functionality in postGIS - or in any GIS that I'm aware of! We continue to employ 2-d Cartesian systems, with their associated distortions, even though spherical digital representations appear quite tractable.

A correction: Jim Gray certainly did not develop the hierarchical triangular mesh in the past few years; nor was it initially designed for astronomical indexing. Articles on this structure, referring to it as the quaternary triangular mesh (QTM) have appeared over at least the past 20 years. Geographers such as Geoffrey Dutton and Mike Goodchild have done much work on the fundamental structure, algorithms and indexing schemes, and on applications to global grids.

This literature might be valuable if you decide to pursue the indexing scheme further.

Dr JTS said...

Thanks for the elaboration on the origins of spherical meshes. The paper by Gray et al does refer to various other version of the scheme (so the antecedents are given credit, at least!). I'm not sure exactly what the original content of their paper was - the HTM name, perhaps?

Actually, the real point of mentioning Jim Gray was to make a small posthumous tribute to his work.

I'll definitely check out the other literature - although I still feel that the R-Tree approach will be more productive.

Dr JTS said...

Ashton,

Are you aware that all of the major spatial databases support geodetic data already? IBM uses a Voronoi index as I mentioned; I'm not sure what Oracle uses; and I suspect that the forthcoming MS SqlServer 2008 will use HTM.

That leaves the issue of how well GIS clients handle the geodetic data provided by the database. That I'm less clear on - I suspect that support is somewhat limited.

Ashton said...

I don't follow database developments too closely, so it's great to hear this. And hopefully it'll be adopted by users with world-wide applications. Many apps - or at least their representations - seem to be stuck in Cartesian space. For example, check out the following NASA map generation site of global temperature anomalies.

http://data.giss.nasa.gov/gistemp/maps/

Assuming the resulting map goes to 90N and 90S, the upper and lower edges of the map should all have the same value (as these 'edges' are really points). They don't, implying that the interpolation routine used to generate the map is not operating on a spherical structure.

Thanks again for an interesting read - progress in spatial data structures is terrific to see!

mentaer said...

As Ashton i would also refer to Geoff Duttons work (and book: Springer): see http://www.spatial-effects.com/SE-Home1.html and you may even ask him for support ;)

but i am not sure how one can handle geocentric cartesic 3d corrdinates with it (that are not on the sphere)

Unknown said...

You missed one other outlier tiling scheme - http://www.pyxisinnovation.com/pyxwiki/index.php?title=How_PYXIS_Works

Oleg Bartunov said...

We developed Q3C sphere indexing scheme and implement it for PostgreSQL. See
http://www.sai.msu.su/~megera/wiki/SkyPixelization
for details. We used it in our Virtual Observatory project with great success.

Dr JTS said...

Oleg,

Yes, I saw this project. The SphereCube concept does seem like the simplest quad-tree variant for spherical data.

Recently I had a conversation with Hanan Samet in which he recommended this approach as well.

My one concern is that this will be more complex to use to index non-point geometries. Did you handle this use case in your project?