|Maximum Daily Temperatures for November 30, 2010|
There are many, many different ways of interpolating surfaces - Wikipedia has a long list. It turns out that a classic approach for interpolation of meteorological data is Barnes Interpolation (also called Barnes Analysis). Barnes Interpolation is a surface estimating technique which uses an initial pass to provide a smoothed estimate of the surface, and optionally further refinement passes to improve the surface to more closely match the observation values. The initial pass produces an averaged (smoothed) surface, using a summation of exponential (Gaussian) decay functions around each observation point. Subsequent refinement passes compute an error surface using the delta between the previous estimated surface and the observations. The error surface is added to the previous estimate surface to refine the estimate (by reducing the delta to the observations).
I'm more of a vector guy than a raster guy, but I figured that Google and Wikipedia would make short work of this - after all, how hard can raster be? Unfortunately, the Wikipedia article is not very helpful - the description is unclear, and it doesn't even define all the symbols used in the presented equations. I found some better references here and here, but they still contained some confusing discrepancies in the equations. Barnes' original paper is online as well, but as is often the case with primary sources, it is more concerned with the physical properties of the analysis, and doesn't have a very crisp description of the actual algorithm.
Triangulating between the various sources, I eventually arrived at the following algorithm definition (provided herewith without warranty!):
|Barnes Interpolation Algorithm |
For the first pass, the estimated value Eg at each grid point gx,y is:
Eg = ∑( wi * oi) / ∑( wi )
wi = exp(- di2 / L2C )
During refinement passes the estimate at each grid point is re-computed using:
E'g = Eg + ∑( wi * (oi - Ei) ) / ∑( wi )
The target development platform is Java (of course). For prototyping purposes I used JEQL to read sample datasets, run the algorithm with various parameter choices, and produce plots to visualize the results. Here's the surface plot for the dataset above:
This looks pretty reasonable, and provides a nice visualization of the large-scale data trend. (Surprise! It's hotter down south...) The ultimate goal is much more exciting. The algorithm will be provided as a Rendering Transformation in GeoServer, allowing surfaces to be generated dynamically from any supported datastore. This will open the way to providing all kinds of further surface visualizations, including other interpolation techniques - as well as everyone's favourite, Heat Maps (aka Multivariate Kernel Density Estimation).
Now, the pure mathematical description of the algorithm is sufficient for a proof-of-concept implementation producing single images. But to make it usable in a dynamic way in a high-demand environment, there are more details which need to be worked out to improve performance and functionality. For instance:
- How large an extent of data points is necessary to produce a stable surface for a given area? This is important for dynamic presentation, since zooming, panning and tiling should not change the appearance of the surface. Of course, all data points could always be used, but more selectivity improves performance.
- How far should the surface be extrapolated into areas of few or no data observations? Apart from the validity implications, limiting the extent of the computed surface provides a boost in performance.
- A further performance increase may be gained by computing the surface on a coarse grid, then up-sampling in an appropriate way to the display resolution.
- For data which is global in scope, how does the algorithm need to be changed to accommodate computation on the spheroid? It seems likely that it's necessary to use a geodetic distance calculation, and there may be other impacts as well.