Thursday, January 8, 2009

Computing Geometric Similarity

The GEOS team is working on porting the JTS CascadedPolygonUnion algorithm. Reasonably enough they asked whether there were unit tests that they could use to check the correctness of their code. I was a bit dismayed to realize that in fact there weren't any unit tests for this class (although having used the code extensively I'm pretty confident that it's correct.)

So fine, some unit tests are needed. This should be easy, right? Just create or obtain some polygonal datasets, union them using both CascadedPolygonUnion (fast) and Geometry.union (slow), and compare the results.

But there's a catch (in geometric processing there usually is). Due to rounding effects, there's no guarantee that the results of the two algorithms will be vertex-by-vertex identical. Of course, they should be very, very similar... but how to check this?

In geometry processing it's usually much harder to test "similar" than it is to test "exact", and this case is no exception. In fact, there's many different tests that could be used, depending on how you want to define "similar". In the context of unit tests, similarity testing is essentially checking for discrepancies which might be introduced into a theoretically correct result. So the tests to use may depend on the operation(s) that are being performed, since different kinds of algorithms can produce different kinds of discrepancies. A further issue which needs to be considered is to detect gross differences which might result from catastrophic algorithm failure.

So now the task is to write a SimilarityValidator for geometries. For polygons there's various characteristics which can be compared:
  • A simple thing to do is to compare basic quantities like the area and perimeter length of the polygons. Paradoxically, these are good for detecting small discrepancies, but not for detecting gross ones (e.g. two identical rectangles which are offset by a large distance)
  • One obvious thing to do is to compute the symmetricDifference of the polygons and inspect the area of the result. This has a couple of issues. One is what area tolerance to use? A more serious issue is that the symmetric difference operation is itself an approximation to an exact answer. This is especially true in situations where the inputs are very close - which is exactly the situation we are trying to check! So there may be a certain lack of confidence that this is detecting all discrepancies
  • A more robust test is to compute the Hausdorff distance between the rings of the polygons. This allows expressing the tolerance in terms of distance, which is more intuitive. It's also fairly robust and easy to confirm the answer. There's a challenge in developing an efficient Hausdorff algorithm - but that's a topic for another blog post.
  • Other tests might include comparing structural things like comparing the number of rings in the polygons.

Hausdorff Distance between two polygons

With this validator written and some test datasets run I can finally confirm that CascadedPolygonUnion actually works!

1 comment:

Dustin Sampson said...

Hi Martin,

You mentioned in your blog about using area and perimeter length for comparing two polygons for similarity but problems exist with large offsets.

Is it possible to also use centroids and base the comparison on a difference ratio of each characteristic (assuming spatial index is being used)??

Not sure how efficient this is compared to the Hausdorff distance.