The Hausdorff Distance is a useful spatial function which can appear slightly mysterious. Partly this is due to the name. It honours Felix Hausdorff, one of the founding fathers of topology, and a polymath who was creative in music and literature as well as mathematics.
Felix Hausdorff (1868-1942)
But the name conveys nothing about why this function is useful, or how it is different to the more familiar shortest distance. The key difference is: the shortest distance tells you how close things are, but the Hausdorff distance tells you how far apart they are. So a more descriptive name might be "farthest distance" or "maximum distance". With due respect to Dr. Hausdorff, this is one of those historical artifacts of nomenclature that deserves a refresh. (Especially since it's becoming recognized that the core concept was actually first published by the Romanian mathematician Dimitrie Pompeiu. Users of the future will be grateful to be spared invoking the ST_PompeiuHausdorffDistance function.)
Definition
The formal definition of the Hausdorff distance (HD) is
HD(A,B) = max( DHD(A,B), DHD(B,A) )
where DHD is the directed Hausdorff Distance (DHD):
DHD(A,B) = max a ∈ A dist(a,B)
with dist(a,B) being the usual shortest distance between point a and geometry B:
dist(a,B) = min b ∈ B dist(a,b)
The Hausdorff distance is symmetric and is a true distance metric. The directed Hausdorff distance is asymmetric. Both can be useful in different contexts. The directed version is arguably more fundamental. (It's certainly where the bulk of the implementation effort lies.)
Directed Hausdorff Distance is asymmetric
The main application of the Hausdorff distance is in determining how well two datasets match, by providing a measure of their similarity. In spatial applications these are typically geometries such as lines or polygons, but they can also be point clouds or raster images. The Hausdorff distance is much more useful than shortest distance as a similarity measure because it gives information about all the points in a shape, not just a single closest point. While shortest distance puts a bound on how far a single point is from the target, the Hausdorff distance is a bound on every point in the query shape. So in the figure below the two lines have a small shortest distance, but the Hausdorff distance reveals that they are actually far apart at some points.
Hausdorff Distance VS Shortest Distance
The Implementation Challenge
A key difference between shortest distance and Hausdorff distance is that the pair of points defining the shortest distance always includes at least one vertex, whereas the Hausdorff distance can occur at non-vertex points. For lines, the Hausdorff distance can occur anywhere on the edges:
For polygons it can occur on edges or in the interior of the query area:
This makes the Hausdorff distance substantially harder to implement for general 2D geometries. While the shortest distance can be determined simply by evaluating the distance at the finite set of vertices on each geometry, the Hausdorff distance requires a way to evaluate a finite set of points out of the infinite number of non-vertex points.
Perhaps this is why it's so hard to find an implementation of Hausdorff distance for general 2D geometry. (Or is there just no need for a fast accurate general-purpose Hausdorff distance? Surely not...) There's some implementations for point sets, and at least one for the specific case of convex polygons. There's a couple which may support lines (here and here), but in a seemingly crude way. And I haven't found a single one for general polygons. Excellent - it's good to have a challenge!
Discrete Hausdorff Distance
A simple approach is to discretize the input linework by densifying the linework. The Hausdorff distance is then evaluated over the original and added vertices. The JTS Topology Suite class DiscreteHausdorffDistance implements this approach. This algorithm was developed many years ago (2008) for use in the RoadMatcher linear network conflation tool. It worked well enough for that use case, since inputs were typically small and the accuracy was "good enough". But it has some serious problems:
- achieving accuracy requires a high degree of densification of every edge, which means slow performance
- if the Hausdorff distance occurs at a vertex, then densification is not needed, but this is impossible to determine a priori
- the user generally has no idea what level of densification (if any) is required to determine a result of required accuracy (this is particularly problematic in automated batch processing, where geometries may require varying amounts of densification)
- the use of a densification factor rather than a maximum segment length was a mistake. It is hard to determine the factor needed for a desired distance accuracy, and it causes over-densification of short edges
- it is very slow when the inputs are equal or very similar (as shown in this issue)
- polygonal inputs are not supported
- the internal shortest distance computation is inefficient, since it does not use an indexed algorithm
Some of these flaws could be fixed. For instance, shortest distance computation can be improved by using IndexedFacetDistance (which was not available at the time of development). And densification could be controlled by a maximum segment length instead of a factor. But addressing all these issues requires a fundamental rethinking of the algorithm.
Given the wide deployment of JTS and its C++ port GEOS, any improvement stands to benefit a huge number of users. And after 18 years it's high time this clunky old code was replaced. So I'm happy to announce that I'm working on an entirely new implementation for Hausdorff distance which solves all the issues above. Expect a blog post soon!




