Wednesday, February 6, 2013

JTS Union VS ArcGIS Dissolve

Ragnvald Larsen has an interesting post on ways to mitigate the poor performance and stability of Dissolve computations in ArcGIS.  Dissolve is the Arc term for the geometric union of a collection of polygons (possibly grouped by attribute, although that capability was not used in this case).

Ragnvald's dataset consisted of a 15 MB shapefile containing about 7000 overlapping polygons.  Here's what the data looks like:

He found that using the ArcGIS Dissolve method took about 150 sec to process the dataset.  In an effort to reduce this time, he experimented with partitioning the dataset and doing the union in batches.  After a (presumably lengthy) series of experiments to find the optimal batch size, he was able to get the time down to 25 sec using a batch size of 110 features.

Improving union performance by partitioning the input is the basic idea behind the Cascaded Union function in JTS (which I blogged about back in 2007).  Cascaded Union uses a spatial index to automatically optimize the partitioning.  Ragnvald doesn't mention whether he used a spatial index, but I suspect this might be quite time-consuming to code in ArcPy.
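
To make the idea concrete, here is a toy sketch in Python.  It is not the JTS implementation: the "polygons" are just sets of unit grid cells, union is plain set union, and a simple sort stands in for the spatial index.  The names rect_cells and cascaded_union are made up for this illustration.  What it does show is the shape of the computation: neighbouring geometries are merged first, pairwise and level by level, instead of accumulating everything into one ever-growing result.

```python
def rect_cells(x0, y0, x1, y1):
    """Toy 'polygon': the set of unit grid cells covered by a rectangle."""
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


def cascaded_union(geoms):
    """Union toy geometries pairwise, level by level (a binary tree of unions)."""
    if not geoms:
        return frozenset()
    # Crude stand-in for a spatial index: order geometries by their
    # lowest cell so that neighbours in the list tend to be close in space.
    level = sorted(geoms, key=lambda g: min(g, default=(0, 0)))
    while len(level) > 1:
        # Merge adjacent pairs; each pass halves the number of geometries.
        merged = [level[i] | level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])  # odd one out is carried to the next level
        level = merged
    return level[0]


# Ten overlapping 2x2 squares, each shifted one cell to the right.
rects = [rect_cells(i, 0, i + 2, 2) for i in range(10)]
merged = cascaded_union(rects)
```

With real polygons the benefit comes from keeping intermediate results small and local; with sets of cells the answer is of course the same whichever order is used, which makes it easy to check the sketch against a naive union.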

I thought it would be interesting to compare the performance of the JTS algorithm to the ArcGIS one.  To do this I used JEQL, which provides an easy high-level way to read the data and invoke the JTS Cascaded Union.  The entire process can be expressed as a very simple JEQL script:

ShapefileReader t file: "agder/agder_buffer.shp";
t = select geomUnionMem(GEOMETRY) g from t;
ShapefileWriter t file: "result.shp";

geomUnionMem is a JEQL spatial aggregate function which is implemented using the JTS Cascaded Union algorithm.  (Although not needed in this case, note that the more general Dissolve use case of unioning groups of features by their attributes can easily be achieved by using the standard SQL GROUP BY clause.)
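
For readers who don't think in SQL, the grouped Dissolve case amounts to "bucket the features by attribute value, then union each bucket".  Here is a minimal Python sketch of that logic, again using sets of grid cells as stand-in geometries.  The function dissolve_by and the "kind" attribute are hypothetical and have nothing to do with the agder dataset.

```python
from collections import defaultdict


def dissolve_by(features, field):
    """Union toy geometries (sets of grid cells) per attribute value."""
    groups = defaultdict(set)
    for attrs, cells in features:
        # All features sharing the same attribute value are merged together.
        groups[attrs[field]] |= cells
    return {value: frozenset(cells) for value, cells in groups.items()}


# Hypothetical features: (attributes, geometry) pairs.
features = [
    ({"kind": "lake"}, {(0, 0), (1, 0)}),
    ({"kind": "forest"}, {(5, 5)}),
    ({"kind": "lake"}, {(1, 0), (2, 0)}),
]
result = dissolve_by(features, "kind")
```

The result holds one merged geometry per distinct attribute value, which is what a GROUP BY around a union aggregate produces.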

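Running this on a (late-model) PC workstation produced a timing of about 1.5 sec!  That is roughly 100 times faster than the 150 sec of the original ArcGIS Dissolve run, and well under the 25 sec of the tuned batch approach (although the hardware is not the same, so the comparison is only approximate).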

Here's the output union:


mentaer said...

mhm.. well, OpenJUMP's union took 6 secs on my 2006 MacBook Pro. But I think we are not using the simple JTS union(), but maybe also cascaded union or something else... depends if we have attributes etc.
However, it's not 100 times faster, as with JEQL, but still much faster than ArcGIS/ArcPy :)

SigTill said...

Looks like there are more people having this issue.