Tuesday 21 October 2008

Nostalgic Trivia

When I was but a wee nerdling, I took a course taught by a grizzled veteran of the computer industry. I can no longer remember the subject matter of the course, but I do remember that at one point he referred to the main players in the computer business as "IBM and D'BUNCH". D'BUNCH were DEC, Burroughs, Univac, NCR, Control Data, and Honeywell. (And yes, I had to resort to Wikipedia to remember all these names.)

The moral of the story? Computer manufacturers come and go, but IBM remaineth eternal, apparently.

Dating myself even more, in the early part of my career I used machines made by the first 3 of these. The Univac had the distinction of having the most obtuse, unwieldy, difficult-to-use OS I have ever encountered. DEC, in contrast, had the best OS (of proprietary ones, that is - *nix blows 'em all away).

Monday 20 October 2008

Untangling REST

Thanks to Sean I just learned about Roy Fielding's blog.

I don't know why it never occurred to me that Roy Fielding would have a blog, and it would likely to be a good source of commentary on the evolving philosophy of the Web, but it didn't. He does, and sure enough it's chock full o' goodness.

Highlights include this post about how to efficiently build a RESTful Internet-scale event notification system for querying Flickr photo update events - using images as bitmap indexes of event timeslices! Or the post which Sean noted, along with this one - both cri-de-coeurs about the pain of seing the clean concept of REST muddied to the point of meaningless.

Anyone confused about REST would do well to avoid inhaling too much smoke and get a dose of fresh air from the source...

Tuesday 7 October 2008

Improvements to JTS buffering

By far the most difficult code in JTS to develop has been the buffer algorithm. It took a lot of hard graft of thinking, coding and testing to achieve a solid level of robustness and functionality. There have been a couple of iterations of improvements since the first version shipped, but the main features of the algorithm have been pretty stable.

However, it was always clear that the performance and memory-usage characteristics left something to be desired. This is particularly evident in cases which involve large buffer distances and/or complex geometry. These shortcomings are even more apparent in GEOS, which is less efficient at computation involving large amounts of memory allocation.

I'm pleased to say that after a few years of gestating ideas (really! - at least on and off) about how to improve the buffer algorithm, I've finally been able to implement some enhancements which address these problems. In fact, they provide dramatically better performance in the above situations. As an example, here are timing comparisons between JTS 1.9 and the new code (the input data is 50 polygons for African countries, in lat-long):

Distance JTS 1.9 JTS 1.10

0.01 359 ms 406 ms
0.1 1094 ms 594 ms
1.0 16.453 s 2484 ms
10.0 217.578 s 3656 ms
100.0 728.297 s 250 ms
1000.0 1661.109 s 313 ms

Another tester reports that a buffering task which took 83 sec with JTS 1.9 now takes 2 sec with the new code.

But wait, there's more! In addition to the much better performance of the new algorithm, the timings reveal a further benefit - once the buffer distance gets over a certain size (relative to the input), the execution time actually gets faster. (In fact, this is as it should be - as buffer distances get very large, the shape of the input geometry has less and less effect on the shape of the buffer curve.)
The algorithm improvements which have made such a difference are:
  • Improved offset curve geometry - to avoid some nasty issues arising from arc discretization, the original buffer code used some fairly conservative heuristics. These have been fine-tuned to produce a curve which allows more efficent computation, while still maintaining fidelity of the buffer result
  • Simplification of input - for large buffer distances, small concavities in the input geometry don't affect the resulting buffer to a significant degree. Removing these in a way which preserves buffer distance accuracy (within tolerance) gives a big improvement in performance.
A nice side effect of this work is the development of a solid methodology for validating buffers, and a thorough test suite for correctness, robustness, and performance.

This code will appear in JTS 1.10 Real Soon Now.