Friday, 28 November 2008

KML Pie Charts in JEQL

Pie Charts are a nice way of displaying thematic visualization in Google Earth. They're also a good test of the chops of a KML generator. So naturally I was keen to see how to produce pie charts with JEQL.

For simplicity I decided to make my test statistic an orthographic comparison of the names of countries, showing the relative length of the country names and their vowel/consonant distribution. (This wasn't because this is a particularly interesting statistic, but it uses easily available data and exercises some of the data processing capabilities of JEQL).
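As a rough illustration of the statistic itself (in Python rather than JEQL, so this is a stand-in and not the actual script), the per-name counts can be computed with two regular expressions. The function name `name_stats` is hypothetical, and the sketch assumes ASCII letters only, with "y" counted as a consonant.

```python
import re

# Assumption: only unaccented ASCII letters are counted; 'y' is a consonant.
VOWELS = re.compile(r"[aeiou]", re.IGNORECASE)
LETTERS = re.compile(r"[a-z]", re.IGNORECASE)


def name_stats(name):
    """Return (letters, vowels, consonants) for a country name."""
    letters = len(LETTERS.findall(name))
    vowels = len(VOWELS.findall(name))
    return letters, vowels, letters - vowels


for country in ("Canada", "New Zealand", "Chad"):
    print(country, name_stats(country))
```

The letter count would drive the size of each pie, and the vowel/consonant split the size of its two slices.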

The solution ended up using lots of existing capabilities, such as splitting multigeometries, regular expressions, JTS functions such as interior point, boundary and distance, and of course generating KML with extrusions and styling. The only new function I had to add was one to generate elliptical arc polygons - which is a good thing to have.
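For readers who want to see the geometry, here is a minimal Python sketch of building pie wedges as elliptical arc polygons and formatting them as KML coordinate strings. It is not the JEQL function; the names `pie_wedge`, `pie_wedges` and `kml_coordinates` are hypothetical. Using separate x and y radii is my assumption about why the arcs are elliptical: in lat-long, a shape that should look round on the ground generally needs different extents in longitude and latitude.

```python
import math


def pie_wedge(cx, cy, rx, ry, start_deg, end_deg, segments=16):
    """Closed ring for one wedge: centre, points along the arc, back to centre."""
    ring = [(cx, cy)]
    for i in range(segments + 1):
        ang = math.radians(start_deg + (end_deg - start_deg) * i / segments)
        ring.append((cx + rx * math.cos(ang), cy + ry * math.sin(ang)))
    ring.append((cx, cy))
    return ring


def pie_wedges(cx, cy, rx, ry, values, segments=16):
    """One wedge per value, each sized in proportion to its share of the total."""
    total = sum(values)
    wedges, start = [], 0.0
    for v in values:
        end = start + 360.0 * v / total
        wedges.append(pie_wedge(cx, cy, rx, ry, start, end, segments))
        start = end
    return wedges


def kml_coordinates(ring, height=0.0):
    """Format a ring as KML coordinates: lon,lat,alt tuples separated by spaces."""
    return " ".join(f"{x:.6f},{y:.6f},{height:.1f}" for x, y in ring)


# Example: a two-slice pie (say 3 vowels, 1 consonant) centred at lon 10, lat 20.
for wedge in pie_wedges(10.0, 20.0, 2.0, 1.0, [3, 1], segments=4):
    print(kml_coordinates(wedge, height=1000.0))
```

Each string would go inside the `<coordinates>` element of a KML polygon's outer ring; a non-zero altitude is what an extruded pie would need.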

The results look pretty snazzy, I think - and would be even better with more meaningful data!


Tuesday, 4 November 2008

Wordlicious

After seeing this post on Wordle, I had to try it....


and there's no stopping at just one...

Tuesday, 21 October 2008

Nostalgic Trivia

When I was but a wee nerdling, I took a course taught by a grizzled veteran of the computer industry. I can no longer remember the subject matter of the course, but I do remember that at one point he referred to the main players in the computer business as "IBM and D'BUNCH". D'BUNCH were DEC, Burroughs, Univac, NCR, Control Data, and Honeywell. (And yes, I had to resort to Wikipedia to remember all these names.)


The moral of the story? Computer manufacturers come and go, but IBM remaineth eternal, apparently.

Dating myself even more, in the early part of my career I used machines made by the first 3 of these. The Univac had the distinction of having the most obtuse, unwieldy, difficult-to-use OS I have ever encountered. DEC, in contrast, had the best OS (of proprietary ones, that is - *nix blows 'em all away).

Monday, 20 October 2008

Untangling REST

Thanks to Sean I just learned about Roy Fielding's blog.



I don't know why it never occurred to me that Roy Fielding would have a blog, and that it would likely be a good source of commentary on the evolving philosophy of the Web, but it didn't. He does, and sure enough it's chock full o' goodness.

Highlights include this post about how to efficiently build a RESTful Internet-scale event notification system for querying Flickr photo update events - using images as bitmap indexes of event timeslices! Or the post which Sean noted, along with this one - both cris de coeur about the pain of seeing the clean concept of REST muddied to the point of meaninglessness.

Anyone confused about REST would do well to avoid inhaling too much smoke and get a dose of fresh air from the source...

Tuesday, 7 October 2008

Improvements to JTS buffering

By far the most difficult code in JTS to develop has been the buffer algorithm. It took a lot of hard graft of thinking, coding and testing to achieve a solid level of robustness and functionality. There have been a couple of iterations of improvements since the first version shipped, but the main features of the algorithm have been pretty stable.

However, it was always clear that the performance and memory-usage characteristics left something to be desired. This is particularly evident in cases which involve large buffer distances and/or complex geometry. These shortcomings are even more apparent in GEOS, which is less efficient at computation involving large amounts of memory allocation.

I'm pleased to say that after a few years of gestating ideas (really! - at least on and off) about how to improve the buffer algorithm, I've finally been able to implement some enhancements which address these problems. In fact, they provide dramatically better performance in the above situations. As an example, here are timing comparisons between JTS 1.9 and the new code (the input data is 50 polygons for African countries, in lat-long):

Buffer Distance    JTS 1.9       JTS 1.10

0.01               359 ms        406 ms
0.1                1094 ms       594 ms
1.0                16.453 s      2484 ms
10.0               217.578 s     3656 ms
100.0              728.297 s     250 ms
1000.0             1661.109 s    313 ms

Another tester reports that a buffering task which took 83 sec with JTS 1.9 now takes 2 sec with the new code.

But wait, there's more! In addition to the much better performance of the new algorithm, the timings reveal a further benefit - once the buffer distance gets over a certain size (relative to the input), the execution time actually gets faster. (In fact, this is as it should be - as buffer distances get very large, the shape of the input geometry has less and less effect on the shape of the buffer curve.)
The algorithm improvements which have made such a difference are:
  • Improved offset curve geometry - to avoid some nasty issues arising from arc discretization, the original buffer code used some fairly conservative heuristics. These have been fine-tuned to produce a curve which allows more efficient computation, while still maintaining fidelity of the buffer result.
  • Simplification of input - for large buffer distances, small concavities in the input geometry don't affect the resulting buffer to a significant degree. Removing these in a way which preserves buffer distance accuracy (within tolerance) gives a big improvement in performance.
A nice side effect of this work is the development of a solid methodology for validating buffers, and a thorough test suite for correctness, robustness, and performance.
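
To make the input simplification idea concrete, here is a minimal Python sketch. It is not the JTS implementation, just one simple way to express "drop small concavities within a tolerance". It assumes a counter-clockwise ring without a repeated closing vertex, and the name `remove_shallow_concavities` and the single-pass strategy are my own choices for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def remove_shallow_concavities(ring, tolerance):
    """Drop concave vertices lying within `tolerance` of the chord joining
    their neighbours. Assumes `ring` is counter-clockwise and not closed.
    Single pass over the original neighbours, so this is only a sketch."""
    n = len(ring)
    kept = []
    for i in range(n):
        prev_pt, cur, next_pt = ring[i - 1], ring[i], ring[(i + 1) % n]
        cross = ((cur[0] - prev_pt[0]) * (next_pt[1] - cur[1])
                 - (cur[1] - prev_pt[1]) * (next_pt[0] - cur[0]))
        is_concave = cross < 0  # right turn in a counter-clockwise ring
        if is_concave and point_segment_distance(cur, prev_pt, next_pt) < tolerance:
            continue
        kept.append(cur)
    return kept


# A square with a shallow dent in its bottom edge.
square = [(0.0, 0.0), (5.0, 0.1), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(remove_shallow_concavities(square, tolerance=0.5))
```

Only concave vertices are removed, so the simplified shape can only grow, and by less than the tolerance at each removed vertex. In practice the tolerance would be chosen as some small fraction of the buffer distance.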

This code will appear in JTS 1.10 Real Soon Now.

Friday, 22 August 2008

Java Power Tools cuts straight and true

Part of my summer holiday reading is the book Java Power Tools, by John Ferguson Smart.


I count a computer book a good buy if I get one new idea from it; two is stellar; and three or more goes on my "Recommend to Colleagues" list. This one is on the list... Ideas I've picked up include:
  • the XMLTask extension for Ant, that provides easy and powerful editing of XML files. This should make configuring things like web.xml and struts-config.xml a lot easier. It even provides a way to uncomment blocks of XML markup.

  • SchemaSpy, which generates database documentation (including ER diagrams!) from JDBC metadata. The tool also comes with profiles for interpreting some vendor-specific metadata. It will be interesting to see how it handles spatial datatypes in Oracle and PostGIS...

  • using Doxygen to generate documentation for Java source. Doxygen provides more capabilities than Javadoc, including UML diagrams and a variety of output document formats.

  • UMLGraph also allows generating UML diagrams from Java source, and embedding them directly in Javadoc.

For graph visualization, SchemaSpy, Doxygen and UMLGraph all use the GraphViz application. This looks like a great tool in its own right. It provides a DSL for specifying graph structures and node and edge symbology, along with a layout and rendering engine which outputs to numerous different formats.
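
To give a flavour of the GraphViz DSL (the DOT language), here is a small Python sketch that writes out a directed graph with node and edge attributes. The helper `to_dot` and the two-table example are hypothetical; only the DOT syntax it emits belongs to GraphViz.

```python
def to_dot(name, edges, node_attrs=None):
    """Build DOT text for a directed graph.

    edges: iterable of (source, target, label) tuples.
    node_attrs: optional {node: {attribute: value}} styling.
    """
    node_attrs = node_attrs or {}
    lines = [f"digraph {name} {{"]
    for node, attrs in node_attrs.items():
        attr_text = ", ".join(f'{k}="{v}"' for k, v in attrs.items())
        lines.append(f'  "{node}" [{attr_text}];')
    for src, dst, label in edges:
        lines.append(f'  "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-table schema: CITY refers to COUNTRY.
print(to_dot(
    "schema",
    [("CITY", "COUNTRY", "country_id")],
    {"CITY": {"shape": "box"}, "COUNTRY": {"shape": "box"}},
))
```

Saved to a file such as schema.dot, the output can be rendered with `dot -Tpng schema.dot -o schema.png`.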

JPT of course covers all the better-known tools such as Ant, Maven, CVS, SVN, JUnit, Bugzilla, Trac, and many others. It doesn't replace the documentation for these tools, but it does give a good comparative overview and enough details to help you decide which ones you're going to strap around your waist for the next project.

Friday, 15 August 2008

Be the most popular tile on your block

You might think that the image below is a map of North America population density. You'd be off by one level of indirection...

In fact it's a heat map of the access frequency for Virtual Earth map tiles.

So really it's not an image of where people are, but where they want to be...

The image comes from a paper out of Microsoft Research: How We Watch the City: Popularity and Online Maps, by Danyel Fisher.

Makes me wonder if there are tiles that have never been accessed. Like perhaps this one? (It took a looong time to render....)


And what is the most-accessed tile? This one, perhaps?

Thursday, 14 August 2008

Revolution is Happening Now

This great graph shows why we're all going to be wrassling with parallelization for the rest of our coding lives:

Source: Challenges and Strategies for High End Computing, Kathy Yelick

Also see this course outline for a sobering/inspiring view of where computation is headed.

Monday, 4 August 2008

Krusty kurmudgeon Knuth kans kores & kommon kode

Andrew Binstock has an interesting interview with Don Knuth.



Professor Knuth makes a few surprising comments, including a low opinion of the current trend towards multicore architectures. Knuth says:
To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers...
But then he goes on to admit:
I haven’t got many bright ideas about what I wish hardware designers would provide instead of multicores...
Which seems to me to make his complaint irrelevant, at best. (Not that I don't sympathize with his frustration about banging our heads against the ceiling of sequential processing speed. And as the sage of combinatorial algorithms he must be more aware than most of us about the difficulties of taking advantage of concurrency.)

Another egregious Knutherly opinion is that he is "biased against the current fashion for reusable code". He prefers what he calls "re-editable code". My thought is that open source gives you both options, and personally I am quite happy to reuse, say, the Java API rather than rewriting it. But I guess when most of your code is developed in your own personal machine architecture (Knuth's MIX) then it's a good thing to enjoy rewriting code!

In the end, however, I have to respect the opinions of a man who has written more lines of code and analyzed more algorithms than most of us have had hot dinners. And I still place him in the upper levels of the pantheon of computer science, whose books all programmers would like to have seen gracing their shelves - even if most of us will never read them!

And yes, this post is mostly an excuse for some korny alliteration - but the interview is still worth a read. (Binstock's blog is worth scanning too - he has some very useful posts on aspects of Java development.)

Monday, 21 July 2008

Quote of the Day

Love him or hate him, you have to admit that John Dvorak gives good copy:

Vista is essentially the old hooker with a bad facelift and too much makeup. She also can't remember her customers

Tuesday, 24 June 2008

Database design tips for massively-scalable apps

Here's an interesting post on design practices for building massively-scalable apps on database infrastructure such as Google BigTable.

The takeaway: this ain't your granpappy's old relational database system, so throw out everything he taught you. Denormalize. Prefer big fluffy things to small granular things. Don't bother with DB constraints - enforce the model in the application. Prefer small frequent updates to large page updates.
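
A toy Python sketch of two of these tips, denormalizing and enforcing the model in the application. Plain dicts stand in for the datastore, and the names `users`, `posts` and `add_post` are hypothetical; none of this is a BigTable API.

```python
# Plain dicts stand in for a key-value datastore with no joins and no constraints.
users = {"u1": {"name": "Ada"}}
posts = {}


def add_post(post_id, author_id, title):
    """Store a post, with the application doing the checking the store won't."""
    # No foreign-key constraint exists, so the application enforces the model.
    if author_id not in users:
        raise ValueError(f"unknown author {author_id!r}")
    # Denormalize: copy the author's name into the post so reads need no join.
    posts[post_id] = {
        "author_id": author_id,
        "author_name": users[author_id]["name"],
        "title": title,
    }
    return posts[post_id]


print(add_post("p1", "u1", "Hello"))
```

The price of the copy is that renaming a user means touching every one of their posts, which is exactly the kind of trade-off that only pays off at very large scale.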

The good news (or bad, depending on how fed up you are with your local DBA) - don't bother with all this unless you intend to scale to millions of users.

Monday, 23 June 2008

GeoSVG, anyone?

The GeoPDF format seems to be gaining traction these days. I have to admit, when I first heard of this technology I had the same reaction as James Fee - "What's it good for"? But I'm coming round... A live map with rich information content and a true geospatial coordinate system - what's not to like?

My excuse for such scepticism is my finely-honed technical bullsh*t reflex, which uses the logic of "If this is such a good, obvious, simple idea then why hasn't it been implemented ages ago?".

To be fair, there have been lots of SVG mapping demos, which fill the same use case and provide equivalent functionality. Sadly that concept hasn't really caught fire, though (perhaps due to the ongoing SVG "always a bridesmaid, never a bride" conundrum).

Of course, an idea this good is really too important to be bottled up in the murky world of proprietary technology. It seems like this area is ripe for an open standard. SVG is the obvious candidate for the spatial content (sorry, GeoJSON). What it needs is some standards around modelling geospatial coordinate systems and encoding layers of features. It seems like this should be quite doable. The goal would be to standardize the document format so that viewers could easily be developed (either stand-alone, as modules of existing viewers, or as browser-hosted apps). Also, the feature data should be easy to extract from the data file, for use in other applications.

Perhaps there's already an initiative like this out there - if so, I'd love to hear about it.

Monday, 2 June 2008

B.C. winter causes frost buckling in Google Earth

The Google Earth imagery of Revelstoke Dam in B.C. (below) looks like it was taken in the depths of winter. Since this is a massive concrete dam, I'm guessing the frost buckling in the image is an artifact of the surface model. Maybe Google need to add some more antifreeze to their TIN algorithm... or perhaps light a twig fire under their rendering engine?

Thursday, 1 May 2008

Database Architecture monograph

In the era of cloud computing, map/reduce, dynamic languages and the semantic web, the stalwart Relational Database Management System is looking a bit fusty. But RDBMSes were perhaps the earliest widely-deployed example of many of the techniques of distributed computing, concurrent programming, and query optimization that are still highly relevant.

Hellerstein, Stonebraker and Hamilton have published a monograph on Architecture of a Database System. With names like that involved you'd expect high quality and some deep insight, and the article delivers. It's a good, accessible summary of the state-of-the-art in RDBMS technology.

JAQL pegs the cool technology mashup meter

JAQL is a query language and engine which has XQuery-like syntax, SQL-like operators, JSON as a native data format, and runs using the Hadoop map/reduce framework. (Although not mentioned explicitly, it's probably great for social networking as well...)


You might think that JAQL and JEQL were separated at birth, but they actually have no genetic material in common. But it's interesting to see the J*QL acronym space being rapidly populated. The best two vowels are now gone - who's going to be next to pile in?

Wednesday, 30 April 2008

Taxonomy of Software Bugs

This software bug taxonomy is great! Heisenbugs, Bohr Bugs, and the dreaded Schroedinbugs....

Tuesday, 29 April 2008

Ted Neward takes on the Tower of Babel

Here's another (as usual) fascinating, detailed, doesn't-this-guy-work-for-a-living post from Ted Neward. This one starts as a meta-critique of Groovy vs. Ruby and morphs into an interesting summary of what the Tower of Babel IT department is using this year.

Money quote:

I wish I could get back to [C++] for a project in the same way that guys fantasize about running into an old high school girlfriend on a business trip.

Personally, my reaction to my old C++ girlfriend would be "TG I didn't get hitched to this chick - she's way too high-maintenance". Although Ted says she's changed...

Sunday, 20 April 2008

Microsoft's MVPs use... Google?

An article about a candid presentation by Steve Ballmer.

The XP-versus-Vista debacle just reinforces the core value of Open Software. Current version tested, deployed, and working fine? Then nobody can force you to upgrade...

Friday, 18 April 2008

Who's conspicuously absent from the PaaS fray?

Here's a hint: who was using the slogan "The Network is the Computer" 10 years ago? And who was the first company to deliver a RIA technology?

So why have they been MIA in the PaaS goldrush?

Let's think about this another way. What database are people most likely to run on their slice-o'-Linux-in-the-cloud? MySQL perhaps? Which was just bought by...?

The Register has an article about a possible JavaOne announcement about how this situation might change (with a leaked slide presentation! Fell off the back of an FTP packet, I guess...)

(The weird thing is that the presentation talks only about PostgreSQL. An old file? Or a different corporate camp? Didn't get the memo maybe?)

KML Craziness

Why oh why did KML choose to specify colour values as ABGR rather than RGBA? Does anyone have a rational explanation for this anomaly?
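
Until someone comes up with one, a small conversion helper takes the sting out of it. KML writes a colour as eight hex digits in the order aabbggrr, so the sketch below just reorders the channels. The function names `rgba_to_kml` and `css_to_kml` are hypothetical.

```python
def rgba_to_kml(r, g, b, a=255):
    """Convert 0-255 channel values to a KML colour string (aabbggrr hex)."""
    for channel in (r, g, b, a):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    return f"{a:02x}{b:02x}{g:02x}{r:02x}"


def css_to_kml(css, alpha=255):
    """Convert a '#rrggbb' web colour to a KML colour string."""
    css = css.lstrip("#")
    r, g, b = (int(css[i:i + 2], 16) for i in (0, 2, 4))
    return rgba_to_kml(r, g, b, alpha)


print(rgba_to_kml(255, 0, 0))        # opaque red
print(css_to_kml("#0000ff", 127))    # half-transparent blue
```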