Thursday, May 1, 2008
Database Architecture monograph
In the era of cloud computing, map/reduce, dynamic languages and the semantic web, the stalwart Relational Database Management System is looking a bit fusty. But RDBMSes were perhaps the earliest widely deployed systems to use many of the techniques of distributed computing, concurrent programming, and query optimization that are still highly relevant.
Hellerstein, Stonebraker and Hamilton have published a monograph titled Architecture of a Database System. With names like that involved you'd expect high quality and some deep insight, and the monograph delivers. It's a good, accessible summary of the state of the art in RDBMS technology.
JAQL pegs the cool technology mashup meter
JAQL is a query language and engine which has XQuery-like syntax, SQL-like operators, JSON as a native data format, and runs on the Hadoop map/reduce framework. (Although not mentioned explicitly, it's probably great for social networking as well...)
You might think that JAQL and JEQL were separated at birth, but they actually have no genetic material in common. But it's interesting to see the J*QL acronym space being rapidly populated. The best two vowels are now gone - who's going to be next to pile in?
Wednesday, April 30, 2008
Taxonomy of Software Bugs
This software bug taxonomy is great! Heisenbugs, Bohr Bugs, and the dreaded Schroedinbugs....
Tuesday, April 29, 2008
Ted Neward takes on the Tower of Babel
Here's another (as usual) fascinating, detailed, doesn't-this-guy-work-for-a-living post from Ted Neward. This one starts as a meta-critique of Groovy vs. Ruby and morphs into an interesting summary of what the Tower of Babel IT department is using this year.
Money quote:
I wish I could get back to [C++] for a project in the same way that guys fantasize about running into an old high school girlfriend on a business trip.
Personally, my reaction to my old C++ girlfriend would be "TG I didn't get hitched to this chick - she's way too high-maintenance". Although Ted says she's changed...
Sunday, April 20, 2008
Microsoft's MVPs use... Google?
An article about a candid presentation by Steve Ballmer.
The XP-versus-Vista debacle just reinforces the core value of Open Software. Current version tested, deployed, and working fine? Then nobody can force you to upgrade...
Friday, April 18, 2008
Who's conspicuously absent from the PaaS fray?
Here's a hint: who was using the slogan "The Network is the Computer" 10 years ago? And who was the first company to deliver a RIA technology?
So why have they been MIA in the PaaS goldrush?
Let's think about this another way. What database are people most likely to run on their slice-o'-Linux-in-the-cloud? MySQL perhaps? Which was just bought by...?
The Register has an article about a possible JavaOne announcement about how this situation might change (with a leaked slide presentation! Fell off the back of an ftp packet, I guess..)
(The weird thing is that the presentation talks only about PostgreSQL. An old file? Or a different corporate camp that didn't get the memo?)
KML Craziness
Why oh why did KML choose to specify colour values as ABGR rather than RGBA? Does anyone have a rational explanation for this anomaly?
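For anyone bitten by this, here's a small sketch of the conversion. KML's <color> element is written as eight hex digits in aabbggrr order (alpha, blue, green, red), so converting from the usual RGBA just means reversing the colour channels and moving alpha to the front. The function names are made up for illustration; they aren't from any KML library.

```python
from typing import Tuple


def rgba_to_kml(r: int, g: int, b: int, a: int = 255) -> str:
    """Convert 8-bit RGBA components to a KML colour string (aabbggrr order)."""
    for v in (r, g, b, a):
        if not 0 <= v <= 255:
            raise ValueError("components must be in the range 0..255")
    # Alpha first, then the colour channels in reverse (blue, green, red).
    return f"{a:02x}{b:02x}{g:02x}{r:02x}"


def kml_to_rgba(kml: str) -> Tuple[int, int, int, int]:
    """Parse a KML aabbggrr colour string back into (r, g, b, a)."""
    if len(kml) != 8:
        raise ValueError("expected 8 hex digits in aabbggrr order")
    a, b, g, r = (int(kml[i:i + 2], 16) for i in range(0, 8, 2))
    return (r, g, b, a)


print(rgba_to_kml(0, 0, 255, 0x7f))  # 7fff0000 (half-transparent blue)
```

So the half-transparent blue you'd write as RGBA 0000ff7f turns into 7fff0000 in KML. Easy enough once you know it, but it's exactly the kind of thing that produces mysteriously red polygons the first time around.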
Tuesday, April 15, 2008
Is that cloud on the horizon going to start raining applications?
Timothy O'Brien speculates that the transition to cloud-based computing is happening sooner than expected. He's talking about the new integration between Salesforce.com (which is apparently the poster child for SaaS) and Google Apps (the poster child for desktop replacement by the Web). And he generalizes this to include EC2, SimpleDB, and the "twenty or thirty other companies that are going to join the industry".
He also warns here that this transition could transform the model for software development in ways uncomfortable for IT professionals.
He could be right. Cloud computing does seem to be poised to finally provide the right platform to suck the juice out of corporate data centres. The idea of virtual everything certainly has an appeal (especially to someone like me who is basically a software guy).
But questions occur... Salesforce and Google seem like a perfect match - but what about the other companies that want a piece of this action? Does it matter that you will have to commit everything to a given cloud platform? And what happens if that platform goes away? The more advantage you take of the cloud, the bigger the pain when it disappears. And what about apps which are a bit more specific than CRM (which in my naive view seems like just a fancy Contacts list - and hence an obvious and easy thing to integrate with an office suite)?
Tim would probably call these kinds of questions "self-interested observations from one with the most to lose". He mentions a Salesforce meeting where business types applaud a sign showing "Software" with a big red slash through it... Well, maybe. Last I noticed no-one has quite managed to automate generating code from requirements documents (let alone automating the generation of implementable requirements documents out of people's heads 8^). So I would say it's more like "different software" than "no software".
One thing's for sure: there are going to be some gigantic platform turf wars going on up there in the stratosphere.
(One big disappointment - it sounds like the Salesforce platform is based on their proprietary Apex language. Ugh. Just what the world needs - one more language to debate over. At least Google App Engine picked a real language for their launch!)
Friday, April 4, 2008
Ontogeny recapitulates Phylogeny in the life of a programmer
(C'mon, admit it - you've always wanted to use that as the title of a blog post too...)
Paul's epiphany seems like the equivalent of Ontogeny recapitulating Phylogeny in the evolution of a programmer. "Hey, C has arrays! Hey, C arrays are really just syntactic sugar for pointer dereferencing! Hey, I can index to anywhere in memory really easily! Hey, I can store anywh...."
SEGFAULT - CORE DUMPED
Sh*t.
"Hey, there's this new language called Java! And it has arrays too! Hey, if I index past the end of an array I get a nice error message telling me exactly where in my code the problem occurred! Hey, I think I can knock off work early and go to the pub!"
8^)
Friday, March 28, 2008
JTS at the sharp end of many arrows
Miguel Montesinos and Jorge Sanz from the gvSIG project have made a nice diagram showing the relationships between a bunch of GFOSS projects. It's nice to see JTS close to the centre of the diagram - although having so many arrows pointed at it makes me a little nervous!
They show JTS having a dependency on Batik for some reason - that must be an error, there's no relationship between the two. Or does Batik use JTS?
It would have been nice to see a "parent-of" relationship between JTS and GEOS and NTS, since the latter two also form a key component of quite a few projects. And I'm pretty sure MapGuide OS uses GEOS, as does OGR.
But it's not surprising there are a few errors and omissions - it's a pretty complicated web of relationships. And this is over only what - 8 years? I wonder what the diagram will look like in another 8...
Monday, March 24, 2008
Time for JSTS?
Like a lot of stuff, spatial functionality is moving out into the browser. Signs of the times:
- proj4js
- GeoJSON
- This comment: "The DOJO client-side libraries provide support for performing some simple spatial operations like ‘intersection’ on the client-side." (Although I haven't been able to find this functionality in a quick browse of Dojo - can anyone confirm this?)
Take it away, someone - I'm too busy!
Saturday, March 22, 2008
Swimming taught by telephone
Ok, this is like popcorn... you can't have just one.
Just think - they had no idea that one day you would be able to simply download a learn-to-swim podcast and listen to it on your waterproof MP3 player...
Has Automation affected YOUR job?
Can't resist posting this (from Modern Mechanix).
I'm definitely enjoying not having grimy hands, but what happened to the shorter work week and more leisure time?
Tuesday, March 18, 2008
World Population Cartograms
The Daily Green has some cartograms showing the change in distribution of world population since 1900.
But there's another aspect to this that the original images don't show - the increase in the absolute number of people in the world. So here are the images with that factor applied:
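One way to apply that factor (an assumption on my part - the original cartograms don't say how they were sized) is to make total map area proportional to total population. Since area grows with the square of linear size, each image's width and height scale by the square root of its population ratio relative to a reference year. A minimal sketch, with hypothetical function names:

```python
import math


def linear_scale(pop: float, reference_pop: float) -> float:
    """Linear scale factor that makes map area proportional to population.

    Area grows with the square of linear size, so the linear factor is the
    square root of the population ratio.
    """
    if pop <= 0 or reference_pop <= 0:
        raise ValueError("populations must be positive")
    return math.sqrt(pop / reference_pop)


def scaled_size(width: int, height: int, pop: float, reference_pop: float) -> tuple:
    """Pixel size of an image whose area is scaled by pop / reference_pop."""
    s = linear_scale(pop, reference_pop)
    return round(width * s), round(height * s)


# A population four times the reference doubles both width and height.
print(scaled_size(400, 200, 4.0, 1.0))  # (800, 400)
```

Scaling by the raw ratio instead would overstate growth badly: a fourfold increase in population would show up as a sixteenfold increase in area.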

A good kick in the pants for the GeoWeb?
Reading the blog posts that are starting to come out about the ESRI Dev Summit, I'm struck by a few things:
- REST! KML! JSON! AJAX! Dojo! I haven't seen this many tech buzzwords out of ESRI since ArcGIS on Windows was released with COM! VBA! DLLs! Hopefully some of these have more legs.
- Clearly ESRI is embracing the GeoWeb concept with a vengeance. It will be very interesting to see how these new modes of access to GIS functionality play out. To what extent will the marriage of web services and GIS processing be an effective model?
- They also seem to be accepting Google Maps/Earth and MS VE as valid spatial delivery platforms. It sounds like there are some pretty full-featured capabilities to deliver data to these platforms. It's going to be interesting to see the uptake on these capabilities, and whether ESRI's heart is really in making these platforms perform to their full capability. No doubt there will be some fascinating business implications as well that get played out over time...
- I'll be interested to see whether ESRI's "legitimizing" spatial-via-web-service will have any effect on the OGC W*S world. It seems to me that while OGC was early out of the gate with a web service suite, they seem to be wallowing in the doldrums as far as making the specs effective for real-world use. The OGC W*S suite was quickly adopted by the OSS world, and for good reason - open source tends to be very faithful to open standards, since a design goal is usually to have a high degree of interoperability. Not to mention that it's easier to code to a standard which has already done a lot of the hard thinking. But - the cool kids are losing interest in the crusty old W*S interfaces, since they're not keeping up with the rapid emergence of exciting new web paradigms.
- Ultimately ESRI's trailblazing will be a good thing for OSS technologies. Nothing like having a working system to inspect, copy, and improve. And the more open, standard technologies that system uses, the easier it is to steal - um, learn from. Also, as the GIS stack gets more open interface technologies accepted, it becomes easier to plug heterogeneous components into that stack.
Monday, March 17, 2008
Tired of those boring old conformal map projections? Make your own!
Flex Projector is an interesting tool that allows you to make a custom map projection by adjusting the location and shape of meridians and parallels.
Here's a nice projection I defined to maximise the area of tropical vacation spots 8^)
The downside of this app is that as far as I can tell, there's no way to export a projection once it's defined. Instead, you have to use the Flex Projector application to reproject datasets. So it doesn't look like you can make your own custom projection for use in say, PostGIS or OGR.
I guess it might be asking a little much for the app to spit out some custom Proj4 C code with an associated EPSG code and CRS WKT... But hey, it's an open source app, so come on someone - how about it?
Friday, March 14, 2008
Quote of the (Pi) Day
Sir, I send a rhyme excelling
In sacred truth and rigid spelling
Numerical sprites elucidate
For me the lexicon's full weight
Maybe not the greatest lyric poem - but list the number of letters in each word. (If you prefer a shortcut, this site has them already listed - along with 9,979 more.)
And notice the value of the date as MM.DD...
Monday, March 3, 2008
Branch-and-Bound algorithms for Nearest Neighbour queries
This paper by Roussopoulos et al is a fine exposition of how to use a Branch-and-Bound algorithm in conjunction with an R-tree index to efficiently perform Nearest-Neighbour queries on a spatial database.
Nearest-neighbour is a pretty standard query for spatial databases. Oracle Spatial offers this capability natively. It would be great if PostGIS did too. If the GiST index API offers appropriate access methods, it seems like it might not be too hard to implement the Roussopoulos algorithm. (How about it, Paul, now that you're dedicating your life to becoming a PostGIS coding wizard...)
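To give a feel for how simple the core idea is, here's a sketch of a depth-first branch-and-bound nearest-neighbour search over a toy R-tree. It's a simplification of the paper's approach: it visits child nodes in order of MINDIST (the smallest possible distance from the query point to a node's bounding rectangle) and prunes any node whose MINDIST can't beat the best distance found so far, but it omits the paper's MINMAXDIST pruning. The node structure is hypothetical and the tree is built by hand rather than by real R-tree insertion.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


@dataclass
class Node:
    """Hypothetical minimal R-tree node: internal nodes hold children, leaves hold points."""
    mbr: Rect
    children: List["Node"] = field(default_factory=list)
    points: List[Point] = field(default_factory=list)


def leaf(points: List[Point]) -> Node:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Node((min(xs), min(ys), max(xs), max(ys)), points=list(points))


def internal(children: List[Node]) -> Node:
    mbr = (min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
           max(c.mbr[2] for c in children), max(c.mbr[3] for c in children))
    return Node(mbr, children=list(children))


def mindist(q: Point, r: Rect) -> float:
    """Smallest possible distance from q to anything inside rectangle r."""
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


def nearest(node: Node, q: Point,
            best: Tuple[float, Optional[Point]] = (math.inf, None)
            ) -> Tuple[float, Optional[Point]]:
    """Depth-first branch-and-bound search; returns (distance, point)."""
    if node.points:
        # Leaf: check each point against the best found so far.
        for p in node.points:
            d = math.dist(q, p)
            if d < best[0]:
                best = (d, p)
        return best
    # Internal node: visit children in increasing MINDIST order.
    for child in sorted(node.children, key=lambda c: mindist(q, c.mbr)):
        if mindist(q, child.mbr) >= best[0]:
            break  # sorted, so no remaining child can contain a closer point
        best = nearest(child, q, best)
    return best


tree = internal([
    leaf([(0.0, 0.0), (1.0, 1.0)]),
    leaf([(5.0, 5.0), (6.0, 5.0)]),
    leaf([(10.0, 0.0), (9.0, 1.0)]),
])
dist, pt = nearest(tree, (5.4, 4.0))
print(pt)  # (5.0, 5.0)
```

In this example only the middle leaf's points are ever measured: once (5.0, 5.0) is found, the other two leaves have a MINDIST larger than the current best and are pruned without being opened.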
I'm also thinking that this approach might produce an efficient algorithm for computing distance between Geometrys for JTS. It's always bugged me that the JTS distance algorithm is just plain ol' O(N^2) brute-force. It seems like there has to be a better way, but as usual there seems to be precious little prior art out there in web-land. This should also work well in the "Prepared" paradigm, which means that it will provide an efficient implementation for PostGIS as well. Soon, hopefully...
Sunday, February 17, 2008
The Four Programmers
Here's an amusing adaptation of Monty Python's Four Yorkshiremen sketch...
I feel increasingly like this myself (more so now that I'm in a company where I am the oldest employee)...
When I started coding we had to type out our programs on punchcards. If you made one typo you had to retype the entire card, so to avoid mistakes we wrote out our code on paper pads with 80-column grids. You had to stand in line to submit your card stack to the guy running the card reader, and then pick up your output once it had been separated and filed by some other guy. And if you had a syntax error - back to the cardpunch to do it all over again!
We thought we'd died and gone to heaven when we got online accounts and were allowed to use a hardcopy DECwriter terminal. It even had an APL character set (you had to remember to change the charset when you switched back to FORTRAN). This was better - we could "erase" (actually just strike out) and change mistakes on the line. But forget about printing out your entire program - at 300 baud this could take many minutes, all of which was chewing up your connect time allocation.
But we had it lucky... I worked with a guy who started out coding accounting routines for a magnetic drum-based system in the early '60s. A big chunk of their time was spent reordering the individual instructions around the drum to reduce read latency. Fun stuff!
And he was lucky compared to Don Booth, who was at the oceanographic research centre where I worked during my undergrad days. He was one of the pioneers of computing in Britain. All they had for storage was a 1000-word mercury delay tube... and no doubt they spent their coding time plugging wires and changing tubes.
But you try and tell kids with their touch-sensitive palmtops that these days, and they won't believe you...
Wednesday, January 30, 2008
The End of an Architectural Era?
Stonebraker et al have an interesting paper pointing out the antiquity and consequent limitations of the classic relational database architecture in today's world of massive disk/cycles/core.
If they are correct (and in spite of the recent MapReduce blunder Stonebraker has made a lot of great calls in the DB world), the world of data management is going to get awfully interesting in the coming years. The DBAs and DAs of this world have been living a relatively comfortable existence compared to those who are wandering across the stormy badlands of the middle tier. But Stonebraker postulates at least 5 radically different database architectures to address specific use cases of data management. This would seem to lead to a much more complex world for data architects. But maybe a windfall for the consultants who find their niche?

