Showing posts with label computation. Show all posts
Showing posts with label computation. Show all posts

Thursday, May 1, 2008

Database Architecture monograph

In the era of cloud computing, map/reduce, dynamic languages and the semantic web, the stalwart Relational Database Management System is looking a bit fusty. But RDBMSes were perhaps the earliest widely-deployed example of many of the techniques of distributed computing, concurrent programming, and query optimization that are still highly relevant.

Hellerstein, Stonebraker and Hamilton have published a monograph on Architecture of a Database System. With names like that involved you'd expect high quality and some deep insight, and the article delivers. It's a good, accesible summary of the state-of-the-art in RDBMS technology.

JAQL pegs the cool technology mashup meter

JAQL is a query language and engine which has XQuery-like syntax, SQL-like operators, JSON as a native data format, and runs using the Hadoop ma/preduce framework. (Although not mentioned explicitly, it's probably great for social networking as well...)


You might think that JAQL and JEQL were separated at birth, but they actually have no genetic material in common. But it's interesting to see the J*QL acronym space being rapidly populated. The best two vowels are now gone - who's going to be next to pile in?

Wednesday, April 30, 2008

Taxonomy of Software Bugs

This software bug taxonomy is great! Heisenbugs, Bohr Bugs, and the dreaded Schroedinbugs....

Tuesday, April 29, 2008

Ted Nedward takes on the Tower of Babel

Here's another (as usual) fascinating, detailed, doesn't-this-guy-work-for-a-living post from Ted Nedward. This one starts as a meta-critique of Groovy VS Ruby and morphs into an interesting summary of what the Tower Of Babel IT department is using this year.

Money quote:

I wish I could get back to [C++]for a project in the same way that guys fantasize about running into an old high school girlfriend on a business trip.

Personally, my reaction to my old C++ girlfriend would be "TG I didn't get hitched to this chick - she's way too high-maintenance". Although Ted says she's changed...

Friday, April 18, 2008

Who's conspicuously absent from the PaaS fray?

Here's a hint: who was using the slogan "The Network is the Computer" 10 years ago? And who was the first company to deliver a RIA technology?

So why have they been MIA in the PaaS goldrush?

Let's think about this another way. What database are people most likely to run on their slice -o'Linux-in-the-cloud? mySQL perhaps? Which was just bought by...?

The Register has an article
about a possible JavaOne announcement about how this situation might change (with a leaked slide presentation! Fell off the back of an ftp packet, I guess..)

(The weird thing is is that the presentation talks only about PostgreSQL. An old file? Or a different corporate camp? Didn't get the memo maybe?)

Tuesday, April 15, 2008

Is that cloud on the horizon going to start raining applications?

Timothy O'Brien speculates that the transition to cloud-based computing is happening sooner than expected. He's talking about the new integration between Salesforce.com (which is apparently the poster child for SaaS) and Google Apps (the poster child for desktop replacement by the Web). And he generalizes this to include EC2, SimpleDB, and the "twenty or thirty other companies that are going to join the industry".

He also warns here that this transition could transform the model for software development in ways uncomfortable for IT professionals.

He could be right. Cloud computing does seem to be poised to finally provide the right platform to suck the juice out of corporate data centres. The idea of virtual everything certainly has an appeal (especially to someone like me who is basically a software guy).

But questions occur... Salesforce and Google seem like a perfect match - but what about the other companies that want a piece of this action? Does it matter that you will have to commit everything to a given cloud platform? And what happens if that platform goes away? The more advantage you take of the cloud, the bigger the pain when it disappears. And what about apps which are a bit more specific than CRM (which in my naive view seems like just a fancy Contacts list - and hence an obvious and easy thing to integrate with an office suite).

Tim would probably call these kinds of questions "self-interested observations from one with the most to lose". He mentions a Salesforce meeting where business types applaud a sign showing "Software" with a big red slash through it... Well, maybe. Last I noticed no-one has quite managed to automate generating code from requirements documents (let alone automating the generation of implementable requirments documents out of people's heads 8^). So I would say it's more like "different software" than "no software".

One thing's for sure.. there's going to be some gigantic platform turf wars going on up there in the stratosphere.

(One big disappointment - it sounds like the Salesforce platform is based on their proprietary Apex language. Ugh. Just what the world needs - one more language to debate over. At least Google App Engine picked a real language for their launch!)

Saturday, March 22, 2008

Has Automation affected YOUR job?

Can't resist posting this (from Modern Mechanix).

I'm definitely enjoying not having grimy hands, but what happened to the shorter work week and more leisure time?


Wednesday, January 30, 2008

The End of an Architectural Era?

Stonebraker et al have an interesting paper pointing out the antiquity and consequent limitations of the classic relational database architecture in today's world of massive disk/cycles/core.

If they are correct (and in spite of the recent MapReduce blunder Stonebraker has made a lot of great calls in the DB world), the world of data management is going to get awfully interesting in the coming years. The DBA's & DA's of this world have been living a relatively comfortable existence compared to those who are wandering across the stormy badlands of the middle tier. But Stonebraker postulates at least 5 radically variant database architectures to address specific use cases of data management. This would seem to lead to a much more complex world for data architects. But maybe a windfall for the consultants who find their niche?

Monday, November 19, 2007

Enumerating permutations using inversion vectors and factoradic numbering

Permutations are a fundamental concept in discrete mathematics, and crop up in numerous places in computer science. The set of all possible permutations of n items is usually denoted Sn (since the set forms a group which is isomorphic to the Symmetric group of order n). The number of permutations in this set is n!.

This apparently simple concept poses some unexpected challenges. For instance, the following useful operations are messy to compute via the brute-force way of enumerating successive permutations:
  • List all the permutations in Sn (in lexicographic order)
  • Find the i'th permutation in Sn
This would be straightforward if there was a way of effectively enumerating all permutations; i.e. a mapping between Sn and the integers {1, 2, ..., n!}.

It turns out that this can be done using a pair of ingenious concepts known as the inversion vector and factoradic numbering. These can be used to provide an enumeration of Sn, conveniently in lexicographic order. (Inversion vectors are also known as inversion tables and Lehmer codes - the term vector seems to me to be most appropriate for the nature of the structure.)

A permutation can be written as a sequence of numbers

P = (p1 p2 ... pn) where pi is in {1, 2, ..., n}

An inversion is a pair (pi, pj) where i < j but pi > pj. We define di to be the number of inversions in the sequence where the first element is pi:

di = | { pj : i < j , pi > pj } |

Then the inversion vector for the permutation is the sequence

D = (d1, d2, ..., dn) (di is in the range {0, 1, ..., n-i})

It is fairly straightforward to recover the original permutation from the inversion vector. This paper gives a definition of the forward and reverse mapping algorithms.

There are n! possible sequences D. They are in one-to-one relationship with the set of permutations Sn, and their natural order gives a lexicographic ordering of Sn. So we're halfway to the enumeration that we are looking for.

Inversion vectors can in turn be mapped to the integers in {1, ..., n!} by using the concept of factoradic numbering. A factoradic number is a number made up of a sequence of digits where the radix for digit di is (n-i)!. This Wikipedia article explains the concept in detail.

Here's an example:

P = (4 6 2 5 3 1 8 7) is a permutation in S8.

Its inversion vector is L(P) = [3,4,1,2,1,0,1,0].

The factoradic representation of this is

3×7! + 4×6! + 1×5! + 2×4! + 1×3! + 0×2! + 1×1! + 0×0! = 18175

So inversion vectors and factoradic numbering provide an elegant way of enumerating permutations. But what does this have to do with geospatial processing? That will be a subject for another post...

Thursday, September 13, 2007

Quote of the day - C. A. R. Hoare

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

- C. A. R. Hoare

Tuesday, August 14, 2007

Engines of Logic

It seems only right that I put in a plug for the book Engines of Logic by the other, more famous Martin Davis. And here's an interesting article which reviews the book in particular reference to the occupational health hazards of being a mathematician. It's a good antidote to any regrets you might have about not being real smart.

Monday, August 13, 2007

The Vietnam of Computer Science?

I couldn't resist the analogy of this blog article, even if the author slightly overstates the case. Good summary of the history of the war, too.

Makes me feel like if I haven't served a tour in the Delta, at least I've done time at a forward fire base...

Friday, August 10, 2007

Complex Functions as Algorithmic Art

Hans Lundmark has an interesting page on visualizing complex functions. The idea is to map functions from f : CC using position, hue, and saturation. The domain values of the function provide the (x, y) position. The function output value provides the hue and saturation; the hue is determined by the complex argument (angle), and the saturation is determined by the absolute value.

The output struck me as a nice example of algorithmic art. I whipped up my own version in Java to experiment, and to be able to produce high-resolution images for printing. Here's a couple of my images:

An example function from the paper ( f ( z ) = ( z − 2 )2 ( z + 1 − 2i ) ( z + 2 + 2i ) / z3 )



"Wild Walrus"


Monday, July 9, 2007

Relations - the machine code for data?

In the rushing torrent of computer science, a few concepts stand like rocks which resist being swept away. One of the most familiar and useful is relational database technology. It took a relatively short time for this concept to be identified (in Codd's paper of 1970) and implemented commercially (around 1975, in IBM's System R, among others). Perhaps only 10 or 15 years elapsed between the earliest database systems and the emergence of commercially viable RDBMS . For over 30 years since then RDBMS's have stood the test of time and repelled all challengers to their central role in data management.

A recent and serious challenger was Object database technology. In the 1990's this area looked poised to ride the gathering wave of object-oriented programming and sweep aside the fusty old RDBMS technology. Since then, while the OO wave has swept far up the beach, the promise of ODBMS seems to have been left behind in the receding foam, never having been fully realized.

So why is this?

I think one reason is that the relational paradigm could be a fundamental data structure for information representation. In a sense, relations are the machine code of data modelling - they can be used to represent any required data model. Granted, relational models aren't alway elegant or efficient - but that simply confirms the metaphor. Moreover, their simplicity allows relational theory to be firmly grounded in mathematics and logic. (Of course, the relational hard-cores have been saying this for years - in some cases a bit too vociferously, IMHO. I'm just reluctantly coming round to think that they might be right.)

A further piece of evidence for this observation is that RDBMS vendors have a fine track record of being able to adapt the core paradigm to accomodate new technical ideas. Queuing, security models, data analytics, and of course even (to an extent) object-orientation are some of the areas which have been implemented quite reasonably in terms of an underlying relational model.

ODBMS's, in contrast, have never quite seemed to develop a truly compelling and general paradigm. The various OO query languages don't seem to have the expressive power of SQL, and Object data models seem to run up against thorny semantic issues sooner than relational ones do (such as schema evolution and inheritance modelling). Granted, implementing an object-based data model using relational technology doesn't actually solve any of these issues, but somehow the maturity of relational techniques and tools make them seem less painful.

Some people (myself included) have tended to see the rise of Object-Relational mapping technology as at best a temporary expedient, and at worst a string-and-baling-wire solution to cope with the OO-RDB impedance mismatch. Just give ODB's a few more years to mature, we hope, and we can do away with this inelegant hack! But if relational really is a fundamental concept, perhaps the quest for pure OO databases is misguided. Perhaps O-R mapping tools are the final goal, not just a stop-gap. They can be thought of as compilers for the data access layer (with all the opportunity for performance improvement and platform independence that the compilation paradigm provides).

This isn't to say that RDB technology has evolved as far as it needs to. I hope to see much more powerful querying capability, using concepts from the areas of logic and functional programming, machine reasoning and knowledge representation. All of these seem to be continuing to mature - I look forward to seeing them make the leap to practicality like OO did years ago.