Friday, July 9, 2010

Is JSON the CSV of the 21st Century?

It strikes me that JSON might be the CSV of the 21st century. Consider these similarities:
  • They both use POT (plain old text) as their encoding
  • The basic datatypes are strings and numbers. JSON adds booleans and nulls - Yay for progress!
  • The only schema metadata supported is field names
JSON has the major advance of supporting hierarchical and array structures. Of course, this makes it correspondingly more difficult to parse.
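To make the comparison concrete, here's the same record round-tripped through both formats using Python's standard library (the city data is just illustrative):

```python
import csv
import io
import json

# CSV: flat rows, with field names available only as a header line.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["city", "pop"])
writer.writeheader()
writer.writerows([{"city": "Victoria", "pop": 80017}])
print(buf.getvalue())

# JSON: the same data, plus nesting and arrays that CSV simply can't express.
nested = {"city": "Victoria", "pop": 80017,
          "tags": ["capital", "island"]}   # an array field - no CSV equivalent
print(json.dumps(nested))
```

Note that the CSV consumer gets everything back as strings ("80017"), while the JSON consumer gets a number - which is about the full extent of JSON's extra type support.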

CSV has stood the test of time extraordinarily well. According to good ol' Wikipedia it's been around since at least 1967 - that's over 40 years!

And CSV is still well supported wherever tabular data is used. Let's see if JSON is still around in 2035... I suspect not, because the half-life of technologies is a lot shorter these days. (Maybe CSV is the stromatolite of file formats!)

It would be nice if JSON had a standard schema notation. (There is JSON-Schema. It copies the XML Schema idea of encoding the schema in JSON. It remains to be seen whether this makes it as easy to use and popular as XML Schema has been. 8^)
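The core idea of JSON-Schema - that the schema is itself just JSON - is easy to sketch. The schema document and the toy validator below are purely illustrative (a hand-rolled structural check, not the real JSON-Schema spec, which does far more):

```python
import json

# A minimal schema-as-JSON document (hypothetical, JSON-Schema-flavoured).
schema = json.loads("""
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age":  {"type": "number"}
  }
}
""")

# Map schema type names onto Python types for a toy structural check.
TYPES = {"object": dict, "string": str, "number": (int, float)}

def conforms(doc, schema):
    """Return True if doc matches the declared type and its properties do too."""
    if not isinstance(doc, TYPES[schema["type"]]):
        return False
    for key, sub in schema.get("properties", {}).items():
        if key in doc and not conforms(doc[key], sub):
            return False
    return True

print(conforms({"name": "JTS", "age": 40}, schema))   # True
print(conforms({"name": 42}, schema))                 # False - name isn't a string
```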

And why, oh why, do field names have to be in quotes? OK, I know why technically - they're just strings, and JSON strings need to be in quotes so that they're embeddable in JavaScript code. But this is a classic case of a vestigial artifact having a detrimental effect in a new environment. Nobody should be eval-ing JSON as just another chunk of JavaScript anyway, for obvious security reasons. And in the wider world of JSON use cases, following JavaScript syntax is completely irrelevant.
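The vestigial-artifact point is easy to demonstrate: a bare-key object literal is perfectly legal JavaScript source, but a strict JSON parser rejects it.

```python
import json

# Quoted keys: valid JSON, parses fine.
print(json.loads('{"name": "JSON"}'))

# Bare key, JavaScript-object-literal style: rejected by the JSON grammar.
try:
    json.loads('{name: "JSON"}')
except json.JSONDecodeError as e:
    print("rejected:", e.msg)
```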

YAML seems to have a lot of advantages over JSON as a rich textual format. For instance, it makes minimal use of quotes, and has a richer, extensible set of datatypes including timestamps and binary (WKB, anyone?). But it's going to be pretty hard to dislodge JSON, which is solidly entrenched for all the wrong reasons.


sgillies said...

I disagree that JSON profits from schemas. Use XML instead for those applications. Use JSON where you can accept tight client-server coupling and treat it as dicts (Python) on the wire. Like you said: a more expressive take on CSV, one that happens to share some syntax with Javascript (and Python, and others).

Dr JTS said...

That's a good perspective, Sean.

But does this preclude defining a schema formalism for JSON for situations where it would be useful? That way the user can choose whether he wants to take the low road or the high road for JSON use.

Also, it seems to me that JSON is being used in many situations where there is NOT tight coupling - and they would hugely benefit from having a formal way of describing JSON schemas. (I guess I might say the same thing about the wider RESTful Web Service world in general...)

From an SDLC perspective, the only situation I can see which doesn't *necessarily* require a formal specification is where a single person is defining both the client and server endpoints (and over a short time span!). That's pretty tight coupling indeed...

But perhaps I'm just a paleoprogrammer... 8^)

Dr JTS said...

Or should that be a paleo-protocol-programmer? 8^)

Mike Malone said...

I've had good experiences running JSON through a light compression algorithm. The data density is close enough to a binary representation for a lot of use cases and you get better separation of concerns. I really like the idea of human readable data formats. But they're certainly not free, and probably not right for all use cases. YAML has its own peculiarities. Like CSV, JSON has the major advantage of being stupidly simple.
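Mike's compression point is easy to check with the standard library: the repeated field names in a JSON array of objects are exactly the kind of redundancy gzip eats for breakfast (the record data here is made up for illustration).

```python
import gzip
import json

# A typical JSON payload: many objects repeating the same field names.
records = [{"id": i, "name": "point", "coords": [i, i]} for i in range(1000)]
raw = json.dumps(records).encode("utf-8")

packed = gzip.compress(raw)
print(len(raw), len(packed))   # packed is a small fraction of raw

# Round trip: the human-readable text is fully recoverable.
assert json.loads(gzip.decompress(packed)) == records
```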