Thursday 12 April 2018

The world needs a new flavour of SOSS!

Yes, you don't what the acronym SOSS means, because I just made it up. SOSS stands for Standard Open Simple Spatial format.

It's crazy that in the 21st century the most common de facto standard spatial format is based on 30-year old technology, is proprietary, and has silly limitations such as 11-character uppercase attribute names.

I'm talking, of course, about shapefiles.

Surely we can do better than this?!

Now there are actually a few things that shapefiles get right. For instance, the shapefile's simplistic tabular data model gets two full marks for being - simple and tabular! Hierarchical data models are very cool and highly expressive, but overkill and too complex for 80% of the use cases out there.

Another useful feature of shapefiles is that they store floating point data with full precision - i.e. in binary. Representing binary floating point numbers as textual decimal values is inherently lossy, and causes all kinds of subtle and annoying problems. (I'm always surprised that this doesn't crop up more often as a serious limitation of GML.)

So what are the current leading contenders for a SOSS format? Here's an opinionated list, with pros and cons


Format Pro Con
Shapefile tabular, lossless numerics proprietary, antiquated, limited
GML complex to model and parse, lossy numerics, poor schema handling open, flexible
KML proprietary, lossy, limited attribute handling, designed for presentation relatively simple, well documented
GeoRSS not appropriate as a full-featured SOSS, lossy
GeoJSON too tied to Javascript, lossy, no schema standard
YAML needs a spatial profile

Conspicuous by its absence on this list is XML. In fact XML is a meta-format, not a format. To utilize XML would require defining an appropriate profile (which would need to be highly restricted to meet the criteria of simple). The major drawback of XML is that specifying the profile almost inevitably drags one into the mind-bending hell of XML Schema. (There are other schema languages, such as RelaxNG, but they involve similar complexity and have even less traction).

There's also more esoteric formats such as NetCDF, but it fails the simplicity test, and it's unclear how well it supports Geometry types.