Thursday, 12 April 2018

The world needs a new flavour of SOSS!

Yes, you don't know what the acronym SOSS means, because I just made it up. SOSS stands for Standard Open Simple Spatial format.

It's crazy that in the 21st century the most common de facto standard spatial format is based on 30-year-old technology, is proprietary, and has silly limitations such as 10-character uppercase attribute names.

I'm talking, of course, about shapefiles.

Surely we can do better than this?!

Now, there are actually a few things that shapefiles get right. For instance, the shapefile's simple tabular data model gets two full marks for being simple and tabular! Hierarchical data models are very cool and highly expressive, but they are overkill and too complex for 80% of the use cases out there.

Another useful feature of shapefiles is that they store floating-point data at full precision, i.e. in binary. Representing binary floating-point numbers as textual decimal values is inherently lossy, and causes all kinds of subtle and annoying problems. (I'm always surprised that this doesn't crop up more often as a serious limitation of GML.)
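To make the precision point concrete, here is a minimal Python sketch. The 6- and 17-digit precisions are assumptions chosen for illustration, not taken from any particular format. A double written as decimal text with too few significant digits does not read back as the same value, while packing its 8 bytes in binary round-trips exactly.

```python
import struct

def to_text(x: float, digits: int) -> str:
    # Write a double as decimal text with a fixed number of significant digits,
    # as a text-based format writer might (the digit counts below are assumptions).
    return format(x, f".{digits}g")

def to_binary(x: float) -> bytes:
    # Pack the value as an 8-byte little-endian IEEE 754 double, i.e. store it in binary.
    return struct.pack("<d", x)

def from_binary(b: bytes) -> float:
    # Unpack the 8 bytes back into a double.
    return struct.unpack("<d", b)[0]

x = 0.1 + 0.2  # its shortest round-tripping decimal form needs 17 significant digits

short_text = to_text(x, 6)
full_text = to_text(x, 17)

print(short_text, float(short_text) == x)  # 0.3 False
print(full_text, float(full_text) == x)    # 0.30000000000000004 True
print(from_binary(to_binary(x)) == x)      # True
```

Decimal text can round-trip a 64-bit double, but only if the writer takes care to emit enough digits (17 significant digits always suffice); binary storage sidesteps the question entirely.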

So what are the current leading contenders for a SOSS format? Here's an opinionated list, with pros and cons:

Format      Pro                                 Con
Shapefile   tabular, lossless numerics          proprietary, antiquated, limited
GML         open, flexible                      complex to model and parse, lossy numerics, poor schema handling
KML         relatively simple, well documented  proprietary, lossy, limited attribute handling, designed for presentation
GeoRSS      -                                   not appropriate as a full-featured SOSS, lossy
GeoJSON     -                                   too tied to JavaScript, lossy, no schema standard
YAML        -                                   needs a spatial profile

Conspicuous by its absence from this list is XML. In fact, XML is a meta-format, not a format. Using XML would require defining an appropriate profile (which would need to be highly restricted to meet the simplicity criterion). The major drawback of XML is that specifying the profile almost inevitably drags one into the mind-bending hell of XML Schema. (There are other schema languages, such as RELAX NG, but they involve similar complexity and have even less traction.)

There are also more esoteric formats such as NetCDF, but NetCDF fails the simplicity test, and it's unclear how well it supports geometry types.


Jody Garnett said...

Interested in your take on GeoPackage? It is being picked up by a lot of applications, even to the point of replacing the shapefile as the default format in QGIS.

Also, like the shapefile, it can be "extended": not with sidecar files, but with profiles documenting additional tables to be carried around in the SQLite database. One interesting extension that has come up is including styling information, for a bit of interoperability across apps...

Dr JTS said...

I hate that it is based on SQLite. Seems as bad as proprietary to me.

Paul Austin said...

My favourites are:

TSV with WKT geometries (so simple that Excel can open it). The first row serves as a simple schema.

Plain JSON if you want data with objects, arrays and properties. Replace most uses of XML with this.
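A minimal Python sketch of the TSV-with-WKT idea, using only the standard `csv` module. The field names and features (`id`, `name`, `geometry`, "Well A", "Lot 7") are hypothetical, made up for illustration. The header row is the only schema: it names the columns, but carries no datatypes.

```python
import csv
import io

# Hypothetical field names and features, for illustration only.
FIELDS = ["id", "name", "geometry"]

features = [
    {"id": 1, "name": "Well A", "geometry": "POINT (10.5 -3.25)"},
    {"id": 2, "name": "Lot 7", "geometry": "POLYGON ((0 0, 4 0, 4 3, 0 0))"},
]

def write_tsv(rows, out):
    # The header row carries the field names: the only "schema" this format has.
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

def read_tsv(src):
    # Read rows back as dicts keyed by the header row; every value comes back as a string.
    return list(csv.DictReader(src, delimiter="\t"))

buf = io.StringIO()
write_tsv(features, buf)
print(buf.getvalue(), end="")

buf.seek(0)
back = read_tsv(buf)
print(back[0])  # {'id': '1', 'name': 'Well A', 'geometry': 'POINT (10.5 -3.25)'}
```

Note that the `id` comes back as the string `'1'` rather than the integer `1`, and the WKT still has to be handed to a geometry parser (for example Shapely's `shapely.wkt.loads`) before it is usable as geometry.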

Dr JTS said...

Paul, what spatial format would you use with plain JSON?

I think a SOSS ideally provides an (optional, easy-to-read!) schema, including datatypes. So TSV and JSON would need to be augmented with a schema. Could be a sidecar file.
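One way the sidecar idea could look, sketched in Python under loud assumptions: the schema file name (`wells.schema.json`), its JSON layout, and the type names (`integer`, `double`, `string`, `geometry`) are all hypothetical, not an existing standard. The reader checks the TSV header against the sidecar and converts each column to its declared type.

```python
import csv
import io
import json

# Contents of a hypothetical sidecar file such as "wells.schema.json".
# This layout is an assumption for illustration, not an existing standard.
SCHEMA_JSON = """
{
  "fields": [
    {"name": "id", "type": "integer"},
    {"name": "depth", "type": "double"},
    {"name": "name", "type": "string"},
    {"name": "geometry", "type": "geometry", "encoding": "WKT"}
  ]
}
"""

# Map schema type names to Python conversions; geometry stays as WKT text here.
CASTS = {"integer": int, "double": float, "string": str, "geometry": str}

def read_typed_tsv(src, schema):
    fields = schema["fields"]
    reader = csv.reader(src, delimiter="\t")
    header = next(reader)
    expected = [f["name"] for f in fields]
    if header != expected:
        # Refuse data whose header row disagrees with the sidecar schema.
        raise ValueError(f"header {header} does not match schema {expected}")
    return [
        {f["name"]: CASTS[f["type"]](value) for f, value in zip(fields, row)}
        for row in reader
    ]

data = "id\tdepth\tname\tgeometry\n1\t12.75\tWell A\tPOINT (10.5 -3.25)\n"
schema = json.loads(SCHEMA_JSON)
records = read_typed_tsv(io.StringIO(data), schema)
print(records[0])  # {'id': 1, 'depth': 12.75, 'name': 'Well A', 'geometry': 'POINT (10.5 -3.25)'}
```

Keeping the schema in a separate, human-readable file leaves the data file itself as plain TSV, so tools that ignore the sidecar can still open it.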

About Hydrology said...

Apache Arrow?