It's crazy that in the 21st century the most common de facto standard spatial format is based on 30-year-old technology, is proprietary, and has silly limitations such as 10-character uppercase attribute names.
I'm talking, of course, about shapefiles.
Surely we can do better than this?!
Now there are actually a few things that shapefiles get right. For instance, the shapefile's simplistic tabular data model gets two full marks for being - simple and tabular! Hierarchical data models are very cool and highly expressive, but overkill and too complex for 80% of the use cases out there.
Another useful feature of shapefiles is that they store floating point data with full precision - i.e. in binary. Representing binary floating point numbers as textual decimal values is inherently lossy, and causes all kinds of subtle and annoying problems. (I'm always surprised that this doesn't crop up more often as a serious limitation of GML.)
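To make the precision point concrete, here is a minimal sketch comparing a binary round trip with a text round trip. It assumes a hypothetical text writer that emits six decimal places; that setting is an illustration, not something any particular format mandates.

```python
import struct


def binary_roundtrip(value: float) -> float:
    """Store a double as its 8 raw little-endian bytes and read it back."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


def text_roundtrip(value: float, decimals: int = 6) -> float:
    """Store a double as decimal text with a fixed number of decimals
    (the default of 6 is an assumed writer setting) and parse it back."""
    return float(f"{value:.{decimals}f}")


x = 0.1 + 0.2  # stored as 0.30000000000000004 in IEEE 754 double precision

print(binary_roundtrip(x) == x)  # True: the bytes are preserved exactly
print(text_roundtrip(x) == x)    # False: "0.300000" parses back to 0.3
```

The loss here comes from the assumed six-decimal writer. A writer that emits 17 significant digits would round-trip a double exactly, but nothing in a text format forces writers to do so.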
So what are the current leading contenders for a SOSS format? Here's an opinionated list, with pros and cons:
Format | Pro | Con |
Shapefile | tabular, lossless numerics | proprietary, antiquated, limited |
GML | open, flexible | complex to model and parse, lossy numerics, poor schema handling |
KML | relatively simple, well documented | proprietary, lossy, limited attribute handling, designed for presentation |
GeoRSS | | not appropriate as a full-featured SOSS, lossy |
GeoJSON | | too tied to JavaScript, lossy, no schema standard |
YAML | | needs a spatial profile |
Conspicuous by its absence on this list is XML. In fact XML is a meta-format, not a format. To utilize XML would require defining an appropriate profile (which would need to be highly restricted to meet the criterion of simplicity). The major drawback of XML is that specifying the profile almost inevitably drags one into the mind-bending hell of XML Schema. (There are other schema languages, such as RelaxNG, but they involve similar complexity and have even less traction.)
There are also more esoteric formats such as NetCDF, but it fails the simplicity test, and it's unclear how well it supports Geometry types.
Interested in your take on GeoPackage. It is being picked up by a lot of applications, even to the point of replacing shapefile as the default for QGIS.
Also, like shapefile, it can be "extended", not with sidecar files but with profiles documenting additional tables to be carried around in the SQLite database. One interesting one that has come up is including styling information for a bit of interoperability across apps...
I hate that it is based on SQLite. Seems as bad as proprietary to me.
My favourites are
TSV with WKT geometries (So simple Excel can open it). Simple schema is the first row.
Plain JSON if you want data with objects, arrays and properties. Replace most uses of XML with this.
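As a rough illustration of the TSV-with-WKT idea, the sketch below writes and reads a tab-separated file whose first row serves as the schema. The column names, sample records, and the `parse_point` helper are all hypothetical; the helper handles only `POINT` and is not a general WKT parser.

```python
import csv
import io

# Hypothetical sample data; the header row doubles as the (untyped) schema.
rows = [
    {"id": "1", "name": "Main St", "geometry": "LINESTRING (0 0, 10 5)"},
    {"id": "2", "name": "City Hall", "geometry": "POINT (3.5 7.25)"},
]


def write_tsv(records, fieldnames):
    """Serialize records as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def read_tsv(text):
    """Parse tab-separated text; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def parse_point(wkt):
    """Parse only 'POINT (x y)'; other geometry types need a real WKT parser."""
    prefix = "POINT ("
    if not (wkt.startswith(prefix) and wkt.endswith(")")):
        raise ValueError(f"not a simple POINT: {wkt!r}")
    x, y = wkt[len(prefix):-1].split()
    return float(x), float(y)


text = write_tsv(rows, ["id", "name", "geometry"])
records = read_tsv(text)
print(records[1]["name"], parse_point(records[1]["geometry"]))
```

Because the delimiter is a tab, the commas inside WKT coordinate lists need no quoting. Note that every attribute comes back as a string, since a header row alone carries names but no datatypes.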
Paul, what spatial format would you use with plain JSON?
I think a SOSS ideally provides an (optional, easy-to-read!) schema, including datatypes. So TSV and JSON would need to be augmented with a schema. Could be a sidecar file.
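One way a sidecar schema could work is sketched below. The JSON layout, the type names (`integer`, `double`, `string`, `wkt`), and the idea of a file sitting next to the data are all assumptions made for illustration; no existing standard is implied.

```python
import json

# Hypothetical sidecar content, e.g. saved alongside the data file.
SCHEMA_JSON = """
{
  "fields": [
    {"name": "id", "type": "integer"},
    {"name": "name", "type": "string"},
    {"name": "height", "type": "double"},
    {"name": "geometry", "type": "wkt"}
  ]
}
"""

# Assumed mapping from schema type names to Python converters.
# WKT is kept as text here; parsing it is left to a geometry library.
CASTS = {"integer": int, "double": float, "string": str, "wkt": str}


def apply_schema(record, schema):
    """Convert the string values of one record to the declared datatypes."""
    typed = {}
    for field in schema["fields"]:
        cast = CASTS[field["type"]]
        typed[field["name"]] = cast(record[field["name"]])
    return typed


schema = json.loads(SCHEMA_JSON)
raw = {"id": "7", "name": "Tower", "height": "52.5", "geometry": "POINT (1 2)"}
typed = apply_schema(raw, schema)
print(typed)
```

Keeping the schema optional means a reader without the sidecar still gets usable string values, while a reader with it gets typed ones.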
Apache Arrow?