Friday, 7 October 2011

Extracting GTFS data using JEQL

Once you understand the General Transit Feed Specification (GTFS) data model, the next step is to transform the data into a format usable in GIS tools.  As usual, that means shapefiles (sigh...).  Here's a JEQL script that extracts the geometry of all unique trips (aka "route patterns"), as well as the stop locations.

Some interesting aspects of the script:
  • the GTFS spatial data is in geographic coordinates (lat/long). The script projects this into BC-Albers (EPGS code 3005) using the CRS.project() function.  Many other coordinate systems could be used (or none - e.g. for output to KML)
  • the geomConnect()aggregate function connects Points into LineStrings

Extracts Trips (with shapes) and stops from from a GTFS dataset. 
Includes shape, trip and route information.
Spatial data is converted from Geographic to BC-ALbers

Usage Notes:
- indir: directory containing the input GTFS files

indir = "";  outdir = "";

//  Extract Trips with unique linework, with route information
CSVReader tshape hasColNames: file: indir+"shapes.txt";
CSVReader ttrip hasColNames: file: indir+"trips.txt";
CSVReader troutes hasColNames: file: indir+"routes.txt";  
Mem troutes;

crs_BCAlbers = "EPSG:3005";

//--- extract unique shapes
trips = select distinct route_id, trip_headsign, shape_id 
    from ttrip;

//--- extract trip pts and order them in sequence
tpt = select shape_id, Val.toInt(shape_pt_sequence) seq,
                                 Val.toDouble(shape_pt_lat) ) pt
	from tshape
	order by shape_id, seq;

//--- connect pts into lines and project
tline = select shape_id, 
     CRS.project(Geom.toMulti(geomConnect(pt)), crs_BCAlbers) geom
	from tpt
	group by shape_id;
//--- join route info to trip shape
trouteLines = select r.route_id, r.agency_id, 
    r.route_short_name, r.route_long_name,
    r.route_type, r.route_url,
    trip_headsign, l.*
        from tline l
        join trips t on t.shape_id == l.shape_id
        join troutes r on r.route_id == t.route_id;

ShapefileWriter trouteLines file: outdir+"gtfs_trips.shp";

//  Extract Stop points
CSVReader tstopsRaw hasColNames: file: indir+"stops.txt";

tstop = select stop_id,stop_code,stop_name,stop_desc,
	    Val.toDouble(stop_lat) ), crs_BCAlbers) pt,
        from tstopsRaw;

ShapefileWriter tstop file: outdir+"gtfs_stops.shp";

Using this script on the GTFS data for Metro Vancouver (available from TransLink here) produces a dataset that looks like this:


1 comment:

Matt C said...

Another option is to use GTFSDB:

It can load directly to a PostGIS database if you enable the spatial features.