Bikeshare

London’s Cycle Hire scheme has been a roaring success and continues to grow, with new stations being added all the time. This tutorial will produce a visualisation of journey times from the central point (well, approximately) of the bike station network to all other stations.

This is made possible by the provision of an open-access instance of OSRM by the lovely people at Mapzen. I won’t spend too much time on what OSRM is or how it works; suffice to say that it’s an open-source routing engine that uses OpenStreetmap, and that the Mapzen instance provides walking, cycling, and public transit routing data via HTTP. Hurrah!

The code for this tutorial is available here as an IPython Notebook

Package installation

This tutorial uses Python 2.7.x, and the following non-stdlib packages are required:

  • IPython
  • Pandas
  • Numpy
  • Matplotlib
  • Basemap
  • Shapely
  • Fiona
  • Descartes
  • Requests

(The following is a cut-and-paste from a previous article – it all still applies) The installation of some of these packages can be onerous, and requires a great number of third-party dependencies (GDAL & OGR, C & FORTRAN77 (yes, really) compilers). If you’re experienced with Python package installation and building software from source, feel free to install these dependencies (if you’re using OSX, Homebrew and/or Kyngchaos are helpful, particularly for GDAL & OGR), before installing the required packages in a virtualenv, and skipping the rest of this section.

For everyone else: Enthought’s Canopy (which is free for academic users) provides almost everything you need, with the exception of Descartes and PySAL. You can install them into the Canopy User Python quite easily, see this support article for details.

Obtaining a basemap

We’re going to be working with basemaps from Esri Shapefiles, and we’re going to plot data on a map of London. I’ve created a shapefile for this, and it’s available in .zip format here, under Crown Copyright. Download it, and extract the files into a directory named data, under your main project directory. You’ll also need a basemap of the Thames. Get it here, also under Crown Copyright, and put in the data folder.

Obtaining some data

We’re going to need point data relating to the bike rental station locations. This is available from TfL to registered developers, provided as XML. I’ve made the November dataset available here. Save it to the data folder.

Package imports

from __future__ import unicode_literals
import requests
import numpy as np
import pandas as pd
from mpl_toolkits.basemap import Basemap
import matplotlib.cm as cm
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from matplotlib.colors import Normalize
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
from shapely.geometry import Polygon, LineString, MultiPoint
from shapely.ops import unary_union
from descartes import PolygonPatch
import math
import fiona
from itertools import chain
from lxml import etree
mpl.rcParams['axes.grid'] = False

%matplotlib inline

Creating Basemap instances

We’re going to create two Basemap instances for plotting: one of London (using the GLA boundary), and one of the Thames. First, we’re going to open our London shapefile, and get some data out of it, in order to set up our main basemap:

shp = fiona.open('data/london_wards.shp')
crs_data = shp.crs
bds = shp.bounds
shp.close()
extra = 0.01
ll = (bds[0], bds[1])
ur = (bds[2], bds[3])
coords = list(chain(ll, ur))
w, h = coords[2] - coords[0], coords[3] - coords[1]

We’ve done two things here:

  1. extracted the map boundaries
  2. Calculated the extent, width and height of our basemap

We’re ready to create a Basemap instance, which we can use to plot our maps on.

m = Basemap(
    projection='tmerc',
    lon_0 = -2.,
    lat_0 = 49.,
    ellps = 'WGS84',
    llcrnrlon=coords[0] - extra * w,
    llcrnrlat=coords[1] - extra + 0.01 * h,
    urcrnrlon=coords[2] + extra * w,
    urcrnrlat=coords[3] + extra + 0.01 * h,
    lat_ts=0,
    resolution='i',
    suppress_ticks=True)

m.readshapefile(
    'data/london_wards',
    'london',
    color='none',
    zorder=2)

# cascade ward polygons together
london_poly = unary_union([Polygon(xy) for xy in m.london])

# plot thames
thames = m.readshapefile(
    'data/thames_wgs84',
    'thames',
    color='none',
    zorder=4)

I’ve chosen the transverse mercator projection, because it exhibits less distortion over areas with a small east-west extent. This projection requires us to specify a central longitude and latitude, which I’ve set as -2, 49. Note that I’ve also created a Thames basemap, because we’re going to need the polygons in it later.

Some Utility Functions

Next, we’re going to define a function which retrieves journey data from the OSRM instance.

def query_travel_time(start, end, method):
    """
    Get a travel time back from MapZen's OSRM
    start, end: lon, lat tuples
    method: foot, car, bicycle
    returns travel time, in seconds
    TODO: bounds checking for coords
    """
    allowed = ('foot', 'car', 'bicycle')
    if method not in allowed:
        raise Exception(
            "Unknown method. Must be one of %s. Christ." % ', '.join(allowed))
    endpoint = 'http://osrm.mapzen.com'
    method = '/{m}/viaroute'.format(m=method)
    # should be properly encoding second loc, but dict keys are unique!
    # reverse lon, lat because ugh    
    params = {'loc': '{1},{0}&loc={3},{2}'.format(*chain(start, end))}
    req = requests.get(endpoint + method, params=params)
    try:
        req.raise_for_status()
    except requests.exceptions.HTTPError:
        return np.nan
    if req.json()['status'] == 207:
        return np.nan
    return req.json()['route_summary']['total_time']

Instead of just falling over, this function will return numpy NaN values for missing data. This is a bit easier to work with in Pandas.

We’re also going to define a simple utility function for projecting non-zero-length linestrings into Basemap map projection coordinates:

def project_linestring(ls, inverse=False):
    """ return a linestring projected into map coordinates """
    if not pd.isnull(ls):
        return LineString(zip(*m(*zip(*ls.coords))))
    else:
        return np.nan

Processing Station Data

Now, we want to create a Pandas DataFrame which holds our station location data as floats.

# parse XML into dict
tree = etree.parse("data/bike_stations.xml")
root = tree.getroot()

output = dict()
output['name'] = []
output['lon'] = []
output['lat'] = []

for each in root.xpath('station'):
    output['name'].append(each[1].text)
    output['lon'].append(each[4].text)
    output['lat'].append(each[3].text)

stations = pd.DataFrame(output)
stations[['lon', 'lat']] = stations[['lon', 'lat']].astype(float)

We can now easily work out the centroid!

centroid = zip(*m(*zip(*MultiPoint(stations[['lon', 'lat']].values).convex_hull.centroid.coords)))[0]
centroid
(27087.88141094109, 25298.94044237616)

This gives us a problem, though: this centroid is located in the Thames. Let’s shift it a few hundred feet down-river, onto the approximate middle of Westminster Bridge

centroid = m(-0.12204, 51.50083)
m(*centroid, inverse=True)
(-0.12203999999842599, 51.500829999995766))

We’re now going to project our lon/lat point values into Basemap’s map projection coordinates

# project lon / lat coords
stations['projected_lon'], stations['projected_lat'] = m(*(stations["lon"].values, stations["lat"].values))

We now have enough data to perform some elementary calculations. For instance, we can calculate each station’s distance from our fake centroid:

# calculate station distance from centroid using Pythagorean Theorem
stations['centroid_distance'] = stations.apply(
    lambda x: math.sqrt(((abs(centroid[0] - x['projected_lon']) ** 2) +
                         (abs(centroid[1] - x['projected_lat']) ** 2))), axis=1)

Retrieving OSRM Data

Here, we’re defining a simple function which we’re going to apply to our DataFrame. It simply calls our OSRM-retrieval function (defined earlier) with the specified column value and travel mode for each row in our DataFrame. Because OSRM expects lon/lat values, we have convert them back from projected coordinates using the inverse=True flag. Finally, we divided the retrieved times by 60, because travel times in minutes are more easily understood.

def travel_time(df, start):
    """ return travel times between a given centroid and all stations in the network """
    return query_travel_time(start, (df['lon'], df['lat']), 'bicycle')


stations['travel_time'] = stations.apply(travel_time, args=(m(*centroid, inverse=True),), axis=1)
# travel time in minutes is more useful
stations['travel_time'] /=  60.

Bear in mind that Mapzen are providing this OSRM instance free of charge, and that this is quite a substantial operation, involving several hundred queries.

Thus, I’ve provided a CSV of retrieved data, which you can use instead. Save it to the data folder, and run this line of code instead of the previous cell:

stations = pd.read_csv('data/stations_travel_time.csv', index_col=0)

Let’s do a little post-processing

# remove any empty values
stations = stations.dropna()
# replace travel time of < 1 with 1. minutes
stations.loc[stations['travel_time'] <= 1., 'travel_time'] = 1.

# this station is closest to the centroid
stations.iloc[stations['centroid_distance'].idxmin()]

Plotting the Data

There’s no point in pretending that precise plotting using Matplotlib is easy. Luckily, I’ve done it for you.

What we’re doing here is conceptually simple: We’re plotting a histogram of the journey times, and then imposing a colour map onto that histogram. The colourmap is normalised by the journey time, using a diverging colour map (meaning intermediate values have a neutral colour; low values are cool, high values are hot).
We then plot our London shapefile on an inset axis, before making a scatter plot of the bike station locations on this inset map. We then impose the same colour map defined earlier upon the scattered points.
Finally, we perform a set-theoretic intersection on the map with our river polygon, in order to make the Thames properly transparent.

plt.clf()
fig = plt.figure(figsize=((16, 12)))
ax = fig.add_subplot(111, axisbg='w', frame_on=False)

# define a colourmap
cm = plt.cm.get_cmap('RdBu_r')
# plot a histogram
n, bins, patches = ax.hist(
    stations['travel_time'],
    bins=15, orientation='horizontal',
    color='#555555',
    ec='#555555',
    lw=1.)

# manually impose the colour map onto the histogram
bin_centers = 0.5 * (bins[:-1] + bins[1:])
normed = Normalize()
for c, p in zip(normed(bin_centers), patches):
    plt.setp(p, 'facecolor', cm(c))

# fix ticks and grids
axh = plt.gca()
axh.grid(b=None)
xticks = ax.xaxis.get_major_ticks()
for tick in xticks:
    tick.label.set_fontsize(12) 
xticks[0].label1.set_visible(False)
yticks = ax.yaxis.get_major_ticks()
for tick in yticks:
    tick.label.set_fontsize(12) 
ax.grid(b=None)

ax.set_xlabel('Number of Stations', size=16)
ax.set_ylabel('Travel Time (minutes)', size=16)


plt.title("Travel Time from Westminster Bridge", size=18)
# we need to reproduce the OS Data copyright licence
smallprint = ax.text(
    1.03, 0,
    u'Contains Ordnance Survey Data\n© Crown copyright and database right 2015',
    ha='right', va='bottom',
    size=8,
    color='#555555',
    transform=ax.transAxes,
)
# map inset
# width = % of parent_bbox
axins = inset_axes(
    ax,
    width="45%",
    height=3.5,
    loc=1)
axins.axis[:].toggle(ticklabels=False)
axins.set_axis_off()
m.scatter(
    stations['projected_lon'],
    stations['projected_lat'],
    c=stations['travel_time'],
    color='#FF0080',
    edgecolor='none',
    alpha=.8,
    s=2.5,
    cmap='RdBu_r',
    ax=axins,
    vmin=stations['travel_time'].min(), vmax=stations['travel_time'].max(), zorder=5)

# exaggerate size of the centroid so it's easily visible
m.scatter(
    *centroid,
    s=25., edgecolor='#000000', alpha=1.,
    color='#63B8FF', zorder=5, ax=axins)

# this makes the Thames transparent 
lp = PatchCollection(
    [PolygonPatch(poly) for poly in
        london_poly.difference(Polygon(m.thames[0]).buffer(0))],
    color='#555555', lw=.0001, alpha=1., zorder=3, match_original=False)

axins.add_collection(lp)
plt.savefig(
    "data/inset_demo.png",
    format="png",
    bbox_inches='tight',
    transparent=True,
    alpha=True, dpi=300)

plt.show()