We’re house-hunting. And for me, like most coders, house-hunting involves lots and lots and lots of screen-scraping.
As well as crawling Rightmove listings, I’ve been looking at transport and house-price data. Specifically, I’ve scraped travel times to London by train versus house prices, to examine the theory that houses get much cheaper once you escape the commuter belt.
To test this, I gathered mean journey times to London from Traintimes for every railway station in the UK, and mean asking prices for 3-bed houses near each station from Nestoria. Here’s the graph of all stations, with a moving-average line added:
Mouse over the graph to see data for individual stations. Or type a station name to highlight it on the graph:
Thoughts on the graph
- The sharp initial drop, up to about 30 minutes, must show just how much extra you pay to live in zone 2 rather than zone 6 of London itself. Yikes.
- Prices do start dropping more steeply about 70 minutes from London, which probably marks the edge of the commuter belt.
- Once you get to about 150 minutes, prices flatten. Except…
- …There’s a distinct “Edinburgh bump” at about 270 minutes from London, which I wasn’t expecting at all.
- There are a few high outliers, presumably where a mansion has skewed the average price. (It’s difficult to tell from the Nestoria data.)
- But there’s a striking baseline below which house prices near a station never fall. Actually, pretty much the closest thing to an outlier on the downside is poor old Corby.
About the data
For clarity, the graph excludes London stations, and the long tail of stations that are 400-900 mins from the capital, mostly in the Scottish Highlands.
This is roughly what I did:
- Find and geocode the 2500+ stations in England, Scotland and Wales, from this Guardian version of Office of Rail Regulation station usage data.
- For each station, find the mean travel time for the first 5 journeys to London after 8am on a weekday, scraped from TrainTimes, Matthew Somerville’s accessible version of National Rail Enquiries.
- For each station, find the mean asking price for a 3-bed house within 2km in the past 6 months, from the Nestoria API. (Nestoria shows listing prices, rather than transaction prices like Zoopla, so it may contain duplicates and is probably less accurate – but Zoopla isn’t granular enough to search just for 3-bed houses.)
- Plot the moving average price, with a frame of 100 datapoints.
This is the code I used (on Github), and the resulting raw data (in Fusion Tables). The next logical step would be to plot distances against house prices, I guess. If I’ve missed anything, let me know.
And with that, back to the screen-scrapers, the mortgage brokers and – God help us – the estate agents.