Train times v. house prices: the commuter belt, on a graph

We’re house-hunting. And for me, like most coders, house-hunting involves lots and lots and lots of screen-scraping.

As well as crawling Rightmove listings, I’ve been looking at transport and house-price data. Specifically, I’ve scraped travel times to London by train versus house prices, to examine the theory that houses get much cheaper once you escape the commuter belt.

To test this, I gathered mean journey times to London from Traintimes for every railway station in the UK, and mean asking prices for 3-bed houses near each station from Nestoria. Here’s the graph of all stations, with a moving-average line added:

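[Interactive graph: mean asking price for a 3-bed house against mean journey time to London, one point per station, with a moving-average line]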

Mouse over the graph to see data for individual stations. Or type a station name to highlight it on the graph:  

Thoughts on the graph

  • The sharp initial drop, up to about 30 minutes, must show just how much extra you pay to live in zone 2 rather than zone 6 of London itself. Yikes.
  • Prices do start dropping more steeply about 70 minutes from London, which probably marks the edge of the commuter belt.
  • Once you get to about 150 minutes, prices flatten. Except…
  • …There’s a distinct “Edinburgh bump” at about 270 minutes from London, which I wasn’t expecting at all.
  • There are a few high outliers, presumably where a mansion has skewed the average price. (It’s difficult to tell from the Nestoria data.)
  • But there’s a striking baseline below which house prices near a station never fall. Actually, pretty much the closest thing to an outlier on the downside is poor old Corby.

About the data

For clarity, the graph excludes London stations, and the long tail of stations that are 400-900 mins from the capital, mostly in the Scottish Highlands.

This is roughly what I did:

  • Find and geocode the 2500+ stations in England, Scotland and Wales, from this Guardian version of Office of Rail Regulation station usage data.
  • For each station, find the mean travel time for the first 5 journeys to London after 8am on a weekday, scraped from TrainTimes, Matthew Somerville’s accessible version of National Rail Enquiries.
  • For each station, find the mean asking price for a 3-bed house within 2km in the past 6 months, from the Nestoria API. (Nestoria shows listing prices, whereas Zoopla shows transaction prices, so the Nestoria data may contain duplicates and is probably less accurate – but Zoopla isn’t granular enough to search just for 3-bed houses.)
  • Plot the moving average price, with a frame of 100 datapoints.
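
The moving-average step can be sketched roughly as follows. This is a minimal illustration rather than the code linked below: it assumes the stations are sorted by journey time and that the "frame of 100 datapoints" is a simple sliding window over that ordering. The function name and the sample figures are made up for the example.

```python
from statistics import mean


def moving_average(points, window=100):
    """Smooth (journey_minutes, price) pairs with a sliding window.

    Assumption: points are ordered by journey time and each output value
    is the plain mean of `window` consecutive stations.
    """
    ordered = sorted(points)  # sort by journey time, then by price
    smoothed = []
    # range() is empty when there are fewer points than the window size
    for start in range(len(ordered) - window + 1):
        chunk = ordered[start:start + window]
        smoothed.append((
            mean(minutes for minutes, _ in chunk),
            mean(price for _, price in chunk),
        ))
    return smoothed


if __name__ == "__main__":
    # Made-up figures, purely to show the shape of the input and output.
    sample = [(40, 200), (10, 500), (30, 300), (20, 400)]
    print(moving_average(sample, window=2))
```

A window of 100 stations smooths out single-station outliers, at the cost of losing the first and last few dozen points of the curve.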

This is the code I used (on GitHub), and the resulting raw data (in Fusion Tables). The next logical step would be to plot distances against house prices, I guess. If I’ve missed anything, let me know.

And with that, back to the screen-scrapers, the mortgage brokers and – God help us – the estate agents.





23 thoughts on “Train times v. house prices: the commuter belt, on a graph”

  1. Nice move. You could also scrape London Underground info using the Travel Planner on the TfL gov website – so for example you have Harrow and Chorleywood, but the Met Line covers the same area, serving many more towns (the Met Line also publishes a timetable, but other tube lines don’t). And you’d get bus journeys too!

  2. First I’d like to say that this is very cool.

    Second, what it shows is interesting. Take three stops on the same line: Hook, Basingstoke and Overton. Of the three, I think locals would say that Overton is the most desirable, and it has fast trains to London, but it’s the furthest from London. Hook is the closest to London and has some nice outlying houses, but it isn’t as desirable as Overton and it’s on the slower stopping service to London. Basingstoke is the town in the middle. It has more houses, more trains and lots of fast trains, BUT it’s Basingstoke – locally known as Boringstoke or Basingrad – and your data does show that house prices are noticeably down compared with the two villages either side…

  3. That is very useful.

    Stoke-on-Trent comes out quite well down there with Wales and the suburbs of Birmingham, but since you can get 3 bedrooms from 35,000 and Virgin trains are faster at 93 minutes than the slower but very cheap London Midland, I hope we’ll see you in a lovely spacious terrace in bohemian Burslem or pretty Penkhull soon :)

  4. Love it! Very interesting. A further modification (not sure how…) might reflect fares. e.g. I live between Chesterfield and Matlock. Trains from Matlock are much slower, and house prices a bit higher, but train fares are also cheaper from there, and parking at the station is also cheaper in Matlock.

  5. A very impressive demonstration of the value gained from mixing several raw data sources to deliver a valuable end result, and also a demonstration of how house hunting is morphing into a science ;-)

    I wonder if the Edinburgh bump is an indication that perhaps the distance calculation should be from the nearest large conurbation with a major rail terminal – or will that just give you a London bump at the other end of the graph?

  6. This is very cool. My love of stats knows no bounds. These graphs are great.

    But I didn’t learn much from this. I have issues with your main idea – London isn’t the center. Only a tiny number of people commute from Scotland to London often. And the housing regulations are different there too – a house often goes for much more than the asking price, unlike here in England.

    I suspect the standard deviation here is big and shows only a fleeting likeness of distance to house price.

    Gareth

  7. Very nice – although I think the “High Speed” line could do with some manual tweakage. From Ashford International, I can get into London within 38 minutes on the High Speed, but if I take the slow train it is 1hr34min (fast trains are twice an hour, slow ones appear once an hour). Could you split “Ashford International – HS1” and “Ashford International – standard” into two entries (same for Stratford Int, Sandwich and other HS ones)?

    Just trying to increase the accuracy: the main reason we moved from Lewisham to Ashford is that we could get into central London faster, but be able to afford somewhere a lot nicer/larger.

  8. Great work. I wonder if another key factor is train frequency. There is a big difference between a one-hour journey departing every 10 mins and an hourly service.

    I would add half of the mean travel interval to the journey time to allow for this. Might explain some of the bumps.

    1. Hi Simon – sorry about that, Google Fusion Tables had changed their API. I’ve just fixed it, so you should be able to see the graph now.

  9. Shows how much better value SE London is! Zone 2/3 places like Blackheath, Hither Green and New Cross Gate are all below the line.

  10. Beautiful and slightly sad – I’m betting there’s a German word for that. Cost of housing vs cost of season ticket would be very interesting too, as my bitter experience is that in a lot of places with (relatively) affordable house prices, this is offset by insane season ticket prices, and any saving you make on the mortgage is handed straight to your local neighbourhood train operating company.

  11. Interesting and nicely done, but rather London-centric. Once you get further than 2 hours away from London (perhaps less) other factors will play a bigger part in house prices than time to London. The “Edinburgh bump” wouldn’t surprise anyone who knows the city.

  12. You could work out the edge of the commuter belt a bit more accurately if you use trains that get in to London at 8:30, rather than trains that leave the town at 8. If there aren’t any that leave on the same day, then it’s not commutable anyway.
