We’re house-hunting. And for me, like most coders, house-hunting involves lots and lots and lots of screen-scraping.
As well as crawling Rightmove listings, I’ve been looking at transport and house-price data. Specifically, I’ve scraped travel times to London by train versus house prices, to examine the theory that houses get much cheaper once you escape the commuter belt.
To test this, I gathered mean journey times to London from TrainTimes for every railway station in Great Britain, and mean asking prices for 3-bed houses near each station from Nestoria. Here’s the graph of all stations, with a moving-average line added:
Mouse over the graph to see data for individual stations. Or type a station name to highlight it on the graph:
Thoughts on the graph
- The sharp initial drop, up to about 30 minutes, must show just how much extra you pay to live in zone 2 rather than zone 6 of London itself. Yikes.
- Prices do start dropping more steeply about 70 minutes from London, which probably marks the edge of the commuter belt.
- Once you get to about 150 minutes, prices flatten. Except…
- …There’s a distinct “Edinburgh bump” at about 270 minutes from London, which I wasn’t expecting at all.
- There are a few high outliers, presumably where a mansion has skewed the average price. (It’s difficult to tell from the Nestoria data.)
- But there’s a striking baseline below which house prices near a station never fall. Actually, pretty much the closest thing to an outlier on the downside is poor old Corby.
About the data
For clarity, the graph excludes London stations, and the long tail of stations that are 400-900 mins from the capital, mostly in the Scottish Highlands.
This is roughly what I did:
- Find and geocode the 2500+ stations in England, Scotland and Wales, from this Guardian version of Office of Rail Regulation station usage data.
- For each station, find the mean travel time for the first 5 journeys to London after 8am on a weekday, scraped from TrainTimes, Matthew Somerville’s accessible version of National Rail Enquiries.
- For each station, find the mean asking price for a 3-bed house within 2km in the past 6 months, from the Nestoria API. (Nestoria shows listing prices, rather than transaction prices like Zoopla, so it may contain duplicates and is probably less accurate – but Zoopla isn’t granular enough to search just for 3-bed houses.)
- Plot the moving average price, with a frame of 100 datapoints.
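The last step above can be sketched roughly as follows. This is a minimal illustration in Python and not the code actually used for the graph: the function name `moving_average`, the `(minutes, price)` tuple layout and the choice of a trailing (rather than centred) frame are all assumptions, since the post only says the frame is 100 datapoints.

```python
from collections import deque


def moving_average(points, window=100):
    """Smooth station prices over a sliding frame of `window` stations.

    points: iterable of (journey_minutes, mean_price) tuples, one per station
            (hypothetical layout, chosen for this sketch).
    Returns a list of (journey_minutes, smoothed_price) pairs, one for each
    station that has a full frame of `window` stations at or before it.
    """
    ordered = sorted(points, key=lambda p: p[0])  # order stations by journey time
    frame = deque()   # prices currently inside the frame
    total = 0.0       # running sum of the prices in the frame
    smoothed = []
    for minutes, price in ordered:
        frame.append(price)
        total += price
        if len(frame) > window:
            total -= frame.popleft()  # drop the oldest price from the frame
        if len(frame) == window:
            smoothed.append((minutes, total / window))
    return smoothed
```

With a trailing frame, the first 99 stations produce no smoothed point, so the line starts a little to the right of the fastest stations; a centred frame would shift it left by half a frame.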
This is the code I used (on GitHub), and the resulting raw data (in Fusion Tables). The next logical step would be to plot distances against house prices, I guess. If I’ve missed anything, let me know.
And with that, back to the screen-scrapers, the mortgage brokers and – God help us – the estate agents.
Very interesting graph.
Are you able to calculate the Spearman rank correlation coefficient on this data?
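For anyone wanting to try this on the raw data, here is a minimal sketch of Spearman’s rank correlation in plain Python. The function names and the two-list input are assumptions made for illustration, and no result for the actual dataset is implied. It ranks both variables (tied values share the mean of their ranks) and then takes the ordinary Pearson correlation of the ranks.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Here `xs` would be the journey times and `ys` the mean asking prices, in the same station order. A value near -1 would mean prices fall steadily as journey time rises, whatever the shape of the curve.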
Nice move, you could also scrape London Underground info using the Travel Planner on the TfL gov website – so for example you have Harrow and Chorleywood, but the Met Line covers the same area servicing many more towns (Met Line also publishes a timetable but other tube lines don’t do so). And you’d get bus journeys too !
First I’d like to say that this is very cool.
Second it’s interesting what it shows, three stops on the same line, Hook, Basingstoke and Overton are quite interesting. Of the three I think locals would say that Overton is the more desirable and it has fast trains to London but it’s the furthest from London. Hook is the closest to London and has some nice outlying houses but isn’t as desirable as Overton and it’s on the slower stopping service to London. Basingstoke is the town in the middle, it has more houses, more trains and lots of fast trains, BUT it’s Basingstoke – locally known as Boringstoke or Basingrad and your data does show that house prices are noticeably down compared with the two villages either side…
Great site. What about adding Winchester to your list? It’s got a bit of everything: a beautiful ancient city, a stunning cathedral, loads going on, great schools and only a 55-minute commute from Waterloo. I’m selling my double-sized house here near the station for family reasons. It’s on with Charters. Sorry to advertise on your site, but it might work for someone who needs to work in London and is looking for a large, flexible, convenient family home. Good luck everyone, Hilary
See also: Mapumental. http://demo.mapumental.com/
That is very useful.
Stoke-on-Trent comes out quite well down there with Wales and the suburbs of Birmingham, but since you can get 3 bedrooms from 35,000 and Virgin trains are faster at 93 minutes than the slower but very cheap London Midland, I hope we’ll see you in a lovely spacious terrace in bohemian Burslem or pretty Penkhull soon 🙂
Nice, but I don’t think people in Edinburgh commute to London. It has its own economy.
I particularly liked the 88k house at Duddeston, for a 109-minute journey to the capital. Would love to see the marketing blurb to frustrated metropoles…
Love it! Very interesting. A further modification (not sure how…) might reflect fares. e.g. I live between Chesterfield and Matlock. Trains from Matlock are much slower, and house prices a bit higher, but train fares are also cheaper from there, and parking at the station is also cheaper in Matlock.
A very impressive demonstration of the value gained from mixing several raw data sources to deliver a valuable end result, and also a demonstration of how house hunting is morphing in to a science 😉
I wonder if the Edinburgh bump is an indication that perhaps the distance calculation should be from the nearest large conurbation with a major rail terminal – or will that just give you a London bump at the other end of the graph?
The Edinburgh bump also correlates with very available budget airlines (though not fares), which in my experience makes commuting by the week, not by the day, feasible. Edinburgh has its own little academic air bubble because of this – people sleeping in student accommodation for the week and going home at the weekend to Brighton or Berlin. But that’s rather beyond your data. And yes, like everyone has said, very very cool. Don’t show it to house developers.
If you’ve ever been at Edinburgh Airport at 5am on a Monday, you’ll know there is a significant number of affluent (weekly, many finance and IT sector) travellers. There are numerous reasons why top end house prices there are high, but long distance commuting could conceivably be one.
[…] for a while now (mainly because of her work on the Open Domesday Project), but I somehow missed her fantastic life-hack linking house prices with distance from London. On one hand, it shows what coders do to solve […]
This is very cool. My love of stats knows no bounds. These graphs are great.
But I didn’t learn much from this. I have issues with your main idea – London isn’t the center. Only a tiny number of people commute from Scotland to London often. And the housing regulations are different there too – a house often goes for much more than the asking price, unlike here in England.
I suspect the standard deviation here is big and shows only a fleeting likeness of distance to house price.
Anna this is awesome!
Very nice – although I think the “High Speed” line could do with some manual tweakage. From Ashford International, I can get into London within 38 minutes on the High Speed, but if I take the slow train it is 1hr34min (fast trains are twice an hour, slow ones appear once an hour). Could you split “Ashford International – HS1” and “Ashford International – standard” into two entries (same from Stratford Int, Sandwich and other HS ones).
Just trying to increase the accuracy: the main reason we moved from Lewisham to Ashford is that we could get into central London faster, but be able to afford somewhere a lot nicer/larger.
[…] at darkgreener.com has a data visualization looking at UK real estate prices and distance (in minutes travelled) from […]
Great work. I wonder if another key factor is train frequency. There is a big difference between a one-hour journey departing every 10 mins and an hourly service.
I would add half of the mean travel interval to the journey time to allow for this. Might explain some of the bumps.
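As a rough sketch of that adjustment in Python (the function name and the input format, departure times in minutes after midnight, are assumptions for illustration): half the mean gap between departures approximates the average wait for someone who turns up at a random time.

```python
def effective_minutes(journey_minutes, departures):
    """Journey time plus half the mean gap between consecutive departures.

    departures: departure times in minutes after midnight (hypothetical input).
    """
    times = sorted(departures)
    if len(times) < 2:
        return float(journey_minutes)  # a single train gives no gap to measure
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return journey_minutes + mean_gap / 2
```

For the example above, a one-hour journey with trains every 10 minutes comes out at 65 effective minutes, while the same journey on an hourly service comes out at 90.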
[…] Once you get more than 90 minutes outside London, house prices flatten. Have a look at this wonderful graph. […]
Great work, leading to a nice visual.
I make Luton the go-to place for (quickest journey with cheapest cost of living).
Can’t get this to work, just says ‘waiting for graph to load’. Any idea why?
Hi Simon – sorry about that, Google Fusion Tables had changed their API. I’ve just fixed it, so you should be able to see the graph now.
This is very cool. In addition to Mapumental mentioned above, there is also:
Does a good job!
Shows how much better value SE London is! Zone 2/3 places like Blackheath, Hither Green and New Cross Gate are all below the line.
Beautiful and slightly sad – I’m betting there’s a German word for that. Cost of housing vs cost of season ticket would be very interesting too. My bitter experience is that in a lot of places with (relatively) affordable house prices, this is offset by insane season ticket prices, and any saving you make on the mortgage is handed straight to your local neighbourhood train operating company.
Interesting and nicely done, but rather London-centric. Once you get further than 2 hours away from London (perhaps less), other factors will play a bigger part in house prices than time to London. The “Edinburgh bump” wouldn’t surprise anyone who knows the city.
You could work out the edge of the commuter belt a bit more accurately if you use trains that get in to London at 8:30, rather than trains that leave the town at 8. If there aren’t any that leave on the same day, then it’s not commutable anyway.
[…] Train times vs House Prices. by Anna Powell-Smith. Turning data into information. Where is the best place to buy to minimise commute and maximise value? […]
Very cool. I recently bought a place in Croydon, and whilst it rates well on the graph for commute time vs price, I wish I had factored the cost of commuting into my calculations. My partner and I are paying £400 a month on travelcards; if we had lived closer, some of that could have been going towards a more expensive mortgage.
I also made this Chrome extension while I was hunting to add commute times to Rightmove
Commute Time for Rightmove Chrome Extension. Hope it helps some others 🙂
If you’re concerned about outliers in price, you may want to ignore the highest and lowest 10% in each area, and take the mean of the middle 80%.
I’m not sure whether your data set is big enough to make this viable, though….
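A minimal sketch of that trimmed mean in Python, assuming the asking prices for one station’s area are available as a simple list (the function name and signature are made up for illustration):

```python
def trimmed_mean(prices, trim=0.10):
    """Mean of the prices left after dropping the top and bottom `trim` share.

    With trim=0.10 this ignores the highest and lowest 10% of listings and
    averages the middle 80%.
    """
    ordered = sorted(prices)
    cut = int(len(ordered) * trim)  # number of listings dropped from each end
    kept = ordered[cut:len(ordered) - cut]
    if not kept:
        raise ValueError("no prices left after trimming")
    return sum(kept) / len(kept)
```

The sample-size worry shows up directly in `cut`: with fewer than 10 listings near a station it rounds down to zero, nothing is trimmed, and the result is just the ordinary mean.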
Great work. Are there consultants who help people choose a place to live using research data like the above?