Sunday, 23 April 2017

Getting ready for #GE2017 - a big shapefile

I'm probably as unmoved as anyone else about the forthcoming General Election, but to get my head back into gear for it I thought I'd try to put together a full UK constituency shapefile of all 650 constituency results from the 2015 General Election, using data from a variety of sources. I'm sharing it here in the hope that people will find it useful, and that it might save you some work. If you spot an error, let me know and I'll try to fix it. There are other shapefiles out there, but to my knowledge there isn't a detailed, complete UK (as opposed to GB) file that has all results, MPs and so on. I'm also sharing this here in the hope that we can move away from hex maps. I think they are nice and useful in many cases but I'd like to see a move back to the standard geographic representation in this election - hence, I am trying to promote Hexit. Anyway, here's an obligatory geogif I made with the file, using the 'time results declared' field.

The 2015 General Election in 30 seconds - phew

So, what's in the file? Well, I've tried to include a lot of stuff, sourced variously from the British Election Study, from the UK Parliament Data website, the Census and the devolved administrations of the UK. I have also calculated some variables myself, such as constituency area and the order in which results were declared. Key variables include:

  • PCONCODE - this is the ONS code for each constituency. It makes it possible to join lots of other data to the file. 
  • REGN - name of the sub-UK region each constituency is in - i.e. the old Government Office Regions in England, plus Northern Ireland, Scotland and Wales.
  • SECOND - which party came second in a constituency in 2015.
  • ELECT15 - the number of people in the electorate in 2015.
  • MAJ - size of the majority for the sitting MP.
  • TIME - time the results were declared. I seem not to have done this in 24H format, but you can see from the ORDER2015 field which order they are actually in.
  • MPFIRST, MPLAST, MPNAME - the first, last and full name of each MP.
  • Winner15 - this contains the full party name of the winning party. The WINNER field contains the abbreviated party name.
  • POP2015 - this contains the mid-year population estimate for each constituency for 2015. I also added in the 18+ population, since it makes a bit more sense to do this, even though it is not the same as the electorate figure. 
  • Others - they should be self-explanatory but the list of Sources below will help if you are confused by any of these.
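
Because PCONCODE is the ONS code, joining other constituency-level data to the file is really just a key lookup. Here's a minimal Python sketch of that idea, assuming you have exported the attribute table to a list of rows. PCONCODE, WINNER and MAJ are fields in the file, but the codes, values and the median_age column are invented purely for illustration.

```python
# Attribute rows as they might look after exporting the shapefile's table.
# PCONCODE, WINNER and MAJ are real field names; every value here is invented.
constituencies = [
    {"PCONCODE": "X00000001", "WINNER": "Con", "MAJ": 5000},
    {"PCONCODE": "X00000002", "WINNER": "Lab", "MAJ": 1200},
    {"PCONCODE": "X00000003", "WINNER": "SNP", "MAJ": 9800},
]

# Some other dataset keyed on the same code (hypothetical column name).
extra = {
    "X00000001": {"median_age": 44.1},
    "X00000002": {"median_age": 36.7},
}


def join_on_code(rows, lookup, key="PCONCODE"):
    """Left join: keep every constituency, add extra columns where a match exists."""
    joined = []
    for row in rows:
        merged = dict(row)
        merged.update(lookup.get(row[key], {}))
        joined.append(merged)
    return joined


joined = join_on_code(constituencies, extra)

# Codes with no match are worth listing, e.g. where a source covers only GB.
unmatched = [r["PCONCODE"] for r in joined if "median_age" not in r]
print(unmatched)
```

Keeping unmatched rows, rather than dropping them, makes gaps in a source dataset easy to spot.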

I hope you find this useful. If you want to download it, it can be accessed here. If you spot any glaring errors, please let me know. Who is going to win the 2017 General Election? My only prediction is that there will be lots of interesting maps and that the patterns on them may look a bit different.

Data notes: I have added a QGIS qml style file to the zipped data folder. This means that if you add the shapefile to QGIS it will display in the familiar colours of each political party. This happens because the qml file has the same name as the shapefile. The colours are matched from the BBC election results page from 2015. I tried very hard to ensure complete UK coverage, so I have patched data together from multiple UK sources but in a few cases I don't have variables for Northern Ireland. This is because the spreadsheet from the British Election Study I sourced some data from covers only GB. 

Sources: General Election 2015 results, from the UK Parliament Data pages. The British Election Study updated Excel file. Northern Ireland constituency boundaries were sourced from OpenDataNI, via their resources page. For Great Britain, I used the constituency boundaries available on the ONS Geography Portal pages - the 2016 boundaries. For the most recent mid-year population estimates, I used data from the National Records of Scotland, NISRA data for Northern Ireland mid-year population estimates and ONS mid-year population estimates for England and Wales. The map data contains OS data © Crown copyright and database right 2017. Similarly, the other data contains National Statistics data © Crown copyright and database right 2017.

Acknowledgements: I would like to thank Ian Turton for suggesting the little QGIS Atlas function tweak which enables the cumulative animation you see above. For more on this, see the related Stack Exchange post where I asked the question.

Friday, 31 March 2017

Visualising a lot

This post is about visualising 'a lot', because it's something I've been thinking about as I write part of a book on GIS. The basic idea I'm exploring here is that when you have a dataset and want to somehow simply visualise 'a lot' - e.g. because the volume of data seems overwhelming - then there are different ways to approach it. For example, if you had millions of points on a map, you could use a hex-binning technique to give a standardised per-area figure, or you could do some kind of visual aggregation or summary in chart form. Or, to convey 'a lot' as a kind of visual device, you could perhaps just do a visual data dump, as I did in this example. Today's 'lot' is from the Gun Violence Archive dataset for the United States in 2015, compiled and released by The Guardian and collated by the Gun Violence Archive. I opted for a fast animation to visualise 'a lot', which I have now updated with a running total (in yellow). Let's go straight to the gif now, showing all gun homicides, one frame per day, for 2015 (and fast - 10 frames per second).

It's supposed to be overwhelming - click it for full size

When I first looked at the original dataset, which includes more than 13,000 gun deaths, my immediate thought was 'that's a lot'. All things are relative, of course, but in a global context it's hard to argue against this, particularly when you compare the data to other developed nations. The dataset has precise lat/long details for each incident and also the date and number killed and injured. I then summarised the data by day, plotted the locations as single points and then created 365 frames for this animated gif. It's not supposed to be readable at the micro scale of individual days or incidents, because I wanted to focus attention on the volume of data. A video version that you can pause or play more slowly is embedded below. I also did a slightly slower animated gif, at 5 frames per second, which of course is still somewhat overwhelming, shown below. Update: I have also added a cumulative version, prompted by Simon Rogers, and thanks to a bit of help from Ian Turton.
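
The 'summarise by day, then keep a running total' step can be sketched in a few lines of Python. This is only an illustration of the logic, not the workflow behind the gifs, and the incident records below are invented (the real dataset also carries lat/long and the number injured).

```python
from collections import Counter
from datetime import date, timedelta
from itertools import accumulate

# Invented incident records: (date, number killed).
incidents = [
    (date(2015, 1, 1), 1),
    (date(2015, 1, 1), 2),
    (date(2015, 1, 3), 1),
]


def daily_totals(records, start, end):
    """Sum the number killed per day, including days with no incidents."""
    killed = Counter()
    for day, n in records:
        killed[day] += n
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [(d, killed[d]) for d in days]


# One entry per day corresponds to one frame in the animation.
per_day = daily_totals(incidents, date(2015, 1, 1), date(2015, 1, 3))

# Running total, as used for the cumulative counter.
running = list(accumulate(n for _, n in per_day))
print(per_day, running)
```

Filling in the empty days matters here, because every day needs a frame even if nothing was recorded on it.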

This is the same as above, but a little slower (73 seconds in total)

In this version, it's cumulative - click to enlarge and start from beginning

The individual frames were created in QGIS and in relation to the max and min values per day you can see those below. The largest number of gun deaths in a single day in 2015 was on July 5th and the lowest was on May 22nd. The mean number killed per incident was 1.12 and the mean per day in 2015 was 35.8 (for a total of 13,067).
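
If you want to check the headline numbers, the arithmetic is short enough to write out. A small Python sketch using only figures quoted in this post; the implied incident count is approximate because it is derived from the rounded mean of 1.12.

```python
total_killed = 13_067   # total for 2015, from the post
days = 365
frames = 365            # one frame per day

mean_per_day = round(total_killed / days, 1)

# Length of each gif in seconds at the two frame rates used.
seconds_at_10fps = frames / 10
seconds_at_5fps = frames / 5

# Incident count implied by the rounded mean of 1.12 killed per incident.
implied_incidents = round(total_killed / 1.12)

print(mean_per_day, seconds_at_10fps, seconds_at_5fps, implied_incidents)
```

The mean per day and the 73-second running time match the figures given here, and the implied incident count is consistent with the 'just over 11,600' incidents in the database.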

The peak month overall in 2015 was also July

This was the only day that the number killed was below 20

There are just over 11,600 incidents recorded in the database but it's quite difficult to get your head around at a national scale. The Guardian already published some great localised mapping of this data, if you're interested. With this example I was just trying to experiment with ways that quickly and simply convey the idea of 'a lot'. The fast animation using thousands of data points is one way of doing this. It's designed with repetition and replay in mind, and the point is not to highlight individual datapoints or days, but to create a kind of cognitive mash where the end result is that you can take away some detail - e.g. most days have between 20 and 50 gun deaths - and also see that the locations do, as you'd expect, mirror underlying population patterns. But only to a point. If you look closely you can see that some places are over- or under-represented.

There are many ways to powerfully visualise this kind of data, including much more nuanced interactive methods of the kind produced by FiveThirtyEight. My approach here is non-interactive on purpose, but of course it is less visually appealing too. But then I also think that making something beautiful out of something so ugly is not what I want to be doing. All I wanted to achieve was to highlight the volume in the data in a way that anyone could understand and by using one frame per day and plotting the location points I think I'm just about there.

If you're interested in looking at any of the individual frames for a given day, take a look at the Google Drive folder below. You can see individual dates to the top left of each image and also in the file name of each image.

See all 365 individual days here

Notes: in the Guardian's original csv, I found that the date formats were a bit messed up, so I fixed this and added in some new, corrected date fields to the right of the spreadsheet. I also added in individual columns for day and month. I'm not a gun campaigner; this was just an interesting dataset for me to use. If you have any questions, feel free to get in touch. This data covers homicides only, no suicides. I updated this post on 5 April 2017 to include cumulative totals in the maps. Updated again on 10 April 2017 to include a cumulative version. It looks a bit ugly at the end but then it's a pretty 'ugly' dataset. I thought this was another interesting way of displaying the data.
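
For anyone facing a similar date mess, here is a minimal Python sketch of the general approach: try a few formats, then write out corrected date, day and month fields. The list of formats and the column names are assumptions for illustration, not a record of what was actually wrong in the csv, and the slash format assumes US-style month/day order.

```python
from datetime import datetime

# Candidate formats are guesses at the kind of mix found in a messy CSV.
FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")


def parse_date(text):
    """Try each known format in turn; return None if nothing matches."""
    cleaned = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


def add_date_fields(row, column="date"):
    """Return a copy of the row with corrected date, day and month columns."""
    parsed = parse_date(row[column])
    out = dict(row)
    out["date_fixed"] = parsed.isoformat() if parsed else ""
    out["day"] = parsed.day if parsed else None
    out["month"] = parsed.month if parsed else None
    return out


fixed = add_date_fields({"date": " 7/5/2015 "})
print(fixed)
```

Rows that match none of the formats come back with an empty date_fixed, so they can be checked by hand rather than silently guessed.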

Sunday, 26 February 2017

Train Stations of Great Britain

In my ongoing quest to answer the burning questions of our times, I have decided to continue my data-based boffinry by looking at a couple of questions I sometimes think of when zipping up and down the country on the train. I'm sure I can't be the only one, so here are some results that I've had saved up for a while. The first question is, 'which parts of Great Britain are furthest from a train station'? The second is 'how many train stations are there in each local authority or parliamentary constituency?'. Yes, I know I need to get out more but if you're reading this you probably do too - so take a look at the first two maps below.

Not exactly earth shattering, but some interesting snippets

You can click on this to see a bit more detail

Not entirely unexpected patterns here. In part, I also did this to use as teaching material in the future (it uses a basic GIS operation) and I used 30km just because it produces an interesting result. You can see the area around Bude in North Cornwall is England's largest area without a station. This issue has been raised in parliament many times, including in 2014 by the previous MP for the area. The furthest areas from stations are all in the most sparsely populated north and west Highlands, but also in and about the Cairngorms and the Borders - though the latter has just got a lot smaller thanks to the re-opening of the Borders Railway. West Wales and a bit of North Wales are also not on the map in this regard. There is also a tiny sliver of land in Yorkshire that sits just outside this 30km buffer distance. Some zoomed in maps follow...
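
The underlying test is simple: is a given location more than 30km from its nearest station? Here's a rough Python sketch of that check, not the GIS workflow behind the maps. It assumes coordinates are eastings and northings in metres, and the station locations are invented.

```python
from math import hypot

BUFFER_M = 30_000  # 30km in metres, matching easting/northing units

# Invented station coordinates (easting, northing).
stations = [(400_000, 300_000), (450_000, 300_000)]


def nearest_station_distance(point, stations):
    """Straight-line distance in metres from a point to its closest station."""
    x, y = point
    return min(hypot(x - sx, y - sy) for sx, sy in stations)


def beyond_buffer(point, stations, buffer_m=BUFFER_M):
    """True if the point lies outside every station's buffer."""
    return nearest_station_distance(point, stations) > buffer_m


print(beyond_buffer((400_000, 340_000), stations))
print(beyond_buffer((425_000, 300_000), stations))
```

In a GIS you would normally buffer the stations and subtract the result from the land area, which gives polygons rather than a yes/no answer for single points.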

This is just on the Scotland-England border

Around Bude in North Cornwall (and a bit on Exmoor)

A zoomed in map of train station deserts in the Highlands

The Norfolk train-free zones

The West Wales no-rail-zone

Looking for trains in the Yorkshire Dales? Avoid this bit.

Okay, so having answered one burning question, let's briefly turn to the other. How many areas in Great Britain (and I'm just referring to the island of Great Britain) do not have a station? For Local Authorities, I make it 12 out of 376 and for Westminster Constituencies, I make it 49 out of 630. I've screenshotted the two files here but you can also explore them yourself in Google Drive.
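
Counting stations per area is a point-in-polygon exercise. A minimal Python sketch using a standard ray-casting test is below; the three square 'areas' and the station points are invented, and real boundaries would of course come from the boundary files.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def stations_per_area(areas, stations):
    """Count stations in each area; a station is assigned to the first match."""
    counts = {name: 0 for name in areas}
    for station in stations:
        for name, polygon in areas.items():
            if point_in_polygon(station, polygon):
                counts[name] += 1
                break
    return counts


# Invented square areas and station points.
areas = {
    "Area A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Area B": [(10, 0), (20, 0), (20, 10), (10, 10)],
    "Area C": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
stations = [(2, 3), (5, 5), (15, 5)]

counts = stations_per_area(areas, stations)
no_station = [name for name, n in counts.items() if n == 0]
print(counts, no_station)
```

Points that sit exactly on a boundary are the awkward case for any test like this, which is why stations right on a border can end up in the 'wrong' area.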

Many stations in the largest areas, obviously

Same as above - e.g. Highland covers a larger area than Wales

What should we conclude from this? Not much, but it's quite interesting to look at the local authorities or constituencies that do not have a train station. There are 2,557 stations listed in the Office of Rail and Road 2015-16 data that I used for this. The next two maps show where there are no stations - but there are possibly a couple of small inaccuracies (Kensington and Chelsea being one as three stations are right on the border there).

This is very interesting

If you've read this far, you should get out more

Okay, so that's about it. Some data notes below if anyone is interested. Also, the spreadsheets in the Google Drive folder have passenger entry and exit data - i.e. the headline 'passengers' figures that are used to identify the busiest stations - e.g. Waterloo with nearly 100 million in 2015-16. I have also added in average, max, min and sum figures on passengers for the aggregated local authority and parliamentary constituency numbers. Hours of fun.
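
The aggregated figures are just a group-by on the area name. A minimal Python sketch of that summary, with invented station rows standing in for the real spreadsheet:

```python
from collections import defaultdict
from statistics import mean

# Invented rows: (station, local authority, passenger entries and exits).
rows = [
    ("Station 1", "Area A", 1_000_000),
    ("Station 2", "Area A", 250_000),
    ("Station 3", "Area B", 40_000),
]


def summarise(rows):
    """Group passenger figures by area and compute count, sum, mean, max and min."""
    by_area = defaultdict(list)
    for _station, area, passengers in rows:
        by_area[area].append(passengers)
    return {
        area: {
            "stations": len(values),
            "sum": sum(values),
            "mean": mean(values),
            "max": max(values),
            "min": min(values),
        }
        for area, values in by_area.items()
    }


summary = summarise(rows)
print(summary["Area A"])
```

Areas with no stations never appear in a summary like this, so they need adding back with a count of zero if you want a complete list.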

Data notes: follow this link to get the 2015-16 data on stations that I used here - including the eastings and northings for station locations. I got the boundaries from the excellent ONS Geography Portal and they are, of course, Crown Copyright (but also open data). As in, Contains OS data © Crown copyright and database right (2017). The data are compiled by Steer Davies Gleave on behalf of the Office of Rail and Road and they are accompanied by this interesting two page summary. In addition to the two spreadsheets, I have also uploaded the images in this post to the Google Drive folder. Train station vs railway station? I'm not bothered about this, or with data is/data are.

Tuesday, 21 February 2017

The UK's Best Place to Live

Where is the best place to live in the UK? The answer is simple. It's in my street. But that's just my view. Ask someone else and you'll get a different answer. Of course, this kind of thing doesn't really work when you do it from the perspective of individuals, and definitely not when you're trying to do it for the whole country, as I recently did for Outline Productions in their Channel 4 documentary, presented by Sarah Beeny. This blog post gives a little bit of the back story to it, discusses how mortified I am to be on the telly and a bit about the numbers. But how do we decide which is the best place to live in the UK? The real answer is that it depends upon who you ask and how you measure it.

I did the number crunching for this show

The background to this project is that I was contacted by Rachel Eadie of Outline Productions to see whether I could help them develop a 'best places' index based on a number of different criteria, such as income, house prices, wellbeing and so on. This was late in 2016 and I was a bit pushed for time, but it sounded interesting and I know the data pretty well so I said yes. After a few days of work and tweaking things I arrived at a final result. I received an initial 'wish list' of things to include from Outline Productions and I stuck to that where I could. The only criterion that I added was that I wanted this to cover the whole UK at local authority level - 391 in all - so that it could make some kind of sense across the entire UK. Too often these things only cover one or two parts of the UK. I included data on income, housing affordability, life satisfaction, happiness, jobs, unemployment, health, child poverty, and people aged between 20 and 29.

The last bit highlights an important fact. We wanted this to be about the 'best place' to live for people in that age category. In this sense, think of it as a 'best place you might actually be able to move to and afford to rent or buy in' index. I say this because many existing 'best place to live' indices end up being topped by areas with an average house price of £500,000, and that's no use for most people. Also, given the propensity of people in our target age group to locate in larger cities, I also computed a 'proximity index' in relation to how close each local authority is to 13 major cities in the UK. Some places, such as Orkney, do really well on quality of life or 'best place' indices but their relatively low number of jobs and distance from major population centres means moving there is not a viable proposition many will consider - even if they are great places to live.
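
For anyone curious about the general mechanics of a composite index like this, here's a minimal Python sketch. It is only an illustration: the min-max scaling, the equal weights, the area names and every number in it are assumptions for the example, not the data or method behind the programme. The km_to_city indicator is a crude stand-in for the idea of a proximity index.

```python
def min_max(values, higher_is_better=True):
    """Scale values to 0-1; flip the scale for indicators where lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]


def composite_index(areas, indicators, weights):
    """Weighted sum of scaled indicators, returned as {area: score}."""
    scores = {area: 0.0 for area in areas}
    for name, (values, higher_is_better) in indicators.items():
        for area, s in zip(areas, min_max(values, higher_is_better)):
            scores[area] += weights[name] * s
    return scores


# Invented areas and indicator values; the flag says whether higher is better.
areas = ["Area A", "Area B", "Area C"]
indicators = {
    "income": ([24_000, 30_000, 27_000], True),
    "unemployment": ([6.0, 4.0, 5.0], False),
    "km_to_city": ([10, 60, 35], False),
}
weights = {"income": 1 / 3, "unemployment": 1 / 3, "km_to_city": 1 / 3}

scores = composite_index(areas, indicators, weights)
ranking = sorted(areas, key=scores.get, reverse=True)
print(ranking)
```

Change the weights or the scaling and the ranking can change too, which is exactly why indices like this are inherently subjective.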

The Ring of Brodgar in Orkney (a great place to live) - source

What it was like to film this
I never thought doing television would be easy, but doing this little bit of work for a television production has made me realise a) how much goes into a single hour of television - so much work! and b) how bad I am at speaking, walking, thinking and communicating on camera. Seriously, I am not the most articulate person but I'm not completely terrible either. At least I didn't think I was. What I found is that having a camera on me made me robotic, incoherent and a lot more nervous than I expected. Things I know off by heart about data and places suddenly became impossible to recall when the camera was rolling. I also kind of forgot how to walk properly when being filmed, but I trust that the expert skill of the producer (Laura Mansfield of Outline Productions) means that I didn't end up totally ruining their programme. More seriously, it was an interesting experience and one that I think is useful. We filmed my bits in one day in Sheffield in December 2016, in ICOSS and across the way, outside The Diamond. I saw the final edit of the programme in January and despite not liking the look or sound of myself I thought the programme was well done. They sneakily got some good stuff in there about jobs-housing balance and the fact that indices are inherently subjective.

The numbers
I'm posting this just after the initial broadcast has finished in the UK (8pm, Tuesday 21 February 2017) so I can say a bit more about the final results now. It had been under embargo until that time. Remember that the areas I ranked relate to local authorities (e.g. London Boroughs, urban local authorities such as Leeds, Bristol and Newcastle, Glasgow, Cardiff, Belfast and so on). Individual places within local authorities, or places that go beyond the boundaries of individual local authorities are not part of the story here. It's based on the current 391 local authorities of the UK. South Ribble came top as our 'best place to live'. You may not have heard of it! But it's just to the south of Preston and includes within its borders places like Penwortham, Leyland and Bamber Bridge. 

Location of South Ribble - the UK's 'best place to live'

To add a bit of socio-economic data to this picture, you can look at one of the maps from my Indices of Deprivation atlas (all other local authorities in England are here). In the map below, blue areas are among the least deprived in England, and the red and orange areas amongst the most deprived. You can see that for South Ribble most areas are in the least deprived deciles.

Deprivation map of South Ribble, from blue (least deprived) to red (most deprived)

Bear in mind that this all depends upon how you measure things - which of course also applies to just about anything in socio-economic studies. But, having said that, my follow up discussions with people who actually live there gave some more weight to the findings and there does seem to be a real dynamism in the area, possibly also because it is included in the new City Deal in Lancashire. I always try to 'sense check' the results of any data analysis against personal experiences of people who know areas, just to get an idea of whether the data seem to be telling the truth, as it were. For more on what's happening in South Ribble, see this piece in the Lancashire Evening Post. Remember also that part of the reason South Ribble came out top is because of what's nearby - and this is important to people when it comes to transport and jobs.

Next on the list was Warrington, located in between the urban local authorities of Liverpool and Manchester in the North West of England. This is a very good example of how transport connections, proximity to major urban labour markets and relatively affordable house prices combine to make it the kind of place that people could realistically move to and live in at the life stage which was the focus of the programme. Again, from personal experience I know that many people choose to live there for the reasons outlined above, so I wasn't very surprised by it. 

Motorways, railways, cities nearby - it's Warrington 

The North West of England dominated the top ten, but Blaby snuck in to the top 3. Blaby is another one of those places that is not on people's mental maps because it's the name of a local authority area rather than a well known town or city. However, it's a suburban local authority to the South West of Leicester in the East Midlands, as you can see below. You can see that, like South Ribble and Warrington, it is also very well connected in relation to transport (e.g. the M1) but this area also abuts a major English city - Leicester. This was a feature of several local authorities that came towards the top of the rankings. Other places like it include Rugby (at number 7) as you can see below.

Blaby - you might not have heard of it, but it's at number 2

Here's a basic map of the rest of the top 10 - just to give you an idea of the distribution of places. As you can see, 7 out of the 10 are in the North West of England - this is driven partly by relative affordability but also by things like happiness and wellbeing, and connectivity. Below this, you'll see a list of the top 25 places on the index.

The UK's 'best places' - top ten

An interesting mix of places in the top 25

Anyway, that's a little bit more information than is in the TV show itself so hopefully some people will find this informative. The precise position of places on the list does, as I explained before, depend upon how you choose to weight and measure individual indicators but this is how things came out. If we repeat it - e.g. in Best Places 2020 - we might find that different places come out top. The fact is that anywhere in the UK could be someone's own 'best place to live' with the exception, I suppose, of prison! Our programme offers a slightly different take on the 'best place' notion.

Notes: in the bits when I discuss the data, there are a few times when I've described it in ways that may seem unconventional - or even wrong. One such example is in relation to disposable income when in fact what I'm really discussing is discretionary income. I wanted to try to be informative without being too technical but at times I may have gone a little too far and simplified things more than was necessary. Having said that, I realise that the kind of people I hang around with might know these terms but the average TV viewer probably doesn't know or care. I mention it here in case anyone spotted this or any of the other things that seem a bit odd. After all, this is part statistical exercise and part entertainment. And why am I getting involved in this stuff anyway? Well, I like to do interesting work beyond the confines of the academic world and this seemed like an interesting opportunity to offer a different take on 'best places'.

Sunday, 12 February 2017

English Green Belt Atlas, Version 3

In 2015, after being posed the question by geodata guru Bob Barr, I decided to attempt to calculate the percentage of land in each local authority in England that was designated as green belt, using the official data from DCLG. This resulted in version 1 and 2 of my green belt atlas - a spare time project that I hoped people might find useful and informative. According to my calculations, 186 of the 326 local authorities in England contain at least some green belt land (that's 57%) but the amount within each area varies a lot, as you can see in my spreadsheet. For example, Sevenoaks, Epping Forest and Tandridge (below) all have more than 90% of their area as green belt, in contrast to (e.g.) 47% in Wirral and just over 5% in Bristol. I'd also be interested in the correlation between this data and house prices, or house price growth, if anyone is up for it (looking at you, Tom Forth).
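
The percentage itself is just green belt area divided by local authority area. Here's a minimal Python sketch of that calculation using the shoelace formula. It assumes the green belt polygons have already been clipped to the authority boundary in a GIS, and the square 'authority' below is invented.

```python
def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as (x, y) vertices in metres."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def green_belt_percent(authority, green_belt_parts):
    """Share of the authority covered by green belt pieces already clipped to it."""
    covered = sum(polygon_area(part) for part in green_belt_parts)
    return 100 * covered / polygon_area(authority)


# Invented 10km x 10km authority with a 2.5km strip of green belt.
authority = [(0, 0), (10_000, 0), (10_000, 10_000), (0, 10_000)]
clipped = [[(0, 0), (10_000, 0), (10_000, 2_500), (0, 2_500)]]
pct = green_belt_percent(authority, clipped)

# The headline share of authorities with any green belt, from the figures above.
share_with_green_belt = round(100 * 186 / 326)
print(pct, share_with_green_belt)
```

The formula only works on projected coordinates in metres, so lat/long data would need reprojecting first.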

This is my estimated figure, but I think it's pretty accurate

I've used the latest green belt data (2014-15) for this version

Surrounded by green belt but not much within the boundaries

I was prompted to go back to this after the release of the Housing White Paper last week and my last post on buildings in the green belt. Should we build all over the green belt? Definitely not. Should we consider building on little bits of it? Maybe. But before any of that I think it's useful to understand where and what it is, which is what prompted me to look at all this in the first place. I've uploaded all the individual map files to a separate Google Drive folder but if you're too busy to click, here are a few more.

The 'overboundedness' of Leeds' urban fabric is evident here

I find the North Warwickshire green belt split interesting

Manchester's green belt area is also interesting - airport in it?

My calculations suggest St Albans is more than 80% green belt

Hounslow - about a fifth of this is green belt

This surrounds Cambridge pretty neatly

Click to see all 186 maps

If you look at the folder with all the maps in it you should see that the figures all seem pretty accurate based on a visual comparison, but there are a couple where I'm not 100% sure the figures are right - though it might just be me. Either way, feel free to get back to me if you spot something that doesn't seem right.

This figure looks a little high, but it might just be my eyes

A London Borough with more than 50% of land designated as green belt

Sheffield is 25% green belt (not to mention part of the Peak District National Park)

York's green belt: note the little non-green belt part to the SE

That's all for now. Like I said, this was motivated by a personal interest in the topic and my desire to share useful information on an important issue. If you want to use any of the maps, feel free. And if you've spotted an error, please do get in touch. If you're looking for an interactive web map of the green belt, then you have a choice: the Telegraph version, my CARTO version, or my Google search and zoom version.

The easy-to-remember link for the full set of images is

Methods: this was another QGIS Atlas project, and more details of the method can be found in the footnotes of the original post. The only difference this time is a slightly different style, I've added in the local authority names and boundaries, plus some place names. But you'll notice if you look through the images that the number of labels differs by area - I could fix that but it would take too long for a spare time project like this. However, I think it does help with orientation. This version was done in QGIS 2.14. The files are all 300dpi PNGs and you are free to use them as you wish. 

Wednesday, 8 February 2017

Buildings in the Green Belt

The publication of the Housing White Paper yesterday prompted me to complete something that had been on the back burner for a while - mapping the buildings on England's green belt (or green belts, if you prefer). Before going any further, this isn't a post advocating building on the green belt but rather I wanted to explore the extent of buildings that are already there, in the hope that it might help enlighten me and others. I've previously written about green belt data and tried to figure out how much of each local authority is green belt, so this continues my interest in this area. First of all, here's a little map of buildings on the green belt around Bristol. If you want to download the 'buildings in the green belt' shapefile, scroll to the bottom of this post.

The Bristol and Bath green belt - perhaps not as empty as you might think

If we zoom in a little more you can see a little bit more of the detail of the pattern of development in the green belt. I'm not sure how much people know about the level of building in the green belt, but I did note that the Housing White Paper quite rightly pointed out that 'parts of it are not the green fields we often picture' (p. 28). The zoomed in area below is near Bath.

Part of the green belt around Bath

The point here is not to say that there is a particularly high level of development on the existing green belt, or lots of buildings in absolute terms, but rather to show that there are places within the green belt which are perhaps already more built up than some accounts in the media might suggest.  Many news stories on 'green belt' also often have pictures of lovely green countryside that is not actually green belt. Housing experts and planners are already aware of this, of course, and much of the development can be traced back to before the green belts existed, so this is in part an educational and visual exercise in mapping it all. The next few images cover some other urban areas across the country, starting with Oxford.

Buildings in the Oxford green belt

Buildings in the Cambridge green belt

Buildings in the Metropolitan green belt

Guildford (bottom) and Woking (top) area green belt

Buildings in the North East green belt

Buildings in the York green belt

In the final 'buildings in the green belt' map below I have just shown buildings without the green coloured green belt backdrop. This also gives you an idea of the level of building in the green belt, although it isn't very high.

Bear in mind that there still aren't that many green belt buildings

Finally, in order to put things in a bit more perspective, I've done a zoomed in map of the Gloucester green belt area showing those buildings which sit on green belt and those that don't. I have chosen this because I think it helps emphasise how successful the green belt has been in some places in relation to achieving the aim of controlling urban growth.

Gloucester green belt and non-green belt buildings

Data notes: I downloaded the most recent green belt shapefile from the DCLG and then got the building polygon data from the Ordnance Survey open data web pages - the OS OpenMap Local product. There is green belt in 14 different two letter OS tiles so I just extracted building data where it intersected the green belt. I then merged this into a single file. If you're looking for building data for your area, you might find some on my buildings page, where I have joined data for major urban areas and also added the local authority each building sits within. If you just need to know which tile to download, check out my tile finder below. Want to play around with and map this data yourself? I've made a 'buildings in the green belt' shapefile available for anyone to use. There are undoubtedly some small errors in the dataset, but I've used the DCLG file in good faith here. Note that in some places you'll see a large built up cluster in areas of green belt - as in the case of the West Midlands map and Kenilworth above.
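
The 'extract per tile, then merge' logic looks roughly like this in Python. It's a deliberately simplified sketch that uses bounding boxes in place of real polygons, so it only illustrates the workflow and is not the actual GIS processing. The building ids and coordinates are invented, and the two-letter tile codes are just placeholders.

```python
def boxes_intersect(a, b):
    """Axis-aligned boxes as (min_x, min_y, max_x, max_y); touching edges count."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def extract_and_merge(tiles, green_belt_boxes):
    """Keep buildings that intersect any green belt box, tile by tile, in one list."""
    merged = []
    for tile_name, buildings in tiles.items():
        for building_id, box in buildings:
            if any(boxes_intersect(box, gb) for gb in green_belt_boxes):
                merged.append({"tile": tile_name, "id": building_id, "box": box})
    return merged


# Invented buildings grouped by tile, plus one invented green belt box.
tiles = {
    "ST": [("b1", (0, 0, 10, 10)), ("b2", (500, 500, 510, 510))],
    "SU": [("b3", (95, 95, 105, 105))],
}
green_belt = [(100, 100, 400, 400)]

selected = extract_and_merge(tiles, green_belt)
print([b["id"] for b in selected])
```

Keeping the tile name on each record makes it easy to trace a building back to the download it came from.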

Use my interactive map to find out which data tile you need