Friday, 14 May 2021

Far too many words about a chart

This is a bit of a long read, because I wanted to put down my thoughts on a chart I made that has been quite popular and quite widely shared. See below for one version of this chart - it ranks all 650 UK Westminster parliamentary constituencies by deprivation and colours them by which party won the seat at the 2019 General Election. You can skip straight to the data and charts folder now if you want, but if you do, it's probably best if you also read the notes at the bottom of the page. Because this is a chart made of blocks, or bricks, and because of recent 'red wall'/'blue wall' media talk I'm going to call this a wallchart, but you can call it whatever you like. If you do like reading about charts and dataviz and the thinking behind them, read on. I cover all the small things.

Click to see it full size

Background

I will not go on too much about the backstory, but I'm here to tell you that there have been quite a few votes on things across the UK in recent years. This is something you may have noticed, particularly if you were up all night watching, waiting, or commiserating at any recent election. Because of all the votes there have been lots of maps, lots of data and lots of news. Some good stuff, some confusing stuff and probably just too much stuff. But anyway, after the UK General Election in June 2017 I wrote a piece on here with some maps and charts. Almost as an afterthought, I said to myself 'I wonder what it would look like if you took all 533 English constituencies, turned them into blocks, ranked them by deprivation, and then coloured them by the party that won the seat'. Then I thought, 'how quickly can I do this?' So, I did it very quickly and messily in Excel - see below for what the first one looked like and a screenshot from the original Excel file.

What's 533 divided by 10?


Some immediate observations about this chart that came to mind at the time: i) it's annoying that there are 533 rather than 530 constituencies in England - neat charts > representative democracy; ii) this chart tells us something that is blindingly obvious but even so it's also strangely striking; iii) 'hey, what's that blue block doing on the left hand side'? (it's Walsall North); iv) I didn't realise the orange blocks would nearly all be on the right half of the chart; v) I need to figure out how to deal with the fact that 533 doesn't divide by 10 neatly and maybe make the top of the chart nice and flat.
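On point v), here is a minimal sketch of the arithmetic. It is not how the Excel version was built; the function name and the choice to put the leftover blocks in the left-most columns are assumptions made purely for illustration.

```python
def column_sizes(n_blocks, n_columns=10):
    """Split n_blocks into n_columns near-equal column heights.

    Any leftover blocks go to the left-most columns (an arbitrary
    choice for this sketch, not something taken from the chart).
    """
    base, extra = divmod(n_blocks, n_columns)
    return [base + 1 if i < extra else base for i in range(n_columns)]


england = column_sizes(533)  # 533 English constituencies
uk = column_sizes(650)       # 650 UK constituencies

print(england)
print(uk)
```

With 533 blocks three columns end up one block taller than the rest, which is why the top (or bottom) of the chart cannot be flat. With 650 every column has exactly 65 blocks.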

Excel?! #$@&%*! Excel!!?? Say it ain't so


Next versions

I really didn't think much of it at all after I hit Publish and never planned to come back to it, but at some point in the year or two that followed, Danny Dorling got in touch with me about using it. I then did new versions for different years, including ones for the UK that were much neater because there are currently 650 constituencies and this divides neatly into 10 columns of 65. I also changed the colours, added more labels and some versions had constituency names as well. The initial one used deprivation data for English constituencies produced by the always-excellent team at the House of Commons Library. For the earlier UK versions I used UK-wide deprivation data produced by Abel et al. (2016) to calculate the ranks.

For most versions of the chart, including the UK ones, I've had to calculate constituency-level deprivation myself using a population-weighted method, aggregating up from small areas (LSOAs in England and Wales, data zones in Scotland and SOAs in Northern Ireland). A big hat-tip here must go to my collaborator and all-round political data boffin Philip Brown for prompting me to come back to this and do new versions. When the 2019 General Election came round I decided to dust off the old files and make a version that looked much nicer. The reason I've come back to this again now is because Alex Parsons recently published a new set of UK-wide deprivation data, based on the earlier work by Abel et al. 
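A population-weighted aggregation of this kind can be sketched roughly as follows. This is an illustration under stated assumptions rather than the exact calculation behind the chart: it assumes each small area has a deprivation score where higher means more deprived, and the field names (`constituency`, `score`, `population`) and the toy figures are hypothetical.

```python
from collections import defaultdict


def population_weighted_scores(small_areas):
    """Return {constituency: population-weighted mean score}."""
    # constituency -> [sum of score * population, sum of population]
    totals = defaultdict(lambda: [0.0, 0.0])
    for area in small_areas:
        acc = totals[area["constituency"]]
        acc[0] += area["score"] * area["population"]
        acc[1] += area["population"]
    return {code: weighted / pop for code, (weighted, pop) in totals.items()}


def deprivation_ranks(scores):
    """Rank constituencies so that 1 is the most deprived.

    Assumes a higher score means more deprivation.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {code: rank for rank, code in enumerate(ordered, start=1)}


# Hypothetical small areas (LSOAs, data zones or SOAs) in two constituencies
small_areas = [
    {"constituency": "A", "score": 40.0, "population": 1000},
    {"constituency": "A", "score": 20.0, "population": 3000},
    {"constituency": "B", "score": 30.0, "population": 2000},
    {"constituency": "B", "score": 10.0, "population": 2000},
]

scores = population_weighted_scores(small_areas)
ranks = deprivation_ranks(scores)
print(scores)
print(ranks)
```

Weighting by population means a large, less deprived small area pulls the constituency figure further than a small, very deprived one, which is the point of not taking a plain average.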

The left hand column is turning blue


This was the 2017 General Election edition

It took a while to put the data together for the different years and although I would have loved to put together one for earlier years - particularly 1992 and 1997 - it would have taken me too long to do it.

Read on below for more on the design stuff


Since I did the first one of these in 2017, other people and organisations have done their own versions using the same concept. I am not sure if it was done before that, but I haven't seen one like it. All I do know is that I got the original idea from a chart Owen Boswarva made in relation to age and party colour. 

Here's a version that the Economist created (below) - horizontal rather than vertical and much more elegantly executed. It was in their 'Who are the Conservatives’ new voters in the north?' piece from December 2019 - which I took a picture of in the print edition just because it's nice to see things in print. I had to travel further from home than I expected to actually find a print copy of the Economist because none of the local newsagents near me in Sheffield Brightside and Hillsborough (left-most column) stock it, but that's another story.

This is a good place to mention the ecological fallacy, because I am certainly not claiming I am poor, or deprived - this is just a nod to the fact that internal variation obviously exists in all areas, even if the overall ranking is a good representation of the overall pattern across the UK in terms of relative deprivation by area.

Lovely!

Call me old fashioned but I do like print

There are also other versions out there, including a remain/leave by deprivation one that Helen De Cruz did - copied below. I thought this one was really interesting, particularly column 10.

Column 10 is clearly the most remainy

There was also a version of the chart looking at age and deprivation - this was done by the Resolution Foundation in December 2019 and can be found in this twitter thread (I've pasted the chart below as well). I decided to do an age one in my most recent iteration - see below for that.

Very nicely done

I have also created a dark mode version of the latest chart from 2019 and a chart ranked by median age as well, so I'll put these ones below too. You can see this in my more recent twitter thread about it.

Turn the lights off!

Age rank gives us a similar pattern

Design

The legendary Andy Kirk very kindly noticed one of the things I did in earlier versions of the chart so click that link for a bit of independent insight on one of the details. Here I will set out a few reasons why I think this chart connects with people - and why it also makes sense to me. One of the main reasons is that it builds upon some things that are hard-wired into our brains already.

  • Rank: ranking things is not always a sensible idea and it's always somewhat fuzzy. Things get collapsed into categories made by people and this can mask internal variation. However, even if ranks can be quite a coarse measure, the constituency ranks by deprivation are more or less what we might expect for England or the UK as a whole. The places with the most 'poor' or 'deprived' neighbourhoods are on the left of the chart, with the wealthiest areas on the right. Also, and fundamentally, ranks are easy to understand, so this is not much cognitive work for the reader, once they know how the chart is laid out with the most deprived areas at the top left and least deprived at the top right - and this requires only minimal labelling. 
  • Colour: this one is of course more context dependent in that UK readers (or those who follow UK politics) don't need more than a split second to interpret the colours, without a key. We see red, we see Labour; we see blue, we see Conservative; we see yellow, we see SNP, and so on. There aren't many occasions when we can do this with colours but political charts and maps are one example where we can. The cognitive load here is again quite low, I would argue. I will just add that I've tweaked the original colours because the red/blue combo was blinding in the first chart. My colours are a mix of commonly used ones (e.g. on BBC election coverage) plus a bit of artistic licence.
  • Position: The general idea is for this chart to be more about the forest than the individual trees. Having said that, being able to position individual areas on the spectrum from more deprived (left) to less deprived (right) is very important. It also matters symbolically that the red areas are on the left, and blue on the right because this is a visual match for the traditional left/right political divide. Again, we can argue with definitions and concepts here but having Labour on the left and the Conservatives on the right matches what is already hard-coded in our brains if we follow politics even just a bit.
  • All the small things: there are lots of little things I've tried to do with the different versions of this chart to make it easier to digest, but without overwhelming the reader. One example is the faded axis labels on the left and right of the chart (below). I did this because I only really wanted people to be able to understand how the ranking worked, but without introducing any cognitive overload. I also don't want to put too much stock in individual rankings - I'd rather people saw the groups of 10 (labelled across the top) as more indicative of relative socio-economic status. That's also why the label colour is lighter for the individual blocks. The number fade thing was something I got from John Burn-Murdoch on twitter but I can't find the original post on that. So, the faded axis numbers are very much there so readers can get what's going on, but without it being too much of a focal point or distraction. 
  • Notes, etc: with this being the internet, there are small dangers associated with doing this kind of thing. Quite often people take things without attribution or context and share them with friends and followers as their own (even academics who ought to know better). People also see what they want to see - particularly with politics and causal links on a chart like this. The idea behind the chart was simply to see what the pattern looked like. I'm not trying to identify causal links here, but clearly people will do this anyway. Looking for causal links is a different kind of statistical activity, but this chart raises such questions with people and that's to be expected. Anyway, in order to avoid some of the problems with no-context sharing, I try to put enough text on the chart to explain a) what it is; b) how it was made and c) data sources. I usually add my name as well so that if someone wants to find out who did it they can get in touch - this would be more complex if my name was John Smith - but thankfully there aren't that many Alasdair Raes doing weird colour charts these days. Still, people will crop your name off because this is the internet.


There are plenty of things I could improve about the chart - and I've tried to keep doing that - but at the same time this wasn't part of my day job (work sucks, I know) and it wasn't for any particular project so I can't invest too much time in it (he writes, whilst writing War and Peace instead of finishing the presentation for next Tuesday). Having said that, there are a few final points I'd make on all the design-related stuff.

  1. Doing this in Excel might be okay for a quick-and-dirty experiment but it's clunky and messy and every time I go back to the original I am confused. It also takes me ages to find the original files. Having said that, I'm not bothered about what tool I use to do a job so long as it works - and there is nothing wrong with Excel, used properly. I like to think I follow 'the law of the job' rather than 'the law of the instrument' but even so this was a messy mess first go round. It started off as a bit of Excel formula, pivot chart, conditional formatting and a few hacks and it worked, and that's okay.
  2. After Excel I normally used GIMP to add some text and other bits. Again, fine but not optimal. I think in previous quick versions I even drew some arrows in PowerPoint and I'm not even sorry because it worked fine and served a purpose.
  3. I don't really like to label the individual blocks because the chart idea wasn't about that, plus some constituency names are very long and the font size has to be small. But I do like to think about user needs so some versions have them because people always ask. There's also a searchable html version so you're only a quick browser search away from finding your area.
  4. Most importantly, the way I have done this to date is pretty sub-optimal to say the least but then sometimes sub-optimal things happen and you get locked in to them. 

So, I thought to myself... why don't you try to make this less bad and also sharable? Good idea. But how? Read on to find out and grab the data for yourself.


Wallcharts for everyone

I normally work with spatial data but the chart I'm talking about here is non-spatial - it's just a series of ordered blocks and where they sit in the chart bears no relationship to their geographic location in the real world. Nonetheless, what I decided to do was keep it simple and create a 10x65 GeoPackage that covered roughly the same area as the UK and with the constituency deprivation rank number assigned to each block - see below for what this looks like with the standard UK election map on top of it. Once I'd created the 10x65 grid I prepared a separate spreadsheet with all my other data in it - including age, ethnicity and so on. I then joined this to the 10x65 grid to make the final GeoPackage.
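For anyone who wants to rebuild a grid like this without GIS software, the idea can be sketched in plain Python. This is a rough illustration, not the process used to make the shared file: it writes GeoJSON-style features rather than a GeoPackage, uses unit-sized cells rather than a UK-sized extent, and assumes that rank 1 sits at the top of the left-most column with ranks running down each column. The function names and the sample attribute table are hypothetical.

```python
import json


def rank_to_cell(rank, rows=65):
    """Map a 1-based rank to (column, row), with row 0 at the top."""
    return divmod(rank - 1, rows)


def cell_feature(rank, rows=65, size=1.0):
    """Build one square GeoJSON-style polygon feature for a given rank."""
    col, row = rank_to_cell(rank, rows)
    x0, x1 = col * size, (col + 1) * size
    y1 = (rows - row) * size  # top edge, so rank 1 is drawn at the top
    y0 = y1 - size
    ring = [[x0, y0], [x1, y0], [x1, y1], [x0, y1], [x0, y0]]
    return {
        "type": "Feature",
        "properties": {"DEPRANK": rank},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


def build_grid(columns=10, rows=65):
    """Build the full grid as a FeatureCollection, one block per rank."""
    features = [cell_feature(r, rows) for r in range(1, columns * rows + 1)]
    return {"type": "FeatureCollection", "features": features}


def join_attributes(grid, table):
    """Attach extra columns to each block, matching on DEPRANK."""
    for feature in grid["features"]:
        props = feature["properties"]
        props.update(table.get(props["DEPRANK"], {}))
    return grid


grid = build_grid()
# Hypothetical attribute table keyed by deprivation rank
grid = join_attributes(grid, {1: {"GE2019": "Lab"}})
print(json.dumps(grid["features"][0]["properties"]))
```

Keeping the geometry and the attributes separate until the join means the same empty grid can be reused for any other ranked variable.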

The 10x65 GeoPackage can be found in this web folder - I've also shared it in different formats (shp & geojson) in the hope that it can be more widely used. When you add it to QGIS it will be coloured according to the party that won each seat in 2019. I do realise that for non-GIS users this might not be ideal, but you can easily take one of the existing versions and convert to a more suitable format for your own needs. There's also a qml file in the folder that you can use to style the layer, as well as one svg that I extracted as a test.

This is simple, but works well


Just in case you download the UK 650 block file and are confused by the column headings, here they are:

  • fid - just a numeric identifier, which can be ignored
  • DEPRANK - the rank of each constituency based on Parsons' 2021 UK small area deprivation data, aggregated up to constituency level (by me)
  • CODE - ONS constituency code
  • CONSTIT - constituency name, upper case
  • NAME2019 - constituency name
  • COUNTY - the county a constituency is in
  • CTRY_REG - the UK country or English region a constituency is in
  • COUNTRY - the UK country a constituency is in
  • GE2010, GE2015, GE2017, GE2019 - the party that won in each of these years
  • AGE2019 - the median age of each constituency in 2019
  • AGERANK - the age rank of each constituency in 2019 (1 is youngest)
  • POP2019 - the population of each constituency
  • MINDECILE - the minimum UK-wide decile any single LSOA/DZ/SOA within a constituency is in (1 being the most deprived)
  • MAXDECILE - the maximum UK-wide decile any single LSOA/DZ/SOA is in (10 being the least deprived)
  • AVRANK - the average rank of all LSOA/DZ/SOAs in a constituency - where 1 is most deprived and 42,619 is least deprived (so, e.g. an average rank of 4262 would mean that the average area was among the 10% most deprived nationally)
  • POP2011 - the population in 2011 (relates to the indicators below)
  • WHITE2011, MIXED2011, ASIAN2011, BLACK2011, OTHER2011 - the number of people in each ethnicity group from the 2011 Census (data published by the House of Commons Library)
  • WHITEPCT, MIXEDPCT, ASIANPCT, BLACKPCT, OTHERPCT - same as above, but % rather than total number
  • WHITERANK, MIXEDRANK, ASIANRANK, BLACKRANK, OTHERRANK - this ranks all 650 constituencies according to ethnicity, so that an area with a WHITERANK of 1 has the highest % of white population (as of 2011, we don't yet have 2021 Census data, so keep that in mind) and a WHITERANK of 650 has the lowest % of white population - and so on for the other groups.


I'm almost done here, but before I wrap up here are some different versions of the chart - now much easier to create having done the hard work of putting it together in a more suitable format. All I need to do is apply a simple filter and I can get all sorts of interesting charts out of it.
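As a rough illustration of what such filters look like outside QGIS, here is a minimal Python sketch using the column names listed above. The four rows are made up, and the party labels ("Lab", "Con", "LD", "SNP") and the 0-100 scale for WHITEPCT are assumptions about how the file encodes things, so check the real values before relying on this.

```python
def names_where(rows, predicate):
    """Return the constituency names of all rows matching a filter."""
    return [row["NAME2019"] for row in rows if predicate(row)]


# Hypothetical rows using the column names from the shared file
rows = [
    {"NAME2019": "Seat A", "COUNTRY": "England", "GE2010": "LD",
     "GE2017": "Lab", "GE2019": "Con", "AGERANK": 600, "WHITEPCT": 92.0},
    {"NAME2019": "Seat B", "COUNTRY": "England", "GE2010": "Lab",
     "GE2017": "Lab", "GE2019": "Lab", "AGERANK": 12, "WHITEPCT": 55.0},
    {"NAME2019": "Seat C", "COUNTRY": "Scotland", "GE2010": "LD",
     "GE2017": "SNP", "GE2019": "SNP", "AGERANK": 551, "WHITEPCT": 96.0},
    {"NAME2019": "Seat D", "COUNTRY": "England", "GE2010": "Con",
     "GE2017": "Con", "GE2019": "Con", "AGERANK": 550, "WHITEPCT": 75.0},
]

# Lab in 2017 but Con in 2019
lab_to_con = names_where(
    rows, lambda r: r["GE2017"] == "Lab" and r["GE2019"] == "Con")

# The 100 oldest constituencies (AGERANK 1 is youngest, 650 is oldest)
oldest_100 = names_where(rows, lambda r: r["AGERANK"] > 650 - 100)

# Constituencies that are 25% or more non-white
quarter_non_white = names_where(rows, lambda r: r["WHITEPCT"] <= 75.0)

# Just Scotland
scotland = names_where(rows, lambda r: r["COUNTRY"] == "Scotland")

# What colour the 2010 Lib Dem seats are now
ld_2010_now = {r["NAME2019"]: r["GE2019"] for r in rows if r["GE2010"] == "LD"}

print(lab_to_con, oldest_100, quarter_non_white, scotland, ld_2010_now)
```

Each chart is then just the same grid with the non-matching blocks left blank or greyed out.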

The overall 2019 chart

2019, just for England

2019, just for Scotland

Coloured using 2010 results

Constits that are 25% + non-white

The 100 oldest constituencies

Lab in 2017 but Con in 2019

What colour the 2010 Lib Dem seats are now

I've put the high resolution (300dpi) versions of these charts in the new wallchart folder I created for this project, so feel free to use them as you wish. 

Well, this was quite a long post so if you read all of it then please accept my thanks and congratulations.

This was mainly about all the small things, so if you noticed the Blink 182 song references, well done - carry me home! 


Notes

The deprivation ranking is somewhat England-centric. You can read more about the new UK deprivation data I used for the new charts on Alex Parsons' repo for the project. This new iteration is all possible because of the great work Alex did. If you want to follow the method then you should definitely read his notes on this - as well as the bit on mapping (which I provided some of the text for). There are four different files you could use to create a UK-wide constituency ranking. I used the UK_IMD_E.csv file, but the overall pattern is not likely to change much when you use the others. I also used the most recent mid-year population estimates (2019) for weighting.

The individual ranks of constituencies are likely to move slightly up and down if you use a different method but, on the whole, I am confident they are in the right place. 

Feel free to use, re-use, improve or otherwise adapt this - I know there are many, many people with better skills and ideas than me so this is very much a starting point.

If I spot any typos or other errors I'll be back later to clean them up.