Stats, Maps n Pix: 2017

Wednesday 27 December 2017

How to make a geogif

Back in October I said I'd do a write up of the workflow for creating a geogif, so here it is (and now I've fixed some errors in the original post, see update at end). I've done quite a few of these over the past few years, including ones of the 2017 UK general election, house prices in England and Wales since 1996, commuting in Greater Manchester and US gun homicides. My favourite is one of the coastline of Skye, though that was just for fun. But sometimes a geogif can be quite useful at telling the story of data quickly, particularly online (I think the general election one does this quite well). Here's the end result of the workflow described below - a geogif of the population of US states in 2017. All of the content (maps, QGIS project files, gifs, shapefiles) can be downloaded here.

Click for full size - 1200ms per frame, 1 minute in total

I'm going to assume that if you are following this then you have at least some knowledge of QGIS, but if not then you can probably read my previous posts on the QGIS Atlas tool to help you. But this is definitely aimed at people with existing knowledge of QGIS.

I've packaged up the QGIS project (.qgs file) and the individual shapefile layers I used to create the gif above. You can download them here and then just open the project to see what's going on 'under the hood'. If you open this project (us_state_pop_2017_red_fixed.qgs), you'll see that I've used three layers, as follows:

One layer is just a light grey states background;
One layer is for the darker grey states that appear sequentially in the gif; and
One layer (a duplicate of the darker grey layer) is for the red highlighted states, to give it a more dynamic animated look.

For the dark grey layer I've used a rule to style it, which means that it will show the currently active QGIS atlas feature, plus all previous ones: $id <= $atlasfeatureid. It was suggested to me on StackExchange by Ian Turton as a simple way of achieving this effect.

The red layer is similar, but the rule on this layer is $id = $atlasfeatureid, which means that it basically highlights only the current atlas feature as the Atlas iterates through each feature (note that in the new, updated QGIS project files I have used the new Atlas syntax, which is explained below).

Where you set the layer style rules I describe above
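To make the two style rules a bit more concrete, here is a minimal sketch in plain Python (not QGIS expression syntax) of what they select at each step. The function name and the feature ids 0 to 3 are made up for illustration; the only thing taken from the post is the logic of `$id <= $atlasfeatureid` for the dark grey layer and `$id = $atlasfeatureid` for the red layer.

```python
def visible_features(feature_ids, atlas_feature_id):
    """Mimic the two layer rules for a single atlas step.

    dark grey layer: $id <= $atlasfeatureid (current feature plus all earlier ones)
    red layer:       $id =  $atlasfeatureid (current feature only)
    """
    dark_grey = [fid for fid in feature_ids if fid <= atlas_feature_id]
    red = [fid for fid in feature_ids if fid == atlas_feature_id]
    return dark_grey, red


# Hypothetical feature ids, in attribute-table order
feature_ids = [0, 1, 2, 3]

for atlas_id in feature_ids:
    dark_grey, red = visible_features(feature_ids, atlas_id)
    print(f"atlas feature {atlas_id}: dark grey {dark_grey}, red {red}")
```

This also shows why the order of the attribute table matters: the rule works on feature ids, so the states fill in according to their position in the table.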

When you go to the Print Composer in the project I've shared above, you will not actually see anything much until you activate the Atlas by clicking the Atlas button, as shown below.

Notice the Item Properties in the white box to the right

In the final version of the layout, once the Atlas has been activated, you can see from the Item Properties box on the right what I have used to add values from the layers the project is based on. Basically, I've added in fields and some text, as you can see from the images below.

This one adds a number and state name, plus a dot and a space

This takes the population, divides it by the 2017 US population

This gives the population in 'million' format, to 1 decimal
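The three label pieces described in the captions above boil down to some simple string formatting. Here is a rough Python equivalent, purely as an illustration: the function names are invented, the population figures are placeholders rather than real Census numbers, and showing the share as a percentage to one decimal is my assumption (the caption only says the population is divided by the US total).

```python
def rank_label(rank, name):
    # A number and state name, plus a dot and a space, e.g. "1. Somewhere"
    return f"{rank}. {name}"


def share_of_total(population, total_population):
    # Population divided by the national total (percentage format is an assumption)
    return f"{population / total_population * 100:.1f}%"


def in_millions(population):
    # Population in 'million' format, to 1 decimal
    return f"{population / 1_000_000:.1f} million"


# Placeholder values, not real data
state_population = 12_345_678
us_population = 300_000_000

print(rank_label(1, "Example State"))
print(share_of_total(state_population, us_population))
print(in_millions(state_population))
```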

An important side note here is that I have sometimes found it necessary to re-order attribute tables to get the Atlas to run through things in the right order. If you need to do this, install the MMQGIS plugin and use Modify > Sort on a shapefile to achieve this. It works perfectly. Another note is that in later versions of QGIS the syntax for Atlas functions looks a bit different, so make sure you are aware of it if you run into problems (e.g. @atlas_featureid instead of $atlasfeatureid, though both work for me in 2.18).

Once you have an Atlas project set up you are ready to export the frames. One thing here that is helpful is if you give the frames sensible, descriptive names. You can easily customise output file names in QGIS Atlas (another great feature of the tool). You can see how I've done this below, using the newer QGIS Atlas syntax. The "NAME" in the image below just takes the name of the current Atlas feature from the shapefile's attribute table and adds it to the file name. You can see the result in the folder where I put all the outputs (see screenshot below). Making sure the files have sequential names can save a lot of time later on because for an animation you need to be able to order the files in the order you want them to appear in the final gif.

This creates a file with a number, underscore and state name

These are the outputs from the QGIS Atlas project
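A quick sketch of why the naming matters, in Python rather than QGIS expression syntax. The zero-padding, the `.png` extension and the state names here are assumptions for illustration; the post only says the file name is a number, an underscore and the state name.

```python
def frame_name(number, state_name, width=2):
    # Number (zero-padded, which is an assumption), underscore, state name
    return f"{number:0{width}d}_{state_name}.png"


# Hypothetical state names
unpadded = ["1_StateA.png", "2_StateB.png", "10_StateC.png"]
padded = [frame_name(1, "StateA"), frame_name(2, "StateB"), frame_name(10, "StateC")]

# Plain alphabetical sorting puts frame 10 before frame 1 without padding
print(sorted(unpadded))
# With padding, alphabetical order and frame order are the same
print(sorted(padded))
```

If your file manager or image editor sorts names alphabetically, padded numbers keep the frames in the order you want them to appear.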

Okay, so this is basically part one over with. You can create all kinds of things with the QGIS Atlas and of course you don't even need to use this but for a gif of the kind shown at the top of the page this is my typical workflow. See Lesson 21 and Lesson 22 of Steve Bernard's QGIS YouTube series for a different kind of animation approach.

The next step is to patch the images together into a gif. You can do this lots of different ways, but I tend to use GIMP, which is great - free and open source and despite the slightly worrying name you can normally find it online without running into too much trouble. In fact, you may be relieved to hear that it's the first result on Google. I currently have version 2.8 installed.

I'll assume you've installed GIMP and now want to create a gif like the one at the top of the page. Here are the steps.

File, then Open as Layers...

This first step just gets all the different layers loaded into GIMP. I've done this with a few hundred layers before and it worked fine but if you're getting into the high hundreds then another approach might be best as you could run out of memory.

If you've used QGIS to give the individual layers sequential names then the frames should all load into GIMP in order and you'll see the last one first, and the individual layers listed on the right, as shown below.

This is what it will look like in GIMP

If you need to reverse the order of the layers in GIMP, you can do this easily by going to Layer > Stack > Reverse Layer Order. But I want my animation to start at number 1 so I'll keep it like it is.

You could just go ahead and export this as an animated gif, but if you don't optimise it then the file size is likely to be pretty big. Both versions I shared in the online folder are under 1MB (979KB and 530KB) so this step is really useful. Here's how you optimise it: Filters > Animation > Optimize (for GIF).

It won't reduce the quality but it will reduce the file size

Once you've done this, a new GIMP window will open up. The only difference here will be that you'll see the same thing as before but a dashed yellow and black box will appear on top of the images - nothing to worry about, that just tells you which part of the image actually changes from frame to frame. Now would be a good time to save your GIMP project via File > Save As...

The next step is to Export the image to an animated gif. You do that via File > Export As... and then it's a case of selecting gif as the image format. If you're not used to this it can be a little mind-boggling but basically you just tick the As Animation box and choose your settings. Here you need to be aware of the delay setting in particular because that determines how long each frame will appear for. As you can see below, I've chosen 1200 milliseconds for this gif because that makes it last one minute overall (50 frames at 1200ms each).

These are the settings used for the image at the top of the page.

In this gif, I used a delay of 1500 so each frame lasts 1.5 seconds
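The delay arithmetic is simple enough, but it's handy to have it written down if you want to work backwards from a target running time. A small Python sketch (the function names are just for illustration):

```python
def gif_duration_seconds(frames, delay_ms):
    # Total running time of one loop: number of frames x delay per frame
    return frames * delay_ms / 1000


def delay_for_duration(frames, target_seconds):
    # Delay per frame (in ms) needed to hit a target running time
    return target_seconds * 1000 / frames


print(gif_duration_seconds(50, 1200))  # 50 frames at 1200ms
print(gif_duration_seconds(50, 1500))  # 50 frames at 1500ms
print(delay_for_duration(50, 60))      # delay needed for a one-minute gif
```

So 50 frames at 1200ms gives 60 seconds, and the same 50 frames at 1500ms would run for 75 seconds.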

The other thing to think about is how big you actually want your gif to be. I did a version at 1500x1500 pixels but in general I'd recommend going much smaller than this. The two on this page are 1000x1000 and you can choose whatever dimensions you want when exporting from QGIS Atlas.

The tricky thing with this is trying not to cause some kind of cognitive melt-down on the part of the viewer, which is easily done and in a way is a bit of a risk with the US states population gif here. But of course it's not intended just to be watched once and this is just an example use-case to show you the workflow. But there is a serious point here in that a gif is definitely not the way to go a lot of the time, but I do think it's underused and definitely has a place.

Why would you want to do this in the first place?

That's a good question. Take a look at some of the examples I've linked to at the top of the page if you want to see a variety of different use cases. Some are better than others but all of them tell a story. Some stories are better told in other ways of course. Topi Tjukanov has loads of great examples - with more appropriate data - on this Medium post. If you ask me, a geogif can be useful in the following cases:

When you want to show movements of things, such as commuters, freight or marine traffic in the Baltic Sea.
When you want to show things in sequence, alongside some basic data.
When you want to highlight that there are 'a lot' of something, rather than focus on the individual data elements.
When you want to quickly compare shapes, as I did previously when looking at gerrymandering in the US.
When you want to show the geographic spread of something over time, such as house price changes.
When you want to avoid the work you should be doing and instead feel the irresistible need to create something map-like that moves.

In many cases a geogif will only be an entry point to the data and will not allow users to delve more deeply, but it can often be a good way to highlight the big picture, which is still a useful thing to do. You can judge for yourself what works, what doesn't and what is and isn't useful. Ultimately, geogiffery can be useful, but it can also just dazzle and confuse, so be careful.

Notes, etc.

Little-known Windows gif fact... If you left-click and hold the mouse button on the [X] at the top right of the Window when an animated gif is in progress, it will pause the gif. Move the pointer off the [X] and unclick to continue. I know of no other way to do this. Sometimes you can pause gifs on twitter, at least on mobile devices.

The shapefiles I use here are from the US Census Bureau Geography pages. The 2017 state population data used in the geogif here is also from the US Census Bureau.

Washington, D.C. is not a state so I didn't include it above but it had an estimated population of 693,972 on 1 July 2017.

I added the QGIS logo to the gif just to highlight the fact that this was done in QGIS.

Alaska and Hawaii are of course not to scale.

"Hey, I don't like red. Can you do a blue version?"

Yes, you can have a blue version

Update: this post was updated on 29 December 2017 because the original gifs I posted here contained some errors (order of frames - e.g. Florida and New York were in the wrong order, though the data was correct). I have also updated the Google Drive files and they now contain five different versions of the blue and red gif each, at different speeds (from 1 second per frame up to 2 seconds per frame).

I have also uploaded two QGIS projects in zipped folders - one each for the red gif and blue gif. All the frames are there too, at 1000x1000 pixels.

In the text above I have explained how I re-sorted the state populations shapefile to make sure the Atlas iterates in the right order. You don't have to do this (e.g. I think @atlas_featurenumber can be used) but I found it easier just to re-order the shapefile, particularly as I was working with copies of the same layer.

Saturday 16 December 2017

Population Density in Europe

I've recently been looking closely at population density data for Great Britain, Europe and the World. There are a number of good sources of data for this, which typically comes as a 1km resolution grid, including GHSL, the EU, and Datadaptive for Great Britain (based on ONS data). I'm working on some more technical stuff with this data - with varying degrees of success - but I have also outputted a few maps, including stuff I posted on twitter. I just thought I'd share a few of the raw maps here, plus a few observations, because the gifs are a little too quick and small to see the detail. Let's start with the Europe data.

I'm still not 100% convinced that Spain is correct - hmm

Pretty similar to the above, but Scandinavia looks most different

You can start to pick out settlement patterns more clearly now

Now we're getting up to reasonable urban density levels

This basically shows the most dense inner cities - Madrid and Paris highest here

I also did a similar thing for Great Britain, as you can see below. Nothing earth-shattering of course but I find it interesting and I know others do too.

You can see that lots of 1km cells have no people in them

This isn't much different to the map above, just a little less red

Now this is just the wider urban fabric of Great Britain

This level of density is major towns and cities

Click to enlarge this and you'll see there's more to this than London

Some observations...

1. For the European data, it seems that Spain stands out as being different and - to the naked eye at least - potentially incorrect. I'm not sure about this though, because the data are collated, quality checked and published by the EU and have been circulated widely. Spain also has some very high urban densities and a notably different settlement pattern to other European nations.

2. Some countries are missing. Yes, this is because the EU data does not include some Balkan states, and other non-EU or non-EEA countries. But the coverage is pretty good and, if we want to do this on a global level, then we could use GHSL data.

3. The availability of different data sources measuring the same thing now allows us to compare them and attempt some kind of validation and cross-checking. This is also important because where the 1km grid cells fall can of course affect the numbers reported at the local level.

4. Interestingly, for EU data, the UK is split into its four constituent nations here with separate country codes for EN, NI, SC and WA. The data, from 2011, shows that England is one of the most densely populated nations in Europe on this measure - and with recent growth since 2011 it may be the most densely populated, overall.

5. Using the most recent data I could find (2016 for England and 2017 for the Netherlands), England's population density is 424 persons per square km and in the Netherlands it is 502 (if you see lower figures for NL elsewhere it's probably because it includes non-land area, a common mistake). But raw arithmetic density figures are not that useful in my opinion because many areas have much higher densities and many much lower - in this case the mean is not a good model. A good example of this - at the extreme - is looking at population density in Russia. The fact that it has lots of empty land far from the main population centres could give the impression of a very empty country when in fact this would be a good case for using built-up density. This is something Duncan Smith has done in his excellent World Population Density map (see the Analysis link at the bottom of it).

6. The figures for Great Britain suggest some slightly higher maximum densities at the 1km cell level than the EU data but they are reasonably close. But nowhere in the UK is close to the maximum urban density figures in France and Spain. Both countries have a maximum density figure of above 50,000 per square kilometre. In the UK the maximum is just over 20,000 (in London). Poland, Belgium, Greece, Sweden - among others - all have higher absolute maximum densities than the UK.

7. I think population density is best measured and explored locally, so that's why I'm working with this data. More on that to follow in future hopefully.
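To illustrate the point in observation 5 about the mean not being a good model, here is a toy Python example with a 1km grid. The cell values are completely made up, and dividing by the number of populated cells is only a rough stand-in for a proper built-up density measure, which would need actual built-up area data.

```python
def arithmetic_density(cell_populations, cell_area_km2=1.0):
    # Total population divided by total area, empty cells included
    total_area = len(cell_populations) * cell_area_km2
    return sum(cell_populations) / total_area


def populated_cell_density(cell_populations, cell_area_km2=1.0):
    # Total population divided by the area of cells that contain people
    # (a crude proxy for built-up density, used here as an assumption)
    populated = [p for p in cell_populations if p > 0]
    return sum(populated) / (len(populated) * cell_area_km2)


# Hypothetical 1km cells: mostly empty land plus two urban cells
cells = [0, 0, 0, 0, 0, 0, 2000, 8000]

print(arithmetic_density(cells))      # persons per sq km over all land
print(populated_cell_density(cells))  # persons per sq km where people live
```

Here the same 10,000 people give 1,250 per square km across all eight cells but 5,000 per square km across the two cells where anyone actually lives, which is the Russia problem in miniature.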

Saturday 9 December 2017

A very late Brexity blog

I really have no idea whether Brexit will turn out to be a bad thing in the long run, but in the short term I'm kind of sick of hearing about it. So, naturally, I thought I'd do another blog post on it. The reason is that I had a few charts tucked away that I think are interesting. Here's what I did... I took the ward-level Brexit results put together by Martin Rosenbaum at the BBC and tried to establish if there was a correlation between % voting leave and level of deprivation. Yes, I know everyone has done this kind of thing using education to best effect but I'm interested in this link because i) I kind of wanted to test the 'left behind' narrative in a different way and ii) nobody had done this kind of localised analysis, to my knowledge. Here's the overall scatterplot comparing deprivation and % leave. Not great in terms of correlation. But perhaps you can use the data to produce something more interesting - I've put it here (it's a bit messy, but includes population-weighted ward-level IMD).

The BBC weren't able to get all wards - this is just a sub-set

But what I really wanted to do is pick out individual local authority areas and see if they fit the narrative of 'poor vote leave, rich vote remain'. The results for me were actually pretty interesting, particularly the contrast between some very non-deprived areas as you can see below with South Staffordshire and St Albans.

Not very deprived at all, all wards voted leave (record turnout too)

Big remain vote here in another non-deprived local authority

Now let's take a look at Birmingham, which I also think is an interesting case study in Brexit voting patterns at the local level. Plenty of poorer areas voted remain and at least two of the less deprived areas voted leave.

Different scales can sometimes be more revealing

My enthusiasm for Brexit analysis is on the wane as I type but I do want to share these scatterplots so here are the rest of the ones I did. I found them interesting so that's why I'm sharing them all here. In case anyone is wondering, I created a ward-level deprivation dataset by aggregating up from LSOAs to Wards, based on the technique described in the English Indices of Deprivation Technical Report. As such, these plots cover England only.
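For anyone wanting to try something similar, the aggregation step can be sketched as a population-weighted average. This is only a simplified illustration in Python: the ward names, populations and scores are invented, it assumes every LSOA sits wholly inside one ward, and it does not reproduce the full method set out in the Technical Report.

```python
from collections import defaultdict


def ward_scores(lsoas):
    """Population-weighted average deprivation score per ward.

    lsoas: iterable of (ward, lsoa_population, lsoa_score) tuples.
    """
    weighted_sum = defaultdict(float)
    population = defaultdict(float)
    for ward, lsoa_population, lsoa_score in lsoas:
        weighted_sum[ward] += lsoa_population * lsoa_score
        population[ward] += lsoa_population
    return {ward: weighted_sum[ward] / population[ward] for ward in weighted_sum}


# Hypothetical LSOAs, not real data
lsoas = [
    ("Ward A", 1000, 10.0),
    ("Ward A", 3000, 30.0),
    ("Ward B", 2000, 20.0),
]

print(ward_scores(lsoas))
```

The weighting means a large LSOA pulls the ward score towards its own value more than a small one does.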

Now just the small matter of finishing this whole leaving the EU business and everything will be fine. Maybe. Who knows. Either way, I promise not to do any more Brexit data stuff here.

Saturday 25 November 2017

A blog post about British map labels (plus free data)

I recently did a talk about the north of England, for which I created a few 'alternative maps' - i.e. maps that took an unconventional approach. The thing that was unconventional was that I inverted the normal labelling hierarchy so that small places had big labels and big places had small labels - an example is shown below for part of the north east of England. I have also shared some of this data for the whole of Great Britain - read on for more on that.

A new megaregion is born

To do this, I used Ordnance Survey open data, and specifically the OS VectorMap District product's NamedPlace layer. I created a complete version for Great Britain and then filtered it so that only populated places were showing (i.e. FONTTYPE = 2). If you ever use this data you'll know that it also has a FONTHEIGHT field, which goes from a low of 5 (generally very small places) to 15 (the biggest cities in the country). For all types of point, not just type 2, the height field goes from 4 to 18. This can be used to set the font height in software like QGIS or ArcGIS, and in the map above I've just inverted and enlarged the labels using this variable. When you do this for the whole country it looks something like the big mess below.

From Mid Yell to Hugh Town, we got all the best names
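The inversion itself is just a bit of arithmetic on the FONTHEIGHT field. Here is a minimal Python sketch, assuming the populated-place range of 5 to 15 mentioned above; the function name and the enlargement factor of 1.5 are hypothetical and not the actual values used for the maps.

```python
def inverted_label_size(font_height, low=5, high=15, scale=1.5):
    # Flip the hierarchy so the smallest places get the biggest labels,
    # then enlarge everything by a (hypothetical) scale factor
    inverted = low + high - font_height
    return inverted * scale


for height in (5, 10, 15):
    print(height, "->", inverted_label_size(height))
```

A place with the smallest height (5) ends up with the largest label, a place with the largest height (15) ends up with the smallest, and anything in the middle of the range stays in the middle.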

Some labels are in upper case and some are in proper case in the attribute table, but I wanted to see how much logic there was to the label hierarchy, so I did a little digging. I should also say that I believe FONTHEIGHT is based on cartographic placement principles, in addition to some other things like settlement size and/or importance.

But hold on a minute, what's all this... I've just re-downloaded the most recent OS VectorMap District data for Great Britain and Ordnance Survey have done away with the above typology and replaced it with something which is easier to understand. Truly exciting stuff. Okay, perhaps I need to calm down but it's still pretty nifty and will be useful for a lot of people. Not having a numerical font height field may make things a little bit more tricky at times though. And what would be really amazing is if there were an option to easily download data for the whole of Great Britain, rather than for the two-letter OS grid squares.

Anyway, I've just tried to add a little bit of value here through merging the data and explaining all this. The new dataset has 364,581 named place points for the whole of Great Britain, divided into the following classes: 'hydrography' (50,343 features), 'landcover' (9,987), 'landform' (32,479), 'populated place' (255,959), 'woodland or forest' (15,813). I'm only really interested in populated places here and they break down into the following categories.

But what is 'large'? Keep reading...

Users need to keep in mind that the classification of populated places is still listed under the 'FONTHEIGHT' field so I'm sure it's partly about cartographic placement and not just size of places, or their populations. An example of a 'small' place would be Skaw, with Pockthorpe as 'medium', Altrincham as 'large' and Leeds as 'extra large'. You can see the 'extra large' places in the map below. I also notice that the Gaelic name for my home town (Inbhir Nis) seems to have been added as a new feature that I didn't see before.

Remember, it's not all about importance or size

Some further maps below, so that you can see how it works in practice with the different types of places - which I have shown in different colours and sizes on the maps. You may have to click these to make them big enough to read the labels.

I have no idea what 68 - 96 is

Edinburgh as 'extra large' and Leith as 'large' here

Only these three are 'extra large' in London

Newport: it's not a small Welsh town - it's 'extra large'

All that's left for me to say is that I hope some people reading this find it useful and, if you do and you have a need to label places in Great Britain then feel free to use the GB layers I put together. You'll find them in this Google Drive folder. I've done one version with just populated places and another version with everything.

Get the data here