Sunday, 23 October 2016

The Global Human Settlement Layer: an amazing new global population dataset

At the recent Habitat III conference on housing and sustainable development in Quito, Ecuador (17-20 October 2016) the European Commission launched a new Global Human Settlement (GHS) dataset, and that's what this post is about. First, some basic details: Landsat data from 1975, 1990, 2000 and 2014 were processed and analysed to produce three different GHS products: one on population (GHS-POP), one on built-up areas (GHS-BUILT) and one settlement model dataset (GHS-SMOD). But this is already getting too technical, so let's look at some maps - I've created a few in 3D, just for fun, to give you an idea of what the population dataset looks like. Each image below uses the 250 metre resolution version (there is also a 1km cell version for population).

I've added a few place names and extruded cells by population

The London example is pretty interesting and I think it provides a nice overview of settlement patterns, both in terms of distribution and density. As you can tell, this is just a small chunk of the data - it covers the whole world so it's a pretty big file - but more on that below. Because of the file size, I took a smaller extract to explore further: I exported the United States as a separate file and looked at four metro areas I thought would be interesting from a population density point of view: San Francisco, Los Angeles, Houston and New York. The highest population value in any single 250m cell in the London example above was 1,595, so I also thought it would be interesting to compare this with the US cities. Let's take a look, starting with the Bay Area around San Francisco.

That big spike in the north? That's San Quentin State Prison

This was also just a little extract, but again it gives an interesting view of population density. The population spike in the north of the Bay Area surprised me but then I looked at it more closely. The data showed a figure of 4,856 people in that 250m cell, which seemed pretty high so I dug a bit deeper. The Wikipedia page for San Quentin State Prison tells us there are 4,223 prisoners (137% of capacity) and another quick search tells me there is employee housing there too, so this figure stacks up. The next highest value is in San Francisco, so this makes sense too. But what about Los Angeles - how did that compare?

This is just a part of the wider Los Angeles metro area

The highest population value in any of the 250 metre cells in Los Angeles was 2,285, which I was a little surprised at because I didn't think it would be much higher than London. This was just a quick and dirty extract, so no labels here (or scale bars, sorry) but you do get a sense of the urban density and distribution of settlements here. Somewhere I did think would show much less density was Houston, and I was proved right here, as you can see below.

The sprawling metropolis of Houston

The highest population figure in any one 250 metre cell in Houston was 813, according to the GHS dataset. This is of course not surprising but I found it quite interesting to see it like this. Finally, I wanted to see what New York and its wider metro area looked like. I thought it would beat San Francisco for density, and it did.

An obvious spike in population density in most New York City Boroughs

The highest population figure for New York City per 250 metre cell was 6,189. This makes sense when you think that a tall residential apartment building can easily fit within one such cell - and in fact multiple buildings can. Mind you, it's still a pretty high figure.
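To put these per-cell figures on a more familiar footing, here is a minimal sketch that converts them to people per square kilometre. It assumes each cell really is a 250 metre square (0.0625 square kilometres) and uses only the peak values quoted in this post; the function name is just for illustration.

```python
# Peak 250 m cell populations quoted in this post (2014 GHS-POP data).
CELL_SIDE_M = 250
peak_cell_population = {
    "London": 1595,
    "San Francisco Bay Area (San Quentin)": 4856,
    "Los Angeles": 2285,
    "Houston": 813,
    "New York": 6189,
}

def people_per_km2(cell_population, cell_side_m=CELL_SIDE_M):
    """Convert a per-cell count to an equivalent density per square kilometre."""
    cell_area_km2 = (cell_side_m / 1000) ** 2  # 0.0625 km2 for a 250 m cell
    return cell_population / cell_area_km2

for place, count in peak_cell_population.items():
    print(f"{place}: {count:,} per cell = {people_per_km2(count):,.0f} per km2")
```

Since there are 16 such cells in a square kilometre, the New York peak of 6,189 is equivalent to roughly 99,000 people per square kilometre, but only across that single cell.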

These examples are from the 2014 dataset but there is so much else to see, if you have the time and skills to explore it. I'm at risk of becoming addicted to it, so I'll have to restrain myself. For now, I recommend that you check out the European Commission web pages on the data.

The rest of this post includes more technical information, possibly of interest to only a few data/GIS nerds with nothing better to do with their lives.

About the data (and yes, it's open and free)
The most important thing is to know where to get the data (once you've read about what it is) but this can involve endless clicking so here's an FTP link to the downloads. I've focused on the population part of the dataset here and it comes in TIF format. The 250 metre resolution one is about 626MB in size, so you need to have a decent machine to work with it. In terms of map projection, it comes as World Mollweide (EPSG:54009).

Quite a big file, but not too bad considering it's global
Here's the Copyright text file for the datasets

You can then open the file in your chosen GIS - I've shown a couple of examples of this below; one with ArcGIS and one with QGIS (the dataset notes file specifically mentions both of these). I have found it easier to work with in QGIS so far. When you first open the file you won't be very impressed - some further styling is needed. Also, in ArcGIS the high value in the legend is an impossible figure and in QGIS the values go from zero to zero - again, this just needs some tweaking in order to display something meaningful.

Notice the strange high values - that's not right!

Yep, nobody lives on earth (0 to 0 in the values on the left of the image)
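One plausible explanation for the 0 to 0 legend, which I haven't confirmed, is that the default stretch is estimated from percentiles of the data (a 2% to 98% cut, as far as I know) rather than the true minimum and maximum. Most cells in a global raster are empty, so both percentiles land on zero. The sketch below uses made-up cell values to show the effect; the `percentile` helper is a simple nearest-rank version written for this example, not anything taken from QGIS.

```python
def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an already sorted list (fraction between 0 and 1)."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]

# Made-up sample: 990 empty cells and 10 populated ones (illustrative values only).
sample = sorted([0] * 990 + [3, 12, 40, 95, 210, 480, 813, 1595, 2285, 6189])

low, high = percentile(sample, 0.02), percentile(sample, 0.98)
print(low, high)              # both zero, so everything is drawn in one colour
print(sample[0], sample[-1])  # the true min/max gives a usable stretch
```

If that is what's going on, switching the min/max setting to the actual values, or typing in your own range, is the kind of tweaking needed.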

Once you get the data on screen, you can start to style it and get something meaningful in front of you. Here's an example from ArcGIS, where you can see that there is some 'blockiness' in the data in some areas - it's not perfect at 250m resolution so at times the 1km product may be better on this front.

The high value of 7368 seems more reasonable here

Now let's take a look at a more cleanly styled view, this time for England. As you can see below, this now gives us quite a nice overview of the settlement pattern for the country.

This is just the original raster dataset, zoomed in

Since I extracted the data for just the United States, I also have a nice separate 250m cell version of that. I actually converted this to a vector layer in QGIS (and it's about 850MB) so here's what that looks like for the lower 48 states. I think this is quite pleasing to the eye. Click to make it bigger - it's a good approximation of the settlement pattern of the United States.

This is a vector version of the 250 metre population dataset for the US

One thing I haven't yet got to the bottom of is what the maximum population of any single 250m cell is. In both ArcGIS and QGIS, the maximum seems to be 634,492 - which isn't right. You definitely can't fit that many people in a 250 metre square! Hopefully someone will get to the bottom of this. I think this figure might come from aggregated blocks of cells in the data but so far I haven't had time to figure it out.

How to work with the data
Working with the data is quite tricky so here are a few tips based on how I dealt with it in QGIS. This last part describes how I extracted a subset of the original massive 626MB TIF so that I could work with smaller chunks and then convert them to vector format for doing some 3D maps. All I did was load up the full 250 metre resolution population dataset and then go through the steps you can see in the screenshots below.

This is the original dataset, zoomed to Liverpool and Manchester

You can then just select an area of the TIF to extract by clicking and dragging
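Clicking and dragging an extent amounts to turning a bounding box in map coordinates into a window of rows and columns. Here is a minimal sketch of that arithmetic, assuming a north-up raster with square cells. The origin and extent values are hypothetical, chosen for illustration and not read from the GHS file, and `bbox_to_window` is a name I've made up.

```python
import math

def bbox_to_window(origin_x, origin_y, cell_size, bbox):
    """Return (col_off, row_off, n_cols, n_rows) covering bbox.

    origin_x, origin_y: map coordinates of the raster's top-left corner.
    bbox: (min_x, min_y, max_x, max_y) in the same projected coordinates.
    Assumes a north-up raster with square cells.
    """
    min_x, min_y, max_x, max_y = bbox
    col_off = math.floor((min_x - origin_x) / cell_size)
    row_off = math.floor((origin_y - max_y) / cell_size)  # rows count downwards
    col_end = math.ceil((max_x - origin_x) / cell_size)
    row_end = math.ceil((origin_y - min_y) / cell_size)
    return col_off, row_off, col_end - col_off, row_end - row_off

# Hypothetical origin and extent in metres, for illustration only.
ORIGIN_X, ORIGIN_Y, CELL = -18_000_000.0, 9_000_000.0, 250.0
window = bbox_to_window(
    ORIGIN_X, ORIGIN_Y, CELL,
    (-250_000.0, 6_300_000.0, -150_000.0, 6_350_000.0),
)
print(window)
```

A 100km by 50km extent like this one comes out as a 400 by 200 window, which is 80,000 cells and far more manageable than the full raster. Offsets in this form are also what `gdal_translate -srcwin` expects, if you prefer the command line.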

Using the new TIF, I then converted it to a vector layer

This is the new vector layer, zoomed and symbolised
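To give a feel for what the raster to vector step produces, here is a rough sketch that turns each populated cell of a tiny made-up grid into a square polygon with a population attribute. It is an illustration of the idea and not what the QGIS tool does internally (as far as I know, the GDAL-based polygonize tool merges neighbouring cells that share a value). The grid values, origin and function name are all hypothetical.

```python
def cells_to_features(grid, origin_x, origin_y, cell_size):
    """Turn each populated cell of a small grid into a GeoJSON-style square polygon."""
    features = []
    for row, values in enumerate(grid):
        for col, population in enumerate(values):
            if population <= 0:
                continue  # skip empty cells so the vector layer stays small
            left = origin_x + col * cell_size
            top = origin_y - row * cell_size
            right, bottom = left + cell_size, top - cell_size
            # Closed ring, listed anticlockwise starting at the bottom-left corner.
            ring = [[left, bottom], [right, bottom], [right, top],
                    [left, top], [left, bottom]]
            features.append({
                "type": "Feature",
                "properties": {"population": population},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            })
    return features

# Made-up 3 x 3 extract; the origin is hypothetical.
grid = [[0, 12, 0],
        [5, 813, 40],
        [0, 0, 7]]
features = cells_to_features(grid, origin_x=0.0, origin_y=750.0, cell_size=250.0)
print(len(features), "polygons from", sum(len(r) for r in grid), "cells")
```

Skipping the empty cells matters: most of the raster is unpopulated, so leaving those cells out keeps the vector layer to a small fraction of the cell count.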

Finally, I decided to do a little bit of experimenting with the 2.5D symbology options in QGIS (available from version 2.14 onwards). The images at the top of the post were done in ArcScene (part of ArcGIS) but ideally I'd have done this in Blender instead - but that would have taken too much time. Also, I'm waiting to see what Steve Bernard and others might do with this dataset - there are so many possibilities and so far my Blender skills are really limited.

This might break your computer if you try too big an extract (e.g. an entire country)

Finally, a zoomed in version of the above viz

There's so much that you could do with this data for research purposes, or just for fun, but the first hurdle is getting your head round the data and how to work with it. This post is just intended as a small contribution in that vein. I hope some find it helpful.

Notes: the GHS population dataset is a giant raster (TIF format) of 626MB when compressed. I created an uncompressed version (by mistake) and it was 33GB! There are 141,969 columns and 60,829 rows in the full raster - this comes to 8.6 billion cells, so I don't recommend trying to convert the whole thing to a vector layer because it won't work and is not a good idea anyway. The creation of the dataset was supported by the Joint Research Centre (JRC) and the DG for Regional Development (DG REGIO) of the European Commission, together with the international partnership GEO Human Planet Initiative. Lots of very clever individuals contributed to the project, and you can find out more about the team on the GHSL people pages.
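The cell count and the size of the uncompressed file can be checked with a few lines of arithmetic. The 4 bytes per cell figure is my assumption (32-bit values); I haven't checked it against the file itself, but it does land close to the 33GB I saw.

```python
COLUMNS = 141_969
ROWS = 60_829
BYTES_PER_CELL = 4  # assumption: 32-bit values; not checked against the file itself

cells = COLUMNS * ROWS
uncompressed_bytes = cells * BYTES_PER_CELL

print(f"{cells:,} cells")
print(f"{uncompressed_bytes / 1e9:.1f} GB uncompressed (decimal)")
print(f"{uncompressed_bytes / 2**30:.1f} GiB uncompressed (binary)")
```

That gives 8,635,832,301 cells and somewhere between 32 and 35 gigabytes depending on how you count a gigabyte.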

Monday, 17 October 2016

D3 Charts in QGIS Print Composer

This post is about how to make charts created in d3.js appear in QGIS and how to make them work in the Print Composer so that you can export them with your maps. It's inspired by the recent publication of a new QGIS plugin which allows you to create one kind of D3 chart within QGIS. For this example, I'm using Motor Vehicle Collisions data from NYPD, because the QGIS plugin here needs data with time stamps in it and this fits the bill. Also, it's an interesting dataset because it's a good example of something where time and place are important. Collisions are closely tied to certain locations (e.g. intersections) but they are also more likely to happen at certain times (e.g. 8am, 6pm). First up, here's a little visual of lower Manhattan collisions by time of day, using the whole of 2015 - don't worry, there aren't that many collisions in a single day, even in NYC.

Crash, bang, wallop

Okay, so we have a gif with lots of flashing dots, which isn't very helpful. But it does tell a very basic and obvious story. What if we wanted to put a static map in the Print Composer beside a radar chart showing collisions by time of day? That's where the D3 Circular Date/Time Heatmap plugin comes in handy. In QGIS, just search for D3 via Plugins... Manage and Install Plugins to find it and install. Once you've done this, you'll see the little chart icon in your toolbar area. To make it work, the best thing you can do is follow the original tutorial, which uses 2006 Chicago Crime data. You'll see in my example below for the NYPD data that I've set the chart to show day of week and hour of day so that I can get a sense of the time pattern associated with collisions.

Note: this plugin produces circular charts only
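The day of week by hour of day view boils down to counting records in 7 x 24 = 168 bins. Here is a minimal sketch of that binning, written independently of the plugin. The records are made up and the "MM/DD/YYYY H:MM" layout is an assumption about how the date and time are stored, so adjust the format string to match your own data.

```python
from collections import Counter
from datetime import datetime

# Made-up records in an assumed "MM/DD/YYYY H:MM" layout.
records = [
    "01/05/2015 8:05",
    "01/05/2015 8:40",
    "01/05/2015 18:15",
    "01/06/2015 8:20",
    "01/10/2015 23:55",
]

def weekday_hour_counts(timestamps, fmt="%m/%d/%Y %H:%M"):
    """Count records in each (day of week, hour of day) bin; Monday is 0."""
    counts = Counter()
    for text in timestamps:
        moment = datetime.strptime(text, fmt)
        counts[(moment.weekday(), moment.hour)] += 1
    return counts

counts = weekday_hour_counts(records)
for (weekday, hour), n in sorted(counts.items()):
    print(weekday, hour, n)
```

Each bin then becomes one segment of the circular chart, coloured by its count.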

Once you hit OK on the plugin, you'll get a D3 output page which will render in your web browser, after you tell it where to save the output files to. When you do this, you'll see three new files in your designated folder, as below.

Nice small file sizes with D3 here

Now you might want to add this chart next to a map of collisions in your Print Composer in QGIS. You can do this quite easily. First of all, you just need to get things ready by positioning the map output as you wish. In this case, I've gone for a landscape layout and I have a little Manhattan collision map on the left, showing only some collisions and the circle chart on the right. Here's a screenshot of my Print Composer showing that I've just added the D3 chart using the Add HTML Frame button. I've just told QGIS where I saved the chart by pointing it to the location on my computer where I saved the new chart, but you can paste in the original code in the box below that if you wish.

Still looks pretty messy, but you get the idea

In the example above, I've changed some of the original defaults in relation to fonts and so on, just for my own benefit. You can also go in to the html and tweak whatever you want - in fact it's quite a good way of getting used to how it works if you've never done it before. Then you might want to export the map at a decent resolution. This is what I've done in the example below, which is only for test purposes because it's not a finished map - I'm just adding it here so you can see what the D3 chart looks like from an exported Print Composer layout.

Just to demonstrate the resolution - click to enlarge

A little close up on the chart, just to highlight the quality

From here on in, it's just a case of experimenting more and more until you get what you want. There are lots of different ways to generate D3 charts and other output, including online tools such as Raw or even learning a bit of D3 yourself.

Finally, take care when crossing the street in NYC.

Notes: if you're looking for a QGIS plugin that will turn your whole map into D3 and make it interactive, look no further than Simon Benten's D3 Map Renderer, which is really cool. This example I've shown today is really very basic, but it shows you how to get some D3 into your QGIS projects, should you want to. For the most advanced, beautiful examples, check out Jason Davies and Mike Bostock. You'll sometimes see it written D3, D3.js, d3.js or just d3 - it's all the same thing. I prefer D3. For nice practical examples of D3 in the wild, check out the Financial Times page - with John Burn-Murdoch and Alan Smith really leading the way here.

Wednesday, 12 October 2016

Crowdsourced City Boundaries

One Friday afternoon a couple of weeks ago, while putting off something more important, I felt the sudden need to see if I could successfully crowdsource some city boundaries. I'd been doing this kind of thing with housing market boundaries for a few years but I thought it would be interesting to see how many shapes I could get people to draw, so here are the results - for fun more than anything else but I think they are also pretty interesting. If you want to download the raw data, you can go back to the original page and click the Download Data button in the bottom right. I've kept it very simple. First of all, the maps are overlaid on a black and white map and the brighter shading shows where more shapes overlap. For places in the UK, I've also overlaid local authority boundaries to give a sense of how the drawn shapes compare to administrative units. The code for this tool was written by Nick Martinelli (thanks a lot) - I just modified it slightly. You should be able to see the detail in the maps when you click on them.
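The brighter-where-more-shapes-overlap effect can also be expressed as a count: for each location, how many drawn shapes cover it? Here is a rough sketch of that idea using a simple ray-casting point-in-polygon test on a small grid. It is not how the maps here were rendered, and the three shapes are made up for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: polygon is a list of (x, y) vertices, not closed."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Toggle when a horizontal ray from (x, y) crosses the edge j -> i.
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

def overlap_grid(polygons, size):
    """Count, for each cell centre of a size x size grid, how many shapes cover it."""
    return [[sum(point_in_polygon(col + 0.5, row + 0.5, p) for p in polygons)
             for col in range(size)]
            for row in range(size)]

# Three made-up 'drawn' shapes on a 6 x 6 grid.
shapes = [
    [(0, 0), (4, 0), (4, 4), (0, 4)],
    [(1, 1), (5, 1), (5, 5), (1, 5)],
    [(2, 2), (6, 2), (6, 6), (2, 6)],
]
grid = overlap_grid(shapes, 6)
for row in grid:
    print(" ".join(str(n) for n in row))
```

The cells with the highest counts are the places most people agree belong to the city, which is what the brightest areas in the maps show.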

Yes, this is definitely Birmingham

100% Edinburgh - that's it settled

Someone thinks Glasgow is pretty big!

This seems pretty sensible to me

Let's claim a bit of sea while we're at it

This seems big, but is probably not far from the truth

London's functional reach really does go far beyond the boundary

Yes, 'Manchester' is still the classic 'underbounded' city

Now for some US cities - I've just added the ones which had the most shapes drawn for them, starting with Chicago. I see someone has very carefully added in what looks to be an almost exact representation of the actual city boundary - impressive!

Well done for the detailed city boundary drawing

Los Angeles County (pop c.10m) comes out a bit here

People seem to have mostly drawn the boundary of the 5 boroughs

For New York (and across the US), Jen Nelles helpfully pointed out to me on Twitter that the term 'city' has a different meaning and perhaps I should have used 'urban' or 'metro', which is a good point. When I did this I was only really thinking about the UK but then I got drawn cities coming in from all over the world, so keep that in mind when interpreting these.

Some detailed shapes were drawn for Philadelphia

I find Seattle particularly interesting - nice commutersheds here

Some evidence of people drawing the administrative boundaries here

And now some other cities from different parts of the world - I would have added more but some cities (e.g. Auckland) only had a couple of shapes drawn.

I've left a couple of nearby places in here too

There weren't that many shapes here but it's still quite interesting

Lots of blocky shapes in Toronto, which I found interesting

And finally, three more images, one of which includes one of the rude shapes drawn by several scoundrels who defaced my map! 

Someone drew a giant knob over Wales - very naughty

A full size GB/IRE version 

I particularly like the way Munich has been drawn here.

And that's all I have to say about this little adventure in crowdsourced procrastination. Thanks to everyone who drew a shape or shared the link online. I've found it very interesting. Feel free to use any of this if you find it of interest. Now time for me to get back to what I should have been doing in the first place.

Notes: I did this as a little mapping experiment after seeing Colin Ross do it for Western Sydney (which was inspired by earlier Bostonography work). Earlier work by people such as Kevin Lynch and Patrick Abercrombie is also relevant here. Edinburgh City Council also ran a project like this in 2014 called 'Natural Neighbourhoods' and there are countless other examples across the world. A deliberately prettified/extreme pink example of some earlier stuff I did with this (using housing market data) can be seen on the Rightmove blog. That example involved millions of shapes, whereas we're only just into the thousands for this blog. If you want to see some of my academic work on this, in relation to housing market geographies, take a look at this paper (open access) - an example image is shown below.

This was generated using data from Rightmove's 'Draw-a-Search' tool