Stats, Maps n Pix

Wednesday, 17 August 2016

Research with QGIS, R and speaking to people

I recently led a piece of research for the Joseph Rowntree Foundation on disconnected neighbourhoods - basically, it looked at the UK's most deprived areas and how connected or disconnected they are to their wider cities in relation to jobs and housing. You can read the brief Findings or Full Report here. This post is just about the methods we used and some of the outputs. We used open source software QGIS and R for the analysis (led by Ruth Hamilton) and we also spoke to policymakers across the country (Rich Crisp and Ryan Powell).

You can see the full report here

We looked at those areas in England, Northern Ireland, Scotland and Wales that fell within the 20% most deprived on the national deprivation indices in each nation and then explored data relating to household moves and commuting (plus lots more). We updated and developed two area typologies to help us make sense of the data - and to see how things changed we produced riverplots (Sankey diagrams) in R (this was done by the brilliant Ruth Hamilton).
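For anyone wanting to try something similar, note that each nation has its own deprivation index. This means the 20% cut-off has to be applied within each nation rather than to one pooled UK ranking. Here's a minimal Python sketch of that selection step. It assumes a hypothetical list of (area id, nation, rank) records where rank 1 is the most deprived area in that nation. It illustrates the idea and is not the exact code used for the report.

```python
from collections import defaultdict

def most_deprived(areas, share=0.2):
    """Return ids of areas in the most deprived `share` of their own nation.

    `areas` is a list of (area_id, nation, rank) tuples (hypothetical format),
    where rank 1 is the most deprived area on that nation's own index.
    """
    by_nation = defaultdict(list)
    for area_id, nation, rank in areas:
        by_nation[nation].append((rank, area_id))
    selected = set()
    for ranked in by_nation.values():
        ranked.sort()                       # most deprived (rank 1) first
        cutoff = int(len(ranked) * share)   # size of the top 20% in this nation
        selected.update(area_id for _, area_id in ranked[:cutoff])
    return selected
```

Ranking within each nation means a Welsh area is only ever compared with other Welsh areas, which is what using separate national indices implies.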

Created with the Riverplot package in R

I also did a little bit myself with open source software, including updating a 'divided cities' type graphic I produced in the past - looking at the spatial split between most and least deprived parts of 13 cities across the UK, as you can see below. See Sheffield in particular for a very stark divide.

Red = most deprived, blue = least deprived

Our colleagues Rich Crisp and Ryan Powell then spoke with more than 140 policymakers in cities across the UK - another nice 'open source' method. You can read more about this in Chapter 6 of the report but the bottom line is that if we want things to change for the better then we need to take a different approach to urban policy - a more inclusive approach.

As I said, there are two typologies, and we then combined these into a matrix in an attempt to understand area types a bit better and then suggest possible policy responses that might make sense. What each category means is explained in the report but in the figure below you can probably make sense of what 'Gentrifier' areas are and what those labelled 'Disconnected' are.
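Combining two typologies into a matrix is essentially a cross-tabulation. Every area gets one label from each typology, and each cell of the matrix counts the areas sharing that pair of labels. Below is a minimal Python sketch. The category labels are hypothetical placeholders, since the real categories are set out in the report.

```python
from collections import Counter

def typology_matrix(residential, commuting):
    """Cross-tabulate two area typologies into a matrix of counts.

    Both arguments map area_id -> category label (hypothetical inputs).
    Returns (row labels, column labels, matrix as a list of lists).
    """
    cells = Counter(
        (residential[a], commuting[a]) for a in residential if a in commuting
    )
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    # Counter returns 0 for label pairs that never occur
    return rows, cols, [[cells[(r, c)] for c in cols] for r in rows]
```

Shading particular cells of the resulting matrix is then a matter of picking the label pairs of interest.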

The shaded areas might be a good policy focus to begin with

The point of this blog post, however, was just to highlight how useful and effective open source software now is, and can be, in real-world research. Advocates will already know this but many more have yet to make the leap so hopefully this will provide just a little bit of inspiration or motivation to do so.

We produced hundreds of maps for the project (too many for the report) so you can probably find one for your area in the online folders. Two examples - one for each typology - are shown below.

Residential typology map of Birmingham

Travel to work typology map for Glasgow

For more on the methods used to develop the typology, see the Annexes in the Full report.

Monday, 1 August 2016

How long is the coastline of Great Britain?

This is a bit of a long read, so if you really want to know the answer to the question in the title of this post, it's very simple: it depends upon how you measure it. Or, you could say that the coastline of the island of Great Britain is infinitely long. But this doesn't really help anyone who wants to walk or kayak or swim round this island, so I'll attempt to answer the question here. Take a look at the image below and you'll see that I've calculated the distance of the coastline round the island of Great Britain as 11,023 miles.

Quite a lot of coastline for a small island

But hold on a minute, I also calculated it again and got an answer of 3,876 miles, as you can see below. What's going on here? Well, the first image is an extremely detailed digitised representation of the coastline of Great Britain and surrounding islands (bearing in mind 'detailed' is a relative concept). This first map is represented by 2,282,000 individual vertices which create the polygons you see in the image above.

In the second map, only 0.1% of these vertices are retained, so the geographical features you see below are represented by 2,282 individual vertices. You can't see much difference between the two at the scale shown here, but if you were trying to navigate your way into a harbour or sea loch on the west coast of Scotland, for example, it would make a big difference. Click the first image to enlarge it and then compare it to the next one and you will see some differences, but nothing too drastic.
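Why does dropping vertices shorten the coastline? A GIS measures a polygon's perimeter as the sum of straight-line distances between consecutive vertices. Removing a vertex replaces two segments with one, and by the triangle inequality that single segment can never be longer. Here's a minimal Python sketch on a made-up "jagged" ring in projected coordinates (metres, as in British National Grid). It uses a toy shape, not the OS data.

```python
import math

def ring_length(ring):
    """Perimeter of a closed ring given as (x, y) vertices in projected units.

    The closing segment (last vertex back to the first) is included; if the
    first vertex is repeated at the end, that extra segment just has length 0.
    """
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))

# A square with a zigzag "coast" along its top edge...
jagged = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0), (4, -4), (0, -4)]
# ...and the same square with the zigzag vertices removed
simple = [(0, 0), (4, 0), (4, -4), (0, -4)]

print(ring_length(jagged))  # about 17.66 (12 + 4 * sqrt(2))
print(ring_length(simple))  # 16.0
```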

The coastline length is a function of how you measure it

At this point, you might be thinking 'hasn't this got something to do with fractals and Benoit Mandelbrot?' - and you'd be right. He wrote a very famous paper in Science in 1967 on exactly this topic, entitled 'How long is the coast of Britain?' The answer is that there really is no definitive answer - it's all about how you measure it. But let's say you want to swim or kayak around the coastline of Great Britain and nearby islands. How far would you have to travel? I tried to calculate this based on a 1km distance from the shoreline and concluded that it could be done by covering fewer than 2,000 miles - even though the coastline seems to be a lot longer. After all, you wouldn't want to go in and out of every little cove and estuary.
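The classic way to show this effect is Lewis Fry Richardson's divider (or "ruler") method, which Mandelbrot built on. You walk a pair of dividers set to a fixed length along the coast and count the steps. The shorter the ruler, the longer the measured coast. Below is a minimal Python sketch of a divider walk along an open polyline, using made-up coordinates. It illustrates the idea only. The figures in this post came from generalising vertices in mapshaper, not from this method.

```python
import math

def divider_length(line, ruler):
    """Richardson-style divider walk along an open polyline of (x, y) points.

    From the current point, find the next point along the line that lies
    exactly `ruler` away (straight-line), step there, and repeat. The result
    is steps * ruler plus the leftover distance to the end of the line.
    """
    if ruler <= 0:
        raise ValueError("ruler must be positive")
    cx, cy = line[0]
    start, steps = 0, 0
    while True:
        for j in range(start, len(line) - 1):
            (x1, y1), (x2, y2) = line[j], line[j + 1]
            dx, dy = x2 - x1, y2 - y1
            a = dx * dx + dy * dy
            if a == 0:
                continue  # skip duplicate vertices
            # Solve |P1 + t*(P2 - P1) - C| = ruler for t (a quadratic in t)
            fx, fy = x1 - cx, y1 - cy
            b = 2 * (fx * dx + fy * dy)
            c = fx * fx + fy * fy - ruler * ruler
            disc = b * b - 4 * a * c
            if disc < 0:
                continue
            t = (-b + math.sqrt(disc)) / (2 * a)  # the crossing further along
            if 0 <= t <= 1:
                cx, cy = x1 + t * dx, y1 + t * dy
                start, steps = j, steps + 1
                break
        else:  # no further crossing: finish with a partial step
            return steps * ruler + math.dist((cx, cy), line[-1])

zigzag = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]  # true length 4 * sqrt(2)
print(divider_length(zigzag, 2))    # 4.0 - the long ruler cuts the corners
print(divider_length(zigzag, 0.1))  # closer to (but not above) about 5.66
```

Each step is a straight chord, so the divider length can never exceed the length of the line itself. It only approaches that length as the ruler shrinks, which is the coastline paradox in a nutshell.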

Be my guest

I created a little gif based on different ways of measuring the British coastline, starting off with a file that included 100% of the vertices from my original Ordnance Survey map layer (see notes below for more on this). I then created files with fewer and fewer vertices retained, all the way down to a nonsensical shape which retained hardly any of the original points. This is what I got - at 2 seconds per frame (note the '% of vertices retained' figure in each image):

Coastline length at different measurement scales

It's a bit difficult to see the difference between some of these images at this scale, so I also zoomed in to the west coast of Scotland to produce another little animation. This time, you can really see more of the difference between the layers I produced. The figures on the graphics indicate what percentage of the original vertices were retained in each case. Below this, I have also provided a still image with different versions of the coast overlaid on top of each other, just to demonstrate the impact of reducing the number of vertices on the representation of the coastline, and hence its length.

This shows Morar, Mallaig and Loch Nevis

Each line represents a different level of generalisation

I then decided to take a smaller island and extract the individual vertices (also known as nodes) that make up the shapes you see in the maps above. For this, I chose the Isle of Skye because it's one of the biggest British islands and the coast is highly irregular and indented. Using the version of the original shapefile where I retained 1% of the original vertices, Skye is represented by 772 individual nodes joined together to make a single polygon, as you can see below.

This produces a pretty good approximation of the coastline of Skye for most purposes. At this resolution, the coastline of Skye comes in at 330 miles (530km), compared to 456 miles (733km) at the original resolution. But of course we need to remember that if we had digitised around every single rock around the coastline the length would be nearly infinite. If you measured the coastline with a matchstick, for example, you'd get an extremely high value (and a sore back).

Skye represented with a polygon comprised of 772 vertices

Here's what this looks like when you show them one by one, in an animated gif - just to give you an idea of how it is plotted spatially. This is shown at 15ms per frame, so the dot fairly zooms around the coastline. All of this also gives you a little insight into how a GIS deals with geometry and what goes into the shapes that you see on your screen. It also helps explain why the very detailed, highly accurate spatial data files we can download from Ordnance Survey aren't always the most appropriate ones to use in small scale mapping. Or, maybe I just wanted to make another geogif, but either way I think I learned something.

A dot going round the Isle of Skye at 99,000 mph (forever)

So, how long is the coastline of Great Britain? Well, if you want to swim or kayak around all islands then you should think about training for a distance of around 2,000 miles and if you want to walk the coastline of Great Britain only then it's most likely going to be a bit more, or maybe a bit less - but that depends upon how you plan your route. Despite all the uncertainty, however, I think we can all agree that you'll need to go more than 1,024 miles.

Yes, this is Britain (kind of)

Last of all, I also did a little gif showing the 174 vertices of Great Britain when the file is massively reduced - so I'll end with this.

Another one, just for fun

Notes: I used the OS OpenData Boundary Line product for the coastline. This was a polyline file so I converted it to a polygon and then generalised it several times using the Visvalingam algorithm in mapshaper. Contains OS data © Crown copyright and database right 2015. You'll see if you search online that my measurements are close to those of others - so I'm at least as right or wrong as some people. If you're interested, you might want to look up the coastline paradox as well and, of course, Lewis Fry Richardson. Other big British islands? After the island of Great Britain, it's Lewis and Harris at 741 miles of coastline (1,193km), the mainland of Shetland at 692 miles (1,113km), Skye at 456 miles (733km) and North Uist at 334 miles (537km). Remember that this refers to coastline length and not land area.
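The Visvalingam (Visvalingam-Whyatt) algorithm mentioned above works by "effective area". Each vertex forms a triangle with its two neighbours. The algorithm repeatedly removes the vertex whose triangle is smallest, because that vertex contributes least to the shape. Below is a minimal, unoptimised Python sketch for an open polyline. It always keeps the two endpoints and recomputes every area on each pass. It is not mapshaper's implementation, which is far more efficient and has many more options.

```python
def triangle_area(p, q, r):
    """Area of the triangle formed by three (x, y) points."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2

def visvalingam(line, keep):
    """Simplify an open polyline down to `keep` vertices (endpoints always kept).

    Repeatedly drops the interior vertex whose triangle with its current
    neighbours has the smallest area. This is a simple O(n^2) sketch; efficient
    implementations typically use a priority queue instead.
    """
    pts = list(line)
    keep = max(keep, 2)
    while len(pts) > keep:
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        del pts[areas.index(min(areas)) + 1]  # +1 because areas skip pts[0]
    return pts

line = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)]
print(visvalingam(line, 4))  # [(0, 0), (2, 0), (3, 5), (4, 0)] - the tiny wiggle goes first
```

To mimic a "% of vertices retained" setting, you could pass something like `keep = round(len(line) * pct / 100)`. A closed coastline ring would also need a rule for which vertex is treated as the fixed start and end.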

Tuesday, 26 July 2016

Urban road network data for 80 world cities

A recent study published in Nature's Scientific Data caught my eye. In the paper, published online on 21 June 2016, the authors describe a method for taking OpenStreetMap (OSM) data and producing usable, topologically accurate network data from it. It covers 80 of the biggest cities across the world, from Tokyo to Medellin. What I particularly liked is that they shared all the data on Figshare, so you can download it in chunks or the whole lot at once (2.15GB). I had a little play with it and made some night time/from space views of the network data, just for fun. Guess the cities below (helpful clues included)... and then tell me which is the odd one out.
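What does "topologically accurate network data" mean in practice? Roughly, road lines that meet have to share the same node, so that routing software can get from one line onto the next. Below is a very simplified Python sketch of the final step, turning polylines into a weighted graph. It assumes projected coordinates and assumes the lines have already been split at every junction, which is the hard part the paper's protocol deals with. The function name and the rounding-based snapping are my own illustrative choices, not the authors' method.

```python
import math
from collections import defaultdict

def polylines_to_graph(polylines, precision=6):
    """Turn road polylines into an undirected graph weighted by line length.

    Endpoints are rounded to `precision` decimal places so that lines ending at
    (almost) the same coordinate share a node. Assumes projected coordinates
    and lines already split at junctions (hypothetical preprocessing).
    """
    graph = defaultdict(dict)
    for line in polylines:
        length = sum(math.dist(line[k], line[k + 1]) for k in range(len(line) - 1))
        u = tuple(round(c, precision) for c in line[0])
        v = tuple(round(c, precision) for c in line[-1])
        if u == v:
            continue  # skip closed loops to keep the sketch simple
        # If two lines join the same pair of nodes, keep the shorter one
        if length < graph[u].get(v, math.inf):
            graph[u][v] = graph[v][u] = length
    return graph

roads = [[(0, 0), (3, 4)], [(3, 4), (3, 10)], [(0, 0), (0, 1), (3, 4.0000001)]]
g = polylines_to_graph(roads, precision=3)
print(dict(g[(3, 4)]))  # {(0, 0): 5.0, (3, 10): 6.0} - the near-miss endpoint snapped
```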

Clue: it's not Bognor Regis

Clue: I have been there (that's not very helpful, sorry)

Clue: lots of people live here

Clue: rhymes with Few Dork

Clue: lots and lots and lots of people live here

Back on topic now - here's what the study looks like if you haven't already clicked - the authors also provide loads of useful resources in the paper. They also describe how they were able to develop a topologically accurate street network using their GIS-based protocol.

Take a look at the study - it's great

The 80 cities featured

Citation: Karduni, A., Kermanshah, A., and Derrible, S., 2016, "A protocol to convert spatial polyline data to network formats and applications to world urban road networks", Scientific Data, 3:160046, Available at http://www.nature.com/articles/sdata201646

Friday, 15 July 2016

From CartoDB to CARTO - the future of interactive mapping?

I've been using CartoDB (now CARTO) for a few years for interactive mapping - and have always loved what it can do - from basic mapping to much more complex analysis. Now, with the re-brand as CARTO and the advanced analytical tools available through the new Builder interface it's on a new level. So, credit where credit's due - I thought I'd do a short piece on this now to give my take on the new interface. But first, here's a little gif of me playing around with some commute data - which you can also download yourself if you want to. The dataset was used as part of a project I've been working on with Garrett Nelson - but hopefully more on that in future.

I'm just playing around turning things on and off here

If you've used the old CartoDB interface, the new Builder one might be a bit confusing at first - though you may not actually be able to get access to it yet. But once you have played around with it for a few minutes it soon becomes pretty intuitive. I uploaded a sub-set of commute flow lines for Minnesota and Wisconsin and then decided to add widgets so that I could filter the data using line distance, FIPS codes and commute volumes - as you should be able to see in the larger image below.
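Under the hood, widgets like these are interactive filters on attribute columns. Below is the same logic in plain Python, just to show what's going on. The field names and units (for example, distance in km) are hypothetical, and this is not CARTO's API. One real detail: the first two digits of a census tract FIPS code identify the state, with 27 for Minnesota and 55 for Wisconsin.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    origin: str        # 11-digit census tract FIPS code (state + county + tract)
    destination: str
    workers: int
    distance_km: float  # hypothetical field: straight-line length of the flow

def filter_flows(flows, states=("27", "55"), max_km=None, min_workers=0):
    """Keep flows whose origin lies in `states` (2-digit state FIPS prefixes),
    with at least `min_workers` commuters and, optionally, no longer than
    `max_km` - roughly what the category, volume and distance widgets do."""
    return [
        f for f in flows
        if f.origin[:2] in states
        and f.workers >= min_workers
        and (max_km is None or f.distance_km <= max_km)
    ]

flows = [
    Flow("27053000100", "27053000200", 120, 3.5),
    Flow("55079000101", "27053000100", 15, 500.0),
    Flow("19153000100", "27053000100", 40, 300.0),
]
print(len(filter_flows(flows, min_workers=20)))  # 1
```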

Click to enlarge - change the data view by using tools on right

This is very much just a little data sample, but if you want to play around with the interactive CARTO map you can see it here. It's not very pretty and the origins and destinations don't have place names right now - only FIPS codes - but the principle is the same. The Widget interface takes a little bit of getting used to as well, but is really easy to use once you've figured out what's what. See below for a screenshot.

You can add widgets for different data types

Any negatives to report? For me, not now. I'm just enjoying the enhanced analytical tools at hand. But if I was being greedy... I'm not massively keen on the default legends, there doesn't appear to be an 'addition' blend mode and the snap alignment of shapes in the old map editor has me a little flummoxed, but these are minor grumbles.

I'm not getting paid to promote this and I don't know anyone at CARTO - honest - I just think they have produced something that works brilliantly, is simple yet powerful and allows us to manipulate, analyse and share our data in new ways. There are other tools out there but the new Builder, for me, takes things to the next level for a mass audience. To answer the question in the title of the blog: is CARTO the future of interactive mapping, then? Not the future, but probably a very big part of it.

Notes: really, they didn't pay me. Data used are from the American Community Survey. I wrote a working paper on it already. I also blogged about it on my old blog. Finally, if you're one of the few people in the world not to have seen it, Mark Evans created this beautiful site with the same data.

Saturday, 25 June 2016

What can explain Brexit?

There have been enough maps, charts and infographics on Brexit already, so I'll just post three scatterplots here. I was trying to figure out what was going on, so I explored a few key variables that I thought might explain why people voted the way they did. I chose deprivation, lack of qualifications and higher qualifications, using 2011 Census data. I only focused on England here. Some of it has been done already by John Burn-Murdoch at the FT, though in a different (much nicer) way.

First, deprivation is - overall - not strongly correlated with percent voting leave at the local authority level (R-squared is 0.0369). I used 2015 Indices of Deprivation at the local authority level in the scatterplot below, where I've also labelled some areas and coloured the points by region.

Click to enlarge

Next, I decided to look at whether the percent of people with no qualifications correlated closely with the percent voting for leave in each local authority in England. This was much more successful, with an R-squared of 0.6197.

A pretty convincing pattern here

Finally, I decided to see whether higher levels of education - rather than lack of education - were more strongly associated with the propensity to vote leave, and they are. In the scatterplot below I compare the percent with Level 4 qualifications or above with the percent voting to leave the EU. This produces an R-squared value of 0.8053, which is really pretty high.
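Assuming these R-squared values come from a simple straight-line trendline with one predictor, R-squared is just the square of the correlation coefficient. It tells you how tightly the points hug the line, but not which way the line slopes. Below is a minimal Python sketch with made-up numbers, not the referendum data.

```python
def r_squared(x, y):
    """R-squared of an ordinary least-squares line y = a + b*x.

    With a single predictor this equals Pearson's r squared, so the sign of
    the relationship is lost.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)

print(r_squared([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0 (perfect positive line)
print(r_squared([1, 2, 3, 4], [9, 7, 5, 3]))  # 1.0 (perfect negative line, same value)
```

That's why it's worth looking at the scatterplots as well as the numbers. The R-squared of 0.8053 corresponds to a correlation of roughly 0.9 in size, and the direction is visible in the scatterplot.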

The outlier to the top left is the City of London

This may not be particularly surprising, given what is known about the link between voting patterns and education but I think it's particularly interesting because of i) the historic significance of this referendum and ii) the people likely to be hardest hit by any post-Brexit economic downturn.

Data sources: Census 2011 table KS501EW via NOMIS; EU Referendum data from the Electoral Commission. I have shared the spreadsheet (and the Level 4 vs. % leave chart) here in case anyone wants to look more closely or make an interactive version.

Sunday, 12 June 2016

International collaboration, without leaving the house (or, why social media can be a good thing)

The idea behind today's post is in some ways blindingly obvious, but also often overlooked. Basically, it's that sharing ideas online can lead to all sorts of interesting, unexpected international collaborations of various types. I'm talking here specifically about research and academic-related collaboration, but it could apply to just about anything. I thought I'd share a short maps/viz story about this, just to demonstrate that 'internationalisation' (a common feature of University policy) doesn't mean having to cram yourself into a plane for 10 hours to some far flung paradise. But of course I should start with a map.

It's a bit of an eyesore in some ways, but that's not the point. Here's the story, and the point...

I'd been doing some work on mapping travel to work flows, since the data were released for England and Wales in the summer of 2014. Some time after that, another University of Sheffield department were contacted by a scholar from Turkey (Ebru Sener) asking about the possibility of visiting on a short Erasmus-funded trip. The other department didn't follow up so I looked into it, Skyped with Ebru and quickly discovered we had a lot of common interests and a good amount of overlap in our skills - but also, crucially, areas where we could learn from each other.

This led to a research visit in 2015, during which time we wrote a paper on housing market search (now published in Cities). Ebru also taught a class for me that week, which was great. We then kept in touch via e-mail and social media, and it was Ebru who suggested I make the commuter dots go back to where they came from, which I thought was a nice touch. After the Huffington Post got in touch, I wrote this stuff up into a short piece for them (I have a paper on that coming out in future as well).

That summer, I experimented further on this area of research by looking at US commuting flows at the small area level. I published a short working paper on it, as well as a couple of blog pieces, plus the data I created from it. This then led to a brilliant US scholar (Garrett Nelson) taking a sub-set of the data and using a community partitioning algorithm to derive communities for one state (Massachusetts - see below). He blogged this and told me about it on Twitter. I then had a new idea, so I contacted him again to see if he wanted to collaborate on a project to do this for the whole United States. Once again, we Skyped, made a plan and then got to work (side note: we used cloud computing because the data had outgrown the desktop environment).

Now we've almost finished our paper and - we hope - this will lead to further collaboration on the topic.

Following this, the brilliant Mark Evans (from Michigan) got in touch to say he was planning something which built on some of the ideas I'd had in order to build an interactive US version of the animated dot map you see above. He built it, shared it with me and then it went kind of viral - with many news sites and local and regional outlets picking it up (Daily Mail, CityLab, WIRED, Boston.com etc.). I don't claim that I could have done this but it's nice to know I was, in part, a source of inspiration. Mark has also shared his method, which is great for further collaboration and other users. In the gif below you can see that Mark has coloured the dots based on where people come from, which is a nice innovation.

And the point? International collaboration and internationalisation seem to be talked about mostly in terms of travel and short and medium-term study visits, which can be a logistical nightmare (though they are often necessary). The kind of thing I've discussed above can be an additional means of 'internationalisation' that is both cost effective and time efficient. I haven't seen that much about the possibilities for formal international collaboration through social media. This might be because I don't pay attention.

Having said that you can do international collaboration without leaving the house, I am of course off to the US this week for a conference...

Friday, 27 May 2016

City Footprints

Earlier in the week I posted a map of London's 'economic hinterland' on Twitter because I've been working with commuting data and wanted to see what the economic footprint of London looks like. But, some people have been telling me that other cities exist - which is a fair point. Since I have the data and I'm just revising a paper on the topic I thought I'd look at a few others, but this time using lower level data - MSOAs instead of districts for the origins. The maps below show the proportion of people from an MSOA who commute to a given area. Only MSOAs with 1% or more going to a particular place are shown and the darker the colour the higher the percentage. These aren't exactly the same as travel to work areas but they do give a reasonable approximation of each city's economic footprint. As you can see, I did quite a few maps - click to enlarge, as ever.
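The calculation behind each map is simple. For every origin MSOA, divide the number of people commuting to the chosen place by that MSOA's total commuters, then drop anything under 1%. Below is a minimal Python sketch. It assumes the flows come as (origin, destination, count) rows and uses made-up area codes and counts rather than the real UK Data Service extract.

```python
from collections import defaultdict

def destination_shares(flows, destination, threshold=0.01):
    """Share of each origin's commuters who travel to `destination`.

    `flows` is an iterable of (origin, destination, count) rows (hypothetical
    layout). Origins below `threshold` (1% by default) are left out, as on
    the maps.
    """
    totals = defaultdict(int)
    to_dest = defaultdict(int)
    for origin, dest, count in flows:
        totals[origin] += count
        if dest == destination:
            to_dest[origin] += count
    return {
        o: to_dest[o] / totals[o]
        for o in totals
        if totals[o] and to_dest[o] / totals[o] >= threshold
    }

flows = [("M1", "Sheffield", 50), ("M1", "Leeds", 50), ("M2", "Sheffield", 1),
         ("M2", "Barnsley", 199), ("M3", "Leeds", 10)]
print(destination_shares(flows, "Sheffield"))  # {'M1': 0.5} - M2 is only 0.5%
```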

Birmingham has quite a pleasing concentric pattern

Bradford has quite a wide footprint

Bristol - quite a neat footprint

Cambridge is a little wider than I expected

Camden - one of a few London Boroughs I looked at

Cardiff - clearly the major focus in South Wales

Cheshire West - I wanted to see how far into Wales it goes

Derby seems relatively tightly packed

Leeds - the second largest local authority by population

Leicester - another quite tight East Midlands labour market area

Liverpool - extends into Wales and south to Cheshire, as you'd expect

Greater London - the light areas are only 1% of commuters, but still!

Greater London - same as above, but with some city labels

Manchester - only the 'underbounded' district here, but still dominant

Middlesbrough - an important northern labour market area

Milton Keynes - I think it has quite a wide footprint

Newcastle - clearly dominant in the North East

Norwich - a good example of a large regional labour market area

Nottingham is relatively symmetrical in labour market terms

Oxford - a relatively large footprint here

Plymouth is a major South West economic zone

Reading - I had expected this to be a little bigger

Sheffield is another major northern labour market area

Southampton - somewhat overlaps with London's fringe

Southwark - I wanted to see how this London Borough looked

Swansea - quite a wide footprint here

Tower Hamlets - interesting to see the dominance of eastern origins

Warrington - a strategic hub in the North West

City and Westminster - the ONS group these two together

York - quite a wide Yorkshire footprint

Should I patch all these together in one big animated gif? Of course I should.

Why isn't my city included? Good question. Sincere apologies.

Birmingham doesn't get enough love, and is so often overlooked, so I did a zoomed in version of MSOA flows into Birmingham, with a few place labels.

Click here to see the full size version

Notes: the maps give the impression that they are unclassified choropleths, but that is just for effect and because this is a quick map batch. The colour classification is the same in each (see below) and no areas with less than 1% commuting to a given place are shown. I used the UK Data Service Flow Data website to extract the data and QGIS 2.14 for the maps. Bear in mind that they are a bit rough and ready and only really for comparison. Also, each 'city' here refers to the local authority area, not the wider city-region. But I think it's interesting to compare places. You just need to bear in mind the spatial scale and relative size of the destination places. Birmingham and Leeds ought to have much larger footprints than (e.g.) Nottingham and Sheffield because they contain more jobs. Where are Scotland and Northern Ireland? These datasets come separately and are not part of the English and Welsh MSOA geography so are not mapped here.
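Using one fixed set of class breaks for every map is what makes the colours comparable between cities. If each map picked its own breaks (quantiles, say), the same shade could mean 3% on one map and 30% on another. Here's a minimal Python sketch of fixed-break classification. The break values are hypothetical, since the actual ones are in the legend image below.

```python
from bisect import bisect_right

# Hypothetical class breaks (share of an MSOA's commuters); not the real legend
BREAKS = [0.01, 0.05, 0.10, 0.25, 0.50]

def classify(share, breaks=BREAKS):
    """Return a colour class index for `share`.

    -1 means below the first break (the 1% cut-off), i.e. not mapped.
    """
    return bisect_right(breaks, share) - 1

print([classify(s) for s in (0.005, 0.01, 0.07, 0.6)])  # [-1, 0, 1, 4]
```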

I used the same classification scheme for all maps