Friday 31 March 2017

Visualising a lot

This post is about visualising 'a lot', something I've been thinking about while writing part of a book on GIS. The basic idea I'm exploring here is that when you have a dataset and want to convey, simply, that it contains 'a lot' (e.g. because the volume of data seems overwhelming), there are different ways to approach it. For example, if you had millions of points on a map, you could use a hex-binning technique to give a standardised per-area figure, or you could do some kind of visual aggregation or summary in chart form. Or, to convey 'a lot' as a kind of visual device, you could just do a visual data dump, as I did in this example. Today's 'lot' is the Gun Violence Archive's data for the United States in 2015, collated by the Archive and compiled and released by The Guardian. I opted for a fast animation to visualise 'a lot', which I have now updated with a running total (in yellow). Let's go straight to the gif, showing all gun homicides, one frame per day, for 2015 (and fast: 10 frames per second).

It's supposed to be overwhelming
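The hex-binning option mentioned at the start can be sketched briefly. This is a generic illustration, not anything used for the animations in this post, and it rests on a few assumptions: the points have already been projected to planar x/y coordinates (equal-sized hexagons in raw lat/long degrees would not cover equal ground areas), the grid uses pointy-top hexagons with `size` as the centre-to-corner distance, and the function names are hypothetical. Dividing each count by the constant hexagon area gives the standardised per-area figure.

```python
import math
from collections import Counter


def point_to_hex(x, y, size):
    """Map a projected point (x, y) to the axial (q, r) index of a pointy-top hexagon.

    `size` is the hexagon's centre-to-corner distance, in the same units as x and y.
    """
    # Fractional axial coordinates on a pointy-top hex grid.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Cube rounding: round all three cube coordinates, then recompute the one
    # with the largest rounding error so that q + r + s == 0 still holds.
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    # (If s had the largest error, q and r are already correct.)
    return rq, rr


def hexbin_density(points, size):
    """Count points per hexagon and divide by hexagon area (points per unit area)."""
    counts = Counter(point_to_hex(x, y, size) for x, y in points)
    area = 3 * math.sqrt(3) / 2 * size ** 2  # area of a regular hexagon
    return {cell: n / area for cell, n in counts.items()}
```

The cube-rounding step is the usual way to snap fractional hex coordinates to the nearest cell; rounding q and r independently can put points near cell edges into the wrong hexagon.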

When I first looked at the original dataset, which includes more than 13,000 gun deaths, my immediate thought was 'that's a lot'. All things are relative, of course, but in a global context it's hard to argue with this, particularly when you compare the data with other developed nations. The dataset has precise lat/long details for each incident, as well as the date and the number killed and injured. I summarised the data by day, plotted the locations as single points and then created 365 frames for this animated gif. It's not supposed to be readable at the micro scale of individual days or incidents, because I wanted to focus attention on the volume of data. A video version that you can pause or play more slowly is embedded below. I also made a slightly slower animated gif, at 5 frames per second, which is of course still somewhat overwhelming, shown below. Update: I have also added a cumulative version, prompted by Simon Rogers and with a bit of help from Ian Turton.
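The "one frame per day" step amounts to grouping incidents by date. A minimal sketch, assuming each incident is a dict with hypothetical keys `date` (a `datetime.date`), `lat`, `lon` and `killed`; the frames themselves were made in QGIS, so this only shows the grouping, not the rendering. Every calendar day gets an entry, even one with no recorded incidents, so the frame count is always 365 (366 in a leap year).

```python
from collections import defaultdict
from datetime import date, timedelta


def summarise_by_day(incidents, year=2015):
    """Group incidents into one (day, points, killed) tuple per calendar day.

    Incident dicts use hypothetical keys: 'date', 'lat', 'lon', 'killed'.
    Incidents dated outside `year` are grouped but not returned.
    """
    points = defaultdict(list)
    killed = defaultdict(int)
    for inc in incidents:
        points[inc["date"]].append((inc["lon"], inc["lat"]))  # x, y order for plotting
        killed[inc["date"]] += inc["killed"]
    start = date(year, 1, 1)
    n_days = (date(year + 1, 1, 1) - start).days  # 365 for 2015
    days = [start + timedelta(days=i) for i in range(n_days)]
    # Missing days fall back to an empty point list and zero killed.
    return [(d, points[d], killed[d]) for d in days]
```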

This is the same as above, but a little slower (73 seconds in total)
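The running times follow directly from the frame count: 365 frames at 5 frames per second is 365 / 5 = 73 seconds, and at 10 frames per second it is 36.5 seconds. The GIF89a format stores each frame's delay in whole hundredths of a second, so a frame rate only maps exactly onto a GIF when 100 divided by it is a whole number. The helper below (a hypothetical name, for illustration) rounds to the nearest storable delay and reports the resulting total.

```python
def gif_timing(n_frames, fps):
    """Return (per-frame delay in centiseconds, total running time in seconds).

    GIF stores frame delays as whole hundredths of a second, so the delay
    is rounded; e.g. 3 fps becomes 33 cs per frame (about 3.03 fps).
    """
    delay_cs = round(100 / fps)
    total_seconds = n_frames * delay_cs / 100
    return delay_cs, total_seconds


print(gif_timing(365, 10))  # (10, 36.5)
print(gif_timing(365, 5))   # (20, 73.0)
```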

In this version, it's cumulative
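Both the yellow running total and the cumulative map come from the same idea: each frame carries forward everything before it. A minimal sketch using `itertools.accumulate`, assuming the per-day data is a date-ordered list of `(points, killed)` pairs; the function name is hypothetical, and in a GIS the equivalent would more likely be a date filter such as "all incidents on or before this day".

```python
from itertools import accumulate


def cumulative_frames(daily):
    """Turn per-day data into cumulative frames.

    `daily` is a list of (points, killed) pairs in date order. For each day
    this yields every point seen so far plus the running total killed, so
    frame N shows the whole year up to and including day N.
    """
    seen = []
    running_totals = accumulate(killed for _, killed in daily)
    for (points, _), total in zip(daily, running_totals):
        seen.extend(points)
        yield list(seen), total  # copy, so earlier frames are not changed later
```

Copying the point list per frame keeps each frame independent; for around 13,000 points over 365 frames that is a few million tuples in total, which is manageable but worth knowing.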

The individual frames were created in QGIS, and the maps for the days with the highest and lowest numbers killed are shown below. The largest number of gun deaths in a single day in 2015 was on July 5th and the lowest was on May 22nd. The mean number killed per incident was 1.12 and the mean per day in 2015 was 35.8 (for a total of 13,067).

The peak month overall in 2015 was also July

This was the only day that the number killed was below 20
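The headline figures above (peak and lowest day, peak month, the single day below 20, and the two means) are simple aggregations over the daily totals. A minimal sketch, assuming the totals are already in a dict keyed by `datetime.date`; `summary_stats` and its `threshold` parameter are hypothetical names. As a check on the arithmetic, 13,067 deaths over 365 days is exactly 35.8 per day.

```python
from collections import Counter
from datetime import date


def summary_stats(killed_by_day, n_incidents, threshold=20):
    """Headline figures from a mapping of date -> number killed that day.

    On ties, max() and min() return the first date encountered.
    """
    total = sum(killed_by_day.values())
    by_month = Counter()
    for day, killed in killed_by_day.items():
        by_month[day.month] += killed
    return {
        "total": total,
        "peak_day": max(killed_by_day, key=killed_by_day.get),
        "lowest_day": min(killed_by_day, key=killed_by_day.get),
        "peak_month": by_month.most_common(1)[0][0],
        "days_below_threshold": sum(1 for k in killed_by_day.values() if k < threshold),
        "mean_per_day": total / len(killed_by_day),
        "mean_per_incident": total / n_incidents,
    }


# Toy data (not the real 2015 figures), just to show the shape of the output.
toy = {date(2015, 3, 1): 15, date(2015, 8, 2): 60, date(2015, 8, 3): 40, date(2015, 1, 1): 30}
stats = summary_stats(toy, n_incidents=100)
print(stats["peak_day"], stats["peak_month"], stats["days_below_threshold"])  # 2015-08-02 8 1
```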

There are just over 11,600 incidents recorded in the database, but the whole picture is quite difficult to get your head around at a national scale. The Guardian has already published some great localised mapping of this data, if you're interested. With this example I was just experimenting with ways to convey the idea of 'a lot' quickly and simply. A fast animation using thousands of data points is one way of doing this. It's designed with repetition and replay in mind, and the point is not to highlight individual data points or days, but to create a kind of cognitive mash from which you can still take away some detail (e.g. most days have between 20 and 50 gun deaths) and also see that the locations do, as you'd expect, mirror underlying population patterns. But only to a point: if you look closely you can see that some places are over- or under-represented.
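One simple way to put a number on 'over- or under-represented' is a location quotient: an area's share of the deaths divided by its share of the population. A value above 1 means more deaths than the population alone would suggest; below 1, fewer. This is a sketch, not something computed for this post; it assumes you have population figures for the same areas (e.g. states or counties), and the area names and counts in the usage line are hypothetical.

```python
def location_quotients(deaths, population):
    """Ratio of each area's share of deaths to its share of population.

    `deaths` and `population` map area name -> count. Areas missing from
    `deaths` are treated as having zero deaths. Assumes at least one death.
    """
    total_deaths = sum(deaths.get(area, 0) for area in population)
    total_pop = sum(population.values())
    return {
        area: (deaths.get(area, 0) / total_deaths) / (pop / total_pop)
        for area, pop in population.items()
    }


# Hypothetical areas: A has 25% of the population but 75% of the deaths.
print(location_quotients({"A": 30, "B": 10}, {"A": 100, "B": 300})["A"])  # 3.0
```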

There are many ways to visualise this kind of data powerfully, including much more nuanced interactive methods of the kind produced by FiveThirtyEight. My approach here is deliberately non-interactive, and of course it is less visually appealing too. But then I also think that making something beautiful out of something so ugly is not what I want to be doing. All I wanted was to highlight the volume of the data in a way that anyone could understand, and by using one frame per day and plotting the location points I think I'm just about there.

If you're interested in looking at the frame for any given day, take a look at the Google Drive folder below. The date appears in the top left of each image and also in its file name.

See all 365 individual days here

Notes: in the Guardian's original csv, I found that the date formats were a bit messed up, so I fixed this and added some new, corrected date fields to the right of the spreadsheet. I also added individual columns for day and month. I'm not a gun campaigner; this was just an interesting dataset for me to use. If you have any questions, feel free to get in touch. This data covers homicides only, not suicides. I updated this post on 5 April 2017 to include cumulative totals in the maps, and again on 10 April 2017 to include a cumulative version. It looks a bit ugly at the end, but then it's a pretty 'ugly' dataset. I thought this was another interesting way of displaying the data.
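The post doesn't say exactly how the dates in the csv were broken, so the formats tried below are assumptions. The sketch shows the general approach: parse each date against a list of candidate formats, then append corrected date, day and month columns to the right of the existing ones. The column names (`date`, `date_fixed`, `day`, `month`) and function names are hypothetical. Day-first and month-first formats are ambiguous for dates like 03/04/2015; whichever format comes first in the list wins, so the order has to match the data.

```python
import csv
import io
from datetime import datetime

# Hypothetical candidate formats, tried in order; adjust to the actual data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d/%m/%Y", "%B %d, %Y")


def parse_date(text):
    """Return a datetime.date for the first format in DATE_FORMATS that matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def fix_dates(in_file, out_file, date_column="date"):
    """Copy a csv, appending corrected date, day and month columns on the right."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames + ["date_fixed", "day", "month"])
    writer.writeheader()
    for row in reader:
        d = parse_date(row[date_column])
        row["date_fixed"] = d.isoformat()
        row["day"] = d.day
        row["month"] = d.month
        writer.writerow(row)
```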