Sunday, 28 November 2021

World Population by Latitude

If you search online for 'world population by latitude' you'll quickly find quite a few results, including the great analysis by Bill Rankin on his Radical Cartography blog from 2008, which uses population from 2000, and also includes population by longitude. There's also a nice interactive version on Engaging Data, plus similar things on my One Degree of Population piece on here a couple of years ago and my global population density spike maps. In this post I look at world population by single degree of latitude using data from 2020. There are maps and stats below, but let's begin by looking at what appears to be some kind of alien eyeball but is in fact world population by single degree of latitude, where redder areas = more people at that latitude and bluer areas = fewer people. I used 2020 WorldPop data to do the calculations and all the maps were done in QGIS, as usual.

Weird alien eyeball? Or population map?

It's better viewed as a flat map, obviously, so I've posted that below. I've labelled quite a few places across the world. Of course, some highly populated latitudes include places with very few people (notably the Sahara Desert), but you get the idea here: redder = more people at a given latitude.


Note the numbers on the left - including population data

If you view this image in full size you should be able to read the population numbers for each single degree of latitude - I've posted a zoomed in extract of this below. The most highly populated single degrees of latitude, according to my analysis? Here are the top 10 that I get:

  1. 25-26° North: 278.6 million people
  2. 26-27° North: 271.7m
  3. 23-24° North: 244.5m
  4. 24-25° North: 237.4m
  5. 22-23° North: 235.3m
  6. 30-31° North: 234.8m
  7. 31-32° North: 226.2m
  8. 34-35° North: 215.8m
  9. 35-36° North: 214.6m
  10. 27-28° North: 198.3m
And then down in 28th place we have the first entry in the southern hemisphere with 6-7° South having a population of 117.5 million people, according to my calculations. 
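The banding behind a list like this can be sketched in a few lines of Python. This is only a minimal illustration, not the QGIS workflow used for the maps: the `cells` values are made up, and a real run would read cell populations from a raster such as WorldPop.

```python
import math
from collections import defaultdict

# Hypothetical sample cells: (latitude of cell centre, people in the cell).
# These numbers are invented purely to show the banding logic.
cells = [
    (25.4, 1200.0),
    (25.9, 800.0),
    (26.1, 500.0),
    (-6.5, 300.0),
    (-6.2, 700.0),
]

def population_by_band(cells):
    """Sum population into one-degree bands, keyed by each band's southern edge."""
    totals = defaultdict(float)
    for lat, people in cells:
        band = math.floor(lat)  # 25.4 -> 25 (the 25-26 band); -6.5 -> -7
        totals[band] += people
    return dict(totals)

def band_label(band):
    """Label a band in the style of the list above, e.g. '25-26° North'."""
    if band >= 0:
        return f"{band}-{band + 1}° North"
    return f"{-band - 1}-{-band}° South"

totals = population_by_band(cells)
# Print the bands from most to least populated.
for band, people in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(band_label(band), people)
```

Using the floor of the latitude keeps southern bands consistent, so -6.5 and -6.2 both land in the 6-7° South band.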


These are numbers I calculated myself

It's difficult to do anything with this colour scale these days without bringing to mind Ed Hawkins' warming stripes, but I think it's useful to use the red/blue colour ramp here because the most highly populated places are generally the warmer ones and the colder latitudes on the whole have fewer people. But of course there's another factor here, and that is about where all the land is. So first let's look at land vs population and then we can look at a density version of the above map - i.e. population density by latitude that presents the stripes based on the population density of land at all the different latitudes.


This is what I get for land by latitude


And population by latitude - people don't like living in the sea

You can do a little crossfaded gif to get the comparison between these two different elements, so that's what I did below. You can see it in the still images above, and in the gif below I have highlighted the single degree of latitude with most people and most land. 


Finally, we have a population/land by latitude gif

Let's hit pause on the gif, mid-fade, and then add some labels so we can make a bit more sense of what's going on. That's what I've done below - click to enlarge in order to read the labels.



Here's another globe-style view of the population data but this time from a different perspective.


Some surprises here perhaps

Okay, so the next three images are similar to the 'world population by latitude' image above but this time they are actually population density by latitude. That is, the redder latitudes are the most densely populated. I did this so that it only takes into account land, which is kind of where people like to live for some reason. You'll note in particular here there is a dark red stripe in the southern hemisphere that cuts across Santiago, Buenos Aires, Cape Town and Sydney, among other places. Note that I experimented by dimming and then removing the latitude colours over the sea in the second and third images because it kind of makes sense to do that.
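For anyone who wants to see the arithmetic behind a density-by-latitude figure, here is a minimal Python sketch. It assumes a spherical Earth and takes the land share of each band as an input (`land_fraction` is a hypothetical parameter, as are the example numbers), so it is an approximation rather than the method used for the maps.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; treating the Earth as a sphere is a simplification

def band_area_km2(lat_south, lat_north):
    """Surface area (land plus sea) between two latitudes on a sphere."""
    s = math.sin(math.radians(lat_south))
    n = math.sin(math.radians(lat_north))
    return 2 * math.pi * EARTH_RADIUS_KM ** 2 * (n - s)

def land_density(population, lat_south, lat_north, land_fraction):
    """People per square km of land in a band; land_fraction is the share of the band that is land."""
    land_area = band_area_km2(lat_south, lat_north) * land_fraction
    return population / land_area

# One-degree bands shrink towards the poles, which is why area matters here.
print(round(band_area_km2(0, 1)))
print(round(band_area_km2(60, 61)))

# Hypothetical inputs, only to show the calculation.
print(land_density(1_000_000, 0, 1, land_fraction=0.25))
```

The point of the sketch is that a band near the equator covers roughly twice the area of one at 60 degrees, so raw population and density can tell different stories.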


Density makes a bit more sense

Dimmed the sea because people don't live there

Removed the sea latitudes, but I like this less


Other odds and ends

Once I'd done this I experimented with different map projections and views, hence the weird population latitude eyeball at the top of the piece. I also experimented with views from above and from below, so you can see them below in the first two images. The first one in particular looks like a bloodshot eyeball to me.


Where do most people live? Just eyeball it

Antarctica is pretty big

What else? Well, I did a few other versions, including just coloured lines of latitude, then adding only land, adding a chart and that kind of thing so I've posted them below too. These are more experimental versions - some I like more than others but the one with just colours and land I think is quite interesting.


Population stripes, with land

Just the population stripes

I took away only the land here

In this one I just added a chart of population by latitude

This is the original, as above, but without a white border

Does the final result of all this depend upon which dataset you use? Do you get a different result by using, say, GHSL data vs WorldPop data? Or what about NASA's GPWv4, or even another source? Well, probably a bit, but the difference between GHSL and WorldPop wasn't huge. WorldPop is also available for 2020 and GHSL isn't, so I used WorldPop. But you can see the difference between GHSL and WorldPop below anyway.


GHSL

WorldPop

GHSL WorldPop GHSL WorldPop GHSL WorldP

This all basically makes sense and is not particularly surprising but then again it's nice to be able to put some numbers to all this and make some maps of it. Plus it's interesting for training data and for working on map methods and techniques in QGIS.

People live here

Here's a screenshot of what this looked like in QGIS before I exported some of the final images. The font is Righteous, by the way.


Print Layout on left, map view on right


So there we have it. 

Thursday, 11 November 2021

A few QGIS geometry, label and style tips

I haven't done a QGIS how-to blog post in a while, so it's time for another one because I'm working on a lot of training material right now. The end result is just a plaything, so it's more about the methods used. 'Learning by play' is a key concept in early years learning for a very good reason and I'm a big proponent of learning things this way, no matter how old we are. Okay, so what are we going to do? Well, see below for the end result and then we'll work it up step by step. This is for people already fairly familiar with QGIS, but you can probably follow it even if you're not. 

A stylised world cities and population map

The point of this exercise is mostly to demonstrate some methods, but first you need to grab some data from Natural Earth - just two layers are needed, plus one we'll make ourselves.

  1. Natural Earth populated places (simple version).
  2. Natural Earth countries (without boundary lakes - this means we can see the Great Lakes on the map, and so on).
  3. A 10 degree grid layer, which we'll make in QGIS - it's very easy.

Step 1 - add the two layers

Add layers 1 and 2 above to QGIS, and make sure the places layer is on top - i.e. make sure the dots aren't under the land. 

Your colours may be different, it's not a problem


Step 2 - create a grid layer

Make a 10 degree global lat/long grid by going via Vector > Research Tools > Create Grid... and then entering the settings you see below. Make sure you create a polygon grid, as shown in my screenshot and then once it appears in QGIS, drag it underneath the other layers in the Layers panel. You'll notice that the grid extent in the screenshot below goes from -180 to 180 in longitude and -90 to 90 in latitude. It doesn't matter if you see decimal places in the box, just be sure to use these numbers otherwise your grid won't cover the whole world.

I chose 10 degree grid spacing, but feel free to use what you like
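If it helps to see what the grid tool is producing, here is a minimal Python sketch of the same idea outside QGIS. The function name and the tuple layout are my own assumptions for illustration; QGIS creates real polygon features rather than tuples.

```python
def make_grid(spacing=10, x_min=-180, x_max=180, y_min=-90, y_max=90):
    """Return grid cells as (west, south, east, north) tuples covering the extent."""
    cells = []
    for south in range(y_min, y_max, spacing):
        for west in range(x_min, x_max, spacing):
            cells.append((west, south, west + spacing, south + spacing))
    return cells

grid = make_grid()
print(len(grid))          # 36 columns x 18 rows of cells
print(grid[0], grid[-1])  # south-west corner cell and north-east corner cell
```

With the full -180 to 180 and -90 to 90 extent, a 10 degree spacing gives 648 cells; a smaller extent would leave part of the world uncovered.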

Step 3 - duplicate the layers

Then we're going to want to duplicate the layers - as shown below - and then change the styles. We will have three copies of the places layer, two copies of the countries layer and two copies of the grid layer. I'll share the colour and style information below as well. Don't know how to duplicate a layer? Just right-click a layer on the left and hit Duplicate Layer. They won't look like what you see below (yet) with the colours and filters on them but we'll do that next.

Only three data layers here, duplicated

Step 4 - style the point layers

There are three copies of the points layer. One is filtered to only show major world cities, and is represented by a square marker (2.0 size, 0.4 stroke width) with an upper case label and a semi-transparent background, with rounded corners. One is represented as spikes, based on population. And one is represented as fake shadows for the spikes. The spike and shadow layers use the QGIS Geometry Generator to convert the original point data to line data. See below for a screenshot of each layer's symbology.

The label settings for the point layer (font size is 10)

The main symbol settings for the point layer

Label background settings: #333333 is the label background colour and I used 45.2% opacity (in the Opacity slider in the Background options, rather than the Opacity slider in the colour options, but it probably doesn't matter which way you do it). I like the slightly rounded corners, so that's why the 2.0 values are in the Radius X,Y boxes.

To display only the cities you want, an easy way is to right-click a layer and use a filter (right-click > Filter...). Here's what I used below to filter it, but you can add any cities you want of course, so long as they are in the dataset. 

Note that I used two different rules here - the second one is a list of city names that are also recorded as having a 1 in the "worldcity" column in the attribute table for the layer, but then I also wanted to add a few more that weren't classified this way, so I added a few more using the first "name" IN rule. Note that I have sometimes put an x or a city name with XXX after it when I decide to hide a city that I previously wanted to be on the map. This is just to remind me of what I was doing. Confused? Then either take some time to understand it by looking closely at the text, OR, just copy and paste it in and then play around with adding and deleting places. 


"name" IN ('Karachi','Dakar','Kinshasa','Tehran','x') OR
"name" IN ('Shanghai','New York','Madrid','Beijing','Tokyo','Paris','Moscow','Auckland',
'Sydney','Brasilia','Mexico City','Los Angeles','São Paulo','Buenos AiresXXX','Lagos','Cape Town',
'Cairo','Jakarta','TorontoXXX','Istanbul','Manila','Tunis','Nairobi','Dakar','Melbourne','Lima','Bogota',
'Santiago','Berlin','Mecca','New Delhi','London','Seoul','Rome','Mumbai','Hong Kong','Singapore'
) AND "worldcity" = 1

 

The bits of text in the double quotes are columns from the layer's attribute table
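The way the two rules in the filter combine can be sketched in Python. This assumes the usual SQL-style precedence, where AND binds more tightly than OR, and uses shortened, hypothetical versions of the two name lists.

```python
# Shortened, hypothetical versions of the two name lists in the filter above.
EXTRA_CITIES = {"Karachi", "Dakar", "Kinshasa", "Tehran"}
WORLD_CITY_NAMES = {"Shanghai", "New York", "Tokyo", "London", "Cairo"}

def keep_city(name, worldcity):
    """Mirror `A OR B AND C`, which is read as `A OR (B AND C)`."""
    return name in EXTRA_CITIES or (name in WORLD_CITY_NAMES and worldcity == 1)

print(keep_city("Karachi", 0))     # kept by the first list, whatever worldcity says
print(keep_city("Tokyo", 1))       # kept: in the second list and flagged as a world city
print(keep_city("Tokyo", 0))       # dropped: second list, but the flag is not 1
print(keep_city("TorontoXXX", 1))  # dropped: the XXX suffix means no city has this name
```

This is also why the x and XXX trick works for hiding places: the altered name simply never matches a row in the attribute table.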

Okay, but how do you turn a point layer into spikes, or fake shadows? You use the Geometry Generator options in QGIS to do this, as shown below.

Here's the text you need to input for the spikes (below), which are based on the "pop_max" variable in our Natural Earth populated places shapefile. This makes a line from the points and the length is set to the value of the population, divided by 1 million. Why? Well, because the map units here are in degrees, and because the Tokyo metro area has about 35 million people, that means the biggest spike will be set to 35 degrees - if your dataset was in metres this would be too small and you wouldn't see any spike! 

If you're unsure exactly where to find the Geometry Generator option, just right-click a layer, go to Properties... > Symbology and then where it says Marker towards the top you should see Simple Marker below that. If you then select Simple Marker look below that to see Symbol layer type - which should say Simple Marker right now, and then change it to Geometry Generator. Then you'll be able to replicate the screenshot below and add the text you see in the bullet point.

  • make_line($geometry,make_point(x($geometry),y($geometry)+ ("pop_max"  /1000000)))

 

This is for the vertical spike layer

Symbology for the spikes: 0.15 line width, #333333 colour. 
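To make the spike arithmetic concrete, here is a minimal Python sketch of what the make_line expression is doing with the coordinates. The function name, the `people_per_degree` parameter and the example point are all made up for illustration.

```python
def spike(x, y, pop_max, people_per_degree=1_000_000):
    """Return the two end points of a vertical line whose length is pop_max scaled to degrees."""
    return (x, y), (x, y + pop_max / people_per_degree)

# Hypothetical place at 10 degrees east, 20 degrees north with 5 million people.
start, end = spike(10.0, 20.0, 5_000_000)
print(start, end)  # the spike rises 5 degrees straight up from the point
```

Dividing by one million is just a scaling choice for a layer in degrees; with a layer in metres you would need a much smaller divisor (or a multiplier) to get a visible line.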

I decided I'd quite like to have some fake shadows for the spikes on my map as well, so I used a similar approach to generate these - I just added an offset angle. Note that for the symbology on the shadows I made them 0.35 thick (compared to 0.15 for the actual spikes) and also 10% opacity, so they are quite faint and not too visually dominant. Here's the Geometry Generator text I used to create these, and note the screenshot of it below as well.

  • rotate( make_line($geometry,make_point(x($geometry),y($geometry)+ ( "pop_max"  /1000000) )),110, start_point( $geometry))

This is the same as above, but it has rotate at the start because I want it to be rotated to a certain angle (in this case 110 degrees) but it also has start_point because I want it to be rotated from the point itself rather than another axis which would mean the shadow wasn't cast from the base of the spike, as it is here.

This is the fake shadow layer

Symbology for the shadows: 0.35 line width, #333333 colour, 10% opacity.
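Here is a minimal Python sketch of the geometry behind the shadow, rotating the top of a spike around its base. It assumes the angle is measured clockwise, which is my understanding of how the QGIS rotate function treats it; the function name and example coordinates are hypothetical.

```python
import math

def rotate_clockwise(point, centre, degrees):
    """Rotate point around centre by a clockwise angle, with y pointing up (north)."""
    theta = math.radians(degrees)
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    return (
        centre[0] + dx * math.cos(theta) + dy * math.sin(theta),
        centre[1] - dx * math.sin(theta) + dy * math.cos(theta),
    )

# Hypothetical spike: base at (10, 20), top 5 degrees due north of it.
base = (10.0, 20.0)
spike_end = (10.0, 25.0)

# Rotating around the base keeps the shadow attached to the foot of the spike.
shadow_end = rotate_clockwise(spike_end, base, 110)
print(shadow_end)  # lands to the east and slightly south of the base
```

Rotating around any other centre would move the whole line, which is why the expression passes the point itself as the third argument.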

Obviously, these are fake spikes and you may not even want them, though that's not the point here. The point is just to demonstrate some of the capabilities with QGIS in relation to turning a point into a line, or any of the other Geometry Generator things you can do (e.g. simplify polygons, buffer, etc). Also note that this spike/shadow hack is not going to work if you change to a different kind of projection - e.g. any one where the lines of longitude are not vertical - see below for a bit of Winkel Tripel!

Thanks to Oswald Winkel for his projection

Okay, that's the points layers - these were the trickiest ones so now let's look at the countries layers.


Step 5 - style the countries layers

The use case I'm imagining here is where someone wants to produce a locator map, with some cities named and then a country highlighted. The spikes I added above are just a way to demonstrate how you can use the Geometry Generator options to do interesting things. For the countries, we have two layers, one of which is filtered to only show China (and this is China based on the definition of China in the Natural Earth dataset).

  • The top layer is just filtered using "NAME" = 'China' (remember, right-click the layer, then Filter...) then a fill colour of #ef452f with opacity set to 38%. Stroke colour is #c93a27, 0.46 width. Nothing fancy.
  • The lower countries layer is just colour #e6daa1 for both fill and stroke, with stroke width 0.26. The only extra thing here is the slight drop shadow I used, to continue with the fake 3D effect. This is done by using the Draw effects options, as in the screenshots below.
You can do all sorts of great things with Draw effects

You may prefer other colours but this is what I used


And that's how I styled the countries layers. Nothing very complicated at all, but the drop shadow just lifts the map off the page a little and adds to the 3D effect, for a bit of fun.


Step 6 - style the grid layer

I decided to make the grid serve as a kind of canvas, as well as a geographical reference point, and I wanted to give it a drop shadow too. You can do this using only one layer with Draw effects but I found it works better using the drop shadow on a separate layer, so here's what I did.

  • The top grid layer is #70c8df fill colour and #ffffff (pure white) line of width 0.1. This gets us a nice blue grid with white map markers every 10 degrees of lat/long on the map.
  • For the grid shadow layer, that was done via the Draw effects options and all I had turned on here was the drop shadow itself, as you can see from the screenshot below. The drop shadow colour is #000000 (black) with 75% opacity and you can see how this lifts it off the page for another bit of 3D effect.
Again, this isn't essential, but it can be quite nice to do


Okay, great - we should now have a nice grid layer, some countries, China highlighted, city spikes and shadows, plus a number of city labels. Just a few more things and we're ready to export the final map.


Step 7 - background and QGIS logo

I wanted the map background to be darker than the map canvas so I went to Project > Properties and on the General tab on the left I set the Background color to #5facbf. That's just a darker shade of blue.

To add the logo, which is an svg file, I went to View > Decorations > Image... and then added the logo that I downloaded from the QGIS website, although you can also just paste in the path or url into the Image path box (see below). You can add other image types but you'll probably find that svg gives the best output quality in your final map.

Add a logo to your map layout

After that, all I did was make sure I'd zoomed out enough so that there was a bit of dark blue map canvas surrounding my whole world map. See below for a closer look at that.


Step 8 - save your map as a high quality png file

We're not going to bother with the QGIS Print Layout here at all. Sometimes we don't need to and one of the great things that the QGIS team has done in recent years (among many, many things!) is add more options for exporting high quality images directly from the main map view. So, to export the final image you see below, I just went to Project > Import/Export > Export Map to Image... and then changed the Resolution to 300dpi and unticked the Append georeference information (embedded or via world file) box because I don't need that file, but it's not a problem if you keep it ticked, it just generates a small extra file. Note you could just as easily copy and paste the map using the Copy to Clipboard button.

And there we have it - one version of our world


Final notes

Obviously I haven't included every single little click, but there should be more than enough detail to replicate these methods if you have a basic familiarity with QGIS already. If you are left scratching your head though, please feel free to get in touch.

Don't like the cities I used? That's fine, they are not particularly well thought through - the idea here is more about showing how to pick your own ones.

Don't like the boundaries I used? Again, this is for demonstration purposes but of course that is something we still need to be aware of. 

Don't like the spikes? That's fine, they may be too much for this map but they can be useful in other situations and it's more about understanding what the QGIS Geometry Generator tools can do.


Friday, 29 October 2021

Food for thought

This is a long read, but it's about maps and food and data so hopefully there will be something you can enjoy in here. It's based on a very interesting open dataset from the UK's Food Standards Agency, which contains the location of more than half a million food establishments. The post is split into sections to make it more digestible, and part of the inspiration here is Tom Forth's great Greggs:Pret ratio analysis from 2014 onwards. 

It's all good fun, of course, but there are serious potential uses of this kind of open data, such as looking at the impact of Covid on the survival of food establishments, understanding what is actually in big datasets like this, real-world error checking and - not least - understanding the hygiene ratings of the places we buy our food from! But I'm mainly going to write about interesting/fun things here, as well as sharing lots of maps, with a few notes about data science along the way. Talking of which, here's the first map - showing some of the 300+ Prets in London (below). Just want the data? Get it here

Surely there's room for a few more


In brief

As part of my company's mission to share data and help people understand the world a bit better through maps and geodata, I like to mash together datasets that are really useful and/or interesting but maybe not that easy to use, either because of the format they come in, the separate files they are made up of, or any number of other reasons. The Food Standards Agency's Food Hygiene Ratings Scheme (FHRS) dataset fits the bill here because it's REALLY interesting but also (as far as I'm aware) currently only available as hundreds of separate xml files for each individual local authority across the UK. This is great in many ways, but I do like to go on about how a simple, clean csv is so valuable - CLEAN + SIMPLE = VALUABLE! That's why I wish more organisations, in addition to whatever else they provide, would also include a clean, simple csv as standard. You can find my csv of the Food Standards Agency's Food Hygiene Ratings Scheme (FHRS) dataset, plus geodata versions, on this page. Anyway, back to the important stuff: here's a Greggs map of Belfast.

Is it time for a sausage roll yet?

You know what, let's zoom out a bit for a Greggs map of the UK too. Big icons, messy map, but you get the idea (which is that the people of Inverness are Greggs-bereft, although maybe this is due to the absolute dominance of the legendary Harry Gow, who knows).

When are Greggs opening in Ullapool?

Actually, let's have one more Greggs map, from the home of Greggs. 

You're never more than 39 seconds from a Greggs, or something


The data - food establishments in the UK

Okay, so we've had a few maps but now it's time to say a bit more about the data. And remember, if you want to explore it yourself, grab the csv or GeoPackages I made and then open them in your software of choice (e.g. Excel, R, Tableau, whatever) and take a deep dive.

My download from 16 October 2021 has 597,037 rows in it, but not every food establishment had a latitude and longitude associated with it, though 80% did. I managed to geocode just over 84% of food establishments across the UK (including fixing all the Stirling ones, which had no lat/long and the postcode was in the address field), so we have a reasonably good dataset of 502,341 rows of data. The full csv, with all rows (including those without location data), is the one called all-FHRS-GB-16-oct-2021-extract.csv in the web folder. Each row relates to a single establishment (e.g. a Pret, a Greggs, a Costa, a Starbucks, a Nandos, a Buttylicious, a Codfather, a Taj Mahal) and for each establishment we have the following columns in the dataset:

  • BusinessName (e.g. Greggs)
  • BusinessType (e.g. Pub/bar/nightclub)
  • LocalAuthorityName (based on the set of 374 UK local authorities as of Oct 2021)
  • LocalAuthorityCode (this is not the standard ONS code for each local authority, but a different numeric code for each area)
  • RatingValue (a rating of the hygiene level of each establishment, with 5 being the best, apart from in Scotland where it's a Pass/Improvement Required kind of rating)
  • AddressLine1, 2, 3, 4 (full address of each establishment)
  • PostCode (full unit postcode for each establishment)
  • Longitude and Latitude (these are included for about 80% of places)
  • There are a few more technical fields (e.g. establishment ID) but these are the important ones
In addition, I've also added in a TTWA (travel-to-work-area) code and name for each point so that we know which TTWA a food establishment is in, as well as its local authority. I thought this would be useful for when we want to analyse the location of places that commuters typically eat at, rather than just residents (like Pret, for example). On the web page linked to above, I've also included a local authority summary GeoPackage so you can see how many establishments are in each local authority, as well as by type. I've added in population and employment counts from 2019 here as well so you can per-capita it.
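The percentages and per-capita figures mentioned above are simple ratios, sketched here in Python. The row counts are the ones quoted in this post; the establishment and population figures in the second call are hypothetical, and both function names are my own.

```python
def share(part, whole):
    """Percentage that part makes up of whole."""
    return 100 * part / whole

def per_10k(establishments, population):
    """Establishments per 10,000 residents."""
    return 10_000 * establishments / population

# Geocoded rows as a share of all rows in the 16 October 2021 download.
print(round(share(502_341, 597_037), 1))

# Hypothetical local authority: 50 takeaways and 100,000 residents.
print(per_10k(50, 100_000))
```

Swapping the population count for the employment count gives a per-worker rate instead, which may suit commuter-focused questions better.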

So, when you map it, you can fairly easily produce something like a quick, messy McDonald's map of Liverpool and Manchester and surrounding areas, like the one below.

MMMM

Or, you could produce a quick Starbucks map of the centre of Edinburgh, if you were so inclined.

Time for a coffee

I should add at this stage that the dataset is often quite messy, in terms of the actual business names. For example, if you want to find all the Greggs, you can't simply filter it using 'greggs' - you have to be aware of the many different ways that a Greggs location is named. See below for a screenshot of this from my QGIS attribute table. I'll say more on this issue below, using Costa Coffee as an example, just so that you don't get your Costas mixed up with your Pentecostals.

Greggs? Yes
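One way to cope with this kind of name mess is pattern matching with word boundaries, sketched below in Python. The business names in the list are hypothetical examples and the patterns are only a starting point, so the matches would still need checking by eye.

```python
import re

# \b is a word boundary, so 'costa' does not match inside 'Pentecostal'.
COSTA = re.compile(r"\bcosta\b", re.IGNORECASE)
# Matches 'Greggs' and "Gregg's" in any case, but not a bare 'Gregg'.
GREGGS = re.compile(r"\bgregg'?s\b", re.IGNORECASE)

# Hypothetical BusinessName values.
names = [
    "Costa Coffee",
    "COSTA",
    "Elim Pentecostal Church",
    "Greggs Plc",
    "GREGGS OF GOSFORTH",
    "Gregg's",
    "Gregg Smith Butchers",
]

costas = [n for n in names if COSTA.search(n)]
greggs = [n for n in names if GREGGS.search(n)]
print(costas)
print(greggs)
```

A pattern like this will still pick up unrelated businesses that happen to contain the word, so it narrows the list rather than finishing the job.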


I had a little look at the food establishments by type and then explored this in relation to the % by type in each travel-to-work-area (TTWA) across the UK, so see below for a couple of examples. First map is takeaways/sandwich shops as a % of all food establishments in a TTWA and the second map is manufacturers/packers as a % of all food establishments in a TTWA.

Finally, we know that office workers eat sandwiches

 
Is this just a 'here be livestock' geography map?
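The percentage-by-type calculation behind maps like these can be sketched in Python as follows. The rows are hypothetical, the type labels are illustrative, and the function name is my own; the real thing would run over the TTWA and BusinessType columns of the full dataset.

```python
from collections import Counter

# Hypothetical rows: (TTWA name, BusinessType).
rows = [
    ("London", "Takeaway/sandwich shop"),
    ("London", "Takeaway/sandwich shop"),
    ("London", "Pub/bar/nightclub"),
    ("London", "Restaurant/Cafe/Canteen"),
    ("Penrith", "Manufacturers/packers"),
    ("Penrith", "Pub/bar/nightclub"),
]

def type_share_by_area(rows, business_type):
    """Percentage of each area's establishments that are of the given type."""
    totals = Counter(area for area, _ in rows)
    matches = Counter(area for area, kind in rows if kind == business_type)
    return {area: 100 * matches[area] / totals[area] for area in totals}

print(type_share_by_area(rows, "Takeaway/sandwich shop"))
print(type_share_by_area(rows, "Manufacturers/packers"))
```

Areas with no establishments of the chosen type come out as 0 rather than going missing, which keeps the map complete.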


Anyway, I've shared all the data (including area population and employment counts) so you can look into it in more depth yourself. These are just a couple of maps to illustrate what things look like in relation to a) business type and b) some kind of functional geography, rather than the administrative unit of the local authority. The local authority makes sense for some things (like collecting the data) but less sense for others (like looking at where commuters might eat).

Now it's time for a fried chicken detour.

Do Alaskans fry chicken?

A few years ago there was a very interesting and amusing thread on twitter by Gwilym Lockwood about how loads of fried chicken places in the UK just take the words 'fried chicken' and then put a non-Kentucky US state in front of it as a way to somehow authenticate their fried chicken but without infringing any trademarks of big US-based fried chicken companies. Whatever the reason, you can just download the 16 Oct 2021 csv from here and explore it yourself by looking at the BusinessName column, or you can read on below because I've done the fried chicken analysis for you, using Gwilym's earlier work (see his blog for more) as inspiration. 

So, do Alaskans actually fry chicken in the UK then? Of course they do! In Cheadle (in Greater Manchester). How about those lovely Michigan folk? Yep, head to Oldham for that and if you also want some Montana Fried Chicken, pick some up while you're in Oldham because they're both there - alongside some more fried chicken states. The map below provides a full, thorough and essential analysis of this situation on the US side. Someone really needs to open an Alabama Fried Chicken outlet in the UK.

I'm just off to register a few new UK trademarks
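If you want to try the fried chicken search yourself, here is a minimal Python sketch of one way to do it. The state list is a short illustrative subset (a real run would list all the states and leave out Kentucky), and the function name is hypothetical.

```python
import re

# A short, illustrative subset of US state names. Kentucky is left out on
# purpose, since the interest here is in the non-Kentucky states.
STATES = ["Alaska", "Michigan", "Montana", "Kansas", "Tennessee"]

PATTERN = re.compile(
    r"\b(" + "|".join(STATES) + r")\s+fried\s+chicken\b",
    re.IGNORECASE,
)

def state_in_name(business_name):
    """Return the state if the name looks like '<State> Fried Chicken', else None."""
    match = PATTERN.search(business_name)
    return match.group(1).title() if match else None

print(state_in_name("ALASKA FRIED CHICKEN"))
print(state_in_name("Kentucky Fried Chicken"))
print(state_in_name("Toronto Fried Chicken"))
```

Matching case-insensitively matters here because the BusinessName column mixes upper case, lower case and title case.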


There are also many food establishments named after US cities, alongside the Toronto Fried Chicken shop (also in Oldham), such as:

  • Boston Fried Chicken
  • Chicago Fried Chicken
  • Dallas Fried Chicken
  • Manhattan Fried Chicken
  • Miami Fried Chicken
  • Orlando Fried Chicken
  • Philadelphia Fried Chicken
No doubt someone already has a food blog about all this and has visited them all and ranked them on a whole host of measures, but I haven't seen that yet so let me know if you have. Bonus fried chicken places in the UK include:

  • Hollywood Pizza Fried Chicken 
  • Jailbird Nashville Fried Chicken 
Here's a map of them all (minus the KFCs), with a few labelled. None in Scotland or Northern Ireland, big cluster of Tennessee Fried Chickens in London and just one in Wales.

But have you eaten at all of them?


More maps - prety sure this can't be right, can it?

I have a big map project set up where I can look at all the data now, bearing in mind of course that I was only able to geocode just over 84% of the data across the UK. But still, it makes for interesting viewing so I'll share some more maps below, starting with a Pret map of London, like the one I shared at the beginning. This is interesting, and something of a curiosity but there is a point to this in that it's often by mapping or otherwise visualising the data that we can make more sense of it, spot errors, or generate new questions. My question with the map below was 'can this be right?' Turns out that yes it is basically right - central London is Pretland.

A few Prets in central London

If I was some kind of spy looking to find out inside information from business or governments, I could probably think of a worse strategy than to hang around in central London Prets and eavesdrop on conversations. Let's take a look at Glasgow Prets now.

One of these has closed down (guess which one)

I don't know about you, but I think it's time for a kebab vs sushi map of Milton Keynes.

There's a clear winner here

Now we see a pair of maps that clearly illustrate another kind of north-south divide in England.

Levelling up is needed

That's more like it

I could go on for hours with these maps, but I'll leave it there - you get the idea. Of course I won't leave it there. We haven't even had a KFC map of West Yorkshire yet.

The traditional, original, authentic fried chicken outlet

A Costa-kebab map of Norwich, you say? Ok.

Big night out planned? Start here

If you are so inclined and have the skills and tools, you can have a play with this data yourself. If you do, and you're used to working with data then you may end up with some data science-related thoughts, so I talk about this briefly below.


Data science thoughts

I'm sharing this dataset with everyone in csv and GeoPackage format because it's part of what I do, but also because I'm going to be using it in my QGIS training sessions in future. However, it's also pretty interesting from a data science demo point of view, from how to access and download it on the web via the xml files provided by the brilliant Food Standards Agency data team, to the nuts and bolts of the data itself. I think the Wikipedia definition of data science is a pretty good one, so here it is:

  • "an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data"

The 'interdisciplinary' bit is important for me because it highlights the fact that people come to the data from many different traditions, backgrounds and perspectives. This means we approach, analyse, interpret and play with data coming from a range of different ontological positions, with differing epistemologies, methodologies, methods and tools. Sorry about the fancy language, but I think this is what makes data science (however defined) so interesting.

The 'scientific methods, processes, algorithms and systems' bit is also interesting to me because it speaks to the variety of methods and tools used, as well as methodologies. For example, I used a simple text editor that is great with big files (EmEditor) to convert the data from xml to csv, but you could do it in loads of different ways. 
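One of those other ways is a short script, sketched here in Python with the standard library. This is not how the conversion was done for this post: the sample XML is made up, the wrapper tag names (`Establishments`, `Establishment`) are assumptions, and only the field names come from the column list earlier on.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A tiny, made-up XML snippet. The wrapper tag names are assumptions;
# the real FHRS files may be structured differently.
SAMPLE_XML = """<Establishments>
  <Establishment>
    <BusinessName>Greggs</BusinessName>
    <BusinessType>Takeaway/sandwich shop</BusinessType>
    <PostCode>XX1 1XX</PostCode>
  </Establishment>
  <Establishment>
    <BusinessName>The Codfather</BusinessName>
    <BusinessType>Takeaway/sandwich shop</BusinessType>
  </Establishment>
</Establishments>"""

FIELDS = ["BusinessName", "BusinessType", "PostCode"]

def xml_to_csv(xml_text, fields=FIELDS):
    """Flatten each establishment element into one csv row, blank where a field is missing."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for est in root.iter("Establishment"):
        writer.writerow([est.findtext(field, default="") for field in fields])
    return out.getvalue()

print(xml_to_csv(SAMPLE_XML))
```

Writing a blank for missing fields, rather than skipping the row, keeps every establishment in the output even when something like the postcode is absent.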

In relation to the 'extract knowledge and insights' bit, that is kind of what I'm doing here, but in a mostly whimsical, curiosity-driven way. But the bit about 'noisy' data applies, not because of the way the Food Standards Agency share the data but because of the messiness of the way data is collected, inputted, generated, and collated. 

A small example of this is in the spelling of food establishments, which can vary a lot and mess up our results if we're not careful - e.g. when we have McDonald's vs MacDonalds vs MacDonald's or McDonalds. Knowing how to handle these kinds of things is in my view one of the most important skills when dealing with data, whether we think of it as data science or not.
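As a minimal sketch of handling this in Python, a single pattern can cover the four spellings mentioned above. The pattern and the sample names are my own illustration, and it would still match any unrelated business that happens to be called MacDonalds.

```python
import re

# Optional 'a' after the M and an optional apostrophe before the final 's'.
MCDONALDS = re.compile(r"\bma?cdonald['’]?s\b", re.IGNORECASE)

# The four spellings mentioned above, plus one hypothetical non-match.
names = ["McDonald's", "MacDonalds", "MacDonald's", "McDonalds", "MacDonald Hotel"]

matched = [name for name in names if MCDONALDS.search(name)]
print(matched)
```

Keeping the original spelling in the data and matching flexibly, rather than overwriting names, makes it easier to check afterwards what was caught.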

If you were wondering how many of the food establishments in the dataset have 'data' in their name, wonder no more.

I want to eat some data


The Ms of data science

Following on from the data science thoughts above, I'm going to offer my 'Ms of data science' list now, as it kind of summarises things for me when dealing with large datasets.

  • Missing - where is everything? Why are there big holes in this dataset? Why didn't they fill in this cell! Aaargh.
  • Messy - what a mess! Okay, we've got lots of data, but also lots of mess. Why did they put the postcode in the address field? Who is behind this? Is it a conspiracy?
  • Mystifying - what does this mean? "PC load letter? What the **** does that mean?" Maybe I should read the metadata file. Hold on, where is the metadata? Where's the readme? Ok found it. So that's what a flugelbinder is! Okay, I get that, but why is there a cluster of US state fried chicken shops in Oldham?
  • Maddening - why doesn't this work? Please work, please work, please work. Ok, great, this works. Now it doesn't. Why? Help. Solved with code. I am a genius! Oh, broken again. Hmm. If we accept that data is and are maddening at times then we'll be happier.
  • Marvellous - what fun! Kansas Fried Chicken! Buttylicious! The Codfather! It can be good fun taking a deep dive with rich, interesting data. It can be fun but it can also tell us really interesting, useful things about the world and that's a good reason to do it.
  • Mortifying - oops, I made a mistake. I didn't realise Shetland wasn't in Manchester (see below). I didn't realise that to search for an apostrophe I needed to put two apostrophes in between single quotes. But now I do. I also will never get Kingston upon Hull mixed up with Kingston upon Thames again because they are not the same place. 
  • Mundane - it's mostly just a lot of unglamorous grunt work, isn't it? I think this is basically the key to it all. Lots of dirty work, unglamorous and at times painful, but it helps lay the foundations for more fruitful analysis.
I probably could have come up with a few more if I didn't tie myself down to using the letter M. But of course in data science Maslow's Law of the Instrument is often a thing so when coming up with bulleted lists I decided to treat every issue as if it started with an M.
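
On the apostrophe point in 'Mortifying' above, here is a small sketch of the doubling rule. It uses Python's built-in SQLite rather than QGIS, so treat it as a stand-in: the table and rows are made up, but doubling the apostrophe inside a single-quoted string is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE food ("BusinessName" TEXT)')
conn.executemany(
    "INSERT INTO food VALUES (?)",
    [("McDonald's",), ("McDonalds",), ("Costa's Fish Bar",)],
)

# Inside a single-quoted SQL string, '' stands for one literal apostrophe,
# so '%''%' means "any name containing an apostrophe".
query = """SELECT "BusinessName" FROM food
           WHERE "BusinessName" LIKE '%''%'
           ORDER BY "BusinessName" """
with_apostrophe = [row[0] for row in conn.execute(query)]
print(with_apostrophe)
```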


Food establishment names - from 'buttylicious' to The Codmother

There is a disappointingly low number of food establishments with 'haggis' in the name (7), and they are all in Scotland, including the very nice Happy Haggis Restaurant & Takeaway in Aviemore. But there are lots of interesting and brilliant names across the country, including:

  • 17 Buttyliciouses
  • 52 Codfathers - including more than one Codfather II
  • 1 called The Codmothers - in Plymouth
  • 74 Taj Mahals (or with Taj Mahal as part of the name)
  • 40 Spice Huts
  • 591 with 'red lion' somewhere in the name
  • 8,080 with 'fish' in the name
  • 1,773 with 'golden' in the name

But what about the best of the names? Well, here are a few contenders.

  • Flavour Junction (Newtownabbey, Northern Ireland)
  • £1 Baguettes and Pies (Stoke-on-Trent)
  • 200 Degrees Coffee Shop (lots of these, coffee sounds worryingly hot)
  • 21 Jumpin Waffles (being a big Waffle House fan, this interests me - in Hyndburn, Lancashire)
  • 900 Degrees (too hot for me, wherever it is)
  • A Rule of Tum Burger Shop Ltd (so bad it's good)
  • Lots of 'A Taste of' ones, but I like 'A Taste of Speyside' and 'A Taste of Hackney' the best
  • Zuhus & Big Bro Burgers (Edinburgh)
  • Zorba 6 (this is in Rutland) 
  • Ye Three Fyshes (this is in Bedford)
  • So many Wok puns in there, including 11 Wok this Ways, but Wok with Jon (Hackney) is quite original
  • The Sea Shanty (Wirral, clearly ahead of its time)
Actually, these are just some of the ones I found quickly but there are more than half a million in there, so be my guest to suggest your favourite.

Costa in Shetland? Pentecostals divided? 

One nice thing when mapping stuff like this is that it can allow you to see some errors straight away, in a way that you wouldn't in a big text file or spreadsheet. Have a look at the Costa map below to see what I mean.

Costa in Shetland

Looks okay, right? Well, perhaps you are wondering about the logistics of Costa supplying that store in Unst, and you'd be right to. This is in fact the Costa in the Trafford Centre in Manchester, but with an incorrect lat/long associated with it, somehow. That's why it's plotted at just over 60 degrees north.
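
A map makes this kind of error jump out, but you can also hunt for it in code. Here is a rough sketch that flags any point sitting a long way from the area it is supposed to be in. It assumes each record carries a local authority name, and the centre coordinates, the 50 km threshold and the two records are all approximate values I've made up for the example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/long points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Hypothetical lookup: rough centre point for each local authority.
AUTHORITY_CENTRES = {"Trafford": (53.42, -2.35)}

# Hypothetical records with approximate coordinates.
records = [
    {"name": "Costa (plotted correctly)", "authority": "Trafford", "lat": 53.47, "lon": -2.35},
    {"name": "Costa (plotted in Shetland)", "authority": "Trafford", "lat": 60.75, "lon": -0.88},
]


def far_from_authority(records, centres, max_km=50):
    """Return names of records further than max_km from their authority centre."""
    flagged = []
    for rec in records:
        centre_lat, centre_lon = centres[rec["authority"]]
        if haversine_km(rec["lat"], rec["lon"], centre_lat, centre_lon) > max_km:
            flagged.append(rec["name"])
    return flagged


flagged = far_from_authority(records, AUTHORITY_CENTRES)
print(flagged)
```

A simple UK bounding box wouldn't catch this one, since Shetland is very much in the UK, which is why the check compares each point with where it claims to be.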

But Costa overall is a good example of the fiddliness and complexity of dealing with free text data when trying to do this kind of analysis. If we want to search for all Costa stores, for example, we could just try filtering the data using something like "BusinessName" = 'costa' in QGIS. This gets us 564 stores, which sounds okay, but there are actually over 2,000 across the UK, so we need to be cleverer. We can then try something like "BusinessName" LIKE '%costa%' to find any points with the word costa in the name.

Okay, fine. This gets us 2,646, which seems a bit high and on closer inspection we see that it includes all the Costa Coffee outlets, but also other places that do food and that also have 'costa' in the name, like Pentecostal City Mission Food Bank Supply in Waltham Forest. 

You can filter out the Pentecostals (no offence) manually, but then you discover that the spelling of Birmingham Pentacostal Fellowship is a bit different. Eventually, you do a bit more thinking and filtering and you end up with a very messy query, but a relatively clean costa dataset. Mine has 2,522 in it so it's about right I think - but see below for the different ways they are recorded in the dataset.

The Costa doing data analysis can be very high

There are loads of these kinds of examples, but the Costa one is a good one and provides a nice training example in my opinion. Even when you think you've very cleverly solved all these problems, you then discover that someone owns a 'Costa's Mini Market' (Hammersmith), a 'Costa's Fish and Chips' (Trafford), a 'Costa's Fish Bar' (Ealing) or a 'Costa Pizza' (Salford), so it's always useful to use a human when doing data science / analytics, whatever you want to call it.
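
The steps in the Costa saga can be sketched in a few lines of Python. This is not my actual QGIS query, just an illustration of why each step returns what it does, using a handful of names from above. The exclusion word list is an assumption and would need growing by hand as new oddities turn up.

```python
import re

names = [
    "Costa",
    "Costa Coffee",
    "COSTA COFFEE",
    "Pentecostal City Mission Food Bank Supply",
    "Birmingham Pentacostal Fellowship",
    "Costa's Fish Bar",
    "Costa Pizza",
]

# Step 1: exact match only finds names that are just 'costa'.
exact = [n for n in names if n.lower() == "costa"]

# Step 2: substring match finds everything, Pentecostals included.
substring = [n for n in names if "costa" in n.lower()]

# Step 3: 'costa' as a whole word, not followed by an apostrophe,
# minus names containing words that suggest a different business.
COSTA_WORD = re.compile(r"\bcosta\b(?!')", re.IGNORECASE)
NOT_COFFEE = re.compile(r"\b(pizza|fish|market)\b", re.IGNORECASE)  # hypothetical list
cleaned = [n for n in names if COSTA_WORD.search(n) and not NOT_COFFEE.search(n)]

print(len(exact), len(substring), cleaned)
```

Even the last step still needs a human to look over the results, which is kind of the point.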


Why did I do this? What is the point?

1. Training dataset; 2. Interesting; 3. Curiosity; 4. I like food. These are the main reasons. I like to think that other people who occasionally stumble upon my blog might be interested too, whether that involves reading this War and Peace-length effort or using the dataset itself. Sure, it can be a bit of fun but there are many serious learning points in a dataset like this and I think, given its richness and importance, it could be used more.


Carto note

On the map front, I used OS Zoomstack data for the background mapping, with a style I created just for this little project and a fairly minimalist approach. Here are a few zoom ins for different places, without food locations. For Northern Ireland, I had to create it from OpenStreetMap data since OS Zoomstack obviously doesn't cover the whole of the UK.





Different zoom levels have different features turned on/off in my QGIS project but if you want to zoom out and make a very messy/noisy Nandos map you can do that too.

Lots of Nandae on a map

If you have reached this point after reading all the words and looking at all the maps, please congratulate yourself. I appreciate you coming along for the ride.


I hope there has been something of interest in here.