Friday 29 October 2021

Food for thought

This is a long read, but it's about maps and food and data so hopefully there will be something you can enjoy in here. It's based on a very interesting open dataset from the UK's Food Standards Agency, which contains the location of more than half a million food establishments. The post is split into sections to make it more digestible, and part of the inspiration here is Tom Forth's great Greggs:Pret ratio analysis from 2014 onwards. 

It's all good fun, of course, but there are serious potential uses of this kind of open data, such as looking at the impact of Covid on the survival of food establishments, understanding what is actually in big datasets like this, real-world error checking and - not least - understanding the hygiene ratings of the places we buy our food from! But I'm mainly going to write about interesting/fun things here, as well as sharing lots of maps, with a few notes about data science along the way. Talking of which, here's the first map - showing some of the 300+ Prets in London (below). Just want the data? Get it here

Surely there's room for a few more


In brief

As part of my company's mission to share data and help people understand the world a bit better through maps and geodata, I like to mash together datasets that are really useful and/or interesting but maybe not that easy to use, either because of the format they come in, the separate files they are comprised of, or any number of other reasons. The Food Standards Agency's Food Hygiene Rating Scheme (FHRS) dataset fits the bill here because it's REALLY interesting but also (as far as I'm aware) currently only available as hundreds of separate xml files, one for each individual local authority across the UK. This is great in many ways, but I do like to go on about how a simple, clean csv is so valuable - CLEAN + SIMPLE = VALUABLE! That's why I wish more organisations, in addition to whatever else they provide, would also include a clean, simple csv as standard. You can find my csv of the FHRS dataset, plus geodata versions, on this page. Anyway, back to the important stuff: here's a Greggs map of Belfast.

Is it time for a sausage roll yet?

You know what, let's zoom out a bit for a Greggs map of the UK too. Big icons, messy map, but you get the idea (which is that the people of Inverness are Greggs-bereft, although maybe this is due to the absolute dominance of the legendary Harry Gow, who knows).

When are Greggs opening in Ullapool?

Actually, let's have one more Greggs map, from the home of Greggs. 

You're never more than 39 seconds from a Greggs, or something


The data - food establishments in the UK

Okay, so we've had a few maps but now it's time to say a bit more about the data. And remember, if you want to explore it yourself, grab the csv or GeoPackages I made and then open them in your software of choice (e.g. Excel, R, Tableau, whatever) and take a deep dive.

My download from 16 October 2021 has 597,037 rows in it, but not every food establishment had a latitude and longitude associated with it (about 80% did). After some extra geocoding I got this up to just over 84% of food establishments across the UK (including fixing all the Stirling ones, which had no lat/long and had the postcode in the address field), so we have a reasonably good dataset of 502,341 located rows. The full csv, with all rows (including those without location data), is the one called all-FHRS-GB-16-oct-2021-extract.csv in the web folder. Each row relates to a single establishment (e.g. a Pret, a Greggs, a Costa, a Starbucks, a Nandos, a Buttylicious, a Codfather, a Taj Mahal) and for each establishment we have the following columns in the dataset:

  • BusinessName (e.g. Greggs)
  • BusinessType (e.g. Pub/bar/nightclub)
  • LocalAuthorityName (based on the set of 374 UK local authorities as of Oct 2021)
  • LocalAuthorityCode (this is not the standard ONS code for each local authority, but a different numeric code for each area)
  • RatingValue (a rating of the hygiene level of each establishment, with 5 being the best, apart from in Scotland where it's a Pass/Improvement Required kind of rating)
  • AddressLine1, 2, 3, 4 (full address of each establishment)
  • PostCode (full unit postcode for each establishment)
  • Longitude and Latitude (these are included for about 80% of places)
  • There are a few more technical fields (e.g. establishment ID) but these are the important ones
In addition, I've also added in a TTWA (travel-to-work-area) code and name for each point so that we know which TTWA a food establishment is in, as well as its local authority. I thought this would be useful for when we want to analyse the location of places that commuters typically eat at, rather than just residents (like Pret, for example). On the web page linked to above, I've also included a local authority summary GeoPackage so you can see how many establishments are in each local authority, as well as by type. I've added in population and employment counts from 2019 here as well so you can per-capita it.
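If you want to check the location coverage figures quoted earlier in this section for yourself, a minimal Python sketch might look like the one below. It assumes the csv has Latitude and Longitude columns as listed above; the three sample rows (names, postcodes and coordinates) are made up purely to show the mechanics.

```python
import csv
import io


def has_coordinates(row):
    """Return True if the row has parseable Latitude and Longitude values."""
    try:
        float(row["Latitude"])
        float(row["Longitude"])
    except (KeyError, TypeError, ValueError):
        return False
    return True


def geocoded_share(rows):
    """Return (located rows, total rows, share located) for an iterable of dict rows."""
    total = 0
    located = 0
    for row in rows:
        total += 1
        located += has_coordinates(row)
    share = located / total if total else 0.0
    return located, total, share


# Made-up sample rows, not taken from the real dataset
SAMPLE = """BusinessName,PostCode,Latitude,Longitude
Example Bakery,AB1 2CD,57.1,-2.1
Example Cafe,EF3 4GH,,
Example Chippy,IJ5 6KL,53.4,-2.9
"""

located, total, share = geocoded_share(csv.DictReader(io.StringIO(SAMPLE)))
print(f"{located} of {total} sample rows have coordinates ({share:.1%})")

# The figures quoted in the post: 502,341 located rows out of 597,037
post_share = 502_341 / 597_037
print(f"Post figures: {post_share:.1%}")
```

To run it on the real file, you would swap the sample for something like csv.DictReader(open(path, newline="", encoding="utf-8")), with the path and encoding adjusted to your own download.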

So, when you map it, you can fairly easily produce something like a quick, messy McDonald's map of Liverpool and Manchester and surrounding areas, like the one below.

MMMM

Or, you could produce a quick Starbucks map of the centre of Edinburgh, if you were so inclined.

Time for a coffee

I should add at this stage that the dataset is often quite messy, in terms of the actual business names. For example, if you want to find all the Greggs, you can't simply filter it using 'greggs' - you have to be aware of the many different ways that a Greggs location is named. See below for a screenshot of this from my QGIS attribute table. I'll say more on this issue below, using Costa Coffee as an example, just so that you don't get your Costas mixed up with your Pentecostals.

Greggs? Yes


I had a little look at the food establishments by type and then explored this in relation to the % by type in each travel-to-work-area (TTWA) across the UK, so see below for a couple of examples. First map is takeaways/sandwich shops as a % of all food establishments in a TTWA and the second map is manufacturers/packers as a % of all food establishments in a TTWA.
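The percentage-by-type calculation behind maps like these is simple enough to sketch in a few lines of Python. This is only an illustration, not the workflow used for the maps: the column name TTWAName, the exact BusinessType strings and the sample rows are all assumptions, so check them against the csv before relying on them.

```python
from collections import Counter


def type_share_by_area(rows, business_type, area_key="TTWAName"):
    """Return {area: % of establishments in that area with the given BusinessType}."""
    totals = Counter()
    matches = Counter()
    for row in rows:
        area = row[area_key]
        totals[area] += 1
        if row["BusinessType"] == business_type:
            matches[area] += 1
    return {area: 100 * matches[area] / totals[area] for area in totals}


# Made-up rows; "TTWAName" is a hypothetical column name for the TTWA label
rows = [
    {"TTWAName": "Area A", "BusinessType": "Takeaway/sandwich shop"},
    {"TTWAName": "Area A", "BusinessType": "Takeaway/sandwich shop"},
    {"TTWAName": "Area A", "BusinessType": "Pub/bar/nightclub"},
    {"TTWAName": "Area A", "BusinessType": "Manufacturers/packers"},
    {"TTWAName": "Area B", "BusinessType": "Takeaway/sandwich shop"},
    {"TTWAName": "Area B", "BusinessType": "Pub/bar/nightclub"},
    {"TTWAName": "Area B", "BusinessType": "Pub/bar/nightclub"},
    {"TTWAName": "Area B", "BusinessType": "Manufacturers/packers"},
]

shares = type_share_by_area(rows, "Takeaway/sandwich shop")
for area, pct in sorted(shares.items()):
    print(f"{area}: {pct:.1f}% takeaways/sandwich shops")
```

Passing area_key="LocalAuthorityName" instead would give the same breakdown by local authority rather than by TTWA.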

Finally, we know that office workers eat sandwiches

 
Is this just a 'here be livestock' geography map?


Anyway, I've shared all the data (including area population and employment counts) so you can look into it in more depth yourself. These are just a couple of maps to illustrate what things look like in relation to a) business type and b) some kind of functional geography, rather than the administrative unit of the local authority. The local authority makes sense for some things (like collecting the data) but less sense for others (like looking at where commuters might eat).

Now it's time for a fried chicken detour.

Do Alaskans fry chicken?

A few years ago there was a very interesting and amusing thread on Twitter by Gwilym Lockwood about how loads of fried chicken places in the UK just take the words 'fried chicken' and then put a non-Kentucky US state in front of it as a way to somehow authenticate their fried chicken but without infringing any trademarks of big US-based fried chicken companies. Whatever the reason, you can just download the 16 Oct 2021 csv from here and explore it yourself by looking at the BusinessName column, or you can read on below because I've done the fried chicken analysis for you, using Gwilym's earlier work (see his blog for more) as inspiration.
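If you'd rather script the BusinessName search than eyeball it, one way to sketch it in Python is with a regular expression built from a list of state names. This is just an illustration of the idea, not the method used for the maps in this post, and the helper name state_in_name is made up for the example.

```python
import re

# The 50 US states, minus Kentucky (the whole point being the non-Kentucky ones)
STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Louisiana", "Maine", "Maryland",
    "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri",
    "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey",
    "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio",
    "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina",
    "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia",
    "Washington", "West Virginia", "Wisconsin", "Wyoming",
]

CANONICAL = {state.lower(): state for state in STATES}

# "<state> fried chicken", ignoring case and allowing extra spaces between words
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(s) for s in STATES) + r")\s+fried\s+chicken\b",
    re.IGNORECASE,
)


def state_in_name(business_name):
    """Return the state named directly before 'fried chicken', or None."""
    match = PATTERN.search(business_name)
    return CANONICAL[match.group(1).lower()] if match else None


for name in [
    "Kansas Fried Chicken",
    "montana fried chicken",
    "Kentucky Fried Chicken",
    "Jailbird Nashville Fried Chicken",
]:
    print(name, "->", state_in_name(name))
```

A pattern like this will miss abbreviations and misspellings, so it is a starting point for a manual check rather than a final answer.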

So, do Alaskans actually fry chicken in the UK then? Of course they do! In Cheadle (in Greater Manchester). How about those lovely Michigan folk? Yep, head to Oldham for that and if you also want some Montana Fried Chicken, pick some up while you're in Oldham because they're both there - alongside some more fried chicken states. The map below provides a full, thorough and essential analysis of this situation on the US side. Someone really needs to open an Alabama Fried Chicken outlet in the UK.

I'm just off to register a few new UK trademarks


There are also many fried chicken places named after US cities (plus a Toronto Fried Chicken shop, also in Oldham), such as:

  • Boston Fried Chicken
  • Chicago Fried Chicken
  • Dallas Fried Chicken
  • Manhattan Fried Chicken
  • Miami Fried Chicken
  • Orlando Fried Chicken
  • Philadelphia Fried Chicken
No doubt someone already has a food blog about all this and has visited them all and ranked them on a whole host of measures, but I haven't seen that yet so let me know if you have. Bonus fried chicken places in the UK include:

  • Hollywood Pizza Fried Chicken 
  • Jailbird Nashville Fried Chicken 
Here's a map of them all (minus the KFCs), with a few labelled. None in Scotland or Northern Ireland, big cluster of Tennessee Fried Chickens in London and just one in Wales.

But have you eaten at all of them?


More maps - prety sure this can't be right, can it?

I have a big map project set up where I can look at all the data now, bearing in mind of course that I was only able to get locations for about 84% of the data across the UK. But still, it makes for interesting viewing so I'll share some more maps below, starting with a Pret map of London, like the one I shared at the beginning. This is interesting, and something of a curiosity, but there is a point to this in that it's often by mapping or otherwise visualising the data that we can make more sense of it, spot errors, or generate new questions. My question with the map below was 'can this be right?' Turns out that yes it is basically right - central London is Pretland.

A few Prets in central London

If I was some kind of spy looking to find out inside information from businesses or governments, I could probably think of a worse strategy than to hang around in central London Prets and eavesdrop on conversations. Let's take a look at Glasgow Prets now.

One of these has closed down (guess which one)

I don't know about you, but I think it's time for a kebab vs sushi map of Milton Keynes.

There's a clear winner here

Now we see a pair of maps that clearly illustrate another kind of north-south divide in England.

Levelling up is needed

That's more like it

I could go on for hours with these maps, but I'll leave it there - you get the idea. Of course I won't leave it there. We haven't even had a KFC map of West Yorkshire yet.

The traditional, original, authentic fried chicken outlet

A Costa-kebab map of Norwich, you say? Ok.

Big night out planned? Start here

If you are so inclined and have the skills and tools, you can have a play with this data yourself. If you do, and you're used to working with data then you may end up with some data science-related thoughts, so I talk about this briefly below.


Data science thoughts

I'm sharing this dataset with everyone in csv and GeoPackage format because it's part of what I do, but also because I'm going to be using it in my QGIS training sessions in future. However, it's also pretty interesting from a data science demo point of view, from how to access and download it on the web via the xml files provided by the brilliant Food Standards Agency data team, to the nuts and bolts of the data itself. I think the Wikipedia definition of data science is a pretty good one, so here it is:

  • "an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data"

The 'interdisciplinary' bit is important for me because it highlights the fact that people come to the data from many different traditions, backgrounds and perspectives. This means we approach, analyse, interpret and play with data coming from a range of different ontological positions, with differing epistemologies, methodologies, methods and tools. Sorry about the fancy language, but I think this is what makes data science (however defined) so interesting.

The 'scientific methods, processes, algorithms and systems' bit is also interesting to me because it speaks to the variety of methods and tools used, as well as methodologies. For example, I used a simple text editor that is great with big files (EmEditor) to convert the data from xml to csv, but you could do it in loads of different ways. 
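One of those different ways is a short Python script. The sketch below is not what I did (that was EmEditor), and the xml layout it assumes, with one EstablishmentDetail element per establishment and the coordinates nested under a Geocode element, is an assumption you should check against a real file. The sample xml is made up.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["BusinessName", "BusinessType", "LocalAuthorityName", "RatingValue", "PostCode"]
COLUMNS = FIELDS + ["Longitude", "Latitude"]

# Made-up sample in the assumed layout; the second establishment has no Geocode
SAMPLE_XML = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Bakery</BusinessName>
      <BusinessType>Takeaway/sandwich shop</BusinessType>
      <LocalAuthorityName>Example Council</LocalAuthorityName>
      <RatingValue>5</RatingValue>
      <PostCode>AB1 2CD</PostCode>
      <Geocode><Longitude>-2.9</Longitude><Latitude>53.4</Latitude></Geocode>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
      <BusinessType>Restaurant/Cafe/Canteen</BusinessType>
      <LocalAuthorityName>Example Council</LocalAuthorityName>
      <RatingValue>4</RatingValue>
      <PostCode>EF3 4GH</PostCode>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""


def establishments(xml_text):
    """Yield one flat dict per EstablishmentDetail element in the xml text."""
    root = ET.fromstring(xml_text)
    for detail in root.iter("EstablishmentDetail"):
        # Missing or empty elements become empty strings
        row = {field: (detail.findtext(field) or "").strip() for field in FIELDS}
        row["Longitude"] = (detail.findtext("Geocode/Longitude") or "").strip()
        row["Latitude"] = (detail.findtext("Geocode/Latitude") or "").strip()
        yield row


def write_csv(rows, out):
    """Write the rows to an open text file (or StringIO) as csv with a header."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_csv(establishments(SAMPLE_XML), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

For the real thing you would loop over every local authority file and append to a single csv, writing the header only once.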

In relation to the 'extract knowledge and insights' bit, that is kind of what I'm doing here, but in a mostly whimsical, curiosity-driven way. But the bit about 'noisy' data applies, not because of the way the Food Standards Agency share the data but because of the messiness of the way data is collected, inputted, generated, and collated. 

A small example of this is in the spelling of food establishments, which can vary a lot and mess up our results if we're not careful - e.g. when we have McDonald's vs MacDonalds vs MacDonald's or McDonalds. Knowing how to handle these kinds of things is in my view one of the most important skills when dealing with data, whether we think of it as data science or not.
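To make that concrete, here is a minimal Python sketch that catches the four spellings mentioned above with one regular expression. The function name is_mcdonalds is made up for the example, and a pattern like this would also happily match an unrelated 'MacDonald's Butchers', so the results still need a human look.

```python
import re

# Mc or Mac, then "donald", an optional apostrophe, then "s"
MCDONALDS = re.compile(r"\bma?cdonald'?s\b", re.IGNORECASE)


def is_mcdonalds(business_name):
    """Return True if the name contains a McDonald's-style spelling."""
    # Treat curly apostrophes as straight ones before matching
    cleaned = business_name.replace("\u2019", "'")
    return bool(MCDONALDS.search(cleaned))


for name in ["McDonald's", "MacDonalds", "MacDonald's", "McDonalds", "Macdonald Hotel"]:
    print(f"{name!r}: {is_mcdonalds(name)}")
```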

If you were wondering how many of the food establishments in the dataset have 'data' in their name, wonder no more.

I want to eat some data


The Ms of data science

Following on from the data science thoughts above, I'm going to offer my 'Ms of data science' list now, as it kind of summarises things for me when dealing with large datasets.

  • Missing - where is everything? Why are there big holes in this dataset? Why didn't they fill in this cell! Aaargh.
  • Messy - what a mess! Okay, we've got lots of data, but also lots of mess. Why did they put the postcode in the address field? Who is behind this? Is it a conspiracy?
  • Mystifying - what does this mean? "PC load letter? What the **** does that mean?" Maybe I should read the metadata file. Hold on, where is the metadata? Where's the readme? Ok found it. So that's what a flugelbinder is! Okay, I get that, but why is there a cluster of US state fried chicken shops in Oldham?
  • Maddening - why doesn't this work? Please work, please work, please work. Ok, great, this works. Now it doesn't. Why? Help. Solved with code. I am a genius! Oh, broken again. Hmm. If we accept that data is and are maddening at times then we'll be happier.
  • Marvellous - what fun! Kansas Fried Chicken! Buttylicious! The Codfather! It can be good fun taking a deep dive with rich, interesting data. It can be fun but it can also tell us really interesting, useful things about the world and that's a good reason to do it.
  • Mortifying - oops, I made a mistake. I didn't realise Shetland wasn't in Manchester (see below). I didn't realise that to search for an apostrophe I needed to put two apostrophes in between single quotes. But now I do. I also will never get Kingston upon Hull mixed up with Kingston upon Thames again because they are not the same place. 
  • Mundane - it's mostly just a lot of unglamorous grunt work, isn't it? I think this is basically the key to it all. Lots of dirty work, unglamorous and at times painful, but it helps lay the foundations for more fruitful analysis.
I probably could have come up with a few more if I didn't tie myself down to using the letter M. But of course in data science Maslow's Law of the Instrument is often a thing so when coming up with bulleted lists I decided to treat every issue as if it started with an M.


Food establishment names - from 'buttylicious' to The Codmother

There is a disappointingly low number of food establishments with 'haggis' in the name (7), and they are all in Scotland, including the very nice Happy Haggis Restaurant & Takeaway in Aviemore. But there are lots of interesting and brilliant names across the country, including:

  • 17 Buttyliciouses
  • 52 Codfathers - including more than one Codfather II
  • 1 called The Codmothers - in Plymouth
  • 74 Taj Mahals (or with Taj Mahal as part of the name)
  • 40 Spice Huts
  • 591 with 'red lion' somewhere in the name
  • 8,080 with 'fish' in the name
  • 1,773 with 'golden' in the name

But what about the best of the names? Well, here are a few contenders.

  • Flavour Junction (Newtownabbey, Northern Ireland)
  • £1 Baguettes and Pies (Stoke-on-Trent)
  • 200 Degrees Coffee Shop (lots of these, coffee sounds worryingly hot)
  • 21 Jumpin Waffles (being a big Waffle House fan, this interests me - in Hyndburn, Lancashire)
  • 900 Degrees (too hot for me, wherever it is)
  • A Rule of Tum Burger Shop Ltd (so bad it's good)
  • Lots of 'A Taste of' ones, but I like 'A Taste of Speyside' and 'A Taste of Hackney' the best
  • Zuhus & Big Bro Burgers (Edinburgh)
  • Zorba 6 (this is in Rutland) 
  • Ye Three Fyshes (this is in Bedford)
  • So many Wok puns in there, including 11 Wok this Ways, but Wok with Jon (Hackney) is quite original
  • The Sea Shanty (Wirral, clearly ahead of its time)
Actually, these are just some of the ones I found quickly but there are more than half a million in there, so be my guest to suggest your favourite.

Costa in Shetland? Pentecostals divided? 

One nice thing when mapping stuff like this is that it can allow you to see some errors straight away, in a way that you wouldn't in a big text file or spreadsheet. Have a look at the Costa map below to see what I mean.

Costa in Shetland

Looks okay, right? Well, perhaps you are wondering about the logistics of Costa supplying that store in Unst, and you'd be right to. This is in fact the Costa in the Trafford Centre in Manchester, but with an incorrect lat/long associated with it, somehow. That's why it's plotted at just over 60 degrees north.
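A simple UK bounding box wouldn't catch this one, because Shetland is a perfectly valid place to be. One alternative, sketched below in Python, is to flag any point that sits a long way from the median location of the other points in the same local authority. This is not something I did for this post; the 100 km threshold, the sample names and the rough coordinates are all assumptions for illustration.

```python
import math
from collections import defaultdict
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/long points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def flag_outliers(rows, max_km=100.0):
    """Return rows lying more than max_km from their local authority's median point."""
    by_area = defaultdict(list)
    for row in rows:
        by_area[row["LocalAuthorityName"]].append(row)

    flagged = []
    for area_rows in by_area.values():
        mid_lat = median(r["Latitude"] for r in area_rows)
        mid_lon = median(r["Longitude"] for r in area_rows)
        for r in area_rows:
            if haversine_km(r["Latitude"], r["Longitude"], mid_lat, mid_lon) > max_km:
                flagged.append(r)
    return flagged


# Made-up rows with rough coordinates: three near Trafford, one plotted near Unst
rows = [
    {"BusinessName": "Example cafe A", "LocalAuthorityName": "Trafford", "Latitude": 53.44, "Longitude": -2.36},
    {"BusinessName": "Example cafe B", "LocalAuthorityName": "Trafford", "Latitude": 53.46, "Longitude": -2.35},
    {"BusinessName": "Example cafe C", "LocalAuthorityName": "Trafford", "Latitude": 53.47, "Longitude": -2.34},
    {"BusinessName": "Example coffee shop D", "LocalAuthorityName": "Trafford", "Latitude": 60.75, "Longitude": -0.88},
]

flagged = flag_outliers(rows)
for r in flagged:
    print("Check this one:", r["BusinessName"], r["Latitude"], r["Longitude"])
```

The threshold would need tuning for big, sparse local authorities (Highland, for example), and the median only works as a reference point if most of an area's points are in the right place.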

But Costa overall is a good example of the fiddliness and complexity of dealing with free text data when trying to do this kind of analysis. If we want to search for all Costa stores, for example, we could just try filtering the data using something like "BusinessName" = 'costa' in QGIS. This gets us 564 stores, which sounds okay, but there are actually over 2,000 across the UK, so we need to be cleverer. We can then try something like "BusinessName" LIKE '%costa%' to find any points with 'costa' somewhere in the name.

Okay, fine. This gets us 2,646, which seems a bit high and on closer inspection we see that it includes all the Costa Coffee outlets, but also other places that do food and that also have 'costa' in the name, like Pentecostal City Mission Food Bank Supply in Waltham Forest. 

You can filter out the Pentecostals (no offence) manually, but then you discover that the spelling of Birmingham Pentacostal Fellowship is a bit different. Eventually, you do a bit more thinking and filtering and you end up with a very messy query, but a relatively clean Costa dataset. Mine has 2,522 in it so it's about right I think - but see below for the different ways they are recorded in the dataset.

The Costa doing data analysis can be very high

There are loads of these kinds of examples, but the Costa one is a good one and provides a nice training example in my opinion. Even when you think you've very cleverly solved all these problems, you then discover that someone owns a 'Costa's Mini Market' (Hammersmith), a 'Costa's Fish and Chips' (Trafford), a 'Costa's Fish Bar' (Ealing) or a 'Costa Pizza' (Salford), so it's always useful to use a human when doing data science / analytics, whatever you want to call it.
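For anyone who wants to try the Costa problem outside QGIS, here is a minimal Python sketch of the same progression: exact match, 'contains', and then a whole-word match that skips the possessive "Costa's". It is not the query I actually used, and the EXCLUDE set is a hypothetical hand-made list of the kind you end up building after looking at the results.

```python
import re

# Whole word "costa", not followed by 's (so "Costa's Fish Bar" is skipped)
COSTA_WORD = re.compile(r"\bcosta\b(?!'s)", re.IGNORECASE)

# Hypothetical manual exclusions found by reading through the matches
EXCLUDE = {"costa pizza"}


def looks_like_costa(business_name):
    """Return True if the name plausibly refers to a Costa Coffee outlet."""
    cleaned = business_name.replace("\u2019", "'").strip()
    if cleaned.lower() in EXCLUDE:
        return False
    return bool(COSTA_WORD.search(cleaned))


names = [
    "Costa",
    "Costa Coffee",
    "Pentecostal City Mission Food Bank Supply",
    "Birmingham Pentacostal Fellowship",
    "Costa's Mini Market",
    "Costa's Fish Bar",
    "Costa Pizza",
]

exact = [n for n in names if n.lower() == "costa"]
contains = [n for n in names if "costa" in n.lower()]
whole_word = [n for n in names if looks_like_costa(n)]

print(len(exact), "exact,", len(contains), "containing 'costa',", len(whole_word), "after cleaning")
print(whole_word)
```

Even this version only narrows things down. New oddities turn up every time, which is why a human still needs to look through the matches.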


Why did I do this? What is the point?

1. Training dataset; 2. Interesting; 3. Curiosity; 4. I like food. These are the main reasons. I like to think that other people who occasionally stumble upon my blog might be interested too, whether that involves reading this War and Peace-length effort or using the dataset itself. Sure, it can be a bit of fun but there are many serious learning points in a dataset like this and I think, given its richness and importance, it could be used more.


Carto note

On the map front, I used OS Zoomstack data for the background mapping, with a style I created just for this little project and a fairly minimalist approach. Here are a few zoom ins for different places, without food locations. For Northern Ireland, I had to create it from OpenStreetMap data since OS Zoomstack only covers Great Britain, not the whole of the UK.





Different zoom levels have different features turned on/off in my QGIS project but if you want to zoom out and make a very messy/noisy Nandos map you can do that too.

Lots of Nandae on a map

If you have reached this point after reading all the words and looking at all the maps, please congratulate yourself. I appreciate you coming along for the ride.


I hope there has been something of interest in here.