This is a long read, but it's about maps and food and data so hopefully there will be something you can enjoy in here. It's based on a very interesting open dataset from the UK's Food Standards Agency, which contains the location of more than half a million food establishments. The post is split into sections to make it more digestible, and part of the inspiration here is Tom Forth's great Greggs:Pret ratio analysis from 2014 onwards.
It's all good fun, of course, but there are serious potential uses of this kind of open data, such as looking at the impact of Covid on the survival of food establishments, understanding what is actually in big datasets like this, real-world error checking and - not least - understanding the hygiene ratings of the places we buy our food from! But I'm mainly going to write about interesting/fun things here, as well as sharing lots of maps, with a few notes about data science along the way. Talking of which, here's the first map - showing some of the 300+ Prets in London (below). Just want the data? Get it here.
|Surely there's room for a few more|
As part of my company's mission to share data and help people understand the world a bit better through maps and geodata, I like to mash together datasets that are really useful and/or interesting but maybe not that easy to use, either because of the format they come in, the separate files they are made up of, or any number of other reasons. The Food Standards Agency's Food Hygiene Ratings Scheme (FHRS) dataset fits the bill here because it's REALLY interesting but also (as far as I'm aware) currently only available as hundreds of separate xml files, one for each individual local authority across the UK. This is great in many ways, but I do like to go on about how a simple, clean csv is so valuable - CLEAN + SIMPLE = VALUABLE! That's why I wish more organisations, in addition to whatever else they provide, would also include a clean, simple csv as standard. You can find my csv of the Food Standards Agency's Food Hygiene Ratings Scheme (FHRS) dataset, plus geodata versions, on this page. Anyway, back to the important stuff: here's a Greggs map of Belfast.
|Is it time for a sausage roll yet?|
You know what, let's zoom out a bit for a Greggs map of the UK too. Big icons, messy map, but you get the idea (which is that the people of Inverness are Greggs-bereft, although maybe this is due to the absolute dominance of the legendary Harry Gow, who knows).
|When are Greggs opening in Ullapool?|
Actually, let's have one more Greggs map, from the home of Greggs.
|You're never more than 39 seconds from a Greggs, or something|
The data - food establishments in the UK
Okay, so we've had a few maps but now it's time to say a bit more about the data. And remember, if you want to explore it yourself, grab the csv or GeoPackages I made and then open them in your software of choice (e.g. Excel, R, Tableau, whatever) and take a deep dive.
My download from 16 October 2021 has 597,037 rows in it. About 80% of food establishments already had a latitude and longitude associated with them, and after some geocoding I got that up to just over 84% across the UK (including fixing all the Stirling ones, which had no lat/long and had the postcode in the address field), so we have a reasonably good dataset of 502,341 rows of data. The full csv, with all rows (including those without location data), is the one called all-FHRS-GB-16-oct-2021-extract.csv in the web folder. Each row relates to a single establishment (e.g. a Pret, a Greggs, a Costa, a Starbucks, a Nandos, a Buttylicious, a Codfather, a Taj Mahal) and for each establishment we have the following columns in the dataset:
- BusinessName (e.g. Greggs)
- BusinessType (e.g. Pub/bar/nightclub)
- LocalAuthorityName (based on the set of 374 UK local authorities as of Oct 2021)
- LocalAuthorityCode (this is not the standard ONS code for each local authority, but a different numeric code for each area)
- RatingValue (a rating of the hygiene level of each establishment, with 5 being the best, apart from in Scotland where it's a Pass/Improvement Required kind of rating)
- AddressLine1, 2, 3, 4 (full address of each establishment)
- PostCode (full unit postcode for each establishment)
- Longitude and Latitude (these are included for about 80% of places)
- There are a few more technical fields (e.g. establishment ID) but these are the important ones
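If you want to check the share of rows that come with coordinates, here's a minimal sketch using only Python's standard library. The three sample rows (including the postcodes and coordinates) are made up for illustration, and I'm assuming the csv uses the column names listed above; for the real file you would open it with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.

```python
import csv
import io

# Hypothetical three-row sample using the column names listed above.
SAMPLE_CSV = """BusinessName,BusinessType,LocalAuthorityName,RatingValue,PostCode,Longitude,Latitude
Greggs,Retailers - other,Belfast City,5,ZZ1 1ZZ,-5.93,54.60
The Codfather,Takeaway/sandwich shop,Stirling,Pass,ZZ2 2ZZ,,
Pret A Manger,Restaurant/Cafe/Canteen,Westminster,5,ZZ3 3ZZ,-0.14,51.51
"""


def has_coordinates(row):
    """True when both Longitude and Latitude are non-empty."""
    return bool(row["Longitude"].strip()) and bool(row["Latitude"].strip())


def share_with_coordinates(csv_file):
    """Return (rows_with_coordinates, total_rows, share)."""
    rows = list(csv.DictReader(csv_file))
    located = sum(1 for row in rows if has_coordinates(row))
    share = located / len(rows) if rows else 0.0
    return located, len(rows), share


located, total, share = share_with_coordinates(io.StringIO(SAMPLE_CSV))
print(f"{located} of {total} rows have coordinates ({share:.0%})")
```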
|Time for a coffee|
I should add at this stage that the dataset is often quite messy, in terms of the actual business names. For example, if you want to find all the Greggs, you can't simply filter it using 'greggs' - you have to be aware of the many different ways that a Greggs location is named. See below for a screenshot of this from my QGIS attribute table. I'll say more on this issue below, using Costa Coffee as an example, just so that you don't get your Costas mixed up with your Pentecostals.
I had a little look at the food establishments by type and then explored this in relation to the % by type in each travel-to-work-area (TTWA) across the UK, so see below for a couple of examples. First map is takeaways/sandwich shops as a % of all food establishments in a TTWA and the second map is manufacturers/packers as a % of all food establishments in a TTWA.
|Finally, we know that office workers eat sandwiches|
|Is this just a 'here be livestock' geography map?|
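The calculation behind maps like these is just a count by type divided by a count of everything, per area. Here's a rough sketch of that idea with made-up rows; the area names are placeholders, and it assumes each establishment has already been assigned to a TTWA (for example via a spatial join, which isn't shown here).

```python
from collections import Counter

# Hypothetical (area, BusinessType) pairs standing in for the real rows.
rows = [
    ("Area A", "Takeaway/sandwich shop"),
    ("Area A", "Takeaway/sandwich shop"),
    ("Area A", "Pub/bar/nightclub"),
    ("Area A", "Manufacturers/packers"),
    ("Area B", "Manufacturers/packers"),
    ("Area B", "Pub/bar/nightclub"),
]


def percent_by_type(pairs, business_type):
    """Percentage of establishments in each area that are of business_type."""
    totals = Counter()
    matches = Counter()
    for area, btype in pairs:
        totals[area] += 1
        if btype == business_type:
            matches[area] += 1
    return {area: 100 * matches[area] / totals[area] for area in totals}


takeaways = percent_by_type(rows, "Takeaway/sandwich shop")
manufacturers = percent_by_type(rows, "Manufacturers/packers")
print(takeaways)
print(manufacturers)
```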
Do Alaskans fry chicken?
|I'm just off to register a few new UK trademarks|
- Boston Fried Chicken
- Chicago Fried Chicken
- Dallas Fried Chicken
- Manhattan Fried Chicken
- Miami Fried Chicken
- Orlando Fried Chicken
- Philadelphia Fried Chicken
- Hollywood Pizza Fried Chicken
- Jailbird Nashville Fried Chicken
|But have you eaten at all of them?|
More maps - prety sure this can't be right, can it?
I have a big map project set up where I can look at all the data now, bearing in mind of course that I was only able to geocode approximately 84% of the data across the UK. But still, it makes for interesting viewing so I'll share some more maps below, starting with a Pret map of London, like the one I shared at the beginning. This is something of a curiosity, but there is a point to it: it's often by mapping or otherwise visualising the data that we can make more sense of it, spot errors, or generate new questions. My question with the map below was 'can this be right?' Turns out that yes, it is basically right - central London is Pretland.
|A few Prets in central London|
If I was some kind of spy looking to find out inside information from businesses or governments, I could probably think of a worse strategy than to hang around in central London Prets and eavesdrop on conversations. Let's take a look at Glasgow Prets now.
|One of these has closed down (guess which one)|
I don't know about you, but I think it's time for a kebab vs sushi map of Milton Keynes.
|There's a clear winner here|
Now we see a pair of maps that clearly illustrate another kind of north-south divide in England.
|Levelling up is needed|
|That's more like it|
I could go on for hours with these maps, but I'll leave it there - you get the idea. Of course I won't leave it there. We haven't even had a KFC map of West Yorkshire yet.
|The traditional, original, authentic fried chicken outlet|
A Costa-kebab map of Norwich, you say? Ok.
|Big night out planned? Start here|
If you are so inclined and have the skills and tools, you can have a play with this data yourself. If you do, and you're used to working with data, then you may end up with some data science-related thoughts, so I talk about this briefly below.
Data science thoughts
I'm sharing this dataset with everyone in csv and GeoPackage format because it's part of what I do, but also because I'm going to be using it in my QGIS training sessions in future. However, it's also pretty interesting from a data science demo point of view, from how to access and download it on the web via the xml files provided by the brilliant Food Standards Agency data team, to the nuts and bolts of the data itself. I think the Wikipedia definition of data science is a pretty good one, so here it is:
- "an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data"
The 'interdisciplinary' bit is important for me because it highlights the fact that people come to the data from many different traditions, backgrounds and perspectives. This means we approach, analyse, interpret and play with data coming from a range of different ontological positions, with differing epistemologies, methodologies, methods and tools. Sorry about the fancy language, but I think this is what makes data science (however defined) so interesting.
The 'scientific methods, processes, algorithms and systems' bit is also interesting to me because it speaks to the variety of methods and tools used, as well as methodologies. For example, I used a simple text editor that is great with big files (EmEditor) to convert the data from xml to csv, but you could do it in loads of different ways.
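As one of those 'loads of different ways', here is a sketch of the xml-to-csv step in Python rather than in a text editor. To be clear, this is not how I did it. The element names (`EstablishmentDetail`, `Geocode` and so on) and the sample values are assumptions for illustration, so check them against a real local authority file before relying on this.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed example of one local authority file.
SAMPLE_XML = """<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Greggs</BusinessName>
      <BusinessType>Retailers - other</BusinessType>
      <PostCode>ZZ1 1ZZ</PostCode>
      <RatingValue>5</RatingValue>
      <Geocode>
        <Longitude>-1.61</Longitude>
        <Latitude>54.97</Latitude>
      </Geocode>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>The Codfather</BusinessName>
      <BusinessType>Takeaway/sandwich shop</BusinessType>
      <PostCode>ZZ2 2ZZ</PostCode>
      <RatingValue>Pass</RatingValue>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>"""

FIELDS = ["BusinessName", "BusinessType", "PostCode",
          "RatingValue", "Longitude", "Latitude"]


def xml_to_rows(xml_text):
    """Yield one dict per establishment, with "" for any missing field."""
    root = ET.fromstring(xml_text)
    for detail in root.iter("EstablishmentDetail"):
        row = {}
        for field in FIELDS:
            # ".//" also searches nested elements, so Geocode children are found.
            element = detail.find(f".//{field}")
            row[field] = (element.text or "").strip() if element is not None else ""
        yield row


def rows_to_csv(rows):
    """Write the rows to csv text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(xml_to_rows(SAMPLE_XML))
print(csv_text)
```

With hundreds of files you would loop over them and append the rows to a single csv, writing the header only once.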
In relation to the 'extract knowledge and insights' bit, that is kind of what I'm doing here, but in a mostly whimsical, curiosity-driven way. But the bit about 'noisy' data applies, not because of the way the Food Standards Agency share the data but because of the messiness of the way data is collected, inputted, generated, and collated.
A small example of this is in the spelling of food establishments, which can vary a lot and mess up our results if we're not careful - e.g. when we have McDonald's vs MacDonalds vs MacDonald's or McDonalds. Knowing how to handle these kinds of things is in my view one of the most important skills when dealing with data, whether we think of it as data science or not.
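One common way to handle this is to normalise the names before matching. The sketch below is just one possible approach, not a complete solution: it lowercases, strips apostrophes and then looks for either the 'mc' or 'mac' spelling. The function names are my own, and an unrelated business that happens to be called MacDonald's would still match, so the results need a human look.

```python
import re


def normalise_name(name):
    """Lowercase, drop straight and curly apostrophes, collapse whitespace."""
    cleaned = name.lower().replace("'", "").replace("\u2019", "")
    return re.sub(r"\s+", " ", cleaned).strip()


# After normalisation only "mcdonalds" and "macdonalds" are left to match.
MCDONALDS = re.compile(r"\bma?cdonalds\b")


def looks_like_mcdonalds(name):
    return bool(MCDONALDS.search(normalise_name(name)))


variants = ["McDonald's", "MacDonalds", "MacDonald's", "McDonalds",
            "McDonald\u2019s Restaurant"]
for variant in variants:
    print(variant, looks_like_mcdonalds(variant))
```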
If you were wondering how many of the food establishments in the dataset have 'data' in their name, wonder no more.
|I want to eat some data|
The Ms of data science
- Missing - where is everything? Why are there big holes in this dataset? Why didn't they fill in this cell! Aaargh.
- Messy - what a mess! Okay, we've got lots of data, but also lots of mess. Why did they put the postcode in the address field? Who is behind this? Is it a conspiracy?
- Mystifying - what does this mean? "PC load letter? What the **** does that mean?" Maybe I should read the metadata file. Hold on, where is the metadata? Where's the readme? Ok found it. So that's what a flugelbinder is! Okay, I get that, but why is there a cluster of US state fried chicken shops in Oldham?
- Maddening - why doesn't this work? Please work, please work, please work. Ok, great, this works. Now it doesn't. Why? Help. Solved with code. I am a genius! Oh, broken again. Hmm. If we accept that data is and are maddening at times then we'll be happier.
- Marvellous - what fun! Kansas Fried Chicken! Buttylicious! The Codfather! It can be good fun taking a deep dive with rich, interesting data. It can be fun but it can also tell us really interesting, useful things about the world and that's a good reason to do it.
- Mortifying - oops, I made a mistake. I didn't realise Shetland wasn't in Manchester (see below). I didn't realise that to search for an apostrophe I needed to put two apostrophes in between single quotes. But now I do. I also will never get Kingston upon Hull mixed up with Kingston upon Thames again because they are not the same place.
- Mundane - it's mostly just a lot of unglamorous grunt work, isn't it? I think this is basically the key to it all. Lots of dirty work, unglamorous and at times painful, but it helps lay the foundations for more fruitful analysis.
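On the apostrophe point in the 'Mortifying' entry: in SQL-style string literals, two apostrophes in a row stand for one literal apostrophe. Here's a small illustration using SQLite from Python, which is an assumption on my part for the sake of a runnable example (other tools with SQL-like expressions may differ in the details). The table and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE establishments (BusinessName TEXT)")
conn.executemany(
    "INSERT INTO establishments VALUES (?)",
    [("McDonald's",), ("McDonalds",), ("Costa's Fish Bar",), ("Greggs",)],
)

# Inside a single-quoted SQL string, '' stands for one literal apostrophe,
# so '%''%' means "any name containing an apostrophe".
escaped = conn.execute(
    "SELECT BusinessName FROM establishments "
    "WHERE BusinessName LIKE '%''%' ORDER BY BusinessName"
).fetchall()

# Passing the pattern as a parameter avoids hand-escaping altogether.
parameterised = conn.execute(
    "SELECT BusinessName FROM establishments "
    "WHERE BusinessName LIKE ? ORDER BY BusinessName",
    ("%'%",),
).fetchall()

print(escaped)
print(parameterised)
```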
Food establishment names - from 'buttylicious' to The Codmother
There is a disappointingly low number of food establishments with 'haggis' in the name (7), and they are all in Scotland, including the very nice Happy Haggis Restaurant & Takeaway in Aviemore. But there are lots of interesting and brilliant names across the country, including:
- 17 Buttyliciouses
- 52 Codfathers - including more than one Codfather II
- 1 called The Codmothers - in Plymouth
- 74 Taj Mahals (or with Taj Mahal as part of the name)
- 40 Spice Huts
- 591 with 'red lion' somewhere in the name
- 8,080 with 'fish' in the name
- 1,773 with 'golden' in the name
But what about the best of the names? Well, here are a few contenders.
- Flavour Junction (Newtownabbey, Northern Ireland)
- £1 Baguettes and Pies (Stoke-on-Trent)
- 200 Degrees Coffee Shop (lots of these, coffee sounds worryingly hot)
- 21 Jumpin Waffles (being a big Waffle House fan, this interests me - in Hyndburn, Lancashire)
- 900 Degrees (too hot for me, wherever it is)
- A Rule of Tum Burger Shop Ltd (so bad it's good)
- Lots of 'A Taste of' ones, but I like 'A Taste of Speyside' and 'A Taste of Hackney' the best
- Zuhus & Big Bro Burgers (Edinburgh)
- Zorba 6 (this is in Rutland)
- Ye Three Fyshes (this is in Bedford)
- So many Wok puns in there, including 11 Wok this Ways, but Wok with Jon (Hackney) is quite original
- The Sea Shanty (Wirral, clearly ahead of its time)
Costa in Shetland? Pentecostals divided?
One nice thing when mapping stuff like this is that it can allow you to see some errors straight away, in a way that you wouldn't in a big text file or spreadsheet. Have a look at the Costa map below to see what I mean.
|Costa in Shetland|
|The Costa doing data analysis can be very high|
There are loads of these kinds of examples, but the Costa one is a good one and provides a nice training example in my opinion. Even when you think you've very cleverly solved all these problems, you then discover that someone owns a 'Costa's Mini Market' (Hammersmith), a 'Costa's Fish and Chips' (Trafford), a 'Costa's Fish Bar' (Ealing) or a 'Costa Pizza' (Salford), so it's always useful to use a human when doing data science / analytics, whatever you want to call it.
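To make the Costa problem concrete, here's a sketch of a slightly smarter filter than a plain 'contains costa' search. It's an illustration rather than the method I used: the pattern treats 'costa' as a whole word (so Pentecostal churches drop out) and rejects the possessive 'Costa's', but as the last name in the list shows, it still lets 'Costa Pizza' through, which is exactly why a human check is needed.

```python
import re

# \b keeps "costa" as a whole word, so "Pentecostal" is not matched;
# the lookahead rejects the possessive "Costa's" (straight or curly apostrophe).
COSTA_WORD = re.compile(r"\bcosta\b(?!['\u2019]s)", re.IGNORECASE)


def might_be_costa_coffee(name):
    return bool(COSTA_WORD.search(name))


names = [
    "Costa Coffee",
    "COSTA",
    "Pentecostal Church Hall",
    "Costa's Fish Bar",
    "Costa's Mini Market",
    "Costa Pizza",
]
candidates = [name for name in names if might_be_costa_coffee(name)]
print(candidates)
```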
Why did I do this? What is the point?
1. Training dataset; 2. Interesting; 3. Curiosity; 4. I like food. These are the main reasons. I like to think that other people who occasionally stumble upon my blog might be interested too, whether that involves reading this War and Peace-length effort or using the dataset itself. Sure, it can be a bit of fun but there are many serious learning points in a dataset like this and I think, given its richness and importance, it could be used more.
On the map front, I used OS Zoomstack data for the background mapping, with a style I created just for this little project and a fairly minimalist approach. Here are a few zoom ins for different places, without food locations. For Northern Ireland, I had to create it from OpenStreetMap data since OS Zoomstack only covers Great Britain, not the whole of the UK.
Different zoom levels have different features turned on/off in my QGIS project but if you want to zoom out and make a very messy/noisy Nandos map you can do that too.
|Lots of Nandae on a map|
If you have reached this point after reading all the words and looking at all the maps, please congratulate yourself. I appreciate you coming along for the ride.