A slightly different kind of tutorial today in that it involves no maps and instead is about how to get a data table from Wikipedia - or other websites - into Excel or Google sheets without any pain at all, and without messy copying and pasting. First I'll do this for Excel and then for Google Sheets. I'm going to be doing it with a table from the List of World Snooker Championship winners Wikipedia page, as of 23 April 2022, so the screenshots below reflect that but regardless of which table you choose, the method is the same. Note that there are multiple tables on the Wikipedia page I am using, just to demonstrate that even if this is the case you can still import the table you want to, quickly and simply. I am doing this from Wikipedia, but of course it should work equally well for any similarly formatted table on the web.
|Messy copy/paste? No way!|
Import a Wikipedia table directly into Excel
I'm using Excel 2019 here but this should work on other recent versions too. So, the basic steps are as follows (screenshots are from Excel on Windows).
- Copy the url of the Wikipedia page that your table is on - in my case its https://en.wikipedia.org/wiki/List_of_World_Snooker_Championship_winners
- From an empty sheet in Excel, go to the Data tab, then Get Data on the far left, then From Other Sources, then From Web (as in screenshot below).
|Well, this is useful|
- Then you can just paste your URL into the From Web box - here I'm doing it the basic way but there are more advanced options available. Once I've pasted in the URL, I just hit OK, Excel has a little think and then opens up a new Navigator window, shown below.
|Excel lists the various tables at that URL we pasted|
- In the Navigator window (screenshot above) you'll see a list of tables on the left if there are more than one - as in my example. Since I know I want the List of World Snooker Championship winners table, I click on that and then I see a preview of the same data that's on the Wikipedia page, as shown below. Hurray!
- Then I click on the Load button and, well, would you believe it!? I see the data from the Wikipedia table in Excel now (formatted in green/white with column filters on, but that's not a problem). You will probably also see the Queries & Connections panel open on the right of Excel here, but that's also not a problem - you can close that if you want to.
|Wikipedia table is now in Excel :)|
- What next? Well, it depends what you need to do with it, but if you just want a clean, text-only dataset then you just remove all the formatting and save it as an xlsx or csv file and that's that. An easy way to remove all formatting is just to select all the text in the sheet (using CTRL+A) and then via the Home tab use the Clear tool to clear the formatting (screenshot below). If you want to also turn off the filters, then select one of the column header cells (e.g. Venue in my case) and then hit the Sort & Filter button in the Editing section in the Home tab (same section as the Clear button is in) and then hit the Filter button to turn the filters off. Save the file and then smile.
|Clear the formatting quickly this way|
Import a Wikipedia table directly into Google Sheets
- Copy the url of the page where your table is located - in my case it's https://en.wikipedia.org/wiki/List_of_World_Snooker_Championship_winners - and note that there are multiple tables on this page, but that's not a problem.
- Make sure you have a new, blank Google Sheet ready to go - and then you are all set. In cell A1 of your sheet, type in =importhtml and then you'll see the importhtml function appear, as in the screenshot below. Note that you don't have to enter the formula in cell A1, but that's what I'm doing here just so that the table starts in the top left cell. Read on for how to complete the formula. You can just copy/paste my formula below if you want to.
|importhtml is the magic function here|
- Now, here is the formula I entered - I'll show you it first and then I'll explain what each part does: =importhtml("https://en.wikipedia.org/wiki/List_of_World_Snooker_Championship_winners","table",3) - note that you don't need to enter the 'locale' bit shown in the screenshot below, but it's there so that you can specify different locales for things like date formatting etc (if relevant).
|The different parts of the formula|
- The =importhtml bit tells Google Sheets to use the importhtml function - that is, it does exactly what you'd expect: it imports some kind of html text from the internet, like a Wikipedia table. It is followed by an open bracket and there is a close bracket at the very end of the formula, just like normal in Google Sheets and Excel.
- Then we have the url we are importing the table from, in quotes, followed by a comma - so in this case I have "https://en.wikipedia.org/wiki/List_of_World_Snooker_Championship_winners" because that's where my table is.
- Then after the comma, I have to specify the 'query' bit we see in the screenshot above - that is just whether you want Google Sheets to import a list or a table, but in this case we know we want to import a table so we just enter table in quotes, like this - "table" and then another comma.
- What does the 'index' bit do? Well, remember how we have more than one table on the page we want to import the table from? The index number relates to the number of the table we want to import - but note that this doesn't necessarily follow the order in which it appears on the page. It looks like my List of World Snooker Championship winners table is the second one on the page, but I had to enter a 3 to get it into Google Sheets using this formula. Just watch out for that, it can be a little confusing but if the table you're looking to import doesn't appear, try a different number here - the formula can always be edited by clicking on the cell you entered the formula into (in my case A1).
- Then, watch in stunned silence while Google Sheets take a moment to think about it and then load your table, as shown below.
|Wikipedia table in Google Sheets, in about a minute|
- If you want to save a text-only version of the table, just go to File, Download and then save it as a csv - or, copy/paste the data into a new sheet as text only, or whatever other method you happen to prefer. But the main thing is that you have the data and can now do what you want with it!