Introduction

How about the analysis of Nobel Prizes for an unusual page on an Excel Blog?

There are two files to download for this post: the Excel file I am discussing and a PDF of the summary report I wrote in answer to a couple of questions about Nobel Prizes that have interested me for many years. See the links at the end of the page.

We all know what Nobel Prizes are, don’t we! Well, as you can read in the PDF file, my home town is Todmorden in West Yorkshire, England and Todmorden is proud of its TWO Nobel Prize winners: Sir John Cockcroft and Sir Geoffrey Wilkinson. One question that I have now answered is this: is Todmorden the smallest city/town in the world to have two Nobel Laureates? I am only interested in two Prizes because Todmorden has two: there are cities with three, four … more Laureates but until Todmorden gets its third, I will leave that question unanswered!

Along the way, someone else asked: is Todmorden Grammar School the only school to produce two Nobel Laureates?

I answer both questions and more in the PDF file, including telling you that the second Nobel Prize for Literature, awarded in 1902 went to a man who was born in November 1817. The oldest recipient to date of a Nobel Prize was to a 90 year old economist. The first Nobel Prize to be awarded to someone born in the 20th century was awarded in 1935. So far, no one born in the 21st century has won a Nobel Prize although a 17 year old has won the Peace Prize, as you might know; but she was born in 1997 so even she is too old!

Excel Features

Please see the note on the Excel file at the end of this article.

What does all of this have to do with Excel, then? Lots! If you go to the Nobel Prize web site you will see a lot of information there:

  • Prize and year of its award
  • Name of Laureate(s)
  • Date and Place of Birth of Laureates
  • Date and Place of Death of Laureates
  • Reason for award
  • and a lot more information

Setting up the Excel Table: data structure

Birth and Death Data

The birth and death data are not provided in tabular form and there is no data, nor should there be, on the populations of the cities relating to the Laureates. Moreover, here is a typical entry for a Laureate:

  • Birth  27 March 1845, Lennep (now Remscheid), Prussia (now Germany)
  • Death  10 February 1923, Munich, Germany

What should we use as the birthplace? Lennep or Remscheid?

What should we use as the country of birth? Prussia or Germany?

I chose to use the modern birthplace name so Remscheid for the above example and the original country, Prussia in the above example. If you want to change my file, feel free to do so and you can use both original place and country or new place and new country … this is what I decided to do! Of course, it means that you will almost certainly see differences between my tables and analysis of countries and Nobel Prizes since I am 99% sure that everyone else has used modern country assignments.

For the population information, I had to leave the Nobel Prize web site and find that information elsewhere: Wikipedia is a good place to start. You will see that I did not try to match population size to the birthplace at the time of the award of the Prize: I took the latest population information as a proxy for that.

Structure of Birth and Death Data

From an analytical point of view, I needed to create birth and death data in such a way that it was easy to use and obvious to understand. Taking the previous section into account, I structured both birth and death data ths way:

  • year, place, country

One anomaly here is that the birth and death detail of anyone connected with the USA, born or died there, was structured this way: year, place, state, country. This means that all USA entries looked like this:

  • Birth: 27 October 1858, New York, NY, USA
  • Death: 6 January 1919, Oyster Bay, NY, USA

I changed them to read

  • Birth: 27 October 1858, New York NY, USA
  • Death: 6 January 1919, Oyster Bay NY, USA

I did consider deleting the state initials but leaving them in did help when there were names similar to each other since some cities in the USA are the same as elsewhere. Similarly, Halifax UK has two Laureates and Halifax Canada has a Laureate too so adding or leaving additional information along with the place name was sometimes useful. There are other examples like this.

Names

There are inconsistencies in the way that some Laureates are named: for exmple, some Literature Prize winners are shown using their nom de plume and other Laureates are shown with one name but their Fact section shows a variation on that name. This means that Jack Smith might be listed as the Prize winner

  • but it’s a nom de plume of Bert Fry; and
  • his full name might be JJ de Jack Beacon Smith

On average, with just one or two exceptions, I used the headline name they gave, Jack Smith

Power Query

You can see from my Excel file that my basic Excel Table containing the raw data has these headings:

  • Year
  • Name
  • Prize
  • Sex
  • Birth Details: standardised as above
  • Death Details: standardised as above

I used Power Query to create the following, working table; and please spend some time reviewing my Query as there are some useful learning points in there:

  • Year
  • Name
  • Prize
  • Birth Details
  • Death Details
  • Sex
  • Birth City
  • Birth Country
  • Death City
  • Death Country
  • DoB ie Date of Birth
  • DoD ie Date of Death
  • DoB Year
  • Birth Month
  • DoD Year
  • Death Month
  • AgeAtDeath
  • AgeAtAward
  • Birth Month No
  • Death Month No
  • Birth-Death Month
  • Age Now

My Query codes are as follows:

let
nobel_ynpbd = let
Source = Excel.CurrentWorkbook(){[Name=”nobel_ynpbd”]}[Content],
#”Changed Type” = Table.TransformColumnTypes(Source,{{“Year”, Int64.Type}, {“Name”, type text}, {“Prize”, type text}, {“Birthplace”, type text}, {“Population”, type any}, {“Column1”, type text}, {“Birth Details”, type text}, {“Death Details”, type text}}),
#”Renamed Columns” = Table.RenameColumns(#”Changed Type”,{{“Column1”, “Sex”}}),
#”Replaced Value” = Table.ReplaceValue(#”Renamed Columns”,null,”Male”,Replacer.ReplaceValue,{“Sex”}),
#”Reordered Columns” = Table.ReorderColumns(#”Replaced Value”,{“Year”, “Name”, “Prize”, “Birth Details”, “Death Details”, “Birthplace”, “Population”, “Sex”}),
#”Removed Columns” = Table.RemoveColumns(#”Reordered Columns”,{“Birthplace”, “Population”}),
#”Inserted Text Between Delimiters” = Table.AddColumn(#”Removed Columns”, “Text Between Delimiters”, each Text.BetweenDelimiters([Birth Details], ” “, “,”, 0, 0), type text),
#”Inserted Text Between Delimiters1″ = Table.AddColumn(#”Inserted Text Between Delimiters”, “Text Between Delimiters.1″, each Text.BetweenDelimiters([Birth Details], ” “, “,”, 3, 0), type text),
#”Renamed Columns1″ = Table.RenameColumns(#”Inserted Text Between Delimiters1″,{{“Text Between Delimiters”, “DoB”}, {“Text Between Delimiters.1”, “Birthplace”}}),
#”Inserted Text After Delimiter” = Table.AddColumn(#”Renamed Columns1″, “Text After Delimiter”, each Text.AfterDelimiter([Birth Details], “, “, 1), type text),
#”Renamed Columns2″ = Table.RenameColumns(#”Inserted Text After Delimiter”,{{“Birthplace”, “Birth City”}, {“Text After Delimiter”, “Birth Country”}}),
#”Inserted Text Between Delimiters2″ = Table.AddColumn(#”Renamed Columns2″, “Text Between Delimiters”, each Text.BetweenDelimiters([Death Details], ” “, “,”, 0, 0), type text),
#”Renamed Columns3″ = Table.RenameColumns(#”Inserted Text Between Delimiters2″,{{“Text Between Delimiters”, “DoD”}}),
#”Inserted Text Between Delimiters3″ = Table.AddColumn(#”Renamed Columns3″, “Text Between Delimiters”, each Text.BetweenDelimiters([Death Details], ” “, “,”, 3, 0), type text),
#”Renamed Columns4″ = Table.RenameColumns(#”Inserted Text Between Delimiters3″,{{“Text Between Delimiters”, “Death City”}}),
#”Inserted Text After Delimiter1″ = Table.AddColumn(#”Renamed Columns4″, “Text After Delimiter”, each Text.AfterDelimiter([Death Details], “, “, 1), type text),
#”Renamed Columns5″ = Table.RenameColumns(#”Inserted Text After Delimiter1″,{{“Text After Delimiter”, “Death Country”}}),
#”Changed Type1″ = Table.TransformColumnTypes(#”Renamed Columns5″,{{“DoB”, type date}, {“DoD”, type date}}),
#”Added Custom” = Table.AddColumn(#”Changed Type1″, “Age”, each ([DoD]-[DoB])/365.25),
#”Changed Type2″ = Table.TransformColumnTypes(#”Added Custom”,{{“Age”, type number}}),
#”Renamed Columns6″ = Table.RenameColumns(#”Changed Type2″,{{“Age”, “AgeAtDeath”}}),
#”Inserted Text Between Delimiters4″ = Table.AddColumn(#”Renamed Columns6″, “Text Between Delimiters”, each Text.BetweenDelimiters([Birth Details], ” “, “,”, 2, 0), type text),
#”Renamed Columns7″ = Table.RenameColumns(#”Inserted Text Between Delimiters4″,{{“Text Between Delimiters”, “Birth Year”}}),
#”Inserted Text Between Delimiters5″ = Table.AddColumn(#”Renamed Columns7″, “Text Between Delimiters”, each Text.BetweenDelimiters([Death Details], ” “, “,”, 2, 0), type text),
#”Renamed Columns8″ = Table.RenameColumns(#”Inserted Text Between Delimiters5″,{{“Text Between Delimiters”, “Death Year”}}),
#”Duplicated Column” = Table.DuplicateColumn(#”Renamed Columns8″, “DoB”, “DoB – Copy”),
#”Inserted Month Name” = Table.AddColumn(#”Duplicated Column”, “Month Name”, each Date.MonthName([#”DoB – Copy”]), type text),
#”Renamed Columns9″ = Table.RenameColumns(#”Inserted Month Name”,{{“Month Name”, “Birth Month”}}),
#”Reordered Columns1″ = Table.ReorderColumns(#”Renamed Columns9″,{“Year”, “Name”, “Prize”, “Birth Details”, “Death Details”, “Sex”, “DoB”, “Birth City”, “Birth Country”, “DoD”, “Death City”, “Death Country”, “AgeAtDeath”, “Birth Year”, “Birth Month”, “Death Year”, “DoB – Copy”}),
#”Removed Columns1″ = Table.RemoveColumns(#”Reordered Columns1″,{“DoB – Copy”}),
#”Duplicated Column1″ = Table.DuplicateColumn(#”Removed Columns1″, “DoD”, “DoD – Copy”),
#”Inserted Month Name1″ = Table.AddColumn(#”Duplicated Column1″, “Month Name”, each Date.MonthName([#”DoD – Copy”]), type text),
#”Renamed Columns10″ = Table.RenameColumns(#”Inserted Month Name1″,{{“Month Name”, “Death Month”}}),
#”Removed Columns2″ = Table.RemoveColumns(#”Renamed Columns10″,{“DoD – Copy”}),
#”Filtered Rows” = Table.SelectRows(#”Removed Columns2″, each ([Birth Details] <> “???” and [Birth Details] <> “Founded: 1873” and [Birth Details] <> “Founded: 1647 in London, United Kingdom” and [Birth Details] <> “Founded: 1863 in Geneva, Switzerland” and [Birth Details] <> “Founded: 1891 in Berne, Switzerland” and [Birth Details] <> “Founded: 1917 in Washington, DC, USA” and [Birth Details] <> “Founded: 1919 in Geneva, Switzerland” and [Birth Details] <> “Founded: 1919 in Paris, France” and [Birth Details] <> “Founded: 1921 in Geneva, Switzerland.” and [Birth Details] <> “Founded: 1945 in New York, NY, USA” and [Birth Details] <> “Founded: 1946 in New York, NY, USA” and [Birth Details] <> “Founded: 1948 in New York, NY, USA” and [Birth Details] <> “Founded: 1951 in Geneva, Switzerland” and [Birth Details] <> “Founded: 1952” and [Birth Details] <> “Founded: 1957 in Pugwash, Canada” and [Birth Details] <> “Founded: 1957 in Vienna, Austria” and [Birth Details] <> “Founded: 1961 in London, United Kingdom” and [Birth Details] <> “Founded: 1971 in Paris, France” and [Birth Details] <> “Founded: 1976 in Dhaka, Bangladesh” and [Birth Details] <> “Founded: 1980 in Boston, MA, USA” and [Birth Details] <> “Founded: 1988 in New York, NY, USA” and [Birth Details] <> “Founded: 1997” and [Birth Details] <> “Founded: 2007 in Australia” and [Birth Details] <> “Founded: Launched 1992 in USA”) and ([Name] <> “None”)),
#”Inserted Year” = Table.AddColumn(#”Added Custom1″, “Year.1”, each Date.Year([DoB]), Int64.Type),
#”Renamed Columns11″ = Table.RenameColumns(#”Inserted Year”,{{“Year.1”, “DoB Year”}}),
#”Added Custom1″ = Table.AddColumn(#”Filtered Rows”, “Age at Award”, each [Year]-[DoB Year])

in
#”Renamed Columns11″,
#”Removed Columns” = Table.RemoveColumns(nobel_ynpbd,{“Age at Award”}),
#”Added Custom” = Table.AddColumn(#”Removed Columns”, “Age at Award”, each [Year]-[DoB Year]),
#”Filtered Rows” = Table.SelectRows(#”Added Custom”, each ([Birth Details] <> “Founded: 1873 in Ghent, Belgium” and [Birth Details] <> “Founded: 2013 in Tunis, Tunisia” and [Birth Details] <> “Founded: 1917 in Washington DC, USA” and [Birth Details] <> “Founded: 1945 in New York, USA” and [Birth Details] <> “Founded: 1946 in New York, USA” and [Birth Details] <> “Founded: 1948 in New York, USA” and [Birth Details] <> “Founded: 1952 in Brussels, Belgium” and [Birth Details] <> “Founded: 1980 in Boston, USA” and [Birth Details] <> “Founded: 1988 in New York, USA” and [Birth Details] <> “Founded: 1992 in Geneva, Switzerland” and [Birth Details] <> “Founded: 1997 in The Hague, The Netherlands” and [Birth Details] <> “Founded: 2007 in Melbourne, Australia”)),
#”Duplicated Column” = Table.DuplicateColumn(#”Filtered Rows”, “DoD”, “DoD – Copy”),
#”Extracted Month” = Table.TransformColumns(#”Duplicated Column”,{{“DoD – Copy”, Date.Month, Int64.Type}}),
#”Renamed Columns” = Table.RenameColumns(#”Extracted Month”,{{“DoD – Copy”, “Death Month No”}}),
#”Duplicated Column1″ = Table.DuplicateColumn(#”Renamed Columns”, “DoB”, “DoB – Copy”),
#”Extracted Month1″ = Table.TransformColumns(#”Duplicated Column1″,{{“DoB – Copy”, Date.Month, Int64.Type}}),
#”Renamed Columns1″ = Table.RenameColumns(#”Extracted Month1″,{{“DoB – Copy”, “Birth Month No”}}),
#”Reordered Columns” = Table.ReorderColumns(#”Renamed Columns1″,{“Year”, “Name”, “Prize”, “Birth Details”, “Death Details”, “Sex”, “Birth City”, “Birth Country”, “Death City”, “Death Country”, “DoB”, “DoD”, “AgeAtDeath”, “Birth Year”, “DoB Year”, “Birth Month”, “Death Year”, “Death Month”, “Age at Award”, “Birth Month No”, “Death Month No”}),
#”Removed Columns1″ = Table.RemoveColumns(#”Reordered Columns”,{“Birth Year”}),
#”Renamed Columns2″ = Table.RenameColumns(#”Removed Columns1″,{{“Death Year”, “DoD Year”}}),
#”Reordered Columns1″ = Table.ReorderColumns(#”Renamed Columns2″,{“Year”, “Name”, “Prize”, “Birth Details”, “Death Details”, “Sex”, “Birth City”, “Birth Country”, “Death City”, “Death Country”, “DoB”, “DoD”, “DoB Year”, “Birth Month”, “DoD Year”, “Death Month”, “AgeAtDeath”, “Age at Award”, “Birth Month No”, “Death Month No”}),
#”Added Custom1″ = Table.AddColumn(#”Reordered Columns1″, “Birth-Death MonthNo”, each if [Death Month No]>0 then [Birth Month No]-[Death Month No] else “”),
#”Filtered Rows1″ = Table.SelectRows(#”Added Custom1″, each ([Birth Details] <> “Founded: 1919 in, Paris, France” and [Birth Details] <> “Founded: 1946 in New York NY, USA” and [Birth Details] <> “Founded: 1971 in, Paris, France”))
in
#”Filtered Rows1″

Query Two

Several Nobel Prizes have been awarded to organisations rather than individuals and there is a second query for that. My basic database includes these organisations and you can find them in the Birth Details column with this kind of entry:

  • Founded: 1873 in Ghent, Belgium

I excluded these entries from my main query as you can see. My second query, then, looks like this:

  • Year
  • Name
  • Prize
  • Start Year
  • City
  • Country
  • Age at Award

and the coding for the query is:

let
nobel_ynpbd_org = let
Source = Excel.CurrentWorkbook(){[Name=”nobel_ynpbd”]}[Content],
#”Removed Columns” = Table.RemoveColumns(Source,{“Birthplace”, “Population”, “Column1”, “Death Details”}),
#”Filtered Rows” = Table.SelectRows(#”Removed Columns”, each Text.StartsWith([Birth Details], “Founded”)),
#”Inserted Text Range” = Table.AddColumn(#”Filtered Rows”, “Text Range”, each Text.Middle([Birth Details], 9, 4), type text),
#”Renamed Columns” = Table.RenameColumns(#”Inserted Text Range”,{{“Text Range”, “Start Year”}}),
#”Inserted Text Between Delimiters” = Table.AddColumn(#”Renamed Columns”, “Text Between Delimiters”, each Text.BetweenDelimiters([Birth Details], “in “, “,”, 0, 0), type text),
#”Renamed Columns1″ = Table.RenameColumns(#”Inserted Text Between Delimiters”,{{“Text Between Delimiters”, “City”}}),
#”Inserted Text After Delimiter” = Table.AddColumn(#”Renamed Columns1″, “Text After Delimiter”, each Text.AfterDelimiter([Birth Details], “, “, 0), type text),
#”Renamed Columns2″ = Table.RenameColumns(#”Inserted Text After Delimiter”,{{“Text After Delimiter”, “Country”}}),
#”Removed Columns1″ = Table.RemoveColumns(#”Renamed Columns2″,{“Birth Details”}),
#”Changed Type” = Table.TransformColumnTypes(#”Added Custom”,{{“Start Year”, Int64.Type}}),
#”Added Custom” = Table.AddColumn(#”Removed Columns1″, “Age at Award”, each [Start Year]-[Year])
in
#”Changed Type”,
#”Removed Columns” = Table.RemoveColumns(nobel_ynpbd_org,{“Age at Award”}),
#”Added Custom” = Table.AddColumn(#”Removed Columns”, “Age at Award”, each [Year]-[Start Year])
in
#”Added Custom”

The raw data comprises 4,487 cells and my working data comprises 17,916 cells with a further 196 cells for the Organisation query

Standardising the data and creating the Query took me a long time. Most of my time was spent in scraping the birth and death data from the Nobel Prize web site since it is given on a Laureate by Laureate basis and not, as I said earlier, in tabular form: I spent many hours on scraping these details.

What to Look out for?

The dates of birth of all of the earliest Laureates come before 1st January 1900 and that is a problem for Excel. However, look at my main query and you will see that Power Query copes with this without a problem: I simply did a subtraction of one date from another and it worked!

I split many columns in my query

I took, eg, the date of birth and used Query functions to split out the birth year and the birth month. Birth month by name, eg January; and by number eg 1 … The same for the date of death if that was appropriate

Analysis

In terms of my analysis, I mainly used Pivot Tables: a pivot table and one or more graphs

Look at the tabs

  • birth_v_death_months
  • birth_v_death_country

To see how I have combined Pivot Tables (PT) with Conditional Formats (CF) using a formula

What these two tables do is to show if someone born in

  • January, died in January … no real reason for knowing this but I did it to illustrate PT, CF with a formula …
  • London, died in London … this is more interesting as it seemed almost normal for someone to have moved during their lifetime. The formula in the CF is simple but interesting here so please look at it. Remember, I used old country names here so it might be different from any other country analysis you see.

I also analyse Prizes per country, Prizes per city, male and female, number of Prizes per category, Prizes per category by year, number of Prizes per year, number of Prizes by Laureate since some people have won more than one Prize.

Conclusions

This is a very interesting database and it is very rich in terms of what we can get out of it. In terms of Excel it is also relatively rich in terms of data capture, data structure and using such features such as Power Query (Get & Transform).

Please feel free to download my files and let me have any comments as you deem appropriate.

 

Duncan Williamson

2th January 2018

Hail John Sutcliffe

PDF File nobel_prize_analysis_2017

Excel File … Please note, I made a lazy mistake with the raw data table and created a column that I didn’t need, in the end. The laziness is that I forgot to delete the column and all would be well. However, I forgot and did all of the rest of the work. It doesn’t cause any problems and errors.

On the other hand, the main Query shows more than 300 errors BUT, you will not see any errors in the Query table and the analysis that follows: it’s to do with how Power Query deals with month numbers in a custom column … not my fault and again it causes no problems. Anyway, here is the file … nobel_prize_winners_to_2017_birth_details_edited

%d bloggers like this: