Here are the opening paragraphs of the PDF file relating to this page, which you can download from the link that follows. You can also download the Excel file I have created.
I have played around with the heights and weights of professional footballers for many years, and here we are again. This time I managed to get a larger database of players to work from than before (http://www.footballsquads.co.uk/eng/2016-2017/faprem.htm). Although I have only analysed the English Premier League (EPL), it is easy to find data on other English leagues and, indeed, on leagues from around the world.
In summary, what I wanted to do was to
- Find some data and clean it as appropriate
- Create an overview of the data, such as all heights v all weights, and derive the regression equation
- Set out descriptive statistics
- Create some graphs
- Do all of the above for
  - Individual clubs
  - Players’ nationality
  - Players’ position on the field
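For readers who prefer code to a spreadsheet, the steps in this list can be sketched roughly as follows. This is only a Python illustration, not the method used in the Excel file: the rows are made up for the example (they are not values from the real database), and the field names, the units (cm and kg) and the helper names `fit_line`, `describe` and `describe_by` are all assumptions.

```python
from statistics import mean, stdev

# Made-up rows for illustration only; these are NOT values from the real database.
# Assumed units: height in cm, weight in kg.
players = [
    {"club": "Club A", "position": "G", "height": 190.0, "weight": 86.0},
    {"club": "Club A", "position": "D", "height": 185.0, "weight": 80.0},
    {"club": "Club A", "position": "F", "height": 178.0, "weight": 74.0},
    {"club": "Club B", "position": "G", "height": 193.0, "weight": 88.0},
    {"club": "Club B", "position": "M", "height": 175.0, "weight": 70.0},
    {"club": "Club B", "position": "F", "height": 182.0, "weight": 79.0},
]


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def describe(values):
    """A few descriptive statistics; the standard deviation needs two or more values."""
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values) if len(values) > 1 else None,
        "min": min(values),
        "max": max(values),
    }


def describe_by(rows, group_key, field):
    """Descriptive statistics of `field` for each distinct value of `group_key`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[field])
    return {name: describe(values) for name, values in groups.items()}


heights = [p["height"] for p in players]
weights = [p["weight"] for p in players]
slope, intercept = fit_line(heights, weights)
print(f"weight = {slope:.3f} * height + {intercept:.3f}")
print(describe_by(players, "club", "height"))
```

Changing `"club"` to `"position"` (or to a nationality column, if you add one) gives the same summary for a different grouping.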
I was also interested in sharing my methods since there are some things that people are doing with their analysis and dashboards that are either overdesigned or are more difficult/complex than they ought to be. I use the very effective and efficient DATABASE functions, for example, while others will use INDEX() and MATCH() or similar combinations.
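To give a feel for why the database functions are so convenient, here is a loose Python analogy of the idea behind DAVERAGE(database, field, criteria) and DCOUNT(database, field, criteria): one table, one field to summarise, and a set of criteria that picks the rows. It is a sketch under assumptions, not a reimplementation of Excel. The rows are made up, the names `d_average` and `d_count` are hypothetical, and only exact-match criteria are handled.

```python
# Made-up rows for illustration only; NOT values from the real database.
rows = [
    {"club": "Club A", "position": "D", "height": 185.0, "weight": 80.0},
    {"club": "Club A", "position": "D", "height": 189.0, "weight": 84.0},
    {"club": "Club A", "position": "F", "height": 178.0, "weight": 74.0},
    {"club": "Club B", "position": "D", "height": 191.0, "weight": 86.0},
]


def matches(row, criteria):
    """True when the row equals the wanted value in every criteria column."""
    return all(row.get(column) == wanted for column, wanted in criteria.items())


def d_average(table, field, criteria):
    """Average of `field` over the rows that satisfy `criteria` (None if no row does)."""
    values = [row[field] for row in table if matches(row, criteria)]
    if not values:
        return None
    return sum(values) / len(values)


def d_count(table, field, criteria):
    """Number of matching rows whose `field` holds a number."""
    return sum(
        1
        for row in table
        if matches(row, criteria) and isinstance(row.get(field), (int, float))
    )


club_a_defenders = {"club": "Club A", "position": "D"}
print(d_average(rows, "height", club_a_defenders))
print(d_count(rows, "height", club_a_defenders))
```

The point of the design is that changing the question means changing only the criteria, not the formula, which is what makes this approach attractive for dashboards.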
By the way, I don’t really draw any firm conclusions about this topic since I think you should draw your own conclusions and let the data speak to you! Similarly, I end this case by encouraging you to create your own dashboard out of my work and, of course, any additional work you do yourself.
Using the link I gave in the introduction, I found the data I was looking for, although I had to scrape every page to get it. Still, I did get what I wanted: the heights and weights of 633 EPL players. The database I used contains 1,107 named players but, for some reason, heights and weights are not recorded for everyone. The database also has a section showing some players who are no longer at the club … I found it odd that they would provide these extra few players, and I ignored them: had I included them I would, without a doubt, have been double counting in some cases.
The analysis therefore concentrates on the 633 players for whom I got full data.
The following screenshot shows the data I used, although you will see in the file that I have also left in the columns containing dates of birth, birthplace and previous club. In this version of the case, I have not done anything with dates of birth/ages: feel free to work on this by yourself!
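The cleaning step, keeping only the players with both a height and a weight, could be sketched in Python along these lines. It is a minimal illustration, assuming the scraped cells arrive as text and that a missing value shows up as a blank or as `None`; the rows, the field names and the helper names `to_number` and `keep_complete` are made up for the example.

```python
# Made-up scraped rows for illustration only; NOT taken from the real database.
scraped = [
    {"name": "Player 1", "height": "1.85", "weight": "80"},
    {"name": "Player 2", "height": "", "weight": "77"},
    {"name": "Player 3", "height": "1.90", "weight": None},
    {"name": "Player 4", "height": " 1.78 ", "weight": "72"},
]


def to_number(cell):
    """Return the cell as a float, or None when it is blank or not numeric."""
    try:
        return float(str(cell).strip())
    except ValueError:
        return None


def keep_complete(rows):
    """Keep only rows with both height and weight, converted to numbers."""
    complete = []
    for row in rows:
        height = to_number(row.get("height"))
        weight = to_number(row.get("weight"))
        if height is not None and weight is not None:
            complete.append({**row, "height": height, "weight": weight})
    return complete


cleaned = keep_complete(scraped)
print(f"{len(cleaned)} of {len(scraped)} rows kept")
```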
Download the full text in this PDF file: premier_league_analysis
Download the Excel file here: premier_league_analysis
29th December 2016