Power Query, Power Pivot, Power BI Case Study

Here is a case study that I will be using in my course in Singapore next week: Financial Modelling with Business Intelligence: 8th and 9th June 2017.

Feel free to work on this case and you will see some tips on how to make things work in PQb but unless you write to me to discuss your answers, there is no spreadsheet from me!

Time Series Analysis: Retail Sales

A Case Study to use Power Query, Power Pivot and Power BI for Desktop to explore a relatively large data set.

The Data Set

I came across this data set having read a blog post on analysing data using R statistics software. There is nothing wrong with using R for this analysis at all but I wanted to use Excel and so I have developed this case study.

That article can be found here: https://shiring.github.io/forecasting/2017/05/28/retail_forcasting_part1

The data referred to in the blog post come from here and will be called Online Retail.xlsx http://archive.ics.uci.edu/ml/datasets/Online+Retail

Introduction

The purpose of this case study is to encourage you to take the following steps on your road to big data analysis using Excel as well as Power BI for Desktop. Please note, I also encourage you to learn and use the R statistics sofwware that I have aleady mentioned because it can do things that Excel and Power BI for Desktop either cannot or that they cannot do without doing a lot of extra work.

The data set: “This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK based and registered non store online retail. The company mainly sells unique all occasion gifts. Many customers of the company are wholesalers.”

In summary you are required to:

Appreciate the columns that exist in the data set

  • Invoice number
  • Stock code
  • Description of transactions
  • Quantity bought or returned
  • Invoice date and time
  • Unit price
  • Customer ID
  • Country

NOTE on CSV files: It is a common idea to use or convert an xlsx file into a CSV file for Power Query work but in this case doing that might cause you to return about 60% error rate in your query because of the way that dates are recorded in the original data. If your location/date formatting is different from that of the data file, this is where any problem will arise. It happened to me in this case!

Use Power Query and then Power BI fo Desktop to import the data

Tidy the data set that comes with the data set and do these things

Give your Query a meaningful name

Create columns for

  • Day
  • Time
  • Month
  • Revenue
  • Deductions from revenue (eg returns, discounts and so on)

Ensure that the imported columns and your new columns are of the most appropriate Data Type: change anything that you think should be changed. Reorganise the columns, too, to put them in the most logical order: your choice

Creation of the Day and Month Columns

Power Query and Power BI for Desktop

The creation of the Day and Month columns are relatively easy now.

To create the Day column,

  • Select the column from which you will create the Day column
  • Add Column
  • From Date
  • Day
  • Name of Day

This will give you a column called Day Name that is filled with Monday, Tuesday and so on.

Creation of the Month Column is very similar to the creation of the day column:

  • Select the column from which you will create the Day column
  • Add Column
  • From Date
  • Name of Month
  • Month Number

This should create a column called Month Name and you will find it filled with January, February and so on

Power Pivot

Because we are dealing with a large file, it is best if we use Power Pivot to analyse our data now: as I did this, I worked through the Blog article mentioned above, were I could and created multiple Pivot Tables and Charts as I did so.

Even though I used Power Pivot, the resulting file is still large but I did not crash my system or Excel!

  • Count of Revenue transactions by country: note that the UK is an outlier
  • Count of Revenues and ReturnsEtc over time
  • Density of Revenues and ReturnsEtc over time
  • Total Revenues and ReturnsEtc per day
  • Total Revenues and ReturnsEtc per time of day

Transactions, quantities and items per customer and day

I am calculating the following metrics per day and customer.

  • n: number of different items purchased/returned per customer per day
  • sum_it: net sum of items (quantity) purchased/returned per customer per day
  • sum_in: mean net income per customer per day

From these, calculate mean numbers per day:

For Power BI for Desktop

As above plus

  • Count of transactions by day and time (treemap)
  • Revenue per month and day (treemap)

 

  • mean_in_cust: mean net income from all customers per day
  • mean_quant_cust: mean net quantities of items from all customers per day
  • mean_items_cust: mean number of items from all customers per day

 

  • ReturnsEtc per day
  • Proportion of Items Bought/Returned by Country
  • Net income & quantities summaries
  • Income from purchases and returns
  • Mean price per units sold per day
  • Purchases of most sold items

 

Duncan Williamson

3rd June 2017

%d bloggers like this: