Introduction

I was asked a question on quora.com about creating a trendline that fits only part of a data set and without creating a new data set. Here is my answer: I talk about a training and testing regression model, which is fine although it takes the question further than the questioner asked. Never mind, if you do everything I say here apart from the full regression modelling and just create one trendline, this is also how to do that@

Training and Testing: splitting the data

You can create a training and testing data set in Excel using random numbers and you can download my working file from the link at the bottom of this page.

For example, assign every data point a random number and then use the top 70% or 80% or whatever you choose as your training data set: sorting the data to help you to separate out the top 70% or so from the rest. Then create your trendline by using your training, top 70 – 80% data and use it to apply to the testing data set.

I used a data set of mine comprising the Total Revenues and Gross Profits of 14 major airlines over a period of 10 years: 140 data points in all. My model uses total revenue (X) to predict gross profit (Y).

You can see from the screenshots below what I finished up with.

Setting up the Model

To start with, I entered the data by copying them from my original spreadsheet. I then assigned a random number to each data pair, using =RAND(). I converted those numbers to values (otherwise, they are volatile and change every time you press enter of F9) and then sorted the data on that column from largest to smallest.

Using an 80% training set, that meant I used 140 * 80% = 122 data points as my training model. Therefore, my testing data set comprised 14 – 112 = 28 data points

I plotted the training data on a graph and added the trendline and R square result. I also used the =INTERCEPT() and =SLOPE() functions together with the =RSQ() function to confirm my results.

Applying my Model

At that stage, I applied my training regression equation of Y = 0.2181X + 840.992 to my testing data and you can see some of the results I obtained in the screenshots below.

All in all, not difficult to do and you can decide whether a training and testing model works well in this case.

Otherwise, even if you are not interested in everything I have said, at least I have answered the trendline question along the way!

Excel File and Regression from the TooolPak

In my Excel file, see the download link below, you will also find the regression output I obtained from the Data Analysis ToolPak: it gives a few more insights into the training model.

Screenshots

 

Download the Excel file train_test_rev_gp_model

 

Duncan Williamson

11th September 2019

%d bloggers like this: