At the end of one of my courses in September I was asked by a delegate if I could take a look at his attempts at modelling yield curve data as he was trying to optimise his cost of debt estimates for his company. This is a complex matter and you can see from the spreadsheet I produced and the results I got that it is quite a technical matter. If this is something that you are not interested in, that’s fine but I felt it useful from a **data transformation** as well as **regression** and **residuals analysis** point of view if nothing else.

Here is my email to my delegate in which you will see my explanations and method. The link to the spreadsheet is at the end of this page.

** As promised, with apologies for my delay, here are the results I have obtained.

Firstly, I should say that I have relied on your data for all of my analyses since I am unable to replicate your study. However, I am sure your data has a lot of integrity so I am happy to work on it.

You will see that I have taken your data and transformed it: I have not really worked on your data in its raw form. I have **transformed** the data by Natural Logarithms as you can see in the sheet *data_LN*. It is easy for you to see what I did. I have created two graphs: x v LN(y) and LN(x) v LN(y). I have then made predictions based on the LN v LN graph and the regression equation I obtained using **TRENDLINE** analysis. Then I created a Prediction column and a **Residuals** column and created further graphs.

Whilst the **R squared** values are quite good here, you can see from the residuals that there are still problems with these transformation. The residuals form a pattern and then degrade after about 3.5% … maybe this is a good thing because they seem to become more **randomised** over the higher x values, which is what you want.

However, I then carried out **age ordered analysis of the transformations and residuals** and the graphs show a disaster: over time there is a distinct pattern that is replicated with the plotting of **Residuals t -1**: that is, age ordered residuals lagged by one period. I also lagged the residuals by t-2 and it doesn’t get any better: I deleted that column from my worksheets.

I then thought that maybe your model is **over designed**, by containing too much duplicate information: that is, there was TTM of 0,4 twice, 05 twice, 1.1 13 times … all I could do was to eliminate all but the first instance of each repetition. You have the detailed information and can delete like for like in case I didn’t. I then carried out the same analysis as I have just described and again you can see that my results may not be better than before.

The residuals do seem to randomise again and this time from around 1.5% which might be an improvement on the raw data analysis. The age ordered residuals are no better, however! Nevertheless, it is worth your while, I think, simplifying and cleaning these data and concentrating on a good albeit smaller sample of rates as I think I have found they might be easier to work with as far as regression analysis is concerned.

I will conclude by saying that the **NSS methodology** you are using is a standard methodology as you are well aware and my regression work is also standard regression work. I am not sure whether I have improved anything but at least you can see that there is some scope for you to refine your database and gain some improvements through simplification.

Thanks for letting me have your data and your analysis and I hope I have helped you at least a bit and do email again if there’s anything you think I can clarify for you or could even help with again.

Best wishes

Duncan**

Get the spreadsheet here … cost_debt_dw

Duncan Williamson