## Benford Gotcha

Benford’s Law … Gotcha Microsoft!

Introduction

If you don’t know Benford’s Law and you like quirky little mathematical things, you need to know what it is. It’s the law of the first digit: in many naturally occurring sets of numbers, the probability of 1 being the first digit of a value is 30.1%, the probability of 2 being the first digit is 17.6% or thereabouts, and so on down to 9. It’s all very predictable and that makes it interesting.
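For anyone who wants to see where 30.1% and 17.6% come from, here is a minimal Python sketch of the standard Benford formula, log10(1 + 1/d) for a first digit d. The function name `benford_probability` and the `benford_pct` dictionary are my own labels for illustration; they are not part of the spreadsheet.

```python
import math


def benford_probability(digit: int) -> float:
    """Expected share of values whose first digit is `digit` (1 to 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Expected percentages for digits 1 to 9, rounded to two decimals.
benford_pct = {d: round(benford_probability(d) * 100, 2) for d in range(1, 10)}
```

The nine probabilities add up to 1, as they must, because every non-zero number starts with one of the digits 1 to 9.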

Auditors and forensic accountants use Benford’s Law these days as do scientists and engineers. I use it from time to time too.

The purpose of this page is not to teach you Benford’s Law but to show you how I used it this week, first to demonstrate the law to my delegates and then to determine whether the dataset I had been using was suspicious! For my demonstration I presented this table (the first three rows are shown here):

| Net Assets | Net Income |
| --- | --- |
| 45,958 | 72,889 |
| 177,539 | 27,355 |
| 167,634 | 5,229 |

There were 49,999 net asset values and 49,999 net income values: perfect for what I wanted!

The first digits from the table above are, reading down the Net Assets column and then down the Net Income column:

4
1
1
7
2
5

In cell C5 in the spreadsheet you can download from here, we extract digit one from the net assets by using this formula:

C5 =LEFT(A5,1)

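And in cell D5 we use this formula to extract the first digit of net income:

D5 =IF(B5<0,MID(B5,2,1),LEFT(B5,1))

Why not just LEFT(…)? Because some of the net income values are negative, so LEFT would return the minus sign; for those values MID(B5,2,1) skips the sign and picks up the character after it. This was the most direct way of overcoming that obstacle!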
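The same first-digit extraction can be sketched outside the spreadsheet. This Python version is only an illustration, assuming the values are whole numbers; the name `first_digit` is hypothetical, and the negative value in the usage note is made up to show the sign handling.

```python
def first_digit(value: int) -> int:
    """Return the leading digit of a non-zero whole number, ignoring its sign."""
    magnitude = abs(value)  # drop the minus sign, as MID(B5,2,1) does in the sheet
    if magnitude == 0:
        raise ValueError("zero has no leading digit between 1 and 9")
    return int(str(magnitude)[0])


# The six sample values from the table, read down each column.
net_assets_digits = [first_digit(v) for v in (45958, 177539, 167634)]
net_income_digits = [first_digit(v) for v in (72889, 27355, 5229)]
```

Taking the absolute value first means one expression covers both columns, so a hypothetical loss of -27,355 gives 2 just as a profit of 27,355 does.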

That’s it, digit one done: now we create our frequency distribution and compare it to the percentages that Benford’s Law predicts, starting with net assets:

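| Net Assets: Digit 1 | Actual | Actual % | Benford |
| --- | --- | --- | --- |
| 1 | 28514 | 57.03% | 30.10% |
| 2 | 1372 | 2.74% | 17.61% |
| 3 | 2862 | 5.72% | 12.49% |
| 4 | 2936 | 5.87% | 9.69% |
| 5 | 2855 | 5.71% | 7.92% |
| 6 | 2841 | 5.68% | 6.69% |
| 7 | 2843 | 5.69% | 5.80% |
| 8 | 2874 | 5.75% | 5.12% |
| 9 | 2902 | 5.80% | 4.58% |
| Total | 49999 | | |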
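The Actual and Actual % columns are just a tally of the first digits. As a rough sketch of that step in Python (the function name `frequency_table` is my own, and the input here is only the six sample digits listed earlier, not the full 49,999 values):

```python
from collections import Counter


def frequency_table(first_digits):
    """Map each digit 1 to 9 to (count, percentage of all digits supplied)."""
    counts = Counter(first_digits)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no digits to tally")
    return {
        d: (counts[d], round(100 * counts[d] / total, 2))
        for d in range(1, 10)
    }


# The six first digits from the sample rows of the table.
sample_digits = [4, 1, 1, 7, 2, 5]
table = frequency_table(sample_digits)
```

Six values are far too few to say anything about Benford, of course; the point is only to show the shape of the calculation behind the table.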

We created a similar table for Net Income.

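If the actual percentage is significantly greater or less than the Benford percentage, as is the case here for digits 1 – 6 (digit 1 turns up in 57.03% of the values against an expected 30.10%, and digit 2 in only 2.74% against 17.61%), we conclude that something funny is going on and that this data set is not behaving normally.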
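I judged "significantly" by eye, but one common way to put a number on it is a chi-square goodness-of-fit test. That test is not part of my spreadsheet; the sketch below is an assumption about how you might do it in Python, using the actual counts from the net assets table and the standard 5% critical value of 15.507 for 8 degrees of freedom. The names `chi_square_vs_benford` and `rejects_benford` are hypothetical.

```python
import math

# Actual first-digit counts for net assets, copied from the table.
actual_counts = {1: 28514, 2: 1372, 3: 2862, 4: 2936, 5: 2855,
                 6: 2841, 7: 2843, 8: 2874, 9: 2902}

# Chi-square critical value at the 5% level with 9 - 1 = 8 degrees of freedom.
CRITICAL_5PCT_8DF = 15.507


def chi_square_vs_benford(counts):
    """Sum of (observed - expected)^2 / expected over the digits 1 to 9."""
    total = sum(counts.values())
    statistic = 0.0
    for digit in range(1, 10):
        expected = total * math.log10(1 + 1 / digit)  # Benford's expected count
        statistic += (counts.get(digit, 0) - expected) ** 2 / expected
    return statistic


statistic = chi_square_vs_benford(actual_counts)
rejects_benford = statistic > CRITICAL_5PCT_8DF
```

With nearly 50,000 values this test flags even small departures from Benford, so treat it as a prompt to go and look at the data rather than as proof of anything. Here the excess of 1s is so large that the eye and the statistic agree.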

Well, in keeping with the purpose of this page, all we say is: someone must now check the source of the data, because either the data are unreliable for our purpose or someone has to explain what they have done to fudge them!

In truth, what had happened is that I had downloaded a dataset from a Microsoft site and I am sure they created that dataset using random numbers or some other creative process and never dreamed that someone like me would come along and test the veracity of their numbers.

No harm done, nothing serious happened: in fact, the lesson worked perfectly and Benford was vindicated.