Rough to Smooth

When I posted this page, it would be a good idea to encourage my visitors to work out for themselves how I did what I did to go from downloading data in a very unfriendly format to data that was exactly what I needed to be able to analyze.

Brian Green wrote to me and asked how I did it! When I worked through and wrote down what I did it took a long time: hence this page! This is what I sent to Brian and PLEASE read it carefully as it is a bit complex in parts:

This problem comes very often when copying and pasting from some web sites: it’s a pain but not deadly! Very commonly, the problem is that all data on one row ends up in one cell and that is easy to manage with the text to columns function in Excel. Of course, that is not what has happened here so I threw that in as a free gift!

The approach I used in the case of the Forbes 2000 is firstly, as I said on the site, to copy and paste special: that gets rid of hyperlinks and graphics that otherwise came with the data. You don’t want to have to try to clean up such nonsense when there are 2000 instances of each problem!

The following is what I did; and I have to say that there are other ways of solving this problem so I am only giving one solution:

each record looks like this

Cell A7 1
Cell A8 JP Morgan Chase
Cells A9:E9 United States $115.5 B $17.4 B $2117.6 B $182.2 B

That is, each company’s record has been spread across three rows;

I recommend the first thing you do is to copy the sheet so that if you really make a mess, you’ve got the original to work from.

Starting in Cell F7 I entered the following =A7
Then in Cell G7 I entered =A8
Finally, in Cell H7 I entered =A9 … which I then copied across to L7

In the range F7:L7 you should now see the entire, contiguous record relating to JP Morgan Chase

The problem now is to copy that down the entire sheet which is, you’ll appreciate more than 6,000 rows deep. Do the following:

Select the range F7:L10 and then drag that all the way down to the end of the entire range … make sure you select F7:L10 … can you see what you are doing and why? You should now have the list of all records in a format that you are almost happy with

Now you have to copy and paste all of your new data series and paste special as values: this converts your formulas to numbers and words rather than leaving them as formulas.

The next step is to custom sort the data: sort on column F and select smallest to largest: what this does is to get rid of all the empty rows between records. This uses the Rank data column and puts the company ranked 1 at the top of the list and the company ranked 2000 at the bottom of the list.

Now you need to convert $115.5 B into 115,500 and $56.7 M into 56.7 … this puts both sets of numbers on the same base: millions. Alternatively, keep $115.5 B as $115.5 and make $56.7 M read 0.0567 … billions

Find and Replace all $ … please note, there might be a SPACE following this $, so find and replace $ SPACE … with nothing so that $115.5 B becomes 115.5 B, $56.7 M becomes 56.7 M and so on

Select all of the values and Find and Replace the letter B and the space that precedes it, in all cells … that is, SPACE B … they should all become values now. All cells with the letter M in them have not been changed yet and the following formula will deal with that now:

In Cell N7 enter =IF(RIGHT(I7,1)=”M”,LEFT(I7,FIND(“.”,I7,1)+1),I7*1000)

What that formula does is to find all cells that have SPACE M in them and then extract the values from them. Alternatively it says, this cell has no SPACE M in it so multiply what is there by 1000. We have now converted B to billions and M to billions too.

Copy your new values range with paste special as values and delete the original numbers …

If by chance the values that had letter M in them are still not showing as values, do the following: enter number 1 in a blank cell and copy it, select all of the values then choose paste special and select multiply, click replace all and everything should now be a number. Delete that number 1 now.

That’s it!

As I said above, there are other ways of doing this without a doubt but that is what I did!

I hope this helps and if you do work through with a different set of techniques then I would be very happy to learn about it.