As I am working away this week, I can be found looking at, opening up, working on a spreadsheet or two. Last night I came across yet another excuse to build an Excel data set. I found the TopTrack 250 and just had to have it: all seven years’ worth of day for the top 250 companies according to the Top Track definition. Look here for the 2013 data set.
So I duly copied and pasted the data for 2013 into a spreadsheet and edited it. Having seen that I liked what I had done I then copied over the other six years’ worth of data from Top Track. Because the data copied over in a very standard and predictable way, I was able to select all six sheets of the additional data and work on them all concurrently: moving data from one row to another, setting up columns in a better way, lining up the data to make it consistent with what I had already done with the 2013 set. Easy and not too time consuming.
But now the wildcards. On the web site that the data came from they had symbols such as asterisks that pointed to a note that said … average, estimate … or something like that. I needed to get rid of them all! As it was late and I wasn’t looking carefully, I carried out an edit and replace of * and having pressed Replace All, found that ALL of my data disappeared … it was doing what a wildcard must do … in this case, any character was being replaced by nothing! A very hasty Ctrl+Z put everything back to the way it was but I still had to get rid of these symbols.
What to do? Delete every symbol manually? No, there were too many of them. Given that the symbol was in the same place in the cell it was in, at the beginning, I ended up with a formula solution that I entered in a new blank column that I would eventually use to replace the current data:
Why D6? That’s where the first of my data points is in each of the tables I set up. I then copied the formula across to the next cell on the right and then down to the bottom of the table … it worked a treat.
I remembered that I should have added that there were other symbols in the data in addition to the * and I amended my formula to cope with the ‡ symbol:
Now I had to copy and paste my edited values as text rather than leaving them in formula format, otherwise I have to leave the original data in the file AND have the edited version. So I just copied and pasted the edited values over the original data and my data were now clean and tidy … but read on!
Numbers as Text
One problem remained: having replaced these symbols, the result was numbers as text rather than number as number. In this case I did this:
- enter 1 in any available cell
- copy that cell
- select the entire range of numbers … all of them so you miss nothing and don’t waste your time selecting just the affected ones if your data set is large like mine
- paste special, multiply
- ta daa! your numbers as text have now become numbers as numbers
- Delete that cell with 1 in it
There you are wildcards and paste special to set up a nice, clean table of data copied from the web.