Introduction

We all know that there are times when we are given an Excel file or we download and Excel file or we copy a table from a web site, a Word or PDF file and when we put it into Excel something like this happens:

Figure 1

There are rows of data but the data for each row is contained in just one column. In this example, in cell A5, we see these bits of information:

ABBEY, Alfred F, Leading Carpenter’s Crew, M 8249, Hazard, 28 January 1918, ship lost … the record of a man who died on 28th January 1918 when the ship he was serving on was lost because it was involved in the first world war.

What we want is a result like we see in Figure 2 so that we can analyse the data in some detail:

Figure 2

In that screenshot we see the column headings

  • Surname
  • Christian Name
  • Rank
  • Date
  • Event

The question is, how did I do that ? Almost 39,000 rows split as you can see here AND with no formulas in sight. No =LEFT(). No = RIGHT(). No =MID() … I did it all by using Get & Transform using Excel 2016 … Power Query by its new name. Here is how I did it!

Split Columns Using Get & Transform

The data came as a list so the first thing I did was to turn in into an Excel Table: make sure every column has a heading and that there are no blank rows in the list … in this case there were some blank rows so I selected the entire list and sorted it from A to Z and then deleted all of the blank rows.

Now, with the cursor in the list somewhere, click Ctrl+T and Excel starts the process of changing the list into an Excel Table, which will look like Figure 1 above.

Now, to Get & Transform: click on your table then go to the Data tab and click From Table/Range:

Figure 3

The Query Editor will open now, with this:

Figure 4

You really need the latest possible version of Excel for this because we are about to use features that are not included in the 2013 and 2010 versions of Excel.

Right click the Details column and click on Duplicate Column: that will create a new column called Details – Copy and select it. On the Home tab,  click the Split Columns Icon and Split by Delimiter: this is the dialogue box you will then see:

Figure 5

Make sure you have selected

  • the Comma delimiter
  • Split at Each occurrence of the delimiter
  • Split into Columns
  • Number of columns to split into … 3

Click OK and what you get is what you see in the first three columns of Figure 2

We will deal with the dates in a second but first, splitting out the final piece of information in each row, what I have called the Event:

Duplicate the Details column again if necessary and then choose to Split Column again but this time:

Figure 6

In this case, we split by the right most delimiter and that gives us the Event information we need! You can delete the Details – Copy column now!

We could simply have split all cells at the comma delimiter but in some rows there is a complication: after the Christian name in some rows you will see something like this: (service number not listed … ) and other messages: see the end of this page for advice on what to do with this additional information!

The Dates

To guarantee that I split out the dates properly I created a Conditional Column in this way:

Duplicate the Details column again

Split Column using the Comma Delimited at Each occurrence … BUT Advanced options … split into ROWS

Figure 7

OK

Figure 8 shows the result:

All elements of the data are not shown row by row … as at August 2017 this is a fairly new feature!

You might say, what? The dates are still buried. True but now we are going to create a Conditional Column to isolate them. See Figure 9 for this step.

Select this column, Details – Copy Add Columns tab … Conditional Column: you will see this dialogue box,

Figure 9

Essentially, we are going to build a sub query to find every date: I have done this in two ways but I am showing the most efficient way here: Call the column Dates and then working across the dialogue box do this

IF column name … Details – Copy Operator … contains Value … 1914 Then Output … Details – Copy

Click Add rule  and repeat the above but for the value 1915 instead of 1914

Add rule again for 1916 and again for 1917 then 1918 … these are the years over which the first word war was fought

Otherwise null

OK

Figure 10

Figure 11 shows the output column which requires one more tweak:

Look across each row now and you will see that we have six or seven duplicate rows for each person in the database, most of them have the word null in the Dates column but all we want is the date in the Dates column. So finally, select the Dates column and click on the drop down arrow at the top of the column and scroll down to find the word null and deselect it. Click OK. All of the duplicate rows just disappeared!

Delete all working columns that you don’t need if you haven’t already deleted them and then that is it: a complete list split out by Surname … Event as you can see in Figure 2 with not one VLOOKUP or INDEX and MATCH combination in sight.

Give your query a good name and then click Close & Load to load your work into Excel.

Done!

Additional Work: probably best not to bother with it

At this stage I have not dealt with the problem of (service number not listed … ) etc as I highlighted above. Since this is a common problem throughout the database, you could Find and Replace the phrases with nothing: delete the whole thing. If you do this and get rid of all such phrases, there will be no need for a conditional column. Moreover, since Get & Transform remembers everything you do, I also recommend that if you do want to get rid of the phrases we are talking about here, do so within Get & Transform so you don’t need to do it all again if this table is ever updated at source.

Having said all of that, unless you are prepared to spend hours on making your output perfect, do what I have done here but limit yourself to these columns:

  • Surname
  • Christian Name
  • Date
  • Event

This takes a long time to read I am sure but with practice, this will take you about 10 minutes to do from start to finish!

Dates: more detail

As I started to use this file I found a few niggling errors but since there are almost 39,000 entries, they are insignificant. Then I realised I should split the dates column into

  • Year
  • Quarter
  • Month
  • Day

To do this I copied the Dates column in the query editor and then used Transorm Date to create the Yar column … copied the Dates column again and created the Quarters column … and a new column for Month and finally for Day. That gave me a lot of flexibility in the analysis I was looking for

Duncan Williamson

28th August 2017

File to download: sorry, this is a large file that does not belong to me but you can download the data from here: good luck with it as it is a very good test of how to split data using Get & Transform.

 

%d bloggers like this: