A friend of mine works for a very large insurance company and she contacted me this morning to tell me that she receives a 600 page pdf file from time to time. The file contains summary data of business generated …

My friend’s job is to convert the pdf into Excel format. One problem is that this report comes from the IT department and is merely a report from a database. This means that there is a file header and page headers: the same column headings on every page. Question: how to copy the pdf file into Excel AND deal with the 600+ repeated page headings … ?

My friend sent me the pdf file and part of her Excel file and it was fine apart from these things:

Excel generated merged cells for everything: every column heading was spread across two merged cells; every value or date was merged across four merged cells; and as well as every other row being blank, many row heights had been changed from their default value.

What to advise?

Firstly, I asked my friend to go to the IT department and ask them to generate an Excel report in addition to or instead of the pdf file … for me that’s the most sensible thing to do!

However, let’s deal with the problems that have arisen, as a lesson to everyone.

Firstly, what about the merged cells? Select the whole table including headings, values … the lot and click on Merge and Centre Icon: all merged cells should be unmerged now.

Because the pdf file comes from a database, the column headings are a bit disjointed and that meant that now there are two columns with the same name: inception date … you don’t want column headings that are repeated.

Now there is a table with no merged cells but every other row is blank: how do we get rid of the 600+ pages of blank rows? Answer: sort the table! Select the whole table including the column headings and sort on the column heading most important to you; decide whether you want to sort from smallest to largest … then click OK. All data are now gathered together and the blank rows are either at the top of the table or at the bottom of the table, depending on your sort choices. Delete blank rows if necessary now and delete any data/headings that you don’t need too.

This report also generated a blank column in addition to every populated column: just delete the blank columns now, ensuring that they really were empty.

What’s left? In this report, the heights of a lot of rows had been set at non default values: select all rows and change the heights to the value you want.

In this case, that’s it. By following those few steps, a potential nightmare and day long job, just shrank to almost nothing.

My key advice, though, is to talk to the IT department.

Duncan

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: