Tuesday, December 23, 2008

Part 20: Tab-delimiting whether you want it or not

Sometimes, I have data I want to put in Excel that's not in terrific shape. For whatever reason (generally: I didn't create it) fields are separate by a variety of methods: commas, spaces and tabs. What I want to do is put it in to excel, search for multiple spaces, replace them with a symbol (perhaps an ampersand) and then use the "text to columns" feature to get the text to line up properly in columns.

Yes, that is what I want to do. Excel, however has other ideas. I paste the data and it says (Is it the paperclip? At least seems to have been mercifully killed off.) "oh, look, I see tabs, they must want it tab-delimited" and it tab-delimits it. Which would work great, if it were tab-delimited. But if that were the case, I could just as well put it in to rows and then to a text to columns and tab delimit it myself. But no, it takes one step forward and at least three steps back.

So my data is all mangled. Some rows had three tabs and were delimited in to three columns, some had five and were in five, some had none, and were in none. Oh lah-dee-dah. That didn't work, so I went back and tried a "paste special." Assuredly, if I chose to paste in "text" it would, you know, let me paste text, without the formats. Oh no, the columns remain.

That's one step back. The next step back is to open Word. Why Word? Well, even though it is part of the same product suite, it seems to have been written by an entirely different team of chimpanzees. I paste the data in to Word and it doesn't break in to columns, probably because Word doesn't have that capability. I then find all tabs and replace them with spaces, so that every column is now delimited by multiple spaces.

Now, I can paste this in to Excel, and it doesn't break in to columns (thankfully it doesn't recognise it as "space delimited" and throw in a bunch of blank columns). I search for two spaces and replace it with a semicolon. Now I have strings of semicolons (e.g. ";;;;;"). That's easy enough, I run a series of "find ';;' replace with ';'" until I have each field separated by a semicolon. Then, I do text-to-columns, semicolon-delimit and Voila! I have my data in the format I want.

That was simple, right?

No comments: