Announcement

Collapse
No announcement yet.

Excel boffins, thinking caps on please

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

    Excel boffins, thinking caps on please

    I have a series of large text files holding details about organisations and businesses (they're basically search dumps from a website) which I want to parse into a spreadsheet ideally (database at a push). When saved as a text file each record on the page appears like this..
    Code:
    *ACNE SUPPORT GROUP* *Address: *P.O.Box 9, NEWQUAY, TR9 6WG. *Tel: *08708 xxxxxx. *Email: *[email protected]. *Website: *www.stopspots.org. *LAST UPDATED:* 16/01/2007 ------------------------------------------------------------------------
    Every record is essentially the same, but some do not have a contact name, and some don't have an email address etc.

    Is there a semi-easy way to parse this information into a spreadsheet, using column headers such as company, contact, address, etc.? I imagine we're talking some form of macro/scripting, but as familiar as I am with Excel I've never got into that side of things.

    Any ideas then to save me from inane googling and headaches? My best thought so far was to use a find/replace to replace the ------------ with a carriage return, and read it in as a text file using * as a delimiter and then do some manual tidying

    #2
    Use text to columns in excel (data > text to columns). Choose Delimited then set the * as the delimiter in the other box. You'll then just need to add column headers and tidy up a bit. I just tried it with the sample you posted and it worked a treat.

    Comment


      #3
      Yeah, that's the same as using the txt/csv file import wizard, and works ok, but the text file comes out as one continuous line, I need to sort out replacing the ------------------- which ends each record with a carriage return.

      Comment


        #4
        Originally posted by babs View Post
        the text file comes out as one continuous line
        Ah. Hadn't realised that. Like you, I've not delved into macro/scripting. I'll keep it the matter in mind in case I have a flash of insight.

        Comment


          #5
          The continuous line isn't a problem, as it's easy enough to open in Word and replace each ---------- with ^p (carriage return) which forces excel to put each new line on a new row, the problem comes where some records have less fiields (columns) than others, so the columns don't all line up accurately. I'm sure I'll get it, I'm just trying to find the most efficient way

          Comment


            #6
            If I understood correctly what you need, your problem is easy to solve by using text editor that supports bot tab and carriage return symbols. If you don't have one, my personal recommendation is Fookes Software's Notetab Light, excellent (and completely free) text editor for Windows.

            1. Paste your results into a text editor.
            2. Replace every fields header (i.e *Address: *) with nothing. After doing it, your sample row should look like this:

            *ACNE SUPPORT GROUP* **P.O.Box 9, NEWQUAY, TR9 6WG. **08708 xxxxxx. **[email protected]. **www.stopspots.org. ** 16/01/2007 ------------------------------------------------------------------------

            3. Replace every ** with tab return (^T)
            4. Replace --------------------- with carriage return.
            5. Paste pure ASCII result into Excel.
            6. Excel will now automatically format text that ^T means start of the new cell, and ^P means row change. End result will look like this:



            It doesn't matter that some fields might be empty in the original source, because even empty fields have field header before them. For example let's suggest that you are missing address from the first row. After replacing the field headers, result would be something like this:

            *ACNE SUPPORT GROUP* ****08708 xxxxxx. **[email protected]. **www.stopspots.org. ** 16/01/2007

            After you replace **'s, empty field will be simply marked with ^T, meaning that it will parse correctly into Excel.

            I hope this helps.
            Last edited by elkatas; 02-10-2007, 21:37.

            Comment


              #7
              Thanks, I actually did something very similar already, just using Word (of all things) in the end, which I used to replace ------- with ^p. I then used Excel's text import wizard and set it to use * as a delimiter which gave the same results.

              The problem I've come up against is that some records have fewer fields. For example, some don't have a website, or a phone number, and so the fields (columns) to the right get shunted left, whereas I need them to line up, so a blank field should result in an empty cell. This is where it gets tricky and some scripting is needed I think to look for field names and do some if..then..elses.

              Comment


                #8
                Originally posted by babs View Post
                The problem I've come up against is that some records have fewer fields. For example, some don't have a website, or a phone number...
                Hmh, it doesn't give header field name always... This starts to get tricky... If you want, I could program rough script for text replacement, but I have no guarantee how fast I could do it, as I'm little busy (new job and all).

                Comment


                  #9
                  No worries, I'll figure something out, I figured it'd probably end up being VBA or something anyway

                  Comment

                  Working...
                  X