Author Topic: Delimited Text  (Read 863 times)


Offline mbeals (Topic starter)

  • Enthusiast
  • Posts: 247
    • View Profile
Delimited Text
« on: November 12, 2007, 12:34:00 PM »
I'm writing a script to strip out data from incoming e-mails that are of a standard format.

If I have a line that reads:

.    Name: Foo Bar<br />

I need to extract just the "Foo Bar" part.

I'm using this regex: [Name:\s.+\W], which keys off the "Name" and "<". It works well, except that it returns "Name: Foo Bar". I don't want the leading "Name: ".

How do you structure it so that it searches for but does not return that opening tag?  I also suspect it is returning the < character, but it's not showing up in any of the output (CLI or web).

Offline effigy

  • Staff Alumni
  • Freak!
  • *
  • Posts: 7,301
  • Gender: Male
  • We must be the change we wish to see in the world.
    • View Profile
Re: Delimited Text
« Reply #1 on: November 12, 2007, 12:39:14 PM »
Are you using [ and ] as your delimiters? Capture the content: /Name:\s+(.+)/
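To make the difference between the whole match and the captured group concrete, here is a minimal sketch with preg_match. The $line value is just the sample line from the first post, and the variable names are only illustrative.

```php
<?php
// Sample line from the first post (assumed to be the raw text being searched).
$line = 'Name: Foo Bar<br />';

// The parentheses create capture group 1; "Name:" and the whitespace stay outside it.
if (preg_match('/Name:\s+(.+)/', $line, $m)) {
    // $m[0] is the whole match, $m[1] is only the captured part.
    echo $m[0], "\n"; // Name: Foo Bar<br />
    echo $m[1], "\n"; // Foo Bar<br />

    // A browser renders the trailing <br /> as a line break, so it seems to vanish.
    // Escaping the output makes it visible.
    echo htmlspecialchars($m[1]), "\n"; // Foo Bar&lt;br /&gt;
}
```

Note that (.+) is greedy and runs to the end of the line, so the tag is still part of the capture here.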
Regexp | Unicode Article | Letter Database
/\A(e)?((1)?ff(?:(?:ig)?y)?|f(?:ig)?)\z/

mbeals (Topic starter)
Re: Delimited Text
« Reply #2 on: November 12, 2007, 01:11:21 PM »
Thanks, that worked perfectly.

mbeals (Topic starter)
Re: Delimited Text
« Reply #3 on: November 12, 2007, 01:56:10 PM »
Okay, new unforeseen issue.

When one of the fields has no text, it looks like:

Alt Phone:<br />

The regex I'm using is dropping down to the next line and grabbing that full string.

So for this text block:

Alt Phone:<br />
Name: Foo Bar<br />

The regex is returning:

Name: Foo Bar

« Last Edit: November 12, 2007, 02:01:38 PM by mbeals »

Offline obsidian

  • Managed Insanity
  • Staff Alumni
  • Freak!
  • *
  • Posts: 6,440
  • Gender: Male
  • Talk to me, I won't bite... hard.
    • View Profile
    • Guahan Web
Re: Delimited Text
« Reply #4 on: November 12, 2007, 01:59:32 PM »
Why not post your full loop code? If you are pulling from an external file, you could easily get around this by simply reading the file line by line. If you are reading this from a variable, be sure that you don't have the multi-line match flag turned on in your regexp match.
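A minimal sketch of the line-by-line idea, assuming the message body is already in a string. The $contents value and the $fields array are hypothetical; the sample holds the two lines from the previous post. Splitting first means no pattern can run past the end of a line.

```php
<?php
// Hypothetical message body; in the real script this would come from the mail file.
$contents = "Alt Phone:<br />\nName: Foo Bar<br />\n";

$fields = array();
// Split on any common line ending and examine one line at a time.
foreach (preg_split('/\r\n|\r|\n/', $contents) as $line) {
    // Label, colon, optional spaces or tabs, then everything up to "<" or end of line.
    if (preg_match('/^\s*([^:<]+):[ \t]*([^<]*)/', $line, $m)) {
        $fields[trim($m[1])] = trim($m[2]);
    }
}

// Expect an empty string for "Alt Phone" and "Foo Bar" for "Name".
print_r($fields);
```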
You can't win, you can't lose, you can't break even... you can't even get out of the game.

Code: [Select]
<?php
while (count($life->getQuestions()) > 0)
{   
$life->study(); } ?>

mbeals (Topic starter)
Re: Delimited Text
« Reply #5 on: November 12, 2007, 02:11:37 PM »
I'm using the mailparse extension, so the script opens the /var/mail/$user mail file and then runs a regex search.

The exact code in question is:




preg_match_all('/Phone:\s*(.*)/', $contents, $altphone);
preg_match_all('/Name:\s+(.+)/', $contents, $names);

I'd prefer not to pull it in line by line and to just let mailparse handle the input side of things.
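One possible explanation for a match dropping to the next line, sketched under the assumption that the text being searched has no <br /> tags (for example a plain-text part of the message): \s matches newlines as well as spaces, so \s* after an empty field keeps going onto the following line. Limiting the whitespace to spaces and tabs keeps the match on one line. $plain and $html are made-up sample strings.

```php
<?php
// Made-up samples: the same two fields without and with <br /> tags.
$plain = "Alt Phone:\nName: Foo Bar\n";
$html  = "Alt Phone:<br />\nName: Foo Bar<br />\n";

// \s* also consumes the newline, so the capture starts on the following line.
preg_match_all('/Phone:\s*(.*)/', $plain, $loose);
var_dump($loose[1]);  // one element: "Name: Foo Bar"

// [ \t]* stays on the current line, so an empty field gives an empty capture.
preg_match_all('/Phone:[ \t]*(.*)/', $plain, $strict);
var_dump($strict[1]); // one element: ""

// With the tags present, (.*) captures the tag itself, which a browser hides.
preg_match_all('/Phone:\s*(.*)/', $html, $tagged);
var_dump($tagged[1]); // one element: "<br />"
```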

mbeals (Topic starter)
Re: Delimited Text
« Reply #6 on: November 12, 2007, 02:29:23 PM »
I'm sorry, I forgot to use the code tags and consequently a big piece of info was left out.

The source file looks like this:

Code: [Select]
Alt Phone:<br />
Name: Foo Bar<br />

Not like:
Code: [Select]
Alt Phone:

Name: Foo Bar

So I'm attempting to pull out everything between the : and the <.

effigy
Re: Delimited Text
« Reply #7 on: November 12, 2007, 02:31:34 PM »
How about /Name:\x20+([^<]*)/?

mbeals (Topic starter)
Re: Delimited Text
« Reply #8 on: November 12, 2007, 02:34:43 PM »
Quote from: effigy
How about /Name:\x20+([^<]*)/?

That ends up pulling in the entire remainder of the email.

I think I resolved it.  I'm just using /Phone:(.*)/

It does capture the leading space when there is data, but that's not a big deal.
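If the leading space ever does matter, it can be cleaned up after the match. This is only a sketch: it assumes the captures may carry a leading space and a trailing <br /> tag, and the $contents sample (including the phone number) is made up.

```php
<?php
// Made-up sample with one filled and one empty field.
$contents = "Alt Phone: 555-0100<br />\nAlt Phone:<br />\n";

preg_match_all('/Phone:(.*)/', $contents, $altphone);

// Drop any tags from each capture, then trim the surrounding whitespace.
$clean = array_map(function ($value) {
    return trim(strip_tags($value));
}, $altphone[1]);

// Expect "555-0100" followed by an empty string.
print_r($clean);
```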

Thanks for the help.
« Last Edit: November 12, 2007, 02:45:16 PM by mbeals »

effigy
Re: Delimited Text
« Reply #9 on: November 12, 2007, 03:09:49 PM »
[^<\r\n]*
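Dropped into full patterns (an assumption, since only the character class is given here), it might look like the sketch below. Excluding \r and \n as well as < stops the capture at the end of the line even when no tag follows. The sample text, including the City and Notes fields, is made up.

```php
<?php
// Made-up sample: an empty field, a tagged field, and two untagged fields.
$contents = "Alt Phone:<br />\nName: Foo Bar<br />\nCity: Springfield\nNotes: none\n";

// The capture runs until "<", a carriage return, or a newline, whichever comes first.
preg_match_all('/Phone:\x20*([^<\r\n]*)/', $contents, $altphone);
preg_match_all('/Name:\x20+([^<\r\n]*)/', $contents, $names);
preg_match_all('/City:\x20+([^<\r\n]*)/', $contents, $cities);

// For comparison: without \r\n in the class, the capture crosses line breaks
// until the next "<" or the end of the text.
preg_match_all('/City:\x20+([^<]*)/', $contents, $runaway);

var_dump($altphone[1]); // one element: ""
var_dump($names[1]);    // one element: "Foo Bar"
var_dump($cities[1]);   // one element: "Springfield"
var_dump($runaway[1]);  // one element: "Springfield\nNotes: none\n"
```

The last call shows how a field with no following tag can swallow the rest of the text when only < is excluded.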