
Struggling with my first regular expression!


daj


Hi

 

I've been trying to figure this out half the afternoon!!  It's my first attempt at using preg_match and regular expressions and I am really struggling.  I could do it using other methods based on a number of ifs and strpos, etc., but I know this is not the best way.

 

I have a short string, for example D34V78.  I would like to split this into four elements, with the result an array of D, 34, V and 78

 

The D (first character) is always D.

It is then always followed by a number of unknown length

Then a V, A or E

then another number of unknown length

 

Further examples.... D1V3  or D98E62

 

Can some kind person help me by suggesting the regular expression -- my little brain is in melt down with it.  :-\

Link to comment
Share on other sites

$string = "D34V78";
preg_match('~^(D)([0-9]+)([VAE])([0-9]+)$~',$string,$parts);
print_r($parts);

 

Array
(
    [0] => D34V78
    [1] => D
    [2] => 34
    [3] => V
    [4] => 78
)

 

edit: I added a capture group around the first "D" because I noticed you said you wanted the array to include it...but if you say it will ALWAYS Start with D, then I'm not sure why you really *need* it to be captured, but there you have it.

Link to comment
Share on other sites

Excellent, thanks for the quick reply.

 

I'm slowly trying to understand it.  I had since come up with this, which seems to work

 

^D([0-9]+)([A|E|V])([0-9]+)^

 

But I will read up on your method to help me understand

 

Link to comment
Share on other sites

Okay so the way your pattern works...

 

1) You are using ^ as the pattern delimiter.  While it is technically okay to use (most) non-alphanumeric characters as the pattern delimiter, you should avoid characters that have special meaning to the regex engine.  ^ marks the start of the string or line (depending on the modifiers used), and it is also used for negated character classes, so you want to avoid it.  I use ~ as my delimiter because it has no special meaning to the regex engine and it rarely comes up in the subjects I need to regex.  But on the note of specifying start (and end) of string...
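
To make the delimiter point concrete, here is a small sketch (the variable names are just for illustration). It runs the same regex with three different delimiters, then shows that when ^ is the delimiter, the carets are consumed as delimiters and no start-of-string anchor is left in the pattern.

```php
<?php
// The same regex written with three different delimiters. PHP strips the
// delimiters before handing the pattern to the regex engine.
$subject  = 'D34V78';
$patterns = [
    '~^D[0-9]+[VAE][0-9]+$~',
    '#^D[0-9]+[VAE][0-9]+$#',
    '/^D[0-9]+[VAE][0-9]+$/',
];

$results = [];
foreach ($patterns as $pattern) {
    $results[] = preg_match($pattern, $subject);
}
print_r($results); // all three patterns match

// With ^ as the delimiter, both carets are used up as delimiters, so the
// pattern that reaches the engine has no anchors and matches a substring.
$caretDelimited = preg_match('^D[0-9]+[VAE][0-9]+^', 'xxD34V78yy');
var_dump($caretDelimited); // int(1)
```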

 

2) Your pattern does not specify start or end of string.  What this means is that your pattern will not just match for example "D34V78", it will also match "anythinghereD34V78orhere".  So in order to tell the engine to evaluate the string as a whole, IOW "the entire value of this string must be this format, not just some substring within it", you have to specify beginning and end of string, with ^ and $, respectively.
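
A quick way to see the difference is to run an unanchored and an anchored version of the pattern against both strings. This is only a sketch; $unanchored, $anchored and $report are names made up for the example.

```php
<?php
// Same pattern, without and with the ^ and $ anchors.
$unanchored = '~D([0-9]+)([VAE])([0-9]+)~';
$anchored   = '~^D([0-9]+)([VAE])([0-9]+)$~';

$subjects = ['D34V78', 'anythinghereD34V78orhere'];

$report = [];
foreach ($subjects as $s) {
    $report[$s] = [
        'unanchored' => preg_match($unanchored, $s), // 1 = match, 0 = no match
        'anchored'   => preg_match($anchored, $s),
    ];
}
print_r($report);
```

The unanchored pattern matches both strings, while the anchored one only matches the string that is exactly in the D34V78 format.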

 

3) Wrapping your (sub)patterns in parens (...) (capture groups) is what makes the captured bits of your pattern show up as individual array elements.  So your pattern does not wrap that first D in parens, so it will not show up as a captured element.  I did mention in my post how I thought this was indeed not necessary, since you said that the string will always start with D, but technically you did specify that it be listed within the matched array, which is why I edited my pattern to capture it. 
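
Here is a short sketch of what the parens change in the result array. $withoutD and $withD are just illustrative names.

```php
<?php
$string = 'D34V78';

// No parens around the leading D: it is matched but not captured.
preg_match('~^D([0-9]+)([VAE])([0-9]+)$~', $string, $withoutD);

// Parens around the D: it becomes element [1] of the result.
preg_match('~^(D)([0-9]+)([VAE])([0-9]+)$~', $string, $withD);

print_r($withoutD); // full match, then 34, V, 78
print_r($withD);    // full match, then D, 34, V, 78
```

Element [0] is always the full match; the capture groups follow in the order their opening parens appear in the pattern.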

 

4) This bit of your pattern: ([A|E|V]) This *works*, but it will also match a pipe; for example, "D123|456" will match.  Why?  Square brackets [..] are a character class.  A character class matches one character listed inside the brackets (or one character NOT listed in it, if ^ is the first character inside the brackets).  So that pattern is saying "match an 'A' or a '|' or an 'E' or a '|' or a 'V'", which means you have just listed a pipe twice.

I can see why you thought to separate the characters with a pipe though, since that is the alternation character.  You would normally use the pipe when you want to match "abc" or "xyz", where each alternative is more than a single character, since a character class will only match ONE character from its list.  IOW, abc|def will match "abc" or "def", but [abcdef] will match "a" or "b" or "c" or "d" or "e" or "f".  And [abc|def] matches the same thing or a pipe.

Sidenote: even though a character class can only match ONE character from its list, and the entries are taken literally (with the exception of escape sequences that stand for other characters), you CAN specify ranges: 0-9 will match a 0 or 1 or 2 or ... 9 (you get the picture).  You will see that demonstrated in the pattern.
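
If you want to see the pipe problem for yourself, something like the following should do it. It is a sketch with made-up variable names, comparing the class with pipes, the plain class, and a real alternation inside a group.

```php
<?php
$patterns = [
    'class with pipes' => '~^D([0-9]+)([A|E|V])([0-9]+)$~',
    'plain class'      => '~^D([0-9]+)([VAE])([0-9]+)$~',
    'alternation'      => '~^D([0-9]+)(V|A|E)([0-9]+)$~',
];

$subjects = ['D123V456', 'D123|456'];

$matches = [];
foreach ($patterns as $label => $pattern) {
    foreach ($subjects as $s) {
        $matches[$label][$s] = preg_match($pattern, $s); // 1 = match, 0 = no match
    }
}
print_r($matches);
```

Only the class with pipes accepts "D123|456". The plain class and the alternation both reject it, and for single characters the plain class is the shorter way to write it.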

 

So, having said all that, here is the breakdown of my pattern:

 

~^(D)([0-9]+)([VAE])([0-9]+)$~

 

~ pattern delimiter

^ start of string assertion (cannot have anything before what follows)

(D) match for and capture a literal "D".  As mentioned, IMO capturing it is not necessary since you say it will always be "D".

([0-9]+) match for and capture 1 or more digits

([VAE]) match for and capture a "V" or "A" or "E"

([0-9]+) match for and capture 1 or more digits

$ end of string assertion (cannot have anything after the pattern)

~ pattern delimiter

 

So it basically reads as "start at the beginning of the string and match and capture a 'D', followed by 1 or more digits, followed by one of these 3 characters, and ending with 1 or more digits"
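
Putting it together, one way to get exactly the four elements asked for in the first post is to wrap the pattern in a small helper. This is only a sketch: splitCode is a hypothetical name, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the four parts of a code such as "D34V78",
// or null if the string is not in the expected format.
function splitCode(string $code): ?array
{
    if (preg_match('~^(D)([0-9]+)([VAE])([0-9]+)$~', $code, $parts) !== 1) {
        return null;
    }

    // Drop element [0] (the full match) and keep only the captured groups.
    return array_slice($parts, 1);
}

print_r(splitCode('D34V78'));  // D, 34, V, 78
var_dump(splitCode('X1V3'));   // NULL, because it does not start with D
```

Note that the numbers come back as strings; cast them with (int) if you need to do arithmetic on them.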

Link to comment
Share on other sites
