
Preg match song information


sphinx


Hello,

 

I'm using:

 

<?php
$page_contents = file_get_contents("http://zixtyycraft.com/radio/song.php");
$matches = array();
preg_match('/Currently Playing:/', $page_contents, $matches);
echo $matches[0];
?>

 

To try and echo the song onto another website, but all that is displaying is: Currently Playing:

 

I'm attempting to get data from:

http://zixtyycraft.com/radio/song.php

 

Many thanks


Hi there,

 

Sorry, I'm unsure how to apply this, because I generally set it up to look for certain attributes, e.g. numbers.

 

<?php
$page_contents = file_get_contents("http://zixtyycraft.com/radio/song.php");
$matches = array();
preg_match('/<b></b>/', $page_contents, $matches);
echo $matches[0];
?>

 

Basically I want the contents between the <b> tags.

 

Many thanks for your time.


A DOM parser is the ideal tool for scraping HTML.
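As a rough sketch of what the DOM route might look like: the snippet below uses PHP's bundled DOMDocument class on a hard-coded sample string. The sample markup is an assumption (the real output of song.php may be laid out differently), so treat this as an illustration rather than a drop-in solution.

```php
<?php
// Hypothetical sample markup; the real song.php output may differ.
$html = '<html><body>Currently Playing: <b>Artist - Title</b></body></html>';

// Collect libxml warnings internally instead of printing them,
// since real-world HTML is often not perfectly valid.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Take the first <b> element, if the page has one.
$bold = $doc->getElementsByTagName('b')->item(0);
$song = $bold !== null ? trim($bold->textContent) : null;

echo $song ?? 'No <b> tag found';
```

With the real page you would pass the result of file_get_contents() to loadHTML() instead of the sample string.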

 

But to address your problem with the regex: the issue is that you haven't told it to match anything except the literal string "Currently Playing:" or "<b></b>".  You need to use things like wildcards and quantifiers to create a pattern.  For example, if you want to grab everything within a <b>..</b> tag:

 

preg_match('~<b>(.*?)</b>~i',$page_contents, $matches);

 

So ~<b>(.*?)</b>~i is the pattern.  Overall the goal here is to use the <b> and </b> as anchors, basically a way to tell the regex engine where in the string to look for something. Then we have (.*?) which will match for the stuff between those tags.

 

~ This is the pattern delimiter.  All patterns must be wrapped in a delimiter, because preg_match has optional modifiers that go inside the same pattern string, after the closing delimiter.  I included a modifier in this pattern so you can see (the "i" at the end).  In your code you used /, which is fine, except that if you need that character as part of your pattern you have to escape it, and closing HTML tags use /.  So if you are making a regex to scrape HTML, picking some other delimiter makes for cleaner patterns.

 

<b> Match the literal string "<b>".  This tells the engine where you want to start matching.

 

( Start of group to capture.  Basically, when you wrap part of your pattern in parentheses, you are telling the engine to put what it matches in an additional, separate element in the returned $matches array.

 

. This is a wildcard.  It matches any single character (except newline characters, unless you tell it to with the "s" modifier).

 

* This is a quantifier.  It says to match 0 or more of the preceding character or group.  So together .* means to match 0 or more of any character.

 

? This makes the .* a lazy match.  By default quantifiers are greedy.  This means that they will match everything they can possibly match in the string and then start giving stuff back in order to satisfy the rest of the pattern.  This often isn't what you want.  Consider the string "<b>foo</b><b>bar</b>".  If you have ~<b>(.*)</b>~i and your intention is to match the stuff between one pair of "b" tags, this will actually match everything up to the last instance of </b>: the full match is "<b>foo</b><b>bar</b>" and the captured group is "foo</b><b>bar".  The ? tells the quantifier not to be greedy, to only match one character at a time until it finds the first instance of the rest of the pattern.  So ~<b>(.*?)</b>~i will match just "<b>foo</b>", and the captured group is "foo".

 

) End of group to capture.

 

</b> Match the literal string "</b>".  This tells the engine where to stop matching.

 

~ Ending pattern delimiter.

 

i A pattern modifier.  This tells the regex engine to do a case-insensitive match, so <B>..</B> is matched as well.
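To make the greedy vs. lazy difference concrete, here is a small sketch you can run on its own. It uses the "<b>foo</b><b>bar</b>" string from the explanation above; the multi-line string and the variable names are made up for illustration. The last two lines also show the "s" modifier, which lets . match newlines.

```php
<?php
$subject = '<b>foo</b><b>bar</b>';

// Greedy: .* grabs as much as it can, then backtracks to the LAST </b>.
preg_match('~<b>(.*)</b>~i', $subject, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST </b>.
preg_match('~<b>(.*?)</b>~i', $subject, $lazy);

echo $greedy[1], "\n"; // foo</b><b>bar
echo $lazy[1], "\n";   // foo

// Illustrative multi-line input: without "s", . will not cross the newline.
$multiline = "<B>foo\nbar</B>";
var_dump(preg_match('~<b>(.*?)</b>~i', $multiline));      // int(0)
var_dump(preg_match('~<b>(.*?)</b>~is', $multiline, $m)); // int(1)
```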

 

 

So an example:

 

 

// Fetch the page, then capture the text inside the first <b>..</b> pair.
$page_contents = file_get_contents("http://zixtyycraft.com/radio/song.php");
preg_match('~<b>(.*?)</b>~i', $page_contents, $matches);
print_r($matches);

 

This will print out something like the following (the song depends on what is playing at the time):

 

Array
(
    [0] => <b>The Chemical Brothers - Life Is Sweet (Daft Punk Remix)</b>
    [1] => The Chemical Brothers - Life Is Sweet (Daft Punk Remix)
)

 

$matches[0] contains the text matched by the full pattern, i.e. by everything between the ~ pattern delimiters.

 

$matches[1] contains the text matched by the first captured group, i.e. by everything between the parentheses (the (.*?)).
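Since the goal was to echo just the song, $matches[1] is the element to print. A minimal sketch of how that might be wrapped up is below. The extract_song() helper and the sample input are hypothetical names made up for this example, and the "s" modifier is added on the assumption that the tag contents could span lines.

```php
<?php
// Hypothetical helper: returns the text of the first <b>..</b>, or null.
function extract_song(string $html): ?string
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match('~<b>(.*?)</b>~is', $html, $matches) === 1) {
        return trim($matches[1]);
    }
    return null;
}

// Sample input standing in for the fetched page (assumed markup).
$page_contents = 'Currently Playing: <b>Artist - Title</b>';

$song = extract_song($page_contents);
echo $song ?? 'Song not found';
```

With the real page, $page_contents would come from file_get_contents(), which returns false on failure, so it is worth checking for that before calling the helper.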

 

 

