
scrape list that has sub headings


a.mlw.walker


Hi guys, I'm quite new to PHP but getting along well. I am a big football fan, and on my localhost I want to replicate the football fixtures for the week. It's really a learning exercise, but I need projects to learn from; I struggle to learn from books.

I want to copy this list into a table (eventually):

http://news.bbc.co.uk/sport1/hi/foot...es/default.stm

 

I have managed to work out how to grab the data, but I think my problem is in the for loops that display it.

It displays the dates and the tournament titles, but then it lists all the games, and doesn't get some of them quite right anyway (look at Stevenage). If you scroll down, it then displays just the tournaments.

I think it's pretty close, but please can someone point out my mistakes?

Thanks guys:

<?php
$file_string = file_get_contents('http://news.bbc.co.uk/sport1/hi/football/fixtures/default.stm');
preg_match_all('/<div class="mvb"><b>(.*)<\/b><\/div>/i', $file_string, $links);
preg_match_all('/<div class="pvtb"><b>(.*)<\/b><\/div>/i', $file_string, $games);
preg_match_all('/class="stats">(.*)<\/a>/i', $file_string, $teams);
echo '<ol>';
$l = 0;

for ($i = 0; $i < count($links[1]); $i++) {
    echo '<div>' . $links[1][$i] . '</div><BR>';

    for ($j = 0; $j < count($games[1]); $j++) {
        echo '<BR><B><U><div>' . $games[1][$j] . '</div></U></B><BR>';

        for ($k = $l; $k < count($teams[1]); $k++) {
            echo strip_tags($teams[1][$k]) . '</a><BR>';
            $l = $k;
        }
    }
}
echo '</ol>';
?>

I think the problem is that it doesn't know what order the items are supposed to be in, but I'm not sure how to write the code to tell it the order. Should each type of preg_match_all be an array or something?


The problem is that you are breaking the data into three separate, unrelated arrays with nothing to tie them together.

 

You would be much better off processing the data sequentially and storing what you find in arrays in the order it is found. You might be able to take advantage of the comments the page provides, which have unique IDs, e.g. <!--Fixture ID: 3470645-->
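
To illustrate the sequential idea, here is a minimal sketch, not a tested scraper for the live page. It assumes the markup looks like the three patterns already used in this thread; the sample HTML, the function name parseFixtures and the nested date/competition layout are all made up for illustration. The three patterns are combined into one alternation so that preg_match_all with PREG_SET_ORDER returns the matches in document order, and each team is filed under the most recent date and competition seen.

```php
<?php
// Hypothetical sample markup, shaped only after the three patterns used in
// this thread; the real page may differ.
$html = <<<HTML
<div class="mvb"><b>Saturday, 1 January</b></div>
<div class="pvtb"><b>Premier League</b></div>
<a href="#" class="stats">Arsenal</a> v <a href="#" class="stats">Chelsea</a>
<div class="pvtb"><b>League Two</b></div>
<a href="#" class="stats">Stevenage</a> v <a href="#" class="stats">Barnet</a>
<div class="mvb"><b>Sunday, 2 January</b></div>
<div class="pvtb"><b>Premier League</b></div>
<a href="#" class="stats">Everton</a> v <a href="#" class="stats">Fulham</a>
HTML;

// parseFixtures() is an illustrative helper, not part of any library.
function parseFixtures(string $html): array
{
    // One alternation, so the matches come back in the order they appear.
    // Lazy (.*?) keeps each capture from running past its own closing tag.
    $pattern = '/<div class="mvb"><b>(?<date>.*?)<\/b><\/div>'
             . '|<div class="pvtb"><b>(?<comp>.*?)<\/b><\/div>'
             . '|class="stats">(?<team>.*?)<\/a>/is';

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $fixtures = [];
    $date = null;
    $comp = null;
    foreach ($matches as $m) {
        if (!empty($m['date'])) {
            // A new date heading starts a new group and resets the competition.
            $date = trim(strip_tags($m['date']));
            $comp = null;
        } elseif (!empty($m['comp'])) {
            $comp = trim(strip_tags($m['comp']));
        } elseif (!empty($m['team']) && $date !== null && $comp !== null) {
            // File the team under the most recent date and competition.
            $fixtures[$date][$comp][] = trim(strip_tags($m['team']));
        }
    }
    return $fixtures;
}

$fixtures = parseFixtures($html);
```

If the teams really do alternate home and away (an assumption about the page), array_chunk($teams, 2) on each competition's list would turn it into pairs, and a nested foreach over $fixtures would print date, then competition, then games.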

 


Actually, this displays it basically correctly:

preg_match_all('/<div class="mvb"><b>(.*)<\/b><\/div>/i', $file_string, $links);
preg_match_all('/<div class="pvtb"><b>(.*)<\/b><\/div>/i', $file_string, $games);
preg_match_all('/class="stats">(.*)<\/a>/is', $file_string, $teams);
echo "<B><div>".$links[1][0]."</div></B>";
echo "<B><div>".$games[1][0]."</div></B>";
echo "<div>".$teams[1][0]."</div>";

However, it also displays the other stuff right at the bottom. I'm not sure I understand how to store it all in an array in the correct order. Can someone please help with that?
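
One likely reason for the extra output, offered as a guess since it depends on the page markup: with the s modifier, the dot also matches newlines, and a greedy (.*) then runs from the first class="stats"> all the way to the last </a> on the page, so $teams[1][0] holds one huge chunk of HTML rather than a single team. Even without s, a greedy match can swallow two links that sit on the same line, which might explain the Stevenage oddity. The sketch below uses a made-up three-line sample to compare greedy (.*) with lazy (.*?).

```php
<?php
// Hypothetical sample; the real page markup may differ.
$html = "<a class=\"stats\">Stevenage</a> v <a class=\"stats\">Barnet</a>\n"
      . "<div class=\"pvtb\"><b>League Two</b></div>\n"
      . "<a class=\"stats\">Torquay</a>";

// Greedy (.*) with the s modifier: a single match that runs to the LAST </a>.
preg_match_all('/class="stats">(.*)<\/a>/is', $html, $greedy);

// Lazy (.*?): each match stops at the nearest </a>.
preg_match_all('/class="stats">(.*?)<\/a>/is', $html, $lazy);

$greedyCount = count($greedy[1]); // how many "teams" the greedy pattern found
$lazyTeams   = $lazy[1];          // one entry per link
```

On this sample the greedy pattern yields one oversized match, while the lazy one yields Stevenage, Barnet and Torquay separately.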


Can no one comment on this? I understand I must sound like an amateur, and that's partly because I am, but I think this is a little different from a normal scrape because I am also trying to log which games are on which days, which I am finding particularly difficult.

Please can someone have a look?
