
Screen Scrape original content from RSS


kailash001


Hello guys, I'm trying to screen scrape the original content for every item in an RSS feed. The RSS feed itself works fine; the problem starts when I try to scrape each article using the Simple HTML DOM library. It works fine at first, but when it tries to extract the second feed item's original content I get this error:

 

Fatal error: Cannot redeclare file_get_html() (previously declared in C:\wamp\www\mashup\protected\views\articles\simple_html_dom.php:37) in C:\wamp\www\mashup\protected\views\articles\simple_html_dom.php on line 41

Part of my code is as follows:

 

foreach($RSS_DOC->channel->item as $RSSitem)
{

	$item_id 	= md5($RSSitem->title);
	$item_title = $RSSitem->title;
	$item_date  = date("Y-m-j G:i:s", strtotime($RSSitem->pubDate));
	$item_url	= $RSSitem->link;

	echo "Processing item '", $item_id, "'<br/>";
	echo $item_title, " - ";
	echo $item_date, "<br/>";
	echo $item_url, "<br/>";

	//screen scrape original article
	include('simple_html_dom.php');
	$html = file_get_dom($item_url);  
	foreach($html->find('td[class=rel_headline_cmt]') as $element)
	{
		echo $element;
	}
}

Any help with this?


Move the line include('simple_html_dom.php'); outside the foreach loop. You don't need to include the file at every iteration.
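As a rough sketch of that fix: the snippet below loads a library file once, before the loop, and then calls its function on every iteration. To keep it runnable anywhere, it writes a tiny stand-in library to a temp file; `demo_helper()` and the temp file are hypothetical placeholders for `file_get_html()` and simple_html_dom.php, not part of the real library.

```php
<?php
// Hypothetical stand-in for simple_html_dom.php: a file that declares a function.
$lib = sys_get_temp_dir() . '/demo_lib_' . uniqid() . '.php';
file_put_contents($lib, "<?php\nfunction demo_helper(\$s) { return strtoupper(\$s); }\n");

// Load the library ONCE, before the loop. require_once also refuses to load
// the same file a second time, so the function can never be redeclared.
require_once $lib;

$results = [];
foreach (['first', 'second', 'third'] as $title) {
    // Only call the function here; do not include the file again.
    $results[] = demo_helper($title);
}

unlink($lib); // clean up the temp file
print_r($results);
```

In the original script this would mean putting `require_once 'simple_html_dom.php';` above the `foreach` and leaving only the scraping calls inside it.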

 

Thanks for the help. But now I'm getting another problem. I'm able to extract the 1st article properly, but the 2nd one gets extracted twice, then the 3rd one once, and then I get this error:

 

Fatal error: Maximum execution time of 60 seconds exceeded in C:\wamp\www\mashup\protected\views\articles\simple_html_dom.php on line 70

Can you tell me how I can make the script run faster? Or is there any other solution?
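One possible direction, sketched under assumptions: fetching many remote pages in one request can easily take longer than 60 seconds, so it may help to raise the script's time limit and put a timeout on each fetch so one slow site gets skipped instead of stalling the run. `fetch_with_timeout()` is a hypothetical helper, the 300 and 10 second values are arbitrary, and local temp files stand in for the article URLs so the sketch runs offline. It does not explain the duplicated 2nd article.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): fetch a page, but give up
// after $seconds rather than letting one slow site eat the whole time limit.
function fetch_with_timeout(string $url, int $seconds = 10): ?string
{
    $context = stream_context_create([
        'http' => ['timeout' => $seconds], // used for http:// and https:// URLs
    ]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Assumption: 300 seconds is an acceptable ceiling for the whole run.
set_time_limit(300);

// Stand-ins for $RSSitem->link: one readable local file and one missing path.
$good = tempnam(sys_get_temp_dir(), 'art');
file_put_contents($good, '<html><body>demo article</body></html>');
$urls = [$good, $good . '.missing'];

$fetched = [];
$skipped = [];
foreach ($urls as $url) {
    $page = fetch_with_timeout($url, 10);
    if ($page === null) {
        $skipped[] = $url; // fetch failed or timed out: move on to the next item
        continue;
    }
    $fetched[$url] = $page; // hand $page to the HTML parser here
}

unlink($good);
echo count($fetched), " fetched, ", count($skipped), " skipped\n";
```

With Simple HTML DOM, the fetched string could then be parsed with `str_get_html($page)` instead of letting the library download the URL itself.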


It's hard to say where your problem is, but it's definitely some loop problem. One thing that caught my eye is this:

foreach($RSS_DOC->channel->item as $RSSitem)
{

Do you really need to loop through one item ($RSS_DOC->channel->item)? Maybe loop through the channel only?
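A quick way to check this is to run the same loop against a small inline feed. The sketch below assumes $RSS_DOC is a SimpleXML object (the thread doesn't show how it was loaded) and uses a made-up three-item feed. With SimpleXML, iterating `channel->item` walks every `<item>` element in the channel, not just one.

```php
<?php
// Made-up feed for illustration only; not the feed from this thread.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Demo feed</title>
    <item><title>First</title><link>http://example.com/1</link></item>
    <item><title>Second</title><link>http://example.com/2</link></item>
    <item><title>Third</title><link>http://example.com/3</link></item>
  </channel>
</rss>
XML;

// Assumption: the feed is parsed with SimpleXML.
$RSS_DOC = simplexml_load_string($xml);

$titles = [];
foreach ($RSS_DOC->channel->item as $RSSitem) {
    // Cast to string; otherwise a SimpleXMLElement object is stored.
    $titles[] = (string) $RSSitem->title;
}

echo implode(', ', $titles), "\n";
```

If all three titles come out, the `foreach` header is fine as written and the duplicate output is more likely coming from the scraping step inside the loop.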
