
Scrape issue! New ones aren't being added




I usually run a scrape on a website to collect information, and new entries are normally added automatically, but now it isn't adding any new information. I am wondering whether I have an issue with my code.

 

I usually run the scrape through the sitemap:

 

$qry = "CREATE TABLE sitemap ( id varchar(30), price decimal(6,2), url varchar( 1024 ) )";

// Create the table
mysql_query ( $qry, $con );

$numSitemapPages = 350;	
$html = new simple_html_dom();

if($_ECHO) echo "START: Fetching site map...<br />";

for( $i = 0; $i < $numSitemapPages; $i++ )
{
	if($_ECHO) echo "Page $i<br />";

	$fileContents = file_get_contents( "http://www.website.co.uk/SiteMap-S" . $i . ".aspx" );
	$html->load( $fileContents );

	$hrefs = $html->find( "a[style=color: Blue; text-decoration: underline;]" );

	if ( isset( $hrefs[ 0 ] ) )
	{
		foreach( $hrefs as $href )
		{						
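			$url = "http://www.website.co.uk/" . $href->href;
			// Escape the URL so a quote in it cannot break the INSERT
			$qry = "INSERT INTO sitemap(url) VALUES( '" . mysql_real_escape_string( $url, $con ) . "' )";

			// Only report success if the query actually ran
			if ( mysql_query( $qry, $con ) )
			{
				if($_ECHO) echo "MYSQL: Added $href->href to DB<br />";
			}
			else if($_ECHO) echo "MYSQL ERROR: " . mysql_error( $con ) . "<br />";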
		}
	}
	else
		if($_ECHO) echo "NO URLS FOUND ON THIS PAGE!<br />";
}

echo "END: Fetching site map...<br />";

exit(0);
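Separately from why nothing new is appearing, building the INSERT by pasting the URL into the SQL string is fragile: a URL containing a single quote produces invalid SQL, and unless the result of mysql_query() is checked, that row is silently lost. The mysql_* functions were also removed in PHP 7.0. Below is a minimal sketch of the same insert using PDO prepared statements. It assumes an in-memory SQLite database purely so it runs standalone; the table layout is copied from the CREATE TABLE above, and openSitemapDb()/saveUrls() are hypothetical helper names.

```php
<?php
// Sketch only: SQLite in memory stands in for the real MySQL database so the
// example runs on its own. With MySQL you would open PDO with a
// "mysql:host=...;dbname=..." DSN and your own credentials instead.

function openSitemapDb(): PDO
{
    $pdo = new PDO('sqlite::memory:');
    // Throw exceptions on SQL errors instead of failing silently
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE IF NOT EXISTS sitemap (id VARCHAR(30), price DECIMAL(6,2), url VARCHAR(1024))');
    return $pdo;
}

// Insert every URL through one prepared statement; the driver handles quoting,
// so a URL containing ' cannot break the query.
function saveUrls(PDO $pdo, array $urls): void
{
    $stmt = $pdo->prepare('INSERT INTO sitemap (url) VALUES (:url)');
    foreach ($urls as $url) {
        $stmt->execute([':url' => $url]);
    }
}

$db = openSitemapDb();
saveUrls($db, [
    "http://www.website.co.uk/page-1.aspx",
    "http://www.website.co.uk/o'brien.aspx", // hypothetical URL containing a quote
]);
echo $db->query('SELECT COUNT(*) FROM sitemap')->fetchColumn(), "\n";
```

With MySQL, only the DSN and credentials passed to new PDO() should need to change; the prepared statement itself stays the same.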


If your code used to work and you made no changes, the most likely cause is that the website is now blocking you. Alternatively, since you find the links by matching their exact inline style, they could easily have just changed that style.
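One way to tell those two causes apart is to look at what a failing sitemap page actually contains. The sketch below uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom (a swap made only so the example is self-contained), so the selector is written as XPath. diagnoseSitemapPage() and its result labels are hypothetical, and it is a rough heuristic: a block page that happens to contain links will still be reported as "style-changed".

```php
<?php
// Sketch: classify one fetched sitemap page as empty, link-less, restyled, or ok.
// Assumes $style contains no double quote, since it is pasted into an XPath literal.

function diagnoseSitemapPage(string $html, string $style): string
{
    if (trim($html) === '') {
        return 'empty';                 // nothing came back: blocked, error or timeout
    }

    libxml_use_internal_errors(true);   // real-world HTML is rarely valid; silence warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    if ($xpath->query('//a')->length === 0) {
        return 'no-links';              // got a page, but not the sitemap you expected
    }

    // Exact, case-sensitive match on the inline style, like the original selector
    $styled = $xpath->query('//a[@style="' . $style . '"]')->length;
    return $styled === 0 ? 'style-changed' : 'ok';
}

$style = 'color: Blue; text-decoration: underline;';
echo diagnoseSitemapPage('<a style="color: Blue; text-decoration: underline;" href="p1.aspx">P1</a>', $style), "\n"; // ok
echo diagnoseSitemapPage('<a style="color: blue; text-decoration: underline" href="p1.aspx">P1</a>', $style), "\n";  // style-changed
echo diagnoseSitemapPage('', $style), "\n";                                                                           // empty
```

The second call shows how fragile an exact style match is: changing one capital letter and dropping a semicolon is enough to make every link "disappear".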

 

Try a simple file_get_contents() call on its own and see whether you can connect to that website or page at all; maybe they are blocking you now.
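A minimal sketch of that check, assuming you want the failure reason printed rather than false or an empty string handed to the parser. fetchPage() is a hypothetical helper, and the user agent string and 10-second timeout are illustrative assumptions, not values the site is known to require. By default PHP's HTTP wrapper sends no User-Agent header unless the user_agent ini setting or context option is set, and some sites reject such requests. When an HTTP request fails, the warning read back through error_get_last() normally includes the status line (for example a 403), which is a strong hint that you are being blocked.

```php
<?php
// Sketch: fetch one page and report *why* it failed instead of silently
// returning false/''. The UA string and timeout are illustrative only.

function fetchPage(string $url): ?string
{
    $context = stream_context_create([
        'http' => [
            'user_agent' => 'Mozilla/5.0 (compatible; SitemapCheck/1.0)', // hypothetical UA
            'timeout'    => 10,                                          // seconds
        ],
    ]);

    // @ hides the warning; error_get_last() still records it
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        $err = error_get_last();
        echo "FETCH FAILED: $url (", $err['message'] ?? 'unknown error', ")\n";
        return null;
    }
    if (trim($body) === '') {
        echo "EMPTY RESPONSE: $url\n";
        return null;
    }
    return $body;
}
```

Inside the sitemap loop you would call `$fileContents = fetchPage( "http://www.website.co.uk/SiteMap-S" . $i . ".aspx" );` and `continue` to the next page whenever it returns null.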

 

You could also add some error checking for empty values in your code, for example when a page comes back empty or a link has no href.
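A sketch of that kind of check, assuming the hrefs have already been collected into a plain array of strings (with simple_html_dom that would mean reading $href->href from each element found). buildUrls() is a hypothetical helper: it skips empty or placeholder links, joins the rest onto the base URL, and drops duplicates, so an empty result tells you the page produced nothing usable.

```php
<?php
// Sketch: turn raw href values into absolute URLs, skipping empty ones.
// Assumes every href is relative to $base, as in the original scraper.

function buildUrls(array $hrefs, string $base): array
{
    $urls = [];
    foreach ($hrefs as $href) {
        $href = trim((string) $href);
        if ($href === '' || $href === '#') {
            continue;                          // skip empty or placeholder links
        }
        $urls[] = rtrim($base, '/') . '/' . ltrim($href, '/');
    }
    return array_values(array_unique($urls)); // one entry per distinct URL
}

$urls = buildUrls(['Product-1.aspx', '', '  ', '/Product-2.aspx', 'Product-1.aspx'], 'http://www.website.co.uk/');
if (count($urls) === 0) {
    echo "NO USABLE URLS ON THIS PAGE\n";
}
print_r($urls);
```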

