
Checking if Data Is Available for a Domain with cURL


natasha_thomas


Friends,

 

I want to check whether a site has any data available for a given domain name.

 

For example

 

 

In this URL the domain is the value of the "q=" query-string (GET) parameter, which here is: abc.co.uk

 

In the output, note the columns Keyword, Position, Volume, etc. For any domain, we want to check whether one or more keywords are returned.

 

Here is an example of a domain for which there is no data:

 

 

As you can see, there is no data in the Keyword, Position or Volume columns for this domain (i.e. notfoundforme.co.uk).

 

All I want is to echo all the keywords, with their positions and volumes, for any domain for which data is available.

 

Code:

 

Here is the code I have. It fetches the whole page with cURL, but I cannot get the getkwspydata function to extract the Keyword, Position and Volume data for the domain. I know it needs some regex and preg_match work, but that is too technical for me. Could someone help me with this?

 

 

<?php


function getkwspydata($host) {
    // Build the KeywordSpy organic-keywords URL; the host is URL-encoded
    // so it is safe to place in the query string.
    $request = "http://www.keywordspy.com/research/search.aspx?tab=domain-organic&market=uk&q=" . urlencode($host);

    $data = getPageData($request);

    // Left over from a Google "site:" lookup; this pattern targets Google's markup, not KeywordSpy's.
//	preg_match('/<div id=resultStats>(About )?([\d,]+) result/si', $data, $p);
//	$value = ($p[2]) ? $p[2] : "n/a";
//	$string = "<a href=\"" . $request . "\">" . $value . "</a>";
//return $string;

    // Dumps the raw page; the keyword/position/volume parsing still has to be written.
    print_r($data);
}



function getDomainName($host) {
    $hostparts = explode('.', $host); // split host name to parts
    $num = count($hostparts); // get parts number
    if ($num < 2) {
        return $host; // single-label hosts such as "localhost" have nothing to strip
    }
    // for ccTLDs like .co.uk etc. keep three labels, but only when a third label exists
    if ($num >= 3 && preg_match('/^(ac|arpa|biz|co|com|edu|gov|info|int|me|mil|mobi|museum|name|net|org|pp|tv)$/i', $hostparts[$num-2])) {
        $domain = $hostparts[$num-3] . '.' . $hostparts[$num-2] . '.' . $hostparts[$num-1];
    }
    else {
        $domain = $hostparts[$num-2] . '.' . $hostparts[$num-1];
    }
    return $domain;
}






function getPageData($url) {
    if (function_exists('curl_init')) {
        $ch = curl_init($url); // initialize curl with given url
        // send the visitor's user agent, or a generic one when none is available (e.g. on the command line)
        $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'Mozilla/5.0';
        curl_setopt($ch, CURLOPT_USERAGENT, $agent);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
        // older PHP versions refuse FOLLOWLOCATION when open_basedir or safe_mode is active
        if (ini_get('open_basedir') == '' && !ini_get('safe_mode')) {
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any
        }
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // max. seconds to wait while connecting
        curl_setopt($ch, CURLOPT_FAILONERROR, 1); // treat HTTP status codes >= 400 as failures
        return curl_exec($ch); // false on failure
    }
    else {
        return @file_get_contents($url);
    }
}


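// use the submitted host if there is one, otherwise fall back to this server's host name
$sitehost = !empty($_POST['sitehost']) ? trim($_POST['sitehost']) : $_SERVER['HTTP_HOST'];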
$sitedomain = getDomainName($sitehost);

?>
<html>
<head>
<title>SEO Report for <?=htmlspecialchars($sitedomain);?></title>
</head>
<body>
<form method="post" action="<?=htmlspecialchars($_SERVER['PHP_SELF']);?>">
<p><b>Domain/Host Name:</b> <input type="text" name="sitehost" size='30' maxlength='50' value="<?=htmlspecialchars($sitehost);?>"> <input type="submit" value="Grab Details"></p>
</form>
<ul>
<li>KeywordSpy organic keywords: <?=getkwspydata($sitehost);?></li>

</ul>
</body>
</html>

 

Cheers

Natasha T.


Hey there,

 

OK, so I wrote a script that shows, conceptually, how to parse what cURL returns. You can experiment with it by swapping the order of the two $url lines (the last one assigned is the one used) to see how each case is handled. This is only the concept; I'm leaving it up to you to work out how to search through the result and pull out the data you want.

 

One pointer I'll give you is to echo the full $page variable, look at the source, and see exactly what tags you need to find to parse your data. With a little bit of studying, you'll be able to pull out whatever data you want.
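
If the raw source is too long to read comfortably, a small helper along these lines might make the inspection easier. This is only a sketch: listElementIds is a hypothetical function name, and the sample markup is made up to stand in for $page. It lists every element that carries an id attribute, which are the kind of markers the start and end positions can be anchored on. (When echoing $page in a browser, wrapping it in htmlspecialchars() shows the source instead of rendering it.)

```php
<?php
// Hypothetical helper (not part of the original script): list the elements
// in a fetched page that have an id attribute, as "tag#id" strings.
function listElementIds(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $ids = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@id]') as $node) {
        $ids[] = $node->nodeName . '#' . $node->getAttribute('id');
    }
    return $ids;
}

// Made-up sample standing in for the $page variable.
$sample = '<html><body><div id="OrgKeywords"><table id="results">'
    . '<tr><td>x</td></tr></table></div></body></html>';

print_r(listElementIds($sample));
```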

 

Here is my code that you can mess around with:

 

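<?php

// Only the last assignment takes effect; swap the two lines to test the other case.
$url = 'http://www.keywordspy.com/research/search.aspx?tab=domain-organic&market=uk&q=abc.co.uk';
$url = 'http://www.keywordspy.com/research/search.aspx?tab=domain-organic&market=uk&q=notfoundforme.co.uk';


$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // only relevant for https URLs
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
//make sure you put a popular web browser here (signature for your web browser can be retrieved with 'echo $_SERVER['HTTP_USER_AGENT'];'
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.12) Gecko/2009070611 Firefox/3.0.12");
curl_setopt($ch, CURLOPT_URL, $url);
$page = curl_exec($ch);
//echo $page;

if ($page === false) {
    // the request itself failed (network error, timeout, ...)
    echo 'Request failed: ' . curl_error($ch);
} elseif (strpos($page, '0 - 0') !== false) {
    // the text '0 - 0' is treated as the sign of an empty result set;
    // compare against false because strpos() can legitimately return 0
    echo 'No Results';
} else {
    // cut out everything from the keyword container up to the guest footer
    $startpos = strpos($page, '<div id="OrgKeywords">');
    $endpos = ($startpos === false)
        ? false
        : strpos($page, '<table id="ctl00_contentHolder_ctl00_GuestFooter_FreeUsersFooter"', $startpos);
    if ($startpos === false || $endpos === false) {
        echo 'Could not find the keyword table; the page markup may have changed.';
    } else {
        $table = substr($page, $startpos, $endpos - $startpos);
        echo $table;
    }
}
?>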
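
Once the table fragment has been cut out, the rows still need to be turned into keyword, position and volume values. The following is a minimal sketch of one way to do that with PHP's DOM extension instead of regex. It rests on assumptions: parseKeywordRows is a hypothetical function name, the sample HTML is invented, and the real page may order its columns differently, so check the actual markup and adjust the cell indexes.

```php
<?php
// Hypothetical helper: convert the rows of an HTML table fragment into arrays.
// ASSUMPTION: the first three <td> cells of each data row hold
// keyword, position and volume, in that order.
function parseKeywordRows(string $tableHtml): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate invalid markup
    $doc->loadHTML($tableHtml);
    libxml_clear_errors();

    $rows = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if (count($cells) < 3) {
            continue; // skip header rows (<th> only) and spacer rows
        }
        $rows[] = [
            'keyword'  => $cells[0],
            'position' => $cells[1],
            'volume'   => $cells[2],
        ];
    }
    return $rows;
}

// Invented sample standing in for the $table fragment.
$sample = '<div id="OrgKeywords"><table>'
    . '<tr><th>Keyword</th><th>Position</th><th>Volume</th></tr>'
    . '<tr><td>abc news</td><td>3</td><td>12,100</td></tr>'
    . '<tr><td>abc tv</td><td>7</td><td>880</td></tr>'
    . '</table></div>';

foreach (parseKeywordRows($sample) as $row) {
    echo $row['keyword'] . ' | ' . $row['position'] . ' | ' . $row['volume'] . "\n";
}
```

An empty array coming back from the function would then mean no keyword rows were found for the domain.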

