natasha_thomas Posted December 27, 2010 Share Posted December 27, 2010 Friends, I want to check a site if there is any data avaliable for a given Domain Name.... For example http://www.keywordspy.com/research/search.aspx?tab=domain-organic&market=uk&q=abc.co.uk In this URl the Domain is the "&q=" POST Parameter which is: abc.co.uk In the Output, Observe the Columns Keyword, Position, Volume etc.... So for any Domain we want to check if there is one or more keyword returned. Here is the Example of a Domain for which there is no Data: http://www.keywordspy.com/research/search.aspx?tab=domain-organic&market=uk&q=notfoundforme.co.uk As you Observe there is no data in Keyword, Position or Volume Columns for this Domain (i.e. notfoundforme.co.uk) All i want is to Echo All the Keywords, their Position and Volumes for any domain for which data is avaliable. Code: Here is the Code i have which scrapes the Whole data with Curl, but i am not able to make getkwspydata function work to get the Keyword, Position or Volume data for the domain, i know some regex and preg_match needs to be done but its too technical for me, may someone help me with this? <?php function getkwspydata($host) { $request = "http://www.google.com/search?q=" . urlencode("site:" . $host) . "&hl=en"; $request = "http://www.keywordspy.com/research/search.aspx?tab=domain-organic&market=uk&q=". urldecode($host); $data = getPageData($request); // preg_match('/<div id=resultStats>(About )?([\d,]+) result/si', $data, $p); // $value = ($p[2]) ? $p[2] : "n/a"; // $string = "<a href=\"" . $request . "\">" . $value . "</a>"; //return $string; print_r($data); } function getDomainName($host) { $hostparts = explode('.', $host); // split host name to parts $num = count($hostparts); // get parts number if(preg_match('/^(ac|arpa|biz|co|com|edu|gov|info|int|me|mil|mobi|museum|name|net|org|pp|tv)$/i', $hostparts[$num-2])) { // for ccTLDs like .co.uk etc. $domain = $hostparts[$num-3] . '.' . $hostparts[$num-2] . '.' . $hostparts[$num-1]; } else { $domain = $hostparts[$num-2] . '.' . $hostparts[$num-1]; } return $domain; } function getPageData($url) { if(function_exists('curl_init')) { $ch = curl_init($url); // initialize curl with given url curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); // add useragent curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // write the response to a variable if((ini_get('open_basedir') == '') && (ini_get('safe_mode') == 'Off')) { curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any } curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // max. seconds to execute curl_setopt($ch, CURLOPT_FAILONERROR, 1); // stop when it encounters an error return @curl_exec($ch); } else { return @file_get_contents($url); } } $sitehost = ($_POST['sitehost']) ? $_POST['sitehost'] : $_SERVER['HTTP_HOST']; $sitedomain = getDomainName($sitehost); ?> <html> <head> <title>SEO Report for <?=$sitedomain;?></title> </head> <body> <form method="post" action="<?$_SERVER['PHP_SELF'];?>"> <p><b>Domain/Host Name:</b> <input type="text" name="sitehost" size='30' maxlength='50' value="<?=$sitehost;?>"> <input type="submit" value="Grab Details"></p> </form> <ul> <li>Google indexed pages: <?=getkwspydata($sitehost);?></li> </ul> </body> </html> Cheers Natasha T. Quote Link to comment Share on other sites More sharing options...
snowman15 Posted December 27, 2010 Share Posted December 27, 2010 Hey there, Ok so I wrote a script to conceptually display how to parse through a curl return. You can mess with it yourself and put the two $url variables above and below each other to see how I handle it. This is the concept but I'm leaving it up to you to figure out how to search through and find what data you want. One pointer I'll give you is to echo the full $page variable, look at the source, and see exactly what tags you need to find to parse your data. With a little bit of studying, you'll be able to pull out whatever data you want. Here is my code that you can mess around with: <?php $url = 'http://www.keywordspy.com/research/search.aspx?tab=domain-organic&market=uk&q=abc.co.uk'; $url = 'http://www.keywordspy.com/research/search.aspx?tab=domain-organic&market=uk&q=notfoundforme.co.uk'; $ch = curl_init(); curl_setopt($ch, CURLOPT_HEADER, 0); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); //make sure you put a popular web browser here (signature for your web browser can be retrieved with 'echo $_SERVER['HTTP_USER_AGENT'];' curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.12) Gecko/2009070611 Firefox/3.0.12"); curl_setopt($ch, CURLOPT_URL, $url ); $page= curl_exec($ch); //echo $page; if (strpos($page, '0 - 0')){ echo 'No Results'; } else { $startpos = strpos($page,'<div id="OrgKeywords">'); $endpos = strpos($page,'<table id="ctl00_contentHolder_ctl00_GuestFooter_FreeUsersFooter"',$startpos); $table = substr($page,$startpos,($endpos-$startpos)); echo $table; } ?> Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.