Losing my Array somewhere..


monkeytooth


Alright, I know scraping is frowned upon, but it's for a client. Anyway, the code below is supposed to build an array $kw for output, but it's breaking on line 27:

  foreach ($kw as $keyword => $pages)  

 

I've concluded that my array is getting broken and turned into an empty string, but I can't figure out where it went wrong.

 

I want to say the trouble comes from lines 20-23:

  foreach($data as $temp) {
    $kx = text_between('"','"',$temp);
    if (is_array($kx)) $kw[key($kx)] = current($kx);
  }

 

 

The full version:

 

<?php
function text_between($start,$end,$string) {
  $keyword = '';
  if ($start != '') {$temp = explode($start,$string,2);} else {$temp = array('',$string);}
  $temp = @explode($end,$temp[1],2);
  $temp2 = @explode($end,$temp[1],3);
  $pages = (int)@str_replace(',','',$temp2[1]); 
  if ($pages) $keyword[$temp[0]] = $pages;
  return $keyword;
}

function gsscrape($keyword) {
  $keyword=str_replace(" ","+",$keyword);
  $keyword=str_replace("%20","+",$keyword);
  global $kw;
  $data=file_get_contents('http://clients1.google.com/complete/search?hl=en&gl=us&q='.$keyword);
  
  $data=explode('[',$data,3);
  $data=explode('],[',$data[2]);
  foreach($data as $temp) {
    $kx = text_between('"','"',$temp);
    if (is_array($kx)) $kw[key($kx)] = current($kx);
  }
}
#simple to use, just use yourscriptname.php?keywords
echo $_SERVER['QUERY_STRING'];
if ($_SERVER['QUERY_STRING']!='') {
  gsscrape($_SERVER['QUERY_STRING']);
  foreach ($kw as $keyword => $pages) {
      gsscrape($keyword);
  }
}
#all results are in array $kw...
echo "<pre>";
print_r($kw);
echo "</pre>";
?>


Instead of trying to fix that broken text_between function, I think you'll be better off with a nice, stable, regular expression.

<?php
function gsscrape($keyword) {
  $keyword=str_replace(" ","+",$keyword);
  $keyword=str_replace("%20","+",$keyword);
  global $kw;
  $data=file_get_contents('http://clients1.google.com/complete/search?hl=en&gl=us&q='.$keyword);
  
  if(preg_match_all('/\["([a-z0-9 "]+)",".*","(\d+)"\]/iU', $data, $matches)) {
    for($i=0; $i<count($matches[1]); $i++) {
      $kw[$matches[1][$i]] = $matches[2][$i];
    }
  }
}
#simple to use, just use yourscriptname.php?keywords
//echo $_SERVER['QUERY_STRING'];
if ($_SERVER['QUERY_STRING']!='') {
  gsscrape($_SERVER['QUERY_STRING']);
  foreach ($kw as $keyword => $pages) {
      gsscrape($keyword);
  }
}
#all results are in array $kw...
echo "<pre>";
print_r($kw);
echo "</pre>";
?>

This is the array I get with the keyword 'keyword':

Array
(
    [keyword tool] => 0
    [keyword bidding] => 0
    [keyword spy] => 0
    [keyword anchor text links] => 0
    [keyword generator] => 0
    [keyword density] => 0
    [keyword discovery] => 0
    [keyword elite] => 0
    [keyword stuffing] => 0
    [keyword density tool] => 0
    [keyword tool dominator] => 1
    [keyword toolbox] => 2
    [keyword tool search volume] => 3
    [keyword tool api] => 4
    [keyword tool ipad] => 5
    [keyword tool by city] => 6
    [keyword tool yahoo] => 7
    [keyword tool seo] => 8
    [keyword toolkit] => 9
    [keyword bidding tool] => 1
    [keyword bidding google adwords] => 2
    [keyword bidding strategy] => 3
    [keywordspy promo code] => 2
    [keyword spy vs] => 3
    [keyword spy review] => 4
    [keyword spy download] => 5
    [keyword spy vs spyfu] => 6
    [keyword spy alternative] => 7
    [keyword spy coupon] => 8
    [keyword spy free trial] => 9
    [keyword generator google] => 1
    [keyword generator tool] => 2
    [keyword generator free] => 3
    [keyword generator online] => 4
    [keyword generator free online] => 5
    [keyword generator from text] => 6
    [keyword generator excel] => 7
    [keyword generator seo] => 8
    [keyword generator for seo] => 9
    [keyword density checker] => 2
    [keyword density analyzer] => 3
    [keyword density seo] => 4
    [keyword density analysis] => 5
    [keyword density google] => 6
    [keyword density calculator] => 7
    [keyword density count] => 9
    [keyword discovery tool] => 1
    [keyword discovery review] => 2
    [keyword discovery api] => 3
    [keyword discovery vs wordtracker] => 4
    [keyword discovery trellian] => 5
    [keyword discovery from trellian] => 6
    [keyword elite review] => 1
    [keyword elite vs market samurai] => 2
    [keyword elite training] => 4
    [keyword elite university] => 5
    [keyword elite trial] => 6
    [keyword elite blackhat] => 7
    [keyword elite login] => 8
    [keyword stuffing seo] => 1
    [keyword stuffing google] => 2
    [keyword stuffing tool] => 3
    [keyword stuffing in url] => 4
    [keyword stuffing penalty] => 5
    [keyword stuffing not allowed] => 6
    [keyword stuffing examples] => 7
    [keyword density tool free] => 1
    [keyword density tool paste] => 2
    [keyword density tool google] => 3
    [keyword density tool text] => 4
    [keyword density tool for documents] => 5
    [keyword density tool for word] => 6
    [keyword density tool download] => 7
    [keyword density tool firefox] => 8
)

Not sure if you want it some other way; it was kind of hard to decipher what you were trying to do.
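
To see what the pattern actually captures, here is a minimal sketch that runs it against a made-up response body. The sample string and the `parse_suggestions` name are assumptions for illustration only; the real response may be shaped differently. The pattern itself is the one from the code above.

```php
<?php
// Hypothetical helper: applies the thread's pattern to a response body that
// has already been fetched and returns the captured pairs.
function parse_suggestions($data) {
    $found = array();
    $pattern = '/\["([a-z0-9 "]+)",".*","(\d+)"\]/iU';
    if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // $m[1] is the suggestion text, $m[2] the trailing numeric field.
            $found[$m[1]] = (int) $m[2];
        }
    }
    return $found;
}

// Assumed sample, shaped to fit the pattern; not a captured response.
$sample = 'window.google.ac.h(["keyword",[["keyword tool","1,200,000 results","0"],'
        . '["keyword spy","900,000 results","1"]]])';

print_r(parse_suggestions($sample));
```

Because the helper returns a fresh array on every call, the caller always has something it can safely loop over, even when nothing matched.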


Yeah, you're right, I should have been more descriptive. However, I think you may have nailed it.

 

The basic idea is to scrape the auto-suggest feature on Google and get its results returned in an array like the one you posted, along with a count for each one in the same array. What you're showing has set me on the right track. I didn't think of preg_match; I suck with regular expressions. I need to find a good resource one of these days to study from, something less technical to read. Now I just need to figure out how to add one more piece of functionality to the puzzle: take the keyword(s) provided, append {space}{A, B, C, ... Z} to the query, and build out the array further with the results that come back.

Example: I type "Taco Burger" as my initial query

 

and it goes down the line, gets all the results for that, then appends even further, like

"Taco Burger A"

"Taco Burger B"

"Taco Burger C"

....

"Taco Burger Z"

and adds the same type of results to the output array. Fun idea, huh? Thanks for your help, by the way. Gonna go give that a test run, then try to figure out the last part I just mentioned.


That sounds pretty easy... just loop through the keywords again

if ($_SERVER['QUERY_STRING']!='') {
  gsscrape($_SERVER['QUERY_STRING']);
  foreach ($kw as $keyword => $pages) {
      gsscrape($keyword);
  }
  $appends = range('a', 'z');
  foreach($kw as $keyword => $pages) {
    foreach($appends as $append) {
      gsscrape($keyword." ".$append);
    }
  }
}

You might want to think about using curl instead of file_get_contents. If you're going to be doing a lot of requests, you'll want something more optimized for http requests so it takes less time.
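
A minimal sketch of what that could look like, assuming the curl extension is installed. The `suggest_url` and `fetch_suggestions` names are hypothetical, the URL is the one already used in this thread, and the timeout values are arbitrary. Reusing one handle across calls lets curl keep the connection open between requests.

```php
<?php
// Hypothetical helper: builds the request URL used earlier in the thread.
// urlencode() turns spaces into "+", so no manual str_replace is needed.
function suggest_url($keyword) {
    return 'http://clients1.google.com/complete/search?hl=en&gl=us&q='
         . urlencode($keyword);
}

// Hypothetical helper: fetches one response body through an existing curl
// handle. Returns the body as a string, or false if the request failed.
function fetch_suggestions($ch, $keyword) {
    curl_setopt_array($ch, array(
        CURLOPT_URL            => suggest_url($keyword),
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => 5,    // arbitrary values, tune as needed
        CURLOPT_TIMEOUT        => 10,
    ));
    return curl_exec($ch);
}
```

Usage would be roughly `$ch = curl_init();` once, then `$data = fetch_suggestions($ch, $keyword);` inside the loops. `curl_exec()` returns false on failure, so check for that before running the regex. Since `suggest_url()` encodes the keyword itself, pass it decoded text, e.g. `urldecode($_SERVER['QUERY_STRING'])`.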


OK, building on what we've done here, I've run into an intermittent problem that I can only reproduce by luck.

 

Let's say I plug in

 

http://mtpdev.com/dev/scrape.php?angelina%20jolie%20movie%202019 and attempt to run it.

I get:

Warning: Invalid argument supplied for foreach() in /home/monkey/public_html/sites/mine/mtpdev.com/dev/scrape.php on line 18

Warning: Invalid argument supplied for foreach() in /home/monkey/public_html/sites/mine/mtpdev.com/dev/scrape.php on line 22

 

Yet if I plug in

http://mtpdev.com/dev/scrape.php?angelina%20jolie%20movie%202011 and attempt to run it.

I get:

Array
(
    [angelina jolie movie 2011] => 0
)

 

I can't seem to figure it out; this was, in part, my last issue too. It should be building an array for each foreach.

 

Lines 18 and 22 are:

foreach ($kw as $keyword => $pages) //18
foreach($kw as $keyword => $pages)//22

 

This is the whole current version:

<?php
function gsscrape($keyword) {
  $keyword=str_replace(" ","+",$keyword);
  $keyword=str_replace("%20","+",$keyword);
  global $kw;
  $data=file_get_contents('http://clients1.google.com/complete/search?hl=en&gl=us&q='.$keyword);
  
  if(preg_match_all('/\["([a-z0-9 "]+)",".*","(\d+)"\]/iU', $data, $matches)) {
    for($i=0; $i<count($matches[1]); $i++) {
      $kw[$matches[1][$i]] = $matches[2][$i];
    }
  }
}
#simple to use, just use yourscriptname.php?keywords
//echo $_SERVER['QUERY_STRING'];
if ($_SERVER['QUERY_STRING']!='') {
  gsscrape($_SERVER['QUERY_STRING']);
  foreach ($kw as $keyword => $pages) {
      gsscrape($keyword);
  }
  $appends = range('a', 'z');
  foreach($kw as $keyword => $pages) {
    foreach($appends as $append) {
      gsscrape($keyword." ".$append);
    }
  }
}
#all results are in array $kw...
echo "<pre>";
print_r($kw);
echo "</pre>";
?>

 

What is the plausible cause of these errors? I think it might be that I'm getting no results with the first link, so foreach has nothing to work with and errors out on an empty array. But is that really the case, or am I missing something here?
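
That guess is close. When nothing matches, gsscrape() never assigns to $kw; `global $kw;` on its own only creates the variable as null, and foreach over null is what raises "Invalid argument supplied for foreach()". An empty array would simply loop zero times without complaint. A minimal sketch of the fix, assuming the same pattern as above: initialise $kw to an empty array before the first call. `add_matches` is a hypothetical stand-in for gsscrape() that takes the response body as an argument so it can be shown without a network call, and both sample strings are made up.

```php
<?php
// Hypothetical stand-in for gsscrape(): same regex, but the response body is
// passed in and the result array is passed by reference instead of global.
function add_matches($data, array &$kw) {
    if ($data === false) {
        return; // file_get_contents() returns false when the request fails
    }
    $pattern = '/\["([a-z0-9 "]+)",".*","(\d+)"\]/iU';
    if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $kw[$m[1]] = $m[2];
        }
    }
}

$kw = array(); // the key change: $kw is an array even if nothing ever matches

// Made-up body with no suggestion entries, standing in for the failing query.
add_matches('window.google.ac.h(["angelina jolie movie 2019",[]])', $kw);

$visited = 0;
foreach ($kw as $keyword => $pages) { // zero iterations, no warning
    $visited++;
}

// Made-up body with one entry, standing in for the query that worked.
add_matches('window.google.ac.h(["q",[["angelina jolie movie 2011","","0"]]])', $kw);
print_r($kw);
```

In the script as posted, the smallest equivalent change is a single `$kw = array();` line before the `if ($_SERVER['QUERY_STRING']!='')` check.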
