dflow, posted December 8, 2010:
How should I approach the following? I have a page with a list of products, each linking to its own product page. I want to build a crawler that loops through all the products in the list, goes to each product page, and parses it. I need help with the loop.
Anti-Moronic, posted December 8, 2010:
What do you mean? You need an array of products and product links, which you collect by crawling with whatever method you like. You then use a foreach() loop to go through each product and use its link to parse the product page.

    $products = array('linkpage1.html', 'linkpage2.html');
    foreach ($products as $product) {
        parse($product);
    }
dflow, posted December 8, 2010:
OK, got you. Now, how can I explode a link structured like "product/product_1.htm" from the array I created? I have all the links on the page and want only the specific ones. For example:

    foreach ($html->find('a') as $e) {
        echo $arraylinks[] = $e->href . '<br>';
    }
    $linkChunks = explode("product/", $apartmentpage_linkr);
dflow, posted December 8, 2010:
OK, I came up with this:

    $arr = $arraylinks;
    foreach ($arr as &$value) {
        $linkChunks = explode("apartments/", $value);
        if ($linkChunks[0]=="apartments/") ;
        echo $linkChunks[1].'<br />';
    }
dflow, posted December 8, 2010:
I get this error:

    Notice: Undefined offset: 1
AbraCadaver, posted December 8, 2010:
As a test:

    foreach ($arraylinks as $link) {
        $category = basename(dirname($link));
        $page = basename($link);
        if ($category == "apartments") {
            echo $page.'<br />';
        }
    }
dflow, posted December 8, 2010:
The test loop works, thanks. What was the problem before? I got the results, but with that error.
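A likely explanation for the notice, given as a minimal sketch rather than a confirmed diagnosis: explode() returns a one-element array when the delimiter is not found in the string, so reading index 1 fails for every link that does not contain "apartments/". Two further points about the earlier loop: the delimiter itself is removed by explode(), so $linkChunks[0] can never equal "apartments/", and the semicolon straight after the if makes the echo run for every link. The sample links below are hypothetical.

```php
<?php
// Hypothetical sample links; only the second contains the delimiter.
$arraylinks = array('about.htm', 'apartments/flat_1.htm');

$pages = array();
foreach ($arraylinks as $link) {
    $linkChunks = explode('apartments/', $link);
    // Without the delimiter, explode() returns a one-element array,
    // so index 1 does not exist and reading it raises the notice.
    if (count($linkChunks) > 1) {
        $pages[] = $linkChunks[1];
    }
}

print_r($pages);
```

Checking count($linkChunks) before reading index 1 is one way to guard the access; the basename(dirname($link)) test in the reply above avoids the problem altogether.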
dflow, posted December 8, 2010:
Actually, I now need the results of that test loop as an array, so that I can loop through each link.
AbraCadaver, posted December 8, 2010:
Something like this will give all the results in an array:

    foreach ($arraylinks as $link) {
        $category = basename(dirname($link));
        $page = basename($link);
        $links[$category][] = $page;
    }

Then you can do something like this:

    foreach ($links['apartments'] as $page) {
        echo $page;
    }

or, since each category holds an array of pages:

    foreach ($links as $category => $pages) {
        echo $category . ': ' . implode(', ', $pages);
    }
dflow, posted December 9, 2010:
OK, I'm getting the links, but I have 3 results for each one. How can I limit it to 1 result per link? Now I'm trying to put things together and making a mess. I want to loop through each link and get the HTML contents parsed.

    <?php
    // example of how to use basic selector to retrieve HTML contents
    include('../simple_html_dom.php');

    // get DOM from URL or file
    $html = file_get_html('http://www.example.com/ViewAllApartments.aspx');

    // find all links
    foreach($html->find('a') as $e) {
        $arraylinks[] = $e->href . '<br>';
    }

    foreach ($arraylinks as $link) {
        $category = basename(dirname($link));
        $page = basename($link);
        if ($category == "apartments") {
            {
                $url="http://www.example.com/apartments/";
                echo $page.'<br />';
                echo $url.$page.'<br />';
            }
        }

    foreach($links['apartments'] as $page) {
        $phtml = file_get_html($url.$page);
        foreach($phtml->find('span[id=apartmentname]') as $apartmentname)
            echo $apartmentname->plaintext.'<br><br>';
    }
    ?>
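One way to get a single result per link, sketched under the assumption that the listing page simply links to each apartment several times: filter the hrefs by category, then pass them through array_unique(). The function name apartmentPages() and the sample hrefs are hypothetical. Note that the hrefs are stored without a trailing '<br>', which would otherwise end up inside the URLs requested later.

```php
<?php
// Hypothetical hrefs as they might come back from the listing page,
// with the same apartment linked more than once.
$hrefs = array(
    'apartments/flat_1.htm',
    'apartments/flat_1.htm',
    'contact.htm',
    'apartments/flat_2.htm',
    'apartments/flat_1.htm',
);

// Keep only pages whose parent directory matches $category, once each.
function apartmentPages(array $hrefs, $category = 'apartments')
{
    $pages = array();
    foreach ($hrefs as $href) {
        if (basename(dirname($href)) === $category) {
            $pages[] = basename($href);
        }
    }
    // array_unique() keeps the first occurrence of each value;
    // array_values() reindexes the result from 0.
    return array_values(array_unique($pages));
}

$pages = apartmentPages($hrefs);
print_r($pages);
```

The resulting $pages array can then be used in place of $links['apartments'], which the script above never fills.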
dflow, posted December 9, 2010:
Can anyone direct me?
dflow, posted December 9, 2010:
Bump.
AbraCadaver, posted December 9, 2010:
Quoting dflow: "can anyone direct me?"
No, because I have no idea what you're doing now.
dflow, posted December 10, 2010:
What I'm trying to do is:
1. Parse all the links to the product pages.
2. Use the parsed links to loop through and parse each product page.
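The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the thread's own solution: it swaps simple_html_dom for PHP's built-in DOMDocument and DOMXPath so that it runs without extra files, the function names extractProductLinks() and crawl() are hypothetical, and the pages are served from a canned array instead of the network.

```php
<?php
// Step 1: collect unique product-page links from the listing HTML.
function extractProductLinks($html, $category)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from imperfect markup
    $pages = array();
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (basename(dirname($href)) === $category) {
            $pages[basename($href)] = true; // array keys deduplicate
        }
    }
    return array_keys($pages);
}

// Step 2: fetch each product page and read the span with id "apartmentname".
function crawl($listingUrl, $baseUrl, $category, $fetch)
{
    $names = array();
    foreach (extractProductLinks($fetch($listingUrl), $category) as $page) {
        $doc = new DOMDocument();
        @$doc->loadHTML($fetch($baseUrl . $page));
        $xpath = new DOMXPath($doc);
        $node = $xpath->query('//span[@id="apartmentname"]')->item(0);
        $names[$page] = $node ? trim($node->textContent) : null;
    }
    return $names;
}

// Stand-in fetcher with canned HTML so the sketch runs offline.
$fakeSite = array(
    'http://www.example.com/ViewAllApartments.aspx' =>
        '<html><body><a href="apartments/flat_1.htm">Flat 1</a>'
        . '<a href="apartments/flat_1.htm">photo</a>'
        . '<a href="contact.htm">Contact</a></body></html>',
    'http://www.example.com/apartments/flat_1.htm' =>
        '<html><body><span id="apartmentname">Sunny Flat</span></body></html>',
);
$fetch = function ($url) use ($fakeSite) {
    return $fakeSite[$url];
};

$names = crawl(
    'http://www.example.com/ViewAllApartments.aspx',
    'http://www.example.com/apartments/',
    'apartments',
    $fetch
);
print_r($names);
```

Passing the fetcher in as a parameter keeps the crawl logic testable; against a live site, 'file_get_contents' could be passed as $fetch instead, assuming URL access is enabled in the PHP configuration.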