Setup of curl-multi: looping over a bunch of sites [how to address the array]


dilbertone

Hello dear PHP friends,

I am currently working on a little parser project. I have to find solutions for two parts:

a. the fetching part

b. the parser part

Here we go, the target URLs. See the overview:

http://dms-schule.bildung.hessen.de/index.html

http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html

Search by pressing the "type" button and then select all schools with the mouse. The result is 2400 schools.

Here is some more help for getting at the target. These are detail pages on the target server:

http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=9009

http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=9742

http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=9871

Well, you see I have to iterate over the pages with a function (a loop):

http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=N, with N running from 1000 to 10000

By the way: after fetching the pages I have to check which ones are empty. Those do not need to be parsed.
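
The empty-page check could be sketched roughly as follows. This is only a sketch under assumptions: the helper names `is_empty_page` and `filter_nonempty_pages` are made up for illustration, and `$marker` stands for some text that appears only on pages with real school data. The post does not say what an empty page looks like on this server, so the marker has to be found by looking at a real page first.

```php
<?php
// Hypothetical helper: decides whether a fetched page can be skipped.
// $marker is an ASSUMED piece of text that only shows up on pages with data.
function is_empty_page(?string $html, string $marker): bool
{
    if ($html === null || trim($html) === '') {
        return true; // nothing came back at all
    }
    // The page came back, but it does not contain the expected marker.
    return strpos($html, $marker) === false;
}

// Keeps only the pages worth parsing; array keys (e.g. school IDs) are preserved.
function filter_nonempty_pages(array $pages, string $marker): array
{
    return array_filter($pages, function ($html) use ($marker) {
        return !is_empty_page($html, $marker);
    });
}
```

Because `array_filter` keeps the original keys, the surviving pages can still be matched to their `show_school` number if the array is keyed by that number.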

 

 

I want to do this with curl-multi, since it lets me fetch several pages in parallel.

I see I have an array of URLs that can be filled, but I have to think about the string concatenation that builds each URL.

This one does not fit:

for ($i = 1; $i <= $match[1]; $i++)
{
    $url = "http://www.example.com/page?page={$i}";
}

Besides this I have an array that I can fill.
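
Filling the array by string concatenation might look like the sketch below. The base URL and the range 1000 to 10000 come from the post above; the constant `BASE_URL` and the helper `build_school_urls` are hypothetical names chosen for this example.

```php
<?php
// Base URL as given in the post.
const BASE_URL = 'http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html';

// Hypothetical helper: returns an array of URLs keyed by the show_school number.
function build_school_urls(int $from, int $to): array
{
    $urls = [];
    for ($id = $from; $id <= $to; $id++) {
        // Plain concatenation: base URL + query string + running number.
        $urls[$id] = BASE_URL . '?show_school=' . $id;
    }
    return $urls;
}

$urls = build_school_urls(1000, 10000);
echo count($urls), "\n"; // 9001 entries (both ends included)
echo $urls[9009], "\n";
```

Keying the array by the school number instead of 0, 1, 2, ... makes it easy to tell later which response belongs to which page.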

 

Can you help me run this in a loop with the following curl-multi script?


<?php
/************************************\
* Multi interface in PHP with curl  *
* Requires PHP 5.0, Apache 2.0 and  *
* Curl                              *
*************************************
* Written by Cyborg 19671897        *
* Bugfixed by Jeremy Ellman         *
\***********************************/

$urls = array(
   "http://www.google.com/",
   "http://www.altavista.com/",
   "http://www.yahoo.com/"
   );

$mh = curl_multi_init();

foreach ($urls as $i => $url) {
       $conn[$i]=curl_init($url);
       curl_setopt($conn[$i],CURLOPT_RETURNTRANSFER,1);//return data as string 
       curl_setopt($conn[$i],CURLOPT_FOLLOWLOCATION,1);//follow redirects
       curl_setopt($conn[$i],CURLOPT_MAXREDIRS,2);//maximum redirects
       curl_setopt($conn[$i],CURLOPT_CONNECTTIMEOUT,10);//timeout
       curl_multi_add_handle ($mh,$conn[$i]);
}

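do {
       $status=curl_multi_exec($mh,$active);
       // wait for activity instead of spinning in a busy loop
       if ($active) {
              curl_multi_select($mh,1.0);
       }
} while ($active && $status==CURLM_OK);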

foreach ($urls as $i => $url) {
       $res[$i]=curl_multi_getcontent($conn[$i]);
       curl_multi_remove_handle($mh,$conn[$i]);
       curl_close($conn[$i]);
}
curl_multi_close($mh);


print_r($res);

?>
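
With roughly 9000 URLs it is probably unwise to add every handle to one multi handle at once. One possible approach, sketched here under assumptions, is to split the URL array into batches and run the curl-multi routine once per batch. The function names `fetch_batch` and `fetch_all` and the batch size of 20 are made up for this example; only standard curl and array functions are used.

```php
<?php
// Hypothetical helper: fetches one batch of URLs in parallel.
// Returns the response bodies keyed like the input array.
function fetch_batch(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return data as string
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // connect timeout in seconds
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh, 1.0); // block until a transfer has activity
        }
    } while ($active && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}

// Splits the full list into batches so that at most $batchSize transfers run at once.
function fetch_all(array $urls, int $batchSize = 20): array
{
    $results = [];
    // The third argument "true" makes array_chunk keep the original keys.
    foreach (array_chunk($urls, $batchSize, true) as $batch) {
        $results += fetch_batch($batch);
    }
    return $results;
}
```

Called as `$res = fetch_all($urls);`, this returns one array with a response body per input key. A small batch size also keeps the load on the target server modest.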


Question: Do the people running that hessen.de site know you're going to take information from it? Have they specifically told you it's okay?

 

 

well - you see I have to iterate over the sites - with a function (a loop)

http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=1000 to 10000

BTW - after fetching the page I have to see which ones are empty - those do not need to be parsed!

That is a horrible idea. Get a list of schools from the site - one way or another.
