dilbertone Posted November 26, 2010 Share Posted November 26, 2010 hello dear php-friends i currently work on a little parser project i have to find solutions for the a. fetching part b. parser part here we go - the target urls: see the overview: http://dms-schule.bildung.hessen.de/index.html http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html Search by pressing the button "type" and then choose all schools with the mouse! Results 2400 schools Here i can provide some "more help for getting the target!" - btw: see some details for this target-server: http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=9009 http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=9742 http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=9871 well - you see i have to itterate over the sites - with a function /(a loop) http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=1000 to 10000 BTW - after fetching the page i have to see which one are empty - those ones do not need to be parsed! Well - i want to do this with curl-multi since this is the most advanced way to do this: I see i have an array that can be filled -... but i have to think about the string-concatenation - i guess that i have make some sophisticated string concatenation. this one does not fit - for($i=1;$i<=$match[1];$i++) { $url = "http://www.example.com/page?page={$i}"; and besides this i have an array - i c an fill the array. can you help me how to run in a loop with <?php /************************************\ * Multi interface in PHP with curl * * Requires PHP 5.0, Apache 2.0 and * * Curl * ************************************* * Writen By Cyborg 19671897 * * Bugfixed by Jeremy Ellman * \***********************************/ $urls = array( "http://www.google.com/", "http://www.altavista.com/", "http://www.yahoo.com/" ); $mh = curl_multi_init(); foreach ($urls as $i => $url) { $conn[$i]=curl_init($url); curl_setopt($conn[$i],CURLOPT_RETURNTRANSFER,1);//return data as string curl_setopt($conn[$i],CURLOPT_FOLLOWLOCATION,1);//follow redirects curl_setopt($conn[$i],CURLOPT_MAXREDIRS,2);//maximum redirects curl_setopt($conn[$i],CURLOPT_CONNECTTIMEOUT,10);//timeout curl_multi_add_handle ($mh,$conn[$i]); } do { $n=curl_multi_exec($mh,$active); } while ($active); foreach ($urls as $i => $url) { $res[$i]=curl_multi_getcontent($conn[$i]); curl_multi_remove_handle($mh,$conn[$i]); curl_close($conn[$i]); } curl_multi_close($mh); print_r($res); ?> Quote Link to comment Share on other sites More sharing options...
requinix Posted November 26, 2010 Share Posted November 26, 2010 Question: Do the people running that hessen.de site know you're going to take information from it? Have they specifically told you it's okay? well - you see i have to itterate over the sites - with a function /(a loop) http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=1000 to 10000 BTW - after fetching the page i have to see which one are empty - those ones do not need to be parsed! That is a horrible idea. Get a list of schools from the site - one way or another. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.