dilbertone Posted December 18, 2010 Share Posted December 18, 2010 Good day dear community. I need to build a function which parses the domain from a url. I have used various ways to parse html sources. But this one is is a bit tricky! See the target i want to parse - it has some invaild Markup: http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=644.0013008534253&SchulAdresseMapDO=194190 well what do you think - can i apply this code here <?php require_once('config.php'); // call config.php for db connection $filename = "url.txt"; // Include the txt file which have urls $each_line = file($filename); foreach($each_line as $line_num => $line) { $line = trim($line); $content = file_get_contents($line); //echo ($content)."<br>"; $pattern = '/<td>(.*?)<\/td>/si'; preg_match_all($pattern,$content,$matches); foreach ($matches[1] as $match) { $match = strip_tags($match); $match = trim($match); //var_dump($match); $sql = mysqli_query("insert into tablename(contents) values ('$match')"); //echo $match; } } ?> well i have to rework the parser-part of this script. I need to parse somway different - since i have other site here. Can anybody help me here to get a better regex - or a better way to parse this site ... Any and all help will be greatly apprecaited. regards db1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.