lifeisrisky Posted January 7, 2011 Share Posted January 7, 2011 I have 4 websites http://www.mercurymagazines.com/pr1/101/101395 http://www.mercurymagazines.com/pr1/169/16000 http://www.mercurymagazines.com/pr1/101/101396 http://www.mercurymagazines.com/pr1/169/16001 I am trying to write a program that will open each webpage one at a time, look for a unique word such as 'FREE' and if it is found the write the URL to a file, if not then no output and the next webpage is opened. I know this word exists only on one of the pages above (http://www.mercurymagazines.com/pr1/101/101396). When http://www.mercurymagazines.com/pr1/101/101396 appears last in the list located in source.txt the programs WORKS. When http://www.mercurymagazines.com/pr1/101/101396 appears in any other position such as first, or second, the program does NOT work. My head is sore from banging against my desk. Any assistance would be appreciated. <?php $file_source = fopen("source.txt", "r"); $file_target = fopen("targets.txt", "w"); while (!feof($file_source) ) { $url = fgets($file_source); echo '<br>'.$url; $raw_text = file_get_contents($url); mb_regex_encoding( "utf-8" ); $words = mb_split( ' +', $raw_text ); foreach ($words as $uniques) { echo $uniques; if ($uniques == 'FREE') { fwrite($file_target, $url); } } } ?> Quote Link to comment Share on other sites More sharing options...
darth_tater Posted January 7, 2011 Share Posted January 7, 2011 Can you post working and non working copies of your text files? also, USE THE tags!!!! it is very difficult to read your code w/o them! I don't have a whole lot of experience when it comes to dealing with files, but $raw_text = file_get_contents($url) should work.... add in some debugging statements as each variable is allocated and filled so you know if you're allocating them and filling them correctly. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.