open html http:// file and retrieve <title>

TravisT · May 14, 2011

Hello, I am new to the forum and somewhat new to php, nice to meet you all. This is the first time I have ever really scripted with PHP so I'm still learning about all the tools I have and what I have to call things in PHP.

I have a list of urls and as I loop through each one, I'd like to be able to get information from the webpage. The <title> would be a good start. I also want to know the best way for me to compare data I have.

I'll show the basic code below, but I successfully loop through each url in this text file and put them in a <ul><li> list just fine. So if $url == "http://www.youtube.com/file", what is the normal way to check whether the word "youtube" is in $url?

I found preg_match() but I think I'm approaching the whole thing wrong because I get no output. I am an intermediate to somewhat advanced scripter in other languages similar to php, I just need to learn how you do the normal things in PHP.

So I'd like to compare a string "youtube" to a variable '$url'. And I would like to be able to grab the title or other info from the file $url. Here is what I have so far. (Recent research showed me how I should do this with an XML file so I will probably change the .txt to .xml) Can you please tell me what to look for as I have been searching and can't really find a comprehensive answer.

I changed the whole page to an echo trying to fix something last night. Before it was written like..

<?php
if ($true) {
$var = 'value';
?>
<html code>The value is <?php echo $var; ?>.</html code>
<?php
}
?>

/index.php

<?php
include 'include/header.html';


echo "<div id='wrapper'>
	<div id='left'>
		<div class='article'>
<br />
<p>";
echo "Today is " . date("l") . ", the " . date("jS") . " of " . date("F") . ".";
$lines = file('data/news.txt');
if ($lines){
foreach ($lines as $line_num => $line) {

$url = htmlspecialchars($line);

//Now I have url. I want to check the url and get the <title> & misc. data.
//if youtube is in $url {html code to embed youtube};
//my attempt was $x = file($url); but I got a lot of 404 and 403 errors.

//now I fill html.
echo "<ul id='menu1' class='auroramenu'>
<li><a href='#'>Story ".$line_num."</a> <a style='display: none;' class='aurorashow' href='#'></a> <a style='display: inline;' class='aurorahide' href='#'></a>
		<ul> <br />
			<p>".$url."</p><br />
  				<li style='text-align:right;'><a href='".$url."' target='_blank'>Read the story.</a>  </li> 
		</ul> 
	</li>
</ul>";
}
		}
		echo "</p><br />
		</div>
	</div>
<div id='right'>";
include 'include/sidebar.html';
echo "</div><br class='clr' /></div><br />";
include 'include/footer.html';
?>

Thank you for your help.

TravisT · May 14, 2011

I am reading through the forum right now. :rtfm:

anupamsaha · May 14, 2011

Go for strstr() or stristr() to get your job done. Read more from here:

http://php.net/manual/en/function.strstr.php
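
A minimal sketch of that check, assuming the url is already in $url (the sample value here is made up for illustration). stristr() returns the rest of the string from the match onward, or false when nothing matches, so compare against false explicitly. stripos() is a common alternative when you only need a yes/no answer.

```php
<?php
// Hypothetical sample value; in the original script this comes from the text file.
$url = "http://www.YouTube.com/file";

// stristr() is the case-insensitive variant of strstr().
// It returns false when the needle is not found.
$hasYoutube = (stristr($url, "youtube") !== false);

// stripos() returns the match position (possibly 0) or false, hence the strict !==.
$alsoHasYoutube = (stripos($url, "youtube") !== false);

echo $hasYoutube ? "youtube link\n" : "some other link\n";
```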

Hope it helps.

Thanks!

TravisT · May 14, 2011

Thanks! I'm trying that out. I found a post showing how to use preg_match() better and that helped. strstr() seems easier.

QuickOldCar · May 14, 2011

Here's a way I came up with. If anyone has better or faster methods than this I'd love to hear it.

I parse the url to find the host, then match against that. Otherwise you could easily end up finding the word youtube or youtube.com in some other part of a url.

Example would be:

http://mysite.com/out.php?url=http://www.youtube.com/movies

Stripping the protocol, exploding on / , taking $variable[0], and then running preg_match on that also works.
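
Roughly, that alternative could look like the sketch below. hostFromUrl() is a made-up helper name, and the pattern is one possible choice: it only accepts youtube.com at the end of the host, so the redirect example above is not counted as a YouTube link. A host that carries a port (www.youtube.com:80) would not match this pattern.

```php
<?php
// Hypothetical helper: returns the host part of a url without using parse_url().
function hostFromUrl($url) {
    $url = strtolower(trim($url));
    // strip the protocol if there is one
    $url = preg_replace('#^[a-z][a-z0-9+.-]*://#', '', $url);
    // everything before the first / is the host (plus a port, if one was given)
    $parts = explode('/', $url, 2);
    return $parts[0];
}

$redirect = "http://mysite.com/out.php?url=http://www.youtube.com/movies";
$direct   = "http://www.youtube.com/movies";

// match youtube.com only at the end of the host, with the dots escaped
$pattern = '/(^|\.)youtube\.com$/';

var_dump((bool) preg_match($pattern, hostFromUrl($redirect))); // bool(false)
var_dump((bool) preg_match($pattern, hostFromUrl($direct)));   // bool(true)
```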

If you want results to display quickly on a page, in whatever order they finish, look into multi-curl.
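
A rough multi-curl sketch, assuming the curl extension is installed. fetchAll() is a hypothetical helper name, the user agent string is made up (some sites answer 403 to requests that send none), and the 8 second timeout simply mirrors the value used in the script below. Treat it as a starting point rather than a finished fetcher.

```php
<?php
// Hypothetical helper: fetch several urls in parallel.
// Returns array(url => body), with false for urls that gave no content.
function fetchAll(array $urls, $timeout = 8) {
    $multi = curl_multi_init();
    $handles = array();
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        // assumption: made-up user agent, some servers reject requests without one
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; title-checker)');
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // drive all transfers until none are still active
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running > 0 && ($status === CURLM_OK || $status === CURLM_CALL_MULTI_PERFORM));

    $results = array();
    foreach ($handles as $url => $ch) {
        $body = curl_multi_getcontent($ch);
        // same "empty means dead" rule as the simple script
        $results[$url] = ($body === null || $body === '') ? false : $body;
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $results;
}
```

Each body in the returned array can then go through the same title extraction as in the simple method.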

This is the simple method and should find most titles but not all.

<?php

//check if youtube function
function checkYoutube($inserturl) {
$inserturl = strtolower(trim($inserturl));
if(substr($inserturl,0,5) != "http:"){
$inserturl = "http://$inserturl";
}
$parsedUrl = parse_url($inserturl);
$host = trim($parsedUrl['host'] ? $parsedUrl['host'] : array_shift(explode('/', $parsedUrl['path'], 2)));
                
$checkhost = "youtube.com";
    // match
    if(preg_match("/$checkhost/i", $inserturl)){
     return TRUE; 
     } else {
     return FALSE;
     }
}

//read a file
$my_file = "urls.txt";//change file name to yours
if (file_exists($my_file)) {
$data = file($my_file);
$total = count($data);
echo "<br />Total urls: $total<br />";
foreach ($data as $line) {
if($line != "" && checkYoutube($line) == TRUE){
$url = trim($line);
//making sure any url has a protocol, without breaking https:// urls
if(!preg_match('#^https?://#i', $url)){
$url = "http://$url";
}

//using curl is better for more options, setting the timeout matters for speed versus accuracy
$context = stream_context_create(array(
    'http' => array(
        'timeout' => 8
    )
));
//get the content from url
$the_contents = @file_get_contents($url, 0, $context);
//alive or dead condition
if (empty($the_contents)) {
$status = "dead";
$color = "#FF0000";
$title = $url;
} else {
$status = "alive";
$color = "#00FF00";
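//allow attributes on the tag and fall back to the url when no title is found
if (preg_match("/<title[^>]*>(.*)<\/title>/Umis", $the_contents, $matches)) {
$title = trim($matches[1]);
} else {
$title = $url;
}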
//$title = htmlspecialchars($title, ENT_QUOTES); //saving data to database

}

//show results on page
echo "<a style='font-size: 20px; background-color: #000000; color: $color;' href='$url' TARGET='_blank'>$title</a><br />";
}
}
} else {
echo "Can't locate $my_file";
}
?>

QuickOldCar · May 14, 2011

I made a slight error as I wasn't checking just the host area but the entire url.

I made the changes here.

For anyone wanting to use this, just make a text file named urls.txt in the same folder as this script.

Place the urls one per line.

<?php

//check if youtube function
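function checkYoutube($inserturl) {
    $inserturl = strtolower(trim($inserturl));
    //add a protocol only when none is present, so https:// urls are left alone
    if (!preg_match('#^https?://#', $inserturl)) {
        $inserturl = "http://$inserturl";
    }
    $parsedUrl = parse_url($inserturl);
    //parse_url() returns false on badly malformed urls, so check before reading the host
    $host = (is_array($parsedUrl) && isset($parsedUrl['host'])) ? $parsedUrl['host'] : '';

    $checkhost = "youtube.com";
    //match the host itself or any subdomain such as www.youtube.com
    return (bool) preg_match('/(^|\.)' . preg_quote($checkhost, '/') . '$/', $host);
}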

//read a file
$my_file = "urls.txt";//change file name to yours
if (file_exists($my_file)) {
$data = file($my_file);
$total = count($data);
echo "<br />Total urls: $total<br />";
foreach ($data as $line) {
if($line != "" && checkYoutube($line) == TRUE){
$url = trim($line);
//making sure any url has a protocol, without breaking https:// urls
if(!preg_match('#^https?://#i', $url)){
$url = "http://$url";
}

//using curl is better for more options, setting the timeout matters for speed versus accuracy
$context = stream_context_create(array(
    'http' => array(
        'timeout' => 8
    )
));
//get the content from url
$the_contents = @file_get_contents($url, 0, $context);
//alive or dead condition
if (empty($the_contents)) {
$status = "dead";
$color = "#FF0000";
$title = $url;
} else {
$status = "alive";
$color = "#00FF00";
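//allow attributes on the tag and fall back to the url when no title is found
if (preg_match("/<title[^>]*>(.*)<\/title>/Umis", $the_contents, $matches)) {
$title = trim($matches[1]);
} else {
$title = $url;
}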
//$title = htmlspecialchars($title, ENT_QUOTES); //saving data to database

}

//show results on page
echo "<a style='font-size: 20px; background-color: #000000; color: $color;' href='$url' TARGET='_blank'>$title</a><br />";
}
}
} else {
echo "Can't locate $my_file";
}
?>

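A regex on <title> finds most titles but not all: it does not decode entities such as &amp; and it can be fooled by markup inside comments or scripts. One possible alternative, sketched here under the assumption that the dom extension is available, is to let DOMDocument parse the page. extractTitle() is a hypothetical helper name, and pages that do not declare their charset may still need extra encoding handling.

```php
<?php
// Hypothetical helper: returns the page title, or null when none is found.
function extractTitle($html) {
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // real-world pages are rarely valid, so keep libxml's parse warnings quiet
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null;
    }
    // collapse newlines and runs of spaces inside the title
    $title = trim(preg_replace('/\s+/', ' ', $nodes->item(0)->textContent));
    return $title === '' ? null : $title;
}

// Made-up sample page for illustration.
$sample = "<html><head><TITLE lang='en'>\n  My   test page\n</TITLE></head><body></body></html>";
echo extractTitle($sample), "\n";
```

In the script above, the result of extractTitle($the_contents) could stand in for the preg_match() step, falling back to $url when it returns null.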