
Please help with my extraction


EchoFool


Hey,

 

 

I have a script that is meant to grab links of a specific type.

 

 

<?php



// need to somehow extract all links from this text
$link = '[quote]http://www.domain.com/?d=03WO6WPC random text, http://www.domain.com/?d=0334fWPChttp://www.domain.com/?d=03WV4SPC[/quote]';
?>

 

 

The URLs are always the same length, but there is no telling how the user will type them out, so I'm looking for a way to extract the 3 URLs based on domain.com.

 

 

The end result should be an array of the 3 URLs that belong to domain.com.

 

 

 

 

I tried explode, but someone may type the links without spaces, in which case explode fails. Any ideas?


<?php
$link = '[quote]http://www.domain.com/?d=03WO6WPC random text, http://www.domain.com/?d=0334fWPChttp://www.domain.com/?d=03WV4SPC[/quote]';

$new = explode('domain.com', $link);

$new1 = 'http://www.domain.com'.$new[1].'<br>';
$new2 = 'http://www.domain.com'.$new[2].'<br>';
$new3 = 'http://www.domain.com'.$new[3].'<br>';


echo substr($new1, 0, 33),'<br>';
echo substr($new2, 0, 33),'<br>';
echo substr($new3, 0,33);
?>
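
If it helps, here is a rough alternative sketch using preg_match_all instead of explode, so the number of links does not have to be known in advance. It assumes the id after "?d=" is always exactly 8 letters or digits (as in the sample string), and the function name extractDomainLinks is just something I made up for illustration.

```php
<?php
// Sketch: collect every domain.com link, even when links run together.
// Assumption: the id after "?d=" is always exactly 8 letters or digits.
function extractDomainLinks(string $text): array
{
    $pattern = '~http://www\.domain\.com/\?d=[A-Za-z0-9]{8}~';
    preg_match_all($pattern, $text, $matches);

    // $matches[0] holds the full text of each match, in order of appearance.
    return $matches[0];
}

$link = '[quote]http://www.domain.com/?d=03WO6WPC random text, http://www.domain.com/?d=0334fWPChttp://www.domain.com/?d=03WV4SPC[/quote]';
print_r(extractDomainLinks($link));
```

Because the pattern stops after 8 id characters, two links typed back to back with no space still come out as separate array entries.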


Hi, thank you for the reply.

 

Quick question: if I want to then edit $new1, $new2 and $new3 and put them back into the original string, how would I do that?

 

My intention is to extract the domain links, check if they are valid, then add an <img src="tick.jpg"> in front of each link and put each link on its own line.

 

Thus resulting in:

 

<?php
$newlinks = '[quote]
<img src="tick.jpg"/> http://www.domain.com/?d=03WO6WPC
random text, 
<img src="tick.jpg"/> http://www.domain.com/?d=0334fWPC
<img src="tick.jpg"/> http://www.domain.com/?d=03WV4SPC
[/quote]';
?>

 

 

What would you suggest?

 


I know of no way to check if a URL exists with PHP. fopen should work, since it should return false if the URL does not exist, but I had no luck with it. Try it yourself and see if it works for you.

 

As for putting the img in front, no problem.

 

<?php
$link = '[quote]http://www.domain.com/?d=03WO6WPC random text, http://www.domain.com/?d=0334fWPChttp://www.domain.com/?d=03WV4SPC[/quote]';

// Split on the full URL prefix so no stray "http://www." is left in the pieces
$new = explode('http://www.domain.com', $link);

$new1 = '<img src="tick.jpg">http://www.domain.com'.$new[1];
$new2 = '<img src="tick.jpg">http://www.domain.com'.$new[2];
$new3 = '<img src="tick.jpg">http://www.domain.com'.$new[3];

$new_again = $new[0].$new1.$new2.$new3;

echo 'To test if this works - see below.<br><br>';
echo $new_again;
?>
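
Another way to do the "edit and put back" step, as a sketch only: preg_replace_callback can rewrite each link where it sits in the original string. This assumes the 8-character id again, and isValidLink() is a hypothetical placeholder that just returns true here, so swap in whatever check you settle on.

```php
<?php
// Sketch: rewrite the original string in place, one link per line with a tick.
// isValidLink() is a hypothetical placeholder for your own validity check.
function isValidLink(string $url): bool
{
    return true; // assumption: every link counts as valid in this sketch
}

function markLinks(string $text): string
{
    // Assumption: the id after "?d=" is always exactly 8 letters or digits.
    $pattern = '~http://www\.domain\.com/\?d=[A-Za-z0-9]{8}~';

    $marked = preg_replace_callback($pattern, function (array $m): string {
        $tick = isValidLink($m[0]) ? '<img src="tick.jpg"/> ' : '';
        // Wrap each link in newlines so it ends up on its own line.
        return "\n" . $tick . $m[0] . "\n";
    }, $text);

    // Two adjacent links would otherwise leave an empty line between them.
    return preg_replace('~\n{2,}~', "\n", $marked);
}

$link = '[quote]http://www.domain.com/?d=03WO6WPC random text, http://www.domain.com/?d=0334fWPChttp://www.domain.com/?d=03WV4SPC[/quote]';
echo markLinks($link);
```

The text around the links (the [quote] tags and "random text,") is left untouched, which avoids gluing exploded pieces back together by hand.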


No, 'domain.com' will not be part of the output array since we're exploding with it being the boundary string. Just echo out the array to see.

 

What worries me is this part of your question:

"My intentions is to extract the domain links..check if they are valid"

I hope by 'valid' you mean 'does the site exist'. If it is a question about existence, then you should have no problem with the code.

I will interject here, try the following...

 

<?PHP

  //## Link String
  $link = '[quote]http://www.domain.com/?d=03WO6WPC random text, http://www.domain.com/?d=0334fWPChttp://www.domain.com/?d=03WV4SPC[/quote]';

  //## Explode string to get urls, also set the $urlArray array
  $linkArray = explode('domain.com/',$link);
  $urlArray  = array();

  //## Allows us to check each link individually
  foreach($linkArray AS $url) {
    //## If the $url variable contains the "?d=" [GET query variable] we process it
    if(strstr($url,'?d=')) {
      //## Skip the 3 characters of "?d=" and keep the 8-character id
      $checkURL = 'http://www.domain.com/?d='.substr($url,3,8);

      //## Fetch the URL headers
      $urlHeaders = @get_headers($checkURL);

      //## If all is okay the URL exists, if not then it doesn't
      //## get_headers() returns false on failure, so guard before in_array()
      if($urlHeaders !== false && in_array('HTTP/1.1 200 OK', $urlHeaders)) {
        $urlArray[] = array('URL'=>$checkURL, 'EXISTS'=>' -> Does Exist');
      } else {
        $urlArray[] = array('URL'=>$checkURL, 'EXISTS'=>' -> Does Not Exist');
      }
    }
  }

  //## Print out the URLs and status, do whatever with these results
  echo '<pre>';
  print_r($urlArray);
  echo '</pre>';

?>

 

Regards, PaulRyan.


@Paul,

  I think you will run into the same problem using 'get_headers' as I did with 'fopen'. Some sites that do not exist take you to a different web site (mostly asking if you want to buy that domain name), so you get a positive hit on a nonexistent site. I know of no way to differentiate between the two.
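
One thing that might narrow it down, offered as a sketch rather than a fix: ask get_headers not to follow redirects and look at the first status code, so a 301/302 to another site shows up as a redirect instead of a 200. The function names statusCodeFromHeaders and linkStatus are made up for illustration, the 5 second timeout is an arbitrary assumption, and the context argument to get_headers needs PHP 7.1 or later. A parked domain that answers 200 directly would still slip through.

```php
<?php
// Sketch: read the status code of the first response without following redirects.
function statusCodeFromHeaders(array $headers): ?int
{
    // The first header line looks like "HTTP/1.1 200 OK".
    if (isset($headers[0]) && preg_match('~^HTTP/\S+\s+(\d{3})~', $headers[0], $m)) {
        return (int) $m[1];
    }
    return null;
}

function linkStatus(string $url): ?int
{
    $context = stream_context_create([
        'http' => [
            'method'          => 'HEAD', // headers only, no body
            'follow_location' => 0,      // do not follow redirects
            'timeout'         => 5,      // seconds; arbitrary choice
        ],
    ]);

    // get_headers() returns false when the request fails outright.
    $headers = @get_headers($url, false, $context);

    return $headers === false ? null : statusCodeFromHeaders($headers);
}
```

With this, linkStatus($checkURL) === 200 could be treated as "exists", a 3xx code as "redirects elsewhere", and null as "could not be reached".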

