
Extract data from site?


Graxeon


I'm trying to build a calculator for a game, but I'm stuck on the extracting part. I don't really know what it's called (so if you know of a tutorial/handbook, please link it), but how would I extract this data:

 

//data to extract
$name = "Robin hood hat";
$currentprice = "3.3m";
$change = "+11.5k";
//display
echo $name;
echo $currentprice;
echo $change;

 

from: http://itemdb-rs.runescape.com/results.ws?query=robin hood hat

 

And a small question: how would I convert "+11.5k" into "11500" (1k = 1,000)? Multiplying with * is the easy part, but how do I check for and remove the "+" and "k"?

Link to comment
Share on other sites

$source=preg_replace('/\.$/', '', $source);

 

param1=regex to search for, must be enclosed in delimiters, usually a slash at each end.

param2=string to replace with

param3=string to search and replace on

 

if (preg_match('/\+([\d\.]+)k/', $change, $matches))
    {
    //Full match of whole regex is in $matches[0].
    //First parenthesis match in $matches[1]. The + quantifier must sit
    //inside the parentheses, or only the last character is captured.
    $amt=$matches[1];
    $fullamt=$amt*1000;
    }
//If k is 1000, then m must mean a million.
if (preg_match('/\+([\d\.]+)m/', $change, $matches))
    {
    $amt=$matches[1]; //First parenthesis match (the number without + and m).
    $fullamt=$amt*1000000;
    }

In your case:

$change=preg_replace('/^\+/', '', $change); //Drop + at beginning of string, + must be escaped with backslash.
$change=preg_replace('/[km]$/', '', $change); //Drop k or m at end of string

 

^ represents beginning of string

$ represents end of string
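
The pieces above can be combined into one helper. This is only a minimal sketch: the function name parseAmount is made up for illustration, and it assumes that "k" always means 1,000, that "m" always means 1,000,000, and that the result should be rounded to a whole number.

```php
<?php
// Hypothetical helper: turns strings such as "+11.5k", "3.3m" or "250"
// into a whole number. Assumes k = 1,000 and m = 1,000,000.
function parseAmount(string $text): ?int
{
    // Optional sign, digits with an optional decimal part, optional k/m suffix.
    if (!preg_match('/^([+-]?)(\d+(?:\.\d+)?)([km]?)$/i', trim($text), $m)) {
        return null; // Not a format this sketch understands.
    }
    $multipliers = ['' => 1, 'k' => 1000, 'm' => 1000000];
    // round() guards against floating-point noise such as 3299999.9999.
    $value = (int) round((float) $m[2] * $multipliers[strtolower($m[3])]);
    return $m[1] === '-' ? -$value : $value;
}

echo parseAmount('+11.5k'), "\n"; // 11500
echo parseAmount('3.3m'), "\n";   // 3300000
echo parseAmount('250'), "\n";    // 250
```

Returning null for anything unexpected makes it easy to spot values that need a closer look, instead of silently treating them as 0.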

 

That will get you started.

Link to comment
Share on other sites

Are you trying to grab data from a generated web page, like from the link you gave? I would call that "screen scraping". Basically, you put the whole webpage into a string and parse the string, one line at a time. Or load it into an array, where each entry is one line of the HTML file.

 

HTML is just text, after all. I did this once using MS Access and the Yahoo stock quote page.
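
As a rough illustration of the line-by-line idea, here is a minimal sketch. The sample markup and the function name extractCells are made up; the real page's HTML will look different, and this approach only works when each table cell sits on its own line.

```php
<?php
// Hypothetical helper: splits HTML into lines and collects the text of
// every line that contains a <td> cell.
function extractCells(string $html): array
{
    $cells = [];
    foreach (explode("\n", $html) as $line) {
        if (strpos($line, '<td>') !== false) {
            // strip_tags() removes the markup and leaves only the cell text.
            $cells[] = trim(strip_tags($line));
        }
    }
    return $cells;
}

// Made-up sample markup. For a live page, file_get_contents($url) would
// return the HTML as one string (this needs allow_url_fopen enabled).
$html = "<table>\n<tr><td>Robin hood hat</td>\n<td>3.3m</td>\n<td>+11.5k</td></tr>\n</table>";

print_r(extractCells($html));
```

With this sample, the returned array holds the three cell texts in page order: the name, the current price and the change.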

 

 

Link to comment
Share on other sites

Well...that's good, except that this would have to be done for over 100 different pages. How would I make it more dynamic? I know Google Docs does it by table, like this:

 

=Index(ImportHtml("http://itemdb-rs.runescape.com/results.ws?query=Robin Hood Hat", "table", 2),2,4)

 

(that's for the "Change")

 

Is there something similar to this in PHP?

Link to comment
Share on other sites

I just found a problem with the preg_replace.

 

I need to convert all values that are higher than 999 (for example, +11.5k needs to convert to 11500). I would do this by just multiplying the source (or $change in your code) by 1000. But how would I distinguish between a value that has a "k" or "m" at the end (higher than 999) and one that doesn't (lower than 1000)?

Link to comment
Share on other sites

Never mind about the converting question. I'm still clueless on the extracting question xD

 

Btw...here's how I did the converting (yes, it's lengthy and newbie...but it works for what I need since all values go to the tenths):

 

if (strpos($source, 'k') !== false) {
    if (strpos($source, '.') !== false) {
        $source = preg_replace("/[^0-9]/", "", $source);
        echo $source*100;
    } else {
        $source = preg_replace("/[^0-9]/", "", $source);
        echo $source*1000;
    }
} else {
    if (strpos($source, 'm') !== false) {
        if (strpos($source, '.') !== false) {
            $source = preg_replace("/[^0-9]/", "", $source);
            echo $source*100000;
        } else {
            $source = preg_replace("/[^0-9]/", "", $source);
            echo $source*1000000;
        }
    } else {
        $source = preg_replace("/[^0-9]/", "", $source);
        echo $source;
    }
}

 

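I think you can move the preg_replace to the top of the code so it isn't repetitive, but only if the result goes into a separate variable, because the strpos checks still need the original string with its "k", "m" and "." intact. I just kept it this way because I might need the original values later on.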

Link to comment
Share on other sites

Your options are:

 

1) Get read-only access directly into the site database. Talk to the site owners and state you just want READ ONLY access, which will prevent you (or a hacker who hacks your program) from messing with their db.

 

2) Ask the site owners to provide data you need in a tab-delimited text file. Update the file each day. The site owners can automate this process using unix cron and use SQL to dump the data you need from a query into a text file. (Is once a day enough for your purposes?)

 

3) Write your own screen scraper.
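
For option 3, PHP's DOM extension can pick a cell out of a table by position, much like the ImportHtml formula earlier in the thread. The following is only a sketch: the function name tableCell and the sample markup are made up, it assumes the DOM extension is installed, and the real page's table layout would need to be checked first.

```php
<?php
// Hypothetical helper, loosely modelled on
// Index(ImportHtml(url, "table", n), row, col). All indexes are 1-based.
// Returns null when the table, row or cell does not exist.
function tableCell(string $html, int $table, int $row, int $col): ?string
{
    if ($html === '') {
        return null;
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // n-th table in the page, then its n-th row, then the n-th td/th cell.
    $query = sprintf(
        '((//table)[%d]//tr)[%d]/*[self::td or self::th][%d]',
        $table, $row, $col
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up sample markup with two tables; the real page will differ.
$html = '<table><tr><td>ignored</td></tr></table>'
      . '<table>'
      . '<tr><th>Item</th><th>Price</th><th>Change</th></tr>'
      . '<tr><td>Robin hood hat</td><td>3.3m</td><td>+11.5k</td></tr>'
      . '</table>';

echo tableCell($html, 2, 2, 3), "\n"; // +11.5k
```

To cover 100+ items, the same function could be called in a loop over a list of item names, with the HTML for each page fetched via file_get_contents() or cURL. Check the site's terms of use before scraping it at that volume.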

 
