
WEB SPIDER/CRAWLER HELP PLEASE!


Dysfunktional


Hi all, it's my first post here and I'm still very new to PHP. I'm trying to write a web crawler script, except I want it to crawl only the one target website I enter. Basically, I want my script to go to ultimateguitar.com, 911tabs.com, or any other guitar tabs website, crawl the site, and index any guitar tabs they have in their database. This will provide my website with a "phonebook" of guitar tabs. It's not illegal or in breach of any copyrights; I'm only making a database of links. Any help would be greatly appreciated!


How to make a web crawler/scraper is a lot of information to cover in a single post.

 

Basic concept:

Designate a URL, either by input, from a list, or from a database.

Connect to it using cURL, file_get_contents, or another method.
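As a rough sketch of the connecting step, something like this could work with file_get_contents. The function name fetchPage, the 10-second timeout, and the user agent string are made up for illustration, and it assumes allow_url_fopen is enabled in php.ini (otherwise cURL would be the way to go).

```php
<?php
// Hypothetical helper: download one page, or return null on failure.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    // These options apply to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'TabIndexBot/0.1 (hypothetical name)',
        ],
    ]);

    // @ suppresses the warning PHP raises on failure; the return value is checked instead.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

Calling fetchPage('https://example.com/') would then give you the page HTML as a string, or null if the request failed.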

Obtain the desired information: this could be header info, meta info, or any content on the page that matches what you are looking for.

preg_match with regular expressions is a common way to find the related content. DOM, or a library such as Simple HTML DOM, can also find specific areas in the content.
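Here is a minimal sketch of the DOM approach for pulling links out of a page, using PHP's built-in DOMDocument. The function name extractLinks is hypothetical, and it makes simplifying assumptions: it only handles absolute links and links starting with "/", and it skips anything that is not on the same host as the page, since you only want to crawl one site.

```php
<?php
// Hypothetical helper: return the unique same-host links found in $html.
function extractLinks(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $host = parse_url($baseUrl, PHP_URL_HOST);
    $root = parse_url($baseUrl, PHP_URL_SCHEME) . '://' . $host;

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // empty link or in-page anchor
        }
        // Turn root-relative links ("/tabs/1") into absolute ones.
        if ($href[0] === '/' && substr($href, 0, 2) !== '//') {
            $href = $root . $href;
        }
        $scheme = parse_url($href, PHP_URL_SCHEME);
        $isHttp = $scheme === 'http' || $scheme === 'https';
        // Keep only http(s) links that stay on the target site.
        if ($isHttp && parse_url($href, PHP_URL_HOST) === $host) {
            $links[$href] = true; // array keys remove duplicates
        }
    }

    return array_keys($links);
}
```

Links written relative to the current page (like "page2.html") are ignored here; resolving those properly takes a bit more work.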

Once links are found, you can insert them into a database; those same links can later be used to visit each page and acquire more links. You could build a system that tracks which pages have already been visited, or simply not insert duplicate URLs into the database.
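One way to keep duplicates out is to let the database enforce it. The sketch below uses PDO with an in-memory SQLite database purely for illustration (it assumes the pdo_sqlite extension is installed); the table layout and the names saveLink and nextUnvisited are made up. On MySQL the equivalent statement would be INSERT IGNORE rather than INSERT OR IGNORE.

```php
<?php
// Assumption: pdo_sqlite is available; swap the DSN for your real database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE links (
    url     TEXT PRIMARY KEY,
    visited INTEGER NOT NULL DEFAULT 0
)');

// Hypothetical helper: store a URL once; returns true only if it was new.
function saveLink(PDO $db, string $url): bool
{
    // The primary key on url makes the database skip duplicates.
    $stmt = $db->prepare('INSERT OR IGNORE INTO links (url) VALUES (?)');
    $stmt->execute([$url]);

    return $stmt->rowCount() === 1;
}

// Hypothetical helper: return one unvisited URL and mark it visited, or null if none remain.
function nextUnvisited(PDO $db): ?string
{
    $url = $db->query('SELECT url FROM links WHERE visited = 0 LIMIT 1')->fetchColumn();
    if ($url === false) {
        return null;
    }
    $db->prepare('UPDATE links SET visited = 1 WHERE url = ?')->execute([$url]);

    return $url;
}
```

The visited column is what lets the same table double as the list of pages still to crawl.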

Have your scraper keep running and visiting these pages in a loop.
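The loop itself could look roughly like this. It is only a sketch: the function name crawl and the $maxPages limit are my own additions, and $getLinks stands for whatever function you write that takes a URL, downloads the page, and returns the links found on it.

```php
<?php
// Hypothetical helper: visit pages breadth-first, starting from $startUrl.
// $getLinks is any callable that takes a URL and returns an array of URLs.
// Returns the URLs visited, in order. $maxPages stops it from running forever.
function crawl(string $startUrl, callable $getLinks, int $maxPages = 100): array
{
    $queue   = [$startUrl];          // pages waiting to be visited
    $seen    = [$startUrl => true];  // every URL ever queued, to avoid repeats
    $visited = [];

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        $visited[] = $url;

        foreach ($getLinks($url) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }

    return $visited;
}
```

For a long-running crawl you would probably keep the queue and the seen list in the database instead of in arrays, and pause between requests so you do not hammer the target site.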

 

I have seen some example scraper scripts on the net. They could give you an idea of how to do it, but none of them is a complete solution; you will need to do a lot of work to adapt them to your needs.

 

Consider using this ready-made search spider, or something similar, if you do not want to invest the time to make your own.

http://www.sphider.eu/
