
scraping websites


phpsycho


Okay, so I'm scraping websites for their descriptions, keywords, and titles.

I've noticed that a lot of websites use the same keywords and description on every page.

So my idea is to scrape the index page, find all the links on it, and scrape each of those pages too. Once they've all been scraped, I'd compare the descriptions, and for any pages whose descriptions match, pull some text that is unique to each page and use that instead.
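Here is a rough sketch of the crawl step in Python. The thread is about PHP, but the logic carries straight across. It parses the index HTML you already fetched, collects every `<a href>`, resolves relative links against the index URL, and keeps only links on the same host so the crawl doesn't wander off-site. `LinkCollector` and `same_site_links` are made-up names. Dropping `#fragments` and non-http(s) links like `mailto:` is an assumption about what you want.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(index_url, html):
    """Return absolute, de-duplicated links that stay on index_url's host."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(index_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve "/about" or "contact.html" against the index URL, drop "#fragment".
        absolute, _fragment = urldefrag(urljoin(index_url, href))
        parsed = urlparse(absolute)
        # Assumption: only follow http(s) links on the same host, each once.
        if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

For an index containing `<a href="/about">` and `<a href="contact.html#form">`, `same_site_links("https://example.com/", html)` would give `https://example.com/about` and `https://example.com/contact.html`, which are the pages to fetch next.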

I can't seem to wrap my head around it. How would I accomplish this?

Right now I fetch each page with cURL, pull out the keywords, description, and title, then find all the links on the site and scrape those.
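For the per-page step, something like this pulls the title plus the `keywords` and `description` meta tags out of the fetched HTML. It is a sketch using Python's built-in `html.parser`. `MetaExtractor` and `page_metadata` are hypothetical names, and it assumes the tags are written as `<meta name="keywords">` / `<meta name="description">` (the name is matched case-insensitively). In PHP, `get_meta_tags()` covers the meta-tag part.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Hypothetical helper: grabs <title> text and keywords/description meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append.
        if self._in_title:
            self.title += data


def page_metadata(html):
    """Return title, keywords and description; missing values become ""."""
    parser = MetaExtractor()
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "keywords": parser.meta.get("keywords", ""),
        "description": parser.meta.get("description", ""),
    }
```

Returning `""` for missing tags keeps every page the same shape, so pages with no description can be handled the same way as pages with a duplicated one.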


So I was thinking of building an array of the descriptions and checking it before inserting into the DB, but it doesn't seem like that would work.
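On the array idea: it can work if you don't insert as you go. Scrape the whole site into an array keyed by URL first, then group the pages by description. Any group with more than one URL is a set of pages that needs a different description before anything is written to the DB. Here is a minimal sketch. It assumes `pages` maps URL to description (a hypothetical shape), and it compares descriptions after collapsing whitespace and lowercasing so near-identical copies still match.

```python
from collections import defaultdict


def shared_descriptions(pages):
    """pages: {url: description}. Return {normalized description: [urls]} for
    descriptions used by two or more pages."""
    by_description = defaultdict(list)
    for url, description in pages.items():
        # Normalize so "Best  Widgets" and "best widgets" count as the same text.
        key = " ".join(description.split()).lower()
        by_description[key].append(url)
    return {d: urls for d, urls in by_description.items() if len(urls) > 1}


def needs_fallback(pages):
    """URLs whose description is shared with another page on the site."""
    return {url for urls in shared_descriptions(pages).values() for url in urls}
```

Only the URLs returned by `needs_fallback` need the "unique text" treatment. Every other page can be inserted with its own description as-is.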

Any ideas?


Oh, also: how would I grab just the text on each page that is different from every other page?
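For the "text unique to each page" part, one simple heuristic (a suggestion, not the only way) is to split each page's visible text into lines, count how many pages each line appears on, and keep only the lines that appear on exactly one page. Navigation, headers, and footers repeat across the site, so they drop out. This sketch assumes each page has already been reduced to plain text with line breaks between blocks. `unique_blocks` and the `min_length` cut-off of 20 characters (to skip short fragments like "Home") are made-up choices to tune.

```python
from collections import Counter


def unique_blocks(page_texts, min_length=20):
    """page_texts: {url: plain text}. Return {url: [lines found on no other page]}."""
    blocks_per_page = {}
    for url, text in page_texts.items():
        blocks = []
        for line in text.splitlines():
            block = " ".join(line.split())  # collapse runs of whitespace
            if len(block) >= min_length and block not in blocks:
                blocks.append(block)
        blocks_per_page[url] = blocks
    # Count on how many pages each block occurs (each page counts a block once).
    page_counts = Counter(b for blocks in blocks_per_page.values() for b in blocks)
    return {
        url: [b for b in blocks if page_counts[b] == 1]
        for url, blocks in blocks_per_page.items()
    }
```

The first unique block of a page, trimmed to around 150 or 160 characters, could then stand in for its duplicated description.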

lol, very confusing.
