Jump to content

Crawl internet through SOCKS proxy


implitech

Recommended Posts

So, I have this web crawler that I want to use to index .onion websites through the TOR network.

I amOk, here's what I need. I have a PHP based web crawler. It is accessible here: http://rz7ocnxxu7ka6ncv.onion/ Now, my problem is that my spider that actually crawls pages needs to do so on a SOCKS port 9050. The thing is, I have to tunnel its connection through TOR so that It can resolve .onion domains, which is what I'm indexing. (Only ending in .onion.) I call this script from the command line using php crawl.php, and I add the appropriate parameters to crawl the page. Here is what I think: Is there any way to force it to use TOR? OR can i force my ENTIRE MACHINE to tunnel things through tor, and how? (Like forcing all traffic through 127.0.0.1:9050) perhaps if i set up global proxy settings, php would respect them?

 

If any of my solutions work, how would I do it? (Step by step instructions please, I am a noob.)

 

I just want to crate my own TOR search engine. (Don't recommend my p2p search engines- it's not what I want for this- I know they exist, I did my homework.) Here is the crawler source if you are interested to take a look at: Perhaps someone with a kind heart can modify it to use 127.0.0.1:9050 for all crawling requests? spider.php: http://pastebin.com/kscGJCc5

spiderfuncs.php: http://pastebin.com/m5y54RUh

 

PLEASE someone help me! I am desperate. :(

 

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.