
How to extract data using a pattern by Simple HTML DOM?


torontobb


Hi Everyone,

 

I started using Simple HTML DOM today and have spent 4 hours without getting the result I want.

 

I want to be able to extract the following information:

 

<div class="listing_content">
	<span class="serialNumb" style="line-height: 21px;">77777</span>
<br />
444 ASDF, Alpha, Tango, Beta
<br />
77777 Director:99999
              <div>
<img title='web' src='http://cpgimg.com/images/icon_sm_web.gif' alt='web'/>  <a href='javascript:void(0)' onClick="window.open('/redir.jsp?p_url=http:%2f%2fwww.cnn.com&p_cid=2707304&p_hid=279E00&p_ct=3527&p_pr=KO&p_fr=U');" class='listing_link'>website</a>  <img title='email' src='http://cpgimg.com/images/icon_sm_mail.gif' alt='email'/>  <a class='listing_link' href="javascript:void(0)" onclick="popupEmail('/email.jsp?lang=0&p_cid=2707304');(new Image()).src='/redir.jsp?p_url=&p_cid=2707304&p_hid=279E00&p_ct=3527&p_pr=ON&p_fr=E&msec='+(new Date()).getMilliseconds()">E-mail</a>  
               </div>
</div>

               

The items I need to pull out of the excerpt above, each separately, are:

1- serialNumb = 77777

2- 444 ASDF, Alpha, Tango, Beta

3- 77777 Director:99999

4- www.cnn.com

 

I want each item stored in its own variable so I can upload them to MySQL.

 

Any help with this is much appreciated. I don't have to use Simple HTML DOM, but from my searching it seems to be the best tool (however, I am not having much luck with it).

 

***Note that the page is full of <div>, <br />, <img>, and other tags, and the quoted part is just one excerpt. The attribute style="line-height: 21px;" is unique and appears only once in the page, and the string ('/redir.jsp?p_url is likewise unique to the URL portion.

 

Thanks again.
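
For illustration, here is a minimal sketch of the pattern logic described above. It is written in Python with only the standard library rather than with Simple HTML DOM, so it shows the approach, not that library's API. The names `parse_listing` and `SAMPLE` are made up for this example, and the sample is a trimmed copy of the excerpt. It assumes the serialNumb span comes first, the two text lines are separated by <br /> tags, and the website sits in the first non-empty p_url parameter.

```python
import re
from urllib.parse import unquote, urlparse

# Trimmed copy of the excerpt from the post (hypothetical test input).
SAMPLE = """<div class="listing_content">
	<span class="serialNumb" style="line-height: 21px;">77777</span>
<br />
444 ASDF, Alpha, Tango, Beta
<br />
77777 Director:99999
              <div>
<a href='javascript:void(0)' onClick="window.open('/redir.jsp?p_url=http:%2f%2fwww.cnn.com&p_cid=2707304');" class='listing_link'>website</a>
<a class='listing_link' href="javascript:void(0)" onclick="popupEmail('/email.jsp?lang=0&p_cid=2707304');(new Image()).src='/redir.jsp?p_url=&p_cid=2707304'">E-mail</a>
               </div>
</div>"""


def parse_listing(html):
    """Return the four fields as a dict, or None if the span is missing."""
    serial = re.search(
        r'<span[^>]*class="serialNumb"[^>]*>(.*?)</span>', html, re.S
    )
    if serial is None:
        return None

    # Keep only the text between </span> and the nested <div>.
    rest = html[serial.end():].split("<div", 1)[0]

    # Split on <br /> and collapse every run of whitespace to one space.
    parts = [" ".join(p.split()) for p in re.split(r"<br\s*/?>", rest)]
    parts = [p for p in parts if p]

    # First p_url with a non-empty value; the e-mail link's p_url is empty.
    url = re.search(r"redir\.jsp\?p_url=([^&'\"]+)", html)
    website = urlparse(unquote(url.group(1))).netloc if url else None

    return {
        "serial": serial.group(1).strip(),
        "address": parts[0] if len(parts) > 0 else None,
        "director": parts[1] if len(parts) > 1 else None,
        "website": website,
    }


if __name__ == "__main__":
    print(parse_listing(SAMPLE))
```

Each value ends up under its own key, so it can be bound to a separate parameter of a MySQL insert. Regular expressions are workable here only because the markers are unique in the page; on less regular markup a real HTML parser is the safer choice.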

 


I have had a look at it, but I think you picked up on a minor part of my post that is not a problem for me.

 

I need to PARSE an HTML file. That is it in a nutshell.

 

I have already overcome a lot of issues, but I am still stuck on the whitespace present in the HTML file.

 

If anyone has experience with HTML PARSING, please let me know how you would parse the address out of this HTML excerpt (***Note: all the whitespace exists in the HTML source file exactly as quoted here):

 

<span class="basic_serial">(777) 777-7777</span>

												<br />









										1111 ABCD, EFGH, IJKL

										<br />

 

 

Thanks,
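
One way to take the whitespace out of the picture is to collect the text nodes and collapse the whitespace in each one before using it. The sketch below does that in Python with the standard library `html.parser` module, not with Simple HTML DOM. The names `TextNodes`, `text_nodes`, and `EXCERPT` are made up for this example, and it assumes the address is the first non-empty text node after the basic_serial span.

```python
from html.parser import HTMLParser

# The excerpt from the post, with its tabs and blank lines written as escapes.
EXCERPT = (
    '<span class="basic_serial">(777) 777-7777</span>\n\n'
    "\t\t\t\t\t\t<br />\n\n\n\n\n\n\n\n\n\n"
    "\t\t\t\t\t1111 ABCD, EFGH, IJKL\n\n"
    "\t\t\t\t\t<br />\n"
)


class TextNodes(HTMLParser):
    """Collect non-empty text nodes with whitespace collapsed."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        # split() with no arguments drops tabs, newlines, and repeated spaces.
        text = " ".join(data.split())
        if text:
            self.nodes.append(text)


def text_nodes(html):
    parser = TextNodes()
    parser.feed(html)
    parser.close()
    return parser.nodes


if __name__ == "__main__":
    nodes = text_nodes(EXCERPT)
    serial, address = nodes[0], nodes[1]
    print(serial)
    print(address)
```

Because whitespace-only nodes are dropped, the number of blank lines or tabs between the <br /> tags no longer affects which node holds the address.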

 
