
Parsing a larger number of locally based files...


dilbertone


Hello dear Community,

 

I have a large document. I need to parse it and output only this part: schule.php?schulnr=80287&lschb=

 

How do I parse this?

  <td>
<A HREF="schule.php?schulnr=80287&lschb=" target="_blank">
    <center><img border=0 height=16 width=15 src="sh_info.gif"></center></A>
        </td>

I'd love to hear from you.


How large is large?  How do you think you would parse the stuff?

 

There are many ways to "parse" a document. For HTML you could use the DOM to get an object-based view of the file, or you could read in the whole file and do a quick preg_match_all(), or, if it's really huge, you could read it line by line and test each line for matching links.
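To make two of those options concrete, here is a minimal sketch of the preg_match_all() approach and the line-by-line approach. The function names extractSchoolLinks() and extractSchoolLinksFromFile() are made up for illustration, and the pattern assumes that each link sits on a single line and that its href starts with schule.php?schulnr= followed by digits, as in the snippet from the first post.

```php
<?php
// Hypothetical helper: collect every href that points at schule.php.
function extractSchoolLinks(string $html): array
{
    // Match href="schule.php?schulnr=<digits>..." in single or double quotes.
    // The i flag is needed because the sample markup uses upper-case HREF.
    $pattern = '/href\s*=\s*(["\'])(schule\.php\?schulnr=\d+[^"\']*)\1/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[2];
}

// Hypothetical helper for very large files: test one line at a time
// instead of loading the whole file into memory.
function extractSchoolLinksFromFile(string $path): array
{
    $links = [];
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $links;
    }
    while (($line = fgets($handle)) !== false) {
        foreach (extractSchoolLinks($line) as $link) {
            $links[] = $link;
        }
    }
    fclose($handle);
    return $links;
}

// The snippet from the first post, used as sample input.
$sample = '<td>' . "\n"
    . '<A HREF="schule.php?schulnr=80287&lschb=" target="_blank">' . "\n"
    . '<center><img border=0 height=16 width=15 src="sh_info.gif"></center></A>' . "\n"
    . '</td>';

print_r(extractSchoolLinks($sample));
```

The line-by-line variant would miss a link whose tag is split across two lines, so it is only a reasonable trade-off when the files are too large to read in one go.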


Hello dear salathe

 

Many thanks for the quick reply.

 

I see that you are a regex expert. Well, I will try to do the job following your advice. I will try out the preg_match_all() way.

 

By the way, I am also trying to parse these small example sites. They are not very complicated, but they seem to be nice examples to learn a lot from.

 

 

http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=5459&lschb=

 

http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=675.635319953953&SchulAdresseMapDO=193975

 

Any idea how to do this in a quick way? I'd love to hear from you.

 

db1


hello dear salathe,

 

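<?php

// Fetch the page; file_get_contents() returns false on failure.
$content = file_get_contents("http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=94468&lschb");
if ($content === false) {
    exit("Could not fetch the page.\n");
}

var_dump($content); // debug output: the raw HTML

// Capture the contents of every <td>...</td> cell
// (s: the dot also matches newlines, i: case-insensitive).
$pattern = '/<td>(.*?)<\/td>/si';
preg_match_all($pattern, $content, $matches);

// Strip the remaining tags and surrounding whitespace from each cell.
foreach ($matches[1] as $match) {
    $match = strip_tags($match);
    $match = trim($match);
    var_dump($match);
}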


Hi Salathe,

 

Many, many thanks. Great to hear from you! You're right, a question was missing there.

 

hello dear salathe

 

Hi there. :)  You didn't ask anything in the last post. Are you happy with the code that you've got, or do you still have questions or things that you would like to talk through?

 

I want to apply the above-mentioned code to this URL. Is this possible? I guess that the HTML is a bit invalid.

But besides this, is it possible to apply the code to this new target URL?
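On the question of invalid HTML: the DOM route mentioned earlier in the thread tends to be more forgiving than a regular expression. The following is only a sketch, assuming PHP's DOM extension is available. The function name extractCellTexts() and the sample markup (with a deliberately unclosed <b>) are made up for illustration and are not taken from either target page.

```php
<?php
// Hypothetical helper: return the trimmed text of every table cell.
function extractCellTexts(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them,
    // since pages like these may contain invalid markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cells = [];
    foreach ($doc->getElementsByTagName('td') as $cell) {
        $text = trim($cell->textContent);
        if ($text !== '') {
            $cells[] = $text;
        }
    }
    return $cells;
}

// Made-up sample: an unclosed <b> and a cell that carries an attribute.
$html = '<html><body><table><tr>'
    . '<td class="label"><b>Name</td><td>Example School</td>'
    . '</tr></table></body></html>';

print_r(extractCellTexts($html));
```

Note that getElementsByTagName('td') also returns cells of nested tables, so a page that nests tables would report the inner text more than once.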

 

...[...]...

 

I'd love to hear from you!

 

Regards

db-one!

 

 

 


Hello Salathe

 

If the "target URL" has the HTML structure and content that $pattern looks for, then yes. Otherwise, no.

 

Many, many thanks. No, it has not. But it has tables! So I have to redesign the regex a bit.

 

Can you give me some advice? Note that it also has tables!

 

 

 


hello dear salathe, good evening!

 

Many, many thanks for the reply. I am very happy to hear from you!

 

Oh, it has got tables! Then I'm sorry you'll have to start all over again!

 

Joking aside, your $pattern only looks for <td>...</td> so since the page uses tables, you should be OK (fingers crossed).
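One caveat worth illustrating: the literal <td> in $pattern only matches cells whose opening tag has no attributes. The sketch below compares the pattern from the thread with a relaxed variation. The relaxed pattern, the helper name cellTexts(), and the sample row are assumptions made for this example and are not taken from the target page.

```php
<?php
// Made-up sample: one plain cell and one cell with an attribute.
$content = '<tr><td>Plain cell</td><td class="adr">Cell with attribute</td></tr>';

// Pattern from the thread: matches only a bare <td>.
$strict = '/<td>(.*?)<\/td>/si';
// Assumed variation: also accepts attributes after "td".
$relaxed = '/<td\b[^>]*>(.*?)<\/td>/si';

// Hypothetical helper: run a pattern and clean up each captured cell.
function cellTexts(string $pattern, string $content): array
{
    preg_match_all($pattern, $content, $matches);
    return array_map(
        static fn(string $cell): string => trim(strip_tags($cell)),
        $matches[1]
    );
}

print_r(cellTexts($strict, $content));   // finds only "Plain cell"
print_r(cellTexts($relaxed, $content));  // finds both cells
```

If the target page puts attributes such as class or width on its cells, the strict pattern would silently skip them, which is an easy thing to check by viewing the page source first.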

 

The page has tables:

http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=675.635319953953&SchulAdresseMapDO=193975

 

Okay, I will try out the regex and see what it outputs!

 

Many thanks for your help!

regards db1

