Jump to content

Parsing HTML from WikiPedia


DIM3NSION

Recommended Posts

Hi guys. I have been using the wikipedia API to retrieve information about a topic. Ive managed to get a response and retrieve the first section of the topic (in this case football)

 

Using this method - http://en.wikipedia.org/w/api.php?action=parse&page='.$search.'&redirects=1&format=json&prop=text&section=0');

 

However the first section that is retrieved includes the pictures and i just want to main text from the introduction.

 

The code that is sent back from wiki is this -

Array
(
    [parse] => Array
        (
            [text] => Array
                (
                    [*] => <div class="dablink">This article is about sports known as football.  For the ball used in these sports, see <a href="/wiki/Football_(ball)">Football (ball)</a>.</div> 
<div class="thumb tright"> 
<div class="thumbinner" style="width:227px;"><a href="/wiki/File:Football4.png" class="image"><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Football4.png/225px-Football4.png" width="225" height="274" class="thumbimage" /></a> 
<div class="thumbcaption"> 
<div class="magnify"><a href="/wiki/File:Football4.png" class="internal" title="Enlarge"><img src="http://bits.wikimedia.org/skins-1.17/common/images/magnify-clip.png" width="15" height="11" alt="" /></a></div> 
Some of the many different games known as football. From top left to bottom right: <a href="/wiki/Association_football">Association football</a> or soccer, <a href="/wiki/Australian_rules_football">Australian rules football</a>, <a href="/wiki/International_rules_football">International rules football</a>, <a href="/wiki/Rugby_Union" class="mw-redirect" title="Rugby Union">Rugby Union</a>, <a href="/wiki/Rugby_League" class="mw-redirect" title="Rugby League">Rugby League</a>, and <a href="/wiki/American_Football" class="mw-redirect" title="American Football">American Football</a>.</div> 
</div> 
</div> 
<p>The game of <b>football</b> is any of several similar <a href="/wiki/Team_sport" title="Team sport">team sports</a>, of similar origins which involve advancing a ball into a goal area in an attempt to score. Many of these involve <a href="/wiki/Kick_(football)" title="Kick (football)">kicking</a> a ball with the foot to score a <a href="/wiki/Goal_(sport)" title="Goal (sport)">goal</a>, though not all codes of football using kicking as a primary means of advancing the ball or scoring. The most popular of these sports worldwide is <a href="/wiki/Association_football">association football</a>, more commonly known as just "football" or "soccer". Unqualified, the word <i><a href="/wiki/Football_(word)" title="Football (word)">football</a></i> applies to whichever form of football is the most popular in the regional context in which the word appears, including <a href="/wiki/American_football">American football</a>, <a href="/wiki/Australian_rules_football">Australian rules football</a>, <a href="/wiki/Canadian_football">Canadian football</a>, <a href="/wiki/Gaelic_football">Gaelic football</a>, <a href="/wiki/Rugby_league">rugby league</a>, <a href="/wiki/Rugby_union">rugby union</a> and other related games. These variations are known as "codes".</p> 

 

 

I want the code that resides in the <p> tags. How would i go about parsing this and removing the rest. ive tried to get to work simple html dom parser but with no luck.

 

Any help would be greatly appreciated  8)

 

Thanks,

 

DIM3NSION

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.