
Parse HTML table in PHP - AND strip out the Microsoft stuff


techmonkey78


Hi,

 

I'm designing a website for an online radio station. Once a week, one of their DJs presents his own top 40 show, and HIS top 40 is listed on his own site.

 

At the moment, the only solution I have for getting the top 40 onto my client's site is an iframe (not good, I know!).

However, what's worse is that this top 40 list has horrendous colours that completely break the theme of my client's site and, worse still, it is Excel-generated HTML! The author of this list is not a client of mine, so I can't do anything to persuade him to change it. I've attached a copy of the offending file so you can see what I'm working with; I changed the extension from html to txt for the forum post.

Just for clarification, the Excel-generated HTML file is not hosted on my server (although I could set up a cron job to fetch it if required).
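
For reference, the fetch-and-cache idea mentioned above could look roughly like this. It is only a sketch: `fetchChart()`, the cache file name and the example URL are hypothetical names made up for illustration, and it assumes `allow_url_fopen` is enabled so that `file_get_contents()` can read a remote URL.

```php
<?php
// Hypothetical helper: returns the chart HTML from a local cache file,
// re-downloading it only when the cached copy is older than $maxAge seconds.
function fetchChart(string $sourceUrl, string $cacheFile, int $maxAge = 3600): string
{
    $isFresh = is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge;

    if (!$isFresh) {
        // @ suppresses the warning on failure; the return value is checked instead.
        $html = @file_get_contents($sourceUrl);
        if ($html !== false) {
            file_put_contents($cacheFile, $html, LOCK_EX);
        } elseif (!is_file($cacheFile)) {
            throw new RuntimeException('Chart could not be fetched and no cached copy exists');
        }
        // If the download failed but an old copy exists, fall through and serve it.
    }

    return file_get_contents($cacheFile);
}

// Usage (placeholder URL, not the real chart location):
// $html = fetchChart('https://example.com/chart.htm', __DIR__ . '/chart-cache.htm');
```

The same function could be called from a cron script or directly from the page, since the age check stops it hitting the remote site on every request.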

 

I've done some digging and found a useful bit of code (below).

 

However, it displays the table 4 or 5 times.

In addition to this, is there a way to manipulate the width of the columns with this code?

 

<?php 
$oldSetting = libxml_use_internal_errors( true ); 
libxml_clear_errors(); 

$html = new DOMDocument(); 
$html->loadHtmlFile('Chart%20Table2.htm'); 

$xpath = new DOMXPath( $html ); 
$elements = $xpath->query( "//table" ); 

foreach ( $elements as $item ) {
  $newDom = new DOMDocument;
  $newDom->appendChild($newDom->importNode($item,true));

  $xpath = new DOMXPath( $newDom ); 

  foreach ($item->attributes as $attribute) { 

    for ($node = $item->firstChild; $node !== NULL; 
         $node = $node->nextSibling) {
      if (($attribute->nodeName =='valign') && ($attribute->nodeValue=='top'))
      {
        print($node->nodeValue); 
      }
      else
      {
        print("<br>".$node->nodeValue);
      }
    }
    print("<br>");
  } 
}

libxml_clear_errors(); 
libxml_use_internal_errors( $oldSetting );



?>
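
The repeated output most likely comes from the outer loop over `$item->attributes`: the inner loop prints every child node once for each attribute on the `<table>` element, so a table carrying four or five attributes is printed four or five times. A minimal sketch that visits each row exactly once is shown here. `extractRows()` is a hypothetical helper name, and the sketch assumes the chart rows are ordinary `<tr>` elements containing `<td>`/`<th>` cells.

```php
<?php
// Hypothetical helper: returns one array of cell texts per table row.
function extractRows(string $htmlSource): array
{
    // Excel markup triggers many parser warnings; collect them silently.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($htmlSource);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];

    // Each <tr> is visited once; there is no loop over attributes.
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        // Only the row's direct cells, so cells of nested tables are not mixed in here.
        foreach ($xpath->query('./td | ./th', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }

    return $rows;
}

// Usage: read the file into a string first, then extract.
// $rows = extractRows(file_get_contents('Chart Table2.htm'));
```

Note that `//tr` also matches rows of any nested table, so if the pictures really do sit in a separate table, a narrower XPath query would be needed to skip it.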

 

Basically, I just want the table.

I'm not bothered about the first column, which contains some pictures, as I think that's a separate table.

 

Ideally, I'd like to apply my own formatting to the table too (padding etc.) if that is possible, e.g. from line 897 (<td class=xl28 x:num>1</td>) down to line 1433 of the attached file.

 

The trouble is, there are a lot of custom widths and styles in between which I don't want.

I'd like all the tr and td elements without all the associated guff from the original Excel export.
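
To make the goal concrete, here is a rough sketch of rebuilding the table from scratch so that none of the Excel attributes survive. `cleanTable()` and the `top40` class name are hypothetical, and the sketch assumes only the text of each cell is wanted (any markup inside a cell is dropped).

```php
<?php
// Hypothetical helper: emits a fresh <table> containing only plain <tr>/<td>
// elements and the text of each original cell.
function cleanTable(string $htmlSource, string $cssClass = 'top40'): string
{
    $previous = libxml_use_internal_errors(true);
    $source = new DOMDocument();
    $source->loadHTML($htmlSource);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($source);
    $out = '<table class="' . htmlspecialchars($cssClass) . '">' . "\n";

    foreach ($xpath->query('//tr') as $tr) {
        $cells = '';
        foreach ($xpath->query('./td | ./th', $tr) as $cell) {
            // Collapse runs of whitespace left over from the export.
            $text = trim(preg_replace('/\s+/u', ' ', $cell->textContent));
            // Escape the text again because it is written out as HTML.
            $cells .= '<td>' . htmlspecialchars($text) . '</td>';
        }
        if ($cells !== '') {
            $out .= '  <tr>' . $cells . '</tr>' . "\n";
        }
    }

    return $out . '</table>';
}

// Usage:
// echo cleanTable(file_get_contents('Chart Table2.htm'));
```

Because the output carries a single class and no inline widths, padding and column widths could then be controlled entirely from the site's own stylesheet.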

 

Geez I wish there was an easy way to get this guy just to export as CSV but unfortunately that's not an option!

 

Many thanks in advance!

 

[attachment deleted by admin]
