
Processing large CSV files


philipsbe


I need to process a large CSV file (40 MB, roughly 300,000 rows). My existing code works with smaller files, but it cannot handle files this large.

All I need to do is read a particular column from the file, then count the total number of rows and sum all the values in that column.

My existing code imports the whole CSV file into an array (a class is used) and then, using a 'foreach' loop, reads the required column's values into another array. Once the data is in this array I can simply sum or count it. While this served me well for smaller files, the approach does not work on the larger CSV file. I have already increased the memory allocated to PHP and raised max_execution_time, but the script just keeps on running.

 

I am no PHP expert but usually get by with trial and error. Your thoughts and help will be greatly appreciated.

 

Existing code:

The data is first processed by the 'parsecsv' class (freely available at http://code.google.com/p/parsecsv-for-php/).

 

After calling the class and processing the CSV file:

 

<?php

ini_set('max_execution_time', 3000);

$init = array();                          // Collects every SALARY value read from the CSV

foreach ($csv->data as $key => $col) {    // 'data' is the array built by the parsecsv class
    $SAL = $col['SALARY'];                // Column to process from the CSV
    array_push($init, $SAL);              // Push the value into the array created above
}

$total_rows = count($init);               // Total number of values collected

echo "Total # of rows: " . $total_rows . "\n";
echo "Total Sum: " . array_sum($init) . "\n";

?>



You could get better performance by reading only one line into memory at a time.

 

Something like:

<?php

$handle = fopen('file.csv', 'r');
$salaryIndex = 0;                        // Replace with the index of your SALARY column
$totalSalary = 0;
$totalRows = 0;

while (($lineData = fgetcsv($handle)) !== FALSE)
{
    if ($lineData[0] !== NULL)           // fgetcsv returns array(NULL) for blank lines
    {
        $totalRows++;
        $totalSalary += $lineData[$salaryIndex];
    }
}

fclose($handle);

echo "Total # of rows: " . $totalRows . "\n";
echo "Total Sum: " . $totalSalary . "\n";

?>

 

That way you don't have a huge array stored in memory, which I think is what makes your script so slow.

 

Another thing that could help is to split your file into several smaller files and process those one at a time to save memory; a rough sketch of the splitting step is below.
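
A minimal sketch of that splitting idea, assuming the input is named file.csv and writing hypothetical chunk files chunk_0.csv, chunk_1.csv, and so on (the 50,000-line chunk size is arbitrary):

<?php

$linesPerChunk = 50000;                  // Arbitrary chunk size; tune to taste
$in = fopen('file.csv', 'r');
$chunk = 0;
$lineCount = 0;
$out = fopen('chunk_0.csv', 'w');

while (($line = fgets($in)) !== false)
{
    if ($lineCount === $linesPerChunk)   // Current chunk is full; start a new file
    {
        fclose($out);
        $chunk++;
        $lineCount = 0;
        $out = fopen("chunk_{$chunk}.csv", 'w');
    }
    fwrite($out, $line);
    $lineCount++;
}

fclose($out);
fclose($in);

?>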


Thank you Johnny,

While your code was handy, since my team has growing needs I ended up dumping the data from the CSV into MySQL tables using 'LOAD DATA INFILE'. I am amazed that the whole file was loaded in less than 10 seconds. Now we will manipulate the data in these tables using SQL, which should be fairly quick.
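
For reference, a minimal sketch of that approach, assuming a MySQL table named salaries with a SALARY column; the credentials, file path, and table name are hypothetical, and depending on server settings you may need LOAD DATA LOCAL INFILE instead:

<?php

// Hypothetical connection details
$db = new mysqli('localhost', 'user', 'password', 'test');

// Bulk-load the CSV; IGNORE 1 LINES skips a header row if present
$db->query("LOAD DATA INFILE '/path/to/file.csv'
            INTO TABLE salaries
            FIELDS TERMINATED BY ','
            OPTIONALLY ENCLOSED BY '\"'
            LINES TERMINATED BY '\\n'
            IGNORE 1 LINES");

// Count and sum in a single pass over the loaded table
$result = $db->query("SELECT COUNT(*) AS total_rows, SUM(SALARY) AS total_sum FROM salaries");
$row = $result->fetch_assoc();

echo "Total # of rows: " . $row['total_rows'] . "\n";
echo "Total Sum: " . $row['total_sum'] . "\n";

$db->close();

?>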

 

Thanks again for sharing your thoughts.

