
Help me break my script, please


xyph


I was wondering if you guys could help me 'break' a function I've been working on. It converts a CSV-formatted string to a 2D array following RFC 4180. Here's the function:

/**
 * Convert a multi-line CSV string into a 2D array. Follows RFC 4180: allows
 * "cells with ""escaped delimiters""" and multi-line enclosed cells.
 * It assumes the CSV is properly formatted and doesn't check for errors
 * in the CSV format.
 * @param string $str  The CSV string
 * @param string $d    The delimiter between values (one character)
 * @param string $e    The enclosing character (one character)
 * @param bool   $crlf Set to true if line breaks inside cells should be
 *                     returned as CRLF (RFC 4180 line breaks are CRLF)
 * @return array|false The rows, or FALSE if the regex fails
 */
function csv_explode( $str, $d=',', $e='"', $crlf=TRUE ) {
	// Convert CRLF to LF, easier to work with in regex
	if( $crlf ) $str = str_replace("\r\n","\n",$str);
	// Get rid of the trailing line break(s) that RFC 4180 allows. Strip only
	// line breaks: spaces are part of a field (rule 4), so trim() is wrong here
	$str = rtrim($str, "\r\n");
	// Escape the delimiter and enclosure so regex metacharacters (| . $ etc.)
	// are matched literally
	$qd = preg_quote($d, '/');
	$qe = preg_quote($e, '/');
	// Do the dirty work. The pattern is built without the /x modifier, because
	// /x silently drops a whitespace delimiter such as a tab
	$pattern = '/'
		// Quoted cell: match an enclosure, then either a non-enclosure or a
		// doubled enclosure zero or more times (possessive), then another
		// enclosure, followed by a delimiter, a line break, or the string end
		. $qe . '((?:[^' . $qe . ']|' . $qe . $qe . ')*+)' . $qe
		. '(?:' . $qd . '|\n|$)'
		// ####### OR #######
		. '|'
		// Unquoted cell: match anything that's not a delimiter or line break
		// zero or more times (possessive), then either a delimiter, a line
		// break, or the string end
		. '([^' . $qd . '\n]*+)(?:[' . $qd . '\n]|$)'
		. '/';
	if( preg_match_all($pattern, $str, $ms, PREG_SET_ORDER) === FALSE ) return FALSE;
	// Initialize vars: $r will hold our return data, $i tracks which line we're on
	$r = array(); $i = 0;
	// Loop through results
	foreach( $ms as $m ) {
		if( isset($m[2]) ) {
			// Group 2 is only reported for unquoted cells. They can't contain
			// line breaks, so there is nothing to convert back
			$r[$i][] = $m[2];
		} else {
			// The cell was quoted (possibly "" or "0"), so convert any doubled
			// enclosure back to a single one and any LF back to CRLF, if
			// needed. Testing empty($m[1]) instead would misroute "" and "0"
			// cells and read the missing $m[2]
			$cell = str_replace($e.$e, $e, $m[1]);
			$r[$i][] = $crlf ? str_replace("\n", "\r\n", $cell) : $cell;
		}
		// If the raw match doesn't end with a delimiter, it must be the last
		// cell in the row, so we increment our line count
		if( substr($m[0], -1) !== $d )
			$i++;
	}
	// The zero-length match at $ adds a phantom row holding one empty string,
	// so remove it. If the input ends with a delimiter, that zero-length match
	// is a real empty last cell instead, and the row must be kept
	if( end($r) === array('') ) array_pop($r);
	return $r;
}

And to use it:

$csv = 'this,will,"be ""separated""",by
"commas,",,"should work with
""multiline,"",
",entries
some,last,data,"test"';

print_r( csv_explode($csv) );

or

$csv_eurwin = "this;will;'be ''separated''';by\r\n";
$csv_eurwin .= "'semicolons;';;'should work with\r\n";
$csv_eurwin .= "''multiline;'';';entries\r\n";
$csv_eurwin .= "some;'last';data;'test'";


print_r( csv_explode($csv_eurwin, ';', '\'', TRUE) );
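The function builds its regex by splicing `$d` and `$e` into the pattern string, so custom delimiters like the `';'` above are a good place to look for breakage. Two pitfalls of that approach, shown in isolation with made-up sample strings: under the `x` modifier, unescaped whitespace outside a character class is ignored, so a tab delimiter silently vanishes from the pattern; and an unescaped `|` turns into an empty alternative unless it goes through `preg_quote()`. Note that `preg_quote()` leaves whitespace alone, so it does not fix the `x` problem by itself.

```php
<?php
// A tab, as a caller would pass for tab-separated values.
$tab = "\t";

// Under /x, whitespace outside a character class is ignored, so this
// pattern behaves like '/ab/x' and no longer matches a tab-separated pair.
$withX    = preg_match('/a' . $tab . 'b/x', "a\tb"); // 0
// Without /x the tab is an ordinary literal character.
$withoutX = preg_match('/a' . $tab . 'b/', "a\tb");  // 1

// preg_quote() escapes regex metacharacters such as | but not whitespace.
$quotedPipe = preg_quote('|', '/');  // '\|'
$quotedTab  = preg_quote($tab, '/'); // still a literal tab

// Unescaped, '(?:|)' is an alternation of two empty branches, so "a"
// directly followed by "b" matches; escaped, a literal pipe is required.
$unescaped = preg_match('/a(?:|)b/', 'ab');                      // 1
$escaped   = preg_match('/a(?:' . $quotedPipe . ')b/', 'ab');    // 0
```

With a pipe delimiter, the same empty-alternative effect would let a quoted cell be followed by anything at all.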


Thanks! Here's the relevant part of the spec (RFC 4180, section 2) if anyone cares:

   1.  Each record is located on a separate line, delimited by a line
       break (CRLF).  For example:

       aaa,bbb,ccc CRLF
       zzz,yyy,xxx CRLF

   2.  The last record in the file may or may not have an ending line
       break.  For example:

       aaa,bbb,ccc CRLF
       zzz,yyy,xxx

   3.  There maybe an optional header line appearing as the first line
       of the file with the same format as normal record lines.  This
       header will contain names corresponding to the fields in the file
       and should contain the same number of fields as the records in
       the rest of the file (the presence or absence of the header line
       should be indicated via the optional "header" parameter of this
       MIME type).  For example:

       field_name,field_name,field_name CRLF
       aaa,bbb,ccc CRLF
       zzz,yyy,xxx CRLF

   4.  Within the header and each record, there may be one or more
       fields, separated by commas.  Each line should contain the same
       number of fields throughout the file.  Spaces are considered part
       of a field and should not be ignored.  The last field in the
       record must not be followed by a comma.  For example:

       aaa,bbb,ccc

   5.  Each field may or may not be enclosed in double quotes (however
       some programs, such as Microsoft Excel, do not use double quotes
       at all).  If fields are not enclosed with double quotes, then
       double quotes may not appear inside the fields.  For example:

       "aaa","bbb","ccc" CRLF
       zzz,yyy,xxx

   6.  Fields containing line breaks (CRLF), double quotes, and commas
       should be enclosed in double-quotes.  For example:

       "aaa","b CRLF
       bb","ccc" CRLF
       zzz,yyy,xxx

   7.  If double-quotes are used to enclose fields, then a double-quote
       appearing inside a field must be escaped by preceding it with
       another double quote.  For example:

       "aaa","b""bb","ccc"
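Rules 5 to 7 also describe the reverse direction, which is handy for generating test input. A minimal sketch of an encoder, using the hypothetical names `csv_encode_row()` and `csv_encode()` (not from the thread): it encloses a field only when it contains the delimiter, the enclosure, CR or LF, doubles any enclosure inside it, and joins records with CRLF as in rule 1.

```php
<?php
// Hypothetical helper: encode one record following RFC 4180 rules 5 to 7.
function csv_encode_row(array $fields, string $d = ',', string $e = '"'): string
{
    $out = [];
    foreach ($fields as $field) {
        $field = (string) $field; // numbers etc. are written as strings
        // Enclose only fields that contain the delimiter, enclosure, CR or LF
        if (strpbrk($field, $d . $e . "\r\n") !== false) {
            // Double any enclosure inside the field (rule 7)
            $field = $e . str_replace($e, $e . $e, $field) . $e;
        }
        $out[] = $field;
    }
    return implode($d, $out);
}

// Hypothetical helper: records separated by CRLF (rule 1), with no line
// break after the last record (optional per rule 2).
function csv_encode(array $rows, string $d = ',', string $e = '"'): string
{
    return implode("\r\n", array_map(function ($row) use ($d, $e) {
        return csv_encode_row($row, $d, $e);
    }, $rows));
}
```

For example, `csv_encode_row(['aaa', 'b"bb', 'ccc'])` gives `aaa,"b""bb",ccc`, the same record as the rule 7 example except that fields without special characters are left unquoted, which rule 5 allows.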



User comment from the manual


do not spam aleske at live dot ru 08-Jul-2010 09:38
The PHP's CSV handling stuff is non-standard and contradicts with RFC4180, thus fgetcsv() cannot properly deal with files like this example from Wikipedia: 

1997,Ford,E350,"ac, abs, moon",3000.00 
1999,Chevy,"Venture ""Extended Edition""","",4900.00 
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00 
1996,Jeep,Grand Cherokee,"MUST SELL! 
air, moon roof, loaded",4799.00 


His code sample wasn't quite as elegant as mine, and most other examples call preg_match() in a loop. I just want to make sure everything in mine is solid.
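For comparison, the preg_match()-in-a-loop style mentioned above can be sketched as follows. This is a hypothetical illustration, not the code from the manual comment: `csv_explode_loop()` is a made-up name, and it assumes a single-character delimiter and enclosure and LF or CRLF line breaks. Each call is anchored with `\G` at the offset where the previous cell ended, so malformed input stops the loop instead of being skipped over, and a trailing delimiter naturally yields a final empty cell.

```php
<?php
// Hypothetical sketch of the preg_match()-in-a-loop approach. Assumes
// single-character $d and $e that are not line-break characters.
function csv_explode_loop(string $str, string $d = ',', string $e = '"'): array
{
    if ($str === '') {
        return [];
    }
    $qd = preg_quote($d, '/');
    $qe = preg_quote($e, '/');
    // Group 1: quoted body, group 2: unquoted cell, group 3: what ended it.
    // \G pins each match to $offset; /D makes $ match only at the very end.
    $pattern = '/\G(?:' . $qe . '((?:[^' . $qe . ']|' . $qe . $qe . ')*+)' . $qe
             . '|([^' . $qd . $qe . '\r\n]*+))(' . $qd . '|\r?\n|$)/D';
    $len = strlen($str);
    $offset = 0;
    $rows = [];
    $row = [];
    while (true) {
        if (preg_match($pattern, $str, $m, 0, $offset) !== 1) {
            throw new InvalidArgumentException("Malformed CSV at byte $offset");
        }
        $offset += strlen($m[0]);
        // Unquoted cells cannot start with the enclosure, so this tells the
        // two branches apart (groups 1 and 2 are always present here)
        $row[] = ($m[0] !== '' && $m[0][0] === $e)
            ? str_replace($e . $e, $e, $m[1])
            : $m[2];
        if ($m[3] === $d) {
            continue; // another cell follows on the same record
        }
        $rows[] = $row; // a line break or the end of input closes the record
        $row = [];
        if ($m[3] === '' || $offset >= $len) {
            return $rows; // done; a final line break is optional (rule 2)
        }
    }
}
```

The trade-off is one preg_match() call per cell instead of a single preg_match_all(), in exchange for explicit control over where each record ends.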


Running the example I posted turned up an error: empty quoted fields ("") were throwing a notice. Fixed :D
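The notice comes from how PHP reports capture groups by default (without PREG_UNMATCHED_AS_NULL): a group that did not take part in the match is reported as an empty string if a later group matched, but a trailing unmatched group is left out of the array entirely. A small demo with a simplified two-branch pattern (an assumption, modeled on the quoted/unquoted alternation in the function) shows that a quoted `""` cell has no index 2 at all, and that `empty()` is also true for a quoted `"0"` cell.

```php
<?php
// Simplified cell pattern for the demo: group 1 = quoted body,
// group 2 = unquoted cell.
$pattern = '/"((?:[^"]|"")*+)"|([^,\n]*+)/';

preg_match($pattern, '""', $emptyQuoted); // ['""', '']: no index 2 at all
preg_match($pattern, '"0"', $zeroQuoted); // ['"0"', '0']: no index 2 either
preg_match($pattern, 'abc', $unquoted);   // ['abc', '', 'abc']

// empty() is true for both '' and '0', so an empty($m[1]) test treats both
// quoted cells as unquoted and then reads the missing $m[2].
$looksUnquoted = [empty($emptyQuoted[1]), empty($zeroQuoted[1])]; // [true, true]

// Whether index 2 exists is what actually separates the two branches.
$hasGroup2 = [isset($emptyQuoted[2]), isset($zeroQuoted[2]), isset($unquoted[2])]; // [false, false, true]
```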
