The CSV ("Comma Separated Values") file format is often used to exchange data between differently similar applications. The CSV file format is useable by KSpread, OpenOffice Calc and Microsoft Excel spread-sheet applications. Many other applications support CSV in some fashion, to import or export data. CSV files have become obsolete due to XML data exchange possibilities (ie ODF, SOAP).
The CSV Format
- Each record is one line - Line separator may be LF (0x0A) or CRLF (0x0D0A), a line separator may also be embedded in the data (making a record more than one line but still acceptable).
- Fields are separated with commas. - Duh.
- Leading and trailing whitespace is ignored - Unless the field is delimited with double-quotes in that case the whitespace is preserved.
- Embedded commas - Field must be delimited with double-quotes.
- Embedded double-quotes - Embedded double-quote characters must be doubled, and the field must be delimited with double-quotes.
- Embedded line-breaks - Fields must be surounded by double-quotes.
- Always Delimiting - Fields may always be delimited with double quotes, the delimiters will be parsed and discarded by the reading applications.
CSV Files and Leading Zeros on Numeric Fields
Sometimes leading zero values are required in a data set and while the leading zeros are present in the data they are not displayed. In some software it's possible to force strict interpretation of the CSV field value with a leading = (equal) symbol.
This may chop the leading zero on some softwares, even if quoted.
0306703,0035866,NO_ACTION,06/19/2006 0086003,"0005866",UPDATED,06/19/2006
This incantation may convince that software to keep the leading zero.
="0306703",="0035866",NO_ACTION,06/19/2006 ="0086003",="0005866",UPDATED,06/19/2006
Acceptable CSV Mime Types
Sadly there is no definitive standard for this, here is a collection of types we've seen in use.
- application/octet-stream
- text/comma-separated-values - this is best
- text/csv
CSV Examples
Here are some examples that demonstrate the rules above. Each sample describes the data and how the reading application should interpret it.
Standard Line
This shows three fields, each with simple data.
Edoceo, Seattle, WA
Whitespace
The first field should be interpreted by reading applications as [space]Edoceo[comma][space]Inc.[space]. Whitespace also could include line breaks.
" Edoceo, Inc. ",Seattle,WA
Embedded Commas
The first field should be interpreted by reading applications as Edoceo[comma][space]Inc.
"Edoceo, Inc.",Seattle,WA