The CSV ("Comma Separated Values") file format is often used to exchange data between differently similar applications. The CSV file format is useable by KSpread, OpenOffice Calc and Microsoft Excel spread-sheet applications. Many other applications support CSV in some fashion, to import or export data. CSV files have become obsolete due to XML data exchange possibilities (ie ODF, SOAP).
The CSV Format
- Each record is one line - Line separator may be LF (0x0A) or CRLF (0x0D0A), a line separator may also be embedded in the data (making a record more than one line but still acceptable).
- Fields are separated with commas. - Duh.
- Leading and trailing whitespace is ignored - Unless the field is delimited with double-quotes in that case the whitespace is preserved.
- Embedded commas - Field must be delimited with double-quotes.
- Embedded double-quotes - Embedded double-quote characters must be doubled, and the field must be delimited with double-quotes.
- Embedded line-breaks - Fields must be surounded by double-quotes.
- Always Delimiting - Fields may always be delimited with double quotes, the delimiters will be parsed and discarded by the reading applications.
CSV Files and Leading Zeros on Numeric Fields
Sometimes leading zero values are required in a data set and while the leading zeros are present in the data they are not displayed. In some software it's possible to force strict interpretation of the CSV field value with a leading = (equal) symbol.
This may chop the leading zero on some softwares, even if quoted.
0306703,0035866,NO_ACTION,06/19/2006 0086003,"0005866",UPDATED,06/19/2006
This incantation may convince that software to keep the leading zero.
="0306703",="0035866",NO_ACTION,06/19/2006 ="0086003",="0005866",UPDATED,06/19/2006
Acceptable CSV Mime Types
Sadly there is no definitive standard for this, here is a collection of types we've seen in use.
- application/octet-stream
- text/comma-separated-values - this is best
- text/csv
CSV Examples
Here are some examples that demonstrate the rules above. Each sample describes the data and how the reading application should interpret it.
Standard Line
This shows three fields, each with simple data.
Edoceo, Seattle, WA
Whitespace
The first field should be interpreted by reading applications as [space]Edoceo[comma][space]Inc.[space]. Whitespace also could include line breaks.
" Edoceo, Inc. ",Seattle,WA
Embedded Commas
The first field should be interpreted by reading applications as Edoceo[comma][space]Inc.
"Edoceo, Inc.",Seattle,WA
Notes & Comments
As an IT consulting firm, you might want to review pages like this before associating them with your name. It wasn't even spell-checked, and the use of phrases like "between differently similar applications" might reduce confidence in your technical abilities. Just offering some helpful input./A. Friend on 28 Jan 2010
Thanks for the tip, typos corrected. We like "differently similar" - it defines two apps for similar purpose, w/different data-sets - like exporting contacts from Outlook to import to some Mail-Bombing (eg Constant Contact) tools./edoceo on 31 Jan 2010
sepErated should be sepArated
text/csv is a standard mime type -- see http://tools.ietf.org/html/rfc4180/Jerry on 01 Feb 2010
Thanks; that RFC references us - neat!/edoceo on 01 Feb 2010
I'm finding that ="value,value2" displays the equal sign and the quotes and puts the value2" into the next cell./Bill on 18 Feb 2010
You say (1) leading trailing whitespace is ignored, and (2) always delimiting, but a file like:
"this", "is", "a", "test"
will be read in by Excel containing all the quotes, because there are spaces after the commas. If you remove all extraneous whitespace, then the quotes will be removed.
You two comments taken together seem to suggest otherwise./Arif Zaman on 14 May 2010


Visitor Maps