Comma Separated Values (CSV) Standard File Format
The CSV ("Comma Separated Values") file format is often used to exchange data between dissimilar applications.
The CSV file format is usable by the KSpread, OpenOffice Calc and Microsoft Excel spreadsheet applications.
Many other applications support CSV in some fashion, to import or export data.
CSV files have become somewhat obsolete due to XML data exchange possibilities (e.g. ODF, SOAP), JSON and gRPC.
The CSV Format
- Each record is one line: The line separator may be LF (0x0A) or CRLF (0x0D 0x0A). A line separator may also be embedded in the data, which makes the record span more than one line; this is still acceptable.
- Fields are separated with commas: However, it's not uncommon to see the comma (, [0x2c]) replaced with a tab ([0x09]), semi-colon (; [0x3b]) or pipe (| [0x7c]).
- Quote wrapping: When a field contains exotic characters, such as a comma, quote or new line (or anything really), it must be wrapped with " [0x22].
- Equal prefix: In some cases a quote-wrapped field may be prefixed with an equal symbol (= [0x3d]) to provide an even stronger hint to the receiving software to interpret the field value literally.
- Leading and trailing whitespace is ignored: Unless the field is wrapped with double-quotes (" [0x22]); in that case the whitespace is preserved.
- Embedded commas: The field must be wrapped with double-quotes.
- Embedded double-quotes: Embedded double-quote characters must be doubled, and the field must be delimited with double-quotes.
- Embedded line-breaks: The field must be wrapped with double-quotes.
- Always wrapped: Fields may always be wrapped with double-quotes; the reading application should parse the wrapping quotes and discard them.
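The rules above can be sketched as a small writer and reader. This is a minimal illustration in Python, not a reference implementation: the names quote_field and parse_csv are made up for this example, and edge cases such as text following a closing quote are handled in a simplistic way.

```python
def quote_field(value, delimiter=","):
    """Wrap a field in double-quotes when the rules above require it."""
    needs_quotes = (
        delimiter in value
        or '"' in value
        or "\n" in value
        or "\r" in value
        or value != value.strip()  # whitespace survives only inside quotes
    )
    if not needs_quotes:
        return value
    # Embedded double-quotes are doubled, then the field is wrapped.
    return '"' + value.replace('"', '""') + '"'


def parse_csv(text, delimiter=","):
    """Split CSV text into a list of records (each a list of field strings)."""
    records, record = [], []
    chars = []         # characters of the field being built
    quoted = False     # the current field was wrapped in double-quotes
    in_quotes = False  # we are currently between the wrapping quotes

    def end_field():
        nonlocal chars, quoted
        value = "".join(chars)
        # Unquoted fields lose leading and trailing whitespace.
        record.append(value if quoted else value.strip())
        chars, quoted = [], False

    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if in_quotes:
            if c == '"':
                if i + 1 < n and text[i + 1] == '"':
                    chars.append('"')  # doubled quote -> one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                chars.append(c)        # commas and line breaks are data here
        elif c == '"' and not quoted and not "".join(chars).strip():
            # Opening quote: drop any whitespace seen before it.
            chars, quoted, in_quotes = [], True, True
        elif c == delimiter:
            end_field()
        elif c in "\r\n":
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                 # treat CRLF as a single separator
            end_field()
            records.append(record)
            record = []
        elif not (quoted and c.isspace()):
            chars.append(c)            # whitespace after a closing quote is skipped
        i += 1
    if chars or quoted or record:      # last record without a trailing line break
        end_field()
        records.append(record)
    return records


fields = [" padded ", "a,b", 'say "hi"', "two\nlines", "plain"]
line = ",".join(quote_field(f) for f in fields)
print(line)
print(parse_csv(line) == [fields])
```

Passing delimiter="\t", ";" or "|" to both functions covers the alternative separators mentioned above.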
CSV Files and Leading Zeros on Numeric Fields
Sometimes leading zeros are required in a data set, and while the zeros are present in the file, the reading application does not display them. In some software it's possible to force strict interpretation of the CSV field value with a leading = (equal) symbol.
Some software may chop the leading zeros from the following data, even when the field is quoted.
0306703,0035866,NO_ACTION,06/19/2006
0086003,"0005866",UPDATED,06/19/2006
This incantation may convince that software to keep the leading zero.
="0306703",="0035866",NO_ACTION,06/19/2006
="0086003",="0005866",UPDATED,06/19/2006
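As a rough sketch of how a program might produce and undo the equal prefix, the Python below builds the first record shown above. The helper names as_literal and strip_literal are hypothetical, and the code assumes the prefixed values contain no commas or line breaks.

```python
import re

# Matches a whole field of the form ="...", the literal-value hint.
_LITERAL = re.compile(r'^="(.*)"$', re.DOTALL)


def as_literal(value):
    """Write value as ="value" so the reading software is hinted to keep it verbatim."""
    return '="' + value.replace('"', '""') + '"'


def strip_literal(field):
    """Undo as_literal(); fields without the prefix are returned unchanged."""
    match = _LITERAL.match(field)
    if match is None:
        return field
    return match.group(1).replace('""', '"')


row = ["0306703", "0035866", "NO_ACTION", "06/19/2006"]
# Only the zero-padded identifiers get the prefix (a choice made for this example).
line = ",".join([as_literal(row[0]), as_literal(row[1]), row[2], row[3]])
print(line)  # ="0306703",="0035866",NO_ACTION,06/19/2006

# Treating the field as a number is what loses the zeros in the first place.
print(str(int("0035866")))  # 35866
```

A reader that does not understand the prefix sees the = and the quotes as part of the data, so a consumer written in code would need something like strip_literal to recover the original value.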
Acceptable CSV MIME Types
When this was first written there was no definitive standard for this; here is a collection of types we've seen in use.
- application/octet-stream
- text/comma-separated-values
- text/csv - this is best (and has become the standard since this writing)
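A receiving program might check an incoming Content-Type header against this list. The following Python sketch is only an illustration: the function name looks_like_csv is made up, and accepting application/octet-stream as CSV is an assumption taken from the list above rather than a general recommendation.

```python
# Types taken from the list above.
CSV_TYPES = {
    "application/octet-stream",
    "text/comma-separated-values",
    "text/csv",
}


def looks_like_csv(content_type):
    """Return True when the media type (ignoring parameters) is in the list above."""
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in CSV_TYPES


print(looks_like_csv("text/csv; charset=utf-8"))
print(looks_like_csv("application/json"))
```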
CSV Examples
Here are some examples that demonstrate the rules above. Each sample describes the data and how the reading application should interpret it.
Standard Line
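This shows three fields, each with simple data. The spaces after the commas are not part of the field values, because whitespace around unquoted fields is ignored.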
Edoceo, Seattle, WA
Whitespace
Because the first field is wrapped in double-quotes, reading applications should interpret it as [space]Edoceo[comma][space]Inc.[space]. Preserved whitespace could also include line breaks.
" Edoceo, Inc. ",Seattle,WA
Embedded Commas
The first field should be interpreted by reading applications as Edoceo[comma][space]Inc.
"Edoceo, Inc.",Seattle,WA