Writing by Peter Hilton

Publish data using the CSVW standard

Use W3C’s ‘CSV on the Web’ (CSVW) standard to make data accessible 2021-12-28 #data

Aaron Burden

  1. CSV delimiters
  2. CSV character encoding
  3. Why CSV survives
  4. RFC-4180 compliant CSV
  5. CSV on the Web (CSVW) ←
  6. Excel’s broken CSV

RFC 4180 standardised comma-separated values (CSV) in 2005, but didn’t address its data model limitations. With the exception of column headings, CSV files contain data but lack the metadata you need to use them reliably. Ten years later, the W3C CSV on the Web Working Group published standards for publishing data, described in CSV on the Web: A Primer.

CSVW extends standard CSV with the ingredients for a reliable, flexible and extensible data exchange format. Most importantly, CSVW accommodates metadata.

Document metadata

Like JSON, CSV has no syntax for comments or other metadata to describe the data in the file. As a result, when you publish CSV, you have to use external documentation to answer the kinds of questions that good code comments answer about code.

This poses less of a problem in JSON, whose flexible structure makes it easier to include metadata without making a mess of the original data. With CSV, document metadata and other documentation belongs in a separate file, much like the introduction sheets that advanced spreadsheet users tend to invent to document their multi-sheet spreadsheet workbooks, like word-processor document cover pages.

Column metadata

Whatever the data in a CSV file represents, CSV only contains text. CSV has no standard way to format numbers and dates as text, so you end up with incompatible application-specific approaches instead of reliable data interchange.

Some applications require the ability to specify column data more precisely, and validate data. Using standards such as ISO 8601 dates and times helps, but you still have to choose a specific date-time precision and format.

Data validation requires data type definitions that specify types, precision, and valid values. Similarly, internationalising text data requires content language metadata. Again, JSON’s flexibility allows more options than CSV, including using JSON Schema to ‘annotate and validate JSON documents’

CSVW

CSV on the web (CSVW) extends CSV with a specification for CSV metadata. CSVW includes:

Like other schema languages, CSVW metadata lets you choose how richly to describe the data, making it flexible for a variety of applications. In particular, while CSVW defines machine-readable metadata, its readability makes it suitable as concise human-readable documentation.

One more thing… extensibility

Ad hoc structured metadata tends to lack extensibility. Without extensibility - support for changes to the data format - software that reads it will break when the data format inevitably changes. CSVW handles this with extensible data types, schemas, transformations and other metadata.

CSVW does a few things well. If you publish CSV on the web, use the CSVW standard.

Share on BlueskyShare on XShare on LinkedIn