August 1, 2016

Characteristics of Open Data CSV Files

Abstract

This work analyzes, from a data consumer perspective, an Open Data corpus containing 200K tabular resources with a total file size of 413 GB. Our study shows that ~10% of the resources in Open Data portals are labelled as tabular data, of which only 50% can be considered CSV files. The study inspects the general shape of these tabular data, reports on the column and row distributions, and analyzes the availability of (multiple) header rows and whether a file contains multiple tables. In addition, we inspect and analyze the table column types, detect missing values, and report on the distribution of the values.
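
As an illustration of the kind of per-file profiling described above (row and column counts, column types, missing values), the following is a minimal sketch and not the study's actual tooling. The names `MISSING_TOKENS`, `infer_type`, and `profile_csv` are hypothetical, the list of missing-value tokens is an assumption, and the sketch assumes a comma delimiter and exactly one header row, which real portal files may not satisfy.

```python
import csv
import io

# Assumed set of tokens treated as missing; the study's own list is not stated here.
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "-"}


def infer_type(value):
    """Classify one cell as 'missing', 'integer', 'float', or 'string'."""
    text = value.strip()
    if text.lower() in MISSING_TOKENS:
        return "missing"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "string"


def profile_csv(text, delimiter=","):
    """Return row/column counts and per-column type tallies.

    Assumes the first row is a single header row (a simplification).
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return {"rows": 0, "columns": 0, "profile": []}
    header, body = rows[0], rows[1:]
    columns = []
    for index, name in enumerate(header):
        counts = {}
        for row in body:
            # Short or blank rows are counted as missing cells for this column.
            cell = row[index] if index < len(row) else ""
            kind = infer_type(cell)
            counts[kind] = counts.get(kind, 0) + 1
        columns.append({"name": name, "types": counts})
    return {"rows": len(body), "columns": len(header), "profile": columns}


sample = "id,price,city\n1,2.5,Vienna\n2,,Graz\n3,n/a,Linz\n"
result = profile_csv(sample)
print(result["rows"], result["columns"])
for column in result["profile"]:
    print(column["name"], column["types"])
```

A column whose tally mixes several types, or is dominated by `missing`, is the sort of signal such a profile would surface. Detecting multiple header rows or multiple tables in one file would need additional heuristics that this sketch does not attempt.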
