Current Issues in data.gov

From Data-gov Wiki

Infobox (Tech Report)
  • name: Current Issues in data.gov

  • description: Not all datasets in data.gov labeled as CSV/TXT are friendly to machine consumption. Here are our findings.
  • creator(s): Sarah Magidson, Li Ding, Dominic DiFranzo
  • created: 2009-07-15
  • modified: 2010-06-29

Issue Reports

We maintain a list of issue reports that reflects our latest findings on data.gov datasets.


(total 13)

Overview

(Note: A related topic, covered on the User Experience Issues page, is how users are reacting to the current presentation of the data catalog.)

While translating data.gov data into RDF, we have discovered some issues with the published datasets. These issues can be roughly categorized as follows:

For more details, please visit http://data-gov.tw.rpi.edu/wiki/Current_Issues_in_data.gov .


Issues with Data.gov Raw Data

Duplicated Datasets

  • The EPA publishes both nationwide and per-state datasets for the Toxics Release Inventory. The national data files covering all US states and territories (Dataset 191 for 2005, Dataset 249 for 2006, and Dataset 307 for 2007) are supersets of the 171 (3*57) state- and territory-specific datasets.
  • Dataset 59 is a subset of Dataset 10: it contains only the columns with general information and the data specific to energy consumption. Dataset 10 contains all the data from the Residential Energy Consumption Survey, conducted by the Energy Information Administration.
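Both kinds of duplication above (one file containing all the rows of another, or one file containing only some of another's columns) can be checked mechanically. The following is a minimal sketch in Python, not the procedure we used. The function names are hypothetical, and it assumes that both files are well-formed CSV with a single header row and identical value formatting.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into (header, rows); assumes the first row is the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, [tuple(row) for row in reader]


def is_row_subset(small_text, big_text):
    """True if both files share a header and every row of the
    smaller file also appears in the larger one."""
    small_header, small_rows = read_rows(small_text)
    big_header, big_rows = read_rows(big_text)
    return small_header == big_header and set(small_rows) <= set(big_rows)


def is_column_subset(small_text, big_text):
    """True if every column name of the smaller file also
    appears in the larger one (row contents are not compared)."""
    small_header, _ = read_rows(small_text)
    big_header, _ = read_rows(big_text)
    return set(small_header) <= set(big_header)
```

In this sketch, a state-level file would be passed as small_text and the national file as big_text. Real files may need normalization (whitespace, number formatting) before rows compare equal.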

Formatting Issues

According to data.gov, CSV/TXT files are there to "Use... for easy access to the data. [They] could be opened by most desktop spreadsheet applications". However, we found many CSV/TXT links on data.gov that do not meet this criterion. What follows is the detailed categorization of the data formats we have encountered:
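One rough way to test a file against the spreadsheet-friendliness criterion quoted above is to check whether it parses into a rectangular table. The sketch below is only an illustration under stated assumptions: the function name looks_tabular is hypothetical, a comma delimiter is assumed by default, and a file is treated as "friendly" only if it has at least two non-empty rows, more than one column, and the same number of fields in every row.

```python
import csv
import io


def looks_tabular(text, delimiter=","):
    """Heuristic: True if the text parses as a rectangular table.

    Blank lines are ignored. Files with fewer than two rows, a
    single column, or rows of differing widths are rejected.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if len(rows) < 2:
        return False
    width = len(rows[0])
    return width > 1 and all(len(row) == width for row in rows)
```

A file that begins with a free-text title line, or that uses fixed-width columns instead of a delimiter, fails this check even though it carries a CSV/TXT label.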

Access Point Issues

Besides the format of the data, the URLs given as the CSV access points sometimes introduce additional complications. What follows is the detailed categorization of the data access point issues we have encountered:

Resources
