Issue Reports

We maintain a list of issue reports that reflects our latest findings in the datasets.

(Note: A related subject is the User Experience Issues page, which covers how users are reacting to the current presentation of the data catalog.)

While translating data into RDF, we have discovered some issues with the published datasets. These issues can be roughly categorized as follows:

Issues with Raw Data

Duplicated Datasets

  • The EPA publishes both nationwide and state-level datasets for the Toxics Release Inventory. The national data files covering all US states and territories (Dataset 191 for 2005, Dataset 249 for 2006, and Dataset 307 for 2007) are supersets of the 171 state- and territory-specific datasets (3 years × 57 states and territories).
  • Dataset 59 is a subset of Dataset 10, containing only columns with general information and data specific to energy consumption. Dataset 10 contains all the data from the Residential Energy Consumption Survey, conducted by the Energy Information Administration.
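A superset/subset relationship of this kind can be checked mechanically. The following is a minimal sketch, not the procedure we actually used: the function names are hypothetical, it assumes both files are CSV with a header row and that matching columns carry identical names and string values, and it ignores how many times a row is repeated.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into (header, rows), where each row is a dict."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames or [], rows


def is_row_subset(small_text, large_text):
    """Return True if every row of the smaller CSV appears in the larger one.

    The larger file is projected onto the smaller file's columns first, so
    the check covers both a row subset (state file vs. national file) and
    a column subset (a file that keeps only some columns).
    """
    small_header, small_rows = read_rows(small_text)
    large_header, large_rows = read_rows(large_text)
    # Every column of the smaller file must exist in the larger file.
    if not set(small_header) <= set(large_header):
        return False
    projected = {tuple(row[col] for col in small_header) for row in large_rows}
    return all(
        tuple(row[col] for col in small_header) in projected
        for row in small_rows
    )
```

For example, passing the text of one state file and the matching national file for the same year should return True if the state rows are repeated unchanged in the national file.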

Formatting Issues

According to the data catalog, CSV/TXT files are provided to "Use... for easy access to the data. [They] could be opened by most desktop spreadsheet applications". However, we found many CSV/TXT links in the catalog that do not meet this criterion. What follows is a detailed categorization of the data formats we have encountered:
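As a rough illustration of the criterion, the sketch below tests whether a file's text parses as a simple delimited table, which is what a spreadsheet application would expect. It is a heuristic under stated assumptions, not the classification we applied: the function name and the list of candidate delimiters are hypothetical, and a file counts as tabular only if every non-blank line splits into the same number of fields (more than one).

```python
import csv


def detect_delimiter(text, candidates=(",", "\t", ";", "|")):
    """Return the first delimiter that yields a consistent table, else None."""
    # Ignore blank lines; a table needs at least a header and one data row.
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        return None
    for delim in candidates:
        rows = list(csv.reader(lines, delimiter=delim))
        width = len(rows[0])
        # Require more than one column and the same width on every row.
        if width > 1 and all(len(row) == width for row in rows):
            return delim
    return None
```

A return value of None flags a file for manual inspection. Because the check looks only at field counts, free text that happens to contain one comma per line would still pass.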

Access Point Issues

Besides the data format, the URLs given as CSV access points sometimes introduced additional complications. What follows is a detailed categorization of the access point issues we have encountered:


