  • name: What's in data.gov

  • description: statistical survey of the data.gov datasets
  • creator(s): Li Ding, Dominic DiFranzo
  • created: June 26, 2009
  • modified: May 19, 2010



This article presents the current status of the datasets indexed and converted at http://data-gov.tw.rpi.edu. The datasets can be roughly partitioned as follows:

Original Data.gov Datasets

Information about Data.gov datasets comes from the data.gov catalog, a special Data.gov dataset containing the catalog metadata about data.gov datasets, and from some conversion results. You may browse its RDF version in Tabulator by following this link.

Live and Deleted Datasets

Historically, there were 6367 datasets in Category:Data.gov Dataset; however, 7 were deleted after being published (see Category:Data.gov Deleted Dataset), and the remaining 6360 are still live (see Category:Data.gov Live Dataset). In what follows, we count only live data.gov datasets.

Format of datasets' access points

Data.gov lists 4431 data files as the access points of the datasets.

Tag cloud of keywords

A tag cloud can be generated from all keywords of the data.gov datasets using a SPARQL query, and here is the query result.

Snapshot of the tag cloud as of April 7, 2010
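
As an illustration of the step from keyword list to tag cloud, here is a minimal Python sketch. It assumes the keywords have already been retrieved (for example, as rows returned by the SPARQL query) and scales each keyword's font size linearly with its frequency. The function name `tag_cloud_sizes`, the size range, and the sample keywords are hypothetical and are not taken from the site's actual implementation.

```python
from collections import Counter


def tag_cloud_sizes(keywords, min_size=10, max_size=36):
    """Map each keyword to a font size scaled linearly by its frequency."""
    # Normalize case and whitespace so "Water" and "water" count together.
    counts = Counter(k.strip().lower() for k in keywords if k.strip())
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for word, n in counts.items():
        # If every keyword is equally frequent, use the smallest size for all.
        scale = (n - lo) / span if span else 0.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes


# Hypothetical keyword values standing in for the query result.
sample = ["water", "air", "Water", "toxics", "water", "air"]
print(tag_cloud_sizes(sample))
```

Linear scaling is only one option; a logarithmic scale may be preferable when a few keywords dominate the counts.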

Sources of datasets

The datasets are contributed by 74 US government agencies. The following are the top 10 agencies by number of datasets contributed.

  1. Environmental Protection Agency (1,775)
  2. Department of Defense (222)
  3. Department of Commerce (219)
  4. Department of the Interior (170)
  5. Department of Health and Human Services (160)
  6. Executive Office of the President (125)
  7. Department of Energy (101)
  8. Department of Justice (101)
  9. Department of Agriculture (92)
  10. Department of Veterans Affairs (88)
Agency Dataset Contribution (snapshot as of June 2009)

RDFized Datasets

There are currently 350 RDFized datasets (see Category:RDFized Dataset), including:

| Category | No. of RDF-ized datasets | No. of triples (billion) | No. of instances (million) | Notes |
|---|---|---|---|---|
| Data.gov Raw Data Catalog | 302 | 3.178 | 385.77 | covering the content of 529 out of 3132, see data |
| Data.gov Tool Catalog | 2 | 0.018 | 1.52 | covering the content of 2 out of 564, see data |
| Data.gov Deleted Data | 2 | 0.001 | 0.15 | covering the content of 2 out of 7, see data |
| Total Data.gov Data | 306 | 3.197 | 387.43 | covering the content of 533 out of 3703 |
| Other Government Data | 31 | 0.072 | 3.48 | see data |
| Non Government Data | 1 | 0 | 0.01 | see data |
| User Generated Data | 10 | 0 | 0.01 | see data |
| Total RDF-ized Data | 348 | 3.269 | 390.93 | see data |
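
The subtotal and total rows of the table above can be cross-checked with a few lines of Python. The sketch below simply re-adds the per-category figures copied from the table and derives the share of Data.gov entries whose content is covered (533 out of 3703). The variable and function names are illustrative only. Because the published figures are rounded, recomputed sums may differ from the table in the last digit.

```python
# Figures copied from the table:
# (RDF-ized datasets, triples in billions, instances in millions)
data_gov_rows = {
    "Data.gov Raw Data Catalog": (302, 3.178, 385.77),
    "Data.gov Tool Catalog": (2, 0.018, 1.52),
    "Data.gov Deleted Data": (2, 0.001, 0.15),
}
other_rows = {
    "Other Government Data": (31, 0.072, 3.48),
    "Non Government Data": (1, 0.0, 0.01),
    "User Generated Data": (10, 0.0, 0.01),
}


def column_sums(rows):
    """Add up each column, rounding to the precision used in the table."""
    datasets = sum(r[0] for r in rows.values())
    triples = round(sum(r[1] for r in rows.values()), 3)
    instances = round(sum(r[2] for r in rows.values()), 2)
    return datasets, triples, instances


data_gov_total = column_sums(data_gov_rows)
overall_total = column_sums({**data_gov_rows, **other_rows})

# Coverage of Data.gov content, from the "Notes" column: 529 + 2 + 2 covered
# out of 3132 + 564 + 7 entries.
covered = 529 + 2 + 2
available = 3132 + 564 + 7
coverage_pct = round(100 * covered / available, 1)

print("Data.gov subtotal:", data_gov_total)
print("Overall total:", overall_total)
print(f"Coverage: {covered} of {available} ({coverage_pct}%)")
```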

Facts about What's in data.gov
Dcterms:created: 26 June 2009
Dcterms:creator: Li Ding and Dominic DiFranzo
Dcterms:description: statistical survey of the data.gov datasets
Foaf:name: What's in data.gov
Skos:altLabel: What's in data.gov, what's in data.gov, and WHAT'S IN DATA.GOV
