What's in data.gov

From Data-gov Wiki

Infobox (Data-gov Insight)
  • name: What's in data.gov

  • description: statistical survey of the data.gov datasets
  • creator(s): Li Ding, Dominic DiFranzo
  • created: June 26, 2009
  • modified: 2010-5-19



This article presents the current status of the datasets indexed and converted at http://data-gov.tw.rpi.edu. The datasets can be roughly partitioned as follows:

Original Data.gov Datasets

Information about the Data.gov datasets comes from the data.gov catalog, a special Data.gov dataset that contains the catalog metadata for all data.gov datasets, together with some conversion results. You may browse its RDF version using Tabulator by following this link.

Live and Deleted Datasets

Historically, 6569 datasets have been listed in Category:Data.gov Dataset. Of these, 7 (see Category:Data.gov Deleted Dataset) were deleted after being published, and the remaining 6562 (see Category:Data.gov Live Dataset) are still live. In what follows, we count only live data.gov datasets.

Format of datasets' access points

Data.gov lists 4541 data files as the access points of the datasets.

Tag cloud of keywords

A tag cloud can be generated from the keywords of all data.gov datasets with a single SPARQL query; the snapshot below shows the query result.
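As a rough illustration of the step after the query, the sketch below turns per-dataset keyword lists into tag cloud font sizes. It is not the wiki's actual implementation: the function name `tag_cloud_weights`, the log-scaled sizing, the size range, and the sample keywords are all assumptions made for this example, and the SPARQL query itself is not reproduced here.

```python
from collections import Counter
import math


def tag_cloud_weights(keyword_lists, min_size=10, max_size=40):
    """Map each keyword to a font size scaled by the log of its frequency.

    keyword_lists: one list of keyword strings per dataset (hypothetical
    input shape; the real query result format is not given in the article).
    """
    # Count keywords case-insensitively and ignore empty strings.
    counts = Counter(
        kw.strip().lower()
        for keywords in keyword_lists
        for kw in keywords
        if kw.strip()
    )
    if not counts:
        return {}

    low = math.log(min(counts.values()))
    high = math.log(max(counts.values()))
    span = high - low

    sizes = {}
    for kw, n in counts.items():
        # If every keyword is equally frequent, use the middle of the range.
        frac = (math.log(n) - low) / span if span else 0.5
        sizes[kw] = round(min_size + frac * (max_size - min_size))
    return sizes


# Made-up keywords for three datasets, for illustration only.
sample = [["air", "water"], ["Air", "toxics"], ["air"]]
weights = tag_cloud_weights(sample)
```

Log scaling is one common choice for tag clouds because a few very frequent keywords would otherwise dwarf the rest; a linear scale would work with the same structure.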

Snapshot Tag Cloud as of April 7, 2010

Sources of datasets

The datasets are contributed by 83 US government agencies. The 10 agencies that contributed the most datasets are listed below, with the number of datasets in parentheses.

  1. Environmental Protection Agency (1,808)
  2. Department of Defense (263)
  3. Department of Commerce (244)
  4. Department of the Interior (171)
  5. Department of Health and Human Services (163)
  6. Department of Energy (117)
  7. Executive Office of the President (116)
  8. Department of Agriculture (99)
  9. Department of Justice (98)
  10. Department of the Treasury (93)
Agency Dataset Contribution (snapshot as of June 2009)

RDFized Datasets

There are currently 375 datasets in Category:RDFized Dataset, including:

category                    RDF-ized datasets   triples (billion)   instances (million)   notes
Data.gov Raw Data Catalog   323                 5.786               416.91                covering the content of 574 out of 3237, see data
Data.gov Tool Catalog       1                   0.018               1.52                  covering the content of 1 out of 608, see data
Data.gov Deleted Data       2                   0.001               0.15                  covering the content of 2 out of 7, see data
Total Data.gov Data         326                 5.805               418.57                covering the content of 577 out of 3852
Other Government Data       33                  0.084               4.5                   see data
Non Government Data         1                   0                   0.01                  see data
User Generated Data         14                  0                   0.02                  see data
Total RDF-ized Data         374                 5.889               423.11                see data
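The totals in the table above can be cross-checked with a few lines of arithmetic. The sketch below copies the per-category figures from the table and recomputes the subtotals and the coverage ratio; the variable and function names are illustrative only and are not part of the wiki.

```python
# Figures copied from the table:
# (RDF-ized datasets, triples in billions, instances in millions)
data_gov_rows = {
    "Data.gov Raw Data Catalog": (323, 5.786, 416.91),
    "Data.gov Tool Catalog": (1, 0.018, 1.52),
    "Data.gov Deleted Data": (2, 0.001, 0.15),
}
other_rows = {
    "Other Government Data": (33, 0.084, 4.5),
    "Non Government Data": (1, 0, 0.01),
    "User Generated Data": (14, 0, 0.02),
}


def column_sums(rows):
    """Sum each column across the rows, rounded to three decimals."""
    return tuple(round(sum(col), 3) for col in zip(*rows.values()))


data_gov_total = column_sums(data_gov_rows)
grand_total = column_sums({**data_gov_rows, **other_rows})

# Share of Data.gov entries whose content is covered by RDF-ized datasets,
# using the "577 out of 3852" figure from the notes column.
coverage = 577 / 3852

print(data_gov_total, grand_total, f"{coverage:.1%}")
```

Summing the rows reproduces the dataset and triple totals and the grand total of instances. The Data.gov instance subtotal comes out 0.01 million higher than the 418.57 shown in the table, which is consistent with rounding of the per-row figures. The dataset counts sum to 374, one fewer than the 375 quoted in the text; the source does not explain the difference. Coverage works out to roughly 15% of the 3852 Data.gov entries.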
