What's in data.gov
From Data-gov Wiki
Overview
This article presents the current status of the datasets indexed and converted at http://data-gov.tw.rpi.edu. The datasets can be roughly partitioned as follows:
- Data.gov datasets: there are 6360 live datasets published at data.gov, including datasets from the Raw Data Catalog (3132), datasets from the Tool Catalog (564), and some specially categorized or misclassified datasets. The geo-datasets from Data.gov are not counted here.
- Data.gov OGD datasets: of these, 279 are published as Open Government Directive Agency Datasets, and 138 of those (see Category:Data.gov High Value OGD Dataset) are reported as high value by agencies.
- Non-Data.gov datasets: we also collected 34 other government datasets from sources other than data.gov.
Original Data.gov Datasets
Information about Data.gov datasets comes from the data.gov catalog, a special Data.gov dataset containing the catalog metadata about data.gov datasets, and from some conversion results. You may browse its RDF version using Tabulator by following this link.
Live and Deleted Datasets
Historically, there were 6367 datasets in Category:Data.gov Dataset; however, 7 (see Category:Data.gov Deleted Dataset) were deleted after being published, and the remaining 6360 (see Category:Data.gov Live Dataset) are still live. In what follows, we only count live data.gov datasets.
Format of datasets' access points
Data.gov lists 4431 data files as the access points of the datasets:
- 78 datasets publishing feeds (RSS, ATOM): SPARQL results
- 2259 datasets publishing csv/txt: SPARQL results
- 244 datasets publishing xml: SPARQL results
- 430 datasets publishing xls (MS Excel): SPARQL results
- 19 datasets publishing kml or kmz: SPARQL results
- 140 datasets publishing ESRI shape format: SPARQL results
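The format groups above can be illustrated with a small Python sketch that buckets access-point URLs by file extension. This is not the method used on this wiki, which obtains its counts through SPARQL queries; the extension mapping and the names `FORMAT_GROUPS`, `classify_access_point` and `count_formats` are assumptions made for illustration. Note also that the sketch counts files, whereas the list above counts datasets.

```python
from collections import Counter
from urllib.parse import urlparse
import posixpath

# Hypothetical mapping from file extension to the format groups listed above.
FORMAT_GROUPS = {
    ".rss": "feed", ".atom": "feed",
    ".csv": "csv/txt", ".txt": "csv/txt",
    ".xml": "xml",
    ".xls": "xls",
    ".kml": "kml/kmz", ".kmz": "kml/kmz",
    ".shp": "esri shape",
}


def classify_access_point(url: str) -> str:
    """Return the format group of one access-point URL, or "other"."""
    # Use only the path, so query strings do not hide the extension.
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lower()
    return FORMAT_GROUPS.get(ext, "other")


def count_formats(urls):
    """Count access-point URLs per format group."""
    return Counter(classify_access_point(u) for u in urls)
```

Feeds and ESRI shape archives are often served without a telling extension (for example as `.zip` files), so a real classifier would probably need the catalog's own format metadata rather than the URL alone.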
Tag cloud of keywords
A tag cloud can be generated from all keywords of the data.gov datasets using a SPARQL query; here is the query result.
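Once the keywords have been retrieved, the aggregation behind a tag cloud is simple. The following Python sketch assumes the keywords are already available as one list per dataset; the function names, the case-insensitive normalization, and the font-size range are illustrative assumptions, not the wiki's actual implementation.

```python
from collections import Counter


def keyword_counts(keyword_lists):
    """Count in how many datasets each keyword appears."""
    counts = Counter()
    for keywords in keyword_lists:
        # Count each keyword once per dataset, ignoring case and stray whitespace.
        counts.update({k.strip().lower() for k in keywords if k.strip()})
    return counts


def font_sizes(counts, min_size=10, max_size=36):
    """Scale keyword counts linearly to font sizes between min_size and max_size."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for word, n in counts.items():
        # If all keywords are equally frequent, give them all the largest size.
        scale = (n - lo) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes
```

Linear scaling is the simplest choice; when a few keywords dominate, a logarithmic scale usually gives a more readable cloud.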
Sources of datasets
The datasets are contributed by 74 US government agencies. The following are the top 10 agencies by number of datasets contributed.
- Environmental Protection Agency (1,775)
- Department of Defense (222)
- Department of Commerce (219)
- Department of the Interior (170)
- Department of Health and Human Services (160)
- Executive Office of the President (125)
- Department of Energy (101)
- Department of Justice (101)
- Department of Agriculture (92)
- Department of Veterans Affairs (88)
- more...
RDFized Datasets
There are currently 350 datasets in Category:RDFized Dataset, including:
- 339 from Category:Converted Dataset
- 305 from Category:Data.gov Live Dataset
- 2 from Category:Data.gov Deleted Dataset
- 31 from Category:Other Government Dataset
- 1 from Category:Non-Government Dataset
- 11 from Category:User Generated Dataset
| category | no. of RDF-ized datasets | no. of triples (Billion) | no. of instances (Million) | notes |
|---|---|---|---|---|
| Data.gov Raw Data Catalog | 302 | 3.178 | 385.77 | covering the content of 529 out of 3132, see data |
| Data.gov Tool Catalog | 2 | 0.018 | 1.52 | covering the content of 2 out of 564, see data |
| Data.gov Deleted Data | 2 | 0.001 | 0.15 | covering the content of 2 out of 7, see data |
| Total Data.gov Data | 306 | 3.197 | 387.43 | covering the content of 533 out of 3703 |
| Other Government Data | 31 | 0.072 | 3.48 | see data |
| Non Government Data | 1 | 0 | 0.01 | see data |
| User Generated Data | 10 | 0 | 0.01 | see data |
| Total RDF-ized Data | 348 | 3.269 | 390.93 | see data |
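The total rows of the table can be re-derived from the per-category rows. The short Python sketch below copies the dataset counts from the table and recomputes the totals and the coverage ratio; the variable and function names are chosen here for illustration only.

```python
# (RDF-ized datasets, source datasets covered, source datasets in category),
# copied from the table above.
DATA_GOV_ROWS = {
    "Raw Data Catalog": (302, 529, 3132),
    "Tool Catalog": (2, 2, 564),
    "Deleted Data": (2, 2, 7),
}

# RDF-ized dataset counts for the non-Data.gov rows of the table.
OTHER_ROWS = {
    "Other Government Data": 31,
    "Non Government Data": 1,
    "User Generated Data": 10,
}


def totals(rows):
    """Sum RDF-ized, covered, and available dataset counts over all rows."""
    rdfized = sum(r[0] for r in rows.values())
    covered = sum(r[1] for r in rows.values())
    available = sum(r[2] for r in rows.values())
    return rdfized, covered, available


def coverage_percent(covered, available):
    """Share of source datasets whose content has been converted, in percent."""
    return 100.0 * covered / available


rdfized, covered, available = totals(DATA_GOV_ROWS)
grand_total = rdfized + sum(OTHER_ROWS.values())
```

With the figures above, the Data.gov rows sum to 306 RDF-ized datasets covering 533 of 3703 source datasets (about 14.4%), and adding the remaining rows gives the 348 reported in the last line of the table.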

| Dcterms:created | 26 June 2009 |
| Dcterms:creator | Li Ding, and Dominic DiFranzo |
| Dcterms:description | statistical survey of the data.gov datasets |
| Dcterms:modified | 2010-5-19 |
| Foaf:name | What's in data.gov |
| Skos:altLabel | What's in data.gov, what's in data.gov, and WHAT'S IN DATA.GOV |

