How to build a data-gov demo

From Data-gov Wiki


Watch this video first

Select a dataset

The easiest way to get started with the data-gov wiki is to browse the Data.gov Catalog for datasets that fit what you are interested in. If this is your first time working with SPARQL or other Semantic Web technologies, it's smart to pick a relatively straightforward dataset that does not require you to combine too many layers of data. For example, Dataset 32, the earthquake data for the past 7 days (updated every hour), is a good way to begin learning. It's interesting as a dataset, it is always current, and its graph is simple: just a list of earthquakes and their locations; nothing too tricky.

So, we'll assume you're using that set for this quick tutorial. When we look at the data catalog page for the earthquake data, there's a bit of confusion as to where to go next. If we know the data set that we're going to use, then how do we look at the data set, and how do we query it?

Reading the catalog

At first glance, the catalog can seem a bit daunting. The best way to approach it is to know which of the listed attributes matter most. Towards the bottom, Dgtwc:complete data is the attribute where the raw RDF data link is stored. In the case of the earthquake data, the raw data link is http://data-gov.tw.rpi.edu/raw/32/data-32.rdf. As an important side note, if you're not sure of the actual link to the raw data, a good guess is to take the ID number of the dataset (the property in the data-gov catalog is called Datagov id) and substitute it for both occurrences of "32" in the URL above.
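
The URL guess described above can be sketched as a small helper. This is only an illustration: the function name guessRawDataUrl is made up for this example, and the URL pattern is inferred from the Dataset 32 link, so the result is a guess that may not exist for every dataset.

```javascript
// Hypothetical helper (not part of the data-gov wiki): guesses the raw RDF URL
// for a dataset from its Datagov id, following the Dataset 32 pattern.
function guessRawDataUrl(datagovId) {
  var id = String(datagovId).trim();
  // Only plain whole numbers are accepted, since the id is used in a URL path.
  if (!/^\d+$/.test(id)) {
    throw new Error("Datagov id must be a whole number, got: " + datagovId);
  }
  // The id appears twice: once as the folder and once in the file name.
  return "http://data-gov.tw.rpi.edu/raw/" + id + "/data-" + id + ".rdf";
}

console.log(guessRawDataUrl(32));
```

If the guessed link does not resolve, fall back to the Dgtwc:complete data attribute on the dataset's catalog page.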

Along with the Dgtwc:complete data attribute, some other interesting ones are the Agency, Title, and Time Period attributes. Together, these give us a quick and dirty understanding of what the data is, what time period it covers, and which official department it comes from. These small clues provide a fair amount of context, and are likely all you need to decide whether to investigate the dataset further.

  • This is important: The data catalog is not always completely queryable. Only some data is available at any given time. Sorry. If you really think your query is correct and everything else is in order, this may well be the problem.

Start SPARQLing!

By clicking the link mentioned above (the Dgtwc:complete data attribute), we can see the huge set of data and some of its attributes. For example, some of the more immediately interesting attributes for an earthquake measurement are "datetime", "lat", "lon", "magnitude", and "region". Together these attributes answer the where, when, and how big questions, which seem to be what people really care about when we talk about earthquakes. So, let's build a SPARQL query that capitalizes on this.

Visit the SPARQL endpoint for the first time, and you'll be greeted with the following query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
SELECT ?g ?number_of_triples
WHERE 
{GRAPH ?g
{
?s a <http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#Dataset> .
?s <http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#number_of_triples> ?number_of_triples.
}
}
order by ?g
Picture of the SPARQL service when accessed via browser, no parameters passed

This query simply finds all datasets currently in the triple store and lists each one with the number of triples it has, ordered by graph name.


Remember the complete data URL? That will become the default graph we specify. The PREFIX is a shortcut for the URL under which the dataset's properties are defined. The Owl:sameAs attribute in the data catalog points to the dataset's vocabulary entry; in this case, it's http://data-gov.tw.rpi.edu/vocab/Dataset_32 for the earthquake data. The queries below use the corresponding property namespace, http://data-gov.tw.rpi.edu/vocab/p/32/, as the prefix. Let's write a basic SPARQL query to get things started:

PREFIX  dgp32: <http://data-gov.tw.rpi.edu/vocab/p/32/>
SELECT  ?s ?o
WHERE { 
  graph ?g{
    ?s dgp32:magnitude ?o.
  }
} 
LIMIT 1000

This query selects every subject that has a magnitude, along with the magnitude value. The catch is that a triple is only returned if its predicate is the magnitude attribute under that prefix. Let's make our query a little more useful, though. Let's grab the region name as well and drop the subject URL, since it isn't really useful for a visualization, which is what we're trying to build.

PREFIX  dgp32: <http://data-gov.tw.rpi.edu/vocab/p/32/>
SELECT  ?region ?magnitude
WHERE { 
  graph ?g{
    ?s dgp32:magnitude ?magnitude.
    ?s dgp32:region ?region.
  }
} 
LIMIT 1000

There are some changes here. ?magnitude and ?region are now the selected variables (no longer ?s and ?o), while ?s is still used inside the WHERE clause to tie both values to the same subject. What is returned is a listing of regions and magnitudes. Let's add the coordinates and make our final query:

PREFIX  dgp32: <http://data-gov.tw.rpi.edu/vocab/p/32/>
SELECT  ?lat ?lon ?magnitude ?region
WHERE { 
  graph ?g{
    ?s dgp32:magnitude ?magnitude .
    ?s dgp32:region ?region .
    ?s dgp32:lat ?lat .
    ?s dgp32:lon ?lon .
  }
} 
LIMIT 1000

This can actually be simplified into an equivalent statement, using ; to list several predicates for the same subject:

PREFIX  dgp32: <http://data-gov.tw.rpi.edu/vocab/p/32/>
SELECT  ?lat ?lon ?magnitude ?region
WHERE { 
  graph ?g{
    ?s dgp32:magnitude ?magnitude ;
       dgp32:region ?region ;
       dgp32:lat ?lat ;
       dgp32:lon ?lon .
  }
} 
LIMIT 1000


This will return our final list of data. Now, clicking the radio button for JSON output should give us the data in a form we can feed to Google Visualization.
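
To get a feel for what the JSON output contains, here is a minimal sketch of turning it into plain rows. It assumes the endpoint returns the standard SPARQL JSON results layout (a head.vars list plus results.bindings); the function name bindingsToRows and the sample values are made up for illustration and are not real earthquake data.

```javascript
// Hypothetical helper: flattens a SPARQL JSON result into an array of plain
// objects, one per solution, keyed by variable name.
// Assumed layout: { head: { vars: [...] }, results: { bindings: [...] } }
function bindingsToRows(sparqlJson) {
  var vars = sparqlJson.head.vars;
  return sparqlJson.results.bindings.map(function (binding) {
    var row = {};
    vars.forEach(function (name) {
      // A variable can be unbound in a solution; store null in that case.
      row[name] = binding[name] ? binding[name].value : null;
    });
    return row;
  });
}

// Made-up sample response, for illustration only.
var sample = {
  head: { vars: ["lat", "lon", "magnitude", "region"] },
  results: {
    bindings: [
      {
        lat: { type: "literal", value: "10.5" },
        lon: { type: "literal", value: "-20.25" },
        magnitude: { type: "literal", value: "4.2" },
        region: { type: "literal", value: "Example Region" }
      }
    ]
  }
};

console.log(bindingsToRows(sample));
```

Note that values in this format arrive as strings, so numeric fields such as lat and lon need to be converted before plotting.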

Come up with a concrete question

After we have done a few sample SPARQL queries, it's time to start thinking about a specific question. I can't help you out much here, because this is on you. Whatever you want to do, though, make sure it's cool. A good demo should use the datasets in a novel way; remember, the whole point of this technology is to put data to interesting uses fairly rapidly. For inspiration, look at the Demos page.

Using your data set in Google Visualization

Now, we need to decide what type of visualization we would want to use. For starters, the most reasonable thing seems like just plotting these points out on a map. Let's just go for that.

The link for the Google Visualization API for this particular example is http://code.google.com/apis/visualization/documentation/gallery/map.html. You may want to dig around a bit, and get familiarized with how the API works.

After digging through that code, you'll notice a few things. First off, our query returns four values per earthquake (latitude, longitude, magnitude, and region), but the Google map expects a DataTable (basically a two-dimensional array) with only three columns. For any row x in the data set, DataTable[x][0] should hold the latitude, DataTable[x][1] the longitude, and DataTable[x][2] a string with any other information you want to show for that point. So we have to convert our data into this layout. This is a fairly straightforward process: iterate through the data returned from the request to the SPARQL service, then place each value in its correct slot in a new DataTable row. The HTML and JavaScript for accomplishing this in our demo are as follows. We'll step through each section.

<html>
  <head>
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />

    <title>Earthquakes</title>

   <style type="text/css">
      img { border: 0; }
   </style>
       <script 
src="http://maps.google.com/maps?file=api&v=2&key=ABQIAAAAD0qlpMDIvnvezwed-ZL9tBQwH0hfZ_hMkwZkmTeIFGi2I_yWqxSTnxCDWeowfqGzTP5B2PQSHZI0zg"
 type="text/javascript"></script>
    <script type="text/javascript" src="http://www.google.com/jsapi"></script>
    <script type="text/javascript">
      google.load('visualization', '1', {'packages': ['map']});
      google.setOnLoadCallback(drawVisualization);

      var map;
      var data;

      function drawVisualization() {
        var query = new google.visualization.Query(
          'http://data-gov.tw.rpi.edu/sparql?query-option=uri&output=gvds&query-uri=' +
          encodeURIComponent("http://www.devingaffney.com/sparql_queries/working.sparql"));
        query.setQuery('');
        query.send(drawTable);
      }

      function drawTable(response) {
        var predata = response.getDataTable();
        var rownum = predata.getNumberOfRows();

        data = new google.visualization.DataTable();
        data.addColumn('number', 'lat');
        data.addColumn('number', 'lon');
        data.addColumn('string', 'region');

        for (var r = 0; r < rownum; r++) {
          // Columns follow the SELECT order of the query: lat, lon, magnitude, region.
          var latitude = predata.getValue(r, 0);
          var longitude = predata.getValue(r, 1);
          var magnitude = predata.getValue(r, 2);
          var region = predata.getValue(r, 3);
          // Build a fresh three-column row: latitude, longitude, description.
          data.addRow([latitude, longitude, region + "<br /> Magnitude: " + magnitude]);
        }

        map = new google.visualization.Map(document.getElementById('map_div'));
        map.draw(data, {showTip: true});
      }
      
    </script>
  </head>

  <body>
    <font face="arial">
      <center><h1>Earthquake Map</h1>
        <div id="map_div" style="width: 800px; height: 600px">
          <br/>
          <br/>
          <img src="http://data-gov.tw.rpi.edu/images/loading.gif" />
        </div>
        <br/>
        <br/>
      </center>
    </font>

  </body>

</html>
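
The data source URL used in drawVisualization is just the SPARQL endpoint plus a few parameters, with the location of the query file URL-encoded at the end. A minimal sketch of that step on its own follows; the endpoint and parameter names are copied from the page above, while the helper name buildDataSourceUrl and the example.org address are placeholders.

```javascript
// Endpoint and parameter names are copied from the demo page above.
var SPARQL_ENDPOINT = "http://data-gov.tw.rpi.edu/sparql";

// Hypothetical helper: builds the data source URL for a query file stored online.
function buildDataSourceUrl(queryFileUrl) {
  return SPARQL_ENDPOINT +
    "?query-option=uri&output=gvds&query-uri=" +
    // The query file's URL must be escaped so it survives as a single parameter.
    encodeURIComponent(queryFileUrl);
}

console.log(buildDataSourceUrl("http://www.example.org/my.sparql"));
```

Keeping this in one place makes it easy to point the same page at a different .sparql file later.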


Header information

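You only need to include two JavaScript files. The first loads the Google Maps API and carries your API key for authentication. The second is the Google loader (jsapi), which provides google.load and is used to pull in the visualization code that actually builds the map. To get your own API key, go here.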

<script src="http://maps.google.com/maps?file=api&v=2&key=YOUR_KEY_HERE" type="text/javascript"></script>
<script type="text/javascript" src="http://www.google.com/jsapi"></script>

Next, let's look at the actual javascript:

google.load('visualization', '1', {'packages': ['map']}); // load the Google Visualization API and request the map package
google.setOnLoadCallback(drawVisualization); // once loading is done, run drawVisualization
var map;  // will hold the map visualization
var data; // will hold the cleaned-up DataTable built from the raw query results

function drawVisualization() {
  // This is the query sent off to the SPARQL service.
  // You need to put a copy of your SPARQL query somewhere online,
  // in a file with the extension .sparql.
  // For example, http://www.example.org/my.sparql
  // would be the file containing the query from above.
  var query = new google.visualization.Query(
    'http://data-gov.tw.rpi.edu/sparql?query-option=uri&output=gvds&query-uri=' +
    encodeURIComponent("ONLINE_RESOURCE_TO_YOUR_SPARQL_QUERY"));
  query.setQuery('');
  query.send(drawTable);
}

function drawTable(response) {
  var predata = response.getDataTable();
  var rownum = predata.getNumberOfRows();

  data = new google.visualization.DataTable();
  data.addColumn('number', 'lat'); // add our three columns
  data.addColumn('number', 'lon');
  data.addColumn('string', 'region');

  for (var r = 0; r < rownum; r++) {
    // For each row, read the values in the SELECT order of the query
    // (lat, lon, magnitude, region) ...
    var latitude = predata.getValue(r, 0);
    var longitude = predata.getValue(r, 1);
    var magnitude = predata.getValue(r, 2);
    var region = predata.getValue(r, 3);

    // ... and add a new row holding latitude, longitude, and region + magnitude.
    data.addRow([latitude, longitude, region + "<br /> Magnitude: " + magnitude]);
  }
  // Initialize and draw the map.
  map = new google.visualization.Map(document.getElementById('map_div'));
  map.draw(data, {showTip: true});
}
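
The reshaping step inside drawTable can also be tried out on its own, without loading any Google code. The sketch below assumes the rows arrive in the order of the final query (lat, lon, magnitude, region); the function name toMapRows and the input values are made up for illustration.

```javascript
// Hypothetical helper: reshapes [lat, lon, magnitude, region] rows into the
// three-column [latitude, longitude, description] layout used for the map.
function toMapRows(rows) {
  return rows.map(function (row) {
    var latitude = Number(row[0]);
    var longitude = Number(row[1]);
    var magnitude = row[2];
    var region = row[3];
    // Magnitude and region are folded into one description string.
    return [latitude, longitude, region + "<br /> Magnitude: " + magnitude];
  });
}

// Made-up input rows, for illustration only.
var exampleRows = [
  [10.5, -20.25, 4.2, "Example Region"],
  [35, 139.5, 5.1, "Another Region"]
];

console.log(toMapRows(exampleRows));
```

Each returned array could be passed to data.addRow in place of the inline array built in the loop above.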

In the body of the page, all you have to do is make sure that you have a div with the id of "map_div", and this should work out just fine. When finished, it should look like the screenshot below:

Your demo should look something like this!
Facts about How to build a data-gov demo
Dcterms:created: 25 February 2010
Dcterms:creator: Devin Gaffney
Dcterms:description: A nice tutorial about building a semantic web application using the Google Visualization library.
Dcterms:modified: 2010-8-22
Dcterms:relation: How to publish and fix data-gov demos
Foaf:name: How to build a data-gov demo