State Variable Comparison JS API

From Data-gov Wiki

Infobox (How-To)
  • name: State Variable Comparison JS API

  • description: How to use the state variables comparison Javascript API
  • creator(s): Sarah Magidson
  • created: Aug. 5th, 2010
  • modified: 2010-8-20



Source code

How to use

Early in the head of your document you should insert the script by writing

 <script type="text/javascript" src="http://data-gov.tw.rpi.edu/ws/state-vars-compare-google.js">
 </script>

There are two possible overarching functions in the script that will draw the entire visualization:

     vars_compare (rawjson, container_name);
  (equivalent to calling vars_compare (rawjson, container_name, true);)

     vars_compare_nodate (rawjson, container_name);
  (equivalent to calling vars_compare (rawjson, container_name, false);)

Parameters:

  • rawjson - This is a JSON object. It should be formatted like a Google DataTable (so if it starts with '{"cols":[{"id":', you're probably in good shape).
    • If calling vars_compare, the layout of the columns of the table should be:
 statename date numbers numbers numbers ...

With as many numerical columns as you want (within reason, of course).

    • If calling vars_compare_nodate, the layout is the same as above, but minus the date column. In this case, the script will simply insert a date column with the value 2009 in every row. The date column is needed to make the motion chart work.
  • container_name - The ID of the div where the visualization will go.
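To make the expected layout concrete, here is a minimal sketch of a rawjson object in Google DataTable JSON form (the cols/rows/c/v structure), with made-up values. The helper addDateColumn is hypothetical and is not taken from state-vars-compare-google.js; assuming the inserted column sits right after the state name (matching the vars_compare layout), it only illustrates the transformation vars_compare_nodate is described as performing.

```javascript
// Example input in the "statename numbers numbers ..." layout expected by
// vars_compare_nodate. The values are made up for illustration.
const rawjson = {
  cols: [
    { id: "A", label: "State", type: "string" },
    { id: "B", label: "Population", type: "number" },
    { id: "C", label: "Cigarette_Tax", type: "number" },
  ],
  rows: [
    { c: [{ v: "Alabama" }, { v: 100 }, { v: 1.5 }] },
    { c: [{ v: "Alaska" }, { v: 200 }, { v: 2.5 }] },
  ],
};

// Hypothetical helper (not part of the published script): returns a copy of
// the table with a date column inserted at index 1, holding the same date in
// every row, so the motion chart sees a single point in time.
function addDateColumn(table, year = 2009) {
  const cols = table.cols.slice();
  cols.splice(1, 0, { id: "date", label: "Date", type: "date" });
  const rows = table.rows.map((row) => {
    const cells = row.c.slice();
    cells.splice(1, 0, { v: new Date(year, 0, 1) });
    return { c: cells };
  });
  return { cols, rows };
}

const withDate = addDateColumn(rawjson);
// withDate now has the "statename date numbers numbers ..." layout.
```

The input object is left untouched, so the same table can still be passed to vars_compare_nodate directly.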

Regarding the date column

It is worth mentioning that the API currently is not meant to handle data as it changes across time. It expects one row per state. This means that having multiple data points for a single state, one per, say, year, could break the API. Also, the correlation values would then be across time, not for one particular point in time.

Conclusion: If you are going to include a date column in your data, make sure it has the same value in each row.
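One way to follow that advice is to check the table before handing it to vars_compare. The function below is a hypothetical sketch, not part of the API; it assumes the "statename date numbers ..." layout (state in column 0, date in column 1) and reports any state that appears more than once and whether the date column holds more than one distinct value.

```javascript
// Hypothetical pre-flight check (not part of the API). Assumes column 0 holds
// the state name and column 1 holds the date, as vars_compare expects.
function checkSingleTimePoint(table) {
  const seen = new Set();
  const duplicateStates = [];
  const dates = new Set();

  for (const row of table.rows) {
    const state = row.c[0].v;
    if (seen.has(state)) duplicateStates.push(state);
    seen.add(state);

    // Compare dates by their string form so that two Date objects for the
    // same moment count as one value.
    dates.add(String(row.c[1].v));
  }

  return {
    duplicateStates,                  // states with more than one row
    singleDateValue: dates.size <= 1, // true if every row shares one date
  };
}
```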

A note on using this API with a SPARQL query

Thus far the API does not handle SPARQL queries, only Google data tables. Note that if you get the Google table from a SPARQL query (more on how to do this), you will need to load the Google visualization package with an empty package list

   google.load("visualization", "1", {"packages":[]});

before sending your query.

In your handleQueryResponse(response) function, you would then call:

   var datatable = response.getDataTable();  // Get Google data table from query

   var jsonstring = datatable.toJSON();      // JSON of Google data table in string format
   var jsontable = JSON.parse(jsonstring);   // Parse string into JSON object
   vars_compare(jsontable, "viz_div");       // Run script. In this case the container for the visualization has ID "viz_div"
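A slightly more defensive version of the same callback checks for a failed query before drawing. isError(), getMessage() and getDataTable() are methods of the Google Visualization QueryResponse object; the container ID "viz_div" is carried over from the snippet above, while logging with console.error and returning a boolean are just illustrative choices.

```javascript
// Callback for a google.visualization.Query: draws the visualization only
// if the query succeeded. Returns true if vars_compare was called.
function handleQueryResponse(response) {
  if (response.isError()) {
    // Report the problem instead of passing an empty table to the API.
    console.error("Query failed: " + response.getMessage());
    return false;
  }
  const datatable = response.getDataTable();        // Google DataTable
  const jsontable = JSON.parse(datatable.toJSON()); // plain JSON object
  vars_compare(jsontable, "viz_div");               // draw into <div id="viz_div">
  return true;
}
```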

Results

There are three main components to the visualization:

  • A scatter plot of the two selected variables. This is actually a very simple Google motion chart. You can pick which variables to graph, as well as map, using the pull-down menus on the X and Y axes. Mouse over the data points to see which states they correspond to.
  • Two Google geomaps of the US, each showing the intensity of a selected variable by state. Mouse over states to see values. (Unfortunately, because the maps are relatively small, Hawaii seems to get cut off.)
  • A table of correlations.
    • In the top-left corner is a Google gauge showing the correlation between the two selected variables. (This is technically called the correlation coefficient - see definitions from Wikipedia and MathWorld.) Correlation ranges between -1 and 1 and is a numerical expression of how strongly two variables are linearly related. A high positive correlation (e.g. 0.9) implies that as variable A increases, variable B tends to increase as well. A high negative correlation (e.g. -0.9) implies that as A increases, B tends to decrease. A correlation of 0 means there is no linear relationship: as A increases, B shows no consistent tendency to increase or decrease.
    • The "correlation matrix", which gives the correlation between all pairs of variables given. The background is color-coded (by increments of 0.25) to the correlation value. Clicking on a cell in the table will select the row and column variables to be graphed on the scatter plot and mapped.
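For readers who want to reproduce the numbers, the "correlation coefficient" shown in the gauge and matrix is presumably Pearson's r, the usual meaning of the term. The sketch below computes r for two columns and builds a matrix for every pair of variables. It is not the script's own code, and colorBin is a hypothetical guess at the 0.25-increment color coding, assuming eight bins spanning -1 to 1.

```javascript
// Pearson correlation coefficient of two equal-length numeric arrays.
function pearson(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((s, x) => s + x, 0) / n;
  const meanY = ys.reduce((s, y) => s + y, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy); // NaN if either column is constant
}

// Correlation matrix for an object mapping variable names to columns.
function correlationMatrix(columns) {
  const names = Object.keys(columns);
  const matrix = names.map((a) => names.map((b) => pearson(columns[a], columns[b])));
  return { names, matrix };
}

// Hypothetical color bin: 0 for [-1, -0.75), 1 for [-0.75, -0.5), ...,
// 7 for [0.75, 1].
function colorBin(r) {
  return Math.min(7, Math.floor((r + 1) / 0.25));
}
```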

Problems

  • The script does not work in Internet Explorer. (Though it does work in Firefox, Chrome, and Opera.)
  • Resizing the visualization within a browser may mess up the alignment and the visualizations a bit.

Future Work

  • Getting the script to handle SPARQL query URLs rather than just Google JSON table data.
  • Being able to select the variables you want to compare from the datasets and having the program write the SPARQL for you. This would likely be done with some PHP.
  • Supporting data across time
  • Making the layout configurable?
  • Adding more statistical functions?
  • Making it possible to create new variables by using various operations (e.g. var A - var C, var B / var A, etc.)

Version 2

A newer, spiffier version of this API that is ready to be put to use.

Improvements over last version

  • It can handle SPARQL queries.
  • It can handle multiple data sources.
  • One can add mashups of variables to the final table.
  • You can flip back and forth between viewing the raw data as maps or as a Google viz table.
  • Highlighting of column and row labels as you mouse over the correlation matrix.
  • The maps stay right next to each other, regardless of the size of the correlation matrix or zoom.

How to use

The main idea is that you load all your data and data manipulations first, then launch the draw program, which will take care of the rest.

addData

To add data:

 addData (datatype, jsonOrUrl, [meaningOfNull], [idByCol]);

Parameters:

  • datatype (string) - must be "sparql" (for a sparql query) or "json" (for a Google table JSON object)
  • jsonOrUrl - If datatype is "sparql", the URL of the SPARQL query (string). If datatype is "json", the JSON object.
    • As before, the layout of the columns of the table should be:
 statename numbers numbers numbers ...

With as many numerical columns as you want (within reason, of course).

    • The statename column must agree for all tables. The script currently does not work if one dataset uses state abbreviations and another uses full names. The state column should have full state names or state abbreviations prefixed by "US-". Dataset 10011 is very helpful for this.
  • meaningOfNull [optional] - What to replace null values in the table with, if anything (default is null).
  • idByCol [optional] - if datatype is "sparql", give the name of one of the columns that should be in the results (unique to this table) so that it can be associated with its meaningOfNull value.
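Because the state column has to agree across every table, it can help to normalize abbreviations before calling addData. The helper below is a hypothetical sketch, not part of the API: assuming the Google DataTable JSON layout with the state in column 0, it prefixes two-letter abbreviations with "US-" and substitutes a replacement for null cells, roughly what the meaningOfNull parameter is described as doing.

```javascript
// Hypothetical pre-processing step (not part of the API). Returns a copy of a
// Google DataTable JSON object in which
//  - two-letter state abbreviations in column 0 become "US-XX", and
//  - null cells in the other columns are replaced with `meaningOfNull`.
function normalizeTable(table, meaningOfNull = null) {
  const rows = table.rows.map((row) => {
    const cells = row.c.map((cell, i) => {
      const v = cell == null ? null : cell.v;
      if (i === 0) {
        const s = String(v).trim();
        // "NY" -> "US-NY"; full names and already-prefixed values are kept.
        return { v: /^[A-Za-z]{2}$/.test(s) ? "US-" + s.toUpperCase() : s };
      }
      return { v: v == null ? meaningOfNull : v };
    });
    return { c: cells };
  });
  return { cols: table.cols.slice(), rows };
}
```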

addMashup

To add a variable that is a mashup of other variables:

 addMashup(newname, varnames, fn);

Parameters:

  • newname (string) - What this new variable should be called
  • varnames (array of strings) - Names of the variables that are used to create this mashup variable. Note that while underscores in variable names are converted to spaces in the visualization, the arguments here must use the original variable names, i.e. with the underscores. The names are case-sensitive.
  • fn (function) - A function that will take the mashup variables and return a value for the new variable. The number of arguments should be equal to the length of varnames. An example:
 function divide (a, b) { return (a / b); };
 
 addMashup ("Income Per Capita", ["Adjusted_Gross_Income", "Population"], divide);
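Conceptually, a mashup applies fn state by state, passing the named variables' values in the order they appear in varnames. The sketch below shows that idea on plain row objects keyed by variable name; both that data model and the function applyMashup are illustrative assumptions, not the script's internals. Looking names up exactly as given mirrors the case-sensitive, underscore-preserving rule above.

```javascript
// Illustrative only: each row is an object such as
// { state: "Ohio", Adjusted_Gross_Income: 300, Population: 3 }.
function applyMashup(rows, newname, varnames, fn) {
  // fn should take as many arguments as there are variable names.
  if (fn.length !== varnames.length) {
    throw new Error("fn should take " + varnames.length + " arguments");
  }
  return rows.map((row) => {
    const args = varnames.map((name) => {
      // Exact, case-sensitive lookup: "population" would not match "Population".
      if (!(name in row)) throw new Error("Unknown variable: " + name);
      return row[name];
    });
    return { ...row, [newname]: fn(...args) };
  });
}

function divide(a, b) { return a / b; }

// Made-up values for illustration.
const rows = [
  { state: "Ohio", Adjusted_Gross_Income: 300, Population: 3 },
  { state: "Utah", Adjusted_Gross_Income: 100, Population: 2 },
];
const mashed = applyMashup(rows, "Income Per Capita",
                           ["Adjusted_Gross_Income", "Population"], divide);
// mashed[0]["Income Per Capita"] is 100; mashed[1]["Income Per Capita"] is 50
```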

removeVariables

To remove a set of variables from the final table:

 removeVariables (varnames);

Parameters:

  • varnames (array of strings) - names of variables to be removed

mashVariables

To combine variables into a mashup variable, i.e. add a mashup variable and then delete the "ingredient" variables used to make it:

 mashVariables (newname, varnames, fn);

Parameters: The same as addMashup.
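Going by that description, mashVariables should behave like the two documented calls made in sequence. The wrapper below illustrates that reading only; it is not code from the script, and the name mashVariablesEquivalent is hypothetical. It assumes the state-vars-compare script is already loaded on the page.

```javascript
// Hypothetical wrapper: expresses mashVariables as addMashup followed by
// removeVariables, using only the documented signatures.
function mashVariablesEquivalent(newname, varnames, fn) {
  addMashup(newname, varnames, fn); // add the combined variable
  removeVariables(varnames);        // then drop its "ingredient" variables
}

// Usage, mirroring the addMashup example:
// mashVariablesEquivalent("Income Per Capita",
//                         ["Adjusted_Gross_Income", "Population"],
//                         function (a, b) { return a / b; });
```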

drawStateVarsViz

To launch the drawing of the entire visualization:

 drawStateVarsViz (container_id, DateVal);

Parameters:

  • container_id - ID of the DIV where visualization will go.
  • DateVal [optional] - Value to put in the date column of the final data table (this must be of type Date). The default is today's date. This is needed to make the motion chart work. This script no longer handles date columns already included in the data (even in the first version, only a single date value shared by every row was supported).
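Putting the pieces together, a page script might look like the sketch below. It uses only the calls and parameter orders documented above; the function name buildStateVarsViz, the SPARQL query URL, the idByCol column "Gross_Income", the replacement value 0 and the container ID "viz_div" are all placeholders to adapt. Wrapping the calls in a function makes it easy to run them once the page and the Google libraries have loaded.

```javascript
// Placeholder data in the "statename numbers ..." layout (made-up values).
const populationTable = {
  cols: [
    { id: "A", label: "State", type: "string" },
    { id: "B", label: "Population", type: "number" },
  ],
  rows: [
    { c: [{ v: "US-OH" }, { v: 300 }] },
    { c: [{ v: "US-UT" }, { v: 200 }] },
  ],
};

function divide(a, b) { return a / b; }

// Hypothetical entry point; call it after the page has loaded.
function buildStateVarsViz() {
  // 1. Load every data source first.
  addData("json", populationTable);
  // Placeholder URL: substitute the URL of your own SPARQL query. Nulls in
  // this source become 0; "Gross_Income" identifies it for meaningOfNull.
  addData("sparql", "http://example.org/my-sparql-query-url", 0, "Gross_Income");

  // 2. Apply data manipulations.
  addMashup("Income Per Capita", ["Gross_Income", "Population"], divide);
  removeVariables(["Gross_Income"]);

  // 3. Draw everything. DateVal must be a Date; here, January 1, 2009.
  drawStateVarsViz("viz_div", new Date(2009, 0, 1));
}
```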

Results

Basically the same as before, although now there is a tab over the maps that allows one to switch to a table view of the raw data.

A demo can be found here: http://logd.tw.rpi.edu/test/tobacco/demo-state-smoke-agi-campaign.html

Please note that this is not officially on data-gov, and is not guaranteed to be stable (though it should be). It is also subject to changes.

Source Code

Note that the source code is not guaranteed to be stable or bug-free, nor is it as "clean" as what is ultimately uploaded to data-gov.

The source code can be found here: http://logd.tw.rpi.edu/test/statevar-compare/state-vars-compare-sparql.js

The code to include both the Google API and the above script is: http://logd.tw.rpi.edu/test/statevar-compare/state-vars-compare-google-sparql.js

Issues

  • Still does not work in IE.

Version 3 - More Readable Interface

(At least I hope it's more readable.)

Problems this version is intended to address

One big issue is that correlation in the statistical sense is not a widely-understood concept, so having a table of correlation coefficients was meaningless to most people. The color-coding, as well as the link to a Wikipedia article, did not really help.

Another complaint was that people didn't really understand what the scatter chart was showing.

Another issue was that it was too cluttered and difficult to read. Was it necessary to have the maps there?

Changes

  • New box to explain the concept of correlation more thoroughly, with the help of plainer English and a colorful spectrum. This broke down the meaning of the correlation coefficient into two questions:
    • How much is variable A related to variable B? - Will the value of one tell you something about the other? The closer the correlation is to 1 or -1 (and the farther from 0), the more strongly the two variables are related. A state's gross income is strongly related to its population, so the correlation between those two is 0.99. We might expect that high cigarette taxes would drive down the number of smokers, so that the correlation between the two would be close to -1, but it's actually closer to 0, telling us cigarette taxes and the smoker/non-smoker ratio are largely unrelated.
    • High values for variable A tend to be associated with what kind of values for variable B? - As A increases, does B increase? Decrease? The more positive the correlation, the more likely B is to increase as A increases. The more negative the correlation, the more likely B is to decrease.
    • The answers to the above questions for the selected variables are color-coordinated with the table and appear on the upper right:

[Figure: the correlation explanation box (Correlexplanation.png)]

  • The scatterplot now gets a title, "VARIABLE A vs. VARIABLE B", which will hopefully make it a little clearer what it is showing.
  • Show/hide raw data. By default, the raw data (the maps and the Google table) are now hidden. One can click "View Raw Data" to open up the maps and table for viewing, and hide them again by clicking "Hide Raw Data".
  • Changed layout. The correlation matrix is on the top-left, making it more central. The colorful correlation explanation is on the top-right, hopefully causing it to grab eyes faster. The scatterplot has been moved to the bottom-right. Finally, when the raw data is shown, it appears on the bottom-left, relatively out of the way.
  • Fancier borders. Hoo-hah.
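The two questions used in the new correlation box can be turned into a small lookup that produces plain-English answers from a correlation value. The cut-offs in this sketch (0.7 and 0.3 for strength, plus or minus 0.1 for direction) and the function name describeCorrelation are arbitrary assumptions for illustration; the page does not say which thresholds or wording the version 3 interface uses.

```javascript
// Answers the two questions from a correlation coefficient r in [-1, 1].
// Thresholds are illustrative assumptions, not taken from the API.
function describeCorrelation(r) {
  const size = Math.abs(r);
  // Question 1: how strongly are A and B related? (distance from 0)
  const strength = size >= 0.7 ? "strongly related"
    : size >= 0.3 ? "moderately related"
    : "weakly related or unrelated";
  // Question 2: do high values of A go with high or low values of B? (sign)
  const direction = r > 0.1 ? "high A tends to go with high B"
    : r < -0.1 ? "high A tends to go with low B"
    : "no clear tendency";
  return { strength, direction };
}
```

For example, the 0.99 income/population correlation mentioned above would come out as strongly related, with high A going with high B.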

Results

A demo can be seen here: http://logd.tw.rpi.edu/test/tobacco/demo-state-smoke-agi-campaign-simple.html Note that this is not officially published, so it may contain inaccuracies, bugs, "unclean" code, etc.

[Figure: By default, raw data is not shown]
[Figure: Raw data appears on bottom-left]

Source

Note that the source code is not guaranteed to be stable, bug-free, "clean", etc.

The source code can be found here: http://logd.tw.rpi.edu/test/statevar-compare/state-vars-compare-simple.js

The code to include both the Google API and the above script is: http://logd.tw.rpi.edu/test/statevar-compare/state-vars-compare-google-simple.js

Using the API via a Web Form

There is a web form (PHP) that allows users to fill in variables from datasets on Data-gov.tw and then view them through the API.

Source and location

The source code/form is here:
http://logd.tw.rpi.edu/test/statevar-compare/statevars.php
There is another PHP file used to write SPARQL based on user input, which can be found at:
http://logd.tw.rpi.edu/test/statevar-compare/statevars-sparql-writer.php

How to Use

There are three different ways to submit data via the form:

  1. Variable names
  2. SPARQL Query URIs
  3. Google table URIs

In all cases, one can click the "Add variable/SPARQL Query/Google Table" button at the bottom of the section to add another data source, and a data source can be removed by clicking the "Remove variable/Query/Table" button next to it.

To run the visualization, click "submit" at the bottom. It currently uses the latest (3rd) version of the Statevars Javascript API.

Ignore the "add mashup" box, as it does not work yet.

Individual Variables

The idea behind this is that someone should be able to browse through the data we have here (or at least datasets that are in the triple store) and pick out individual variables that they want to compare as they go. To load an individual variable, several fields are required:

Fields
  • Dataset Number - the ID of the dataset from which the variable comes, e.g. "353" for dataset 353
  • Variable name in raw data - the name of the variable in the RDF (excluding the prefix), e.g. "popu_st" for population in dataset 353.
  • New variable name - How you want the variable to be labeled in the visualization, e.g. "Population"
  • Name of parameter in raw data containing state - the name of the variable in the dataset containing the state name/abbreviation, e.g. "phys_st" for DS 353.
  • States are listed under their - Whether states are listed by full name or abbreviation. If dataset 10007 is loaded into the triple store, FIPS codes can be used here as well (this would just require uncommenting one line, currently line 85).
  • Aggregation function - An aggregation function to be applied to this variable. The current list is: sum, average, count, min, and max. "None" is also an option.
Notes
  • Variables must have numeric values, unless the "count" aggregator is being applied to them.
  • Dataset-specific and not variable-specific information, namely state parameter name and state form (abbreviation, full name, or FIPS), only needs to be entered once. If you list multiple variables from the same dataset, this information need only be entered for one variable; the rest can leave those fields blank.
  • One could add the same variable twice, but with different aggregation functions.
  • If Johanna's algorithm to detect the name of the parameter listing the state gets very good, the state parameter name and form fields could be left out and just detected automatically by the program.
  • We should load dataset 10007 into the triple store so that FIPS code can be used, too.
  • If the sum or average aggregation function is applied, the SPARQL query will first cast the values for that variable to xsd:decimal.
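The example query further down shows the pattern the form produces for each dataset. As a rough JavaScript illustration of that pattern (the real writer is the PHP file statevars-sparql-writer.php, whose code is not reproduced here), the hypothetical function sparqlPieces turns the form fields for one dataset into its prefix, SELECT terms and GRAPH block. Variable naming follows the example query, SELECT terms use the standard SPARQL 1.1 "(expr AS ?name)" form, casting for AVG as well as SUM is an assumption, and the join through dataset 10011 and the GROUP BY clause are left out.

```javascript
// Hypothetical sketch of one dataset's contribution to the generated query.
// fields: { dataset: "353", stateParam: "phys_st",
//           variables: [{ raw: "popu_st", label: "Population", agg: "none" }] }
function sparqlPieces(fields) {
  const ds = fields.dataset;
  const prefix = "PREFIX ds" + ds + ": <http://data-gov.tw.rpi.edu/vocab/p/" + ds + "/>";

  const selectTerms = [];
  // The state parameter links each record to its state variable ?stvar_<ds>.
  const triples = ["?thing_" + ds + " ds" + ds + ":" + fields.stateParam + " ?stvar_" + ds + " ."];

  for (const v of fields.variables) {
    const sparqlVar = "?" + v.raw + "_" + ds;
    triples.push("?thing_" + ds + " ds" + ds + ":" + v.raw + " " + sparqlVar + " .");
    let expr = sparqlVar;
    if (v.agg === "sum" || v.agg === "average") {
      // Cast to xsd:decimal before SUM (as in the example query); doing the
      // same for AVG is an assumption.
      const fn = v.agg === "sum" ? "SUM" : "AVG";
      expr = fn + "(xsd:decimal(" + sparqlVar + "))";
    } else if (v.agg === "count" || v.agg === "min" || v.agg === "max") {
      expr = v.agg.toUpperCase() + "(" + sparqlVar + ")";
    }
    // Spaces in the display name become underscores in the SPARQL variable.
    selectTerms.push("(" + expr + " AS ?" + v.label.replace(/ /g, "_") + ")");
  }

  const graph = "GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_" + ds + ">\n{\n  " +
    triples.join("\n  ") + "\n}";
  return { prefix, selectTerms, graph };
}
```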

SPARQL Queries

This is pretty straightforward. The URI of a SPARQL query goes under SPARQL Query URI. If one is using a SPARQL endpoint other than http://data-gov.tw.rpi.edu/sparql, enter that endpoint's URI in the appropriate field.
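Under the SPARQL protocol, a query sent with HTTP GET travels in the query parameter of the endpoint URL, so a query URI can be assembled as in this sketch. The default endpoint is the one named above; sparqlQueryUri is a hypothetical name, and any extra parameters a particular endpoint may need (for example to choose an output format) are not covered.

```javascript
// Builds a GET URI for a SPARQL query. The WHATWG URL API (built into
// browsers and Node.js) takes care of encoding the query text.
function sparqlQueryUri(query, endpoint = "http://data-gov.tw.rpi.edu/sparql") {
  const url = new URL(endpoint);
  url.searchParams.set("query", query);
  return url.toString();
}

const uri = sparqlQueryUri("SELECT ?s WHERE { ?s ?p ?o } LIMIT 5");
```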

Google DataTables

Also straightforward. Provide the URI of a file that contains the JSON for a Google DataTable object.
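Before pointing the form at a file, it can be worth confirming that its contents parse as JSON and have the DataTable shape. The check below is a hypothetical helper, not part of the form; it assumes the file holds a plain JSON object with cols and rows, with the state name in the first column and numbers in the rest.

```javascript
// Hypothetical sanity check for the text of a Google DataTable JSON file.
// Returns a list of problems; an empty list means the basic shape looks right.
function checkDataTableJson(text) {
  let table;
  try {
    table = JSON.parse(text);
  } catch (e) {
    return ["not valid JSON: " + e.message];
  }
  if (!table || !Array.isArray(table.cols) || !Array.isArray(table.rows)) {
    return ["missing cols or rows array"];
  }
  const problems = [];
  if (table.cols.length < 2) {
    problems.push("need a state column plus at least one variable");
  } else if (table.cols[0].type !== "string") {
    problems.push("first column should be the state name (string)");
  }
  // Every column after the state name should be numeric.
  table.cols.slice(1).forEach((col, i) => {
    if (col.type !== "number") {
      problems.push("column " + (i + 1) + " (" + (col.label || col.id) + ") is not numeric");
    }
  });
  return problems;
}
```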

Example

Filling out the form:

Result

SPARQL Query:

PREFIX dgtwc: <http://data-gov.tw.rpi.edu/2009/data-gov-twc.rdf#>
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
PREFIX ds10007: <http://data-gov.tw.rpi.edu/vocab/p/10007/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

PREFIX ds353: <http://data-gov.tw.rpi.edu/vocab/p/353/> 
PREFIX ds1356: <http://data-gov.tw.rpi.edu/vocab/p/1356/> 
PREFIX ds1623: <http://data-gov.tw.rpi.edu/vocab/p/1623/> 

PREFIX ds10040: <http://data-gov.tw.rpi.edu/vocab/p/10040/> 

SELECT  ?state
	 (?popu_st_353 AS ?Population)
	 (SUM(xsd:decimal(?agi_1356)) AS ?Gross_Income)
	 (?total_1623 AS ?Medicare_claims)
	 (?total_no_internet_use_no_10040 AS ?Non_internet_users)
	 (?total_internet_use_anywhere_no_10040 AS ?Internet_users)

WHERE 
{ 
	 GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_353> 
	 { 
		 ?thing_353 ds353:phys_st ?stvar_353 . 
		 ?thing_353 ds353:popu_st ?popu_st_353 . 
	 } 
	 GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_1356> 
	 { 
		 ?thing_1356 ds1356:state_abbrv ?stvar_1356 . 
		 ?thing_1356 ds1356:agi ?agi_1356 . 
	 } 
	 GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_1623> 
	 { 
		 ?thing_1623 ds1623:state ?stvar_1623 . 
		 ?thing_1623 ds1623:total ?total_1623 . 
	 } 
	 GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_10040> 
	 { 
		 ?thing_10040 ds10040:state ?stvar_10040 . 
		 ?thing_10040 ds10040:total_no_internet_use_no ?total_no_internet_use_no_10040 . 
		 ?thing_10040 ds10040:total_internet_use_anywhere_no ?total_internet_use_anywhere_no_10040 . 
	 } 
	GRAPH <http://data-gov.tw.rpi.edu/vocab/Dataset_10011>
	{
		?uri_dgtwc skos:altLabel ?stvar_353 . 
		?uri_dgtwc skos:altLabel ?stvar_1356 . 
		?uri_dgtwc skos:altLabel ?stvar_1623 . 
		?uri_dgtwc skos:altLabel ?stvar_10040 . 
		?uri_dgtwc foaf:name ?state .
	}
  
}
GROUP BY ?state ?popu_st_353 ?total_1623 ?total_no_internet_use_no_10040 ?total_internet_use_anywhere_no_10040
ORDER BY ?state
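To reuse this query's results elsewhere, the standard SPARQL 1.1 JSON results format (head.vars plus results.bindings) maps onto the DataTable layout the API expects. The conversion below is a sketch assuming the first variable is the state and every other variable is numeric; sparqlJsonToDataTable is a hypothetical name, and this is not how the script itself processes SPARQL results, which the page does not describe.

```javascript
// Converts a SPARQL 1.1 JSON result into a Google DataTable JSON object with
// the "statename numbers numbers ..." layout. Unbound values become null.
function sparqlJsonToDataTable(result) {
  const vars = result.head.vars;
  const cols = vars.map((name, i) => ({
    id: name,
    label: name,
    type: i === 0 ? "string" : "number",
  }));
  const rows = result.results.bindings.map((binding) => ({
    c: vars.map((name, i) => {
      const term = binding[name];
      if (term === undefined) return { v: null };
      // The state stays a string; every other value is parsed as a number.
      return { v: i === 0 ? term.value : Number(term.value) };
    }),
  }));
  return { cols, rows };
}
```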

Demos that use this API

  • Demo: Comparing Types of Campaign Money by State (creator: Sarah Magidson; created: August 2010) - Compare the disbursements, receipts, and loans of Democratic and Republican candidates by state. See how these variables are or aren't correlated.
  • Demo: Estimate Correlations between Smoking Rate, Cigarette Tax and Beyond (creator: Sarah Magidson; created: August 2010) - Look at how smoking rates, population, cigarette taxes, and other related variables relate to one another, by state.
Facts about State Variable Comparison JS API
  • Dcterms:created: 5 August 2010
  • Dcterms:creator: Sarah Magidson
  • Dcterms:description: How to use the state variables comparison Javascript API
  • Dcterms:modified: 2010-8-20
  • Foaf:name: State Variable Comparison JS API
  • Skos:altLabel: State Variable Comparison JS API, state variable comparison js api, and STATE VARIABLE COMPARISON JS API