Demo: Connect Tobacco Survey Data with Linked Government Data

From Data-gov Wiki

Infobox (experimental demo)
  • name: Demo: Connect Tobacco Survey Data with Linked Government Data

  • description: This set of demos shows how existing tobacco research data can be linked to other open government data in unexpected ways.
  • keyword(s): smoking,cigarette,tax,health
  • creator(s): Li Ding,Tim Lebo
  • created: Feb 1, 2010
  • relation(s): PopSciGrid
  • modified: 2010-12-8

Live demo: http://data-gov.tw.rpi.edu/demo/stable/tobacco-smoker/demo-state-10026-smoke-rate-mashup.html

Overview

The National Cancer Institute’s (NCI) PopSciGrid Community Health Portal is an evolving platform demonstrating how health behavior, policy, and demographic data can be integrated, visualized, and communicated to empower communities and support new avenues of research and policy for cancer prevention and control. As a proof of concept for cyber-enabled population health research, the PopSciGrid Portal is designed to encourage trans-disciplinary collaboration, data harmonization, and development of new computational methods for disparate health related data.

The National Cancer Institute’s (NCI) PopSciGrid Community Health Information Portal enables users to explore the relationship between cancer risk factors, health outcomes, and factors in the social and political environment. The maps, graphs, and tables on this website are not intended to convey information about causal relationships, and the NCI does not endorse any conclusions or inferences that individual users may draw from the maps, tables, or graphs produced by this website.

Learn more about cancer.

Browser considerations

Firefox and IE 8 on Windows XP have difficulties displaying the following demonstrations. IE 7 on Windows Vista also has difficulties. Demonstrations can be successfully viewed using IE 8 on Windows 7, and Firefox and Safari on Mac. Flash is required for all browsers.

Participants

National Cancer Institute

  • Abdul R. Shaikh
  • Richard Moser
  • Bradford W. Hesse
  • Glen D. Morgan
  • Erik M. Augustson
  • Yvonne Hunt
  • Zaria Tatalovich
  • Gordon Willis
  • Kelly Blake
  • Paul Courtney (contractor)
  • Lila Rutten (contractor)
  • Amy Sanders (contractor)

Rensselaer Polytechnic Institute

  • Deborah L. McGuinness
  • Li Ding
  • Tim Lebo
  • Jim McCusker

Northwestern University

  • Noshir Contractor
  • Yun Huang
  • Hugh Devlin
  • York Yao

Demos

An initial approach: comparing statistics per state

Smoke Rate (v1, HINTS 2005 only)

  • respondent: anyone who responded to the survey
  • smoker: anyone who (i) answered yes(1) for TU-01 and (ii) answered Every Day(1) or Some Days(2) for TU-02
TU-01. Have you smoked at least 100 cigarettes in your entire life? (1:yes; 2:no; 8:refused; 9:don't know)
TU-02. How often do you now smoke cigarettes? (1:every day; 2:some days; 3:Not at All; 9:don't know)
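
The smoker definition above can be sketched in code. This is a minimal sketch, assuming the smoke rate is the unweighted ratio of smokers to respondents in each state; the record layout, the function names, and the sample records are hypothetical and are not taken from the HINTS files or from the demo's own queries.

```python
from collections import defaultdict

# Answer codes taken from the TU-01 / TU-02 definitions above.
TU01_YES = 1
TU02_EVERY_DAY = 1
TU02_SOME_DAYS = 2


def is_smoker(tu01, tu02):
    """Return True when a respondent meets the smoker definition."""
    return tu01 == TU01_YES and tu02 in (TU02_EVERY_DAY, TU02_SOME_DAYS)


def smoke_rate_by_state(responses):
    """Compute smokers / respondents per state (unweighted).

    `responses` is an iterable of (state, tu01, tu02) tuples; this record
    layout is an assumption for illustration, not the HINTS file format.
    """
    respondents = defaultdict(int)
    smokers = defaultdict(int)
    for state, tu01, tu02 in responses:
        respondents[state] += 1
        if is_smoker(tu01, tu02):
            smokers[state] += 1
    return {state: smokers[state] / respondents[state] for state in respondents}


# Made-up records, only to exercise the functions.
sample = [
    ("ME", 1, 1),     # smoked 100+ cigarettes, smokes every day: smoker
    ("ME", 1, 3),     # smoked 100+ cigarettes, not at all now: not a smoker
    ("NY", 2, None),  # never smoked 100 cigarettes: not a smoker
    ("NY", 1, 2),     # smokes some days: smoker
]
rates = smoke_rate_by_state(sample)
```

A real analysis would likely apply survey weights; this sketch only mirrors the two-question rule stated above.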

http://data-gov.tw.rpi.edu/demo/stable/tobacco-smoker/demo-state-10026-smoke-rate.html

Observations:

  • The Smoke Rate map (right) and the Smokers map (center) look different. For example, Maine appears to have a high smoke rate. Why?

Smoke Rate (v2, Linking HINTS dataset to State Population and Cigarette Tax)

Knowing each state's smoke rate is not enough on its own; we also need to take state population into account:

http://data-gov.tw.rpi.edu/demo/stable/tobacco-smoker/demo-state-10026-smoke-rate-mashup.html

Observations:

  • Data Sampled: HINTS data was linked with state population data from the data.gov dataset “State Library Agency Survey: Fiscal Year 2006”, published by the Institute of Museum and Library Services.
  • Sampling Methodology: We can check whether a state has been reasonably sampled by comparing the number of respondents with the state population. This check can help the analyst gauge how much confidence to place in the derived smoke rate. For example, populous states such as California, New York, and Texas are under-sampled; the smoking rates for these states could therefore be less trustworthy than those for many other states (shown in darker color on the Sample Rate map). Note that this measure has not been validated by practicing analysts as a way of evaluating statistical significance.
  • Estimate Smokers: We can also use population data to estimate the actual number of smokers in each state. For example, many of Pennsylvania’s respondents reported being smokers, but the estimated number of smokers in Pennsylvania ranks relatively low nationally.
  • Interactive Chart: We can use a motion chart to observe correlations between the features. The chart shows that, in 2005, Cigarette Tax (x-axis) is not strongly correlated with smoking rate (y-axis) or state population (bubble size). Although this chart shows only 2005 data, the relationship between Smoke Rate and Cigarette Tax can, and does, change over time.
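
The sample-rate check, the smoker estimate, and the tax correlation described in the observations above can be sketched in a few lines. This is a minimal sketch, assuming the sample rate is respondents divided by state population and the estimate is smoke rate multiplied by population. The function names and all input numbers are hypothetical placeholders, and the demo itself may compute these quantities differently.

```python
import math


def sample_rate(respondents, population):
    """Fraction of a state's population that answered the survey."""
    return respondents / population


def estimated_smokers(smoke_rate, population):
    """Scale a survey smoke rate up to the whole state population."""
    return smoke_rate * population


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder inputs for one hypothetical state, not real survey values.
rate = sample_rate(respondents=50, population=1_000_000)
estimate = estimated_smokers(smoke_rate=0.2, population=1_000_000)

# Placeholder tax and smoke-rate columns for three hypothetical states.
r = pearson([0.5, 1.0, 1.5], [0.25, 0.2, 0.21])
```

A coefficient near zero would match the "not strongly correlated" reading of the chart. As noted above, the sample-rate measure is only a rough guide and not a validated test of statistical significance.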

Interactive Multi-Year Tobacco Tax Analysis

We can also plot changes in tobacco tax from 2000 to 2007:

http://data-gov.tw.rpi.edu/demo/stable/tobacco-smoker/demo-10028-tobacco-tax.html

Observations:

  • Tobacco tax changes differently from state to state.
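
One way to put a number on this observation is to compare each state's tax at the start and end of the period. The following is a minimal sketch, assuming the tax data is available as one value per state and year; the `tax_change` helper, the data layout, and the values are hypothetical placeholders, not figures from Dataset 10028.

```python
def tax_change(series, start_year=2000, end_year=2007):
    """Return end-year tax minus start-year tax for each state.

    `series` maps state -> {year: tax amount}; states missing either
    year are skipped rather than guessed.
    """
    changes = {}
    for state, by_year in series.items():
        if start_year in by_year and end_year in by_year:
            changes[state] = by_year[end_year] - by_year[start_year]
    return changes


# Placeholder values, not real tax figures.
toy_series = {
    "State A": {2000: 0.25, 2007: 1.0},
    "State B": {2000: 0.5, 2007: 0.5},
    "State C": {2000: 0.125},  # no 2007 value, so it is skipped
}
changes = tax_change(toy_series)
```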

How it works: Linking Related Datasets

The National Health Interview Survey 2005 (NHIS) and the Health Information National Trends Survey 2005 (HINTS) share common measures, which allows them to be linked. These two surveys also share measures with a variety of datasets available from data.gov. These commonalities allow further linking, so that new relationships can be analyzed.
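
Linking on a shared measure amounts to a join on a common key. The sketch below is a minimal illustration, assuming the shared measure is a state identifier, as in the state-level demos above. The `link_by_state` helper, the field names, and the values are hypothetical; the demo itself links the data through SPARQL queries over RDF, not through this code.

```python
def link_by_state(*datasets):
    """Inner-join several {state: {field: value}} mappings on the state key."""
    shared_states = set(datasets[0])
    for dataset in datasets[1:]:
        shared_states &= set(dataset)
    linked = {}
    for state in sorted(shared_states):
        row = {}
        for dataset in datasets:
            row.update(dataset[state])
        linked[state] = row
    return linked


# Placeholder tables keyed by state; the values are not real figures.
hints = {"State A": {"smoke_rate": 0.2}, "State B": {"smoke_rate": 0.25}}
population = {"State A": {"population": 1_000_000}, "State C": {"population": 500_000}}
tax = {"State A": {"cigarette_tax": 0.5}, "State B": {"cigarette_tax": 1.0}}

linked = link_by_state(hints, population, tax)
```

Only states present in every input survive the join, which makes gaps in coverage visible instead of silently filling them.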

We will look into more ways of connecting datasets; see Connections among NHIS, HINTS, and TW's data-gov.

Existing datasets

tobacco related

other datasets

other relevant datasets

With these datasets, we can link to other government datasets for analysis in the future (see figure below).


potential sources of datasets

Conclusion

  • Existing tobacco control research data can be linked to other government data and yield interesting results.
  • Visualization on the Web is an effective and low-cost way to communicate tobacco research results to citizens.

Facts about Demo: Connect Tobacco Survey Data with Linked Government Data

  • Data source: Dataset 353, Dataset 10026, and Dataset 10028
  • Dcterms:created: 1 February 2010
  • Dcterms:creator: Li Ding and Tim Lebo
  • Dcterms:description: This set of demos shows how existing tobacco research data can be linked to other open government data in unexpected ways.
  • Dcterms:modified: 2010-12-8
  • Dcterms:relation: PopSciGrid
  • Dcterms:subject: smoking, cigarette, tax, and health
  • Foaf:depiction: https://data-gov.tw.rpi.edu/w/images/0/02/Demo-state-10026-smoke-rate-mashup.png
  • Foaf:name: Demo: Connect Tobacco Survey Data with Linked Government Data
  • Image: Demo-state-10026-smoke-rate-mashup.png
  • Live demo: http://data-gov.tw.rpi.edu/demo/stable/tobacco-smoker/demo-state-10026-smoke-rate-mashup.html
  • Skos:altLabel: Demo: Connect Tobacco Survey Data with Linked Government Data, demo: connect tobacco survey data with linked government data, and DEMO: CONNECT TOBACCO SURVEY DATA WITH LINKED GOVERNMENT DATA
  • Sparql:
    http://data-gov.tw.rpi.edu/demo/stable/tobacco-smoker/demo-state-10026-correspondents.sparql
    http://data-gov.tw.rpi.edu/demo/stable/tobacco-smoker/demo-state-10026-smoker.sparql
    http://data-gov.tw.rpi.edu/demo/stable/tobacco-smoker/demo-state-10028-tobacco-tax-2005.sparql
    http://data-gov.tw.rpi.edu/demo/stable/tobacco-smoker/demo-state-353-population.sparql
    http://data-gov.tw.rpi.edu/demo/stable/tobacco-smoker/demo-10028-tobacco-tax.sparql
  • Technology used: RDF, SPARQL, SparqlProxy, and Google Visualization API