Demo: White House Visitation Social Network

From Data-gov Wiki


Infobox (stable demo)
  • name: Demo: White House Visitation Social Network

  • description: This demo aims to visualize the network of White House visitors and the people they visit. It is still in the development stage. The data used for this work is about the top 100 most frequently visited people in the White House.
  • creator(s): Xian Li
  • created: 2010-05-03
  • modified: 2010-05-19



This work examines the social relations between White House visitors and the people they visit (visitees). The underlying data comes from Dataset 10025. From a graph point of view, visitors and visitees are nodes, and a directed edge from node A to node B represents visitor A visiting visitee B. The records in Dataset 10025 therefore form a social network graph.

Screen shots

The visualization uses three visual indicators, each encoding a different property of the graph:

  • Size of node:

The size of each node corresponds to the number of distinct visitors the node has, i.e., its in-degree.

  • Color of node:

The redder a node is, the closer it is to the other nodes in the graph; the color encodes the node's closeness centrality.

  • Thickness of the edge:

The thickness of each edge is determined by the edge's weight, which encodes the total number of visits from a visitor to a visitee.
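All three indicators can be computed from a weighted edge list. The Python sketch below is not Gephi's implementation: it assumes a hypothetical edge list of (visitor, visitee, total_visits) tuples with placeholder names, and it computes closeness on the undirected version of the graph as the number of reachable nodes divided by the sum of shortest-path lengths, which may differ from the exact definition Gephi applies.

```python
from collections import deque

# Hypothetical toy edge list of (visitor, visitee, total_visits) tuples.
# The names and counts are placeholders, not values from Dataset 10025.
EDGES = [
    ("v1", "hostA", 3),
    ("v2", "hostA", 1),
    ("v2", "hostB", 2),
    ("v3", "hostB", 5),
]


def in_degree(edges):
    """Node size: the number of distinct visitors of each node."""
    visitors = {}
    for visitor, visitee, _ in edges:
        visitors.setdefault(visitee, set()).add(visitor)
        visitors.setdefault(visitor, set())  # pure visitors get in-degree 0
    return {node: len(srcs) for node, srcs in visitors.items()}


def closeness(edges):
    """Node color: closeness on the undirected graph (an assumption), computed
    with BFS as reachable_nodes / sum_of_shortest_path_lengths."""
    adj = {}
    for a, b, _ in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def edge_weights(edges):
    """Edge thickness: total visits from visitor to visitee (duplicates summed)."""
    weights = {}
    for a, b, visits in edges:
        weights[(a, b)] = weights.get((a, b), 0) + visits
    return weights
```

On this toy list, v2, the only visitor linked to two visitees, gets the highest closeness (4/6), which mirrors the observation that nodes visiting more than one visitee tend to be the reddest.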

Overview: Top 100 most frequently visited people in the White House

The original view of the data is shown below.

  • The biggest node shows that Nancy Hogan, the White House Personnel Director, had the most visitors.
  • The thick edges show which people, such as Vivek Kundra, were visited most frequently.
  • The node colors show that some nodes are redder than others; these are typically people who visited more than one visitee in the White House, such as the people around Lawrence Summers and Diana Farrell at the upper left.
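The three observations above are read off the picture, but the same facts can be pulled directly from the edge list. A minimal Python sketch, assuming hypothetical (visitor, visitee, visits) tuples with placeholder names; the function names are illustrative only:

```python
from collections import defaultdict

# Hypothetical (visitor, visitee, visits) rows; placeholders, not real data.
EDGES = [
    ("v1", "hostA", 2), ("v2", "hostA", 1), ("v3", "hostA", 1),
    ("v4", "hostB", 9), ("v5", "hostC", 1), ("v5", "hostD", 4),
]


def most_visited(edges):
    """The visitee with the most distinct visitors (the biggest node)."""
    visitors = defaultdict(set)
    for visitor, visitee, _ in edges:
        visitors[visitee].add(visitor)
    return max(visitors, key=lambda v: len(visitors[v]))


def heaviest_edges(edges, k=1):
    """The k edges with the largest visit counts (the thickest edges)."""
    return sorted(edges, key=lambda e: e[2], reverse=True)[:k]


def multi_visitee_visitors(edges):
    """Visitors linked to more than one visitee (candidates for red nodes)."""
    visitees = defaultdict(set)
    for visitor, visitee, _ in edges:
        visitees[visitor].add(visitee)
    return sorted(v for v, hosts in visitees.items() if len(hosts) > 1)
```

Here most_visited(EDGES) picks hostA, heaviest_edges(EDGES) returns the v4 to hostB edge, and multi_visitee_visitors(EDGES) returns only v5.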

File:Screen shot 2010-05-03 at 7.00.27 PM.png

Social cliques

Next, we zoom in on the part of the network that has the most red nodes. This social cluster yields several observations:

  • Two visitors (David Scharfstein and Samuel Hanson) visited both Lawrence Summers and Diana Farrell, which suggests that Summers and Farrell work in the same area (economics). However, David Scharfstein visited Summers much more frequently than he visited Farrell, while Lewis Sachs (Lee Sachs) visited Farrell more frequently. According to Wikipedia, both Scharfstein and Summers are affiliated with Harvard (Scharfstein is a professor at Harvard Business School). A web search also shows that Diana Farrell had worked for Goldman Sachs, and that Lewis Sachs worked at Tricadia, which was the most aggressive CDO creator that Goldman did business with. Such background knowledge may help explain the closeness within this part of the social network.
  • Having identified Summers and Farrell as closely connected White House staff, it is also interesting to find people who visited only one of them; for example, Ron Bloom visited only Summers.
  • The zoomed view also makes it easier to spot name inconsistencies in the visitor/visitee log: for example, Lawrence Summers and Larry Summers are the same person, as are Lee Sachs and Lewis Sachs.
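The shared-visitor, single-visitor, and name-inconsistency observations above amount to simple set operations once name variants are merged. A minimal Python sketch, assuming hypothetical (visitor, visitee, visits) tuples; the alias table holds only the two pairs named above, and the function names are illustrative, not part of any existing tool:

```python
from collections import defaultdict

# Hypothetical alias table mapping name variants to one canonical form.
# Only the two inconsistencies noted above are listed; a real cleanup
# would need a larger, manually checked table.
ALIASES = {"Larry Summers": "Lawrence Summers", "Lee Sachs": "Lewis Sachs"}


def canonical(name):
    return ALIASES.get(name, name)


def merge_aliases(edges):
    """Rewrite names to canonical form and sum the visits of merged edges."""
    merged = defaultdict(int)
    for visitor, visitee, visits in edges:
        merged[(canonical(visitor), canonical(visitee))] += visits
    return [(a, b, w) for (a, b), w in merged.items()]


def visitors_of(edges, visitee):
    return {a for a, b, _ in edges if b == visitee}


def shared_and_exclusive(edges, host1, host2):
    """Visitors of both hosts, of only host1, and of only host2."""
    v1, v2 = visitors_of(edges, host1), visitors_of(edges, host2)
    return v1 & v2, v1 - v2, v2 - v1
```

Merging aliases first matters: without it, a visitor logged once under "Larry Summers" and once under "Lawrence Summers" would appear to visit two different people, which distorts both the set operations and the closeness values.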

File:Screen shot 2010-05-03 at 7.59.20 PM.png


The data used for this work consists of the 100 most frequent visitor-visitee pairs in the White House visitor records, as returned by the query below.


The SPARQL query that retrieves the data is:

PREFIX rdf: <>
PREFIX dgp: <>
PREFIX dgpe1: <>
# Count visit records per (visitor, visitee) name pair; keep the 100 largest counts.
SELECT ?namefirst ?namelast ?visitee_namefirst ?visitee_namelast (COUNT(*) AS ?cnt)
WHERE {
  ?s dgp:namelast ?namelast .
  ?s dgp:namefirst ?namefirst .
  ?s dgp:visitee_namelast ?visitee_namelast .
  ?s dgp:visitee_namefirst ?visitee_namefirst .
  # Skip records whose visitor first name is "NULL" or whose visitee last name is "OFFICE".
  FILTER (?namefirst != "NULL")
  FILTER (?visitee_namelast != "OFFICE")
}
GROUP BY ?namelast ?namefirst ?visitee_namelast ?visitee_namefirst
ORDER BY DESC(?cnt)
LIMIT 100
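The query filters the visit records, groups them by the visitor's and visitee's name parts, counts each group, and keeps the 100 largest counts. The same aggregation can be sketched in Python over records already loaded in memory. This assumes each record is a dict keyed by the query's variable names and that one record corresponds to one visit; top_pairs is a hypothetical helper that mirrors the query's logic, not how the Data-gov endpoint executes it.

```python
from collections import Counter


def top_pairs(rows, limit=100):
    """Drop filtered rows, count visits per (visitor, visitee) name pair,
    and return the `limit` pairs with the largest counts."""
    counts = Counter()
    for row in rows:
        # Same two filters as the SPARQL query.
        if row["namefirst"] == "NULL" or row["visitee_namelast"] == "OFFICE":
            continue
        key = (row["namefirst"], row["namelast"],
               row["visitee_namefirst"], row["visitee_namelast"])
        counts[key] += 1
    # Counter.most_common returns (key, count) pairs sorted by count, descending.
    return counts.most_common(limit)
```

As in the SPARQL version, pairs with equal counts have no guaranteed order relative to each other, so the exact contents of the 100th place can depend on ties.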

The visualization is done with Gephi. The input is a CSV or GDF file with three columns:

visitor, visitee, total number of visits
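Producing the three-column CSV from aggregated results is straightforward with Python's csv module. A minimal sketch, assuming visitor and visitee names have already been joined into single strings; the header names Source, Target, and Weight are an assumption based on the column names Gephi's spreadsheet import looks for, so check them against your Gephi version. The helper names are hypothetical.

```python
import csv
import io


def to_csv(edges):
    """Write (visitor, visitee, visits) tuples as a CSV edge list."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    # Assumed header: Source/Target identify the directed edge, Weight its visits.
    writer.writerow(["Source", "Target", "Weight"])
    for visitor, visitee, visits in edges:
        writer.writerow([visitor, visitee, visits])
    return buf.getvalue()


def from_csv(text):
    """Read the edge list back, converting the weight column to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["Source"], row["Target"], int(row["Weight"])) for row in reader]
```

Using the csv module rather than joining strings by hand means names that contain commas (for example "Doe, Jane") are quoted correctly instead of silently shifting the columns.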

From a graph perspective, visitors and visitees are nodes, and a directed link from node A to node B represents a visit from visitor A to visitee B. The weight of each edge is the total number of visits from A to B.


So far I have tried two tools: Gephi and NodeXL (which is available only for Windows). Both accept CSV files as input.

  • Pros:

Both produce social network visualizations and offer graph statistics and filtering. As shown in the screenshot below, after applying a degree-range filter of 2 to 27, the nodes around Nancy Hogan (who had 29 visitors) were filtered out, and all other nodes turned gray except for people who visited more than one visitee or were visited by more than one visitor.

  • Cons:

The output is only a static picture, which lacks the user interaction that ManyEyes offers.

File:Screen shot 2010-05-03 at 6.39.45 PM.png
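The degree-range filter described under Pros can be approximated outside Gephi. A minimal Python sketch, assuming hypothetical aggregated (visitor, visitee, visits) tuples in which each pair appears once, and assuming degree means in-degree plus out-degree with weights ignored; Gephi's own filter may count degree differently depending on graph settings. The function names are illustrative.

```python
from collections import Counter


def degree(edges):
    """Total degree (in + out) of each node, ignoring edge weights."""
    deg = Counter()
    for visitor, visitee, _ in edges:
        deg[visitor] += 1
        deg[visitee] += 1
    return deg


def degree_range_filter(edges, low, high):
    """Keep nodes whose degree lies in [low, high] and the edges between them."""
    deg = degree(edges)
    keep = {n for n, d in deg.items() if low <= d <= high}
    return keep, [e for e in edges if e[0] in keep and e[1] in keep]
```

With a range such as 2 to 27, a hub with 29 distinct visitors exceeds the upper bound and each of its one-time visitors falls below the lower bound, so the whole star around the hub disappears, while visitors linked to two or more visitees survive.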
