Social Science Data

Amidst my search for social science data (to perform social capital experiments), I have discovered the following data:
  • Social Science Data on the Internet - seems to have lots of links to data; however, the search is limited and the results pages are clunky.
  • Social capital datasets - presents some information on data sources that are specifically related to investigating social capital.
  • INSNA - has the data used in Wasserman and Faust book on Social Network Analysis.
  • ICPSR - Inter-university Consortium for Political and Social Research
  • SDA - web-based software available for accessing much of the social science data. Additionally, I noticed some archived data available.
There appears to be plenty of data.  Now on to the task of filtering it down to the most relevant...

Social Network Graphing Tools

Here is a write-up on various social network graphing tools that I posted on the Data Mining Lab wiki: Social Network Graphing Tools

Most recently, I have enjoyed using Cytoscape and the Network Workbench Tool for larger networks and GraphViz for small example networks.

What tools do you like to use?

Google Chatback

Google just released a new gadget that will allow people visiting your blog or website to chat with you (via Google Talk). It is like the live chat and call-back buttons that have been available previously. To set it up, all you need to do is add the chatback badge code to your blog. Then, it'll look something like this...

Quick Mac and Windows Keyboard Switching

Since Mac OS X doesn't save profiles for multiple keyboards (which I hope will change in subsequent versions), I had to find a solution for quickly changing between a Windows style keyboard and a Mac style keyboard. After spending some time with Google, I found some AppleScript that seems to provide an adequate solution. For convenience, I made two scripts: one called keyboard_win.scpt that switches the modifier keys on a Windows style keyboard to behave Mac-like, and one called keyboard_mac.scpt that restores the default keyboard behavior for a Mac style keyboard.

(1) keyboard_win.scpt:
(2) keyboard_mac.scpt:
Hopefully, this will save you some time.

Mining "Stories" from Blogs

I just finished reading a paper called Mining Blog Stories Using Community-Based and Temporal Clustering, which I found very interesting as it is closely related to the work that we have been doing. As I read, I had the following thoughts and questions.

  • It is refreshing to discover others that are pursuing similar research paths as us.
  • Their work gives more fuel to the approaches we have been taking.
  • Some of the formalism is elegant and useful (particularly for talking about blogs and entries); however, some of it gets cumbersome (the lookup table is definitely necessary).
  • Using the Lucene library to index the data is an idea that we could consider (we have been storing the data in a MySQL database of our own make).
  • I'd be curious to know how many degrees of separation they crawled from their seed blogs. I would guess that they did not get too far since the blogs in our study appear to be less sparse on average. They reported over 2,000 blogs having 1 million entries (on average, 50 entries per blog per month). In Social Capital in the Blogosphere we retrieved blog content just two degrees away from Scoble (our single seed blog) and obtained over 38,000 blogs having 13 million entries (on average, 28.5 entries per blog per month).
  • How did they perform blog entity resolution? (here is an approach that we used)
(Note: Thanks to both Christophe who sent me the paper and Robbie Haertel for sending it to Christophe --- Social capital in action. ;) )


The Bandwidth Place is a nice site that tests the bandwidth available with your current network connection.

Escaping JavaScript and PHP

Here are some references for encoding and decoding in JavaScript and PHP, so that passing information via Ajax or URLs can be done cleanly.

The Art of Web - useful testing area for JavaScript and PHP.
W3Schools JavaScript Reference - just JavaScript
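
To make the idea concrete, here is a minimal sketch of the JavaScript side, using the built-in encodeURIComponent and decodeURIComponent functions. The helper names buildQuery and parseQuery and the sample values are my own illustrations, not something taken from the references above. On the PHP side, rawurlencode and rawurldecode are the closest counterparts.

```javascript
// Build a query string whose keys and values survive a trip through a URL.
// buildQuery and parseQuery are hypothetical helper names for illustration.
function buildQuery(params) {
  return Object.keys(params)
    .map(function (key) {
      return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
    })
    .join("&");
}

// Reverse of buildQuery: split on "&" and the first "=", then decode each piece.
function parseQuery(query) {
  var result = {};
  query.split("&").forEach(function (pair) {
    if (pair === "") return;
    var idx = pair.indexOf("=");
    var key = idx === -1 ? pair : pair.slice(0, idx);
    var value = idx === -1 ? "" : pair.slice(idx + 1);
    result[decodeURIComponent(key)] = decodeURIComponent(value);
  });
  return result;
}

// The "&" and the spaces inside the value are escaped, so they cannot be
// mistaken for separators when the string is appended to a URL.
var query = buildQuery({ q: "social capital & blogs", lang: "en" });
var roundTrip = parseQuery(query);
console.log(query);
console.log(roundTrip.q);
```

The point of encoding each key and value separately (rather than the whole string at once) is that the separators "&" and "=" stay literal while the same characters inside the data are escaped.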

What is Uplift Modeling?

Uplift modeling is an approach to predicting the incremental impact a marketing campaign has on a customer through controlled experimentation. It measures the variation in the difference between a treated group and a control group, segmenting customers into the following groups:

1. Those that buy only when treated
2. Those that would buy or not buy regardless of whether or not they were treated
3. Those that do not buy when treated, but do buy when not treated

I have yet to use uplift modeling, but it sounds rather interesting.
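
As a rough illustration of the treated-versus-control comparison, here is a minimal sketch that computes uplift per customer segment as the purchase rate of the treated group minus the purchase rate of the control group. The segment names and counts are made up, and real uplift models are considerably more involved than this simple difference of rates.

```javascript
// Purchase rate for one group: buyers divided by group size.
function responseRate(group) {
  if (group.total <= 0) throw new Error("group must not be empty");
  return group.buyers / group.total;
}

// Uplift for a segment: treated purchase rate minus control purchase rate.
function uplift(segment) {
  return responseRate(segment.treated) - responseRate(segment.control);
}

// Describe the direction of the effect, mirroring the three groups above.
function describe(value, tolerance) {
  if (value > tolerance) return "buys more when treated";
  if (value < -tolerance) return "buys less when treated";
  return "treatment makes no difference";
}

// Hypothetical experiment results (invented numbers, for illustration only).
var segments = [
  { name: "segment A", treated: { buyers: 50, total: 200 }, control: { buyers: 25, total: 200 } },
  { name: "segment B", treated: { buyers: 40, total: 200 }, control: { buyers: 40, total: 200 } },
  { name: "segment C", treated: { buyers: 10, total: 200 }, control: { buyers: 30, total: 200 } }
];

var report = segments.map(function (segment) {
  var value = uplift(segment);
  return { name: segment.name, uplift: value, effect: describe(value, 1e-9) };
});

report.forEach(function (row) {
  console.log(row.name + ": uplift " + row.uplift.toFixed(3) + " (" + row.effect + ")");
});
```

A campaign would then target the segments with positive uplift and leave the others alone.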

Star Rating Widget

Here is a rating widget and how to use it. It can be used for collecting ratings from website visitors. Typically, it will be used for five-star rating, but you can pass it any number, so that it can be used as an n-star rating widget.
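
To show what "pass it any number" might look like, here is a minimal text-only sketch of an n-star display. The function name renderStars and its parameters are hypothetical; this is not the widget's actual code or API, just the general idea of parameterizing the number of stars.

```javascript
// Render a rating as filled and empty stars.
// rating:   the score to display (rounded to the nearest whole star)
// maxStars: how many stars the scale has; defaults to five
function renderStars(rating, maxStars) {
  if (maxStars === undefined) maxStars = 5;
  if (!Number.isInteger(maxStars) || maxStars < 1) {
    throw new RangeError("maxStars must be a positive integer");
  }
  // Clamp the rounded rating into the range 0..maxStars.
  var filled = Math.min(maxStars, Math.max(0, Math.round(rating)));
  return "★".repeat(filled) + "☆".repeat(maxStars - filled);
}

var fiveStar = renderStars(3);     // default five-star scale
var tenStar = renderStars(7, 10);  // the same function as a ten-star scale
console.log(fiveStar);
console.log(tenStar);
```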

Notes on Social Capital Views

In Social Capital, Nan Lin presents some differing perspectives on social capital. I have made some notes on his assessment below and have added some comments of my own.

Bourdieu - Social capital consists of social obligations or connections. It can be reduced to economic capital as it is viewed as a collective asset that endows members with credits. The collective asset can be obtained through group membership.
Burt - Social capital is based on the number of people that an individual is connected to, the strength of these relationships, and where the individual resides within this structure (see Flap's perspective). Individuals that bridge non-redundant groups of people have more social capital. Social capital accrues through bridging structural holes.
Coleman - Social capital is an aspect of social structure, and it facilitates certain actions of individuals within the structure. Social capital accrues through bonding.
Flap - Social capital consists of (1) the number of persons within one's social network who "are prepared or obliged to help you when called upon to do so," (2) the strength of the relationship indicating readiness to help, and (3) the resources of these persons.
Lin - Resources are divided into two types: personal and social. Personal resources include material objects (e.g., an airplane) and symbolic objects (e.g., diplomas and degrees). Social resources, on the other hand, are resources accessed through an individual's social connections. Social resources, in both quantity and quality, far outweigh personal resources.
Putnam - Group level theory which tends to measure social capital collectively.

These notes are in no way conclusive or complete. However, they serve as a quick reminder of the various views.

Social Capital Explanations

In Nan Lin's book, Social Capital, he provides the following explanations as to why social capital works, or more clearly, why embedded resources in social networks enhance the outcomes of actions. They are as follows with an example of each (see pages 19-20):
  1. information flow is facilitated - word of mouth effect (e.g., being connected to people with useful information is a benefit)
  2. social ties may exert influence - some ties carry more weight (e.g., son of the President)
  3. social credentials - resources beyond a single individual's are available if necessary (e.g., politician endorsement)
  4. reinforcement of identity and recognition - can provide support and public acknowledgment of one's claim to resources (e.g., maintenance of mental health and entitlement to resources)
Lin suggests that these four elements "may explain why social capital works in instrumental and expressive actions not accounted for by forms of personal capital such as economic or human capital."

I would agree and add that relationships among people add a layer of complexity that cannot be accounted for using only individual-centered metrics.

Social Capital Simulation Updated

The social capital simulation has been updated! Usability has been improved, preset examples have been added, and additional information is now reported.

Social Connections in Decline

Robert Putnam, an influential social capital researcher, visited BYU nearly two years ago to discuss how social connections are on the decline. Here is a good summary of Putnam's talk on BYU NewsNet. His research during the past decade has shown a negative trend: people are connecting socially less these days. The speech gave fuel to the research on social networks that I had been involved in and has been a strong motivation for our current work on social capital.

Figure 1. "The TV Connection" shows that group membership tends to decline as television viewing increases among those having twelve or more years of education. (see The Strange Disappearance of Civic America)

Empirical studies on group membership, like the study shown in the plot above, contribute to the evidence that Putnam uses to support this claim.

Social Capital Simulation (Online)

Over the past couple of days I have been working on an online social capital simulation, written primarily in JavaScript. Currently, it calculates social capital in the same manner as the Excel version; however, it is more powerful: it lets you set how many nodes you would like in the network, it dynamically draws a visual graph of the network, and it is accessible online.
I used Walter Zorn's High Performance JavaScript Vector Graphics Library to draw the network (i.e., nodes, lines, and text). This is an impressive library, which makes drawing with JavaScript more pleasant than I originally expected. Also, to facilitate this project, I extended Zorn's library by adding getColor, getOpacity, and setOpacity methods. Furthermore, Michael Deardeuff and other Data Mining Lab members used their keen pattern-finding skills to develop the mathematical equation for node placement in the graph.

Let me know how it works for you. I want to make this available so that it is easy for people to get a feel for how we calculate social capital, which will allow us to refine our method.
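
For readers curious about node placement, here is a minimal sketch of one common approach: spacing n nodes evenly around a circle. This is an assumption chosen for illustration, not necessarily the equation used in the simulation, and the function name circleLayout and its parameters are hypothetical.

```javascript
// Place nodeCount nodes evenly on a circle.
// Node i sits at angle 2*pi*i/nodeCount, shifted by -pi/2 so that the first
// node lands at the top of the circle (screen y grows downward).
function circleLayout(nodeCount, centerX, centerY, radius) {
  var positions = [];
  for (var i = 0; i < nodeCount; i++) {
    var angle = (2 * Math.PI * i) / nodeCount - Math.PI / 2;
    positions.push({
      x: centerX + radius * Math.cos(angle),
      y: centerY + radius * Math.sin(angle)
    });
  }
  return positions;
}

// Five nodes on a circle of radius 80 centered at (100, 100).
var positions = circleLayout(5, 100, 100, 80);
positions.forEach(function (p, i) {
  console.log("node " + i + ": (" + p.x.toFixed(1) + ", " + p.y.toFixed(1) + ")");
});
```

Because the angle depends only on the node's index and the node count, the layout adapts automatically when the number of nodes changes.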

Burt's Views

I have been reading Ronald Burt's book called "Brokerage and Closure" to gain a better understanding of his view of social capital. Here are a few points that I have found interesting as I have read:

"The value of a relationship is not defined inside the relationship; it is defined by the social context around the relationship" (Brokerage and Closure, pg. 11).

The "working definition" of a structural hole is that the relationship between two people is a hole-spanning bridge when there is no effective indirect connection between them. However, Burt does not wish to imply that the concept of a structural hole has an absolute meaning. He explains that structural holes could come and go by simply changing the population size of the network being analyzed. He argues that the same definitional issue exists for the absolute meaning of a relationship, a fundamental element of network theory (Brokerage and Closure, pg. 24).

So, as an example, the structural hole in the network above is the relationship marked blue, while the value of that relationship is defined by the social context, or the other nodes in the network (i.e., A, B, G, E, F, H, and I).
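
The working definition can be sketched in code. The example below treats a tie as a hole-spanning bridge when, ignoring the tie itself, there is no path at all between its two endpoints. That is a simplification on my part: Burt speaks of "effective" indirect connections, which this sketch reduces to plain reachability. The network and the function names are hypothetical and are not the network from the figure.

```javascript
// Build an undirected adjacency list from a list of [a, b] ties.
function buildAdjacency(edges) {
  var adjacency = {};
  edges.forEach(function (edge) {
    var a = edge[0], b = edge[1];
    (adjacency[a] = adjacency[a] || []).push(b);
    (adjacency[b] = adjacency[b] || []).push(a);
  });
  return adjacency;
}

// True when u and v have no indirect connection once their direct tie is
// set aside (breadth-first search from u looking for v).
function isHoleSpanningBridge(edges, u, v) {
  var remaining = edges.filter(function (edge) {
    return !((edge[0] === u && edge[1] === v) || (edge[0] === v && edge[1] === u));
  });
  var adjacency = buildAdjacency(remaining);
  var visited = {};
  visited[u] = true;
  var queue = [u];
  while (queue.length > 0) {
    var current = queue.shift();
    if (current === v) return false; // found an indirect connection
    (adjacency[current] || []).forEach(function (next) {
      if (!visited[next]) {
        visited[next] = true;
        queue.push(next);
      }
    });
  }
  return true;
}

// Hypothetical network: two tightly knit triangles joined by the tie P-Q.
var ties = [
  ["M", "N"], ["N", "P"], ["M", "P"],
  ["P", "Q"],
  ["Q", "R"], ["R", "S"], ["Q", "S"]
];

var bridgeResult = isHoleSpanningBridge(ties, "P", "Q");
var closedResult = isHoleSpanningBridge(ties, "M", "N");
console.log("P-Q spans a hole:", bridgeResult);
console.log("M-N spans a hole:", closedResult);
```

Note how the answer depends on which ties are included in the list, which echoes Burt's point that a hole can come and go as the population being analyzed changes.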

Gmail Temporarily Down

Dependence on Google is not always good. Here is the error message:

Update: Gmail is back up! The duration of the "Temporary Error" lasted about half an hour.

Book List: Social Capital

In my quest to better understand how sociologists view social capital, I visited the library and checked out the following books:
  • Structural holes : the social structure of competition - Burt, Ronald S.
  • Brokerage and closure : an introduction to social capital - Burt, Ronald S.
  • Applied network analysis : a methodological introduction - Burt, Ronald S.
  • Social capital : theory and research - Lin, Nan
  • New social ties : contemporary connections in a fragmented society - Chambers, Deborah
  • Complex social networks - Vega-Redondo, Fernando
  • Effective small group and team communication - Hoover, Judith D. (Judith Davis)
  • The role of social capital in development : an empirical assessment - Grootaert, Christiaan
  • Social capital : a theory of social structure and action - Lin, Nan
  • The well-connected community : a networking approach to community development - Gilchrist, Alison
  • The Analysis of social interactions : methods, issues, and illustrations - Cairns, Robert B.
Of course, I won't have time to read all of these books from cover to cover. However, I do plan on reading enough to understand how they define, calculate, and show the presence of social capital.

Social Capital Simulation

Our recent work has explored the concept of social capital, which I have discussed previously. Our social capital metrics, namely bonding and bridging (popularized by Robert Putnam), utilize the hybrid network methodology that we have developed for online communities.

To understand our metrics, I have created a basic social capital simulation (an Excel spreadsheet) having five nodes. The simulation allows you to change the connection strengths in both the implicit affinity network (IAN) and the explicit social network (ESN). Changing these values will give you an idea of how social capital fluctuates as the social network changes.

The figure above shows the initial configuration of the simulation. The dashed blue lines represent the IAN and the solid pink lines represent the ESN. The thicker the line, the stronger the connection. The weights for the IAN were randomly assigned, while the ESN weights were all set to one, thus creating a clique.

Initially, the bonding and bridging social capital are both 1, since everyone in the network is connected. To see how the social capital fluctuates, change the blue and/or pink values, again representing the IAN and the ESN weights respectively, in the spreadsheet.

Social Graph API by Google

Google's Social Graph API allows developers to utilize the public connections among people on the web. The idea is simple, yet it could make it easier for people to connect across sites. Of course, the data all comes through Google, which further increases our dependence on it. No doubt, other search engines could easily create the same API.

Since Google already saves a copy of all of the web pages that it spiders for search, the task of extracting the annotated links is somewhat trivial. Currently, this only works where web developers annotate user links with XHTML Friends Network (XFN) or Friend of a Friend (FOAF) markup. It is a great idea, but it may take some time before web developers start annotating.
Furthermore, the easy access to people's connections is a nice data source for some new applications and experiments.
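
To give a feel for why extracting annotated links is fairly simple, here is a minimal sketch that pulls XFN relationships out of a snippet of HTML. It is not the Social Graph API and says nothing about how Google does it. The function name extractXfnLinks and the sample markup are hypothetical, and the regular expressions only handle double-quoted attributes, so a real crawler would want a proper HTML parser.

```javascript
// Relationship values defined by XFN; other rel values (e.g. "nofollow") are ignored.
var XFN_VALUES = [
  "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
  "co-resident", "neighbor", "child", "parent", "sibling", "spouse", "kin",
  "muse", "crush", "date", "sweetheart", "me"
];

// Return one { href, relations } record per anchor tag that carries XFN values.
function extractXfnLinks(html) {
  var links = [];
  var anchorPattern = /<a\b[^>]*>/gi;
  var match;
  while ((match = anchorPattern.exec(html)) !== null) {
    var tag = match[0];
    var href = /\shref\s*=\s*"([^"]*)"/i.exec(tag);
    var rel = /\srel\s*=\s*"([^"]*)"/i.exec(tag);
    if (!href || !rel) continue;
    var relations = rel[1].toLowerCase().split(/\s+/).filter(function (value) {
      return XFN_VALUES.indexOf(value) !== -1;
    });
    if (relations.length > 0) {
      links.push({ href: href[1], relations: relations });
    }
  }
  return links;
}

// Hypothetical blogroll markup.
var sampleHtml =
  '<a href="http://example.com/alice" rel="friend met">Alice</a> ' +
  '<a href="http://example.com/about">About</a> ' +
  '<a href="http://example.com/my-other-site" rel="me nofollow">My other site</a>';

var xfnLinks = extractXfnLinks(sampleHtml);
xfnLinks.forEach(function (link) {
  console.log(link.href + " -> " + link.relations.join(", "));
});
```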