Saturday, December 27, 2008

Recording Skype Calls

Here are the basic settings that I used on a Mac to record Skype conversations using SoundFlower, Skype, and Audacity.


Default Input: SoundFlower (2ch)
Default Output: Built-in Output

Audio output: SoundFlower (2ch)
Audio input: SoundFlower (2ch) (if you want to record both sides of the Skype call), or Built-in Microphone (if you only wish to record the other side of the conversation)


Recording Device: SoundFlower (2ch)
Playback Device: Core Audio: Built-in Output

You could record many other things using similar configurations. Happy Recording!

Update: Since writing this, I came across another tutorial on doing this, with pictures, that you might wish to follow.

Tuesday, December 16, 2008

Network Roles

Social networks tend to have people who fill particular positions within the community. For example, within an academic community the roles of professor and student can sometimes be identified solely from the directed interactions among the individuals. To perform such an analysis, some measure of equivalence is used. In Social Network Analysis by Wasserman and Faust, the following definitions of equivalence are reviewed (each with a note):
  1. Structural Equivalence - requires identical ties to other identical actors
  2. Automorphic and Isomorphic Equivalence - requires identical ties to other actors
  3. Regular Equivalence - actors have identical ties to and from equivalent actors
  4. Local Role Equivalence - actors are role equivalent if they have the same role sets
  5. Ego Algebra Equivalence - based on algebra of relational structures

So, why is knowing how to use this important? Well, say you would like to better understand the network surrounding your blog by learning which other blogs have ties similar to yours; this is how you could do it.
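
As a rough illustration of the first definition, here is a minimal Python sketch that groups actors whose incoming and outgoing ties are identical. The link data and the names (blog_a, hub, and so on) are made up for the example, and this is only the strictest reading of structural equivalence, not a method taken from the book.

```python
from collections import defaultdict


def structural_equivalence_classes(edges):
    """Group actors whose outgoing and incoming ties are identical."""
    out_ties = defaultdict(set)
    in_ties = defaultdict(set)
    actors = set()
    for source, target in edges:
        out_ties[source].add(target)
        in_ties[target].add(source)
        actors.update((source, target))

    # Actors with the same (out-ties, in-ties) signature share a class.
    classes = defaultdict(list)
    for actor in sorted(actors):
        signature = (frozenset(out_ties[actor]), frozenset(in_ties[actor]))
        classes[signature].append(actor)
    return sorted(classes.values())


# Hypothetical blog network: an edge (a, b) means blog a links to blog b.
links = [
    ("blog_a", "hub"), ("blog_b", "hub"), ("blog_c", "hub"),
    ("hub", "blog_a"), ("hub", "blog_b"),
]
equivalent = structural_equivalence_classes(links)
print(equivalent)
```

Here blog_a and blog_b end up in the same class because each links to hub and is linked from hub, while blog_c is never linked back. Real networks rarely contain exact matches, so in practice a similarity measure over the tie profiles is usually substituted for strict equality.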

Friday, November 28, 2008

Blog Posts Increasing and Cyclic

Over the span of a year, the more than 200 blogs aggregated above show very cyclic behavior. The plot shows prominently that these bloggers post significantly more during the week than on the weekends.

Additionally, as time went on, these blogs as a group posted more frequently.

I would guess that the dip during December was caused by the Christmas holiday.

Tuesday, November 25, 2008

Christmas Gift Giving Tool

Christmas is a time for gift-giving --- sometimes, families and friends need a quick way to determine who gives to whom. So, I created this little tool that allows families and friends to quickly enter the names of people (who will be giving and receiving gifts) and have a gift-giving list automatically (and randomly) generated.

It is nice to have a tool like this so the lists can be generated early and gift-giving becomes that much easier --- you don't have to wait until somebody gets the old hat out and writes down everyone's name. Additionally, it seems to be fairer and less complicated. ;)

The program was built using Python and Google App Engine.
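
For the curious, here is a minimal Python sketch of how such a list could be generated. It is not the tool's actual code; the function name assign_gifts and the sample names are made up for illustration. It shuffles the names into a circle and has each person give to the next one, which guarantees that nobody draws their own name.

```python
import random


def assign_gifts(names, rng=random):
    """Return a dict mapping each giver to a receiver; nobody gives to themselves."""
    if len(names) < 2:
        raise ValueError("need at least two people")
    if len(set(names)) != len(names):
        raise ValueError("names must be unique")
    order = list(names)
    rng.shuffle(order)
    # Each person gives to the next person in the shuffled circle.
    return {giver: order[(i + 1) % len(order)] for i, giver in enumerate(order)}


family = ["Ann", "Ben", "Cal", "Dee"]  # hypothetical names
pairs = assign_gifts(family, random.Random(2008))
for giver, receiver in pairs.items():
    print(giver, "gives to", receiver)
```

One design note: a single circle never produces two people who simply swap gifts within a larger group, so it does not sample every possible valid list. That trade-off keeps the code short and rules out self-assignment without any retry loop.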

Thursday, November 20, 2008

LaTeX: Vertical Text

Scientific publications in computer science are often created using LaTeX. Here is a little tip for making text appear vertically in LaTeX.
First, you need to include the following package:
\usepackage{rotating}

Next, you can use the sideways environment, as follows:
\begin{sideways}Text\end{sideways}

Finally, here is an example used within a table:
\begin{sideways}Letter\end{sideways} & \begin{sideways}Frequency\end{sideways} & Words \\
A & 0.05 & Apple, Algebra, Altruistic, Angel \\
B & 0.45 & Basketball, Ballroom, Bear, Bountiful \\

Which produces:

Tuesday, November 04, 2008

Obama: 44th US President

Barack Obama has been elected as the next President of the United States of America. I enjoyed following the election coverage this season, as it was particularly exciting. Although my first choice for President was Mitt Romney, I believe that Obama will work to unite the country and improve international relations.

Thursday, October 30, 2008

PIKM - rough notes

Session 1: Chair - Prasan Roy
A SQL Database System for Solving Constraints
An interesting take on enhancing SQL to solve constraint problems
Acquiring Advanced Properties in Ontology Mapping
Using ontologies to improve knowledge management
Social Capital in Online Communities
This was my presentation. ;)

Session 2: Chair - Aparna Varde
Concept Search in Urdu - interesting challenges. He proposes to write a language-specific stemmer for Urdu.
Topic Models and a Revisit of Text-related Applications
An Extended Cooperative Transaction Model for XML

Session 3: Chair - Anisoara Nica
The Benefit of additional Semantics in Folksonomy Systems
Exploiting additional context in folksonomies
Ideas: GroupMe, Social HITS, Automating MOAT, relations between tag assignments
MOAT - Meaning of a Tag (automatic MOAT using context of resource), MOAT server, DBPedia
A Microscopic View on Community Detection in Complex Networks - No Show
Towards Privacy-Preserving Integration of Distributed Heterogeneous Data
Privacy-Preserving Data Sharing Architecture. Pawel Jurczyk presented a fairly complex system that aims to solve the problem of preserving privacy when sharing data. This approach could be applied to hospitals that wish to share data in order to make use of one another's records.
Concurrency Control and Recovery for Multiversion Database Structures
Tuukka Haapasalo proposes a solution for multiversion databases. Propositions: 1) extend B-Trees: TSBT, Transactional MVBT, or 2) Two-dimensional R-tree.

Wednesday, October 22, 2008

Web Startup Group Meets Thursday Night

The first official Web Startup Group meeting is tomorrow night. It should be a fun event that good things will come from. It will be at 7:00 PM in the TMCB at Brigham Young University. All interested are welcome to attend.

Wednesday, October 15, 2008

Security Analysis of Reputation Systems

Today I came across this report on reputation-based systems, which I found at a reputation-based social capital blog. It highlights the security threats against current reputation systems, use cases, and ten recommendations to combat these threats.

Friday, October 10, 2008

Information Pathways in Social Networks

The first talk presented in the social network session of KDD 2008 was for an interesting paper by G. Kossinets, J. Kleinberg, and D. Watts titled The Structure of Information Pathways in a Social Communication Network (PDF). Although I was not at KDD, I was able to watch it online.
Kleinberg, the presenter, made some interesting observations having to do with our "rhythmic" everyday conversations. The approach to analyzing communication within these social networks is focused on the frequency of correspondence, rather than the content conveyed.

They measure "distance" between individuals by measuring the minimum time required for information to pass from one node to another. The methodology is based on Lamport's work and vector clocks in the area of distributed computing.

Using this metric they are able to filter a busy network (one having edges for all communication packets) into a simplified network that contains only the edges that lie on minimum-delay paths between a pair of nodes. They call this simplified network view the network backbone. Below is an example of such a network (along with the caption) taken from the paper.
The nodes further outside of the center of the graph are more "out-of-date" with respect to node v, since they communicate less frequently.
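
To make the "out-of-date" idea concrete, here is a minimal Python sketch of vector-clock-style bookkeeping. It is a simplification and not the authors' code: the message list, the node names, and the function latest_views are made up for illustration, and messages are assumed to be delivered instantly.

```python
def latest_views(messages):
    """For every node, track the latest time it could have heard from each other node.

    messages is a list of (time, sender, receiver) tuples.
    """
    clocks = {}
    for time, sender, receiver in sorted(messages):
        sender_clock = clocks.setdefault(sender, {})
        receiver_clock = clocks.setdefault(receiver, {})
        # The sender is fully up to date about itself when it sends.
        sender_clock[sender] = time
        # The receiver learns everything the sender knew (element-wise maximum).
        for node, seen in sender_clock.items():
            if seen > receiver_clock.get(node, float("-inf")):
                receiver_clock[node] = seen
    return clocks


# Hypothetical communication log.
messages = [(1, "a", "b"), (2, "b", "v"), (5, "c", "v")]
views = latest_views(messages)

now = 6
out_of_date = {node: now - seen for node, seen in views["v"].items()}
print(out_of_date)
```

In this toy log, v never hears from a directly, yet it still holds information about a (relayed through b) that is five time units old, whereas its view of c is only one unit old. That indirect flow is what the minimum-delay measure is meant to capture.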

I found the approach to be novel and useful. As with nearly any analysis technique, caution should be used in selecting the time-period and group size to be studied. Recency and frequency issues come into play as correspondence is aggregated. However, this pursuit offers another approach for more fully understanding information flow.

Monday, October 06, 2008

Revision Control

If you have worked on software in collaboration with multiple developers, then you've probably used (or wished you had used) some sort of revision control system. The Google Search Volume Index plot below suggests some trends surrounding the currently available tools.
(Note: by no means is this very scientific, due to the fact that people searching with these terms could have been searching for something entirely different.)

CVS, although huge in its time, is on the decline, while SVN, Git, and Mercurial are on the rise. I have used enough CVS and SVN to be ready for a change. I am now using Git, which I have really liked so far. If you have already been using SVN as I had, I would recommend the Git-SVN Crash course to get started quickly.

Thursday, October 02, 2008

Facebook growth rising past MySpace

From my local perspective Facebook has been on the rise --- I've noticed that many of my less computer savvy friends have now joined Facebook. I wondered if this trend was global, so I decided to investigate...

During the past few years MySpace has been the dominant social network; however, Facebook has continued to grow much more quickly and is expected to become the leading social network. The first plot below (Figure 1) shows a comparison of searches for the keywords "facebook" and "myspace". For most of 2008, Facebook has been getting a little more attention in the news (lower portion of Figure 1) and has achieved a significantly higher search volume index.

Figure 1. Search Volume Index Comparison of 'facebook' and 'myspace'
(source: Google Trends)

Figure 2 shows the massive popularity of MySpace which began late in 2004, peaked in the middle of 2006, and has since declined --- possibly in part due to the rise of Facebook.

Figure 2. Search Volume Index of '' and ''
(source: Google Trends)

Finally, Figure 3 shows the number of daily unique visitors to Facebook exceeding that of MySpace as far back as November of 2007. (I'm not sure, but I would guess these figures are based upon Google search result click-throughs.)

Figure 3. Daily Unique Visitors of 'facebook' and 'myspace'
(source: Google Trends)

I find it very interesting to see how quickly social networks grow and evolve. As an aside, I think that Facebook is doing things more efficiently and currently providing a better service.

Wednesday, September 24, 2008

The Web Startup Group

Today, we founded the Web Startup Group to bring together people interested in creating new sites and services online. Group members include web developers (programmers and designers), marketing and business-minded individuals, creative idea people, and others with technology related skills. The group intends to meet regularly to discuss and make these ideas come to life.

If you are here at BYU and are interested in making a difference online then come join us!

Monday, September 22, 2008

Social Capital, Nan Lin

This is a great book that provides a sociologist's perspective on social capital.

Sunday, September 14, 2008

Google Releases a Browser

Google Chrome is the new Web browser that was just released for Windows. It is much faster, more elegant, and easier to use than Firefox, IE, and Safari. After using it for a day, I love everything about it except for the following:
  • It is not available for Mac yet (currently only for Windows)
  • Nifty plug-ins are not yet available (Firefox wins here)
I would predict that the above shortcomings will be quickly overcome and Google will end up having the dominant browser.  If interested, learn more about the features of Google Chrome or why they decided to build a new browser.  

Update: as you probably guessed, development of Mac and Linux versions is underway.  Sign up to be notified when Mac or Linux versions are ready for download. 

Google's Picasa 3 (Beta)

Picasa 3 is now available, and it has some pretty nice updates. Watch the video above to hear about what has been added. I just wish they had a Mac version available, as it is superior to iPhoto. (Note: The current release is still Beta, which means that there will likely be a few minor bugs here and there.)

Saturday, August 02, 2008

Eclipse for Web Development

Well, I've been doing quite a bit of programming lately; however, it has been in a number of different languages. Naturally, I have considered Eclipse as a possible editor. Vanilla Eclipse is usually geared toward Java development and can become tricky and tedious to set up for Web development (hence the rise of Aptana). Eclipse can be great to work with, but only when you can get it working how you'd like. The following update sites may be helpful to get Eclipse set up how you hope to have it:

JavaScript (JSEclipse)

Python (PyDev)

PHP (PHPEclipse)

Java Tapestry (Maven2) (Jetty)

Currently, all of these plugins can be loaded into a single installation of Eclipse Europa. However, I'm not sure that they are all compatible with Ganymede (latest version of Eclipse).

Tuesday, July 08, 2008

Google Visualization API

Here is another great presentation from Google I/O.

Monetizing Social Application Traffic

This is a presentation from Google I/O that was done by a company called SocialMedia.

Saturday, May 24, 2008

Nifty Data Technique

Google Spreadsheets has now added some nifty ways to auto-fill data. For instance, rather than typing all of the days or months, you can simply type two or three, select them, and then drag the little blue square in the bottom right corner of the selection. Then, the rest of the days or months will be populated below. That is nice, but what I think is much more interesting is that you can click and drag while holding down Ctrl (Windows and Linux) or Option (Mac) to pull data from Google Sets. So, in the image below, I only filled in the first three rows of each column. Then, I used the former technique to auto-fill the first three columns and the latter technique (holding down Ctrl or Option) to auto-fill the extra twelve rows.

Nowadays, software developers, such as Google, have a great opportunity to utilize the ginormous pile of data available online. The data that individuals generate is ever increasing and can be extraordinarily useful.

Save R Plot in EPS format

Here is a code example of how to save an R plot in EPS (instead of PS):

yvalues = runif(50)
The variation from a standard postscript() call is adding paper="special" and horizontal=FALSE.

Friday, May 16, 2008

Java Programming Notes

Java Programming Notes is a handy Java reference by Fred Swartz. In his words, he explains:
These Java programming notes are written to fill in missing or weak topics in textbooks that I've taught from. Many pages are useful for reference, but not as an ordered tutorial. Some pages are still rough drafts, but I'm slowly working on fixing them.

Monday, May 05, 2008

Walmart Visualization

Here is an interesting animation of Walmart store growth over time. Below is a snapshot of the movie in progress (1991).

Sunday, May 04, 2008

Virtual Host Setup

To add a virtual host on your local machine (running Apache), do the following two things:

1. Add a virtual host definition to your Apache configuration file, like this:

<VirtualHost *:80>
    ServerName sitename
    DocumentRoot "/location/of/your/site/"
</VirtualHost>

2. Add a corresponding line to your hosts file (on my Mac, it is located at /etc/hosts), pointing the name at your own machine:

127.0.0.1 sitename

You should then be able to access your site in any Web browser by going to:

http://sitename/

This then allows you to develop locally in an environment closer to how the site will likely be deployed.

Friday, May 02, 2008

Abstract classes and Interfaces

In response to some of the questions asked in class today, I compiled some properties of interfaces and abstract classes that should help guide your choice when deciding whether to use an Abstract class or an Interface as a parent class.

Neither an Interface nor an Abstract class can be instantiated. Both can be used as a template for concrete (implemented) child classes.

Interfaces
  • example interface definition:
public interface Monkey {
    public double getWeight();
    public void setWeight(double w);
    public void walk();
    public void talk();
}
  • instance fields (i.e., members, variables) are not allowed; only constants may be declared
  • all methods are implicitly abstract
  • a child class can implement many interfaces in Java
  • child classes must implement all methods
Abstract Classes
  • example abstract class definition:
public abstract class Monkey {
    private double weight;

    public Monkey() {
    }

    public double getWeight() {
        return weight;
    }

    public void setWeight(double w) {
        weight = w;
    }

    public abstract void walk();
    public abstract void talk();
}
  • may have members (e.g., weight)
  • may have implemented methods (e.g., getWeight, setWeight) and abstract methods (e.g., walk, talk)
  • a child class can only extend a single parent class in Java (multiple inheritance is not allowed)
  • concrete child classes must implement all of the parent's abstract methods
Section 4.4 in Data Structures and Problem Solving in Java discusses this more extensively. 

This talks more about when you might use one, the other, or both. Furthermore, I found some questions and answers about the two that interviewers like to use. ;)

Java Tutorials

Sun provides some excellent tutorials that cover most aspects of programming in Java. Learning the Java language is a set of tutorials, or "trails", on the following fundamental topics:
The content of these trails is also available as a book, called The Java Tutorial, Fourth Edition.

Wednesday, April 23, 2008


ForwardTrack is an open source tool (now entirely written in php) that allows email campaigns to be tracked and mapped as they are forwarded from person to person. This is definitely useful as it reveals the spread of information and some of the underlying social network.

Political Campaign Contributions

The Federal Election Commission (FEC) requires that all campaign contributions over $200 (per donor) be reported publicly. The reported information includes the donor's name, job title, zip code, and even address. All of it since 2001 is available electronically via FTP.

In collaboration with political scientists here at BYU, we have been performing record linkage (a.k.a. entity resolution) on this data, so that they will be able to perform their studies more accurately.
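
To give a feel for what record linkage involves, here is a deliberately coarse Python sketch. It is not the approach we used; the field names ("name", "zip"), the sample records, and the matching key are all assumptions made for illustration. It groups donation records that share a normalized last name, first initial, and five-digit zip code.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def link_records(records):
    """Group records sharing a (last name, first initial, 5-digit zip) key."""
    groups = defaultdict(list)
    for record in records:
        # Names are assumed to be written as "LAST, FIRST MIDDLE".
        last, _, first = record["name"].partition(",")
        key = (normalize(last), normalize(first)[:1], record["zip"][:5])
        groups[key].append(record)
    return list(groups.values())


# Made-up records, not real donor data.
donations = [
    {"name": "SMITH, JOHN A", "zip": "84602"},
    {"name": "Smith, John", "zip": "84602-1234"},
    {"name": "SMITH, JANE", "zip": "84604"},
]
linked = link_records(donations)
print([len(group) for group in linked])
```

A key this blunt would merge a John and a Jane Smith living in the same zip code, which is exactly why real entity resolution compares more fields (job title, address) and scores candidate pairs rather than relying on a single exact key.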

On a related note, there is an interesting mashup (shown below) that maps donors on a Google map, colored by the party or candidate donated to. It also reveals donor information and appears to do some coarse record linkage.

FEC Maps
Additionally, the FEC itself has started to produce maps both for the Presidential Election and House and Senate Elections.  The maps they provide aggregate the donated funds by state, party, and candidate.

Tuesday, April 22, 2008

Duncan Watts Downplays Viral Marketing Hype

A while back I skimmed Clive Thompson's article entitled Is the Tipping Point Toast?, but didn't have the time to read it all or investigate it any further --- until today.

Thompson's article pits Malcolm Gladwell's thesis (in The Tipping Point) against the recent research of Duncan Watts (cited below). I thought the article was well-written and adequately presented both sides of the issue. In short, Watts claims that spending time and money marketing to influential individuals is no better than marketing to the masses.

Through all of this, Watts makes some important points such as (quoted from Thompson's article):
  • The problem of popular viral marketing talk is that it is "incredibly vague"; "how an influential actually influences is not explained." "Precision matters when trying to explain highly social epidemics"
  • "Influentials don't govern person-to-person communication. We all do."
  • "Common sense is misleading"
  • Thompson writes that Watts found the "rank-and-file citizen [to be] far more likely to start a contagion"
So, today I finally took the time to learn more about Watts' recent research, available at the Collective Dynamics Group website (at Columbia University) as a working paper in the Papers section. Through the years, I had read some of Watts' work, so I was excited to see his recent findings. In this paper he presents an approach they call "Big Seed Marketing", which in essence combines a traditional mass marketing model with viral propagation.

The idea that there is "no free lunch" in viral marketing is useful to point out, as "there are many more unsuccessful attempts that one never hears about." He also points out that it is "hard, if not impossible" to predict which attempts will succeed.

The take-home message in the conclusion is that effective marketing campaigns can be produced without identifying "influentials", but simply by adding a mechanism of peer-to-peer sharing to propagate the message. (As an aside, the formalism presented in the paper is useful for discussing the problem and easily evaluating the results.)

Watts makes some good points; however, I would still argue that people with high social capital (you might call them "highly influential") can heighten the network effect. This is even evidenced in Watts' paper --- one of Tom Mauser's 'friends' was StopTheNRA, who in turn sent a large email blast (Table 1, footnote 1). So, Tom Mauser had a significant enough relationship with StopTheNRA that they used their resources (their large email list) to forward his message.

Although there is an element of hype in the presentation of "Big Seed Marketing", I find it useful as it presents a nice way of making the issue sticky and bringing to light these more subtle points. The desired effect of propagating these ideas seems to be occurring.

Update (4/23): Podcast with Duncan Watts on Buzz Marketing (mp3)

Tuesday, April 15, 2008

Looking for a Job?

There are lots of places to search for jobs online these days, including:
An interesting approach to finding your next job might be to leverage your social connections to match you with a good employer whose needs are in line with your skills. Of course, as nice as that sounds in theory, I would bet it could be challenging in practice.

Although I won't be needing a full-time job for another couple of years, it is always interesting to see what jobs are available (and what skills are in demand) by quickly searching on your skills and interests.

Tuesday, April 01, 2008

SIP Recap - Thursday

Here is a recap from the Social Information Processing Symposium:
  1. Brian Skyrms (UCI), Signaling Games: Some Dynamics of Evolution and Learning
  2. John Nicholson (USU), The Blind Leading the Blind: Toward Collaborative Online Route Information
  3. Cosma Shalizi (CMU), Social Media as Windows on the Social Life of the Mind
  4. Gustavo Glusman (Systems Biologist), Users, photos, groups, words: Analyzing mixed networks on Flickr
  5. Luc Steels (Vrije U), Social tagging in community memories
  6. Aram Galstyan (USC/ISI), Influence Propagation in Modular Networks
  7. Adam Anthony (UMBC), Generative Models for Clustering: The Next Generation
  8. Peter Pirolli (PARC), A Probabilistic Model of Semantics in Social Information Foraging
  9. Hak-Lae Kim (DERI), Building a Tag Sharing Service with the SCOT Ontology
  10. Yu Zhang (Zhejiang U), Mining Target Marketing Groups From Users' Web of Trust on Epinions
  11. Andrei Broder (Yahoo), Reviewing the Reviewers: Characterizing Biases and Competencies using Socially Meaningful Attributes (see Sihem Amer-Yahia)

The Thursday talks were excellent. In particular, I really enjoyed:
  • The subtleties of the blind leading the blind (see 2 above)
  • Gustavo's unique way of analyzing Flickr relationships (see 4)
  • Adam Anthony's overview of generative models that can be used in clustering (see 7)
  • Pirolli's analysis of Lostpedia using LDA (see 8)
  • Hak-Lae Kim's tag aggregator application (see 9)
  • The use of socially meaningful attributes as presented by Yahoo's Andrei Broder (see 11)

Freemium Business Model

The freemium model is something that I've been telling start-up businesses to do for quite a while. It allows community --- where social capital resides --- to build around your service at a fundamental level. After a captive community has been established, premium features (or services) can be offered, which effectively converts social capital into profit.

Monday, March 31, 2008


At the SIP Symposium (at Stanford), there were some guys from the Freebase development crew. This was my first exposure to Freebase and I was intrigued with the idea. According to Kurt Bollacker (the Chief Scientist) the reason they were at the symposium was to "get people using our data". I also learned from him that they were VC funded and currently had about 60 developers.

They have built upon the existing data sources, such as Wikipedia, and have added structurally typed data to go along with it. The resulting data repository is then easily accessible via the Freebase API.

It'll be interesting to see what happens with it.

Wednesday, March 26, 2008

SIP Recap - Wednesday

I'm here in Palo Alto, California attending the AAAI Spring Symposium at Stanford. So far, the Social Information Processing Symposium has been very interesting and exciting, and I've met some people doing some neat research. Today's presentations were the following (I've added links to those I could find online):
  1. Bernardo Huberman (HP Labs), Social Dynamics in the Age of the Web
  2. Ed Chi (PARC), Augmented Social Cognition
  3. Tad Hogg (HP Labs), Solving the organizational free riding problem with social networks
  4. Riley Crane (ETH), Viral, Quality, and Junk Videos on YouTube: Separating Content From Noise in an Information-Rich Environment
  5. Yi-Ching Huang (NTU), You Are What You Tag
  6. Julia Stoyanovich (Columbia), Leveraging Tagging to Model User Interests in
  7. Steve Whittaker (Sheffield), Temporal Tagging: Implicit Behaviour Identifies Points of Interest in Complex Event
  8. Georg Groh, Implicit Social Network Construction and Expert User Determination in Web Portals
  9. Elizeu Santos-Neto, Content Reuse and Interest Sharing in Tagging Communities
  10. Matt Smith (BYU), Social Capital in the Blogosphere: A Case Study (this was our presentation, of course)
I enjoyed all of the presentations. In particular, I liked Bernardo's address, which covered a variety of interesting topics; Ed's comments; Riley's trend analysis; Julia's talk analyzing hotlist generation and tag; Steve's flamboyant presentation; and Georg's work (as it had some thoughts related to our work on Implicit Affinity Networks).

(Oh, and I lost my cell phone today.)

I'm looking forward to another great day tomorrow!

Thursday, March 20, 2008

Firefox 3 Extensions

As Firefox 3 nears graduation from the Beta phase, many of the great extensions are finally becoming available. Here are links to the extensions that I find essential:
What extensions do you use?

Monday, March 17, 2008

Sunday, March 09, 2008

Emptying a File in the Terminal

Here is a handy trick that I find useful from time to time:

$ echo "here is some jibberish" > somefile.txt
$ cat somefile.txt
here is some jibberish
$ cat /dev/null > somefile.txt
$ cat somefile.txt

Essentially by calling "cat /dev/null > somefile.txt" we redirect nothing (/dev/null) to the file 'somefile.txt', which in effect empties it. I like to use this for emptying log files as it saves time recreating files and possibly resetting the right permissions.
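
The same idea carries over to scripts. As a minimal sketch (assuming Python is available; the helper name empty_file is just for illustration), truncating a file in place keeps the same file, and therefore its permissions, instead of deleting and recreating it. The demonstration runs on a throwaway temporary file rather than a real log.

```python
import os
import tempfile


def empty_file(path):
    """Truncate an existing file to zero bytes without deleting or recreating it."""
    with open(path, "r+") as handle:
        handle.truncate(0)


# Create a throwaway file with some content.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("here is some jibberish\n")

inode_before = os.stat(path).st_ino
empty_file(path)
size_after = os.path.getsize(path)
same_file = os.stat(path).st_ino == inode_before  # still the same file on disk
print(size_after, same_file)

os.remove(path)  # clean up the temporary file
```

Opening with "r+" rather than "w" means the call fails loudly if the file does not exist, instead of silently creating a new one.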

If interested, you can read more on I/O redirection.

Thursday, February 28, 2008

Social Science Data

Amidst my search for social science data (to perform social capital experiments), I have discovered the following data:
  • Social Science Data on the Internet - seems to have lots of links to data, however the search is limited and results pages are clunky.
  • Social capital datasets - presents some information on data sources that are specifically related to investigating social capital.
  • INSNA - has the data used in the Wasserman and Faust book on Social Network Analysis.
  • ICPSR - Inter-university Consortium for Political and Social Research
  • SDA - web-based software available for accessing much of the social science data. Additionally, I noticed some archived data available.
There appears to be plenty of data.  Now on to the task of filtering it down to the most relevant...

Wednesday, February 27, 2008

Social Network Graphing Tools

Here is a write-up on various social network graphing tools that I posted on the Data Mining Lab wiki: Social Network Graphing Tools

Most recently, I have enjoyed using Cytoscape and the Network Workbench Tool for larger networks and GraphViz for small example networks.

What tools do you like to use?

Tuesday, February 26, 2008

Google Chatback

Google just released a new gadget that will allow people visiting your blog or website to chat with you (via Google Talk). It is like the live chat and call-back buttons that have been available previously. To set it up, all you need to do is add the chatback badge code to your blog. Then, it'll look something like this...

Monday, February 25, 2008

Quick Mac and Windows Keyboard Switching

Since Mac OS X doesn't save profiles for multiple keyboards (which I hope will change in subsequent versions), I had to find a solution for quickly changing between a Windows style keyboard and a Mac style keyboard. After spending some time with Google, I found some AppleScript that seems to provide an adequate solution. For convenience, I made two scripts: one called keyboard_win.scpt that switches the modifier keys on a Windows style keyboard to behave Mac-like, and one called keyboard_mac.scpt that restores the default keyboard behavior for a Mac style keyboard.

(1) keyboard_win.scpt:
(2) keyboard_mac.scpt:
Hopefully, this will save you some time.

Thursday, February 21, 2008

Mining "Stories" from Blogs

I just finished reading a paper called Mining Blog Stories Using Community-Based and Temporal Clustering, which I found very interesting as it is closely related to the work that we have been doing. As I read, I had the following thoughts and questions.

  • It is refreshing to discover others who are pursuing research paths similar to ours.
  • Their work gives more fuel to the approaches we have been taking.
  • Some of the formalism is elegant and useful (particularly for talking about blogs and entries); however, some of it gets cumbersome (the lookup table is definitely necessary).
  • Using the Lucene library to index the data is an idea that we could consider (we have been storing the data in a MySQL database of our own make).
  • I'd be curious to know how many degrees of separation they crawled from their seed blogs. I would guess that they did not get too far, since the blogs in our study appear to be less sparse on average. They reported over 2,000 blogs having 1 million entries (on average, 50 entries per blog per month). In Social Capital in the Blogosphere we retrieved blog content just two degrees away from Scoble (our single seed blog) and obtained over 38,000 blogs having 13 million entries (on average, 28.5 entries per blog per month).
  • How did they perform blog entity resolution? (here is an approach that we used)
(Note: Thanks to both Christophe who sent me the paper and Robbie Haertel for sending it to Christophe --- Social capital in action. ;) )


The Bandwidth Place is a nice site that tests the bandwidth available with your current network connection.

Escaping JavaScript and PHP

Here are some references for encoding and decoding in JavaScript and PHP, so that passing information via Ajax or URLs can be done cleanly.

The Art of Web - useful testing area for JavaScript and PHP.
W3Schools JavaScript Reference - just JavaScript

What is Uplift Modeling?

Uplift modeling is an approach to predicting the incremental impact a marketing campaign has on a customer through controlled experimentation. It measures the difference in response between a treated group and a control group, segmenting customers into the following groups:

1. Those that buy only when treated
2. Those that would buy or not buy regardless of whether or not they were treated
3. Those that do not buy when treated, but do buy when not treated
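
As a back-of-the-envelope illustration of the measurement (not a full uplift model), the Python sketch below computes the difference in purchase rate between treated and control customers for each segment. The segment names and the outcome data are invented for the example.

```python
def uplift(treated, control):
    """Difference in purchase rate between treated and control customers."""
    treated_rate = sum(treated) / len(treated)
    control_rate = sum(control) / len(control)
    return treated_rate - control_rate


# Hypothetical outcomes (1 = bought, 0 = did not buy) for two customer segments.
segments = {
    "new_customers": {"treated": [1, 1, 0, 1], "control": [0, 1, 0, 0]},
    "loyal_customers": {"treated": [1, 0, 1, 1], "control": [1, 1, 1, 1]},
}
lift_by_segment = {
    name: uplift(groups["treated"], groups["control"])
    for name, groups in segments.items()
}
print(lift_by_segment)
```

A positive value suggests a segment dominated by the first group (they buy only when treated), a value near zero suggests the second group, and a negative value, as with the made-up loyal customers here, points to the third group, where the campaign actually hurts.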

I have yet to use Uplift Modeling, but it sounds rather interesting.

Would you like to read more?
DM Review article
Using Control Groups to Target on Predicted Lift

Wednesday, February 20, 2008

Star Rating Widget

Here is a rating widget and how to use it. It can be used for collecting ratings from website visitors. Typically, it will be used for five-star rating, but you can pass it any number, so that it can be used as an n-star rating widget.