I've been doing quite a bit of programming lately, though in a number of different languages. Naturally, I have considered Eclipse as a possible editor. Vanilla Eclipse is geared toward Java development and can be tricky and tedious to set up for Web development (hence the rise of Aptana). Eclipse can be great to work with, but only once you get it working how you'd like. The following update sites may help you get Eclipse set up the way you want:
JavaScript
http://download.macromedia.com/pub/labs/jseclipse/autoinstall (JSEclipse)
Python
http://pydev.sourceforge.net/updates/ (PyDev)
PHP
http://update.phpeclipse.net/update/nightly (PHPEclipse)
Java Tapestry
http://m2eclipse.sonatype.org/update (Maven2)
http://jettylauncher.sourceforge.net/updates (Jetty)
Currently, all of these plugins can be loaded into a single installation of Eclipse Europa. However, I'm not sure that they are all compatible with Ganymede (the latest version of Eclipse).
This blog focuses on the relationships that connect us together providing potent insights for decision makers. In addition, a few data mining topics are presented.
Saturday, August 02, 2008
Tuesday, July 08, 2008
Saturday, May 24, 2008
Nifty Data Technique
Google Spreadsheets has now added some nifty ways to auto-fill data. For instance, rather than typing all of the days or months, you can simply type two or three, select them, and then drag the little blue square in the bottom right corner of the selection. The rest of the days or months will then be populated below. That is nice, but what I think is much more interesting is that you can click and drag while holding down Ctrl (Windows and Linux) or Option (Mac) to pull data from Google Sets. So, in the image below, I only filled in the first three rows of each column. Then, I used the former technique to auto-fill the first three columns and the latter technique (holding down Ctrl or Option) to auto-fill the extra twelve rows.

Nowadays, software developers, such as Google, have a great opportunity to utilize the ginormous pile of data available online. The data that individuals generate is ever increasing and can be extraordinarily useful.

Save R Plot in EPS format
Here is a code example of how to save an R plot in EPS (instead of PS):
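# Open a PostScript device set up to write an EPS-style file
postscript(file="testplot.eps",
           paper="special",    # size the page to the plot, not to a standard paper size
           width=10,           # plot width in inches
           height=10,          # plot height in inches
           horizontal=FALSE)   # portrait orientation (the default is landscape)
yvalues <- runif(50)           # 50 uniform random values to plot
plot(yvalues)
dev.off()                      # close the device so the file is written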
The variation is adding paper="special" and horizontal=FALSE.
Friday, May 16, 2008
Java Programming Notes
Java Programming Notes is a handy Java reference by Fred Swartz. In his words:
These Java programming notes are written to fill in missing or weak topics in textbooks that I've taught from. Many pages are useful for reference, but not as an ordered tutorial. Some pages are still rough drafts, but I'm slowly working on fixing them.
Monday, May 05, 2008
Walmart Visualization
Here is an interesting animation of Walmart store growth over time. Below is a snapshot of the movie in progress (1991).
Sunday, May 04, 2008
Virtual Host Setup
To add a virtual host on your local machine (running Apache), do the following two things:
1. Add a virtual host definition to your Apache configuration file, like this:
<VirtualHost *:80>
    ServerName sitename
    DocumentRoot "/location/of/your/site/"
</VirtualHost>
2. Add a corresponding line to your HOSTS file (on my Mac, it is located at /etc/hosts):
127.0.0.1 sitename
You should then be able to access your site in any Web browser by going to:
http://sitename
This then allows you to develop locally in an environment closer to how it will likely be deployed.
Friday, May 02, 2008
Abstract classes and Interfaces
In response to some of the questions asked in class today, I compiled some properties of interfaces and abstract classes that should help guide your choice when deciding when to use an Abstract class or an Interface as a parent class.
Neither an Interface nor an Abstract class can be instantiated. Both can be used as a template for concrete (implemented) child classes.
Interfaces
- example interface definition:
public interface Monkey {
    public double getWeight();
    public void setWeight(double w);
    public void walk();
    public void talk();
}
- instance fields (i.e., members, variables) are not allowed; only constants (implicitly public static final) may be declared
- all methods are implicitly abstract (Java 8 and later also permit default and static methods)
- a child class can implement many interfaces in Java
- concrete child classes must implement all methods
Abstract classes
- example abstract class definition:
public abstract class Monkey {
    private double weight;
    public Monkey(){
    }
    public double getWeight(){
        return weight;
    }
    public void setWeight(double w){
        weight = w;
    }
    public abstract void walk();
    public abstract void talk();
}
- may have members (e.g., weight)
- may have implemented methods (e.g., getWeight, setWeight) and abstract methods (e.g., walk, talk)
- a child class can only extend a single parent class in Java (multiple inheritance is not allowed)
- concrete child classes must implement all of the parent's abstract methods
Section 4.4 in Data Structures and Problem Solving in Java discusses this more extensively.
This talks more about when you might use one, the other, or both. Furthermore, I found some questions and answers about the two that interviewers like to use. ;)
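To make the comparison concrete, here is a minimal sketch of a concrete child class. Monkey follows the abstract class definition given earlier in this post. Chimp, Climber, and MonkeyDemo are hypothetical names made up for illustration, and the printed strings are placeholders. The sketch shows a child class extending one abstract parent while implementing more than one interface.

```java
// Hypothetical demo class (not from the post); it comes first so the file can be run directly.
class MonkeyDemo {
    public static void main(String[] args) {
        // "new Monkey()" would not compile: abstract classes cannot be instantiated.
        Monkey m = new Chimp();
        m.setWeight(45.0);                  // inherited, already implemented in Monkey
        System.out.println(m.getWeight());
        m.walk();                           // implemented in Chimp
        m.talk();
    }
}

// Abstract parent: holds state and some implemented methods.
abstract class Monkey {
    private double weight;

    public double getWeight() { return weight; }
    public void setWeight(double w) { weight = w; }

    public abstract void walk();
    public abstract void talk();
}

// Hypothetical interface, used only to show that several interfaces can be combined.
interface Climber {
    void climb();
}

// A concrete child: exactly one parent class, but any number of interfaces.
class Chimp extends Monkey implements Climber, Comparable<Monkey> {
    @Override public void walk() { System.out.println("walking"); }
    @Override public void talk() { System.out.println("talking"); }
    @Override public void climb() { System.out.println("climbing"); }

    // Order monkeys by weight, reusing the getter inherited from Monkey.
    @Override public int compareTo(Monkey other) {
        return Double.compare(getWeight(), other.getWeight());
    }
}
```

If Chimp left out walk() or talk(), it would have to be declared abstract itself, which is the rule listed above about implementing all of the parent's abstract methods.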
Java Tutorials
Sun provides some excellent tutorials that cover most aspects of programming in Java. Learning the Java language is a set of tutorials, or "trails", on the following fundamental topics:
The content of these trails is also available as a book, called The Java Tutorial, Fourth Edition.
Wednesday, April 23, 2008
ForwardTrack
ForwardTrack is an open source tool (now written entirely in PHP) that allows email campaigns to be tracked and mapped as they are forwarded from person to person. This is definitely useful, as it reveals the spread of information and some of the underlying social network.
Political Campaign Contributions
The Federal Election Commission (FEC) requires that all campaign contributions over $200 (per donor) be reported publicly. The reported information includes the donor's name, job title, zip code, and even address. All of it since 2001 is available electronically via FTP at ftp://ftp.fec.gov/FEC/electronic/.
In collaboration with political scientists here at BYU, we have been performing record linkage (a.k.a. entity resolution) on this data so that they will be able to perform their studies more accurately.
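As a rough illustration of what record linkage means for data like this, here is a toy sketch. It is not necessarily the approach used in this project. It assumes each record is just a donor name and a zip code, builds a crude key from the two, and groups records that share a key. The class name, method names, and sample donors are all invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: exact matching on a normalized (name, zip) key.
class RecordLinkageSketch {

    // Crude match key: the letters of the name in lowercase plus the 5-digit zip.
    static String matchKey(String name, String zip) {
        String normName = name.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        String normZip = zip.trim();
        if (normZip.length() > 5) {
            normZip = normZip.substring(0, 5);   // drop a ZIP+4 suffix
        }
        return normName + "|" + normZip;
    }

    // Groups the row indexes of records that share a match key.
    // Each record is {name, zip}.
    static Map<String, List<Integer>> link(String[][] records) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < records.length; i++) {
            String key = matchKey(records[i][0], records[i][1]);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Invented sample data, not real FEC records.
        String[][] donors = {
            {"Smith, John", "12345"},
            {"SMITH JOHN", "12345-6789"},
            {"Smith, Jane", "12345"},
        };
        // Rows 0 and 1 share a key; row 2 gets its own group.
        System.out.println(link(donors));
    }
}
```

Exact matching on a normalized key is about the coarsest form of linkage. It misses reordered names ("John Smith" versus "Smith, John"), nicknames, and donors who move, which is where fuzzier comparison of the name, job title, and address fields would come in.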
Fundrace
On a related note, fundrace.org has created an interesting mashup (shown below) that maps donors on a Google map, colored by the party or candidate donated to. It also reveals donor information and appears to do some coarse record linkage.

FEC Maps
Additionally, the FEC itself has started to produce maps both for the Presidential Election and House and Senate Elections. The maps they provide aggregate the donated funds by state, party, and candidate.

Tuesday, April 22, 2008
Duncan Watts Downplays Viral Marketing Hype
A while back I glanced at Clive Thompson's article entitled Is the Tipping Point Toast?, but I didn't have the time to read it all or investigate it any further until today.
Thompson's article pits Malcolm Gladwell's thesis (in The Tipping Point) against the recent research of Duncan Watts (cited below). I thought the article was well-written and adequately presented both sides of the issue. In short, Watts claims that spending time and money marketing to influential individuals is no better than marketing to the masses.
Through all of this, Watts makes some important points such as (quoted from Thompson's article):
- The problem of popular viral marketing talk is that it is "incredibly vague"; "how an influential actually influences is not explained." "Precision matters when trying to explain highly social epidemics"
- "Influentials don't govern person-to-person communication. We all do."
- "Common sense is misleading": Thompson writes that Watts found the "rank-and-file citizen [to be] far more likely to start a contagion"
The idea that there is "no free lunch" in viral marketing is useful to point out, as "there are many more unsuccessful attempts that one never hears about." He also points out that it is "hard, if not impossible" to predict which attempts will succeed.
The take-home message in the conclusion is that effective marketing campaigns can be produced without identifying "influentials", but simply by adding a mechanism of peer-to-peer sharing to propagate the message. (As an aside, the formalism presented in the paper is useful for discussing the problem and easily evaluating the results.)
Watts makes some good points; however, I would still argue that people with high social capital (whom you might call "highly influential") can heighten the network effect. This is even evidenced in Duncan's paper: one of Tom Mauser's 'friends' was StopTheNRA, which in turn sent a large email blast (Table 1, footnote 1). So Tom Mauser had a significant enough relationship with StopTheNRA that they used their resources (their large email list) to forward his message.
Although there is an element of hype in the presentation of "Big Seed Marketing", I find it useful, as it presents a nice way of making the issue sticky and bringing these more subtle points to light. The desired effect of propagating these ideas seems to be occurring.
Update (4/23): Podcast with Duncan Watts on Buzz Marketing (mp3)
Tuesday, April 15, 2008
Looking for a Job?
There are lots of places to search for jobs online these days, including:
- SimplyHired - easy to use job search
- HotJobs - Yahoo's job site
- Monster - you've probably heard of Monster
- Indeed - newer site that allows you to filter by salary (for instance, here is an example salary search on computer science and data mining)
- LinkedIn - use your social network to find your next job
- erecruiting - for new college graduates
An interesting approach to finding your next job might be to leverage your social connections to match you with a good employer whose needs are in line with your skills. Of course, as nice as that sounds in theory, I would bet it could be challenging in practice.
Although I won't be needing a full-time job for another couple of years, it is always interesting to see what jobs are available (and what skills are in demand) by quickly searching on your skills and interests.
Tuesday, April 01, 2008
SIP Recap - Thursday
Here is a recap from the Social Information Processing Symposium:
1. Brian Skyrms (UCI), Signaling Games: Some Dynamics of Evolution and Learning
2. John Nicholson (USU), The Blind Leading the Blind: Toward Collaborative Online Route Information
3. Cosma Shalizi (CMU), Social Media as Windows on the Social Life of the Mind
4. Gustavo Glusman (Systems Biologist), Users, photos, groups, words: Analyzing mixed networks on Flickr
5. Luc Steels (Vrije U), Social tagging in community memories
6. Aram Galstyan (USC/ISI), Influence Propagation in Modular Networks
7. Adam Anthony (UMBC), Generative Models for Clustering: The Next Generation
8. Peter Pirolli (PARC), A Probabilistic Model of Semantics in Social Information Foraging
9. Hak-Lae Kim (DERI), int.ere.st: Building a Tag Sharing Service with the SCOT Ontology
10. Yu Zhang (Zhejiang U), Mining Target Marketing Groups From Users' Web of Trust on Epinions
11. Andrei Broder (Yahoo), Reviewing the Reviewers: Characterizing Biases and Competencies using Socially Meaningful Attributes (see Sihem Amer-Yahia)
The Wednesday talks were excellent. In particular, I really enjoyed:
- The subtleties of the blind leading the blind (see 2 above)
- Gustavo's unique way of analyzing Flickr relationships (see 4)
- Adam Anthony's overview of generative models that can be used in clustering (see 7)
- Pirolli's analysis of Lostpedia using LDA (see 8)
- Hak-Lae Kim's tag aggregator application (see 9)
- The use of socially meaningful attributes as presented by Yahoo's Andrei Broder (see 11)