From my local perspective, Facebook has been on the rise: I've noticed that many of my less computer-savvy friends have now joined Facebook. I wondered whether this trend was global, so I decided to investigate...
During the past few years MySpace has been the dominant social network; however, Facebook has been growing much faster and is expected to become the leading social network. The first plot below (Figure 1) compares searches for the keywords "facebook" and "myspace". For most of 2008, Facebook has been getting a little more attention in the news (lower portion of Figure 1) and has achieved a significantly higher search volume index.
Figure 2 shows the massive popularity of MySpace, which began late in 2004, peaked in the middle of 2006, and has since declined, possibly in part because of the rise of Facebook.
Finally, Figure 3 shows that Facebook has had more daily unique visitors than MySpace as far back as November 2007. (I'm not sure, but I would guess these figures are based on Google search result click-throughs.)
I find it very interesting to see how quickly social networks grow and evolve. As an aside, I think Facebook is doing things more efficiently and is currently providing a better service.
This blog focuses on the relationships that connect us, which can provide potent insights for decision makers. A few data mining topics are also presented.
Thursday, October 02, 2008
Wednesday, April 23, 2008
Political Campaign Contributions
The Federal Election Commission (FEC) requires that all campaign contributions over $200 (per donor) be reported publicly. The reported information includes the donor's name, job title, zip code, and even address. All of it since 2001 is available electronically via FTP at ftp://ftp.fec.gov/FEC/electronic/.
In collaboration with political scientists here at BYU, we have been performing record linkage (a.k.a. entity resolution) on this data so that they can perform their studies more accurately.
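The post does not describe how the BYU linkage is actually done. As a rough illustration only, here is a minimal Python sketch of one common record-linkage pattern, assuming each contribution record is a dict with hypothetical `name` and `zip` keys. Records are blocked by zip code, names are normalized and compared with the standard library's `difflib.SequenceMatcher`, and pairwise matches are merged with union-find. The 0.85 similarity threshold is an arbitrary assumption.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize_name(name):
    """Lowercase, drop punctuation, and sort tokens: 'SMITH, JOHN A.' -> 'a john smith'."""
    return " ".join(sorted(re.findall(r"[a-z]+", name.lower())))


def link_records(records, threshold=0.85):
    """Cluster donor records that likely refer to the same person.

    records: list of dicts with hypothetical 'name' and 'zip' keys.
    Returns clusters as sorted lists of record indices.
    """
    parent = list(range(len(records)))  # union-find forest, one node per record

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Blocking: only compare records that share a zip code.
    blocks = {}
    for idx, rec in enumerate(records):
        blocks.setdefault(rec["zip"], []).append(idx)

    # Matching: link pairs whose normalized names are similar enough.
    for members in blocks.values():
        for i, j in combinations(members, 2):
            a = normalize_name(records[i]["name"])
            b = normalize_name(records[j]["name"])
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for idx in range(len(records)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())
```

Blocking on zip code keeps the number of comparisons manageable, at the cost of missing donors who moved between filings. With this threshold, "SMITH, JOHN A." and "Smith, John" in the same zip code are linked, while the same name in a different zip code is not.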
Fundrace
On a related note, fundrace.org has created an interesting mashup (shown below) that plots donors on a Google map, colored by the party or candidate donated to. It also reveals donor information and appears to do some coarse record linkage.

FEC Maps
Additionally, the FEC itself has started to produce maps for both the presidential election and the House and Senate elections. These maps aggregate the donated funds by state, party, and candidate.
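To illustrate what that kind of aggregation amounts to, here is a minimal Python sketch that sums contribution amounts per (state, party, candidate) key and then rolls those totals up by state. The record layout (`state`, `party`, `candidate`, `amount`) is a hypothetical simplification, not the FEC's actual file format, and the candidate names in any example are placeholders.

```python
from collections import defaultdict


def aggregate_contributions(contributions):
    """Total contribution amounts per (state, party, candidate).

    Each contribution is a dict with hypothetical keys
    'state', 'party', 'candidate', and 'amount' (dollars).
    """
    totals = defaultdict(float)
    for c in contributions:
        totals[(c["state"], c["party"], c["candidate"])] += c["amount"]
    return dict(totals)


def totals_by_state(totals):
    """Roll the (state, party, candidate) totals up to one figure per state."""
    by_state = defaultdict(float)
    for (state, _party, _candidate), amount in totals.items():
        by_state[state] += amount
    return dict(by_state)
```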

Friday, September 16, 2005
Customer Segmentation
Customer analysis helps a business better meet customer needs, and learning more about your customers often benefits from intelligent segmentation. Customers can be segmented into a variety of groups based on behavioural, demographic, geographic, and psychographic variables, or on any combination of them. Viewing customers within such segments simplifies the problem of identifying and serving them, and the knowledge these segments provide is usually useful for determining actionable marketing tactics.
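The post does not name a segmentation technique. One common choice for behavioural variables is k-means clustering. Below is a minimal, dependency-free Python sketch of Lloyd's algorithm, assuming each customer is described by a hypothetical numeric tuple such as (annual spend, visits per month).

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on numeric feature tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k randomly chosen customers

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iterations):
        # Assignment step: each customer joins the segment with the closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty segment keeps its previous centroid).
        new_centroids = [
            [sum(values) / len(members) for values in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, [nearest(p) for p in points]
```

In practice, features on very different scales (such as dollars and visit counts) are usually standardized first. Otherwise the larger-scale feature dominates the distance and, in effect, decides the segments on its own.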
Tuesday, September 13, 2005
Stanford Data Mining Course
Stanford offers a nice Data Mining and Electronic Business course within the Statistics department. It looks like it covers many exciting aspects of the field.
Thursday, July 28, 2005
What is Lift?
In data mining, "lift" is often used to measure model performance. Here is a link to an article that explains how it is used: DMReview article
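Lift compares the response rate among a model's top-ranked records with the response rate of the population as a whole. For example, a lift of 3 in the top 10% means those records respond three times as often as average. Here is a minimal Python sketch, assuming binary response labels (1 = responded) and one model score per record. The function name and the 10% default cutoff are illustrative choices.

```python
def lift_at(scores, responded, fraction=0.1):
    """Lift of the top `fraction` of records when ranked by model score.

    lift = (response rate among the top-ranked records) / (overall response rate)
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    base_rate = sum(responded) / len(responded)
    if base_rate == 0:
        raise ValueError("lift is undefined when nobody responded")
    # Rank records from highest to lowest score and keep the top slice.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    return top_rate / base_rate
```

For instance, if 2 of 10 customers responded and the model ranks exactly those two highest, the lift in the top 20% is 1.0 / 0.2 = 5.0. Taking 100% of the list always gives a lift of 1.0, which is the baseline a useful model should beat.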
Tuesday, June 14, 2005
Exploring Bayesian Methods
Bayesian methods can be used to deal with uncertainty.
Here are some links that help to explore the area:
Bayesian Inference
Empirical Bayes
Hierarchical Bayes
Bayesian Network
Bayes' theorem
Statistics Topics
Expected Value
Likelihood
Mean
Variance
Mean Squared Error (MSE)
Posterior Probability
Conditional, Joint, and Marginal Probability
Utility Functions (Link 2)
Distributions
Normal
Gamma
Poisson
Beta
Binomial
Conjugate Prior
Other Related Topics
Markov Chain Monte Carlo (MCMC)
Simulated Annealing
Tabu Search (Link 2)
Kalman Filter (Link 2)
Particle Filter
Directed Acyclic Graphs (DAG)
Markov Random Field (MRF)
EM Algorithm (Bayesian Structural EM - Friedman)
Reading List of Bayesian Methods
Helpful Software
JavaBayes
Graphviz
Useful Java Libraries
Colt
Tomato
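Several of the topics above (Bayes' theorem, posterior probability, the Beta and Binomial distributions, and conjugate priors) come together in one small example. With a Beta(α, β) prior on a success probability and k successes in n binomial trials, Bayes' theorem gives a Beta(α + k, β + n - k) posterior, so updating reduces to adding counts. Here is a minimal Python sketch. The class name `BetaBelief` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about an unknown success probability."""
    alpha: float
    beta: float

    def update(self, successes, failures):
        """Conjugate update: Beta prior x binomial likelihood -> Beta posterior."""
        return BetaBelief(self.alpha + successes, self.beta + failures)

    def mean(self):
        """Expected value of the success probability."""
        return self.alpha / (self.alpha + self.beta)

    def variance(self):
        a, b = self.alpha, self.beta
        return a * b / ((a + b) ** 2 * (a + b + 1))
```

For example, a uniform Beta(1, 1) prior updated with 7 successes and 3 failures becomes Beta(8, 4), whose mean is 8/12 ≈ 0.667. Its variance is also smaller than the prior's, reflecting the added evidence.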
Wednesday, March 02, 2005
Stages of Knowledge Discovery in Websites
1. Clickstream Analysis
2. Advanced Web Mining
3. Personalization
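Stage 1, clickstream analysis, usually starts by grouping raw page hits into sessions. Here is a minimal Python sketch, assuming hits arrive as hypothetical (user_id, timestamp_in_seconds, url) tuples and that 30 minutes of inactivity ends a session (a common convention, not something stated here). It also counts page-to-page transitions within sessions, which is the kind of input that Markov models for next-page prediction start from.

```python
from collections import Counter, defaultdict

SESSION_GAP = 30 * 60  # assumed inactivity timeout in seconds (common convention)


def sessionize(hits, gap=SESSION_GAP):
    """Group (user_id, timestamp_seconds, url) hits into per-user sessions.

    A new session starts whenever a user is inactive for more than `gap` seconds.
    Returns a list of (user_id, [url, ...]) in user order, then time order.
    """
    by_user = defaultdict(list)
    for user, ts, url in hits:
        by_user[user].append((ts, url))
    sessions = []
    for user in sorted(by_user):
        pages, last_ts = [], None
        for ts, url in sorted(by_user[user]):
            if last_ts is not None and ts - last_ts > gap:
                sessions.append((user, pages))
                pages = []
            pages.append(url)
            last_ts = ts
        sessions.append((user, pages))  # every user has at least one hit
    return sessions


def transition_counts(sessions):
    """Count page-to-page moves within sessions."""
    counts = Counter()
    for _, pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts
```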
Labels: data mining, machine learning, personalization, web mining
Saturday, February 05, 2005
Data Mining Researchers
Rakesh Agrawal
Surajit Chaudhuri
Umesh Dayal
Max J. Egenhofer
Usama Fayyad (Microsoft)
Christophe Giraud-Carrier
Jiawei Han
Daniel Keim
Hans-Peter Kriegel
Yike Guo
Laks V.S. Lakshmanan
Hongjun Lu
Alberto Mendelzon
Raymond T. Ng
Tamer Ozsu
Rajeev Rastogi
Ken Ross
Sunita Sarawagi
Wei-Min Shen
Kyuseok Shim
Avi Silberschatz
Matt Smith
Jaideep Srivastava
Philip S. Yu
Clement Yu
Jeffrey D. Ullman
Ke Wang
Osmar Zaiane
Wednesday, January 19, 2005
Data Mining Resources
Data Mining Resources (@ www.scd.ucar.edu)
Data Mining Resources (@ www.cs.purdue.edu)
Data Mining Resources (Zillman's List)
Thursday, January 06, 2005
Knowledge Discovery Approaches
Data Mining enables us to automatically sift through massive amounts of data to discover KNOWLEDGE. A couple of popular approaches are summarized below:
Identify customer groups and forecast their behaviour. This is commonly used in marketing, in fraud detection, and increasingly on the web for various purposes.
Market basket analysis: "If a customer bought product P, he or she is likely to buy products Q and R." (Amazon.com uses this approach.)
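Here is a minimal Python sketch of the market basket idea, assuming each basket is a set of item names. It counts item pairs and keeps one-to-one rules P -> Q whose support and confidence clear illustrative thresholds. Support is the share of baskets containing both P and Q. Confidence is the share of baskets containing P that also contain Q. This brute-force pair count only stands in for full algorithms such as Apriori, which also find multi-item rules like "P -> Q and R".

```python
from collections import Counter
from itertools import combinations


def association_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Mine one-to-one rules (lhs, rhs, support, confidence) from item baskets.

    support(P -> Q)    = fraction of baskets containing both P and Q
    confidence(P -> Q) = count(P and Q) / count(P)
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # each unordered pair once per basket
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        # A frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```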
Tuesday, November 23, 2004
Data Mining Interest Continues To Rise
University leads data mine plan (NEWS.com.au - Australia)
"The University of Technology Sydney is trying to establish a $38 million data mining center of excellence to involve universities, industry and government..."
It is interesting to see how important Data Mining is becoming.
Friday, November 19, 2004
Papers
Repository of publications related to Data Mining
Includes interesting papers such as "Efficient algorithms for creating product catalogs", "Selective Markov Models for Predicting Web-Page Accesses", "Web Page Categorization and Feature Selection Using Association Rule and Principal Component Clustering", and some others that look interesting.
Thursday, November 11, 2004
Data Mining Dataset & Model Repositories
UCI KDD Archive
The central repository for data mining datasets.
ML UCI Repository
The central machine learning (ML) repository
PMML Sample Models
Various PMML models for some of the commonly used datasets (such as Iris, Voting, and Elnino).
DMG - PMML
PMML Specs
"Predictive Model Markup Language (PMML) is an XML-based language which provides a quick and easy way for companies to define predictive models and share models between compliant vendors' applications."
It is sponsored by the Data Mining Group (DMG).
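To make the idea concrete, here is a minimal Python sketch that writes a one-variable linear regression as a PMML document using the standard library's `xml.etree.ElementTree`, then reads it back with a toy scorer. It assumes the PMML 4.4 namespace and uses only the `RegressionModel` elements shown. The helper functions are hypothetical, and the scorer is not a conforming PMML consumer: it ignores exponents, categorical predictors, and everything else in the specification.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_4"  # assumed PMML version 4.4
ET.register_namespace("", PMML_NS)  # serialize PMML as the default namespace


def _q(tag):
    """Qualify a tag name with the PMML namespace."""
    return f"{{{PMML_NS}}}{tag}"


def build_linear_pmml(intercept, coefficients, target="y"):
    """Return a minimal PMML RegressionModel: target = intercept + sum(coef * field)."""
    root = ET.Element(_q("PMML"), version="4.4")
    ET.SubElement(root, _q("Header"), description="toy linear model")
    fields = list(coefficients) + [target]
    dictionary = ET.SubElement(root, _q("DataDictionary"), numberOfFields=str(len(fields)))
    for name in fields:
        ET.SubElement(dictionary, _q("DataField"), name=name,
                      optype="continuous", dataType="double")
    model = ET.SubElement(root, _q("RegressionModel"), functionName="regression")
    schema = ET.SubElement(model, _q("MiningSchema"))
    for name in coefficients:
        ET.SubElement(schema, _q("MiningField"), name=name)
    ET.SubElement(schema, _q("MiningField"), name=target, usageType="target")
    table = ET.SubElement(model, _q("RegressionTable"), intercept=str(intercept))
    for name, coef in coefficients.items():
        ET.SubElement(table, _q("NumericPredictor"), name=name, coefficient=str(coef))
    return ET.tostring(root, encoding="unicode")


def score_linear_pmml(pmml_text, inputs):
    """Toy scorer: evaluates the first RegressionTable, assuming exponent 1."""
    ns = {"p": PMML_NS}
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", ns)
    result = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", ns):
        result += float(predictor.get("coefficient")) * inputs[predictor.get("name")]
    return result
```

For example, `build_linear_pmml(1.0, {"x": 2.0})` describes y = 1 + 2x. Scoring it with x = 3 yields 7.0. Because the model travels as plain XML, any tool that understands the same PMML elements could, in principle, score it.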
Thursday, October 28, 2004
Data Mining in the news...
Uncle Sam is Watching You
This article talks about various ways in which the government uses data mining.
Data-mining research ($600,000, funded by Carnegie Mellon) that will be used to create software "for discovering, visualizing and exploring significant patterns across large collections of full-text humanities resources in digital libraries and collections." The project is titled "Web-based Text-Mining and Visualization for Humanities Digital Libraries."
Oracle(R) Data Mining Recognized as a Leader by Independent ...
Latest Version of SPSS Data Mining Workbench Enhances Integration ...
Saturday, September 18, 2004
Meta Learning (METAL)
These past couple of days I have been browsing the Internet and reading more about Data Mining, focusing on Meta Learning. I have posted links to some of the documents that I found interesting. Unfortunately, the main MetaL-KDD website (http://www.metal-kdd.org) is down, so I cannot read what is available there. Instead, I've been googling and reading what else is available today.
http://www.statsoft.com/textbook/stdatmin.html#meta
Discusses basic concepts about Data Mining and Meta Learning
http://www.kdnuggets.com/websites/data-mining.html
List of Data Mining and Knowledge Discovery (KD) Websites
http://www.fedstats.gov/
The gateway to statistics from over 100 U.S. Federal agencies
Weka Metal (Meta Learning Extension for Weka)
http://www.cs.bris.ac.uk/Publications/pub_by_author.jsp?id=12799
References for Christophe Giraud-Carrier
http://www.scd.ucar.edu/hps/GROUPS/dm/dm.html
Data Mining Resources (somewhat outdated)
UCL Data Mining
Protein Structure Analysis and Modeling (not sure what this is)
Web site navigation...
http://www.dcs.bbk.ac.uk/~mark/download/besttrail.pdf
http://citeseer.ist.psu.edu/levene03navigating.html
PDF version of the citation above