Showing posts with label data mining. Show all posts

Thursday, October 02, 2008

Facebook growth rising past MySpace

From my local perspective, Facebook has been on the rise: I've noticed that many of my less computer-savvy friends have now joined Facebook. I wondered if this trend was global, so I decided to investigate...

During the past few years MySpace has been the dominant social network; however, Facebook has been growing much more quickly and is expected to become the leading social network. The first plot below (Figure 1) compares searches for the keywords "facebook" and "myspace". For most of 2008, Facebook has been getting a little more attention in the news (lower portion of Figure 1) and has achieved a significantly higher search volume index.

Figure 1. Search Volume Index Comparison of 'facebook' and 'myspace'
(source: Google Trends)


Figure 2 shows the massive popularity of MySpace, which began late in 2004, peaked in the middle of 2006, and has since declined, possibly in part due to the rise of Facebook.

Figure 2. Search Volume Index of 'facebook.com' and 'myspace.com'
(source: Google Trends)

Finally, Figure 3 shows Facebook's daily unique visitors exceeding those of MySpace as far back as November 2007. (I'm not sure, but I would guess these figures are based on Google search result click-throughs.)

Figure 3. Daily Unique Visitors of 'facebook' and 'myspace'
(source: Google Trends)

I find it very interesting to see how quickly social networks grow and evolve. As an aside, I think that Facebook is doing things more efficiently and currently providing a better service.

Wednesday, April 23, 2008

Political Campaign Contributions

The Federal Election Commission (FEC) requires that all campaign contributions over $200 (per donor) be reported publicly. The reported information includes the donor's name, job title, zip code, and even address. All of it since 2001 is available electronically via FTP at ftp://ftp.fec.gov/FEC/electronic/.

In collaboration with political scientists here at BYU, we have been performing record linkage (a.k.a. entity resolution) on this data so that they can perform their studies more accurately.
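To give a rough idea of what record linkage involves, here is a minimal Python sketch. It is not the method used in the BYU project. It treats two contribution records as the same donor when their zip codes match and their normalized names are similar enough. The field names, the sample records, and the 0.85 threshold are all assumptions made up for this example.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name, drop punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def same_donor(a, b, threshold=0.85):
    """Guess whether two records refer to one donor.

    Zip code acts as a cheap blocking key; names are then compared with
    difflib's similarity ratio. The threshold is an arbitrary choice.
    """
    if a["zip"] != b["zip"]:
        return False
    ratio = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return ratio >= threshold


# Hypothetical records, not real FEC data.
r1 = {"name": "Smith, John A.", "zip": "84602"}
r2 = {"name": "SMITH, JOHN A", "zip": "84602"}
r3 = {"name": "Jones, Mary", "zip": "84602"}

print(same_donor(r1, r2))  # same name once case and punctuation are ignored
print(same_donor(r1, r3))  # different names in the same zip code
```

Real donor data is much messier (nicknames, moves between zip codes, typos in every field), so a production approach would need more than one blocking key and a better similarity measure.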

Fundrace
On a related note, fundrace.org has created an interesting mashup (shown below) that plots donors on a Google map, colored by the party or candidate donated to. It also reveals donor information and appears to do some coarse record linkage.


FEC Maps
Additionally, the FEC itself has started to produce maps for both the Presidential election and the House and Senate elections. The maps aggregate the donated funds by state, party, and candidate.

Friday, September 16, 2005

Customer Segmentation

Customer analysis helps a business better meet customer needs, and intelligent segmentation is often a good way to learn more about your customers. Customers can be segmented into a variety of groups based on behavioural, demographic, geographic, and psychographic variables, or on any combination of these. Viewing customers within such segments simplifies the problem of identifying and serving them. The knowledge these segments provide is usually useful for deciding on actionable marketing tactics.
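As a small illustration of segmenting by a combination of variables, the Python sketch below groups customers by a (demographic, geographic, behavioural) key. The customer records, the attribute names, and the "8 visits per month" cutoff are all hypothetical and only there to make the idea concrete.

```python
from collections import defaultdict

# Hypothetical customers; the attributes stand in for demographic,
# geographic, and behavioural variables.
customers = [
    {"id": 1, "age_band": "18-34", "region": "West", "visits_per_month": 12},
    {"id": 2, "age_band": "18-34", "region": "West", "visits_per_month": 1},
    {"id": 3, "age_band": "35-54", "region": "East", "visits_per_month": 9},
    {"id": 4, "age_band": "18-34", "region": "West", "visits_per_month": 15},
]


def segment_key(customer):
    """Combine one variable of each kind into a single segment label."""
    activity = "frequent" if customer["visits_per_month"] >= 8 else "occasional"
    return (customer["age_band"], customer["region"], activity)


# Map each segment label to the ids of the customers that fall in it.
segments = defaultdict(list)
for customer in customers:
    segments[segment_key(customer)].append(customer["id"])

for key, ids in sorted(segments.items()):
    print(key, ids)
```

Hand-picked rules like these are only one option; clustering algorithms can also be used to find segments automatically.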

Tuesday, September 13, 2005

Stanford Data Mining Course

Stanford offers a nice Data Mining and Electronic Business course within the Statistics department. It looks like it covers many exciting aspects of the field.

Thursday, July 28, 2005

What is Lift?

In data mining, "lift" is often used to measure model performance. Here is a link to an article that explains how it is used: DMReview article
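For a quick feel of the idea, here is a minimal Python sketch of one common form of lift: the response rate among the cases a model ranks highest, divided by the overall response rate. The function name, the scores, and the outcomes are made up for illustration, and the linked article may present lift somewhat differently.

```python
def lift_at(scores, outcomes, fraction):
    """Lift for the top `fraction` of cases ranked by model score.

    lift = response rate in the selected group / overall response rate
    """
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, int(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:cutoff]) / cutoff
    overall_rate = sum(outcomes) / len(outcomes)
    return top_rate / overall_rate


# Made-up model scores and actual responses (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(lift_at(scores, outcomes, 0.2))
```

With this toy data both of the top two cases responded while only 3 of 10 responded overall, so the lift for the top 20% comes out a little above 3. A lift of 1 would mean the model does no better than picking cases at random.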

Wednesday, March 02, 2005

Stages of Knowledge Discovery in Websites


3. PERSONALIZATION
2. Advanced Web Mining
1. Clickstream Analysis

Wednesday, January 19, 2005

Thursday, January 06, 2005

Knowledge Discovery Approaches

Data mining enables us to automatically sift through massive amounts of data to discover KNOWLEDGE. A couple of popular approaches are summarized below:

  • Identify customer groups and forecast their behaviour. This is commonly used in marketing, in fraud detection, and more and more frequently on the web for various purposes.

  • Market basket analysis: "If a customer bought product P, he or she is likely to buy products Q and R" (Amazon.com uses this approach).
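The "bought P, likely to buy Q and R" rule can be made concrete with support and confidence, the two basic measures used in market basket analysis. The Python below is a minimal sketch on made-up baskets, not a description of how Amazon.com implements it. The basket contents and function names are assumptions for this example.

```python
# Hypothetical transactions; each set is one customer's basket.
baskets = [
    {"P", "Q", "R"},
    {"P", "Q"},
    {"P", "R"},
    {"Q"},
    {"P", "Q", "R"},
]


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent):
    """Estimate of P(consequent in basket | antecedent in basket)."""
    return support(antecedent | consequent) / support(antecedent)


print(support({"P"}))                 # share of baskets containing P
print(confidence({"P"}, {"Q"}))       # how often P buyers also bought Q
print(confidence({"P"}, {"Q", "R"}))  # how often P buyers bought both Q and R
```

In this toy data 4 of 5 baskets contain P, and half of those also contain both Q and R, so the rule "P implies Q and R" has a confidence of about 0.5.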

    Tuesday, November 23, 2004

    Data Mining Interest Continues To Rise

    University leads data mine plan (NEWS.com.au - Australia)


    "The University of Technology Sydney is trying to establish a $38
    million data mining center of excellence to involve universities,
    industry and government..."

    It is interesting to see how important Data Mining is becoming.

    Thursday, November 11, 2004

    Data Mining Dataset & Model Repositories

    UCI KDD Archive
    The central repository for data mining datasets.

    ML UCI Repository
    The central machine learning (ML) repository.

    PMML Sample Models
    Various PMML models for some of the commonly used datasets
    (such as Iris, Voting, and Elnino).

    DMG - PMML

    PMML Specs
    "Predictive Model Markup Language (PMML) is an XML-based language
    which provides a quick and easy way for companies to define predictive
    models and share models between compliant vendors' applications."

    It is sponsored by the Data Mining Group (DMG).

    Thursday, October 28, 2004

    Data Mining in the news...

    Uncle Sam is Watching You
    This article talks about various ways in which the government uses data mining.

    Data-Mining research ($600,000 funded by Carnegie Mellon) that will be
    used to create software "for discovering, visualizing and
    exploring significant patterns across large collections of full-text
    humanities resources in digital libraries and collections." The
    project is titled "Web-based Text-Mining and Visualization for
    Humanities Digital Libraries."

    Oracle(R) Data Mining Recognized as a Leader by Independent ...

    Latest Version of SPSS Data Mining Workbench Enhances Integration ...

    Saturday, September 18, 2004

    Meta Learning (METAL)

    These past couple of days I have been browsing the Internet and reading more about Data Mining, focusing on Meta Learning. I have posted links to some of the documents that I found interesting. Unfortunately, the main MetaL-KDD website (http://www.metal-kdd.org) is down, so I cannot read what is available there. Instead, I've been googling and reading what else is available.

    http://www.statsoft.com/textbook/stdatmin.html#meta
    Discusses basic concepts about Data Mining and Meta Learning

    http://www.kdnuggets.com/websites/data-mining.html
    List of Data Mining and Knowledge Discovery (KD) Websites

    http://www.fedstats.gov/
    The gateway to statistics from over 100 U.S. Federal agencies

    Weka Metal (Meta Learning Extension for Weka)

    http://www.cs.bris.ac.uk/Publications/pub_by_author.jsp?id=12799
    References for Christophe Giraud-Carrier

    http://www.scd.ucar.edu/hps/GROUPS/dm/dm.html
    Data Mining Resources (somewhat outdated)

    UCL Data Mining
    Protein Structure Analysis and Modeling (not sure what this is)

    Web site navigation...
    http://www.dcs.bbk.ac.uk/~mark/download/besttrail.pdf
    http://citeseer.ist.psu.edu/levene03navigating.html
    PDF version of the citation above