Monday, August 07, 2006

Personalized Marketing

Personalized marketing can be described as a four-phase process:

  1. identifying potential customers

  2. determining their needs and their lifetime value to the company

  3. interacting with customers so as to learn about them

  4. customizing products, services, and communications to individual customers


From Wikipedia, “Personalized marketing,” (Cited: Peppers, D. and Rogers, M. 1993)

Wednesday, June 28, 2006

Dimensionality Reduction Notes

Principal Components Analysis (PCA)

How do you choose how many and which eigenvalues/eigenvectors to use?


Kaiser Criterion
This says to retain only factors with eigenvalues greater than 1, where the eigenvalues come from the correlation matrix, so each standardized original variable contributes a variance of exactly 1. In other words, if a factor does not extract at least as much variance as one original variable, it is discarded. The criterion is named after Kaiser, who proposed it in 1960. It seems to be used quite frequently.
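As a minimal sketch of the rule, the snippet below filters a list of eigenvalues. The function name `kaiser_retained` and the eigenvalues themselves are made up for illustration; they are assumed to come from the correlation matrix of five variables, so they sum to 5.

```python
def kaiser_retained(eigenvalues):
    """Return the eigenvalues kept by the Kaiser criterion (> 1), largest first."""
    return [ev for ev in sorted(eigenvalues, reverse=True) if ev > 1.0]


# Illustrative eigenvalues only (hypothetical, not from real data).
eigenvalues = [0.6, 2.6, 0.2, 1.3, 0.3]

kept = kaiser_retained(eigenvalues)

# Share of the total variance explained by the retained factors.
explained_share = sum(kept) / sum(eigenvalues)

print("retained eigenvalues:", kept)
print("share of variance explained: %.2f" % explained_share)
```

Here two factors pass the threshold, and together they account for a bit more than three quarters of the total variance.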

The Scree Test
This is a graphical test used to decide how many factors to keep. To perform it, first plot the eigenvalues in decreasing order. Next, Cattell suggests finding the place where the smooth decrease of eigenvalues appears to level off (to the right), similar to geological scree (loose rock debris at the bottom of a rocky slope). Factors to the left of that point are kept.
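The scree test is meant to be read by eye, but a rough text version can help when no plotting tool is at hand. The sketch below draws the eigenvalues as bars and guesses the "elbow" as the point where the drop between consecutive eigenvalues shrinks the most. That heuristic, the function names, and the eigenvalues are all assumptions for illustration, not Cattell's own procedure.

```python
def scree_plot(eigenvalues, width=40):
    """Return a text bar chart of the eigenvalues in decreasing order."""
    ordered = sorted(eigenvalues, reverse=True)
    largest = ordered[0]
    lines = []
    for rank, ev in enumerate(ordered, start=1):
        bar = "#" * max(1, round(width * ev / largest))
        lines.append("%2d %6.2f %s" % (rank, ev, bar))
    return "\n".join(lines)


def elbow_index(eigenvalues):
    """Guess how many factors sit before the curve levels off.

    Heuristic (an assumption): pick the position where the drop between
    consecutive eigenvalues shrinks the most. Needs at least 3 eigenvalues.
    """
    ordered = sorted(eigenvalues, reverse=True)
    drops = [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]
    bends = [drops[i] - drops[i + 1] for i in range(len(drops) - 1)]
    return bends.index(max(bends)) + 1


# Illustrative eigenvalues only (hypothetical, six variables).
eigenvalues = [3.1, 1.9, 0.4, 0.3, 0.2, 0.1]

print(scree_plot(eigenvalues))
print("factors before the elbow:", elbow_index(eigenvalues))
```

With these values the bars fall steeply for the first two factors and are nearly flat afterwards, so the heuristic suggests keeping two. On real data the curve is often less clear-cut, and the plot should still be inspected visually.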

Here are some other useful terms and definitions from the dictionary:

Multicollinearity refers to linear inter-correlation among variables. Simply put, if nominally "different" measures actually quantify the same phenomenon to a significant degree (that is, the variables have different names and perhaps use different numeric measurement scales, but correlate highly with each other), they are redundant.
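A small example makes the redundancy concrete. The sketch below computes the Pearson correlation between the same temperatures expressed in Celsius and in Fahrenheit: two differently named variables on different scales that measure exactly the same thing. The readings and the helper name `pearson` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical temperature readings, first in Celsius, then in Fahrenheit.
celsius = [12.0, 18.5, 21.0, 25.5, 30.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

r = pearson(celsius, fahrenheit)
print("correlation: %.4f" % r)
```

Because one variable is a linear function of the other, the correlation is 1 up to rounding, and keeping both adds no information. In PCA terms, such a pair would load almost entirely onto a single component.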

Friday, June 23, 2006

K-Means Clustering

Basic Algorithm
1. Choose k cluster centers at random
2. Assign each point to nearest cluster center
3. Compute the new cluster centers based on the assigned points
4. Repeat until cluster centers converge
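
The four steps above can be sketched in plain Python as follows. This is a minimal illustration, assuming points are tuples of numbers, that distance is Euclidean, and that the initial centers are sampled from the data points; the names `k_means`, `squared_distance`, and `mean_point` are made up here.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iterations=100, seed=None):
    rng = random.Random(seed)
    # Step 1: choose k cluster centers at random (here: k of the data points).
    centers = rng.sample(points, k)
    for _ in range(max_iterations):
        # Step 2: assign each point to its nearest cluster center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each center as the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = [
            mean_point(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        # Step 4: stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Two small, well-separated groups of hypothetical 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = k_means(points, k=2, seed=0)
print(sorted(centers))
```

On data this cleanly separated, the two centers should settle on the mean of each group. Passing a different `seed` changes the starting centers, which matters on harder data.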

Shortcomings
Can converge to a local minimum rather than the global minimum
The random initial placement of cluster centers affects the outcome
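
Both shortcomings show up even on four points. The sketch below runs the assign-and-update loop from two hand-picked sets of starting centers on the corners of a 4-by-1 rectangle and compares the resulting sum of squared distances. The function names `lloyd` and `sum_of_squares`, the points, and the starting centers are assumptions chosen to make the effect visible.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centers, max_iterations=100):
    """Run the assign/update loop from the given starting centers."""
    for _ in range(max_iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster
            else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def sum_of_squares(points, centers):
    """Total squared distance from each point to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Corners of a rectangle that is 4 wide and 1 tall.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A splits the rectangle into bottom and top edges (a poor split).
bad = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])
# Start B splits it into left and right edges (the natural split).
good = lloyd(points, [(0.0, 0.5), (4.0, 0.5)])

print("start A:", bad, sum_of_squares(points, bad))
print("start B:", good, sum_of_squares(points, good))
```

Both runs converge immediately, yet start A is stuck with a total squared distance of 16 while start B reaches 1. A common workaround is to run the algorithm from several random starts and keep the result with the lowest total.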

Here is a nice K-Means Demo

Friday, March 17, 2006

Google and data

Here is an interesting article about Google and all their data.

Tuesday, January 31, 2006

Friday, January 20, 2006

Networked Data File Types

Here are reference links to common network data file types used in Link Mining and Social Network Analysis:
Pajek Net File
UCINet DL Files and VNA