Principal Components Analysis (PCA)
How do you choose how many and which eigenvalues/eigenvectors to use?
Kaiser Criterion
The Kaiser criterion says to retain only factors with eigenvalues greater than 1. In other words, a factor that does not extract at least as much variance as one original variable is discarded. Kaiser proposed the criterion in 1960, and it seems to be used quite frequently.
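Here is a minimal sketch of the rule in Python with NumPy. It assumes the eigenvalues come from the correlation matrix (so each original variable contributes a variance of exactly 1); the function name `kaiser_components` and the synthetic data are made up for illustration.

```python
import numpy as np

def kaiser_components(X):
    """Return the correlation-matrix eigenvalues (largest first)
    and the number of them that are greater than 1."""
    corr = np.corrcoef(X, rowvar=False)        # columns are variables
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh sorts ascending, so reverse
    n_keep = int(np.sum(eigvals > 1.0))
    return eigvals, n_keep

# Hypothetical data: three noisy copies of one signal plus two unrelated columns.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
copies = [base + 0.1 * rng.normal(size=(200, 1)) for _ in range(3)]
X = np.hstack(copies + [rng.normal(size=(200, 2))])

eigvals, n_keep = kaiser_components(X)
print(eigvals.round(3), "-> keep", n_keep)
```

The eigenvalues of a correlation matrix always sum to the number of variables, which is why "greater than 1" reads as "explains more than one variable's worth".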
The Scree Test
This is a graphical test for deciding how many factors to keep. First, plot the eigenvalues in decreasing order. Then, as Cattell suggests, find the place where the smooth decrease of eigenvalues appears to level off toward the right of the plot, similar to geological scree (loose rock debris at the bottom of a rocky slope).
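A rough sketch of the first half of the test: compute the eigenvalues in decreasing order and draw them as a plain-text bar chart. Spotting where the curve levels off is still a judgment call made by eye. The helper names `scree_values` and `text_scree_plot`, the use of the correlation matrix, and the synthetic two-factor data are all assumptions for illustration.

```python
import numpy as np

def scree_values(X):
    """Eigenvalues of the correlation matrix, in decreasing order."""
    corr = np.corrcoef(X, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def text_scree_plot(eigvals, width=40):
    """One line per eigenvalue, bar length proportional to its size."""
    top = eigvals[0]
    lines = []
    for i, val in enumerate(eigvals, start=1):
        bar = "#" * max(1, int(round(width * val / top)))
        lines.append(f"{i:2d} {val:6.3f} {bar}")
    return "\n".join(lines)

# Hypothetical data: six variables driven by two hidden factors plus noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.3 * rng.normal(size=(300, 6))

eigvals = scree_values(X)
print(text_scree_plot(eigvals))
```

With real data you would normally draw the same values as a line plot; the text version just keeps the sketch dependency-light.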
Here are some other useful terms and definitions from the dictionary:
Multicollinearity refers to linear inter-correlation among variables. Simply put, if nominally "different" measures actually quantify the same phenomenon to a significant degree -- i.e., wherein the variables are accorded different names and perhaps employ different numeric measurement scales but correlate highly with each other -- they are redundant.
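To make the definition concrete, here is a small sketch that flags pairs of variables whose correlation is high. The 0.9 threshold, the function name `highly_correlated_pairs`, and the made-up height/weight columns are assumptions, not part of the definition above.

```python
import numpy as np

def highly_correlated_pairs(X, names, threshold=0.9):
    """List (name_a, name_b, r) for every pair of columns with |r| >= threshold."""
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Hypothetical data: the same height measured on two scales, plus an unrelated weight.
rng = np.random.default_rng(2)
height_cm = rng.normal(170, 10, size=500)
height_in = height_cm / 2.54 + rng.normal(0, 0.2, size=500)
weight_kg = rng.normal(70, 12, size=500)
X = np.column_stack([height_cm, height_in, weight_kg])

pairs = highly_correlated_pairs(X, ["height_cm", "height_in", "weight_kg"])
print(pairs)
```

The two height columns have different names and different units but measure the same thing, so one of them is redundant.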
This blog focuses on the relationships that connect us together providing potent insights for decision makers. In addition, a few data mining topics are presented.
Wednesday, June 28, 2006
Friday, June 23, 2006
K-Means Clustering
Basic Algorithm
1. Choose k cluster centers at random
2. Assign each point to nearest cluster center
3. Compute the new cluster centers based on the assigned points
4. Repeat until cluster centers converge
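The four steps above can be sketched in a few lines of NumPy. This is only an illustration: picking the initial centers from the data points, the `n_iter` cap, the `seed` parameter, and keeping an old center when a cluster goes empty are my own choices, not part of the basic algorithm.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Choose k cluster centers at random (here: k distinct data points).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. Assign each point to the nearest cluster center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Compute the new centers from the assigned points
        #    (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # 4. Repeat until the centers stop moving.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Hypothetical data: two well-separated blobs.
rng = np.random.default_rng(42)
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
])
centers, labels = kmeans(points, k=2)
print(centers.round(2))
```

Running it with different `seed` values on harder data is an easy way to see how much the starting centers influence the result.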
Shortcomings
Finds local minima
The random placement of cluster centers affects the outcome
Here is a nice K-Means Demo
Friday, December 09, 2005
Machine Learning Topics
Particle Swarm Optimization
wikipedia
Swarm Intelligence
Ant Algorithms
ant colony optimization
Reinforcement Learning
wikipedia
Q-learning
Q-learning definition
Markov decision process
Computational Learning Theory
wikipedia
VC dimension
Principle of maximum entropy
Ensembles, Bagging and Boosting
Boosting
Meta-Learning
METAL KDD
Christophe Giraud-Carrier
HMMs
Hidden Markov model
Wednesday, March 02, 2005
Stages of Knowledge Discovery in Websites
|------------------
|--------------------| 3. PERSONALIZATION
|---------------------| 2. Advanced Web Mining
| 1. Clickstream Analysis
Labels: data mining, machine learning, personalization, web mining
Thursday, November 11, 2004
Data Mining Dataset & Model Repositories
UCI KDD Archive
The central repository for data mining datasets.
ML UCI Repository
The central machine learning (ML) repository.
PMML Sample Models
Various PMML models for some of the commonly used datasets
(such as Iris, Voting, and Elnino).
Wednesday, October 13, 2004
Tuesday, October 05, 2004
ID3 - Machine Learning
I've been studying and coding up the ID3 classification algorithm.
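For reference, the heart of ID3 is choosing, at each node, the attribute with the highest information gain. Below is a minimal sketch of just that step using only the standard library; it is not my full implementation, and the function names and the toy weather rows are invented for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attributes):
    """The attribute ID3 would split on at this node."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy data: "outlook" decides the label, "windy" tells us nothing.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

print(best_attribute(rows, labels, ["outlook", "windy"]))
```

A full ID3 would apply `best_attribute` recursively to each subset until the labels in a subset are pure or no attributes remain.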
When I get a free moment I'll be reading Text and Web Mining papers.
Saturday, September 18, 2004
Meta Learning (METAL)
For the past couple of days I have been browsing the Internet and reading more about Data Mining, with a focus on Meta Learning. I have posted links to some of the documents that I found interesting. Unfortunately, the main MetaL-KDD website (http://www.metal-kdd.org) is down, so I cannot read what is available there. Instead, I've been googling and reading what else is available today.
http://www.statsoft.com/textbook/stdatmin.html#meta
Discusses basic concepts about Data Mining and Meta Learning
http://www.kdnuggets.com/websites/data-mining.html
List of Data Mining and Knowledge Discovery (KD) Websites
http://www.fedstats.gov/
The gateway to statistics from over 100 U.S. Federal agencies
Weka Metal (Meta Learning Extension for Weka)
http://www.cs.bris.ac.uk/Publications/pub_by_author.jsp?id=12799
References for Christophe Giraud-Carrier
http://www.scd.ucar.edu/hps/GROUPS/dm/dm.html
Data Mining Resources (somewhat outdated)
UCL Data Mining
Protein Structure Analysis and Modeling (not sure what this is)
Web site navigation...
http://www.dcs.bbk.ac.uk/~mark/download/besttrail.pdf
http://citeseer.ist.psu.edu/levene03navigating.html
PDF version of the citation above