Wednesday, September 22, 2004

Papers that I'm reading

Low-Complexity Fuzzy Relational Clustering Algorithms for Web Mining (PDF)

Introduction to Data Mining and Knowledge Discovery (PDF)

Saturday, September 18, 2004

Meta Learning (METAL)

These past couple of days I have been browsing the Internet and reading more about Data Mining, focusing on Meta Learning. I have posted links to some of the documents that I found interesting. Unfortunately, the main MetaL-KDD website is down, so I cannot read what is available there. Instead, I've been googling and reading what else is available.
Discusses basic concepts about Data Mining and Meta Learning
List of Data Mining and Knowledge Discovery (KD) Websites
The gateway to statistics from over 100 U.S. Federal agencies

Weka Metal (Meta Learning Extension for Weka)

References for Christophe Giraud-Carrier

Data Mining Resources (somewhat outdated)
UCL Data Mining
Protein Structure Analysis and Modeling (not sure what this is)

Web site navigation...
PDF version of the citation above

Wednesday, September 15, 2004

Data Mining

Talked with Dr. Giraud-Carrier about the LDSM data and some other interesting projects associated with the following links:

I also read the following article which gives a nice overview of Data Mining:


Monday, September 13, 2004

More Testing

Without the switches (-mx and -oss), I was able to use 91,080 KB before running out of memory (java.lang.OutOfMemoryError). I ran Weka using the following:
C:\Program Files\Weka-3-4>java -jar weka.jar

With the switches (-mx and -oss), I was able to use 123,804 KB before again running out of memory (java.lang.OutOfMemoryError). I ran Weka using the following:
C:\Program Files\Weka-3-4>java -mx100000000 -oss100000000 -jar weka.jar

Even though I was able to use up to 123,804 KB (~ 121 MB) before running out of memory, it isn't sufficient to produce the results that I would like. (By the way, I've been running on a machine with 1 GB of RAM.)

I have attempted various methods in order to stay within memory limits. As I'm somewhat unsure of what I'm mining for, I have selected the attributes that merely seem most interesting to me. For instance, I removed every column except for the mission and state. I then ran J48 and it succeeded! The tree visualization, however, wasn't very impressive since it only had these two attributes.
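The column removal described above could also be done before loading the data into Weka. Here is a minimal sketch, assuming the data has been exported as simple comma-separated text with a header row. The class name KeepColumns, the sample rows, and every column name other than mission and state are made up for illustration, and the naive split does not handle quoted commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeepColumns {
    // Returns every line (header included) reduced to the named columns.
    // Assumes plain comma-separated text without quoted commas.
    static List<String> project(List<String> csvLines, List<String> keep) {
        String[] header = csvLines.get(0).split(",", -1);
        List<Integer> indexes = new ArrayList<>();
        for (String name : keep) {
            int idx = Arrays.asList(header).indexOf(name);
            if (idx < 0) {
                throw new IllegalArgumentException("no such column: " + name);
            }
            indexes.add(idx);
        }
        List<String> out = new ArrayList<>();
        for (String line : csvLines) {
            String[] cells = line.split(",", -1);
            List<String> kept = new ArrayList<>();
            for (int idx : indexes) {
                kept.add(cells[idx]);
            }
            out.add(String.join(",", kept));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up rows; the real LDSM columns and values are not shown here.
        List<String> rows = Arrays.asList(
            "id,mission,state,age",
            "1,MissionA,UT,19",
            "2,MissionB,ID,21");
        for (String line : project(rows, Arrays.asList("mission", "state"))) {
            System.out.println(line);
        }
    }
}
```

Trimming the file this way means the discarded attributes never have to be held in memory at all.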

It has been discouraging to be constantly running out of memory.

I clustered the complete dataset and found nothing very interesting. The KMeans clustering algorithm didn't require much memory.

I created a complete E-R Diagram of the LDSM database. I'd like to meet with Dr. Giraud-Carrier, an experienced data-miner, and talk about the diagram and determine what more (if anything) would be interesting to run on the data. I'd like to eliminate useless attribute columns so that I can achieve some interesting results before exhausting the memory.

Saturday, September 11, 2004

Running Weka

I think this may be the best way to run Weka with more memory. Even though I still ran out of memory, I believe it allocated more. This is what I did:

> java -mx100000000 -oss100000000 -jar weka.jar

I'm still trying to learn the best approach to this. I think I was able to use more memory when I ran it this way, but it didn't seem to allocate the full amount I requested (100000000, which the option reads as bytes rather than kilobytes, so roughly 95 MB).
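One way to see how much heap the JVM actually accepted is to ask it directly. Here is a minimal sketch using the standard Runtime methods. The class name HeapReport and the helper toMegabytes are made up for illustration, and it takes 1 MB to mean 1024 * 1024 bytes.

```java
public class HeapReport {
    // Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes here).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will attempt to use.
        System.out.println("Max heap:   " + toMegabytes(rt.maxMemory()) + " MB");
        // totalMemory(): heap currently reserved by the JVM.
        System.out.println("Total heap: " + toMegabytes(rt.totalMemory()) + " MB");
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("Free heap:  " + toMegabytes(rt.freeMemory()) + " MB");
    }
}
```

Running it with the same switches as Weka (for example, java -mx100000000 HeapReport) should show whether the larger limit took effect. The figure reported may differ somewhat from the requested value, and it covers only the Java heap, not the total memory of the process.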

Friday, September 10, 2004

Data Mining

I have set up Weka and begun working with my LDSM data set. The LDSM dataset has about 15,000 records and a number of attributes. I want to analyze it with Weka and see if I can find anything interesting. I have set up Weka on two computers, but Java has run out of memory on both. I believe it is due to the amount of memory that has been allocated to Java, not how much memory each of the computers has.
Attempts at running the suggested command (java -mx100000000 -oss100000000) have been unsuccessful thus far.

According to the website the command is used as follows:
Specify the maximum size, in bytes, of the memory allocation pool. This value must [be] a multiple of 1024 greater than 2MB. Append the letter k or K to indicate kilobytes, or m or M to indicate megabytes. The default value is 64MB.

I may simply need to make the number a multiple of 1024. I'll try that next...nope it didn't work.
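To check candidate values against the rule quoted above, here is a minimal sketch that converts a size string to bytes and tests it. The class name HeapSizeCheck and its methods are made up for illustration. It only encodes the documented rule and is not how the JVM itself parses the option, so an actual JVM may be more lenient.

```java
public class HeapSizeCheck {
    // Parses a size such as "83886080", "81920k" or "80m" into bytes.
    static long parseBytes(String size) {
        String s = size.trim().toLowerCase();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long multiplier = 1L;
        char last = s.charAt(s.length() - 1);
        if (last == 'k') {
            multiplier = 1024L;
        } else if (last == 'm') {
            multiplier = 1024L * 1024L;
        }
        if (multiplier != 1L) {
            s = s.substring(0, s.length() - 1); // drop the unit suffix
        }
        return Long.parseLong(s) * multiplier;
    }

    // True when the value is a multiple of 1024 and greater than 2 MB,
    // as the quoted documentation requires.
    static boolean isValid(long bytes) {
        return bytes % 1024L == 0 && bytes > 2L * 1024L * 1024L;
    }

    public static void main(String[] args) {
        String[] candidates = {"100000000", "83886080", "80m"};
        for (String c : candidates) {
            long bytes = parseBytes(c);
            System.out.println(c + " -> " + bytes + " bytes, valid by the quoted rule: "
                + isValid(bytes));
        }
    }
}
```

By this rule 100000000 is not a multiple of 1024 (it leaves a remainder of 256), while 83886080 is exactly 80 * 1024 * 1024.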

I tried the following:
C:\>java -Xmx 80m
Invalid maximum heap size: -Xmx
Could not create the Java virtual machine.

C:\>java -Xmx83886080
Usage: java [-options] class [args...]
(to execute a class)
or java [-options] -jar jarfile [args...]
(to execute a jar file)

where options include:
-client to select the "client" VM
-server to select the "server" VM