Social Capital in Networks: September 2004

Wednesday, September 22, 2004

Papers that I'm reading

Low-Complexity Fuzzy Relational Clustering Algorithms for Web Mining (PDF)

Introduction to Data Mining and Knowledge Discovery (PDF)

Monday, September 20, 2004

Web Mining Links

Web Mining
http://www.cs.umbc.edu/~ajoshi/web-mine/
http://www.cs.utexas.edu/users/pebronia/text-mining/
http://www-2.cs.cmu.edu/afs/cs/project/theo-3/www/
http://www.cs.ualberta.ca/~tszhu/webmining.htm
http://filebox.vt.edu/users/wfan/text_mining.html
http://www.kdnuggets.com/
http://www.kddresearch.org/
CMU World Wide Knowledge Base (Web->KB) project

Web Mining Software
http://www.kdnuggets.com/software/web.html
http://www-ai.cs.uni-dortmund.de/SOFTWARE/YALE/download.html

Saturday, September 18, 2004

Meta Learning (METAL)

For the past couple of days I have been browsing the Internet and reading more about Data Mining, focusing on Meta Learning. I have posted links to some of the documents that I found interesting. Unfortunately the main MetaL-KDD website (http://www.metal-kdd.org) is down, so I cannot read what is available there. Instead I've been googling and reading what else is available today.

http://www.statsoft.com/textbook/stdatmin.html#meta
Discusses basic concepts about Data Mining and Meta Learning

http://www.kdnuggets.com/websites/data-mining.html
List of Data Mining and Knowledge Discovery (KD) Websites

http://www.fedstats.gov/
The gateway to statistics from over 100 U.S. Federal agencies

Weka Metal (Meta Learning Extension for Weka)

http://www.cs.bris.ac.uk/Publications/pub_by_author.jsp?id=12799
References for Christophe Giraud-Carrier

http://www.scd.ucar.edu/hps/GROUPS/dm/dm.html
Data Mining Resources (somewhat outdated)

UCL Data Mining
Protein Structure Analysis and Modeling (not sure what this is)

Web site navigation...
http://www.dcs.bbk.ac.uk/~mark/download/besttrail.pdf
http://citeseer.ist.psu.edu/levene03navigating.html
PDF version of the citation above

Wednesday, September 15, 2004

Talked with Dr. Giraud-Carrier about the LDSM data and some other interesting projects associated with the following links:
http://www.metal-kdd.org
http://www.ai.univie.ac.at/oefai/ml/metal/metal-bib.html

I also read the following article which gives a nice overview of Data Mining:

Article

Tuesday, September 14, 2004

DB Schema

DB Schema (text file)

Monday, September 13, 2004

More Testing

Without using the switches (-mx and -oss), I was able to use 91,080 KB before running out of memory (java.lang.OutOfMemoryError). I ran Weka using the following:
C:\Program Files\Weka-3-4>java -jar weka.jar

Using the switches (-mx and -oss), I was able to use 123,804 KB before again running out of memory (java.lang.OutOfMemoryError). I ran Weka using the following:
C:\Program Files\Weka-3-4>java -mx100000000 -oss100000000 -jar weka.jar

Even though I was able to use up to 123,804 KB (~ 121 MB) before running out of memory, that isn't enough to produce the results that I would like (by the way, I've been running on a machine with 1 GB of RAM).

I have attempted various methods in order to stay within memory limits. As I'm somewhat unsure of what I'm mining for, I have selected the attributes that merely seem most interesting to me. For instance, I removed every column except for the mission and state. I then ran J48 and it succeeded! The tree visualization, however, wasn't very impressive since it only had these two attributes.
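The column-removal step described above can be sketched in plain Java, outside of Weka. This is only an illustration under stated assumptions: the data is assumed to be exported as simple comma-separated text with a header row and no quoted commas, the class name ColumnFilter is hypothetical, and the sample rows are made up (only the column names mission and state come from the post).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnFilter {
    // Keeps only the named columns, in the order given by "wanted".
    // Assumes the first line is a header and that every row has at
    // least as many cells as the header (no quoted commas).
    static List<String> keepColumns(List<String> lines, List<String> wanted) {
        List<String> out = new ArrayList<>();
        if (lines.isEmpty()) {
            return out;
        }
        List<String> header = Arrays.asList(lines.get(0).split(",", -1));
        List<Integer> indexes = new ArrayList<>();
        for (String name : wanted) {
            int idx = header.indexOf(name);
            if (idx < 0) {
                throw new IllegalArgumentException("no such column: " + name);
            }
            indexes.add(idx);
        }
        for (String line : lines) {
            String[] cells = line.split(",", -1);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < indexes.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(cells[indexes.get(i)]);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows; the real LDSM columns are not shown in the post.
        List<String> sample = Arrays.asList(
                "id,mission,state,age",
                "1,MissionA,UT,21",
                "2,MissionB,CA,22");
        for (String row : keepColumns(sample, Arrays.asList("mission", "state"))) {
            System.out.println(row);
        }
    }
}
```

Trimming the file before loading it keeps the unused attributes out of memory entirely, which is the point of the experiment described above.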

It has been discouraging to be constantly running out of memory.

I clustered the complete dataset and found nothing very interesting. The KMeans clustering algorithm didn't require much memory.

I created a complete E-R Diagram of the LDSM database. I'd like to meet with Dr. Giraud-Carrier, an experienced data-miner, and talk about the diagram and determine what more (if anything) would be interesting to run on the data. I'd like to eliminate useless attribute columns so that I can achieve some interesting results before exhausting the memory.

Saturday, September 11, 2004

Running Weka

I think this may be the best way to run Weka with more memory. Even though I still ran out of memory, I believe it allocated more. This is what I did:

> java -mx100000000 -oss100000000 -jar weka.jar

I'm still trying to learn the best approach to this. While it ran I think I was able to use more memory, but it didn't seem to allocate everything I requested. (The number passed to -mx is in bytes, so 100000000 is about 95 MB, not 100000000K.)
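One way to check how much heap the JVM was actually given is to ask the runtime itself rather than watching the process from outside. The following is a minimal sketch, not part of Weka; the class name HeapReport is hypothetical, and the values it prints will vary by machine and JVM.

```java
public class HeapReport {
    // Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will attempt to use.
        System.out.println("max heap:   " + toMegabytes(rt.maxMemory()) + " MB");
        // totalMemory(): heap currently reserved by the JVM.
        System.out.println("total heap: " + toMegabytes(rt.totalMemory()) + " MB");
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("free heap:  " + toMegabytes(rt.freeMemory()) + " MB");
    }
}
```

Running it with the same heap switch that is passed to Weka (for example, java -Xmx80m HeapReport) should show whether the switch took effect. The reported maximum can come out slightly below the requested figure.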

Friday, September 10, 2004

Data Mining

I have set up Weka and begun working with my LDSM data set. The LDSM dataset is about 15000 records and has a number of attributes. I want to analyze it with Weka and see if I can find anything interesting. I have set up Weka on two computers, but Java has run out of memory on both. I believe it is due to the amount of memory that has been allocated to Java, not how much memory each of the computers has.

http://www.cs.waikato.ac.nz/~ml/weka/tips_and_tricks.html
Attempts at running the suggested command (java -mx100000000 -oss100000000) have been unsuccessful thus far.

According to the java.sun.com website the command is used as follows:

-Xmxn

Specify the maximum size, in bytes, of the memory allocation pool. This value must be a multiple of 1024 greater than 2MB. Append the letter k or K to indicate kilobytes, or m or M to indicate megabytes. The default value is 64MB.

     -Xmx83886080
     -Xmx81920k
     -Xmx80m

I may simply need to make the number a multiple of 1024. I'll try that next...nope it didn't work.
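The unit rules quoted above can be checked with a little arithmetic. The sketch below is an illustration only, not how the JVM parses its options; the class name HeapSize and its methods are hypothetical. It converts the three example values to bytes and tests the multiple-of-1024 rule.

```java
public class HeapSize {
    // Parses a size such as "83886080", "81920k", or "80m" into bytes,
    // following the k/K and m/M suffix rule quoted above.
    static long parseBytes(String value) {
        String v = value.trim().toLowerCase();
        if (v.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long multiplier = 1L;
        char last = v.charAt(v.length() - 1);
        if (last == 'k') {
            multiplier = 1024L;
        } else if (last == 'm') {
            multiplier = 1024L * 1024L;
        }
        if (multiplier != 1L) {
            v = v.substring(0, v.length() - 1);
        }
        return Long.parseLong(v) * multiplier;
    }

    // True when the byte count satisfies the "multiple of 1024" rule.
    static boolean isMultipleOf1024(long bytes) {
        return bytes % 1024L == 0L;
    }

    public static void main(String[] args) {
        String[] examples = {"83886080", "81920k", "80m", "100000000"};
        for (String example : examples) {
            long bytes = parseBytes(example);
            System.out.println(example + " = " + bytes + " bytes, multiple of 1024: "
                    + isMultipleOf1024(bytes));
        }
    }
}
```

The three documented examples all come to the same 83886080 bytes. The value 100000000 from the Weka tip leaves a remainder of 256 when divided by 1024, so it is not a multiple of 1024; whether that matters to a particular JVM is a separate question.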

I tried the following:
C:\>java -Xmx 80m
Invalid maximum heap size: -Xmx
Could not create the Java virtual machine.

C:\>java -Xmx83886080
Usage: java [-options] class [args...]
(to execute a class)
or java [-options] -jar jarfile [args...]
(to execute a jar file)

where options include:
-client to select the "client" VM
-server to select the "server" VM
... {MERELY PRINTED OUT USAGE INFORMATION}
