Showing posts with label data. Show all posts

Friday, November 28, 2008

Blog Posts Increasing and Cyclic


Over the span of a year, the more than 200 blogs aggregated above show strongly cyclic behavior: these bloggers post significantly more during the week than on the weekends.

Additionally, as time went on, these blogs as a group posted more frequently.

I would guess that the dip during December was caused by the Christmas holiday.
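To check a weekday-versus-weekend pattern like this on your own data, one option is to count posts by day of the week. The following is only a minimal sketch: the function name `posts_per_weekday` and the sample dates are made up for illustration and are not the data behind the observation above.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def posts_per_weekday(post_dates):
    """Count how many posts fall on each day of the week (Monday first)."""
    counts = Counter(d.weekday() for d in post_dates)  # weekday(): Monday == 0
    return {name: counts.get(i, 0) for i, name in enumerate(DAY_NAMES)}

# Hypothetical publication dates, purely for illustration.
sample = [
    date(2008, 11, 24),  # Monday
    date(2008, 11, 25),  # Tuesday
    date(2008, 11, 25),  # Tuesday
    date(2008, 11, 29),  # Saturday
]

weekday_counts = posts_per_weekday(sample)
print(weekday_counts)
```

With a full year of dates, consistently low Saturday and Sunday counts would correspond to the weekly cycle described in this post.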

Saturday, May 24, 2008

Nifty Data Technique

Google Spreadsheets has now added some nifty ways to auto-fill data. For instance, rather than typing all of the days or months, you can simply type two or three, select them, and then drag the little blue square in the bottom right corner of the selection. The rest of the days or months will then be populated below. That is nice, but what I think is much more interesting is that you can click and drag while holding down Ctrl (Windows and Linux) or Option (Mac) to pull data from Google Sets. So, in the image below, I only filled in the first three rows of each column. Then, I used the former technique to auto-fill the first three columns and the latter technique (holding down Ctrl or Option) to auto-fill the extra twelve rows.

Nowadays, software developers, such as Google, have a great opportunity to utilize the ginormous pile of data available online. The data that individuals generate is ever increasing and can be extraordinarily useful.


Save R Plot in EPS format

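Here is a code example of how to save an R plot in EPS (instead of PS):
# Open a PostScript device configured for EPS output
postscript(file="testplot.eps",
    paper="special",    # size the page to the plot instead of a fixed paper size
    width=10,           # inches
    height=10,          # inches
    horizontal=FALSE,   # portrait orientation
    onefile=FALSE)      # single-page file, so R writes an EPSF header

yvalues <- runif(50)    # 50 uniform random values as sample data
plot(yvalues)

dev.off()               # close the device so the file is written
The variation is adding paper="special" and horizontal=FALSE. R's postscript() documentation also calls for onefile=FALSE to get an EPSF header, so it is included here.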

Monday, May 05, 2008

Walmart Visualization

Here is an interesting animation of Walmart store growth over time. Below is a snapshot of the movie in progress (1991).

Wednesday, April 23, 2008

Political Campaign Contributions

The Federal Election Commission (FEC) requires that all campaign contributions over $200 (per donor) be reported publicly. The reported information includes the donor's name, job title, zip code, and even address. All of it since 2001 is available electronically via FTP at ftp://ftp.fec.gov/FEC/electronic/.

In collaboration with political scientists here at BYU, we have been performing record linkage (a.k.a. entity resolution) on this data so that they can perform their studies more accurately.
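For readers unfamiliar with record linkage, the idea is to decide which records refer to the same real-world person. The sketch below is a deliberately crude illustration and not the method we use: it groups records whose normalized name and 5-digit zip code match exactly. The field names (`name`, `zip`, `amount`) and the sample records are hypothetical and do not reflect the actual FEC file layout.

```python
from collections import defaultdict

def linkage_key(record):
    """Build a crude match key from a donor record.

    The name is lowercased, stripped of punctuation, and its tokens are
    sorted so that "Smith, John" and "John Smith" yield the same key.
    The zip code is truncated to its first five digits.
    """
    cleaned = "".join(ch for ch in record["name"].lower() if ch.isalnum() or ch == " ")
    name = " ".join(sorted(cleaned.split()))
    return (name, record["zip"][:5])

def link_records(records):
    """Group records that share the same key; each group is one presumed donor."""
    groups = defaultdict(list)
    for rec in records:
        groups[linkage_key(rec)].append(rec)
    return list(groups.values())

# Hypothetical donor records, for illustration only.
donations = [
    {"name": "Smith, John", "zip": "84602", "amount": 250},
    {"name": "John Smith", "zip": "84602-1234", "amount": 500},
    {"name": "Jane Doe", "zip": "84604", "amount": 300},
]

clusters = link_records(donations)
for cluster in clusters:
    print(len(cluster), [rec["name"] for rec in cluster])
```

Exact-key matching like this misses misspellings and nicknames, and it merges different people who share a name and zip code. Realistic linkage typically adds approximate string comparison and more fields, such as job title or address.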

Fundrace
On a related note, fundrace.org has created an interesting mashup (shown below) that plots donors on a Google map, colored by the party or candidate they donated to. It also reveals donor information and appears to do some coarse record linkage.


FEC Maps
Additionally, the FEC itself has started to produce maps for both the Presidential Election and the House and Senate Elections. The maps they provide aggregate the donated funds by state, party, and candidate.

Monday, March 31, 2008

Freebase

At the SIP Symposium (at Stanford), there were some guys from the Freebase development crew. This was my first exposure to Freebase and I was intrigued by the idea. According to Kurt Bollacker (the Chief Scientist), the reason they were at the symposium was to "get people using our data". I also learned from him that they were VC funded and currently had about 60 developers.

They have built upon the existing data sources, such as Wikipedia, and have added structurally typed data to go along with it. The resulting data repository is then easily accessible via the Freebase API.

It'll be interesting to see what happens with it.

Thursday, February 28, 2008

Social Science Data

Amidst my search for social science data (to perform social capital experiments), I have discovered the following data:
  • Social Science Data on the Internet - seems to have lots of links to data; however, the search is limited and the results pages are clunky.
  • Social capital datasets - presents some information on data sources that are specifically related to investigating social capital.
  • INSNA - has the data used in the Wasserman and Faust book on Social Network Analysis.
  • ICPSR - Inter-university Consortium for Political and Social Research
  • SDA - web-based software available for accessing much of the social science data. Additionally, I noticed some archived data available.
There appears to be plenty of data. Now on to the task of filtering it down to the most relevant...

Monday, February 04, 2008

Social Graph API by Google

Google's Social Graph API allows developers to utilize the public connections among people on the web. The idea is simple, yet it could make it easier for people to connect across sites. Of course, the data all comes through Google, which further increases our dependence on them. No doubt, other search engines could easily create the same API.

Since Google already saves a copy of all of the web pages that it spiders for search, the task of extracting the annotated links is fairly trivial. Currently, this only works to the extent that web developers annotate user links with XHTML Friends Network (XFN) and Friend of a Friend (FOAF). It is a great idea, but it may take some time before web developers start annotating.
Furthermore, the easy access to people's connections is a nice data source for some new applications and experiments.
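To give a feel for what extracting annotated links involves, here is a minimal sketch that pulls XFN-annotated links out of a page. XFN marks relationships with values in the `rel` attribute of ordinary `<a>` tags. This is not Google's implementation, and it ignores FOAF entirely; the class name `XFNLinkParser`, the helper `extract_xfn_links`, and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart", "me",
}

class XFNLinkParser(HTMLParser):
    """Collect (href, [xfn values]) pairs from <a> tags that carry XFN rel values."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel holds space-separated tokens; keep only the ones XFN defines.
        rels = set((attr.get("rel") or "").lower().split()) & XFN_VALUES
        if rels and attr.get("href"):
            self.links.append((attr["href"], sorted(rels)))

def extract_xfn_links(html):
    parser = XFNLinkParser()
    parser.feed(html)
    return parser.links

# Hypothetical page fragment: one annotated link, one plain link.
page = (
    '<a href="http://example.com/alice" rel="friend met">Alice</a> '
    '<a href="http://example.com/about">About</a>'
)

xfn_links = extract_xfn_links(page)
print(xfn_links)
```

Only the first link is reported, since the second carries no `rel` annotation. That is exactly the limitation noted above: unannotated links contribute nothing to the graph.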

Friday, March 17, 2006

Google and data

Here is an interesting article about Google and all their data.

Thursday, November 11, 2004

Data Mining Dataset & Model Repositories

UCI KDD Archive
The central repository for data mining datasets.

ML UCI Repository
The central machine learning (ML) repository.

PMML Sample Models
Various PMML models for some of the commonly used datasets (such as Iris, Voting, and Elnino).