Today, I put together a web page topic tool using the web service that Nathan Davis made available. The tool takes in one or more web pages (i.e., a list of URLs) and extracts topics from the text on those pages. The topics, or more accurately the most likely topic components, are extracted using an algorithm called Latent Dirichlet Allocation (LDA).
One potential use is for quickly generating blogger profiles to be used for implicit affinity networks. You can try it out at:
http://dml.cs.byu.edu/matthewsmith/tools/topictool/
Nathan also uses his web service to power GooEgg, a query expansion service for Google searches.
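To give a rough feel for what LDA does with the text of a set of pages, here is a minimal sketch in Python. It is not the code behind Nathan's web service; it assumes collapsed Gibbs sampling (one common way to fit LDA), and the toy documents, the function name `lda_gibbs`, and the parameter values are all made up for illustration.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling (toy sketch)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic in proportion to
                # (how much the document likes topic k) * (how much topic k likes word w).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Most frequent words per topic, and the smoothed topic mixture per document.
    top_words = [
        [w for w, c in topic_word[k].most_common(3) if c > 0]
        for k in range(num_topics)
    ]
    doc_mix = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return top_words, doc_mix

# Hypothetical, already-tokenized "web pages".
docs = [
    "blog social network blog friends network".split(),
    "gene cell protein cell network gene".split(),
    "social friends blog comments blog".split(),
    "protein gene cell biology cell".split(),
]
top_words, doc_mix = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words):
    print("topic", k, words)
```

Each row of `doc_mix` is a distribution over topics for one page, which is the kind of summary that could feed a blogger profile. A real tool would also need to fetch the pages, strip the HTML, and remove stop words before this step.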
This blog focuses on the relationships that connect us together providing potent insights for decision makers. In addition, a few data mining topics are presented.
Tuesday, January 29, 2008
Thursday, January 24, 2008
Social Capital Measurement
There are countless methods for measuring social capital. Below is a list of papers, studies, and resources that discuss some of these techniques:
- Paldam, M. (2000), Social Capital: One Or Many? Definition And Measurement, in Journal of Economic Surveys, 14, 5, pp. 629-653.
- Stone, W. (2001), Measuring Social Capital, Melbourne, Australian Institute of Family Studies, Research Paper No. 24/2001.
- Li, Y., Pickles, A., Savage, M. (2003), Conceptualizing and measuring social capital: a new approach, Paper for BHPS 2003, Centre for Census and Survey Research, University of Manchester.
- Grootaert, C., van Bastelaer, T. (2002), Understanding and Measuring Social Capital: A Multi-Disciplinary Tool for Practitioners, Washington DC, The World Bank.
- Bullen, P., Onyx, J. (2000), Measuring Social Capital in Five Communities in NSW, in Journal of Applied Behavioral Science, 36, 1, pp. 23-42. (Summary Report)
- More Readings on Measuring Social Capital
- Measuring social capital within health surveys: key issues
- Social capital and value creation: The role of intrafirm networks
- Observations on social capital
- Measurement of Social Capital (added on 1/29/08)
Wednesday, January 23, 2008
DML Research Collaboration
In an effort to collaborate and refine our research at a faster pace, it might be fun to (1) blog about our research ideas each week and then (2) visit each lab member's blog to comment on what they have written, to encourage the good ideas and weed out the bad.
What do you think?
Tuesday, January 22, 2008
Weekly Update
The tasks that I was involved in this week (and some thoughts on them) include the following:
Outlined a preliminary schedule and brainstormed for the "Making a Blog Big" project with Nathan Purser
Read more about Structural Holes and Network Closure, including:
- Ronald Burt's Structural Holes versus Network Closure (PDF). Notes:
- Some people become prominent more quickly; some enjoy higher incomes; some lead more important projects; the interests of some are better served than the interests of others. Better connected people enjoy higher returns. (These are some things that we could use to validate our social capital metrics.)
- The Social Capital Metaphor is "people that are better connected do better", whereas the Human Capital Metaphor is "people who do better are more able individuals; they are more intelligent; more attractive; more articulate; more skilled."
- A generic research finding in sociology and social psychology is that information circulates more within groups than between them.
- Social capital can be argued to exist in structural holes (bridging) and in network closure (bonding). However, the studies he presents indicate that structural holes are the source of social capital.
- Networks of densely interconnected contacts are systematically associated with substandard performance (bonding); networks that span structural holes are associated with creativity and innovation, positive evaluations, early promotions, high compensation and profits (bridging).
- There remains an important role for closure. It can be critical to realizing the value buried in structural holes.
- The mechanisms remain distinct. Closure describes how dense or hierarchical networks lower the risk associated with transaction and trust, which can be associated with performance. The hole argument describes how structural holes are opportunities to add value with brokerage across the holes, which is associated with performance.
- Bruce Hoppe's blog post, Reputation and Trust (aka "Network Closure")
- Uses a good personal example of Network Closure
- I commented, "Virtual communities, no doubt, have an impact on real communities. However, I would argue that Putnam's work stands as does Burt's. The main issue seems to be that social capital is challenging to define precisely. Are there any agreed upon mathematical definitions of social capital? (It seems that there is still too much ambiguity.)"
- Nan Lin's Building a Network Theory of Social Capital (PDF)
- Unpacking Burt's Redundancy Measure
- Measurement of Social Capital
- Institute for Social Network Analysis of the Economy (ISNAE)
- Social Network Researchers By Author (at ISNAE)
- Introduction to Networks Mathematics by Bruce Hoppe
- Bruce Hoppe's blog
- Social Network Researchers:
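As a rough illustration of the structural holes versus closure contrast in Burt's notes above, here is a minimal sketch of effective size, one of Burt's redundancy measures. It assumes the simplified form often used for unweighted, undirected ego networks, n - 2t/n, where n is the number of the ego's contacts and t is the number of ties among those contacts. The toy networks and the function name `effective_size` are my own assumptions for illustration, not data from any of the papers.

```python
from itertools import combinations

def effective_size(graph, ego):
    """Effective size of ego's network: n - 2t/n (unweighted, undirected case).

    graph maps each node to the set of its neighbors.
    """
    alters = graph[ego]
    n = len(alters)
    if n == 0:
        return 0.0
    # t = number of ties among the alters themselves (ties to ego are excluded).
    t = sum(1 for a, b in combinations(sorted(alters), 2) if b in graph[a])
    return n - 2 * t / n

# Bridging: ego's three contacts do not know each other (structural holes).
open_net = {
    "ego": {"A", "B", "C"},
    "A": {"ego"},
    "B": {"ego"},
    "C": {"ego"},
}

# Bonding: everyone knows everyone (network closure).
closed_net = {
    "ego": {"A", "B", "C"},
    "A": {"ego", "B", "C"},
    "B": {"ego", "A", "C"},
    "C": {"ego", "A", "B"},
}

print(effective_size(open_net, "ego"))    # 3.0
print(effective_size(closed_net, "ego"))  # 1.0
```

In the open network every contact is non-redundant, so the effective size equals the number of contacts. In the closed network the three contacts are fully redundant with each other and count as roughly one.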
Monday, January 21, 2008
Top Social Networking and Blogging Sites
Nielsen Online released the top Social Networking sites (ranked by unique audience), as follows:
MySpace still leads the pack this year, yet Facebook grew at an astounding rate of 72%, proving to be the "hottest" of the Top 5.
Top Blogging sites were as follows, as of December 2007:
Google-backed Blogger continues to have, and to add, the largest unique audience.
For more information, try this or either of these related reports.
Finding an Important Problem
There seem to be infinitely many problems in the world. The trick is finding one that is sufficiently interesting to focus a dissertation on. The hope is that it has the following qualities:
- Should impact a diverse audience (more than just the geeks in the computer lab)
- Should be scientific yet have a host of business applications
- Would be nice if it were related, at least in part, to my previous work
- To be continued...
Wednesday, January 16, 2008
Generating Research
As I am in the midst of deciding what to focus my PhD work on, my thoughts take me back to a talk given nearly a year ago by Dan Olsen. In this presentation he mentioned the following steps for generating research:
- Find an important problem
- Generate lots of ideas
- Filter them down
- Make them real
Wednesday, December 12, 2007
What is a Blog?
Much of our recent work has been focused on the Blogosphere, which refers to the growing social network of people who write blogs, or web logs. We explore the social capital found in these networks of bloggers. Here is a little screencast that explains what blogs are and their relevance in today's world.
Tuesday, December 04, 2007
Javascript Word Jumper
Here is a little JavaScript I wrote that can be used to move the cursor between words within an input field.
Saturday, December 01, 2007
WITS '07 in Montreal December 8-9
The Seventeenth Annual Workshop on Information Technologies and Systems (WITS'07) will be held next weekend in Montreal, Canada. Although Montreal is a very cold place, above-average temperatures are expected on the days of the workshop (the white line on the plot below indicates the average temperature), which will be nice.

The workshop will be covering the following topics:
- Social Networks
- Security, Privacy and Risk Management
- Collaboration and Blogging
- Learning and Classification
- Software Strategies
- Knowledge Management
- Ecommerce Systems
- Workflow and Business Process Management
- Recommender Systems
- Data Management
- Modeling and Evaluation of IS
- Data Mining
Monday, November 26, 2007
Becoming More Meaningful
Since 2004, I've been posting occasionally to this blog. However, much of the content has been inaccessible to many visitors. This problem is evident in the low number of people who have commented on previous posts. ;)
In the future, I plan to make posts more meaningful and accessible for readers who have no special knowledge of the issue under discussion. This will give anyone an opportunity to contribute and provide insight through comments. I hope this will increase the number of comments on future posts, including this one. :)
Thursday, October 11, 2007
Methods to Identify Cell Architecture and Dynamics
Notes and summary from Chapter 11 of Szallasi:
Regulatory networks consist of both network topology and dynamics. The term "network" emphasizes the topology, whereas the term "regulatory" emphasizes the dynamic interactions within the network, also called kinetics.
Experimental data can be used to capture both the topology, or architecture, and the dynamics of a cell. In the preceding chapter (and my corresponding write-up) data acquisition was discussed, whereas this chapter (and this write-up) focuses on using this data to model the topology and dynamics.
Szallasi mentions that, "engineering approaches have been instrumental in the reverse engineering effort". Reverse engineering or network inference in this context refers to identification of cellular networks from experiments. Various approaches are discussed in this chapter including Bayesian networks, iterative modeling, dynamic flux balance analysis, and
The reverse engineering of cellular networks is very complex. The kinetics (dynamics) within the network change over time, which makes modeling incredibly challenging. The interactions change in complex ways that are often difficult to capture from collected data.
Szallasi identifies three challenges (within reverse engineering cellular networks) that must be addressed to allow efficient and accurate dynamical modeling of networks:
(i) to improve the signal-to-noise ratio in the measurements
(ii) to develop new tools for measuring the cellular concentrations, fluxes, and interactions in both space and time
(iii) to incorporate model-based design of experiment protocol
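To make the split between topology and dynamics a bit more concrete, here is a minimal sketch in Python. It is not a model from Szallasi's chapter: the two-gene network, the linear rate law, the rate constants, and the names `topology` and `simulate` are all hypothetical, chosen only to show that the wiring and the kinetics can be described separately.

```python
# Topology (architecture): which gene regulates which, and with what sign.
# Hypothetical network: gene A activates gene B.
topology = {
    "A": [],
    "B": [("A", +1)],
}

# Dynamics (kinetics): hypothetical rate constants.
basal = {"A": 1.0, "B": 0.0}   # constant production rates
strength = 0.5                 # effect of a regulator on its target
decay = 1.0                    # first-order degradation rate

def simulate(topology, basal, strength, decay, dt=0.01, steps=2000):
    """Integrate d(level)/dt = basal + regulation - decay * level with Euler steps."""
    levels = {gene: 0.0 for gene in topology}
    for _ in range(steps):
        rates = {}
        for gene, regulators in topology.items():
            regulation = sum(sign * strength * levels[src] for src, sign in regulators)
            rates[gene] = basal[gene] + regulation - decay * levels[gene]
        # Update all genes together so each step uses the same snapshot of levels.
        for gene in levels:
            levels[gene] += dt * rates[gene]
    return levels

final = simulate(topology, basal, strength, decay)
print(final)  # A settles near basal/decay, B near strength * A / decay
```

Changing `topology` changes who talks to whom, while changing the rate constants changes how fast and how strongly they do it. Reverse engineering has to recover both from data.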
Tuesday, October 09, 2007
Biological Data Acquisition
My thoughts and comments are in between the points, labeled A-D, that Szallasi discusses in chapter 10 of "System Modeling in Cellular Biology":
A) "The overall size and complexity of intracellular networks"
From my experience working with social networks, I easily accept this to be a challenge. The complexity of a network grows quickly because the number of possible interactions grows quadratically with the number of nodes.
In biological systems the complexity abounds, as scientists have created models for everything from human social behavior on down to an atom's behavior. Szallasi discusses the modular approach to Systems Biology, which can alleviate some of this complexity. He mentions that one estimate of the size of an intracellular network is the number of active genes in a given cell. (Of course, the number of genes is slightly in flux as new research continues to be done.)
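To put a number on the quadratic growth mentioned under point A, here is a quick sketch. It assumes that every pair of nodes counts as one possible undirected interaction, so n nodes give n(n-1)/2 pairs; the function name `possible_interactions` is just for illustration.

```python
def possible_interactions(n):
    """Number of distinct unordered pairs among n nodes: n * (n - 1) / 2."""
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(n, possible_interactions(n))
# 10 45
# 100 4950
# 1000 499500
```

A tenfold increase in nodes gives roughly a hundredfold increase in possible interactions, which is why network size alone becomes a modeling problem.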
B) "The general principles of biological measurements --- their technical and conceptual limitations."
When working with data, there are always limitations based on your measurements. In some cases we are not even measuring the right thing. Other times we are limited by the precision of our measurement devices.
I find it quite interesting that measurement devices often do not (and sometimes cannot) account for various environmental changes, which cause irregularities in the collected data.
C) "Concentration measurement versus kinetic parameter measurements"
Parameter estimation is important to get right, yet it is extremely challenging. In fact, both measurement techniques inherit the uncertainty introduced through data measurement.
D) "The actual target of the measurements"
Biological data can be analyzed and measured using individual-centric or community-centric approaches, each of which is useful in its own right. In each view, different problems exist and different interactions occur. Researchers should take great care to justify which measurements are being used and how they are being viewed.
In conclusion, researchers must understand and accept the many limitations of biological data. As with any research, assumptions must be made and justified.
Thursday, October 04, 2007