Social Capital in Networks: 2007

Wednesday, December 12, 2007

What is a Blog?

Much of our recent work has focused on the Blogosphere, which refers to the growing social network of people who write blogs, or web logs. We explore the social capital found in these networks of bloggers. Here is a little screencast that explains what blogs are and their relevance in today's world.

Tuesday, December 04, 2007

Javascript Word Jumper

Here is a little JavaScript I wrote that can be used to move the cursor between words within input fields.
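The script itself isn't reproduced in this post. As a rough, hypothetical sketch of the core idea, the functions below compute where the cursor should land when jumping forward or backward one word. They are written in Python rather than JavaScript so the logic is easy to test. They assume a "word" is any run of non-whitespace characters, and the function names are my own. In a browser, the returned index would be applied with the input element's setSelectionRange method.

```python
def next_word_start(text: str, pos: int) -> int:
    """Index where the next word starts after cursor position `pos` (or len(text))."""
    n = len(text)
    i = pos
    # Skip the rest of the word the cursor is currently in.
    while i < n and not text[i].isspace():
        i += 1
    # Skip the whitespace separating it from the next word.
    while i < n and text[i].isspace():
        i += 1
    return i


def prev_word_start(text: str, pos: int) -> int:
    """Index where the word before cursor position `pos` starts (or 0)."""
    i = pos
    # Step back over any whitespace immediately before the cursor.
    while i > 0 and text[i - 1].isspace():
        i -= 1
    # Step back to the first character of that word.
    while i > 0 and not text[i - 1].isspace():
        i -= 1
    return i


text = "hello big world"
print(next_word_start(text, 0), prev_word_start(text, 10))
```

For example, with the text "hello big world" and the cursor at index 0, jumping forward lands at index 6 (the start of "big"), and jumping back from index 10 also lands at index 6.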

Saturday, December 01, 2007

WITS '07 in Montreal December 8-9

The Seventeenth Annual Workshop on Information Technologies and Systems (WITS'07) will be held next weekend in Montreal, Canada. Despite the fact that Montreal is a very cold place, above average temperatures are expected on the days of the workshop (the white line on the plot below indicates the average temperature), which will be nice.

The workshop will be covering the following topics:

Social Networks
Security, Privacy and Risk Management
Collaboration and Blogging
Learning and Classification
Software Strategies
Knowledge Management
Ecommerce Systems
Workflow and Business Process Management
Recommender Systems
Data Management
Modeling and Evaluation of IS
Data Mining

They have me scheduled to present first in the "Social Networks" session, which happens to be the first session of the conference. I'm excited to share the work that we have been doing and collaborate with those that attend.

Monday, November 26, 2007

Becoming More Meaningful

Since 2004, I've been posting occasionally to this blog; however, much of the content has been inaccessible to many visitors. This problem is evident in the low number of people who have commented on previous posts. ;)

In the future, I plan to make posts more meaningful and accessible for readers who have no special knowledge of the issue under discussion. This will give anyone the opportunity to contribute and provide insight through comments. I hope to increase the number of comments on future posts, including this one. :)

Thursday, October 11, 2007

Methods to Identify Cell Architecture and Dynamics

Notes and summary from Chapter 11 of Szallasi:

Regulatory networks consist of both network topology and dynamics. The term "network" emphasizes the topology, whereas the term "regulatory" emphasizes the dynamic interactions within the network, also called kinetics.
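To make the distinction concrete, here is a toy sketch (my own example, not from Szallasi) of a two-species network in which A activates B. The single edge A -> B is the topology; the rate constants k_act, K, s, and d are the kinetics. The model form (constant production of A, saturating activation of B, linear decay of both) and every parameter value are illustrative assumptions, integrated with a simple forward-Euler loop.

```python
def simulate(k_act, K=1.0, s=1.0, d=0.5, dt=0.01, t_end=40.0):
    """Forward-Euler integration of a toy network with one edge, A -> B.

    Topology: A regulates B, and nothing regulates A.
    Kinetics: s (production of A), d (decay of A and B),
              k_act and K (strength and half-saturation of A's activation of B).
    Returns the concentrations (A, B) at time t_end, starting from zero.
    """
    a, b = 0.0, 0.0
    for _ in range(round(t_end / dt)):
        da = s - d * a                    # A: constant production, linear decay
        db = k_act * a / (K + a) - d * b  # B: saturating activation by A, linear decay
        a, b = a + dt * da, b + dt * db
    return a, b


# Same topology, two different kinetic regimes.
for k in (1.0, 2.0):
    a, b = simulate(k)
    print(f"k_act={k}: A={a:.3f}, B={b:.3f}")
```

Keeping the topology fixed and doubling k_act doubles B's steady-state level (from about 1.33 to about 2.67). A topology-only model cannot capture this kind of difference.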

Experimental data can be used to capture both the topology, or architecture, and the dynamics of a cell. In the preceding chapter (and my corresponding write-up) data acquisition was discussed, whereas this chapter (and this write-up) focuses on using this data to model the topology and dynamics.

Szallasi mentions that, "engineering approaches have been instrumental in the reverse engineering effort". Reverse engineering or network inference in this context refers to identification of cellular networks from experiments. Various approaches are discussed in this chapter including Bayesian networks, iterative modeling, dynamic flux balance analysis, and

Reverse engineering cellular networks is very complex. The kinetics (dynamics) within the network change over time, which makes modeling incredibly challenging. The interactions change in complex ways that are often difficult to capture from collected data alone.

Szallasi identifies three challenges in reverse engineering cellular networks that must be addressed to allow efficient and accurate dynamical modeling of networks:
(i) to improve the signal-to-noise ratio in the measurements
(ii) to develop new tools for measuring the cellular concentrations, fluxes, and interactions in both space and time
(iii) to incorporate model-based design of experiment protocol

Tuesday, October 09, 2007

Biological Data Acquisition

My thoughts and comments follow each of the points, labeled A-D, that Szallasi discusses in Chapter 10 of "System Modeling in Cellular Biology":

A) "The overall size and complexity of intracellular networks"

From my experience working with social networks, I readily accept this as a challenge. The complexity of a network grows quickly because the number of possible pairwise interactions grows quadratically with the number of nodes.
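As a quick worked example, a network with n nodes has n(n - 1)/2 possible pairwise interactions if direction is ignored, and twice that if direction matters (as it does for regulation). The helper below simply evaluates that formula; its name is my own.

```python
def possible_interactions(n: int, directed: bool = False) -> int:
    """Number of possible pairwise interactions among n nodes (no self-loops)."""
    pairs = n * (n - 1) // 2  # unordered pairs: n choose 2
    return 2 * pairs if directed else pairs


for n in (10, 100, 1000):
    print(n, possible_interactions(n), possible_interactions(n, directed=True))
```

Going from 10 to 1,000 nodes multiplies the node count by 100 but multiplies the number of possible undirected interactions by 11,100 (45 versus 499,500).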

In biological systems the complexity abounds, as scientists have created models for everything from human social behavior down to an atom's behavior. Szallasi discusses how a modular approach to Systems Biology can alleviate some of this complexity. He mentions that the size of an intracellular network can be estimated by the number of active genes in a given cell. (Of course, the number of genes is still somewhat in flux as new research continues to be done.)

B) "The general principles of biological measurements --- their technical and conceptual limitations."

When working with data, there are always limitations based on your measurements. In some cases, we are not even measuring the right thing; other times, we are limited by the precision of our measurement devices.

I find it quite interesting that measurement devices often do not (and sometimes cannot) account for various environmental changes, which cause irregularities in the collected data.

C) "Concentration measurement versus kinetic parameter measurements"

Parameter estimation is important to get right, yet it is extremely challenging. In fact, both kinds of measurement inherit the uncertainty introduced during data acquisition.

D) "The actual target of the measurements"

Biological data can be analyzed and measured using individual-centric or community-centric approaches, each of which is useful in its own right. In each view, different problems exist and different interactions occur. Researchers should take great care to justify which measurements are being used and how they are being viewed.

In conclusion, researchers must understand and accept the many limitations of biological data. As with any research, assumptions must be made and justified.

Thursday, October 04, 2007

MicroArray Data Links

Gene Expression Omnibus (GEO)
Stanford MicroArray Data

Monday, October 01, 2007

Improved Algorithm for Learning of Gene Regulatory Network Connectivity from Time Series Data

Barker et al. present the GeneNet algorithm, designed to learn genetic regulatory network connectivity from time series data. The GeneNet algorithm is similar to work by Yu et al. (2004); however, it takes a new approach by computing ratios of conditional probabilities and accumulating votes to determine influence between species. The approach taken by Yu et al. uses Dynamic Bayesian Networks (DBN) and a cumulative distribution function (cdf) to determine a score for each species that may influence a gene. GeneNet approaches the problem differently by searching for differences between time points.

The pseudocode of the GeneNet algorithm is as follows:

GeneNet(Species S, Expts E, Influences I, Thresholds T, Levels L)
    L := DetermineLevels(S, E, L)
    foreach c in S:
        Y := CreateInfluenceVectorSet(c, S, E, I, T, L)
        Y := CombineInfluenceVectors(c, S, E, I, T, L, Y)
        I(c) := CompeteInfluenceVectors(c, S, E, T, L, Y)
    return I
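The paper's helper functions aren't spelled out here, so the following is only a simplified, hypothetical sketch of the voting idea for a single candidate parent, not the published GeneNet implementation. It assumes the parent's expression has already been discretized into levels (the role DetermineLevels plays in the pseudocode). For each parent level, it estimates the probability that the target rises between consecutive time points, then compares that probability with the one at the lowest level (level 0). Ratios above t_act vote for activation, ratios below t_rep vote for repression, and anything in between is a neutral vote. The threshold values and the (act - rep) / total score are my assumptions for illustration.

```python
from collections import defaultdict


def rise_probabilities(parent, target):
    """P(target rises from t to t+1 | parent is at a given level at t), per level."""
    rises = defaultdict(int)
    totals = defaultdict(int)
    for t in range(len(target) - 1):
        level = parent[t]
        totals[level] += 1
        if target[t + 1] > target[t]:
            rises[level] += 1
    return {level: rises[level] / totals[level] for level in totals}


def influence_score(parent, target, t_act=1.15, t_rep=0.75):
    """Score in [-1, 1]: positive suggests activation, negative suggests repression."""
    probs = rise_probabilities(parent, target)
    if 0 not in probs:
        return 0.0  # no baseline observations to compare against
    base = probs[0]
    act = rep = neutral = 0
    for level, p in probs.items():
        if level == 0:
            continue
        # Ratio of conditional probabilities against the baseline level.
        if base > 0:
            ratio = p / base
        else:
            ratio = float("inf") if p > 0 else 1.0
        if ratio > t_act:
            act += 1       # vote for activation
        elif ratio < t_rep:
            rep += 1       # vote for repression
        else:
            neutral += 1   # neutral vote
    total = act + rep + neutral
    return (act - rep) / total if total else 0.0


# Hypothetical discretized data: the target tends to rise when the parent is high.
parent = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]
target = [5, 5, 3, 4, 6, 7, 8, 7, 7, 9]
print(rise_probabilities(parent, target))
print(influence_score(parent, target))
```

Flipping the parent's levels turns the same data into evidence for repression, so influence_score([1 - p for p in parent], target) returns -1.0. The full algorithm goes further, combining and competing influence vectors that involve multiple parents, which this sketch leaves out.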

Due to the lack of available time series data, synthetic data sets were generated for comparison. Empirical studies pitted GeneNet against Yu's DBN algorithm on these synthetic data sets, and GeneNet had significantly better precision, recall, and runtime for the majority of experiments.

(See the paper in Transactions on Computational Biology and Bioinformatics, No. 8, March 2007.)

Thursday, September 20, 2007

Metabolite

Quoted from the Columbia Encyclopedia:

metabolite, organic compound that is a starting material in, an intermediate in, or an end product of metabolism. Starting materials are substances, usually small and of simple structure, absorbed by the organism as food. These include the vitamins and essential amino acids. They can be used to construct more complex molecules, or they can be broken down into simpler ones. Intermediary metabolites are by far the most common; they may be synthesized from other metabolites, perhaps used to make more complex substances, or broken down into simpler compounds, often with the release of chemical energy. For example, glucose, perhaps the single most important metabolite, can be synthesized in a process called gluconeogenesis, can be polymerized to form starch or glycogen, and can be broken down during glycolysis in order to obtain chemical energy. End products of metabolism are the final result of the breakdown of other metabolites and are excreted from the organism without further change; they usually cannot be used to synthesize other metabolites.

The Wikipedia article on Bioinformatics is worth scanning to get a feel for the research area.

Thursday, September 13, 2007

Bioinformatics Reading Thoughts

I'm taking a Bioinformatics class and have been reading "System Modeling in Cellular Biology" by Szallasi et al. Here are the thoughts I have had while reading the first few chapters.

Data-driven versus hypothesis-driven research
The world is very complex, and science has been our tool for understanding how things work. Science has often been driven by a hypothesis followed by experimentation, which then increases our understanding of the problem; these hypotheses were based on what we, or perhaps a few researchers, had observed. Today we gather more and more data, which can also be used to drive research; these questions are based not only on what we might observe in life, but also on what the data suggest, in some instances about millions of people. Both ways of attacking the problem can lead us to the same truth; however, it seems that the latter has the potential to get us there more quickly.

Modeling
Modeling is constantly used in biological research. Szallasi mentions a couple of reasons why models might be useful: (1) testing whether a model is accurate and reflects known facts, and (2) helping us understand which parts of the system contribute most to the properties of interest.

Robustness
I love how robust and resilient biological processes are. I would love to be able to create a computer program that is even a fraction as robust as, say, the body is at healing itself.

Modularity
Many biological processes are modular, much like the functions and classes that good programmers write. For example, a human kidney can be transplanted into another person and work successfully there. Likewise, in programming, code that connects to a database can be reused interchangeably within multiple programs.

Bottom-up versus Top-down approaches
Bottom-up approaches typically build on existing biological knowledge, whereas top-down approaches leverage the enormous amount of biological data to find something important to delve into further.

Thursday, May 03, 2007

Export data from Postgres

Exporting data from Postgres to an output file of your choice can be done by following the simple steps below:

1. Start psql with the database that you'd like to export from...

$ psql [DATABASE]

2. Toggle the output mode to unaligned (\a toggles between unaligned and aligned output mode)

=# \a

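3. Turn "tuples only" mode on so that column headers and the row-count footer are left out of the output (\t toggles tuples-only mode on and off; it is off by default)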

=# \t

4. Set the output file (replace [FILE] with what you'd like to call your output file). All subsequent query results will be sent to that file, or piped to a shell command if the argument starts with |.

=# \o [FILE]

5. Run whatever query you'd like to send to the output file. For example,

=# SELECT * FROM [TABLE];

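6. When you're finished, run \o with no argument to close the file and send query results back to the terminal.

In summary: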

\a
\t
\o /tmp/outputfile.txt
SELECT ......
\o
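The same export can also be run non-interactively, which is handy in scripts. psql's -A (unaligned), -t (tuples only), -o (output file), and -c (run a single command) options correspond to the \a, \t, and \o steps above. Below is a small Python sketch that builds and runs that command with the standard subprocess module. The function names and example values are hypothetical, and it assumes psql is on your PATH and can connect to the database without prompting for a password.

```python
import subprocess


def psql_export_command(database, query, outfile):
    """Build a psql command that writes unaligned, tuples-only query output to a file."""
    return [
        "psql",
        "-d", database,   # database to connect to
        "-A",             # unaligned output (like \a)
        "-t",             # tuples only: no column headers or row-count footer (like \t)
        "-o", outfile,    # send query results to this file (like \o FILE)
        "-c", query,      # run this single query, then exit
    ]


def export_query(database, query, outfile):
    """Run the export; raises CalledProcessError if psql exits with an error."""
    subprocess.run(psql_export_command(database, query, outfile), check=True)


# Example (hypothetical database and table names):
# export_query("mydb", "SELECT * FROM mytable;", "/tmp/outputfile.txt")
```

Building the argument list separately avoids shell quoting problems, since subprocess passes each element straight to psql without going through a shell.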
