Thursday, October 11, 2007

Methods to Identify Cell Architecture and Dynamics

Notes and summary from Chapter 11 of Szallasi:

Regulatory networks consist of both network topology and dynamics. The term "network" emphasizes the topology, whereas the term "regulatory" emphasizes the dynamic interactions within the network, also called kinetics.

Experimental data can be used to capture both the topology, or architecture, and the dynamics of a cell. In the preceding chapter (and my corresponding write-up) data acquisition was discussed, whereas this chapter (and this write-up) focuses on using this data to model the topology and dynamics.

Szallasi mentions that "engineering approaches have been instrumental in the reverse engineering effort." Reverse engineering, or network inference, in this context refers to the identification of cellular networks from experiments. Various approaches are discussed in this chapter including Bayesian networks, iterative modeling, dynamic flux balance analysis, and

The reverse engineering of cellular networks is very complex. The kinetics within the network change over time, which makes modeling incredibly challenging. The interactions change in complex ways that are often difficult to capture from collected data.

Szallasi identifies three challenges in reverse engineering cellular networks; meeting them will allow efficient and accurate dynamical modeling of networks:
(i) to improve the signal-to-noise ratio in the measurements
(ii) to develop new tools for measuring the cellular concentrations, fluxes, and interactions in both space and time
(iii) to incorporate model-based design of experimental protocols

Tuesday, October 09, 2007

Biological Data Acquisition

My thoughts and comments are in between the points, labeled A-D, that Szallasi discusses in chapter 10 of "System Modeling in Cellular Biology":

A) "The overall size and complexity of intracellular networks"

From my experience working with social networks, I easily accept this to be a challenge. The complexity of a network grows quickly because the number of possible pairwise interactions grows quadratically with the number of nodes.
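To make the quadratic growth concrete, here is a minimal sketch that counts the possible undirected pairwise interactions among n nodes, which is n(n-1)/2. The function name is my own, and it assumes each pair of nodes can interact at most once with no self-interactions; a directed network would have twice as many.

```python
def possible_interactions(n_nodes: int) -> int:
    """Count the distinct undirected pairs among n_nodes nodes: n(n-1)/2."""
    return n_nodes * (n_nodes - 1) // 2


# Ten times as many nodes gives roughly a hundred times as many pairs.
for n in (10, 100, 1000):
    print(n, possible_interactions(n))
```

Going from 10 to 100 nodes raises the count from 45 to 4950 possible interactions.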

In biological systems the complexity abounds, as scientists have created models for everything from human social behavior on down to an atom's behavior. Szallasi discusses the modular approach to Systems Biology and how it can alleviate some of this complexity. He mentions that one estimate of the size of an intracellular network is the number of active genes in a given cell. (Of course, the number of genes is slightly in flux as new research continues to be done.)

B) "The general principles of biological measurements --- their technical and conceptual limitations."

When working with data there are always limitations based on your measurements. In some cases we are not even measuring the right thing. Other times we are limited by the precision of our measurement devices.

I find it quite interesting that measurement devices often do not (and sometimes cannot) account for various environmental changes, which cause irregularities in the collected data.

C) "Concentration measurement versus kinetic parameter measurements"

Parameter estimation is important to get right, yet it is extremely challenging. In fact, both concentration measurements and kinetic parameter measurements inherit the uncertainty introduced when the data is collected.

D) "The actual target of the measurements"

Biological data can be analyzed and measured using individual-centric or community-centric approaches, each of which is useful in its own right. In each view, different problems exist and different interactions occur. Researchers should take great care to justify which measurements are being used and how they are being viewed.

In conclusion, researchers must understand and accept the many limitations of biological data. As with any research, assumptions must be made and justified.

Monday, October 01, 2007

Improved Algorithm for Learning of Gene Regulatory Network Connectivity from Time Series Data

Barker et al. present the GeneNet algorithm, which is designed to learn genetic regulatory network connectivity from time series data. The GeneNet algorithm is similar to work by Yu et al. (2004); however, it takes a new approach by computing ratios of conditional probabilities and accumulating votes to determine influence between species. The approach taken by Yu et al. uses Dynamic Bayesian Networks (DBN) and a cumulative distribution function (cdf) to determine a score for each species that may influence a gene. GeneNet approaches the problem differently by searching for differences between time points.

The pseudocode of the GeneNet algorithm is as follows:

GeneNet(Species S, Expts E, Influences I, Thresholds T, Levels L)
    L := DetermineLevels(S, E, L)
    foreach c element of S:
        Y := CreateInfluenceVectorSet(c, S, E, I, T, L)
        Y := CombineInfluenceVectors(c, S, E, I, T, L, Y)
        I(c) := CompeteInfluenceVectors(c, S, E, T, L, Y)
    return I
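
To get a feel for the "ratio of conditional probabilities plus votes" idea, here is a toy sketch in Python. It is not the published GeneNet algorithm: the function name, the single high/low level, the add-one smoothing, and the two threshold values are all my own assumptions for illustration. It compares how often a child species rises right after the candidate parent is high versus low, and casts a vote based on that ratio.

```python
from typing import Sequence


def influence_vote(parent: Sequence[float], child: Sequence[float],
                   level: float, t_activate: float = 1.5,
                   t_repress: float = 0.67) -> int:
    """Toy vote: +1 (activation), -1 (repression), or 0 (no clear influence).

    Hypothetical illustration only; thresholds and smoothing are assumptions.
    """
    rises_high = n_high = rises_low = n_low = 0
    for t in range(len(child) - 1):
        rose = int(child[t + 1] > child[t])  # did the child increase next step?
        if parent[t] >= level:
            n_high += 1
            rises_high += rose
        else:
            n_low += 1
            rises_low += rose
    # Add-one smoothing keeps both probabilities strictly between 0 and 1.
    p_rise_given_high = (rises_high + 1) / (n_high + 2)
    p_rise_given_low = (rises_low + 1) / (n_low + 2)
    ratio = p_rise_given_high / p_rise_given_low
    if ratio >= t_activate:
        return 1
    if ratio <= t_repress:
        return -1
    return 0


# Made-up time series: the child rises after the parent is high.
parent = [0, 1, 1, 0, 0, 1, 1, 0]
activated = [5, 5, 6, 7, 6, 5, 6, 7]
repressed = [5, 5, 4, 3, 4, 5, 4, 3]
print(influence_vote(parent, activated, level=0.5))
print(influence_vote(parent, repressed, level=0.5))
```

In a fuller version, votes like these would be accumulated over many experiments before deciding on an influence, which is roughly what the vote-accumulation step in the description above suggests.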

Due to the lack of available time series data, synthetic data sets were generated for comparison. Empirical studies were performed which pitted GeneNet against Yu's DBN algorithm on these synthetic datasets. GeneNet had significantly better precision, recall, and runtime for the majority of experiments.
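
For reference, precision and recall for an inferred network can be computed by comparing the predicted influences against the known ones. The sketch below assumes each influence is a (regulator, target) pair; the species names are made up, and the paper's exact scoring may differ in details such as how activation versus repression is counted.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for predicted edges against the true edges."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: fraction of predicted edges that are real.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of real edges that were predicted.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical four-edge network and a three-edge guess at it.
true_edges = {("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")}
inferred_edges = {("A", "B"), ("B", "C"), ("D", "C")}
precision, recall = precision_recall(inferred_edges, true_edges)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three inferred edges are correct (precision 2/3) and two of the four true edges are recovered (recall 1/2).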

(See the paper in Transactions on Computational Biology and Bioinformatics, No. 8, March 2007.)