Information theory; Network inference; Probabilistic modeling
One important problem in contemporary computational biology is reconstructing the best possible set of regulatory interactions between genes (a so-called gene regulatory network, or GRN) from partial knowledge, such as that provided by gene expression experiments. Because the available data are highly noisy, this task challenges common probabilistic modeling approaches. Nevertheless, a variety of algorithms rooted in information theory and maximum entropy methods have been developed and have addressed the problem with some success. These include mutual information maximization, Markov random fields, the data processing inequality, minimum description length, Kullback-Leibler divergence and information-based similarity. Another approach to modeling gene regulatory networks combines information theory with machine learning techniques. Monte Carlo methods and variational methods can also be used to estimate the information content of the data. Hidden Markov models (HMMs) use time series data to represent information about the past of a sequence through a discrete random variable called the hidden state. Similarly, stochastic linear dynamical systems represent information about the past, but through a real-valued hidden state vector. Common to both models is that, conditioned on the current hidden state, the past, present and future observations are statistically independent. State-space models, also known as linear dynamical systems (LDS) or Kalman filter models, are a subclass of dynamic Bayesian networks used to model time series data. Expressing time series models in state-space form allows for unobserved components, an important feature when modeling gene expression data. Unobserved variables can model biological effects that the observed variables do not capture.
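As a minimal illustration of the state-space formulation, the sketch below assumes a linear-Gaussian model with a single scalar hidden state x_t (a stand-in for one unobserved regulator) driving several observed genes: x_t = a·x_{t-1} + w_t and y_t[i] = c[i]·x_t + v_t[i], with Gaussian noise of variance q and r[i]. The functions `simulate` and `kalman_filter`, and all parameter values, are hypothetical and are not the specific method of the work described here. The filter is the standard Kalman filter written in information form, which is exact here only because the observation noise is assumed independent across genes.

```python
import random


def simulate(a, q, c, r, x0, steps, seed=0):
    """Sample hidden states x_t and observations y_t from the hypothetical model
    x_t = a*x_{t-1} + w_t,  y_t[i] = c[i]*x_t + v_t[i],
    with w_t ~ N(0, q) and independent v_t[i] ~ N(0, r[i])."""
    rng = random.Random(seed)
    xs, ys = [], []
    x = x0
    for _ in range(steps):
        x = a * x + rng.gauss(0.0, q ** 0.5)
        xs.append(x)
        ys.append([ci * x + rng.gauss(0.0, ri ** 0.5) for ci, ri in zip(c, r)])
    return xs, ys


def kalman_filter(ys, a, q, c, r, m0=0.0, p0=1.0):
    """Return the filtered mean and variance of the scalar hidden state at each step.

    m0 and p0 are the prior mean and variance of the initial state x_0."""
    m, p = m0, p0
    means, variances = [], []
    for y in ys:
        # Predict: push the previous estimate through the state equation.
        m_pred = a * m
        p_pred = a * a * p + q
        # Update in information form: add the precision contributed by each gene.
        # This equals the usual Kalman update only because R is assumed diagonal.
        precision = 1.0 / p_pred + sum(ci * ci / ri for ci, ri in zip(c, r))
        p = 1.0 / precision
        m = p * (m_pred / p_pred + sum(ci * yi / ri for ci, yi, ri in zip(c, y, r)))
        means.append(m)
        variances.append(p)
    return means, variances


# Usage with made-up parameters: three observed genes load on one hidden regulator.
a, q = 0.9, 0.1
c = [1.0, -0.5, 2.0]
r = [0.2, 0.2, 0.5]
xs, ys = simulate(a, q, c, r, x0=1.0, steps=50)
means, variances = kalman_filter(ys, a, q, c, r, m0=1.0, p0=0.1)
mse = sum((m - x) ** 2 for m, x in zip(means, xs)) / len(xs)
print(f"filtered MSE: {mse:.3f}, final posterior variance: {variances[-1]:.3f}")
```

A practical implementation would use matrix operations on a vector-valued hidden state and would also learn a, c, q and r from data, for example with expectation-maximization. The components of such a hidden state vector play the role of the unobserved variables described above.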
They could account for the effects of genes that were not included in the experiment, the levels of regulatory proteins, or mRNA degradation. The work presented here shows how these models can be used to reverse-engineer regulatory networks from high-throughput data such as microarray gene expression profiles. In this review we also describe the theoretical foundations common to these methods and briefly outline their strengths and limitations. © 2012 by Nova Science Publishers, Inc. All rights reserved.
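The information-theoretic approaches mentioned above, mutual information combined with the data processing inequality, can be illustrated with a small sketch. It assumes a plug-in mutual information estimate on equal-width binned expression values, and it prunes, in every fully connected triplet of genes, the edge with the lowest mutual information as a presumed indirect interaction. This is similar in spirit to ARACNE with zero tolerance, but it is not the authors' method; the function names, gene names and data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, bins):
    """Map continuous expression values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant profile
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((cnt / n) * math.log(cnt * n / (px[a] * py[b]))
               for (a, b), cnt in pxy.items())


def infer_network(data, bins=3, threshold=0.0):
    """data maps gene name -> expression values (all the same length).

    Returns {(gene1, gene2): mutual information} for the retained edges."""
    genes = sorted(data)
    disc = {g: discretize(data[g], bins) for g in genes}
    mi = {}
    for g, h in combinations(genes, 2):
        value = mutual_information(disc[g], disc[h])
        if value > threshold:
            mi[frozenset((g, h))] = value
    # Data processing inequality: in a chain g -> h -> k, I(g;k) cannot exceed
    # I(g;h) or I(h;k), so the weakest edge of each triplet is treated as indirect.
    to_remove = set()
    for g, h, k in combinations(genes, 3):
        edges = [frozenset(pair) for pair in ((g, h), (h, k), (g, k))]
        if all(e in mi for e in edges):
            to_remove.add(min(edges, key=mi.get))
    return {tuple(sorted(e)): v for e, v in mi.items() if e not in to_remove}


# Usage with toy data for a hypothetical chain A -> B -> C.
expression = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    "B": [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "C": [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2],
}
print(infer_network(expression))
```

On this toy chain, the A-C pair has the lowest mutual information of the three, so the sketch keeps A-B and B-C and drops the indirect A-C edge. Real microarray data would call for a better mutual information estimator and a significance threshold rather than `threshold=0.0`.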