Research Overview
Our research lies in the areas of computational systems biology, bioprocess engineering, and bioinformatics. Our domain expertise includes mathematical modeling of biological networks, parameter estimation, systems analysis, bioprocess optimization, and design of experiments. The group’s research mission is to create enabling and innovative technology for the extraction of mechanistic and actionable insights from biological data, based on rigorous mathematical underpinnings, systems modeling and analysis, and advanced computational algorithms. Our research projects are motivated by a wide range of problems with biopharmaceutical, pharmaceutical, and biomedical significance. Below, you can find more descriptions of CABSEL research projects.
Our tools are freely available on our GitHub page.
Metabolism and Aging
Continued advances in high-throughput omics technology have produced a tsunami of biological data, fueling the new era of personalized medicine. We now face the challenge of extracting from such data the key biological insights needed to understand cellular and organismal phenotypes and, importantly, how they change in contexts such as disease and aging. On its own, each omics dataset (e.g., genomics, epigenomics, transcriptomics, proteomics, or metabolomics) gives only a partial view of the state of the cell, tissue, or organism. Only by integrating these datasets can we gain a complete systems-wide and mechanistic understanding of the processes that give rise to observed phenotypes.
In one of our projects, we are tackling the challenge of omics data integration specifically in the context of metabolic syndrome and aging. Metabolic syndrome (MetS) is a multiplex risk factor for major chronic diseases such as diabetes and cardiovascular disease. With modernization and changes in dietary intake, MetS has become more prevalent among adults around the world. Among other factors, aging is known to be an important contributor to MetS. In this project, we are developing tools for the creation of gene-protein-phenotype networks from omics databases and for metabolic network analysis of genome-scale metabolic models. From such network-based data integration and analysis, we aim (1) to elucidate alterations in metabolic pathways underlying metabolic diseases and aging; (2) to predict (human) phenotypes associated with perturbations to metabolic pathways; and (3) to identify potential genes that modulate aging and longevity (which will be tested on model organisms, such as C. elegans, in our collaborators' lab).
Key contact person: Sudharshan Ravi
Mitochondria and Aging
Mitochondrial dysfunction is a central and conserved feature of the aging process. Mitochondria are the powerhouses of eukaryotes, producing much of the cellular energy. A human cell houses hundreds to thousands of mitochondrial DNA (mtDNA) molecules, which encode proteins that are necessary for energy production. Accumulation of mutant mtDNA molecules can lead to the loss of mitochondrial function, a hallmark of many age-related diseases. Mitochondrial quality control (QC) processes exist to maintain mitochondrial function and genomic integrity. Although the QC machinery has been shown to deteriorate with aging, it is difficult to identify targets for intervention against mutant mtDNA accumulation without a comprehensive understanding of the intricate interplay among the QC components. Such an understanding would require an enormous experimental undertaking.
In collaboration with the groups of Professors Barry Halliwell (National University of Singapore, NUS) and Jan Gruber (NUS-Yale), we have been applying systems biological modeling and analysis to gain a better understanding of the formation and accumulation of mutant mtDNA molecules. More specifically, we have created mathematical models of mitochondrial QC with increasing complexity and realism, to identify the critical processes that are involved in mutant mtDNA accumulation. In addition, we have compiled and analyzed large compendia of mtDNA breakpoints from different organisms to determine the causal factors in the formation of mtDNA deletion mutations. Using model simulations and analysis, we have been able to generate testable hypotheses and actionable insights.
Key contact person: Lakshmi Narayanan Lakshmanan
Network Inference
The inference of biological networks has important applications, from finding treatments for diseases to engineering microbes that produce drugs and biofuels.
Biological networks are often drawn as a graph, in which nodes represent biological entities and edges indicate biological connections. These connections are used to describe a variety of biological functions, for example physicochemical interactions (e.g. binding between molecules), chemical transformation (from substrate to product), regulation (activation/inhibition), gene transcription, protein translation, and many more. Inferring such networks from data means identifying the interconnections (the edges) among biological entities (the nodes). A subset of this problem that is of particular interest is the identification of directed edges, or arrows. This directionality can generally be thought to mean causality, i.e. A --> B means “A causes B”. The inference of such causal networks from biological data is known to be extremely challenging.
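As a minimal sketch of the graph view described above, a directed network can be stored as a mapping from each node to the nodes it points to. The node names, the edges, and the helper functions below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_directed_network(edges):
    """Store directed edges (source, target) as an adjacency mapping."""
    network = defaultdict(set)
    for source, target in edges:
        network[source].add(target)
        network[target]  # touch the key so sink nodes also appear in the mapping
    return dict(network)


def downstream(network, start):
    """Return every node reachable from `start` by following the arrows."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in network.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical edges: A --> B, B --> C, A --> D
toy = build_directed_network([("A", "B"), ("B", "C"), ("A", "D")])
affected_by_a = downstream(toy, "A")
```

Under the causal reading of the arrows, `downstream(toy, "A")` collects the entities that a perturbation of A could influence, directly or indirectly.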
Over the past several years, we have been developing a theoretical framework and algorithms for ensemble inference (for example, see TRaCE), with which we can produce an ensemble of networks that are consistent with, and therefore indistinguishable by, the available data. This ensemble thus represents the uncertainty in the network inference and provides a direct measure of network inferability. This uncertainty is a consequence of the lack of information in the data to uniquely determine the network structure, because of, among other things, suboptimal experimental design. For this reason, to complement the ensemble inference, we have been working on optimal designs of gene knock-out experiments for gene regulatory network (GRN) inference (for example, see REDUCE).
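One generic way to summarize such an ensemble is to count how often each directed edge appears across its member networks. The sketch below is only an illustration under that assumption: it is not the TRaCE algorithm, and the ensemble, the function names, and the edges are hypothetical.

```python
from collections import Counter


def edge_support(ensemble):
    """Fraction of networks in the ensemble that contain each directed edge."""
    counts = Counter(edge for network in ensemble for edge in set(network))
    n_networks = len(ensemble)
    return {edge: count / n_networks for edge, count in counts.items()}


def uncertain_edges(ensemble):
    """Edges that appear in some, but not all, networks of the ensemble."""
    return {edge for edge, s in edge_support(ensemble).items() if s < 1.0}


# Hypothetical ensemble: three networks assumed to fit the same data equally well
ensemble = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("A", "C")},
    {("A", "B"), ("B", "C"), ("A", "C")},
]
support = edge_support(ensemble)
```

Edges with full support are shared by every consistent network, while the edges returned by `uncertain_edges` are the ones the data cannot resolve and that additional experiments would need to target.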
Key contact person: Rudiyanto Gunawan
Parameter Estimation
Biological system model identification is typically formulated as an iterative process that integrates wet-lab experiments with in silico analysis and optimization (see Figure 1). Most biological modeling studies in the literature involve the creation of an accurate (high-fidelity) model of the system, defined by a set of equations and parameter values that describe the important or interesting behavior of the system. There are many challenges in applying this iterative model identification procedure to a real-life problem. The bottleneck is typically the estimation of unknown kinetic parameters from experimental data, which has led to the development of a large number of parameter estimation techniques. However, as we and others have shown, the estimation of kinetic parameters by fitting model simulations to biological data is usually ill-posed. Often there is no single (best-fit) solution to the data fitting problem; rather, one can find many parameter combinations, i.e. an ensemble of parameters, that fit the data statistically equally well. Consequently, the best-fit model, even if one is obtained, may have little predictive capability or, worse, could be misleading.
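The ill-posedness described above can be illustrated with a deliberately simple toy model. In this sketch a species is removed through two parallel first-order routes, so the data depend only on the sum of the two rate constants. The model, the parameter values, and the function names are assumptions made for illustration, not one of the group's models.

```python
import math


def simulate(k1, k2, times):
    """Decay of a species removed by two parallel first-order routes."""
    return [math.exp(-(k1 + k2) * t) for t in times]


def sse(params, times, data):
    """Sum of squared errors between the model simulation and the data."""
    k1, k2 = params
    return sum((y - d) ** 2 for y, d in zip(simulate(k1, k2, times), data))


times = [0.0, 0.5, 1.0, 2.0, 4.0]
data = simulate(0.3, 0.7, times)  # noise-free synthetic "measurements"

# Different parameter pairs that share the same sum k1 + k2 = 1.0
candidates = [(0.3, 0.7), (0.5, 0.5), (0.9, 0.1)]
errors = [sse(p, times, data) for p in candidates]
```

All three candidates reproduce the data essentially perfectly, so a fitting routine has no basis for preferring one over another, even though they make different claims about each individual route.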
In light of the identifiability issue and other uncertainties, the success of any model-based study that relies on building an accurate model of the system will depend on balancing the (numerical and experimental) tractability of model identification against model fidelity. In this project, we take a different approach, based on the realization that in a real-life problem, the model itself is not an end but rather a means toward a goal (e.g., obtaining biological insights through model analysis or optimizing the production of certain biomolecules). Specifically, we are using an ensemble modeling approach. The ensemble consists of models that cannot be distinguished from one another using the available information about the system (prior knowledge, experimental data). Consequently, the ensemble directly represents the uncertainty about the system due to incomplete information. Our research in this project has generated several tools for parameter estimation and ensemble modeling, which are available through an easy-to-use MATLAB interface called REDEMPTION.
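A minimal sketch of the ensemble idea, under stated assumptions, is given below. It is not the REDEMPTION implementation. A hypothetical species is produced at rate `k_syn` and degraded at rate `k_deg`, only its steady-state level is measured, and every parameter pair on a hypothetical grid that matches the measurement within a tolerance is kept. A prediction (here the half-life) is then reported as a range over the ensemble rather than as a single value.

```python
import math


def steady_state(k_syn, k_deg):
    """Steady-state level of a species made at rate k_syn and degraded at rate k_deg."""
    return k_syn / k_deg


def build_ensemble(measured, tolerance, grid):
    """Keep every (k_syn, k_deg) pair whose steady state matches the measurement."""
    return [
        (k_syn, k_deg)
        for k_syn in grid
        for k_deg in grid
        if abs(steady_state(k_syn, k_deg) - measured) <= tolerance
    ]


def half_life_range(ensemble):
    """Spread of a prediction (half-life = ln 2 / k_deg) across the ensemble."""
    half_lives = [math.log(2) / k_deg for _, k_deg in ensemble]
    return min(half_lives), max(half_lives)


grid = [0.5 * i for i in range(1, 21)]  # hypothetical parameter grid: 0.5 to 10.0
ensemble = build_ensemble(measured=2.0, tolerance=0.05, grid=grid)
low, high = half_life_range(ensemble)
```

The width of the interval from `low` to `high` shows how much the available measurement leaves the prediction undetermined, which is the information a single best-fit model would hide.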
Key contact person: Rudiyanto Gunawan
Figure 1. Iterative Model Identification. The model building process involves the following key steps: data gathering, model formulation or refinement, parameter estimation, and model (in)validation.
Network-based Data Analysis
Elucidating the mode of action of chemical compounds is of great interest in drug discovery and toxicology. In this regard, advances in high-throughput omics technology have been playing a crucial role in providing the data for identifying the cellular entities that interact with, and whose functions are perturbed by, drugs and other chemical compounds. Cell-wide responses to genetic perturbations and chemical compounds, such as whole-genome gene expression profiles, can now be measured easily and cheaply. Furthermore, large amounts of omics data are available from the ever-growing public biological databases. Because such data are typically high-dimensional, the use of computational methods has become necessary in their analysis. We have been working on network-based analysis of gene expression for identifying the direct gene targets of compounds (see DeltaNet and ProTINA). In our network-based data analysis, we combine mechanistic modeling of biological networks (in this case, transcriptional regulatory networks) and machine learning algorithms to extract information on the molecular targets of compounds. More specifically, we perform model-based inference of network perturbations caused by compound treatments.
In another recent project, we have been developing network-based analysis of single-cell transcriptional profiles with a particular focus on stem cell differentiation. Single-cell transcriptional profiling has received much recent attention from stem cell researchers, and has reshaped our understanding of the cell differentiation process. The analysis of single-cell gene expression data, however, poses a difficult challenge because of the distributive nature of such data. While technical drop-outs contribute to this difficulty, the stochastic, bursty dynamics of gene transcription complicate the data analysis further. In particular, bursty gene transcription produces a highly non-standard distribution of mRNA counts. Our algorithm CALISTA tackles this issue by using a likelihood approach and a stochastic model that describes the bursty dynamics of mRNA transcription. CALISTA is able to accurately identify stem cell population structure (clusters) and infer lineage progression from single-cell transcriptomics data (e.g. single-cell RT-qPCR or RNA-sequencing).
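To give a feel for a likelihood built on bursty transcription, the sketch below uses a negative binomial distribution, a commonly used approximation for mRNA counts when transcription occurs in short bursts. This is a simplified stand-in and not the stochastic model implemented in CALISTA. The parameter names `burst_freq` and `burst_size`, the Poisson comparison, and the counts are hypothetical.

```python
import math


def bursty_log_pmf(n, burst_freq, burst_size):
    """Log-probability of n mRNA molecules under a negative binomial model.

    burst_freq: bursts per mRNA lifetime (hypothetical parameter name)
    burst_size: mean number of mRNA made per burst (hypothetical parameter name)
    """
    r, b = burst_freq, burst_size
    return (math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
            + n * math.log(b / (1.0 + b)) - r * math.log(1.0 + b))


def poisson_log_pmf(n, mean):
    """Log-probability of n molecules under constant (non-bursty) transcription."""
    return n * math.log(mean) - mean - math.lgamma(n + 1)


def log_likelihood(counts, log_pmf, *params):
    """Sum of log-probabilities over independently measured cells."""
    return sum(log_pmf(n, *params) for n in counts)


# Hypothetical single-cell counts for one gene: many zeros plus a few high values
counts = [0, 0, 1, 0, 12, 0, 3, 0, 0, 9]
mean_count = sum(counts) / len(counts)  # 2.5

ll_bursty = log_likelihood(counts, bursty_log_pmf, 0.5, 5.0)  # mean = 0.5 * 5.0
ll_poisson = log_likelihood(counts, poisson_log_pmf, mean_count)
```

Both models have the same mean, but the bursty model assigns a much higher likelihood to data that mix many zeros with occasional large counts, which is the kind of non-standard distribution that a likelihood-based analysis can exploit.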
Key contact persons: Heeju Noh and Nan Papili Gao