Systems Biology in Saccharomyces cerevisiae. Statistical data analysis and mathematical methods
Over the last decade, Systems Biology has emerged as a new paradigm for studying cellular systems from a global perspective, in contrast to traditional cell biology, which focused on specific components of the cell. A key milestone towards this global view of biology is the generation and interpretation of datasets derived from omics technologies, such as transcriptomics and proteomics. Exploring a single dataset from one of these high-throughput methods may lead to biological interpretations of the observed phenomena; however, a single dataset might not completely unravel the complexity of biological systems. Therefore, the integration of data from different omics technologies can lead to new discoveries. Following this goal of data integration, published transcriptome and proteome datasets from the yeast Saccharomyces cerevisiae were collected and analyzed to shed light on the mechanisms that prevent an accurate correlation between transcript and protein levels, and to identify the cellular functions that affect this lack of correlation. Specifically, the aim was to identify the genes that show patterns of correlation, or lack of correlation, between their transcript and protein levels and, most importantly, the biological variables that might affect these correlations. During the exploration of the data, it was found that there is no unique statistical or computational method to integrate these kinds of data; we therefore explored the data using Data Mining and Machine Learning methods as well as statistical and mechanistic mathematical frameworks.
Nevertheless, exploring the data with sophisticated classification methods, such as Multilayer Perceptron or Sequential Minimal Optimization Logistic models, did not lead to the identification of patterns in biological properties that can predict the correspondence between transcriptome and proteome. In contrast, a less sophisticated method driven by knowledge of the biological mechanisms of translation led to the conclusion that some genes belonging to amino acid pathways, such as LEU4 and ARG8, are not translationally controlled. Using the same datasets and a statistical approach embedded in a mathematical framework, it was possible to draw the hypothesis that fifty percent of the variability in the transcript-protein correlation is due to the mechanism of translational initiation, which in turn is influenced by the tRNA concentration and the competition between cognate and near-cognate tRNAs. In conclusion, data integration as a strategic tool in Systems Biology has not yet reached its final goal, and the case studies presented in this research send a clear message that there is room for improvement.
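
To make the notion of a per-gene transcript-protein correlation concrete, the following Python sketch computes a rank correlation between transcript and protein levels for every gene measured in both datasets. It is only an illustration under stated assumptions: the choice of Spearman's coefficient, the function names, the gene labels and the numerical values are all hypothetical and are not taken from the datasets or the methods of this work.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; NaN if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def per_gene_correlation(transcripts, proteins):
    """Correlate the two levels for each gene present in both tables.

    Both arguments map a gene label to a list of measurements taken
    under the same conditions, in the same order (an assumption).
    """
    shared = sorted(set(transcripts) & set(proteins))
    return {gene: spearman(transcripts[gene], proteins[gene]) for gene in shared}


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    transcripts = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [1.0, 2.0, 3.0, 4.0]}
    proteins = {"GENE_A": [10.0, 20.0, 25.0, 40.0], "GENE_B": [8.0, 6.0, 4.0, 2.0]}
    for gene, rho in per_gene_correlation(transcripts, proteins).items():
        print(gene, rho)
```

Genes with a coefficient close to 1 would be candidates for concordant transcript and protein levels, whereas values near zero or below would point to genes whose protein abundance is not explained by transcript abundance alone. A rank-based coefficient is used here simply because it does not assume a linear relationship between the two measurements.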