We hosted an excellent talk in the Graduate Lecture series in the Department:
Speaker: Dr Junhyong Kim, Department of Computer and Information Science, University of Pennsylvania
Title: Geometric Embeddings of Biological Data
Abstract: Biology is the science of exceptions, where heterogeneity is the norm, data are sparse, and most models involve high degrees of computational complexity. In many cases, taking a geometric point of view helps establish a different framework for analysis. Here, I present three stories where we have utilized geometric views to help both method development and biological modeling. First, I discuss the problem of phylogeny inference, which involves stochastic models over tree graphs. I show that families of probability models can be treated as algebraic sets, yielding insights into the limits of inference and model relationships. Second, I discuss the problem of inferring biological function from RNA molecules that can fold into secondary structures. Standard approaches to this problem involve computationally costly folding algorithms. Here, we address the problem using an “empirical kernel” approach where probabilistic scoring against known models yields a geometric embedding of novel RNA structures without folding. Third, I discuss our empirical and statistical work in single-cell RNA sequencing. Single-cell transcriptomes present unique challenges in both data analysis and biological meaning. I present some problems in statistical characterization of such data and suggest that the role of individual cells and gene expression should be re-thought in terms of functional constraints and geometric sets.