Michael Golden will be presenting “Probabilistic Inference of Nucleotide Coevolution” at the Computational Statistics and Machine Learning seminar today at 15:30 in the Department of Statistics. His slides are available here.
Abstract
Pairs of nucleotide positions within biologically functional nucleic acid secondary structures often exhibit evidence of coevolution that is consistent with base-pairing. PICNIC is a probabilistic sequence evolution model that assesses rates of mutation at base-paired sites in alignments of DNA or RNA sequences. PICNIC is able to fully account for an unknown secondary structure, and in doing so can be used to predict a secondary structure shared amongst an alignment of sequences. PICNIC was used to infer rates of coevolution associated with GC, AU (AT in DNA), and GU (GT in DNA) dinucleotides in non-coding RNA alignments, and in single-stranded RNA and DNA virus alignments. Strong evidence was found for GU dinucleotides being selectively favoured at base-paired sites in non-coding RNA and RNA virus alignments, with marginal evidence for GT dinucleotides being selectively favoured at base-paired sites in DNA virus alignments. The strength of coevolution was also measured at base-paired sites in a SHAPE-MaP-determined HIV-1 NL4-3 RNA secondary structure, using a corresponding alignment containing large numbers of HIV-1 group M sequences. The PICNIC-inferred degrees of coevolution were more strongly correlated with experimentally determined SHAPE-MaP pairing scores than degrees of coevolution measured using three mutual information methods that do not take phylogenetic dependencies into account.
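For readers unfamiliar with the mutual information baselines mentioned above, the following is a minimal sketch of plain mutual information between two alignment columns. The abstract does not say which three mutual information methods were compared, so this is an illustration of the general idea only, not the methods used in the talk. The function name `mutual_information` and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(col_i, col_j):
    """Plain mutual information (in bits) between two alignment columns.

    Each sequence is treated as an independent observation, so the
    phylogenetic relationships between sequences are ignored.
    """
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    i_counts = Counter(col_i)
    j_counts = Counter(col_j)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (i_counts[a] * j_counts[b]))
    return mi

# Hypothetical toy alignment: columns 0 and 3 covary as G-C / A-U pairs,
# while column 1 is fully conserved.
alignment = ["GAAC", "GAAC", "AAAU", "AAAU"]
columns = list(zip(*alignment))

mi_paired = mutual_information(columns[0], columns[3])
mi_conserved = mutual_information(columns[0], columns[1])
```

Because every sequence counts as an independent sample, covariation inherited from a shared ancestor is indistinguishable here from repeated compensatory substitutions. This is the phylogenetic dependency that the abstract says such methods do not take into account.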