As of 2017 I’m a third-year DPhil student at Brasenose College, supervised by Jotun Hein in the Department of Statistics.
My research is primarily focused on methods development for evolutionary analysis of biological structures, such as protein structures and RNA.
Here is a link to my external page.
Why my research is fun
Biological molecules often share similar sequence or structural features due to their shared ancestry. These similarities may be referred to as evolutionary dependencies.
Failing to account for evolutionary dependencies can lead to false conclusions. For example, a sequence motif present in multiple proteins might seem biologically significant if it is assumed to have arisen independently in each protein, whereas in reality the presence of such a motif could be largely due to the evolutionary relatedness of the sequences being analysed.
On the other hand, accounting for such evolutionary dependencies can be extremely useful. Statistical and computational methodologies can be used to transfer related information from one homologous protein to another protein where such information is missing. For example, if we have knowledge of one protein sequence and structure, and knowledge of a second related protein sequence but not its structure, we can utilise the underlying evolutionary dependencies to predict a plausible set of structures for the second protein using the information we have observed in the first (see figure below).
The above figure is an example of evolutionary dependencies being used to predict missing dihedral angles in one protein from the dihedral angles observed in a second, homologous protein. Cartoon representations of the E. coli glyceraldehyde-3-phosphate dehydrogenase structure (PDB 1gad) are depicted in each panel, overlaid with the predictive accuracy obtained when using different combinations of observed data to predict the missing dihedral angles in 1gad. Thermus aquaticus glyceraldehyde-3-phosphate dehydrogenase (PDB 1cer) was used as the homolog for the purposes of prediction. Predictive accuracy is indicated using a colour gradient depicting the mean angular distance between the true dihedral angle (X1gad) and the predicted (sampled) dihedral angles (X1gad) at each amino acid position. The label at the bottom of each panel indicates the data combination used. In A, no data were used for prediction. In B, only the amino acid sequence of 1gad (A1gad) was used. In C, the amino acid sequence of 1gad (A1gad) and the amino acid sequence of the homologous protein (A1cer) were used. In D, both amino acid sequences (A1cer and A1gad) and the secondary structure of the homologous protein (S1cer) were used. In E, both amino acid sequences (A1cer and A1gad) and the dihedral angles of the homologous protein (X1cer) were used. Finally, in F, the same combination of observations was used as in E, but the alignment was treated as known a priori.
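For readers who want a feel for the accuracy measure mentioned in the caption, here is a minimal Python sketch of a per-position mean angular distance. It rests on assumptions that are not stated above: angles are single values in radians, the angular distance is the shortest arc between two angles (so it lies between 0 and pi), and the function names are purely illustrative. The measure actually used for the figure may be defined differently, for example jointly over pairs of dihedral angles.

```python
import math


def angular_distance(a, b):
    """Shortest arc between two angles in radians (assumed definition); result is in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def mean_angular_distance(true_angle, sampled_angles):
    """Average angular distance between one true angle and a list of sampled predictions."""
    if not sampled_angles:
        raise ValueError("at least one sampled angle is required")
    total = sum(angular_distance(true_angle, s) for s in sampled_angles)
    return total / len(sampled_angles)


def per_position_accuracy(true_angles, samples_per_position):
    """One mean angular distance per amino acid position.

    samples_per_position[i] holds the sampled angles for position i (hypothetical layout).
    """
    return [
        mean_angular_distance(t, s)
        for t, s in zip(true_angles, samples_per_position)
    ]
```

Under these assumptions, smaller values mean better predictions: a position whose samples all coincide with the true angle scores 0, and angles on either side of the plus or minus pi boundary are treated as close rather than far apart.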
Before you go, here is an example of a protein evolving along a simulated evolutionary trajectory: