This page includes details of projects which are complete, not currently in progress, or not being actively updated. That said, many of these projects contain interesting ideas which may be revived in the future!
Click on the blue text below to see the list of projects in each section.
For some years, Jotun ran a summer school in computational biology – click here for a list linking to some of the projects from those summer schools.
Our grant applications are a good indication of our group’s research interests – click here for a list of a few grant applications we made in years gone by.
Funded Grants
- Comparative Genomics and Next Generation Sequencing – Software Development for Annotation of very large number of genomes (EU COGANGS)
- A Novel Comparative Method for Locating Human Conserved DNA (BBSRC)
- Practical Statistical Alignment (BBSRC)
- Evolutionary Analysis of Non-Coding RNA Genes and Gene Families (BBSRC)
- Association Mapping of Breast and Prostate Cancer Related Genes (EU)
- From Population Genomes to Global Pedigrees Mike Steel (EPSRC)
- Molecular Dynamics, Movements and Evolution Thomas Darden & Mark Sansom
Other Grant Applications
- Population Pedigree Inference from Genomic Data Agnar Helgason & Steffen Lauritzen
- Engineering Systems Biology of the Cell Cycle Béla Novák, Chris Holmes, and Stephen Roberts
- Comparative Virus Annotation
- Statistical Models of Protein Structure Evolution Dave Stuart and Willie Taylor
- Beyond Phylogenies: Evolutionary Analysis of Pathogens
- Combinatorics, Complexity and Probablistics of the Ancestral Recombination Graph
- Integrative Analysis of the Genetic Factors behind Asthma and Atopic Dermatitis William Cookson and Chris Holmes
- Alternative Splicing: Functionality, Evolution and Selection
We have implemented a number of our methods as software packages, most of which are not actively maintained.
- ParIS
- This is a genome rearrangement server. It takes the gene orders and reading directions of two genomes and samples from the posterior distribution of trajectories transforming the first genome into the second, using a Partial Importance Sampler technique, which belongs to the large class of Markov chain Monte Carlo methods.
- Section 26
- This is a software suite for parsimonious recombination analysis of single nucleotide polymorphism data under the infinite sites assumption. At the core is a branch & bound method for finding the exact minimum number of recombinations needed for a data set with an accompanying evolutionary history.
- Pfold
- This is an RNA fold server. It takes an alignment of RNA sequences as input and predicts a common structure for all sequences. The folding method uses a combination of rate matrices and a stochastic context free grammar.
- IStVaN
- Istvan is a collection of modules for two tasks: inferring an invariant set (a set of genes with little or no difference in actual expression levels) from pairs of microarray-measured expression levels, and inferring normalising functions from a set of pairwise data, possibly an invariant set determined by one of the implemented methods.
- CStoRM
- This is a program for parsing a general hidden Markov model with a stochastic context free grammar. It is based on an extension of the CKY algorithm for parsing a sequence, and essentially finds the most probable combination of a path in the HMM and a parse tree in the SCFG under the assumption of independence.
- Rahnuma
- Rahnuma is a tool for predicting and analysing metabolic pathways and for comparing metabolic networks, which it represents as hypergraphs. Rahnuma computes all possible pathways between two or more metabolites using a constrained depth-first traversal of a hypergraph. It also allows pathway-based metabolic network comparisons at the organism as well as the phylogenetic level.
- StatAlign
- StatAlign is an extendable software package for Bayesian analysis of Protein, DNA and RNA sequences. Multiple alignments, phylogenetic trees and evolutionary parameters are co-estimated in a Markov Chain Monte Carlo framework, allowing for reliable measurement of the accuracy of the results. The models behind the analysis permit the comparison of evolutionarily distant sequences: the TKF92 insertion-deletion model can be coupled to an arbitrary substitution model.
- Starfold
- Starfold is a Java program for predicting RNA secondary structures including a class of pseudoknots. It requires a parameter file to be present. At the moment, running it directly from the JAR archive is only possible under 32-bit Windows.
- Frnakenstein
- Frnakenstein is a Python program for solving the inverse RNA folding problem, i.e. given a target structure it attempts to design a sequence folding into this structure.
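To make the Rahnuma entry above more concrete, here is a minimal sketch of a constrained depth-first traversal over a hypergraph. It is not Rahnuma's actual implementation: the function name `find_pathways`, the `max_steps` constraint, the toy network and the simplification that a reaction can be followed from any single substrate to any single product are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A hyperedge (reaction) links a set of substrates to a set of products.
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def find_pathways(reactions: Dict[str, Reaction], source: str, target: str,
                  max_steps: int) -> List[List[str]]:
    """Enumerate reaction sequences leading from source to target.

    Illustrative simplification: a reaction may be followed from any one of
    its substrates to any one of its products; the availability of its other
    substrates is not checked.
    """
    pathways: List[List[str]] = []

    def dfs(metabolite: str, visited: Set[str], path: List[str]) -> None:
        if metabolite == target:
            pathways.append(list(path))
            return
        if len(path) == max_steps:  # depth constraint reached
            return
        for name in sorted(reactions):
            substrates, products = reactions[name]
            if metabolite not in substrates:
                continue
            for product in sorted(products):
                if product in visited:  # never revisit a metabolite
                    continue
                visited.add(product)
                path.append(name)
                dfs(product, visited, path)
                path.pop()
                visited.remove(product)

    dfs(source, {source}, [])
    return pathways


# Hypothetical toy network: four reactions over metabolites A to E.
toy_network: Dict[str, Reaction] = {
    "R1": (frozenset({"A"}), frozenset({"B", "C"})),
    "R2": (frozenset({"B"}), frozenset({"D"})),
    "R3": (frozenset({"C"}), frozenset({"D"})),
    "R4": (frozenset({"A", "E"}), frozenset({"D"})),
}

print(find_pathways(toy_network, "A", "D", max_steps=2))
```

Tightening `max_steps` prunes longer routes, which is one simple way of constraining the search; a real tool would presumably apply richer, biologically motivated constraints.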
Click for a list of old project proposals. These are typically proposals that have already been the basis of one or more student projects. The questions involved do evidently interest us, so if you believe you have a new angle on the proposal please feel free to contact us to discuss the possibility of doing it as a project.
Modelling Evolution, Genome annotation and Comparative Biology
A Stochastic Model of Gene Duplication and Loss (March 15)
Statistical Model for the Evolution of Directions (March 15)
The Origin of Life and Toy Chemistries (Jotun Hein) (Dec 14)
Understanding the evolution of assembly specificity in protein oligomers (March 15)
Molecular Evolution of Primate Retroelements (Aris Katzourakis and Jotun Hein) (Dec 14)
Investigating the Miklos-Lunter-Holmes (2004) Model of Insertion-deletion (Jotun Hein) (Dec 14)
Markov Random Fields on Biological Networks (Jotun Hein and George Deligiannidis) (Dec 14)
Metrics on RNA Secondary Structure Ensembles (Jotun Hein) (Dec 14)
Algorithms for finding Autocatalytic Systems Summary (June 10)
Virus evolution simulation Summary (June 10)
Measurement of Selection on RNA molecules Summary (June 10)
Modelling the dynamics and evolution of nitrogen assimilation in plant pathogenic Pseudomonas Collaboration with Gail Preston Summary (June 10)
Modelling evolution of protein secondary structure topologies Summary (June 10)
Evolutionary Pattern Formation Summary (June 10)
Evolving Dynamical Systems – case study: Cell Cycle Collaboration with Bela Novak Summary (Feb 10)
Evolutionary Models for Complex Signals Summary (Feb 10)
Evolutionary Models for Combined Regulation-Metabolism Graphs Collaboration with Gail Preston Summary (Dec 09)
RNA, Stochastic Context Free Grammars and Classifiers Summary (June 09)
Comparative Annotation of Metabolic Pathways Summary (April 09)
Analysing Multiple Functionalities in Proteins Summary (March 09)
Local Pairwise Statistical Alignment Summary (Oct 08)
Fine Scale Regulatory Annotation of a Gene Summary (May 08)
A Unified Approach to Signal Detection Summary (May 08)
MCMC Integration over Evolutionary Histories of Metabolic Networks Collaboration with Tom Snijders Summary (May 08)
Evolving Language Grammars: “Evolving English” Collaboration with Stephen Clark Summary (Jan 07)
Evolving Biological Grammars Summary (June 07)
Evolutionary Docking of Proteins Summary (May 07)
Stochastic Models Combining Alignment and Annotation Summary (Oct 07)
Parallelising pairwise-statistical alignment Summary (Feb 06)
Fitting Genome Models To Known Virus Structures Summary (June 07)
Evolutionary Analysis of Molecular Movements Collaboration with Thomas Darden & Mark Sansom Summary (Jan 07)
Choice of parameter set for use with a mathematical model of mechanical force generation in the heart Summary (Feb 06)
Extending the Domain of Comparative Genomics Collaboration with Kay Davies Summary
Population Genetics, Mapping and Genealogical Structures
The following project proposals are all motivated by the wide use of population variation data. The major genealogical structures are phylogenies, pedigrees and ancestral recombination graphs (ARGs). A central use of variation data is mapping – making statements about the positions in the genome that are causal for individual phenotypes, such as disease. New sequencing techniques create new opportunities for research, such as pedigree and somatic tree inference, but also change the nature of more traditional problems, such as phylogeny, alignment and recombination analysis, due to the large quantity of data.
Genetic and Genealogical Ancestors (April 2011)
Mapping and Arabidopsis Collaboration with Richard Mott Summary (July 10)
Networks and Association mapping Summary Collaboration with Andrey Rzhetsky (Sep 08)
A Gibbs Sampler of the ancestral recombination graph Summary (July 08)
From exact marginals to good importance sampling Summary (June 08)
Statistical Alignment via k-Restricted Steiner Trees Summary (May 08)
“Corner Cutting” approaches to the Ethier-Griffiths-Tavare Recursions Summary (March 08)
Workbench for Ancestral Recombination Graph Summation Summary (March 08)
Population Pedigree Inference from Genomic Data Collaboration with Steffen Lauritzen Summary (Jan 08)
Counting Ancestral Recombination Graph (ARG) Topologies Summary (June 07)
Counting Pedigrees up to Isomorphism Summary (June 07)
Somatic Cell Genealogies and Differentiation Collaboration with Kevin Talbot Summary (May 07)
User Interface for Recombination Analysis Summary (Dec 05)
Systems Biology
These projects are all motivated by the present rise of systems biology. Systems biology poses many questions: how to model on a large scale, how feasible it is to infer biological systems, and how the concepts of the field should be used. Networks are central to many systems biology models and the role of evolution also needs to be explored.
Difficult Concepts in Systems Biology III: Function and Purpose Summary (Jan 09)
Difficult Concepts in Systems Biology II: Levels and Reduction Summary (Oct 08)
Identifiability of a Simple Biological System Summary (Jan 08)
Difficult Concepts in Systems Biology: Emergence Collaboration with Carsten Wiuf Summary (Jan 08)
Parameter and Sensitivity Analysis for Large System of ODEs Collaboration with Dagmar Iber Summary (Nov 07)
Algorithmic, Probabilistic and Modelling Challenges
Computational Biology leads to a series of technical problems that could be undertaken by someone with a more pure interest in combinatorics, statistics, mathematics, algorithms, modeling or software development. Some of these might have biological terms in them, but the biological component is minimal (or could be minimized).
Error Correcting Codes, Lumpability and Sequence Evolution (Jan 12)
Incorporating RNA secondary structure prediction into StatAlign (Jan 12)
Kinetic and Co-Transcriptional Folding of RNA (Jan 12)
Dealing with Large, Sparse Continuous Time Markov Chains (Dec 11)
RNA Grammar Search Summary (July 10)
Combining Stochastic Grammars Summary (June 10)
Efficient sampling of ancestral states in the infinite site model Summary (July 09)
Multiple Alignment Using Guide Networks Summary (Nov 08)
A Constraint Optimization Problem in Phylogenetics Collaboration with Raphael Hauser Summary (Dec 08)
Combinatorics of Biological Networks Collaboration with Alex Scott Summary (Oct 08)
Automatic Code Generation for Probabilistic Inference in Computational Biology Collaboration with Oege de Moor Summary (Oct 08)
Gaussian Processes and Gene Regulation Summary (June 08)
Combinatorics Problems in Genome Rearrangement Summary (March 08)
Temporal Multiple Statistical Alignment Collaboration with Gerton Lunter Summary (June 07)
Artifacts from Combining Hidden Markov Models Summary (March 07)
Path Sampling in Continuous Time Markov Chains
Stochastic Turing Patterns Summary (April 06)
Pseudoknots in RNA secondary structure Summary
How many transcripts does it take to reconstruct the Splice Graph? Summary (05)
Parallelisation of Recombination Analysis Summary (Feb 06)
Combining RNA energy minimisation with microscopy information Summary (04)
Collaborative Data Analysis
Testing the Biogeographical Hypotheses Collaboration with Finn Borchsenius & Anders Barfod
Molecular Evolution of Selected Families of Human Endogenous Retroviruses Collaboration with Palle Villesen & Hugo Martins
Phylogenomic Analysis of Algae Collaboration with Tom Cavalier-Smith Summary
Recombination analysis in Arabidopsis thaliana (Dec 11)
Stochastic Models of Leaf Shape Evolution Collaboration with Nick Jones and Miltos Tsiantis Summary (Feb 11)
Footprinting with additional knowledge Collaboration with Richard Mott Summary (Feb 10)
Fine Scale Regulatory Annotation of Cancer Genes Collaboration with Thorunn Rafnar Summary (Aug 09)
Analysis of single-molecule FRET trajectories of transcription complexes based on Hidden-Markov Modelling Collaboration with Achilles Kapanides Summary (Jan 09)
Reconstruction of an Ancestral Protein: Sequence, Function, Motion and Structure Collaboration with Lee Pedersen and Mark Sansom Summary (Dec 08)
Cataloguing sequences homologous to the Rhodobacter flagellar motor Collaboration with Judith Armitage Summary (Feb 08)
Computational Promoter Analysis of non-Coding RNAs Collaboration with Kay Davies Summary (July 07)
Computational Promoter Analysis of Metazoan α-Globins Collaboration with Doug Higgs Summary (June 07)
Annotate 12 Drosophila genomes for regulatory signals Collaboration with Lior Pachter & Vasile Palade Summary (June 07)
Phylogenetic Analysis of “New” Homeobox in the Lineage Leading to Humans Collaboration with Peter Holland Summary (Feb 07)
Structural Analysis of Aptamers Collaboration with William James Summary (Dec 05)
RNA Structure and Evolution Modelling
Evaluation of SCFGs Summary (May 11)
Evolution Grammar Search Summary (April 11)
Practical Implications of Grammar Ambiguity on RNA Secondary Structure Prediction Summary (April 11)
Boltzmann Weighted Combinatorics of RNA Secondary Structures Summary
Computational Origin of Life Models
Mass Action Equations for Autocatalytic Systems (April 11)
Autocatalytic Sets of RNAs Collaboration with Wim Hordijk & Mike Steel Summary (April 11)
Proposal for the Development of a Software Package for Simulating and Studying Catalytic Reaction Systems and Autocatalytic Sets Summary
High School Projects
RNA Secondary Structure (Algorithms) Summary (July 10)
Pairwise Alignment (Algorithms) Summary (June 08)
Sequence Evolution (Probability Theory) Summary (June 08)
Signals in Single Genomes (Computer Science) Summary (July 09)
Counting in Phylogenetics (Combinatorics) Summary (July 07)
Reading projects
Each of these projects describes a topic and gives some references so that a group of students can give a 40-90 minute presentation after 8-10 hours’ work. The projects have been used for courses given in Portugal, South Africa, Denmark, Oxford and Iceland.
Identifiability of Biological Systems (Portugal, February 2008) pdf
Models of Grammar Evolution (Portugal, February 2008) pdf
Evolution of Metabolic Networks (Portugal, February 2008) pdf
Integrative Genomics (Portugal, February 2009) pdf
Comparative Biology – Networks (Portugal, February 2009) pdf
Population Genomics (Portugal, February 2009) pdf ppt
Comparative Genomics-Signals (Portugal, February 2009) pdf ppt
Somatic Cell Genealogies (South Africa, March 2009) pdf
Metabolomics (South Africa, March 2009) pdf ppt
Genomic Dark Matter (South Africa, March 2009) pdf
Last Universal Common Ancestor (South Africa, March 2009) pdf
Selective Sweeps (South Africa, March 2009)
RNA Gene Finding (Iceland, June 2009) pdf
Influenza (Iceland, June 2009) pdf
Proteomics (Iceland, June 2009) pdf
Models for Origin of Life (Oxford, December 2009)
Multifunctional Proteins (Oxford, December 2009)
Comparative Biology – Protein Structures (Oxford, December 2009)
Comparative Protein Interaction Network (PIN) Annotation (Oxford 2010)
Stochastic Models of Networks (Oxford 2010)
Gene Regulatory Networks (GRN) Inference from Expression Data (Oxford 2010)
Epigenomics
Inferring Pedigrees
Kinetics from Molecular Dynamics
Algorithms for Predicting DNA Assembling Into a Given Shape
Advanced Models of substitutions
Mathematical Modeling of Marriage Dynamics
Computational Modeling of the Heart
Agent based Population Modeling (Oxford 2010)
Alternative Splicing
Computational Models of Origin of Life
What is Integrative Genomics (IG)?
Integration over Paths in Continuous Time Markov Chains
Evolutionary Protein Structure Comparison
Computational Approaches to Molecules, Reactions and Catalysts
Completed projects with (some) reports
Click here for past projects carried out by students in our group. Where we have been given permission, reports on the projects are made available. The projects have been categorised according to the project programme the student was working under.
Challenges in Bioinformatics
For earlier projects from this course, see also Project Sketches at Bioinformatics Research Centre, Aarhus, Denmark
MSc in Applied Statistics
Student | Approximation Sequence Evolution by Stepping Stones by using Codes and Lumpability | 2013 |
Farah Colchester | Stochastic Modes of Pedigree Generation | 2011 |
Man Tang | Assembly of Multiple Genomes from Fragments | 2011 |
Shou Zhang | Using Probabilistic Models to Infer Infection Rates in Viral Outbreaks | Feb 2007 |
ComLab MSc
Project Title | Student | Date |
Predicting RNA Secondary Structures Including Pseudoknots | Andrey Kravchenko | Sept 2009 |
Reconstruction of Ancestral Metabolisms | Jose Angel Riarola | Sept 2009 |
Evolving Language Grammar | Markus Gerstel | Sept 2008 |
Large-scale comparative annotation of bacterial genomes | Waqar Ali | Aug 2007 |
Annotation of 12 drosophila genomes using multiple neural networks and evolutionary history | Ulf Schafer | Sept 2006 |
User interface for recombination analysis | Charles Lin | Sept 2006 |
Algorithms for haplotype inference from genotypes in the presence of recombination | Syedur Rahman | Sept 2006 |
Graphical programming interface for an XML-based hidden Markov model compiler | Ahmad Chugtai | Sept 2005 |
Automatic code generation for recursions involving stochastic context-free grammars | Jonathan Churchill | Sept 2005 |
4th Year Mathematics
Project Title | Student | Year |
Likelihood Maximization on Phylogenic Trees | James Wood | Sep 2010 |
Evolving Turing Patterns | Ashley Brooks | Feb 2006 |
Summer Students
High School Summer Students
Conrad Godfrey | Aug 2010 | RNA secondary structure prediction: the co-transcriptional effect on RNA folding |
Fiona Rust | Aug 2010 | Investigation of the number of possible secondary RNA structures with reference to theoretical expressions |
Abigail Linton | Oct 2009 | Pattern searching in a single genome |
Michelle Parker | Oct 2009 | Pattern search in a single genome |
Ken Mawhinney, Artemisa Labi | Oct 2008, Aug 2007 | Basic models of nucleotide evolution |
DPhil Reports
Joe Herman | Stochastic Models of Structure Evolution | 2013 |
James Anderson | Gaussian Processes and Gene Regulation | 2013 |
Luke Cartey | Advance Software for Generating HMM Code | 2011 |
Marton Munz | Comparative Analysis of Molecular Motion | 2011 |
Joanna Davies | Genetic Heterogeneity and Mapping | 2010 |
Rahul Satija | Statistical Alignment and Footprinting | 2009 |
Aziz Mithani | Evolutionary Modelling and Analysis of Metabolic Networks | 2009 |
Ben Holtom | A Paralogy Based Strategy for Identifying Regulatory Elements in Mammalian Genomes | 2008 |
Naila Mimouni | An Investigation of Section Constraints on Non-Coding RNAs and Reliability of Alignments | 2008 |
Saskia de Groot | Genome Annotation and Selectional Analysis of Viral Evolution | 2007 |
Lizhong Hao | Analysis of Global Gene Expression during the Epithelium Differentiation | 2007 |
Stephen McCauley | The Annotation and Evolutionary analysis of Overlapping CDS in ssRNA Viral Genomes | 2006 |
DTC Projects