Old projects

This page includes details of projects which are complete, not currently in progress, or not being actively updated. That said, many of these projects contain interesting ideas which may be revived in the future!

Click on the blue text below to see the list of projects in each section.

For some years, Jotun ran a summer school in computational biology – click here for a list linking to some of the projects from those summer schools.

Our grant applications are a good indication of our group’s research interests – click here for a list of a few grant applications we made in years gone by.

Funded Grants

Other Grant Applications

We have implemented a number of our methods as software packages, most of which are not actively maintained.
This is a genome rearrangement server. It takes the gene orders and reading directions of two genomes and samples from the posterior distribution of trajectories transforming the first genome into the second, using a Partial Importance Sampler technique, which belongs to the large class of Markov chain Monte Carlo methods.

This is a software suite for parsimonious recombination analysis of single nucleotide polymorphism data under the infinite sites assumption. At the core is a branch & bound method for finding the exact minimum number of recombinations needed for a data set with an accompanying evolutionary history.
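
As a small illustration of what the infinite sites assumption implies, the sketch below applies the classical four-gamete test to binary SNP data: if two sites show all four combinations 00, 01, 10 and 11, no tree without recombination can explain them under infinite sites. This is not the branch & bound method of the software suite; the function name `incompatible_site_pairs` and the example haplotypes are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def incompatible_site_pairs(haplotypes):
    """Return all site pairs (i, j) that fail the four-gamete test.

    `haplotypes` is a list of equal-length strings over {'0', '1'}
    (an assumed input format, one string per sampled sequence).
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site gametes observed in the sample.
        gametes = {(h[i], h[j]) for h in haplotypes}
        # All four gametes present: at least one recombination between
        # sites i and j is needed under the infinite sites assumption.
        if len(gametes) == 4:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    sample = ["0000", "0110", "1010", "1101"]  # hypothetical data
    print(incompatible_site_pairs(sample))
```

An exact minimum, as computed by the branch & bound method, can be larger than any bound derived from such pairwise tests, which only give a quick lower-bound style check.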

This is an RNA fold server. It takes an alignment of RNA sequences as input and predicts a common structure for all sequences. The folding method uses a combination of rate matrices and a stochastic context free grammar.

Istvan is a collection of modules implementing various methods for inferring an invariant set, i.e. a set of genes with no or little difference in actual expression levels, from a set of pairs of microarray measured expression levels, and for inferring normalising functions based on a set of pairwise data, possibly an invariant set determined by one of the implemented methods.

This is a program for parsing a general hidden Markov model with a stochastic context free grammar. It is based on an extension of the CKY algorithm for parsing a sequence, and essentially finds the most probable combination of a path in the HMM and a parse tree in the SCFG under the assumption of independence.
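
For readers unfamiliar with CKY, the following is a minimal sketch of the plain algorithm for a single sequence, not the HMM extension described above. It assumes a grammar in Chomsky normal form and returns the probability of the most probable parse. The rule dictionaries, the function name `cky_best_parse` and the toy grammar are assumptions made for this example.

```python
from collections import defaultdict


def cky_best_parse(sequence, binary_rules, terminal_rules, start="S"):
    """Probability of the most probable parse of `sequence` (0.0 if none).

    binary_rules:   {(A, B, C): p} for rules A -> B C
    terminal_rules: {(A, a): p}    for rules A -> a
    """
    n = len(sequence)
    if n == 0:
        return 0.0
    # best[(i, j)][A] = best probability that A derives sequence[i:j]
    best = defaultdict(dict)
    for i, symbol in enumerate(sequence):
        for (lhs, terminal), p in terminal_rules.items():
            if terminal == symbol:
                cell = best[(i, i + 1)]
                cell[lhs] = max(cell.get(lhs, 0.0), p)
    # Fill longer spans from shorter ones, trying every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = best[(i, j)]
            for k in range(i + 1, j):
                left, right = best[(i, k)], best[(k, j)]
                for (lhs, b, c), p in binary_rules.items():
                    if b in left and c in right:
                        score = p * left[b] * right[c]
                        if score > cell.get(lhs, 0.0):
                            cell[lhs] = score
    return best[(0, n)].get(start, 0.0)


if __name__ == "__main__":
    # Hypothetical toy grammar, for illustration only.
    binary = {("S", "A", "B"): 0.7, ("S", "S", "S"): 0.3}
    terminal = {("A", "a"): 0.6, ("A", "g"): 0.4,
                ("B", "u"): 0.7, ("B", "c"): 0.3}
    print(cky_best_parse("auau", binary, terminal))
```

Keeping only the maximum per cell gives the most probable parse (Viterbi-style); replacing the maximum by a sum would give the total probability of the sequence instead.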

Rahnuma is a tool for prediction and analysis of metabolic pathways and comparison of metabolic networks that represents metabolic networks as hypergraphs. Rahnuma computes all possible pathways between two or more metabolites by using constrained depth first traversal of a hypergraph. It also allows pathway based metabolic network comparisons at organism as well as phylogenetic level.
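
The idea of a constrained depth first traversal of a hypergraph can be sketched as follows. This is a simplified illustration, not Rahnuma's implementation: each reaction is a hyperedge from a set of substrates to a set of products, a step is allowed whenever the current metabolite is one of the substrates (the other substrates are assumed to be available), and the constraints are a maximum number of steps and no revisited metabolites. The function name `find_pathways` and the toy network are hypothetical.

```python
def find_pathways(reactions, source, target, max_steps=4):
    """Enumerate reaction sequences leading from `source` to `target`.

    reactions: {name: (substrates, products)}, each a set of metabolites.
    Returns a list of pathways, each a list of reaction names.
    """
    pathways = []

    def dfs(metabolite, visited, path):
        if metabolite == target:
            pathways.append(list(path))
            return
        if len(path) == max_steps:  # depth constraint
            return
        for name, (substrates, products) in reactions.items():
            if metabolite not in substrates:
                continue
            for product in sorted(products):
                if product in visited:  # avoid cycles
                    continue
                path.append(name)
                dfs(product, visited | {product}, path)
                path.pop()

    dfs(source, {source}, [])
    return pathways


if __name__ == "__main__":
    # Hypothetical toy network, for illustration only.
    toy = {
        "R1": ({"A"}, {"B"}),
        "R2": ({"B"}, {"C", "D"}),
        "R3": ({"A", "E"}, {"C"}),
        "R4": ({"D"}, {"C"}),
    }
    for pathway in find_pathways(toy, "A", "C"):
        print(" -> ".join(pathway))
```

Tightening `max_steps` prunes longer routes; a stricter variant could additionally require that every substrate of a reaction has already been produced before the reaction may fire.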

StatAlign is an extendable software package for Bayesian analysis of Protein, DNA and RNA sequences. Multiple alignments, phylogenetic trees and evolutionary parameters are co-estimated in a Markov Chain Monte Carlo framework, allowing for reliable measurement of the accuracy of the results. The models behind the analysis permit the comparison of evolutionarily distant sequences: the TKF92 insertion-deletion model can be coupled to an arbitrary substitution model.

Starfold is a Java program for predicting RNA secondary structures including a class of pseudoknots. It requires a parameter file to be present. At the moment, running it directly from the JAR archive is only possible under 32-bit Windows.

Frnakenstein is a Python program for solving the inverse RNA folding problem, i.e. given a target structure it attempts to design a sequence folding into this structure.

Click for a list of old project proposals. These are typically proposals that have already been the basis of one or more student projects. The questions involved do evidently interest us, so if you believe you have a new angle on the proposal please feel free to contact us to discuss the possibility of doing it as a project.

Modelling Evolution, Genome annotation and Comparative Biology
A Stochastic Model of Gene Duplication and Loss (March 15)
Statistical Model for the Evolution of Directions (March 15)
The Origin of Life and Toy Chemistries (Jotun Hein) (Dec 14)
Understanding the evolution of assembly specificity in protein oligomers (March 15)
Molecular Evolution of Primate Retroelements (Aris Katzourakis and Jotun Hein) (Dec 14)
Investigating the Miklos-Lunter-Holmes (2004) Model of Insertion-deletion (Jotun Hein) (Dec 14)
Markov Random Fields on Biological Networks (Jotun Hein and George Deligiannidis) (Dec 14)
Metrics on RNA Secondary Structure Ensembles (Jotun Hein) (Dec 14)
Algorithms for finding Autocatalytic Systems Summary (June 10)
Virus evolution simulation Summary (June 10)
Measurement of Selection on RNA molecules Summary (June 10)
Modelling the dynamics and evolution of nitrogen assimilation in plant pathogenic Pseudomonas Collaboration with Gail Preston Summary (June 10)
Modelling evolution of protein secondary structure topologies Summary (June 10)
Evolutionary Pattern Formation Summary (June 10)
Evolving Dynamical Systems – case study: Cell Cycle Collaboration with Bela Novak Summary (Feb 10)
Evolutionary Models for Complex Signals Summary (Feb 10)
Evolutionary Models for Combined Regulation-Metabolism Graphs Collaboration with Gail Preston Summary (Dec 09)
RNA, Stochastic Context Free Grammars and Classifiers Summary (June 09)
Comparative Annotation of Metabolic Pathways Summary (April 09)
Analysing Multiple Functionalities in Proteins Summary (March 09)
Local Pairwise Statistical Alignment  Summary (Oct 08)
Fine Scale Regulatory Annotation of a Gene Summary (May 08)
A Unified Approach to Signal Detection Summary (May 08)
MCMC Integration over Evolutionary Histories of Metabolic Networks Collaboration with Tom Snijders Summary (May 08)
Evolving Language Grammars: “Evolving English” Collaboration with Stephen Clark Summary (Jan 07)
Evolving Biological Grammars Summary (June 07)
Evolutionary Docking of Proteins Summary (May 07)
Stochastic Models Combining Alignment and Annotation Summary (Oct 07)
Parallelising pairwise-statistical alignment Summary (Feb 06)
Fitting Genome Models To Known Virus Structures Summary (June 07)
Evolutionary Analysis of Molecular Movements Collaboration with Thomas Darden Mark Sansom Summary (Jan 07)
Choice of parameter set for use with a mathematical model of mechanical force generation in the heart Summary (Feb 06)
Extending the Domain of Comparative Genomics Collaboration with Kay Davies Summary

Population Genetics, Mapping and Genealogical Structures

The following project proposals are all motivated by the wide use of population variation data.  The major genealogical structures are phylogenies, pedigrees and ancestral recombination graphs (ARGs).  A central use of variation data is mapping: making statements about the positions in the genome that are causal for individual phenotypes, such as disease.  New sequencing techniques create new opportunities for research, such as pedigree and somatic tree inference, but also change the nature of more traditional problems, such as phylogeny, alignment and recombination analysis, due to the large quantity of data.

Genetic and Genealogical Ancestors (April 2011)
Mapping and Arabidopsis Collaboration with Richard Mott Summary (July 10)
Networks and Association mapping Summary Collaboration with Andrey Rzhetsky (Sep 08)
A Gibbs Sampler of the ancestral recombination graph Summary (July 08)
From exact marginals to good importance sampling  Summary (June 08)
Statistical Alignment via k-Restricted Steiner Trees Summary (May 08)
“Corner Cutting” approaches to the Ethier-Griffiths-Tavare Recursions Summary (March 08)
Workbench for Ancestral Recombination Graph Summation Summary (March 08)
Population Pedigree Inference from Genomic Data Collaboration with Steffen Lauritzen Summary (Jan 08)
Counting Ancestral Recombination Graph (ARG) Topologies Summary (June 07) 
Counting Pedigrees up to Isomorphism Summary (June 07)
Somatic Cell Genealogies and Differentiation Collaboration with Kevin Talbot Summary (May 07)
User Interface for Recombination Analysis Summary (Dec 05)

Systems Biology

These projects are all motivated by the present rise of systems biology.  Systems biology poses many questions: how to model on a large scale, how feasible it is to infer biological systems, and how the concepts of the field should be used.  Networks are central to many systems biology models and the role of evolution also needs to be explored.
Difficult Concepts in Systems Biology III: Function and Purpose Summary (Jan 09)
Difficult Concepts in Systems Biology II: Levels and Reduction Summary (Oct 08)
Identifiability of a Simple Biological System Summary (Jan 08)
Difficult Concepts in Systems Biology: Emergence Collaboration with Carsten Wiuf Summary (Jan 08)
Parameter and Sensitivity Analysis for Large System of ODEs Collaboration with Dagmar Iber Summary (Nov 07)

Algorithmic, Probabilistic and Modelling Challenges

Computational Biology leads to a series of technical problems that could be undertaken by someone with a more pure interest in combinatorics, statistics, mathematics, algorithms, modeling or software development.  Some of these might have biological terms in them, but the biological component is minimal (or could be minimized).
Error Correcting Codes, Lumpability and Sequence Evolution (Jan 12)
Incorporating RNA secondary structure prediction into StatAlign (Jan 12)
Kinetic and Co-Transcriptional Folding of RNA (Jan 12)
Dealing with Large, Sparse Continuous Time Markov Chains (Dec 11)
RNA Grammar Search Summary (July 10)
Combining Stochastic Grammars Summary (June 10)
Efficient sampling of ancestral states in the infinite site model Summary (July 09)
Multiple Alignment Using Guide Networks Summary (Nov 08)
A Constraint Optimization Problem in Phylogenetics Collaboration with Raphael Hauser Summary (Dec 08)
Combinatorics of Biological Networks Collaboration with Alex Scott Summary (Oct 08)
Automatic Code Generation for Probabilistic Inference in Computational Biology Collaboration with Oege de Moor Summary (Oct 08)
Gaussian Processes and Gene Regulation Summary (June 08)
Combinatorics Problems in Genome Rearrangement Summary (March 08)
Temporal Multiple Statistical Alignment Collaboration with Gerton Lunter Summary (June 07)
Artifacts from Combining Hidden Markov Models Summary (March 07)
Path Sampling in Continuous Time Markov Chains
Stochastic Turing Patterns Summary (April 06)
Pseudoknots in RNA secondary structure Summary
How many transcripts does it take to reconstruct the Splice Graph? Summary (05)
Parallelisation of Recombination Analysis Summary (Feb 06)
Combining RNA energy minimisation with microscopy information Summary (04)

Collaborative Data Analysis
Testing the Biogeographical Hypotheses Collaboration with Finn Borchsenius & Anders Barfod
Molecular Evolution of Selected Families of Human Endogenous Retroviruses Collaboration with Palle Villesen & Hugo Martins
Phylogenomic Analysis of Algae  Collaboration with Tom Cavalier-Smith Summary
Recombination analysis in Arabidopsis Thaliana (Dec 11)
Stochastic Models of Leaf Shape Evolution Collaboration with Nick Jones and Miltos Tsiantis Summary (Feb 11)
Footprinting with additional knowledge Collaboration with Richard Mott Summary (Feb 10)
Fine Scale Regulatory Annotation of Cancer Genes Collaboration with Thorunn Rafnar Summary (Aug 09)
Analysis of single-molecule FRET trajectories of transcription complexes based on Hidden-Markov Modelling Collaboration with Achilles Kapanides Summary (Jan 09)
Reconstruction of an Ancestral Protein: Sequence, Function, Motion and Structure  Collaboration with Lee Pedersen and Mark Sansom Summary (Dec 08)
Cataloguing sequences homologous to the Rhodobacter flagellar motor Collaboration with Judith Armitage Summary (Feb 08)
Computational Promoter Analysis of non-Coding RNAs Collaboration with Kay Davies Summary (July 07)
Computational Promoter Analysis of Metazoan α-Globins Collaboration with Doug Higgs Summary (June 07)
Annotate 12 Drosophila genomes for regulatory signals Collaboration with Lior Pachter & Vasile Palade Summary (June 07)
Phylogenetic Analysis of “New” Homeobox in the Lineage Leading to Humans Collaboration with Peter Holland Summary (Feb 07)
Structural Analysis of Aptamers Collaboration with William James Summary (Dec 05)

RNA Structure and Evolution Modelling
Evaluation of SCFGs Summary (May 11)
Evolution Grammar Search Summary (April 11)
Practical Implications of Grammar Ambiguity on RNA Secondary Structure Prediction Summary (April 11)
Boltzmann Weighted Combinatorics of RNA Secondary Structures Summary

Computational Origin of Life Models
Mass Action Equations for Autocatalytic Systems (April 11)
Autocatalytic Sets of RNAs Collaboration with Wim Hordijk & Mike Steel Summary (April 11)
Proposal for the Development of a Software Package for Simulating and Studying Catalytic Reaction Systems and Autocatalytic Sets Summary

High School Projects
RNA Secondary Structure  (Algorithms) Summary (July 10)
Pairwise Alignment (Algorithms) Summary (June 08)
Sequence Evolution (Probability Theory) Summary (June 08)
Signals in Single Genomes (Computer Science) Summary (July 09)
Counting in Phylogenetics (Combinatorics) Summary (July 07)

Reading projects

These projects each describe a topic and give some references so that a group of students can give a 40-90 minute presentation after 8-10 hours’ work.  The projects have been used for courses given in Portugal, South Africa, Denmark, Oxford and Iceland.

Identifiability of Biological Systems (Portugal, February 2008) pdf
Models of Grammar Evolution (Portugal, February 2008) pdf
Evolution of Metabolic Networks (Portugal, February 2008) pdf
Integrative Genomics (Portugal, February 2009) pdf
Comparative Biology – Networks (Portugal, February 2009) pdf
Population Genomics (Portugal, February 2009) pdf ppt
Comparative Genomics-Signals (Portugal, February 2009) pdf ppt
Somatic Cell Genealogies (South Africa, March 2009) pdf
Metabolomics (South Africa, March 2009) pdf ppt
Genomic Dark Matter (South Africa, March 2009) pdf
Last Universal Common Ancestor (South Africa, March 2009) pdf
Selective Sweeps (South Africa, March 2009)
RNA Gene Finding (Iceland, June 2009) pdf
Influenza (Iceland, June 2009) pdf
Proteomics (Iceland, June 2009) pdf
Models for Origin of Life (Oxford, December 2009)
Multifunctional Proteins (Oxford, December 2009)
Comparative Biology – Protein Structures (Oxford, December 2009)
Comparative Protein Interaction Network (PIN) Annotation (Oxford 2010)
Stochastic Models of Networks (Oxford 2010)
Gene Regulatory Networks (GRN) Inference from Expression Data (Oxford 2010)
Inferring Pedigrees
Kinetics from Molecular Dynamics
Algorithms for Predicting DNA Assembling Into a Given Shape
Advanced Models of substitutions
Mathematical Modeling of Marriage Dynamics
Computational Modeling of the Heart
Agent based Population Modeling (Oxford 2010)
Alternative Splicing
Computational Models of Origin of Life 
What is Integrative Genomics (IG)? 
Integration over Paths in Continuous Time Markov Chains  
Evolutionary Protein Structure Comparison 
Computational Approaches to Molecules, Reactions and Catalysts

Completed projects with (some) reports

Click here for past projects carried out by students in our group. Where we have been given permission, reports on the projects are made available. The projects have been categorised according to the project programme the student was working under.

Challenges in Bioinformatics

Crossover & Gene Conversion Discovery Using Local Phasing Stefania Olafsdottir Oct 2013
Classification of Non-Ribosomal Peptide Syntheses with Feed-Forward Neural Networks Dan Sondergaard, Torben Andersen & Henrik Schmidt-Moller Oct 2013
Inference of Population History in an Isolation-with-Migration Model Jade Y Cheng Oct 2013
Maximum Entropy Model for Alternative Splicing Starting Pattern Qianyun Guo Oct 2013
Endogenous Retrovirus Trune Line Okholm & Carina Tansgaard Oct 2013
Modelling the Contribution of SNP Variation in HSD11B2 to the Risk of Schizophrenia in Individuals Exposed to Pre and Postnatal Stress Jean-Christophe Debost
Bioinformatic Studies of Bacterial Ca2+ Transporting P-type ATPases Mateus Dyla Oct 2012
Assembly in Theory and Practice Jesper Jensen Bjerg Oct 2012
Is a Specific Type of Endogenous Retrovirus a Cause of Leukemia? Karen Jessen Oct 2012
Classification of Non-Ribosomal Peptides Using Support Vector Machines Hannah Acheson-Field & Eric Chu Dec 2013
RNA Secondary Structure Prediction Mads Krogh Jensen

For earlier projects from this course, see also Project Sketches at Bioinformatics Research Centre, Aarhus, Denmark


MSc in Applied Statistics

Student Approximation Sequence Evolution by Stepping Stones by using Codes and Lumpability 2013
Farah Colchester Stochastic Modes of Pedigree Generation 2011
Man Tang Assembly of Multiple Genomes from Fragments 2011
Shou Zhang Using Probabilistic Models to Infer Infection Rates in Viral Outbreaks Feb 2007

ComLab MSc

Project Title Student Date
Predicting RNA Secondary Structures Including Pseudoknots Andrey Kravchenko Sept 2009
Reconstruction of Ancestral Metabolisms Jose Angel Riarola Sept 2009
Evolving Language Grammar Markus Gerstel Sept 2008
Large-scale comparative annotation of bacterial genomes Waqar Ali Aug 2007
Annotation of 12 drosophila genomes using multiple neural networks and evolutionary history Ulf Schafer Sept 2006
User interface for recombination analysis Charles Lin Sept 2006
Algorithms for haplotype inference from genotypes in the presence of recombination Syedur Rahman Sept 2006
Graphical programming interface for an XML-based hidden Markov model compiler Ahmad Chugtai Sept 2005
Automatic code generation for recursions involving stochastic context-free grammars Jonathan Churchill Sept 2005

4th Year Mathematics

Project Title Student Year
Likelihood Maximization on Phylogenic Trees James Wood Sep 2010
Evolving Turing Patterns Ashley Brooks Feb 2006

Summer Students

Project Title Student Year
Search for life in catalytic reaction systems Ina Trolle Andersen, Lin Nan & Maiken Ina Siegismund Aug 2010
Modelling coupled evolution of gene regulatory and metabolic networks Hong Noh Aug 2010
Ancestral recombination histories for error detection in genome sequencing  Chris Campbell, Zi Wang & Qian Yu Aug 2010
A genetic algorithm for evolving stochastic context-free grammars James Anderson, Joe Staines & Paula Tataru Aug 2010
Spannoid alignment  Stefan Hansen, Rita Pancsa & Marcus Webb Aug 2010
Metabolic random fields Artemisa Labi & Chris Campbell Oct 2009
Computational analysis of the regulatory region of vertebrate GSX1 Syazana Ebil Oct 2009
Fine scale regulatory annotation of cancer genes  Gabor Boross, Cathleen Heil, Andras Gyorgy & Nanette Coetzer Aug 2009
Difficult concepts in Systems Biology II: the concept of reduction in systems biology Nick Tasker & Andrew Stephenson Aug 2009
Difficult concepts in Systems Biology III: function and purpose Nick Tasker Aug 2009
MCMC and the infinite set model Miklos Zoltan Racz Aug 2009
Gaussian Processes and Gene Regulation James Anderson & Chris Choy Aug 2009
Paths in Markov Chains with applications to protein evolution   Tomas Fabsic & Andreas Sand Pedersen Aug 2009
Population pedigree inference from genomic data  Jon Ingi Sveinbjornsson & Eirikur Fannar Torfason Aug 2009
Fine scale regulatory annotation of ORMDL3 Katalin Orosz Aug 2008
The concept of emergence in systems biology Angela Matthies, Andrew Stephenson & Nick Tasker Aug 2008
From exact marginals to better importance sampling Anders Okholm & Camilla Mondrup Andreassen Aug 2008
Corner cutting approaches to the Ethier-Griffiths-Tavare recursions Stephen O’Keeffe & Ferenc Huszar Aug 2008
Corner cutting in statistical alignment Jesper Nielsen Aug 2007
Random drift on probability distributions Lu Gram Aug 2007
On the Monotonicity of HIPP Lu Gram Aug 2007
On recombination induced multiple coalescent events – published in Genetics Joanna Davies & Frantisek Simancik Oct 2006
Evolving Turing patterns Maria Demidova Oct 2006
Protein Secondary Structures Enumeration Max Leung Sep 2006
Counting pedigrees up to isomorphism Tong Chen Oct 2006
Combining different grammars to make multiple annotations of a single sequence Joanna Davies Feb 2005
Gap Attraction: A new measure for whole-genome alignment Naila Mimouni 2004

High School Summer Students

Conrad Godfrey Aug 2010 RNA secondary structure prediction: the co-transcriptional effect on RNA folding
Fiona Rust Aug 2010 Investigation of the number of possible secondary RNA structures with reference to theoretical expressions
Abigail Linton Oct 2009 Pattern searching in a single genome
Michelle Parker Oct 2009 Pattern search in a single genome
Ken Mawhinney Oct 2008 Basic models of nucleotide evolution
Artemisa Labi Aug 2007 Counting in Phylogenetics

DPhil Reports

Joe Herman Stochastic Models of Structure Evolution 2013
James Anderson Gaussian Processes and Gene Regulation 2013
Luke Cartey Advance Software for Generating HMM Code 2011
Marton Munz Comparative Analysis of Molecular Motion 2011
Joanna Davies Genetic Heterogeneity and Mapping 2010
Rahul Satija Statistical Alignment and Footprinting 2009
Aziz Mithani Evolutionary Modelling and Analysis of Metabolic Networks 2009
Ben Holtom A Paralogy Based Strategy for Identifying Regulatory Elements in Mammalian Genomes 2008
Naila Mimouni An Investigation of Selection Constraints on Non-Coding RNAs and Reliability of Alignments 2008
Saskia de Groot Genome Annotation and Selectional Analysis of Viral Evolution 2007
Lizhong Hao Analysis of Global Gene Expression during the Epithelium Differentiation 2007
Stephen McCauley The Annotation and Evolutionary analysis of Overlapping CDS in ssRNA Viral Genomes 2006

DTC Projects

Probing Prebiotic Toy-Chemical Reaction Systems for Autocatalytic Sets Federico Paoletti 2015
Evolving a primitive embryonic cell cycle model Malte Luecken 2013
Real chemical systems, RAFs and the Origin of Life Lukas Hutter 2013
Modelling coupled evolution of gene regulatory and metabolic networks Hong Noh 2010
Integrating prior knowledge to reverse-engineer simple biological systems Martin Munz 2008
Hidden Markov modelling of single molecular FRET trajectories Brian Cheung
A phylogenetic model for the prediction of quantitative characteristics and applications to cardiac modelling Richard Mann 2008
Applications of process algebra in systems biology Nicolas Wu 2006
Grammar and phylogenies Robin Ryder 2006
Evolution of metabolic networks Eleni Giannoulatou 2006
How many transcripts does it take to reconstruct the splice graph? Paul Jenkins 2006
Identifying gene clusters and regulatory themes using time course expression data, hidden Markov models and transcription factor information Karen Lees 2004
Hidden Markov models for protein sequence alignment Naila Mimouni 2004
The hunt for genomic dark matter: aligning non-coding functional DNA Naila Mimouni 2004
An analysis of the relative efficacy of the Nussinov-Felenstein and the Knudsen-Hein RNA Secondary structure prediction algorithms Stephen McCauley 2004