The aim of this club is to read the papers that everyone keeps citing but which few people have read. We plan to read a paper every third week for the next 20 years.
We usually meet on Wednesday mornings and announce the papers we will read ahead of time. Everyone is welcome: if a paper sounds interesting to you, please come by.
This reading group used to be organised on Facebook. The old page can be found here.
- Mathematical Chemistry and Chemoinformatics, by A. Kerber et al. (summary part I and part II)
- Phylogeny: Discrete and random processes in evolution, by M. Steel (review published on SIAM News Blog: part I, part II, part III)
- Bayesian Methods in Structural Bioinformatics, edited by T. Hamelryck, K. Mardia and J. Ferkinghoff-Borg
- Phylogenetics, by C. Semple and M. A. Steel
- Protein Physics – A course of lectures, by A. V. Finkelstein and O. Ptitsyn (summary slides)
This book club is an endeavour to broaden our horizons and critically engage with good writing from across the humanities. We aim to get through one book per term, and meet roughly every third week to discuss new sections of whatever book we are currently reading. Past books we have read include:
- The General Theory of Employment, Interest and Money, by John Maynard Keynes
- The Qurʼān – A New Annotated Translation, by Arthur J. Droge
- On Politics, by Alan Ryan
- Capital in the Twenty-First Century, by Thomas Piketty
Combinatorics of Recombination: https://www.dropbox.com/s/magvyy1jkkgin63/graduate%20lecture%201.6.17%20recombi.pptx?dl=0
Research Collaboration: https://www.dropbox.com/s/gveaj5rwp0f7eok/A%20Few%20Things.pptx?dl=0
Topics – both devoted to modelling in evolution: Models of Origins of Life & Phylogenetic
Time: Friday June 9th 2.00 PM – 4.30 PM
Venue: Department of Statistics, Oxford, Large Lecture Theatre
- 2.00 PM Generality and Robustness of the SVDQuartets Method for Phylogenetic Species Tree Estimation (Swofford)
Methods for inferring evolutionary trees based on phylogenetic invariants were first proposed nearly three decades ago, but have been virtually ignored by biologists. A new invariants-based method for estimating species trees under the multispecies coalescent model was recently developed by Julia Chifman and Laura Kubatko, building on earlier work by Elizabeth Allman, John Rhodes, and Nicholas Eriksson. This method comes from algebraic statistics and uses singular value decomposition to estimate the rank of matrices of site pattern frequencies. Although the approach shows great promise, its performance on empirical and simulated data sets has not been adequately evaluated.
I will give a general introduction to the SVDQuartets method and present some results from a simulation study currently in progress (collaboration with Laura Kubatko and Colby Long) that demonstrate that SVDQuartets is potentially highly robust to deviations from the standard evolutionary models assumed by other species-tree estimation methods.
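The rank computation described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a four-taxon DNA alignment given as equal-length uppercase strings, tallies the 256 possible site patterns, flattens the resulting 4×4×4×4 array into a 16×16 matrix for each of the three possible quartet splits, and scores each split by how far that matrix is from having rank 10 (as we understand Chifman and Kubatko's result, rank at most 10 is what the true split should show). All function names, and the use of NumPy, are our own hypothetical choices.

```python
import numpy as np

NUCLEOTIDES = "ACGT"
INDEX = {n: i for i, n in enumerate(NUCLEOTIDES)}

# The three ways of splitting four taxa (0, 1, 2, 3) into two pairs.
SPLITS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def pattern_frequencies(seqs):
    """Relative frequencies of the 4^4 site patterns in a four-taxon alignment.

    Sites containing anything other than A, C, G or T (gaps, ambiguity codes)
    are simply skipped in this sketch.
    """
    if len(seqs) != 4 or len({len(s) for s in seqs}) != 1:
        raise ValueError("expected four sequences of equal length")
    counts = np.zeros((4, 4, 4, 4))
    for site in zip(*seqs):
        if all(base in INDEX for base in site):
            counts[tuple(INDEX[base] for base in site)] += 1
    if counts.sum() == 0:
        raise ValueError("no unambiguous sites")
    return counts / counts.sum()


def flattening(freqs, split):
    """16x16 matrix: rows indexed by the states of the first pair, columns by the second."""
    (a, b), (c, d) = split
    return np.transpose(freqs, (a, b, c, d)).reshape(16, 16)


def svd_score(freqs, split, rank=10):
    """Frobenius distance from the flattening to the nearest matrix of the given rank."""
    singular_values = np.linalg.svd(flattening(freqs, split), compute_uv=False)
    return float(np.sqrt(np.sum(singular_values[rank:] ** 2)))


def best_split(seqs):
    """The split whose flattening is closest to rank 10 (lowest score)."""
    freqs = pattern_frequencies(seqs)
    return min(SPLITS, key=lambda split: svd_score(freqs, split))


# Toy alignment: taxa 0 and 1 are identical, as are taxa 2 and 3.
toy = ["AAAACCCCGGGGTTTT", "AAAACCCCGGGGTTTT",
       "ACGTACGTACGTACGT", "ACGTACGTACGTACGT"]
print(best_split(toy))  # ((0, 1), (2, 3))
```

In the toy alignment the 01|23 flattening has rank 1 and scores essentially zero, whereas the other two splits give full-rank diagonal matrices; real data would of course be noisier, and the actual method also deals with sampling variation and combining many quartets.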
- 3.30 PM Autocatalytic Sets and the Origin of Life (Hordijk)
The main paradigm in origin of life research is that of an RNA world, where the idea is that life started with one or a few self-replicating RNA molecules. However, so far nobody has been able to show that RNA can catalyze its own template-directed replication. What has been shown experimentally, though, is that certain sets of RNA molecules can mutually catalyze each other’s formation from shorter RNA fragments. In other words, rather than having each RNA molecule replicate itself, they all help each other’s formation from basic building blocks, in a self-sustaining network of molecular cooperation.
Such a cooperative molecular network is an instance of an autocatalytic set, a concept that was formalized and studied mathematically and computationally as RAF theory. This theory has shown that autocatalytic sets are highly likely to exist in simple polymer models of chemical reaction networks, and that such sets can, in principle, be evolvable due to their hierarchical structure of many autocatalytic subsets. Furthermore, the framework has been applied successfully to study real chemical and biological examples of autocatalytic sets.
In this talk I will give a general (and gentle) introduction to RAF theory, present its main results and how they could be relevant to the origin of life, and argue that the framework could possibly also be useful beyond chemistry, such as in analyzing ecosystems or even economic systems.
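For readers new to RAF theory: a RAF (reflexively autocatalytic and food-generated) set is a set of reactions in which every reactant can be built up from a given food set using reactions from the set itself, and every reaction is catalysed by at least one molecule that is either in the food set or produced that way. The sketch below is our own minimal Python rendering of the iterative pruning procedure commonly used in RAF theory to find the maximal RAF: compute everything reachable from the food set, discard reactions whose reactants or catalysts are out of reach, and repeat until nothing changes. The class, function and molecule names are hypothetical, and the reaction network is a toy, not an example from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    name: str
    reactants: frozenset
    products: frozenset
    catalysts: frozenset


def reaction(name, reactants, products, catalysts):
    """Convenience constructor taking lists of molecule names."""
    return Reaction(name, frozenset(reactants), frozenset(products), frozenset(catalysts))


def closure(food, reactions):
    """All molecules reachable from the food set via the given reactions (catalysis ignored)."""
    available = set(food)
    changed = True
    while changed:
        changed = False
        for r in reactions:
            if r.reactants <= available and not r.products <= available:
                available |= r.products
                changed = True
    return available


def max_raf(food, reactions):
    """Repeatedly drop unsupported reactions; what remains is the maxRAF (possibly empty)."""
    current = set(reactions)
    while True:
        reachable = closure(food, current)
        kept = {r for r in current
                if r.reactants <= reachable and r.catalysts & reachable}
        if kept == current:
            return current
        current = kept


food = {"a", "b"}
network = [
    reaction("r1", ["a", "b"], ["ab"], ["abb"]),   # catalysed by the product of r2
    reaction("r2", ["ab", "b"], ["abb"], ["ab"]),  # catalysed by the product of r1
    reaction("r3", ["c", "a"], ["ca"], ["a"]),     # "c" can never be made
    reaction("r4", ["a", "a"], ["aa"], ["zz"]),    # catalyst "zz" can never be made
]
print(sorted(r.name for r in max_raf(food, network)))  # ['r1', 'r2']
```

The surviving pair r1 and r2 catalyse each other's formation from food molecules, which is exactly the kind of mutual catalysis described in the abstract.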
WINE IN COMMON AREA AFTER TALKS
Speaker: Stephen Altschul
Title: Dirichlet Mixtures, the Dirichlet Process, and the Topography of Amino Acid Multinomial Space
Time: Tuesday May 23rd 3.30 PM
Venue: Department of Statistics, Lecture Theatre (Lower Ground)
Abstract: The Dirichlet Process is used to estimate probability distributions that are mixtures of an unknown and unbounded number of components. Amino acid frequencies at homologous positions within related proteins have been fruitfully modeled by Dirichlet mixtures, and we have used the Dirichlet Process to construct such distributions. The resulting mixtures describe multiple alignment data substantially better than do those previously derived. They consist of over 500 components, in contrast to fewer than 40 previously, and provide a novel perspective on protein structure. Individual protein positions should be seen not as falling into one of several categories, but rather as arrayed near probability ridges winding through amino-acid multinomial space.
The slides will be made available after the talk.
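As a rough illustration of what a mixture with "an unknown and unbounded number of components" means, the sketch below draws a random Dirichlet mixture over amino-acid frequency vectors from a Dirichlet process prior, using the standard stick-breaking construction truncated at a finite number of components for practicality. It illustrates the prior only and is not the speaker's fitting procedure; the concentration parameter, truncation level, per-component scale and the uniform base measure for component means are all hypothetical choices of ours.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def stick_breaking_weights(concentration, truncation, rng):
    """First `truncation` mixture weights of a Dirichlet process (GEM distribution)."""
    weights, remaining = [], 1.0
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, concentration)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights


def sample_dirichlet_mixture(concentration=10.0, truncation=200, scale=20.0, seed=0):
    """Random Dirichlet mixture over 20-dimensional amino-acid frequency vectors.

    Component k is a Dirichlet distribution with parameters scale * q_k, where the
    mean q_k is drawn uniformly from the simplex (a hypothetical base measure).
    """
    rng = random.Random(seed)
    weights = stick_breaking_weights(concentration, truncation, rng)
    uniform = [1.0] * len(AMINO_ACIDS)
    components = [[scale * q for q in sample_dirichlet(uniform, rng)]
                  for _ in range(truncation)]
    return weights, components


def sample_column_frequencies(weights, components, rng):
    """Pick a component by weight, then draw one position's amino-acid frequencies from it."""
    k = rng.choices(range(len(components)), weights=weights)[0]
    return sample_dirichlet(components[k], rng)


weights, components = sample_dirichlet_mixture()
freqs = sample_column_frequencies(weights, components, random.Random(1))
print(dict(zip(AMINO_ACIDS, (round(f, 3) for f in freqs))))
```

Larger values of `concentration` spread the weight over more components; nothing in the construction fixes the number of components in advance, which is what distinguishes it from a finite Dirichlet mixture.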
Comment: Stephen Altschul has finally proven that I can’t add 2 and 2. I have attended Altschul Dinners at my College [University College, Oxford] and never thought of connecting the two Altschuls, despite the obvious similarity. The dinners are held in honour of Stephen’s grandmother, whose brother, Arthur Lehman Goodhart, was Master of UNIV from 1951 to 1963.
Today we hosted a talk by Gereon Kaiping from Leiden University. The talk went through the pipeline his group uses to go from linguistic data collected in the field to reconstructed phylogenetic trees of languages produced using mostly off-the-shelf tools. There was a lively discussion with an audience of linguists, bioinformaticians, and statisticians.
Gereon has kindly made his slides available for viewing and download.
There is a famous Danish sketch called “Jarl Kakadue” from the show “Casper og Mandrilaftalen”. In the sketch, Jarl explains how he completed an Ironman, but instead of running the marathon, he got a good night’s sleep.
“But isn’t that cheating?” the host asks, to which Jarl replies: “No, because such a run takes a couple of hours, but a proper night’s sleep is at least 8 hours.”
As the sketch goes on, more and more of the exercise gets replaced. The full sketch can be seen here (in Danish):
The concept of Extreme Reading is also a modified Ironman, in the following sense:
- instead of swimming, we read a book;
- instead of cycling, we summarise the book;
- instead of running a marathon, we run half a marathon (over 3 days).
So each day, we read for a couple of hours, ran 7 kilometers, read some more and then we summarized the book for each other and discussed it.
The book in question was “The Origin and Nature of Life on Earth: The Emergence of the Fourth Geosphere” by Eric Smith and Harold J. Morowitz.
Unfortunately, the book is rather wordy and not very mathematical. The individual sections are nicely structured, but the book lacks a main message and a sense of direction.
This is puzzling, since Morowitz’s other books are usually shorter and more precise. However, Morowitz died before the book was published, had been very weak during the last decade of his life, published little in that period, and was in general very concise in his formulations, whereas this book is very long (at times long-winded). It is unclear how much Morowitz contributed to the present book.
This book is 600 pages long and consists of 8 chapters. It is very hard to write a coherent book on this topic, and the chapters are largely free-standing contributions to describing or explaining the theory of life.
Eric Smith gave a talk somewhat based on the book, which can be found here: https://www.youtube.com/watch?v=0cwvj0XBKlE
The 4 geospheres are:
The point of the title is that life should be thought of as a planetary property. However, the point seems more philosophical than scientific, as is the case with many of the subtle points in the book.
A longer summary will be added later.
Overall, the project was a success. We managed to run and read a lot. It is a very satisfying feeling to be both mentally and physically exhausted and we can definitely recommend similar undertakings.
Tomorrow afternoon we are hosting a talk by Gereon Kaiping, who we met at a recent workshop. All are welcome; details below.
Time and location: Department of Statistics on Tuesday 25th April at 4.00 pm – 5.00 pm in the Small Lecture Theatre (LG.03).
Speaker: Gereon Kaiping, University of Leiden
Title: Some Assembly Required: From sounds to histories in 8 steps using mostly off-the-shelf tools.
Abstract: Phylogenetic methods are gaining traction in linguistics, but have so far been quite inaccessible to linguists:
The core tools doing the tree construction – whether they be heuristic or Bayesian – often come from bioinformatics, and their inputs (e.g. Nexus files) and outputs (e.g. Newick trees without explicit reconstruction) conform to biological, not linguistic, standards, or they are written ad hoc for specific datasets. However, this situation is changing. In this talk, I will present a collection of tools, most of which are published elsewhere, that together go the full way from linguistic fieldwork via public cross-linguistic linked databases and Bayesian inference tools to plots of phylogenetic trees with ancestral state reconstruction. I will describe both emerging standards in quantitative historical linguistics that make this process easier, and specific challenges that arose in the construction of this tool chain. The talk will conclude with a discussion of some results from the reconstructed word-meaning correspondences in the Lesser Sunda region of Indonesia, and how they feed back into improving our data and understanding of the local language history.
I am happy with our little book clubs, but they leave me wishing to read more books than we actually get through. In particular, I have found it frustrating that we had to cut our economics studies short. I know some people have tried to read very large amounts in a very short time span, such as 12, 24 or 36 hours. It is really demanding, but most likely very rewarding.
Now I should like to try this on:
Carlin and Soskice (2014): Macroeconomics: Institutions, Instability, and the Financial System – about 600 pages
Eric Smith and Harold Morowitz (2016): The Origin and Nature of Life on Earth.
I should like to start Friday morning 9AM and be done by Sunday 6PM.
We will start with the Origin of Life book and do it April 21st to 23rd.
Does anybody want to participate? It is possible to join via Skype.
I suggest each day:
- Read 100 pages
- Write a 1-page summary
- Run 7 km
- Read 100 pages
- Write a 1-page summary
- Dinner, then sleep
Take Monday off. Maybe all week. Maybe quit academia.
We will also make a PowerPoint presentation on the book, but perhaps after the 3 days.
I originally wanted to suggest running a full marathon, but realism made me suggest a half marathon in installments instead.