Category Archives: Day-to-day

Skype Book Discussion Group on “Computational Complexity of Counting and Sampling”

The present version can be found here:

http://renyi.hu/~miklosi/CCC/ComputationalComplexityOfCountingAndSampling.pdf

The author, Istvan Miklos, believes he will always stay ahead of the readers in his writing. We would then write a review, to be published at about the same time as the book, and put an extended report on this page:

https://heingroupoxford.com/learning-resources/lectures/

We will also give a summarizing lecture when we have finished the book. When we did this earlier, we met every second day and covered about 20 pages each time, but the pace can depend on the individual book. We did a similar thing with Mike Steel’s 2016 book, which I believe was beneficial to both author and readers.

The ideal number of participants in such a group is 3-5. It would have to be online since I will be in Israel. I like to choose a time at either the start or the end of the working day so that it interferes minimally with work. If you know somebody interested in participating, please tell me. If it proves to be a crappy book, we will stop reading, but that is not what I expect.

What did we learn at: Origins of Life Conference – ISSOL17

  • written by Jotun Hein

Overall, attending the conference was very useful, since I haven’t been to an Origins conference for more than 5 years and, since I have stopped teaching Origins, I don’t in general read so much on the chemical nitty-gritty.

There was much interesting material at the conference and, of course, I met some people from Oxford working on catalysis whom I had never seen before.
The first day [Monday] was mainly devoted to Exoplanets and Meteorites/Comets/Transport of Organic Matter.
The second day [Tuesday] covered the physical conditions on Earth 4 billion or so years ago.
The third 1/2 day [Wednesday] was dedicated to the first chemical steps towards life.
The last 2 days were on the early evolution of life and more theoretical models.

Origins of Life studies are clearly getting a lot more attention/funding now. Computational studies play a much larger role. There are many more serious attempts at synthesizing life de novo. But I can’t say there is a single convincing scenario for planet Earth. Exoplanets are clearly very exciting, but there is no way to study the architecture of life so far away [barring SETI, which was unrepresented at ISSOL], so all one can hope for over the next couple of centuries is the observation of convincing bio-signatures.

There seem to have been a lot of organizational problems. I didn’t know where to go to sleep and ended up sitting all night in the airport (while paying for a room at UCSD). Another person I met had experienced something else. The conference dinner was not very different from the free dinner and there were no arrangements for where to go. Anybody going to conferences/workshops knows that many connections are made during the evening socializing.

William Kurdahl, I, and possibly some of the Oxford catalysis people will give an informal orientation about the meeting on Tuesday, August 29th, at 3 PM in the small lecture room in the Department of Statistics, Oxford.
William and I both chose 5 papers/presentations that we liked.

These are the slides in progress:
https://www.dropbox.com/s/p5tmy3a1g8i2kd0/ISSOL.pptx?dl=0

Talk by BLAST INVENTOR

Speaker: Stephen Altschul

Title: Dirichlet Mixtures, the Dirichlet Process, and the Topography of Amino Acid Multinomial Space

Venue: Tuesday May 23rd 3.30 PM  Department of Statistics, Lecture Theatre (Lower Ground)

Abstract: The Dirichlet Process is used to estimate probability distributions that are mixtures of an unknown and unbounded number of components. Amino acid frequencies at homologous positions within related proteins have been fruitfully modeled by Dirichlet mixtures, and we have used the Dirichlet Process to construct such distributions. The resulting mixtures describe multiple alignment data substantially better than do those previously derived. They consist of over 500 components, in contrast to fewer than 40 previously, and provide a novel perspective on protein structure. Individual protein positions should be seen not as falling into one of several categories, but rather as arrayed near probability ridges winding through amino-acid multinomial space.
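For readers who have not met Dirichlet mixtures before, here is a minimal sketch of how such a mixture turns the observed counts in one alignment column into estimated frequencies. It uses the standard posterior-mean formula for a Dirichlet mixture prior. The three-letter alphabet, the two components and all parameter values are toy assumptions made up for illustration; they are not taken from the talk, where the mixtures have over 500 components over 20 amino acids.

```python
from math import exp, lgamma, log


def log_beta(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def posterior_mean(counts, weights, alphas):
    """Posterior mean frequencies for one column under a Dirichlet mixture.

    counts  : observed letter counts in the column
    weights : mixture weights q_j (sum to 1)
    alphas  : one Dirichlet parameter vector per component
    Returns (mean frequencies, posterior component probabilities).
    """
    # log of q_j * B(alpha_j + counts) / B(alpha_j); the multinomial
    # coefficient is the same for every component and cancels out.
    log_post = []
    for q, alpha in zip(weights, alphas):
        updated = [a + n for a, n in zip(alpha, counts)]
        log_post.append(log(q) + log_beta(updated) - log_beta(alpha))

    # Normalise in a numerically stable way.
    top = max(log_post)
    resp = [exp(lp - top) for lp in log_post]
    norm = sum(resp)
    resp = [r / norm for r in resp]

    # Mix the per-component Dirichlet posterior means.
    total = sum(counts)
    mean = [0.0] * len(counts)
    for r, alpha in zip(resp, alphas):
        denom = total + sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            mean[i] += r * (n + a) / denom
    return mean, resp


# Toy example (hypothetical values): two components over a 3-letter alphabet.
alphas = [[10.0, 1.0, 1.0], [1.0, 1.0, 10.0]]
weights = [0.5, 0.5]
mean, resp = posterior_mean([5, 0, 0], weights, alphas)
```

With five observations of the first letter, almost all of the posterior weight moves to the first component, so the estimate leans on that component's preferences rather than on the raw counts alone.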

The slides will be made available after the talk.

Comment: Stephen Altschul has finally proven that I can’t add 2 and 2. I have attended Altschul Dinners at my College [University College, Oxford] and never thought of connecting the two words Altschul and Altschul despite their obvious similarity. The dinner is in honour of Stephen’s grandmother, whose brother, Arthur Lehman Goodhart, was Master of UNIV 1951–63.

 

 

A tools-focussed talk: from raw linguistic data to reconstructed language trees

Today we hosted a talk by Gereon Kaiping from Leiden University. The talk went through the pipeline his group uses to go from linguistic data collected in the field to reconstructed phylogenetic trees of languages produced using mostly off-the-shelf tools. There was a lively discussion with an audience of linguists, bioinformaticians, and statisticians.

Gereon has kindly made his slides available, which can be viewed below, or downloaded.

Extreme Reading – status report

There is a famous Danish sketch called “Jarl Kakadue” from the show “Casper og Mandrilaftalen”. In the sketch, Jarl explains how he completed an Ironman, but instead of running a marathon, he got a good night’s sleep.
“But isn’t that cheating?” the host asks, to which Jarl replies “No, because such a run takes a couple of hours, but a proper night’s sleep is at least 8 hours.”

As the sketch goes on, more and more of the exercise gets replaced. The full sketch can be seen here: (in Danish)

The concept of Extreme Reading is also a modified Ironman, in the following sense:
instead of swimming, we read a book.
instead of cycling, we summarise the book
and
instead of running a marathon, we run half a marathon (over 3 days)

So each day, we read for a couple of hours, ran 7 kilometers, read some more and then we summarized the book for each other and discussed it.

The book in question was “The Origin and Nature of Life on Earth: The Emergence of the Fourth Geosphere” by Eric Smith and Harold J. Morowitz.

Unfortunately, the book is rather wordy and not very mathematical. The individual sections are nicely structured, but the book lacks a main message and a sense of direction.

This is puzzling, since Morowitz’s other books are usually shorter and more precise. However, Morowitz died before the book was published, was very weak in his last decade, published little in that period and was in general very concise in his formulations, while this book is very long (at times lengthy). It is unclear how much Morowitz contributed to the present book.

This book is 600 pages long and consists of 8 chapters. This is a very hard topic to write a coherent book about and the chapters are quite free-standing contributions to describing or explaining the theory of life.

Eric Smith gave a talk somewhat based on the book, which can be found here: https://www.youtube.com/watch?v=0cwvj0XBKlE

The 4 geospheres are:
Atmosphere (air)
Hydrosphere (water)
Lithosphere (earth)
Biosphere (life)

The point of the title is that life should be thought of as a planetary property. However, the point seems more philosophical than scientific, which is the case with many of the subtle points in the book.

A longer summary will be added later.

Overall, the project was a success. We managed to run and read a lot. It is a very satisfying feeling to be both mentally and physically exhausted and we can definitely recommend similar undertakings.

Talk tomorrow 25/4 on phylogenetics tools for historical linguistics

Tomorrow afternoon we are hosting a talk by Gereon Kaiping, who we met at a recent workshop. All are welcome; details below.

Time and location: Department of Statistics on Tuesday 25th April at 4.00 pm – 5.00 pm in the Small Lecture Theatre (LG.03).

Speaker:        Gereon Kaiping, University of Leiden

Title:          Some Assembly Required: From sounds to histories in 8 steps using mostly off-the-shelf tools.

Abstract:       Phylogenetic methods are gaining traction in linguistics, but have so far been quite inaccessible to linguists:
The core tools doing the tree construction – whether they be heuristic or Bayesian – often come from bioinformatics, and their inputs (e.g. Nexus files) and outputs (e.g. Newick trees without explicit reconstruction) conform to biological, not linguistic standards – or they are written ad hoc for a specific dataset. However, this situation is changing: In this talk, I will present a collection of tools, most of which are published elsewhere, that together go the full way from linguistic fieldwork via public cross-linguistic linked databases and Bayesian inference tools to plots of phylogenetic trees with ancestral state reconstruction. I will describe both emerging standards in quantitative historical linguistics that make this process easier, and specific challenges that arose in the construction of this tool chain. The talk will conclude with the discussion of some results from the reconstructed word-meaning correspondences in the Lesser Sunda region of Indonesia, and how they feed back into improving our data and understanding of the local language history.

End of the phylogenetic methods in historical linguistics workshop

Sadly the workshop is over, and we are preparing to return to sunny Oxford! We enjoyed two final talks today, which we summarise below. We have also written up summaries of Tuesday’s talks, and Wednesday’s talks.

Causal inference of evolutionary networks – Johannes Dellert, University of Tübingen

This speaker began by discussing the difficulties with building up phylogenetic networks. Most phylogenetic methods (on languages as well as in biological contexts) are based on trees, but these trees imply a greater independence than we know to be realistic – they usually fail to capture language contact and influence, which can be a major driver of similarity between languages (separate from inheritance). Methods which do utilise networks are usually either visualisations of other kinds of data (where nodes don’t correspond to languages, for instance), or are restricted to narrow sub-classes of network structure which are not often powerful enough to capture the kinds of relationships that one would like to capture.

To address this, the speaker presented a project based on the concept of causal inference, building a network of causal relationships from observational data alone. Correlation does not imply causation, but by considering correlations on a connected network, it’s possible to delete edges on the network in a way that leads to a structure of causal relationships explaining the observed correlations. The results were mostly very good, and went beyond any previously available method or tool for such analysis. There are some artefacts. For example, in a group of languages that had all been influenced by German, one language had had particularly heavy German influence, and the method inferred that this language had influenced the others (rather than all of them being influenced by German directly). Overall, though, it seems like a very promising project with great results and an inspiringly creative and successful approach to a very difficult problem.
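The edge-deletion idea can be illustrated with a generic sketch in the style of the skeleton phase of the PC algorithm. This is not the speaker’s method or code: the variable names, the threshold and the simulated data are all assumptions chosen for illustration, and only conditioning sets of size zero or one are tried. We start from a fully connected network and delete the edge between two variables whenever they are uncorrelated, or become uncorrelated once a third variable is controlled for.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def partial_corr(data, x, y, z):
    """Correlation of x and y after controlling for a single variable z."""
    rxy = pearson(data[x], data[y])
    rxz = pearson(data[x], data[z])
    ryz = pearson(data[y], data[z])
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def skeleton(data, threshold=0.1):
    """Start fully connected; delete edges explained away by other variables."""
    names = sorted(data)
    edges = {frozenset(pair) for pair in combinations(names, 2)}
    for x, y in combinations(names, 2):
        if abs(pearson(data[x], data[y])) < threshold:
            edges.discard(frozenset((x, y)))  # marginally independent
            continue
        for z in names:
            if z in (x, y):
                continue
            if abs(partial_corr(data, x, y, z)) < threshold:
                edges.discard(frozenset((x, y)))  # z explains the correlation
                break
    return edges


# Simulated chain A -> B -> C (toy data, fixed seed for reproducibility).
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(2000)]
b = [v + rng.gauss(0, 1) for v in a]
c = [v + rng.gauss(0, 1) for v in b]
edges = skeleton({"A": a, "B": b, "C": c})
```

In this toy chain, A and C are clearly correlated, but the correlation vanishes once B is controlled for, so the direct A–C edge is deleted and only A–B and B–C remain. The artefact described above is the kind of case where such a procedure can attribute influence to an intermediary rather than to the true common source.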

Simulating lexical evolution with semantic shifts – Gereon Kaiping (*) and Johann-Mattis List (^), University of Leiden (*), Max Planck Institute for the Science of Human History (^)

This talk began with a discussion of some of the problems with current quantitative methods in historical linguistics. A major such problem is the lack of proper data on historical language change, leading to a trend towards models not being properly validated and tested. There is also not much simulation done to test methods, and most existing simulations tend to be very simple. This project aims to develop a more realistic model of language change, under which simulations might be done which could lead to better validation and testing of other quantitative historical linguistic methods. The model further considers semantic drift and replacement, in contrast to most previous methods, which only consider cognates corresponding to the same concept.

This built on concepts from Saussure about the form and meaning of words being ‘two sides of the same coin’. The model sees a language as a bipartite graph between a network of concepts and a vector of words. The evolution of the model involves updating the weighting of edges between the concepts and the words, corresponding to the changing set of vocabulary and meanings of words, over a phylogenetic tree. This draws on game theoretic ideas. They also presented some validation and parameterisation of their models based on available data sets. Their software is open source and available online: https://github.com/anaphory/simuling
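To make the bipartite-graph picture concrete, here is a very small toy sketch. It is not the simuling code or its API: the concept network, the made-up word forms, the integer weights and the update rule (moving one unit of weight from a word’s current meaning to a neighbouring concept) are all assumptions for illustration, and the game-theoretic ingredients of the actual model are left out.

```python
import random

# Hypothetical concept network: which meanings are close enough to shift between.
CONCEPT_GRAPH = {
    "hand": ["arm"],
    "arm": ["hand", "branch"],
    "branch": ["arm", "tree"],
    "tree": ["branch"],
}


def evolve(language, concept_graph, steps, rng):
    """Return a copy of `language` after `steps` semantic shifts.

    A language is a dict mapping (concept, word) edges to positive integer
    weights. Each step moves one unit of weight from a randomly chosen edge
    to the edge between the same word and a neighbouring concept.
    """
    lang = dict(language)
    for _ in range(steps):
        concept, word = rng.choice(sorted(lang))  # sorted for reproducibility
        neighbour = rng.choice(concept_graph[concept])
        lang[(concept, word)] -= 1
        if lang[(concept, word)] == 0:
            del lang[(concept, word)]  # the word has lost this meaning
        lang[(neighbour, word)] = lang.get((neighbour, word), 0) + 1
    return lang


def simulate(node, language, concept_graph, rng, out):
    """Evolve a language down a tree given as (name, branch_length, children)."""
    name, branch_length, children = node
    lang = evolve(language, concept_graph, branch_length, rng)
    out[name] = lang
    for child in children:
        simulate(child, lang, concept_graph, rng, out)
    return out


# Made-up root vocabulary and a two-leaf tree.
root_language = {("hand", "manu"): 3, ("arm", "brak"): 3, ("tree", "arbo"): 3}
tree = ("root", 0, [("A", 20, []), ("B", 20, [])])
languages = simulate(tree, root_language, CONCEPT_GRAPH, random.Random(1), {})
```

Comparing `languages["A"]` and `languages["B"]` shows how the same inherited words can end up attached to different concepts in the two daughter languages, which is the kind of signal that cognate coding restricted to one concept at a time would miss.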