Sadly the workshop is over, and we are preparing to return to sunny Oxford! We enjoyed two final talks today, which we summarise below. We have also written up summaries of Tuesday’s talks and Wednesday’s talks.

Causal inference of evolutionary networks – Johannes Dellert, University of Tübingen

The speaker began by discussing the difficulties of building phylogenetic networks. Most phylogenetic methods (for languages as well as in biological contexts) are based on trees, but trees imply greater independence between lineages than we know to be realistic: they usually fail to capture language contact and influence, which can be a major driver of similarity between languages, separate from inheritance. Methods which do use networks are usually either visualisations of other kinds of data (where nodes don’t correspond to languages, for instance), or are restricted to narrow sub-classes of network structure which are often not powerful enough to capture the kinds of relationships that one would like to capture.

To address this, the speaker presented a project based on the concept of causal inference, building a network of causal relationships from observational data alone. Correlation does not imply causation, but by considering correlations on a connected network, it is possible to delete edges in such a way that the remaining structure consists of causal relationships explaining the observed correlations. The results were mostly very good, and went beyond any previously available method or tool for such analysis. There are some artefacts: for example, in a group of languages that were all influenced by German, one language had received particularly heavy German influence, and the method inferred that this language had influenced the others, rather than that all of them had been influenced by German directly. Overall, though, it seems like a very promising project with great results and an inspiringly creative and successful approach to a very difficult problem.
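
To give a flavour of what "deleting edges" can mean, here is a minimal Python sketch in the spirit of constraint-based causal discovery (the skeleton phase of methods such as the PC algorithm). It is our own toy illustration, not the speaker's method: the correlation values, the threshold and the function names are all assumptions, and a real analysis would use proper statistical tests and larger conditioning sets. An edge between two variables is dropped when their correlation vanishes once a third variable is taken into account.

```python
from itertools import combinations
from math import sqrt


def partial_corr(r, i, j, k):
    """First-order partial correlation of variables i and j given k.

    Assumes |r[i][k]| and |r[j][k]| are strictly below 1, so the
    denominator is non-zero.
    """
    num = r[i][j] - r[i][k] * r[j][k]
    den = sqrt((1 - r[i][k] ** 2) * (1 - r[j][k] ** 2))
    return num / den


def prune_edges(names, r, threshold=0.1):
    """Keep an edge only if no single third variable explains it away.

    `threshold` is a hypothetical cut-off standing in for a real
    independence test.
    """
    n = len(names)
    edges = set()
    for i, j in combinations(range(n), 2):
        if abs(r[i][j]) < threshold:
            continue  # marginally uncorrelated: no edge to begin with
        explained = any(
            abs(partial_corr(r, i, j, k)) < threshold
            for k in range(n)
            if k not in (i, j)
        )
        if not explained:
            edges.add((names[i], names[j]))
    return edges


# Made-up correlations for a chain A - B - C: A and C are correlated
# only because both are correlated with B.
names = ["A", "B", "C"]
r = [
    [1.00, 0.80, 0.64],
    [0.80, 1.00, 0.80],
    [0.64, 0.80, 1.00],
]
kept = prune_edges(names, r)
print(sorted(kept))
```

In this toy example the A-C edge is removed because B accounts for the whole A-C correlation, while A-B and B-C survive. Orienting the remaining edges into causal directions is a further step that the sketch does not attempt.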

Simulating lexical evolution with semantic shifts – Gereon Kaiping (*) and Johann-Mattis List (^), University of Leiden (*), Max Planck Institute for the Science of Human History (^)

This talk began with a discussion of some of the problems with current quantitative methods in historical linguistics. A major problem is the lack of proper data on historical language change, which means that models are often not properly validated and tested. There is also not much simulation done to test methods, and most existing simulations tend to be very simple. This project aims to develop a more realistic model of language change, under which simulations could be run to better validate and test other quantitative historical linguistic methods. The model also considers semantic drift and replacement, in contrast to most previous methods, which only consider cognates that correspond to the same concept.

The model builds on concepts from Saussure about the form and meaning of words being ‘two sides of the same coin’. It sees a language as a bipartite graph between a network of concepts and a vector of words. The model evolves by updating the weights of the edges between concepts and words, corresponding to the changing vocabulary and the changing meanings of words, over a phylogenetic tree. This draws on game-theoretic ideas. The speakers also presented some validation and parameterisation of their model based on available data sets. Their software is open source and available online: https://github.com/anaphory/simuling
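
As a rough illustration of the bipartite-graph idea, here is a small Python sketch of our own. It is not the simuling code or its API: the concept network, the word forms, the update rule and the 50/50 split between the two kinds of change are all invented for the example. A language is a mapping from (concept, word) pairs to weights, and each step either lets a word pick up a neighbouring meaning or weakens an existing link until it disappears.

```python
import random

# Hypothetical toy concept network: each concept lists its semantic neighbours.
CONCEPTS = {
    "hand": ["arm", "finger"],
    "arm": ["hand"],
    "finger": ["hand"],
}


def step(language, rng):
    """Apply one random change to a {(concept, word): weight} mapping.

    Returns a new mapping and leaves the input untouched.
    """
    new = dict(language)
    (concept, word), _ = rng.choice(sorted(new.items()))
    if rng.random() < 0.5:
        # Semantic shift: the word gains or strengthens a neighbouring meaning.
        target = rng.choice(CONCEPTS[concept])
        new[(target, word)] = new.get((target, word), 0) + 1
    else:
        # The word weakens for this concept and drops out at weight zero.
        new[(concept, word)] -= 1
        if new[(concept, word)] <= 0:
            del new[(concept, word)]
    return new


def evolve(language, steps, seed):
    """Evolve a language along one branch of a tree for a number of steps."""
    rng = random.Random(seed)
    for _ in range(steps):
        if not language:
            break  # every word has been lost; nothing left to change
        language = step(language, rng)
    return language


# Invented proto-language with one word per concept.
proto = {("hand", "manu"): 3, ("arm", "brak"): 3, ("finger", "dig"): 3}

# Two daughter languages evolve independently from the same ancestor.
daughter_a = evolve(proto, steps=20, seed=1)
daughter_b = evolve(proto, steps=20, seed=2)
print(daughter_a)
print(daughter_b)
```

Because words can move between neighbouring concepts, two daughters may end up using cognate words for different meanings, which is exactly the situation that methods restricted to same-concept cognates cannot see.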

