Today we (Jotun and Jimi) attended the second day of the Phylogenetic Methods in Historical Linguistics workshop in Tübingen. We heard several interesting talks on a range of topics, including one from Jotun himself. Here’s a summary of what we heard.

Introduction to comparative biology and statistical alignment – Jotun Hein, University of Oxford

Jotun’s talk introduced some fundamental ideas in comparative biology, went into some detail on basic statistical alignment techniques, and summarised some potential applications to linguistics, drawing on projects done in the Hein group and at Oxford more widely.

He discussed the significance of the different structures that can be used to model data in these settings. Michael Golden’s work on protein structure evolution was an example of models that use richer data structures than plain strings or sequences. He also mentioned Jimi Cullen’s work on context-dependent sequence evolution and Luke Kelly’s impressive work with Geoff Nicholls on lateral transfer in stochastic Dollo models.

The audience reacted enthusiastically to Jotun’s mention of his brief work with Markus Gerstel on transforming English grammar into German using treebank data; this idea could be worth reviving!

What can shared structures tell us about linguistic similarity? – Andrea Fischer, Universität des Saarlandes

This talk was about using ideas from information theory to measure similarity between languages. By finding correspondences between parts of similar words in two languages, it’s possible to compress a bilingual word list: rules describing how letters or groups of letters in one language relate to those in the other allow the list to be encoded more compactly. Using MDL (minimum description length), it’s possible to get an idea of how close two languages are, with a measure loosely inspired by how easily a speaker of one language might read text in the other. The process also infers correspondences between parts of words, which in certain circumstances closely match known (or plausible) linguistic correspondences.
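
The MDL intuition can be illustrated with a toy sketch. This is not Fischer’s method: it assumes the word pairs have already been aligned character by character (with '-' marking a gap), uses a crude two-part code with an arbitrary, hypothetical MODEL_BITS_PER_TYPE constant, and treats each aligned character pair as a single correspondence symbol. The hypothetical `mdl_saving` function reports the fraction of bits saved by encoding the pairs jointly rather than encoding each language separately; more consistent correspondences should give a larger saving.

```python
import math
from collections import Counter

# Hypothetical, illustrative cost (in bits) of listing one distinct symbol type in the model.
MODEL_BITS_PER_TYPE = 8.0


def code_length(symbols):
    """Two-part code length in bits: model cost plus data cost under the empirical distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    # Data cost: sum of -log2 p(s) over every symbol occurrence.
    data_bits = -sum(c * math.log2(c / n) for c in counts.values())
    # Crude model cost: a fixed price for each distinct symbol type used.
    model_bits = len(counts) * MODEL_BITS_PER_TYPE
    return data_bits + model_bits


def mdl_saving(word_pairs):
    """Fraction of bits saved by encoding aligned word pairs jointly instead of separately.

    Each pair is assumed to be pre-aligned to equal length, with '-' marking a gap.
    """
    left, right, joint = [], [], []
    for w1, w2 in word_pairs:
        if len(w1) != len(w2):
            raise ValueError(f"unaligned pair: {w1!r}, {w2!r}")
        left.extend(w1)
        right.extend(w2)
        joint.extend(zip(w1, w2))  # each aligned position becomes one correspondence symbol
    separate = code_length(left) + code_length(right)
    if separate == 0:
        return 0.0  # nothing to encode
    return 1.0 - code_length(joint) / separate


if __name__ == "__main__":
    # Consistent one-to-one correspondences: a~x, b~y, c~z.
    related = [("abc", "xyz"), ("bca", "yzx"), ("cab", "zxy")]
    # Inconsistent pairings: every aligned position is a different correspondence.
    unrelated = [("abc", "xyz"), ("bca", "xyz"), ("cab", "xyz")]
    print(mdl_saving(related))    # 0.5
    print(mdl_saving(unrelated))  # negative: joint encoding costs more
```

In this toy setup a perfectly consistent one-to-one correspondence halves the total cost, while inconsistent pairings cost more than separate encoding because the model has to pay for many distinct correspondence types. A real system would also need to learn the alignments and allow multi-character correspondences.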

This seems like a rich body of work, built in a strongly modular way from a variety of techniques and ideas in information theory, which should make it quite easy to extend and modify.

Genes, speech, and language – Dan Dediu, Max Planck Institute for Psycholinguistics

This speaker strove to convince us of the interesting connections between genetics and language. His first example concerned hearing and deafness. He pointed out several specific genetic mutations which (sometimes surprisingly) lead to deafness. One such mutation has led to higher rates of deafness in some communities where it’s very common, and this in turn has led to the development of sign languages in those communities. As the speaker pointed out, this is a compelling example of genetic-cultural co-evolution involving language.

He went on to describe his work on identifying links between anatomical vocal variation and language variation. This focussed on languages which include certain types of “click” sounds, based on the hypothesis that such languages are more likely to develop in communities where people typically have a very small alveolar ridge. His experiments seem to indicate that a small alveolar ridge does indeed make these sounds easier to produce, and that it is a common feature of the mouths of people from some areas where click languages arose. He gave the impression that this is the tip of the iceberg, and that there’s a lot more work to be done before any major conclusions can be drawn, but it seems like a promising line of study.

Constructing language phylogenies on different kinds of data – Stephan Eekman, University of Amsterdam

The speaker went through some phylogenies of North Germanic languages that he had generated with standard tools while varying the type of input data. He used lexical data (the type most widely used for inferring language phylogenies in the past), phonological data, and morphosyntactic data, and also ran his models on a dataset combining all of these types. He used the standard Swadesh word list, as well as a word list he had constructed himself of vocabulary relating to domestic animals (apparently inspired by a sheep he saw out the window while working on the project!).

He was surprised to find that by far the best performance seemed to come from the models run on the domestic animals data. He discussed some aspects of this type of vocabulary which might have contributed to these promising results. Overall, the talk raised points to consider when designing future studies rather than delivering any major conclusions of its own.

Applying three evolutionary models to linguistics – Andrew Meade, University of Reading

This wide-ranging talk covered three main topics. The first was heterogeneous rates of change on a phylogenetic tree, with examples of animal size change in biology and of language change with migration in linguistics. The second was phonemic phylogenetic influence, applying the concept of concerted evolution to identify regular sound changes in language change. The third was applying population genetics methods to word use, exploring why there are many widely used words for “sofa” but essentially just one for “axle”.

This talk was a strong demonstration of the potential of these methods from comparative biology when applied to linguistics, and suggested that the speaker has built up quite a wide-reaching body of work with these approaches.

