Another day of phylogenetic methods in linguistics

Today we enjoyed six talks at the workshop in Tübingen, which we summarise below. We have also summarised yesterday’s talks. Update: also Thursday’s talks!

Further evidence for punctuated language evolution – Gerhard Jäger, University of Tübingen

This talk discussed the concept of punctuated evolution – that is, evolution where the most active phase of change happens just after speciation takes place. In biology this has been suggested as an explanation for the relatively few ‘intermediate stage’ fossils that are found – it seems that it’s often the case that a species arises, quickly evolves into a relatively stable state, and stays fairly unchanged for some time. It has been suggested that the same phenomenon might occur in language change (e.g. by Dixon in 1997).

The study reproduced two methods: one from Atkinson et al. (2008), which works on manually labelled lists of cognate pairs, and one from Holman and Wichmann (2016), which uses language distance (without needing labelled cognate data). Overall, the results seemed to suggest that punctuated evolution may indeed be taking place to some extent in language change.
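To give a feel for what a "language distance" without cognate labels can look like, here is a minimal sketch. It is only an assumption on our part: the talk summary above does not say how the distance was computed, and this is not presented as Holman and Wichmann's measure. It averages a length-normalised edit distance over the concepts two word lists share. The function names and the toy word lists are hypothetical.

```python
def levenshtein(a, b):
    """Plain edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the longer word's length, so it lies in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(words_a, words_b):
    """Mean normalised distance over the concepts present in both word lists.

    words_a and words_b map a concept label to a word form (hypothetical layout).
    """
    shared = words_a.keys() & words_b.keys()
    if not shared:
        raise ValueError("the two word lists share no concepts")
    return sum(normalized_distance(words_a[c], words_b[c]) for c in shared) / len(shared)


# Invented toy data, purely for illustration.
lang_a = {"hand": "hand", "water": "water"}
lang_b = {"hand": "hand", "water": "vasser"}
d = language_distance(lang_a, lang_b)
```

Because no step asks whether two words are cognate, a measure of this kind can be computed for any pair of word lists, which is the appeal of the distance-based route.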

Building histories of Slavic on parallel texts – Ruprecht von Waldenfels, University of Zurich

This talk differed from most others at the workshop in two main ways: it examined a language family whose history is already known in some detail, and its methods revolved around the use of parallel texts rather than word lists or other data.

Taking texts which have been translated into all of the languages considered, the study looked at different language features individually, finding different connections between languages. Since the history of the language family is fairly well known, the speaker was able to explain the nature and history of these different relationships for individual language features. This seemed a step forward for Slavic language studies, confirming with largely automated analysis what had previously required much more manual work. It also sent a strong message to the audience: many methods in use (e.g. those based on language similarity) may induce a single history, but in fact there can be many histories behind the relationships between languages in any particular group.

Reconstructing language ancestry by performing word prediction – Peter Dekker, University of Amsterdam

This talk described a project that uses recurrent neural networks with an encoder-decoder structure to detect cognates, in a supervised machine learning framework. The process is analogous to problems in machine translation, where neural network approaches have been applied with some success, and the project draws on progress in that field. The neural network is trained on pairs of words corresponding to the same concept in different languages. Since the method avoids relying on manual labelling of cognate and non-cognate pairs, the goal is to take in all word pairs but learn mainly from the cognate ones, which is achieved through the design of the loss function. Overall it seemed that the project was reaching a baseline of success in line with existing models, and that it had promising scope for tweaks to improve its performance further.

Sound change phylogeny in Uralic family trees and networks – Jyri J. F. Lehtinen, University of Helsinki

This talk began by acknowledging some of the criticisms that linguists have aimed at the use of phylogenetic methods in historical linguistics in general. A primary complaint was “garbage in, garbage out”. In response, the speaker described a study which involved a very careful process of data selection. The study looked at shared innovations in Uralic languages by comparing reconstructed protoforms and attested forms of words – taking only the words with the most reliable, stable, and regular reconstructed protoforms known from the literature (taking care to avoid including data which has been superseded or isn’t considered reliable).

The study focussed on phonological data, and compared the results of phylogeny reconstruction with this data to other studies using lexical data, as well as trees constructed from qualitative approaches. The results seemed very positive for the approach.

Deep learning and historical linguistics: two case studies – Taraka Rama, University of Tübingen

As well as giving a high-level introduction to neural networks, this talk discussed their use in two linguistics applications: cognate identification and dialect classification. For cognate identification, convolutional neural networks were used. This approach doesn’t require explicit character alignment, and the network was designed with a structure which allows word relatedness and language relatedness, which both inform cognate inference, to be learned simultaneously. The results were positive, even with relatively small data sizes.

For dialect classification, an unsupervised learning approach was taken. Autoencoders were trained on large numbers of words encoded in IPA format, without the need for explicit manual alignment or cognacy judgments. The approach produced some interesting and good-looking maps of dialect distribution in a few different countries.

Tracking modern human population history from linguistic and cranial phenotype – Hugo Reyes-Centeno, University of Tübingen

This talk took a very creative approach to a conjecture first raised by Darwin – essentially, how much human genealogy can tell us about language genealogy. To examine this relationship, the study made use of the “serial founder effect”, which says that less genetic diversity is seen as a population moves further from its starting point (each time a new population is established, it is drawn from only some fraction of the previous, larger population, so the gene pool of the new population is based on a subset of the original gene pool). The study investigated whether there’s a relationship between linguistic diversity and genetic diversity.
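The serial founder effect described above is easy to see in a toy simulation. The sketch below is ours, not the speaker's model: every parameter (number of alleles, population size, founder fraction, number of founding events) is an arbitrary assumption, and "diversity" is simply the count of distinct alleles still present. Each new population is founded by a small random sample of the previous one and then regrows by copying its founders.

```python
import random


def serial_founder_diversity(n_alleles=200, pop_size=1000,
                             founder_fraction=0.05, n_steps=8, seed=0):
    """Return the number of distinct alleles after each founding event.

    All parameter values are illustrative assumptions, not figures from the talk.
    """
    rng = random.Random(seed)
    # Source population: each individual carries one allele, drawn uniformly.
    population = [rng.randrange(n_alleles) for _ in range(pop_size)]
    diversity = [len(set(population))]
    n_founders = max(1, int(pop_size * founder_fraction))
    for _ in range(n_steps):
        # A small group leaves and founds the next population further away...
        founders = rng.sample(population, n_founders)
        # ...which regrows to full size using only the founders' alleles.
        population = [rng.choice(founders) for _ in range(pop_size)]
        diversity.append(len(set(population)))
    return diversity


diversity_by_step = serial_founder_diversity()
```

Since every new population can only contain alleles its founders carried, the count of distinct alleles can never rise from one step to the next, whatever the random seed. Real populations also gain diversity through mutation and migration, which this sketch deliberately ignores.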

Properties of cranial bone fragments were used as a phenotypic proxy for genotypic data. The study compared skull fragments from various regions with the language diversity of those regions, controlling for the effects of geography in terms of distance from the widely accepted origin population of humanity in Africa. Overall there was no statistically significant signal of a relationship, but the speaker discussed further aspects of the serial founder effect which could be investigated to get a more detailed picture.
