A tools-focussed talk: from raw linguistic data to reconstructed language trees

Today we hosted a talk by Gereon Kaiping from Leiden University. The talk went through the pipeline his group uses to go from linguistic data collected in the field to reconstructed phylogenetic trees of languages, using mostly off-the-shelf tools. There was a lively discussion with an audience of linguists, bioinformaticians, and statisticians.

Gereon has kindly made his slides available, which can be viewed below, or downloaded.


1 Comment

  1. cronjager says:

    This was a really interesting talk. It was fascinating to see how tools from bioinformatics (e.g. BEAST) were used to address questions superficially unrelated to genetics. This led to a good discussion after the talk about how the needs and interests of linguists might differ from those of bioinformaticians, and to what extent it would be possible to formulate a “computational linguist's wish list” of features that could be added to BEAST. The two main items we came up with were:

    * Better tools for model validation; that is to say, the ability to automatically detect when the assumptions made by BEAST when fitting trees are violated. Models employed by BEAST tend to be motivated by genetic applications, so validating them when they are employed in an entirely different domain would seem prudent. Ideally, it would also be possible to detect which modelling assumption is the unrealistic one (the questions “Do the data fit the model?” and “Why do the data not fit the model?” are fundamentally different).

    * The ability to fit multifurcating trees to data. In linguistics it is natural (I have been told) to regard phylogenies of languages or words as multifurcating. In genetics, such models have (as far as I am aware) only been considered in a limited context: for marine species with heavy-tailed offspring distributions (see e.g. [1], [2], or [3]).

    [1] Eldon, Bjarki, and John Wakeley. 2006. “Coalescent Processes When the Distribution of Offspring Number among Individuals Is Highly Skewed.” Genetics 172 (4): 2621–33. doi:10.1534/genetics.105.052175.

    [2] Birkner, Matthias, Jochen Blath, and Bjarki Eldon. 2013. “Statistical Properties of the Site-Frequency Spectrum Associated with Lambda-Coalescents.” Genetics 195 (3): 1037–53. doi:10.1534/genetics.113.156612.

    [3] Blath, Jochen, Mathias Christensen Cronjäger, Bjarki Eldon, and Matthias Hammer. 2016. “The Site-Frequency Spectrum Associated with Xi-Coalescents.” Theoretical Population Biology 110 (August): 36–50. doi:10.1016/j.tpb.2016.04.002.

