As someone interested in computational biology, much of what I do requires me to understand concepts from more than one discipline. Working across disciplines can be difficult. If a research team is trying to build a computer model of a biological phenomenon, it will usually include a mix of biologists and computer scientists. Biologists and computer scientists tend to have largely non-overlapping knowledge bases, often use different approaches to solve problems and, in short, speak different languages. Recently, however, I came across an example of how occasionally dipping into different fields can help you look at problems in a different way. I’m going to talk about phylogenetic trees, and how people have begun to use them to solve non-biological problems.

Wolves on a tree (The Jungle Book, 1926).

What is a phylogenetic tree?
Phylogenetic trees are a tool used to represent hypotheses about the evolutionary relationships between species (or groups of species), and to visualise their evolutionary history. They are similar in concept to family trees, but for species. The ‘tree of life’ is a term sometimes used to refer to a phylogenetic tree which spans the entire range of life on Earth. Using a complete tree of life you could trace the evolutionary path of the first living cells through innumerable speciation and extinction events, all the way to the species we see on Earth today. A stylised, interactive tree of life can be accessed here, provided by the OU.

Phylogenetic trees are constructed using a wide range of techniques and data, but they all rely on the same three assumptions:

  • Over time the characteristics of species change.
  • All life on Earth can (in theory) be traced back to a single common ancestor, meaning any two species share a common ancestor if you trace their evolutionary history back far enough.
  • The evolutionary path of species can branch, creating two distinct populations which subsequently each follow their own evolutionary paths.

A simple phylogenetic tree is shown below. The tree should be read with time moving forward from the bottom of the tree to the top. The tips of the blue, purple and yellow segments represent particular modern-day species, while the rest of the tree represents their evolutionary past. The red section shows the evolutionary history which is common to all three species. As you follow the tree forward in time (upward) it branches at label B, indicating a speciation event. One of the branches eventually evolves into modern-day macaques; the blue segment therefore represents the evolutionary history which is unique to macaques. The other branch (green segment) splits again at label C into two branches which lead to chimpanzees and humans. Thus, the red and green segments together show the common evolutionary history of chimpanzees and humans.

A simple example of a phylogenetic tree. Time proceeds from the bottom to the top.
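To make the example tree concrete, here is a minimal Python sketch of it as a data structure. The labels B and C come from the description above; the root label "A" and the function names are my own assumptions for illustration. Each node simply points to its parent, and the most recent common ancestor of two species is the first node their lineages share.

```python
# Parent pointers for the example tree. "B" and "C" are the branch points
# named in the text; "A" is an assumed label for the root of the tree.
PARENT = {
    "B": "A",
    "macaque": "B",
    "C": "B",
    "chimpanzee": "C",
    "human": "C",
}


def lineage(node):
    """Return the path from a node back to the root, starting with the node."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def most_recent_common_ancestor(a, b):
    """Walk up from `a` and return the first node that also lies on `b`'s lineage."""
    ancestors_of_b = set(lineage(b))
    for node in lineage(a):
        if node in ancestors_of_b:
            return node
    return None


print(most_recent_common_ancestor("human", "chimpanzee"))  # → C
print(most_recent_common_ancestor("human", "macaque"))     # → B
```

Humans and chimpanzees meet at C, while either of them meets macaques further back at B, which is exactly what the coloured segments in the figure show.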

Constructing phylogenetic trees
Phylogenetic trees are usually used to describe the relationships between much smaller sets of organisms than the whole tree of life. We do not always have all the information needed to construct a phylogenetic tree, so biologists must look for clues in genetic data, the fossil record and anatomical comparisons of past and present species. Looking for similarities and differences between species in this data allows them to infer what the evolutionary relationships may be. The inference step often makes use of computer algorithms which can process large amounts of data quickly. Phylogenetic trees can then be constructed using the most likely combination of relationships. This allows us to gather evidence to support (or contradict) hypotheses such as “birds are dinosaurs”.
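As a rough illustration of the inference step, here is a minimal Python sketch of one of the simplest distance-based approaches, average-linkage clustering (often called UPGMA). It is only a toy: real studies typically use more sophisticated methods, and the distances below are numbers I made up, not measurements. The idea is to repeatedly join the two closest groups until a single tree remains.

```python
from itertools import combinations


def upgma(distances):
    """Build a tree by repeatedly merging the two closest clusters.

    `distances` maps frozenset({a, b}) to the distance between taxa a and b.
    Returns the tree topology as nested tuples.
    """
    taxa = sorted({name for pair in distances for name in pair})
    # Each cluster is a (subtree, members) pair.
    clusters = [(name, (name,)) for name in taxa]

    def average_distance(members_x, members_y):
        # Mean distance over every cross-cluster pair of taxa.
        total = sum(
            distances[frozenset((a, b))] for a in members_x for b in members_y
        )
        return total / (len(members_x) * len(members_y))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Made-up distances for illustration only (smaller means more similar).
TOY_DISTANCES = {
    frozenset(("human", "chimpanzee")): 1.0,
    frozenset(("human", "macaque")): 5.0,
    frozenset(("chimpanzee", "macaque")): 5.0,
}

print(upgma(TOY_DISTANCES))  # → ('macaque', ('chimpanzee', 'human'))
```

With these toy numbers the chimpanzee and human are grouped first and the macaque joins them afterwards, giving the same branching order as the example tree.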

Phylomemetic trees for cultural analysis
“What does this all have to do with wolves and grandmothers? I was promised grandmother-eating wolves!” you might find yourself exclaiming. You’ll have to bear with me a little longer. It turns out that phylogenetic trees can be used to represent the history of other evolving systems. In particular, they are being used to analyse the evolution of cultural phenomena. The term ‘meme’, introduced by Richard Dawkins, refers to a unit of culture and can be thought of as the cultural equivalent of a gene. Memes can spread and change (mutate) like biological genes, except that instead of using biological bodies as vessels for transmission they exist in cultural phenomena. This includes things like stories, language, crafts, farming techniques, architecture, and even religion. The study of culture as a self-propagating phenomenon is called memetics. Phylogenetic trees which are used to explore subjects in memetics are often referred to as phylomemetic trees.

Phylomemetics for analysing folk stories
Recently people have been constructing phylomemetic trees to try to learn about the evolutionary history of folk stories. A few weeks ago Tehrani published a paper (open access) in which he constructs phylomemetic trees to explore the relationships between a group of similar folk stories. Folk stories are good contenders for analysis using phylogenetic/phylomemetic trees because similar assumptions can be made about their evolution:

  • Folk stories change over time as they’re told and retold. They’re often never written down, making them especially susceptible to this sort of ‘mutation’ (the children’s game ‘Chinese whispers’ / ‘telephone’ comes to mind).
  • As folk stories change and spread, they can branch, creating new versions of the story which change independently of one another and adapt to fill new cultural niches.
  • Therefore, it should be possible to compare stories for common plot elements and trace back the evolutionary history of related stories and pinpoint common ancestry.

However, it doesn’t make sense to assume that all stories ever told since the dawn of language stem from the same original root tale, so it is unrealistic to expect to find a common ancestor for any given pair of stories. That doesn’t stop phylomemetic trees being useful for confirming or rejecting hypotheses about whether groups of stories are related. A second complication is that stories can merge as well as branch apart. Storytellers may be inspired by multiple stories, and they are free to mix and match them to create new stories on an ad hoc basis. Similar phenomena can be found from time to time in biology. For example, some species of orchid can fertilise other species of orchid to create hybrids, and many bacteria are able to exchange genetic material with each other. However, in the grand scheme of things, these exceptions are relatively rare in biology. They may not be so rare in memetics.

Which antagonist would you choose? Pictures by Lennart Tange (wolf), Lucie Alyre (tiger) and Poorna Kedar (crow).

The wolves
Finally, we get to the wolves. Tehrani explored the evolutionary history of 58 similar stories which include the well-known story (at least in Europe and North America) of Little Red Riding Hood. Another of the stories used is The Wolf and the Kids, which is popular in some parts of Europe and the Middle East. The Wolf and the Kids tells of a goat who leaves her kids alone while she goes to the field, warning them not to open the door. A wolf, however, tricks the kids into opening the door by impersonating her, and subsequently eats them all. Some of the stories feature a tiger (Japan and China) or an ogre (in some parts of Africa) as the antagonist instead of a wolf. Others differ more considerably but retain similar themes: in India there is a story about a crow which tricks a sparrow into leaving her nest so that it can eat her hatchlings.

The stories were gathered from 33 different populations around the world. 72 plot variables were identified and recorded for each story. Variables included things like the gender of the protagonist and whether or not the victim is eaten, rescued or escapes. By cataloging the differences and similarities between the plots of each story it was possible to construct phylomemetic trees of the stories. The resulting trees are interpretations showing how closely related to one another each story variant is. Note: the phylomemetic trees constructed in the paper (and shown below) are unrooted trees. This means that they only show how closely related different stories are and do not indicate which story came first.
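To give a feel for how cataloguing plot differences turns into numbers, here is a minimal Python sketch. Everything in it is hypothetical: the three stories, the three variables and their values are invented for illustration and are not data from the paper, and the paper’s own analyses are considerably more involved. Each story is coded as a set of plot variables, and the distance between two stories is the fraction of variables on which they differ.

```python
from itertools import combinations

# Hypothetical codings, invented for illustration; not data from the paper.
STORIES = {
    "story_a": {
        "antagonist": "wolf",
        "protagonist_gender": "female",
        "victim_fate": "rescued",
    },
    "story_b": {
        "antagonist": "wolf",
        "protagonist_gender": "female",
        "victim_fate": "eaten",
    },
    "story_c": {
        "antagonist": "tiger",
        "protagonist_gender": "male",
        "victim_fate": "escapes",
    },
}


def plot_distance(x, y):
    """Fraction of shared plot variables on which two stories differ."""
    variables = x.keys() & y.keys()
    return sum(x[v] != y[v] for v in variables) / len(variables)


def distance_matrix(stories):
    """Pairwise distances, keyed by the unordered pair of story names."""
    return {
        frozenset((a, b)): plot_distance(stories[a], stories[b])
        for a, b in combinations(sorted(stories), 2)
    }


for pair, distance in distance_matrix(STORIES).items():
    print(sorted(pair), round(distance, 2))
```

In this toy data, story_a and story_b differ only in the victim’s fate, so they come out much closer to each other than either is to story_c. A table of pairwise distances like this is the kind of input that tree-building methods can work from.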

One of the trees produced from Tehrani’s analysis. ATU333 is the name of a group of stories which contains Little Red Riding Hood; ATU123 is the name of a group containing The Wolf and the Kids. For a more detailed description I highly recommend taking a look at the original paper. Image is taken from Tehrani’s paper “The Phylogeny of Little Red Riding Hood” (open access).

Other studies have combined phylomemetic trees with historic, geographic and ethnolinguistic data to explore human population structure and migration. Others have used them to study the evolution of Indonesian architecture. They have even been used to map, and possibly predict, trends in the development of mobile phones and other electronic equipment.

The trees are only as good as the data and methods used to construct them, and there is no guarantee that they accurately represent the evolutionary history of these stories. Instead they show a likely reconstruction based on the data you feed them. However, I think it is an interesting insight into the possible origins of some of our most well known stories, and demonstrates how powerful phylogenetic / phylomemetic analysis can be.



Someone sent me a link to this (credit goes to qbsuperstar03). It’s an article which describes constructing a phylogenetic tree of Pokemon evolution using Bayesian Markov chain Monte Carlo analysis! If anyone knows of any other fun phylogenetic trees or interesting studies you can post them in the comments.