This is the first of a series of posts to outline how to infer a molecular phylogeny in RevBayes. My goal is to demonstrate the flexibility and power of RevBayes using simple, modifiable code snippets. These posts will additionally serve as a foundation to explore advanced techniques and models in the future – which is the fun part! Topic-specific tutorials are also available online.
But what is RevBayes? RevBayes is an open-source software package for Bayesian phylogenetic inference in which models are specified as probabilistic graphical models through an interactive language. This design allows researchers to estimate species relationships using a modeling interface that is simple, flexible, and efficient. As a result, the space of phylogenetic models becomes easier to navigate without becoming any smaller, so it can be explored with ease by empirical and theoretical biologists who may have great ideas but lack the time or programming skills to translate them into computer code.
Let’s concretely examine what this means for phylogenetic models. The most widely adopted phylogenetic models rely on four submodel components (or modules): the diversification process, the molecular substitution process, the branch-rate variation model (or relaxed clock model), and the site-rate variation model. Each of these modules comes in a variety of flavors. For example, there is a vast number of substitution processes, depending on which evolutionary features you wish to model: Do transitions and transversions occur at equal rates? Do all bases occur at equal frequencies? Should the process be time-reversible? Each of these models is fully described by an instantaneous rate matrix, and the models differ only in how the elements of that matrix are parameterized: are all rates fixed to be equal, all different, or somewhere in between? Ideally, a researcher should be able to compose her phylogenetic model not only from canonical modules described in the literature, but also from new types within any given class of modules that she imagines.
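To make the point about parameterization concrete, here is a minimal Python sketch (not RevBayes code) of a familiar family of nucleotide models. It assumes the standard time-reversible form, where each off-diagonal rate is q_ij = r_ij × π_j, with six exchangeability rates r_ij and stationary base frequencies π. The helper names `rate_matrix`, `jc69`, `hky85`, and `gtr` are hypothetical and chosen only for illustration. Each named model is the same construction, differing only in which constraints it places on the six rates and the base frequencies.

```python
# Sketch: every model below builds the same 4x4 time-reversible rate matrix,
# differing only in how its six exchangeability rates are constrained.
BASES = "ACGT"
PAIRS = ["AC", "AG", "AT", "CG", "CT", "GT"]  # six unordered base pairs
TRANSITIONS = {"AG", "CT"}  # purine<->purine and pyrimidine<->pyrimidine changes


def rate_matrix(exchangeabilities, freqs):
    """Return a normalized rate matrix with q_ij = r_ij * pi_j (i != j)."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                pair = "".join(sorted(a + b))  # "CA" and "AC" share one rate
                q[i][j] = exchangeabilities[pair] * freqs[j]
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    # Rescale so the expected substitution rate at equilibrium is 1.
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mean_rate for x in row] for row in q]


def jc69():
    """All six rates equal and all bases equally frequent."""
    return rate_matrix({p: 1.0 for p in PAIRS}, [0.25] * 4)


def hky85(kappa, freqs):
    """Transitions occur kappa times faster than transversions."""
    rates = {p: (kappa if p in TRANSITIONS else 1.0) for p in PAIRS}
    return rate_matrix(rates, freqs)


def gtr(rates, freqs):
    """All six rates free, given in PAIRS order."""
    return rate_matrix(dict(zip(PAIRS, rates)), freqs)


if __name__ == "__main__":
    for row in hky85(2.0, [0.25] * 4):
        print(["%.3f" % x for x in row])
```

With kappa = 2 and equal base frequencies, the first printed row is -1.000, 0.250, 0.500, 0.250: an A changes to G twice as often as it changes to C or T. Setting kappa to 1 recovers the equal-rates matrix, which illustrates how the "canonical" models are nested special cases of one parameterized structure.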
The following posts will explore how these modules interact and how they may be customized in RevBayes. As mentioned earlier, RevBayes specifies models through a programming language. Learning a new language begins with exposure, so a natural place to start is with a boilerplate phylogenetic model of molecular substitution.
For now, just give the code a glance and keep it in mind for the future. If you are well-versed in phylogenetic models, the structure of some variables should look familiar. The anatomy of the code will be covered in detail throughout the upcoming series of posts. By the end, you’ll be comfortable reading the code and modifying it to suit your dataset and interests. Below is a tentative outline, which will be updated with links to the posts as they’re completed.
Topics
- Graphical models
- Reading data
- Phylogenetic substitution models
- Rate matrices
- Substitution processes
- Diversification processes
- Relaxed clocks
- Site-rate variation
- Estimating the posterior
- Advanced analyses
- Scripting with RevBayes
Of course, feel free to send me an email or tweet with any feedback!