This post is part of a series exploring features in RevBayes.
Whether working on genomic datasets or conducting a simulation study, research projects often require identical or nearly identical jobs to be replicated across vast numbers of datasets.
The front-end of RevBayes is an interpreted language, which provides users with an agile and powerful interface for scripting.
Let’s set up a little scenario to make it easy on the imagination.
Suppose your task is to assess how sensitive posterior tree probabilities are to the choice between one simple and one complex model: say, the Jukes-Cantor model, where all substitution rates are equal, and the Felsenstein 81 model, where base frequencies are free parameters to be estimated.
You’ve stored multiple sequence alignments for all the genes of interest in the folder genes in NEXUS format.
You want to use RevBayes to estimate the posterior density for each gene, once assuming the rate matrix is fnJC and a second time assuming it is fnF81.
Instead of repeatedly modifying and running your RevBayes script, rb_gene_model.Rev, you can automate the job in bash using the echo command and pipes (|).
To follow along or download the scripts, issue these commands in the shell.
First, we’ll create a RevBayes script called rb_gene_model.Rev that expects three pre-defined variables: data_file gives the name of the gene alignment, job_id gives the analysis an identity that matches the filename, and type_Q gives the type of rate matrix to use.
Next we’ll create a bash script called rb_gene_job.sh.
When called, RevBayes treats its arguments as files to be sourced, which means we can pipe (|) the standard output of echo into RevBayes as though it were a source file.
Perfect for scripting!
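As a rough illustration of the piping idea, here is a minimal sketch of what such a job wrapper might look like. It is not the original rb_gene_job.sh: the function names build_rev_cmd and run_job, the RB_BIN override, and the assumption that the RevBayes executable is called rb and is on your PATH are all hypothetical. The three variable names and the model script name come from the text above.

```shell
#!/usr/bin/env bash
# Sketch of a per-job wrapper in the spirit of rb_gene_job.sh (not the original).
# Assumption: the RevBayes binary is `rb`; RB_BIN is a hypothetical override.

# Print the Rev preamble: define the three variables the model script expects,
# then source the model script itself.
build_rev_cmd() {
    local data_file="$1" job_id="$2" type_Q="$3"
    printf 'data_file = "%s"\njob_id = "%s"\ntype_Q = "%s"\nsource("rb_gene_model.Rev")\n' \
        "$data_file" "$job_id" "$type_Q"
}

# Pipe the preamble into RevBayes.
run_job() {
    build_rev_cmd "$@" | "${RB_BIN:-rb}"
}

# Example call (the file name and the "JC" label are illustrative):
#   run_job genes/gene1.nex gene1.JC JC
```

Setting RB_BIN=cat is a cheap way to inspect exactly what would be sent to RevBayes without starting an analysis.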
Third, we’ll create rb_gene_batch.sh to repeatedly call rb_gene_job.sh with different combinations of arguments.
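The batch step boils down to a nested loop over genes and models. The sketch below only enumerates the argument combinations. It assumes, hypothetically, that alignments end in .nex, that job_id takes the form gene.model, and that type_Q is passed as the string JC or F81. None of these details are confirmed by the original script.

```shell
#!/usr/bin/env bash
# Sketch of the batch loop (not the original rb_gene_batch.sh).
# Assumptions: alignments are named <gene>.nex; job_id is "<gene>.<model>".

# Print one line per job: data_file, job_id, type_Q.
list_jobs() {
    local gene_dir="${1:-genes}" f gene model
    for f in "$gene_dir"/*.nex; do
        [ -e "$f" ] || continue            # no matches: skip the literal pattern
        gene="$(basename "$f" .nex)"
        for model in JC F81; do
            printf '%s %s.%s %s\n' "$f" "$gene" "$model" "$model"
        done
    done
}

# Example dispatch, assuming rb_gene_job.sh takes the three values in this order:
#   list_jobs genes | while read -r data_file job_id type_Q; do
#       ./rb_gene_job.sh "$data_file" "$job_id" "$type_Q"
#   done
```

Separating enumeration from dispatch makes it easy to preview the job list before launching anything. Note that the space-delimited format would break on file names containing spaces.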
Note that debug=0 by default, which runs RevBayes in the background (&) and, via nohup, keeps it from being hung up when the terminal is closed.
These are useful features when running jobs on clusters.
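The debug switch can be sketched as a small launcher function. This is an illustration under assumptions rather than the original code: the function name launch and the log-file argument are hypothetical, and running in the foreground when debug=1 is a guess at the intended behavior.

```shell
#!/usr/bin/env bash
# Sketch of a debug switch around nohup and & (not the original script).
# Assumption: debug=1 means "run in the foreground and print to the terminal".

launch() {
    local log="$1"; shift
    if [ "${debug:-0}" -eq 1 ]; then
        "$@"                                       # foreground, output visible
    else
        # Background job that ignores the hangup signal; all streams are
        # redirected so nothing is tied to the terminal.
        nohup "$@" < /dev/null > "$log" 2>&1 &
    fi
}

# Example (job.log and the arguments are illustrative):
#   launch job.log ./rb_gene_job.sh genes/gene1.nex gene1.JC JC
```

Redirecting output to a per-job log also avoids every background job appending to the same nohup.out file.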
Everything is in place, and we can run the script.
…and, in time, the results will appear in the output directory.