We are already in the genomic era, in which biparentally inherited nuclear genome data are increasingly used to test the tree-of-life framework established with plastid data. However, not all gene trees tell the same story, not even those from the most homogeneous genes of the plastid. Nuclear genes add further complications: they may have more than one copy, and other potential biological processes (e.g., hybridization, incomplete lineage sorting (ILS), or horizontal gene transfer) can produce discordance; also see Sun et al. (2015). Smith et al. (2015) did a great job of summarizing the current situation and setting examples of how to visualize concordance information in animal and plant genomic phylogenies. They also developed open-source Java software for this job: phyparts.
In this post I will use 322 Sassafras gene trees as an example to show you how to use ASTRAL-III to estimate a species tree, how to use phyparts to summarize the conflict and concordance information of those individual homologous gene regions, and finally how to use PhyParts PieCharts to visualize the phyparts output. These data were generated by the target enrichment method using the Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes under the Dimension project. My other relevant workflows and scripts are available here.
Note: Matt Johnson has a great tutorial on how to run his script using Jupyter Notebook. The interpretation of the pie charts is well explained there as well.
My instructions here focus on the overall procedure, from gene trees all the way down to the pie chart. Hopefully, I will be able to integrate all the steps into one pipeline.
Here we need to assess how well a number of gene trees agree with our species tree, and display this discordance and agreement information in a satisfying visualization.
1. Building phylogenies, so that we have all the gene trees
- My data were generated with the 353-gene target enrichment method
- I used 322 gene trees from Sassafras samples as an example
- All trees were rooted with phyx, The Newick Utilities, or DendroPy.
2. Species tree estimation
- The species tree was estimated with ASTRAL-III; see the tutorials here
- collapse low-support branches with The Newick Utilities, as in the ASTRAL-III tutorial (this example contracts branches with bootstrap support of 10 or less):
nw_ed 1KP-genetrees.tre 'i & b<=10' o > 1KP-genetrees-BS10.tre
- run ASTRAL-III on the collapsed gene trees, e.g.,
java -jar astral.5.6.3.jar -i collapse_genetrees.tre -o output_species_tree.tre 2> running.log
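The two commands in this step can be chained from Python. The following is a minimal sketch, not part of the original workflow: the function names `collapse_cmd`, `astral_cmd`, and `run_pipeline` are hypothetical, and it assumes `nw_ed`, `java`, and the ASTRAL jar are reachable from the working directory. It only reuses the arguments already shown above.

```python
import shlex
import subprocess


def collapse_cmd(gene_trees, threshold=10):
    """Build the nw_ed command that contracts branches with support <= threshold."""
    return ["nw_ed", gene_trees, f"i & b<={threshold}", "o"]


def astral_cmd(collapsed_trees, species_tree, jar="astral.5.6.3.jar"):
    """Build the ASTRAL-III command shown in this step (jar name is an assumption)."""
    return ["java", "-jar", jar, "-i", collapsed_trees, "-o", species_tree]


def run_pipeline(gene_trees, collapsed_trees, species_tree, log="running.log"):
    """Hypothetical helper: collapse low-support branches, then run ASTRAL-III."""
    # nw_ed writes the edited trees to stdout, so redirect it into a file.
    with open(collapsed_trees, "w") as out:
        subprocess.run(collapse_cmd(gene_trees), stdout=out, check=True)
    # ASTRAL reports progress on stderr; keep it in a log file like `2> running.log`.
    with open(log, "w") as err:
        subprocess.run(astral_cmd(collapsed_trees, species_tree), stderr=err, check=True)


# Print the commands instead of running them, so the sketch is safe to try.
print(shlex.join(collapse_cmd("1KP-genetrees.tre")))
print(shlex.join(astral_cmd("collapse_genetrees.tre", "output_species_tree.tre")))
```

Calling `run_pipeline("genetrees.tre", "collapse_genetrees.tre", "output_species_tree.tre")` would execute both steps; `check=True` stops the run if either tool exits with an error.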
3. Summary statistics on conflict, concordance, and even gene duplications
- Run phyparts following the instructions on its repo webpage.
- Phyparts “conflict” option command:
java -jar target/phyparts-0.0.1-SNAPSHOT-jar-with-dependencies.jar -a 1 -v -d gene_trees -m ASTRAL_species_tree -o output_name
Note: run the command below to see all the phyparts options (see snapshot below):
java -jar target/phyparts-0.0.1-SNAPSHOT-jar-with-dependencies.jar
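The `-d gene_trees` argument above points at a directory, whereas ASTRAL takes a single file of gene trees. A minimal sketch for going from one to the other follows. It is an illustration only: `split_gene_trees` is a hypothetical helper, and it assumes the input file holds one rooted Newick tree per line and that phyparts reads every file in the directory it is given.

```python
import tempfile
from pathlib import Path


def split_gene_trees(multi_tree_file, out_dir, prefix="gene"):
    """Write each non-empty line (one Newick tree) to its own file; return the count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for line in Path(multi_tree_file).read_text().splitlines():
        newick = line.strip()
        if not newick:
            continue  # skip blank lines between trees
        count += 1
        (out / f"{prefix}_{count:04d}.tre").write_text(newick + "\n")
    return count


# Small demonstration with toy trees in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    toy = Path(tmp) / "toy_genetrees.tre"
    toy.write_text("((A,B),C);\n((A,C),B);\n((B,C),A);\n")
    n_trees = split_gene_trees(toy, Path(tmp) / "gene_trees")
    print(n_trees, "gene trees written")
```

The returned count is also the number of gene trees that went into the analysis, which is handy to keep for the plotting step.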
4. Mapping this information onto the species tree
- You need the PhyParts PieCharts Python script and tutorial from Matt Johnson
Note: please read the tutorial; Python 2.7 and ETE3 with the graphical options need to be installed before running the Python script
- how to run the script:
python phypartspiecharts.py species_tree output_name gene_number
output_name must be the same as the name you used at the phyparts step (# 3)
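To avoid typing the gene number by hand, the command can be assembled from the gene tree directory. This is a minimal sketch under stated assumptions: `piecharts_cmd` is a hypothetical helper, `gene_number` is taken to be the number of gene tree files given to phyparts, and the script is assumed to sit in the working directory.

```python
import shlex
import tempfile
from pathlib import Path


def piecharts_cmd(species_tree, output_name, gene_tree_dir, script="phypartspiecharts.py"):
    """Build the PhyParts PieCharts command, counting gene trees in gene_tree_dir."""
    # Count regular files only and ignore hidden ones such as .DS_Store.
    gene_number = sum(
        1 for p in Path(gene_tree_dir).iterdir()
        if p.is_file() and not p.name.startswith(".")
    )
    return ["python", script, species_tree, output_name, str(gene_number)]


# Demonstration with three toy gene tree files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"gene_{i}.tre").write_text("((A,B),C);\n")
    print(shlex.join(piecharts_cmd("species_tree.tre", "output_name", tmp)))
```

The resulting list can be passed to `subprocess.run(..., check=True)` in the Python 2.7 environment that has ETE3 installed.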
5. Visualize the gene tree support and conflict statistics with pie charts
- the output of PhyParts PieCharts is in SVG format, so you need Inkscape or AI (Adobe Illustrator) to convert it to another format
See my plot below: