Demonstrating gene tree conflict with Phyparts Piecharts

We are already well into the genomic era, and biparentally inherited nuclear genome data are now widely used to test the framework of the tree of life that was established with plastid data. However, not all gene trees tell the same story, not even those from the plastid genome, whose genes are the most homogeneous. Nuclear genes add further complications: they may have more than one copy, and their histories can be shaped by other biological processes such as hybridization, incomplete lineage sorting (ILS), or horizontal gene transfer; also see Sun et al. (2015). Smith et al. (2015) did a great job of summarizing the current situation and of setting examples for visualizing concordance information in genomic phylogenies of animals and plants. They also developed an open-source Java program for this job: phyparts.

In this post I will use 322 Sassafras gene trees as an example to show how to use ASTRAL-III to estimate a species tree, how to use phyparts to summarize the conflict and concordance among those individual homologous gene regions, and finally how to use PhyParts PieCharts to visualize the phyparts output. These data were generated by target enrichment using the Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes, under the Dimension project. My other relevant workflows and scripts are available here.

Note: Matt Johnson has a great tutorial on how to run his script using Jupyter Notebook. It also explains clearly how to interpret the pie charts.

My instructions here focus on the overall procedure, from the gene trees all the way down to the pie chart. Hopefully, I will be able to integrate all the steps into one pipeline.

General steps

Here we need to assess how well a number of gene trees agree with our species tree, and to display this discordance and agreement information in a satisfying visualization.

1. Building phylogeny, so that we have all gene trees

  • My data were generated with the 353-gene target enrichment method

  • I used 322 gene trees from Sassafras samples as the example

  • Note:

    • All trees were rooted with phyx, The Newick Utilities, or DendroPy
    • Although ASTRAL accepts gene trees whether they are rooted or unrooted, rooted trees are preferred for the downstream phyparts analysis, so that all gene trees and the species tree are rooted in the same direction (see Matt Johnson’s post).
    • As quoted from the ASTRAL tutorial:
      “Importantly, we will reroot the tree at the correct node, which is always necessary, since the rooting of the ASTRAL trees is arbitrary and meaningless.”

    Therefore, rooting all the trees (the species tree and the gene trees) is preferred for PhyParts PieCharts. When you have hundreds of gene trees, however, the outgroups are sometimes not present in every gene tree. In these cases, I recommend using either pxrr from phyx or MAD (which roots the tree by Minimal Ancestor Deviation); the latter works well even for trees without any outgroups. For details, see my other post here.
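To decide which gene trees can be rooted on an outgroup and which need MAD, it helps to first check where the outgroup is actually present. The snippet below is only a minimal sketch using the Python standard library: it assumes plain Newick strings with unquoted tip labels, and the gene names, taxon names, and the two helper functions are hypothetical placeholders, not part of phyx, MAD, or my scripts.

```python
import re

def taxa_in_newick(newick):
    """Return the set of tip labels in a Newick string (unquoted labels only)."""
    # A tip label directly follows "(" or "," and ends at ":", ",", ")" or ";".
    return set(re.findall(r"[(,]\s*([^():,;\s]+)", newick))

def split_by_outgroup(trees, outgroups):
    """Split {gene_name: newick} into genes with and without any outgroup tip."""
    with_og, without_og = [], []
    for name, newick in sorted(trees.items()):
        if taxa_in_newick(newick) & set(outgroups):
            with_og.append(name)
        else:
            without_og.append(name)
    return with_og, without_og

# Hypothetical toy trees; the labels are placeholders, not the real samples.
trees = {
    "gene001": "((A:0.1,B:0.2)90:0.05,(C:0.1,OUT:0.3)80:0.02);",
    "gene002": "((A:0.1,B:0.2)75:0.05,C:0.1);",
}
with_og, without_og = split_by_outgroup(trees, ["OUT"])
print("root on the outgroup:", with_og)
print("root with MAD:", without_og)
```

The first list can then go to an outgroup-based rooting tool and the second to MAD.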

2. Species tree estimation

  • The species tree was estimated with ASTRAL-III; see the tutorials here

  • Note: Collapsing gene tree nodes with bootstrap (BS) support below a certain value (say 10%; see the commands below) helps to improve accuracy; sometimes raising the collapse threshold yields better results.
    There are many ways to do this, e.g., phyx or The Newick Utilities.

    • Using The Newick Utilities, as in the example from the ASTRAL-III tutorial:
      nw_ed 1KP-genetrees.tre 'i & b<=10' o > 1KP-genetrees-BS10.tre
    • Then run ASTRAL-III on the collapsed gene trees:
      java -jar astral.5.6.3.jar -i collapse_genetrees.tre -o output_species_tree.tre 2> running.log
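To make clear what "collapsing" a low-support node means, here is a rough pure-Python sketch of the idea: every internal branch whose support label is at or below the threshold is removed, and its children are attached directly to its parent. This is an illustration only, not how nw_ed is implemented. It assumes well-formed Newick ending in ";" with numeric support values as internal node labels, and adding the removed branch length to the children is my own choice here (ASTRAL uses only the topologies).

```python
import re

def parse(newick):
    """Parse a Newick string into nested dicts with children, label, length."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                          # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1                      # consume ","
                children.append(node())
            pos += 1                          # consume ")"
        label, length = "", None
        if tokens[pos] not in ("(", ")", ",", ";"):
            label, _, raw = tokens[pos].partition(":")
            length = float(raw) if raw else None
            pos += 1
        return {"children": children, "label": label, "length": length}

    return node()

def support_of(node):
    """Numeric support of an internal node, or None if absent or not numeric."""
    try:
        return float(node["label"]) if node["children"] else None
    except ValueError:
        return None

def collapse(node, threshold):
    """Splice out internal branches with support <= threshold (never the root)."""
    kept = []
    for child in node["children"]:
        collapse(child, threshold)
        support = support_of(child)
        if support is not None and support <= threshold:
            for grandchild in child["children"]:
                if grandchild["length"] is not None and child["length"] is not None:
                    grandchild["length"] += child["length"]
                kept.append(grandchild)
        else:
            kept.append(child)
    node["children"] = kept
    return node

def to_newick(node):
    """Serialize a node (without the trailing semicolon)."""
    text = ""
    if node["children"]:
        text = "(" + ",".join(to_newick(c) for c in node["children"]) + ")"
    text += node["label"]
    if node["length"] is not None:
        text += ":%g" % node["length"]
    return text

def collapse_newick(newick, threshold):
    return to_newick(collapse(parse(newick), threshold)) + ";"

# Hypothetical toy tree: the (A,B) clade has only 5% support.
print(collapse_newick("((A:0.1,B:0.2)5:0.05,(C:0.1,D:0.3)80:0.02);", 10))
```

In the toy tree, the weakly supported (A,B) clade becomes part of a polytomy, while the clade with 80% support is left untouched.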

3. Summary statistics on conflict, concordance, and even gene duplications

  • Run phyparts following the instructions on its repository page.
  • Phyparts “conflict” option command:
    java -jar target/phyparts-0.0.1-SNAPSHOT-jar-with-dependencies.jar -a 1 -v -d gene_trees -m ASTRAL_species_tree -o output_name

Note: running the command below without any arguments lists all the phyparts options (see snapshot below):
java -jar target/phyparts-0.0.1-SNAPSHOT-jar-with-dependencies.jar
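As a first step toward a single pipeline, the two Java commands above can be assembled from Python. The sketch below only builds and prints the commands (a dry run) unless you ask it to execute them. The jar paths, file names, and flags are copied from the commands in this post, while the helper names astral_cmd, phyparts_cmd, and run are hypothetical.

```python
import shlex
import subprocess

PHYPARTS_JAR = "target/phyparts-0.0.1-SNAPSHOT-jar-with-dependencies.jar"

def astral_cmd(gene_trees, species_tree, jar="astral.5.6.3.jar"):
    """Command line for the ASTRAL-III run shown in step 2."""
    return ["java", "-jar", jar, "-i", gene_trees, "-o", species_tree]

def phyparts_cmd(gene_tree_dir, species_tree, out_prefix, jar=PHYPARTS_JAR):
    """Command line for the phyparts run shown in step 3 (same flags)."""
    return ["java", "-jar", jar, "-a", "1", "-v",
            "-d", gene_tree_dir, "-m", species_tree, "-o", out_prefix]

def run(cmd, log_path=None, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(shlex.quote(part) for part in cmd))
    if dry_run:
        return None
    if log_path is None:
        return subprocess.run(cmd, check=True)
    with open(log_path, "w") as log:
        # Mirrors "2> running.log": send stderr to the log file.
        return subprocess.run(cmd, stderr=log, check=True)

steps = [
    (astral_cmd("collapse_genetrees.tre", "output_species_tree.tre"), "running.log"),
    # The ASTRAL output still has to be rooted before it is given to phyparts.
    (phyparts_cmd("gene_trees", "ASTRAL_species_tree", "output_name"), None),
]
for cmd, log in steps:
    run(cmd, log_path=log, dry_run=True)
```

Keeping the commands as lists rather than shell strings avoids quoting problems with unusual file names.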

4. Mapping this information onto the species tree

Note: Please read the tutorial first. Python (>2.7) and ETE3 with its graphical options need to be installed before running the Python script.

  • How to run the script:
    python3 phypartspiecharts.py species_tree output_name gene_number

Note: output_name must be the same as the name you gave at the phyparts step (# 3), and gene_number is the total number of gene trees (322 in my case)
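The gene number matters because each pie is scaled by the total number of gene trees. As I understand the tutorial, every pie splits the genes into four classes: concordant with the species tree, supporting the most common conflicting bipartition, supporting any other conflicting bipartition, and uninformative for that node. The sketch below shows that arithmetic under those assumptions; the function and the counts are hypothetical and this is not the code of the script itself.

```python
def pie_slices(concordant, top_conflict, all_conflict, total_genes):
    """Percentages of the four pie slices at one node of the species tree."""
    other_conflict = all_conflict - top_conflict
    uninformative = total_genes - concordant - all_conflict
    counts = {
        "concordant": concordant,
        "top_conflict": top_conflict,
        "other_conflict": other_conflict,
        "uninformative": uninformative,
    }
    if min(counts.values()) < 0:
        # Typically means the total gene number passed in is too small.
        raise ValueError("counts are inconsistent with total_genes")
    return {name: 100.0 * count / total_genes for name, count in counts.items()}

# Hypothetical counts for one node, with the 322 gene trees used in this post.
slices = pie_slices(concordant=200, top_conflict=50, all_conflict=80, total_genes=322)
for name, percent in slices.items():
    print("%-15s %5.1f%%" % (name, percent))
```

If the gene number is set lower than the real number of gene trees, the uninformative slice shrinks or even becomes negative, which is why the sketch raises an error in that case.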

5. Visualizing the gene tree support and conflict statistics with pie charts

  • The output of PhyParts PieCharts is in SVG format, so you need Inkscape or Adobe Illustrator (AI) to convert it to PDF. You can also refine the overall layout of the plot there.

  • Note: make sure Python 3, ETE3, and an X server are installed.

My plot is shown below:

Last updated: Mon Dec 7 2020