SIIA Público

Book title: How Transcriptomics Revealed New Information On Actinorhizal Symbioses Establishment And Evolution
Chapter title: Pangenomic Analysis of the Rhizobiales Using the GET_HOMOLOGUES Software Package

UNAM authors:
PABLO VINUESA FLEISCHMANN;
External authors:

Language:
English
Year of publication:
2015
Keywords:
Core genome size; GET_HOMOLOGUES software package; Heuristic BDBH Algorithm; Pangenome matrix; Phylogenomics; Rhizobiales; Statistical analysis


Abstract:

In this chapter, we introduce GET_HOMOLOGUES, an open-source software package for flexible, robust, and scalable microbial pangenomics and comparative genomics. It builds on pairwise BLAST+ searches of whole proteomes, whose results can be clustered by any combination of the bidirectional best-hit (BDBH), COGtriangles, and OrthoMCL (OMCL) algorithms to construct homologous gene families. These families can be interrogated and further analyzed with a series of auxiliary scripts included in the package. As a case study to demonstrate the software's capabilities, we perform a pangenome analysis of 68 complete genomes from the Rhizobiales. A consensus core genome of 323 orthologous gene families was computed as the intersection of the single-copy clusters present in all 68 genomes, as generated by the BDBH, COGtriangles, and OMCL clustering algorithms. This conservative and highly robust set of orthologous gene families was used to estimate a maximum likelihood species phylogeny of the order Rhizobiales. A conservative theoretical core-genome size of 270 gene families was estimated from the consensus clusters recovered by the COGtriangles and OMCL algorithms, by fitting binomial mixture models to the data and selecting the optimal number of classes with the Bayesian information criterion. The core genome represents only a tiny fraction of the Rhizobiales pangenome, which is dominated by the cloud genome fraction: 7378 gene clusters are present in only three or fewer of the genomes analyzed. The core-genome phylogeny was contrasted with a parsimony pangenome phylogeny based on the consensus pangenome matrix of presence-absence data for the 14,297 clusters found by both COGtriangles and OMCL. Relationships within generic lineages remain largely consistent with those revealed by the core-genome phylogeny.
However, some groupings found in the pangenome phylogeny better reflect the lifestyles of the organisms, as determined by their differential gene content. © 2015 by John Wiley & Sons, Inc. All rights reserved.
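The pairwise step behind BDBH clustering can be illustrated with a minimal Python sketch. It assumes that BLAST+ hits for two proteomes, A and B, have already been parsed into (query, subject, bitscore) tuples for each search direction; the function names and toy protein IDs are hypothetical, and this is not the GET_HOMOLOGUES implementation, which applies additional criteria and builds multi-genome clusters from such pairs.

```python
def best_hits(hits):
    """Return {query: subject}, keeping the highest-bitscore subject per query.

    `hits` is an iterable of (query, subject, bitscore) tuples for one
    search direction (assumed to be parsed from tabular BLAST+ output).
    """
    best = {}
    for query, subject, bitscore in hits:
        # Strict '>' keeps the first hit seen when bitscores tie.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def bdbh_pairs(hits_a_vs_b, hits_b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    a_to_b = best_hits(hits_a_vs_b)
    b_to_a = best_hits(hits_b_vs_a)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


# Toy hits: proteome A (a1..a3) searched against proteome B (b1..b3) and back.
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0), ("a2", "b2", 180.0), ("a3", "b2", 120.0)]
b_vs_a = [("b1", "a1", 248.0), ("b2", "a2", 175.0), ("b2", "a3", 119.0), ("b3", "a3", 60.0)]
print(bdbh_pairs(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

Here a3 is dropped because its best hit, b2, prefers a2; resolving bitscore ties by keeping the first hit seen is an arbitrary choice made for simplicity.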
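The consensus core genome described in the abstract can be sketched as a set intersection. In this hypothetical Python example, each algorithm's output is a list of clusters of protein IDs, and a protein-to-genome map records where each protein comes from; clusters are compared by their exact membership, which is a simplifying assumption rather than the package's own procedure for comparing clusters across algorithms.

```python
def is_single_copy_core(cluster, protein_to_genome, genomes):
    """True if `cluster` contains exactly one protein from every genome."""
    owners = [protein_to_genome[p] for p in cluster]
    return len(owners) == len(genomes) and set(owners) == set(genomes)


def consensus_core(clusterings, protein_to_genome):
    """Intersect the single-copy core clusters of several clustering results.

    `clusterings` maps an algorithm name to a list of clusters, each an
    iterable of protein IDs. Only clusters with identical membership in
    every result survive the intersection.
    """
    genomes = set(protein_to_genome.values())
    core_sets = [
        {frozenset(c) for c in clusters
         if is_single_copy_core(c, protein_to_genome, genomes)}
        for clusters in clusterings.values()
    ]
    return set.intersection(*core_sets) if core_sets else set()


# Toy data: three genomes (A, B, C); genome C carries a paralog, c3.
protein_to_genome = {"a1": "A", "a2": "A", "b1": "B", "b2": "B",
                     "c1": "C", "c2": "C", "c3": "C"}
clusterings = {
    "BDBH": [["a1", "b1", "c1"], ["a2", "b2", "c2"]],
    "COGtriangles": [["a1", "b1", "c1"], ["a2", "b2", "c2", "c3"]],
    "OMCL": [["a1", "b1", "c1"], ["a2", "b2", "c2"]],
}
print([sorted(c) for c in consensus_core(clusterings, protein_to_genome)])  # [['a1', 'b1', 'c1']]
```

The second family is lost from the consensus because one algorithm grouped it with a paralog, which illustrates why an intersection of several algorithms yields a conservative core.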
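The split between core and cloud fractions rests on how many genomes contain each cluster. The following Python sketch labels the rows of a presence-absence pangenome matrix by occupancy, treating clusters present in every genome as core and clusters present in three or fewer genomes as cloud, echoing the threshold quoted in the abstract; the intermediate "shell" label, the `cloud_max` parameter, and the toy matrix are assumptions, and GET_HOMOLOGUES may define these fractions differently.

```python
def classify_occupancy(matrix, cloud_max=3):
    """Label each cluster by the number of genomes it is present in.

    `matrix` maps a cluster ID to a list of 0/1 presence flags, one per
    genome. Clusters present in every genome are 'core', clusters present
    in at most `cloud_max` genomes are 'cloud', and the rest are 'shell'.
    """
    labels = {}
    for cluster, row in matrix.items():
        occupancy = sum(row)
        if occupancy == len(row):
            labels[cluster] = "core"
        elif occupancy <= cloud_max:
            labels[cluster] = "cloud"
        else:
            labels[cluster] = "shell"
    return labels


# Toy matrix: four clusters scored across five genomes.
matrix = {
    "c1": [1, 1, 1, 1, 1],
    "c2": [1, 1, 1, 1, 0],
    "c3": [1, 0, 0, 1, 0],
    "c4": [0, 0, 0, 0, 1],
}
labels = classify_occupancy(matrix)
print(labels)  # {'c1': 'core', 'c2': 'shell', 'c3': 'cloud', 'c4': 'cloud'}
```

Unlike the consensus core of 323 families, this occupancy count ignores copy number, so a "core" label here does not imply a single-copy cluster.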


Cited UNAM entities: