RNGR.net is sponsored by the USDA Forest Service and Southern Regional Extension Forestry and is a collaborative effort between these two agencies.


Conifer Genomes and Implications for Pine Genetics and Improvement

The emergence of improved sequencing technologies, coupled with decreasing costs, inspired innovative assembly methods for large and complex genomes, such as the conifer megagenomes, which range from 16 to 40 Gbp in size. Although these megagenomes are increasing in contiguity, accurate genome annotations remain challenging. Questions surrounding genome evolution are answered by interrogating the genome and its associated annotation. The accuracy of these products impacts estimates of genome duplication, gene family expansion/contraction, and functional assessments. Applications related to genomic selection, classification of hybrids, and pangenome approaches also require robust annotations. Among conifer genome assemblies, the gene space annotations are complicated by the presence of repetitive elements, large gene families, numerous pseudogenes, and long introns. Existing annotation packages are challenged to differentiate among these features and provide high-quality results. Recent efforts have focused on improving strategies for the annotation of several gymnosperms, including five conifer species. We examine the impact of using assembled transcriptomic evidence (full-length transcript and protein sequences) versus RNA read alignments to train ab initio gene predictors to annotate these genomes. These approaches are evaluated with assays for accessible chromatin, such as ATAC-Seq, which can improve the detection of true gene models and distinguish prevalent pseudogenes. The final loblolly pine genome annotation improves on both the estimated completeness and structural metrics of the proposed gene models. A total of 51,200 genes were annotated with a novel pipeline integrating RNA-Seq and protein alignments with Braker2 and two software packages developed in-house, EnTAP and gFACs. The ATAC-Seq data assisted in filtering the mono-exonic genes, whose counts are frequently inflated in conifer genomes. This approach was benchmarked against previous annotations as well as those resulting from standard standalone pipelines (MAKER and Braker2). The most recent release of the loblolly pine genome annotation can be retrieved from the TreeGenes database.
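
The abstract does not state how the ATAC-Seq data were applied to the mono-exonic gene models, so the following is only an illustrative sketch of one plausible filtering rule, not the authors' pipeline. It assumes that multi-exonic models are always kept and that a mono-exonic model is kept only if an accessible-chromatin peak overlaps the gene or a flanking window around it. The names `GeneModel`, `filter_mono_exonic`, and the `flank` parameter (with its default of 1000 bp) are hypothetical, and strand is ignored for simplicity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical minimal gene model record (coordinates 1-based, inclusive)."""
    gene_id: str
    scaffold: str
    start: int
    end: int
    exon_count: int


# An ATAC-Seq peak as (scaffold, start, end), 1-based inclusive (assumed layout).
Peak = Tuple[str, int, int]


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Return True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def filter_mono_exonic(genes: List[GeneModel], peaks: List[Peak],
                       flank: int = 1000) -> List[GeneModel]:
    """Keep all multi-exonic genes; keep mono-exonic genes only when a peak
    on the same scaffold overlaps the gene extended by `flank` bp on each side.
    The rule and the flank size are assumptions made for illustration."""
    kept = []
    for gene in genes:
        if gene.exon_count > 1:
            kept.append(gene)
            continue
        window_start = max(1, gene.start - flank)
        window_end = gene.end + flank
        supported = any(
            scaffold == gene.scaffold
            and overlaps(window_start, window_end, p_start, p_end)
            for scaffold, p_start, p_end in peaks
        )
        if supported:
            kept.append(gene)
    return kept


# Toy data, invented purely to exercise the function.
example_genes = [
    GeneModel("g1", "scaffold1", 5000, 9000, 4),    # multi-exonic, always kept
    GeneModel("g2", "scaffold1", 20000, 21000, 1),  # mono-exonic, peak nearby
    GeneModel("g3", "scaffold1", 50000, 51000, 1),  # mono-exonic, no peak
    GeneModel("g4", "scaffold2", 20000, 21000, 1),  # peak is on another scaffold
]
example_peaks = [("scaffold1", 19500, 19800)]
example_kept = [g.gene_id for g in filter_mono_exonic(example_genes, example_peaks)]
```

A linear scan over all peaks is used here for clarity; on a genome of this size an interval index or a dedicated interval tool would be the more practical choice.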



Details

Author(s): Jill Wegrzyn, Sumaira Zaman, Alyssa Ferreira, Madison Caballero, Ross Whetten

Publication: Tree Improvement and Genetics - Southern Forest Tree Improvement Conference - 2019