Biodiversity Data Journal: Software Description
Corresponding author: Marcos De Sousa (msousa@museu-goeldi.br)
Academic editor: Anne Ropiquet
Received: 17 Nov 2021 | Accepted: 09 Dec 2021 | Published: 10 Dec 2021
© 2021 Caio Ribeiro, Lucas Oliveira, Romina Batista, Marcos De Sousa
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Ribeiro CVR, Oliveira LP, Batista R, De Sousa M (2021) UCEasy: A software package for automating and simplifying the analysis of ultraconserved elements (UCEs). Biodiversity Data Journal 9: e78132. https://doi.org/10.3897/BDJ.9.e78132
The use of ultraconserved elements (UCEs) as genetic markers in phylogenomics has become popular and has provided promising results. Although UCE data can be easily obtained from targeted enriched sequencing, the protocol for in silico analysis of UCEs consists of running heterogeneous and complex tools, which is a challenge for scientists without training in bioinformatics. Developing tools that adopt best practices in research software can lessen this problem by making computational experiments easier to execute and therefore easier to reproduce.
We present UCEasy, an easy-to-install and easy-to-use software package with a simple command line interface that facilitates the computational analysis of UCEs from sequencing samples, following the best practices of research software. UCEasy is a wrapper that standardises, automates and simplifies the quality control of raw reads, the assembly, and the extraction and alignment of UCEs. Its final output is a data matrix with different levels of completeness that can be used to infer phylogenetic trees. We demonstrate the functionalities of UCEasy by reproducing the published results of phylogenomic studies of the bird genus Turdus (Aves) and of Adephaga families (Coleoptera), efficiently extracting UCEs from the genomic datasets of those studies.
phylogenomics, ultraconserved elements (UCEs), research software, reproducibility, bioinformatics.
In the last decade, new genome-subsampling methods have been developed as a cheaper and simpler alternative to complete genome sequencing, thus enabling the scientific community to better understand the evolutionary inter-relationships of species (
Although UCE data can be easily obtained from targeted enriched sequencing (
In this work, we present UCEasy, an open source software package that facilitates the analysis of UCEs from sequencing samples, following the best practices of research software. UCEasy is a Python wrapper that standardises, automates and simplifies the following PHYLUCE tasks: quality control of raw reads, assembly, alignment and UCE extraction. We demonstrate the functionalities of UCEasy by reproducing the published results from two phylogenomic studies (
UCEasy
UCEasy automates and simplifies the analysis of UCE datasets from DNA sequence samples in FASTQ files (either single-end or paired-end) by calling the Python scripts used in the standard PHYLUCE 1.6 workflow (https://phyluce.readthedocs.io/en/latest/tutorials/tutorial-1.html), as shown in Fig.
MIT Licence
UCEasy has an extensible software architecture that makes use of the Facade and Adapter design patterns (
UCEasy was built based on best practices in scientific computing (
The target audience for this software package includes evolutionary biologists and conservation scientists with knowledge of basic Linux commands. We are open to discussing additional ideas or new features to expand the current functionality of this software package.
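The Facade and Adapter design patterns mentioned in the architecture description above can be illustrated with a minimal Python sketch. It is not taken from the UCEasy source code: the class names, the three steps chosen, and the `hypothetical-*` executables with their flags are all placeholders invented for illustration. Each adapter gives one external tool the same small interface, and the facade exposes a single entry point that runs the steps in order.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import Callable, List, Sequence

Command = List[str]


class ToolAdapter(ABC):
    """Adapter: gives every external tool the same small interface."""

    name: str

    @abstractmethod
    def command(self, workdir: str) -> Command:
        """Return the argument list that would launch the wrapped tool."""


class QualityControl(ToolAdapter):
    name = "quality-control"

    def command(self, workdir: str) -> Command:
        # "hypothetical-qc" and its flags are placeholders, not a real CLI.
        return ["hypothetical-qc", "--in", f"{workdir}/raw", "--out", f"{workdir}/clean"]


class Assembly(ToolAdapter):
    name = "assembly"

    def command(self, workdir: str) -> Command:
        # Placeholder executable and flags (assumption for illustration).
        return ["hypothetical-assembler", "--reads", f"{workdir}/clean", "--out", f"{workdir}/contigs"]


class Alignment(ToolAdapter):
    name = "alignment"

    def command(self, workdir: str) -> Command:
        # Placeholder executable and flags (assumption for illustration).
        return ["hypothetical-aligner", "--loci", f"{workdir}/contigs", "--out", f"{workdir}/alignments"]


def run_in_shell(cmd: Command) -> None:
    """Default runner: execute the command and raise on a non-zero exit code."""
    subprocess.run(cmd, check=True)


class UCEPipelineFacade:
    """Facade: one entry point that hides the individual tools."""

    def __init__(
        self,
        steps: Sequence[ToolAdapter],
        runner: Callable[[Command], None] = run_in_shell,
    ) -> None:
        self.steps = list(steps)
        self.runner = runner

    def run(self, workdir: str) -> List[str]:
        """Run every step in order and return the names of the finished steps."""
        finished = []
        for step in self.steps:
            self.runner(step.command(workdir))
            finished.append(step.name)
        return finished


# Dry run: record the commands instead of executing them.
recorded: List[Command] = []
pipeline = UCEPipelineFacade(
    [QualityControl(), Assembly(), Alignment()],
    runner=recorded.append,
)
finished = pipeline.run("samples")
```

In a design of this kind, supporting a new version of an underlying tool would mean changing or adding one adapter, while the facade and the code that calls it stay the same. Passing the runner as a parameter also lets the pipeline be exercised without the external tools installed, as in the dry run above.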
To demonstrate the effectiveness of UCEasy, we reproduced the published results by
The study of
We captured significantly more UCEs than
UCEasy successfully reproduced the pipeline of the studies mentioned and met the best practices recommended in the literature for scientific computing. A standardised package, such as that presented here, can help evolutionary biologists by automating laborious tasks and facilitating the reproducibility of computational experiments. Finally, the UCEasy architecture is sufficiently robust to accommodate new updates from PHYLUCE with little effort. As future work, we plan to extend UCEasy to support the new PHYLUCE 1.7 version and to incorporate phylogenetic software packages from other developers.
The authors would like to thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico-CNPq (grants 149985/2018-5; 129954/2018-7) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES-INPA proc. 88887477562/2020-00).