Biodiversity Data Journal:
R Package
Corresponding author: Miguel Alvarez (kamapu78@gmail.com)
Academic editor: Scott Chamberlain
Received: 15 Jan 2018 | Accepted: 13 Apr 2018 | Published: 02 May 2018
© 2018 Miguel Alvarez, Federico Luebert
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Alvarez M, Luebert F (2018) The taxlist package: managing plant taxonomic lists in R. Biodiversity Data Journal 6: e23635. https://doi.org/10.3897/BDJ.6.e23635
Taxonomic lists are crucial elements of vegetation-plot databases and provide the links between original entries, reference taxon views and different taxon concepts. We introduce the R package taxlist in the context of object-oriented modelling for taxonomic lists. This package provides a data structure based on the species lists of Turboveg, a software widely used for storing vegetation-plot databases, and implements functions for importing and handling these lists prior to statistical analysis. We also present a schema for relational databases that is compatible with taxlist objects, and we recommend its use for handling diversity records.
ecoinformatics, database, taxon concept, taxon view, Turboveg, vegtable
Vegetation-plot databases are increasingly important, not only for hosting historical vegetation data or storing data collected in ongoing research projects, but also for storing vegetation-plot observations, including types of syntaxonomic classifications in the context of the Braun-Blanquet approach (
Taxonomic lists (i.e. species lists) are crucial components of vegetation-plot databases, and several authors have raised concerns about the consequences of inconsistent nomenclatural applications in downstream statistical analyses (
Accepted name: The name used for designating a taxon. According to the International Code of Botanical Nomenclature (
Combination: The name of a taxon at the species level or below, which includes the name of the genus and further epithets (
Potential taxon: Proposed by
Synonym: A name applied to a taxon as an alternative to, and subordinate to, its accepted name. Synonyms are subdivided into homotypic (nomenclatural) synonyms, which share the same type as the accepted name, and heterotypic (taxonomic) synonyms, which are based on different types (
Taxon: A taxonomic entity belonging to any rank of the taxonomic classification.
Taxon concept: Also called "taxonomic concept", it refers to the taxonomic circumscription denoted by a name according to an "opinion" (taxon view). It is broadly used as a synonym of "potential taxon" and "taxonym" (
Taxon name usage: Application of a name to designate a taxon concept, regardless of its status as accepted name or synonym. This term is also used for alternative plant names in the package vegdata (
Taxon view: The reference used for determining the hierarchical position and circumscription of a taxon concept. This term was introduced by
Taxonym: This term was proposed by
Syntaxon: An abstract unit of phytocoenoses, which is defined by its species composition and plant-sociological patterns (co-occurrence). As in the case of a taxon, a syntaxon is incorporated into a hierarchical system (
Handling taxon concepts with their respective names requires relational structures rather than the flat tables currently used by Turboveg and most of the mentioned R applications. These structures may be useful not only for data storage but also for implementing consistent algorithms to manage the information contained in taxonomic lists, such as retrieving occurrences of taxa rather than of names, building subsets by querying taxonomic relationships, quickly displaying diversity statistics and, especially, testing the consistency of the data.
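The difference between counting names and counting taxa can be made concrete with a small sketch. taxlist itself is written in R; the Python below only illustrates the idea, and the table layout, identifiers and plot records are hypothetical toy data (the synonym is included purely for illustration). Each record refers to a name usage and each name usage points to a taxon concept, so occurrences can be aggregated either way.

```python
from collections import Counter

# Hypothetical name usages: usage ID -> (name, taxon concept ID).
# Usages 1 and 2 are two names applied to the same concept (100).
names = {
    1: ("Cyperus papyrus L.", 100),
    2: ("Papyrus antiquorum Willd.", 100),  # synonym, illustrative toy data
    3: ("Typha domingensis Pers.", 200),
}

# Hypothetical plot records, each referring to a name usage ID.
records = [("plot1", 1), ("plot2", 2), ("plot2", 3), ("plot3", 1)]


def count_by_name(records):
    """Count records per name, as entered in the original data."""
    return Counter(names[usage][0] for _, usage in records)


def count_by_concept(records):
    """Count records per taxon concept, so synonyms are merged."""
    return Counter(names[usage][1] for _, usage in records)


print(count_by_name(records))
print(count_by_concept(records))
```

With a flat list, the two names of concept 100 would be counted as different taxa; the extra indirection through the concept ID is what merges them.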
While intensive discussions have been centred around data integration (e.g.
In this work, we aim to provide an information structure for storing taxonomic lists (the class taxlist), focusing on species lists contained in Turboveg databases. The main properties implemented in these objects are: 1) flexibility to include different degrees of available information (flat taxon lists, lists including taxon traits, or hierarchical structures including taxonomic levels and parent-child relationships), 2) automatic checking of the consistency of the information contained in those objects (validity checking), 3) quick display of the information contained in the lists (summary methods) and 4) implementation of common processes in specific functions.
The package taxlist can be installed from the Comprehensive R Archive Network (CRAN). Alternatively, it can be installed from a GitHub repository (https://github.com/kamapu/taxlist) using the package devtools.
This package was programmed using S4 (the fourth version of the programming language S), which is also implemented in R (
Each slot in a taxlist object contains a column-oriented table (class data.frame in R). One of the most important features of taxlist objects is the separation of taxon names from their relationships to taxon concepts (see Table
Relational model of information contained in objects of class taxlist. Each box is a column-oriented table including the name of the table (slot name in R) and the names of the mandatory columns. Lines indicate relationships between and within tables, while key symbols show the key fields in the respective tables. Dots indicate that the tables can be extended with custom columns. Each entry in table taxonRelations represents a taxon, while table taxonNames includes taxon name usages (accepted names and synonyms). Attributes of taxa (e.g. functional traits) are stored in table taxonTraits, and table taxonViews includes the sources determining the circumscription of a taxon, its accepted name and synonyms.
Taxon information structured in a taxlist object. The code loads the package and the installed example data "Easplist". Here the function subset() extracts the taxon Cyperus papyrus L. together with its parents. Since the example data do not include any information on taxon properties (traits), the life form of C. papyrus (identifier 206 in the example data) is inserted beforehand to produce the output printed in the R console.
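The relational model described in the caption above can be mirrored, as a rough sketch, by plain Python data classes. This is not the taxlist API (taxlist uses S4 classes in R); the field names usage_id, concept_id, accepted_usage_id, parent_id, level and view_id are assumptions chosen to echo the tables, concept 206 reuses the identifier of C. papyrus from the example data, and the synonym and view records are hypothetical toy data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaxonName:
    usage_id: int    # key of the names table (one row per name usage)
    concept_id: int  # points to TaxonRelation.concept_id
    name: str
    author: str


@dataclass
class TaxonRelation:
    concept_id: int                  # key: one row per taxon concept
    accepted_usage_id: int           # points to TaxonName.usage_id
    parent_id: Optional[int] = None  # optional parent concept
    level: Optional[str] = None      # optional taxonomic rank
    view_id: Optional[int] = None    # points to TaxonView.view_id


@dataclass
class TaxonView:
    view_id: int
    reference: str


@dataclass
class TaxList:
    names: List[TaxonName] = field(default_factory=list)
    relations: List[TaxonRelation] = field(default_factory=list)
    traits: Dict[int, dict] = field(default_factory=dict)  # concept ID -> traits
    views: List[TaxonView] = field(default_factory=list)

    def accepted_name(self, concept_id: int) -> TaxonName:
        """Follow the concept's pointer to its accepted name usage."""
        rel = next(r for r in self.relations if r.concept_id == concept_id)
        return next(n for n in self.names if n.usage_id == rel.accepted_usage_id)

    def synonyms(self, concept_id: int) -> List[TaxonName]:
        """All other name usages attached to the same concept."""
        acc = self.accepted_name(concept_id)
        return [n for n in self.names
                if n.concept_id == concept_id and n.usage_id != acc.usage_id]


# Toy data: concept 206 with one accepted name and one synonym.
tl = TaxList(
    names=[TaxonName(1, 206, "Cyperus papyrus", "L."),
           TaxonName(2, 206, "Papyrus antiquorum", "Willd.")],
    relations=[TaxonRelation(206, accepted_usage_id=1, level="species", view_id=1)],
    views=[TaxonView(1, "hypothetical reference")],
)
print(tl.accepted_name(206).name)
print([n.name for n in tl.synonyms(206)])
```

Keeping names in their own table means a concept can gain or lose synonyms, or switch its accepted name, without touching any other table.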
While the design of taxlist objects was inspired by the content of species lists in the software Turboveg (see also https://www.synbiosys.alterra.nl/turboveg), which are stored in DBF files, the main differences from taxlist are: 1) the content of the slots taxonNames and taxonRelations is stored in a single table called "species" in Turboveg and 2) taxon views and hierarchical structures are not explicitly supported in Turboveg. On the other hand, Turboveg also stores taxon attributes in a separate table called "ecodbase".
Empty objects are generated from a template (prototype) by the function new(). Alternatively, objects can be created with the function df2taxlist(), either from a character vector of accepted names or from a data frame with accepted names and synonyms. In a similar way, species lists included in Turboveg databases can be imported through the function tv2taxlist(). For the latter case, a similar function, tax(), is available in the package vegdata (
Hierarchical taxonomic structures can also be implemented through parent-child relationships, which are an optional feature of taxlist. In that case, the information on taxonomic levels (i.e. their names and hierarchical sequence) has to be set by the user (function levels(), see also Suppl. material
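Converting a flat list of accepted names and synonyms into separate name and concept tables, which is the general task described above for df2taxlist(), can be sketched in Python as follows. The function flat_to_tables() and its input layout (each synonym row refers to its accepted name) are hypothetical and do not reproduce the arguments or internals of the R function; identifiers are simply assigned in sequence.

```python
def flat_to_tables(rows):
    """Split flat (name, author, accepted_name) rows into two tables.

    rows: tuples where accepted_name is None for accepted names and
    otherwise holds the accepted name the synonym belongs to.
    Returns (names, relations): names maps usage ID ->
    (concept ID, name, author); relations maps concept ID ->
    usage ID of its accepted name.
    """
    names, relations, concept_of = {}, {}, {}
    # First pass: every accepted name opens a new taxon concept.
    for name, author, accepted in rows:
        if accepted is None:
            usage_id = len(names) + 1
            concept_id = len(relations) + 1
            names[usage_id] = (concept_id, name, author)
            relations[concept_id] = usage_id
            concept_of[name] = concept_id
    # Second pass: attach each synonym to the concept of its accepted name.
    for name, author, accepted in rows:
        if accepted is not None:
            if accepted not in concept_of:
                raise ValueError(f"accepted name not found: {accepted}")
            usage_id = len(names) + 1
            names[usage_id] = (concept_of[accepted], name, author)
    return names, relations


rows = [
    ("Cyperus papyrus", "L.", None),
    ("Papyrus antiquorum", "Willd.", "Cyperus papyrus"),  # toy synonym
    ("Typha domingensis", "Pers.", None),
]
names, relations = flat_to_tables(rows)
print(names)
print(relations)
```

The two passes let synonyms appear before their accepted names in the input. A name listed twice as accepted would silently overwrite the lookup entry here, which is the kind of problem a separate consistency check should catch.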
The consistency of information in taxonomic lists is checked by the function validObject() in R. Validity checking in taxlist includes detecting duplicated combinations (the same taxon name with the same author), duplicated identifiers (IDs) and orphaned entries, amongst others. In the special case of lists including taxonomic levels and parent-child relationships, it is further checked that every parent is itself included as a concept and that every child is at least one level lower than its parent.
The function summary() retrieves the number of names and taxon concepts included in the input object, as well as the number of traits and references, the occurrence of parent-child relationships and the taxonomic levels. The function summary() can also be applied to single concepts indicated in the argument ConceptID, which are queried either by ID numbers (integer values) or by names (character values). In that case, the respective overview displays the accepted name, taxon view, synonyms, taxonomic rank and parent concept in the console.
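Rules of this kind can be expressed as plain checks over a names table and a relations table. The following Python sketch is not the validity method that R's validObject() runs for taxlist objects; the dictionary keys and the function check_validity() are hypothetical, and the toy list deliberately contains one duplicated combination and one orphaned name.

```python
from collections import Counter


def check_validity(names, relations, levels=None):
    """Return a list of problems found in a toy taxonomic list.

    names: list of dicts with keys usage_id, concept_id, name, author
    relations: list of dicts with keys concept_id, accepted_usage_id,
        parent_id (or None) and level (or None)
    levels: optional rank names ordered from lowest to highest; if given,
        every concept in a parent-child link must have a level from it
    """
    problems = []
    # Duplicated combinations: the same name with the same author.
    combos = Counter((n["name"], n["author"]) for n in names)
    problems += [f"duplicated combination: {c}" for c, k in combos.items() if k > 1]
    # Duplicated identifiers in both tables.
    for label, ids in (("usage", [n["usage_id"] for n in names]),
                       ("concept", [r["concept_id"] for r in relations])):
        problems += [f"duplicated {label} ID: {i}"
                     for i, k in Counter(ids).items() if k > 1]
    concepts = {r["concept_id"]: r for r in relations}
    usages = {n["usage_id"]: n for n in names}
    # Orphaned names: usages pointing to a concept that does not exist.
    problems += [f"orphaned name: {n['usage_id']}"
                 for n in names if n["concept_id"] not in concepts]
    for r in relations:
        # The accepted name must exist and belong to the same concept.
        acc = usages.get(r["accepted_usage_id"])
        if acc is None or acc["concept_id"] != r["concept_id"]:
            problems.append(f"invalid accepted name for concept {r['concept_id']}")
        parent = r["parent_id"]
        if parent is None:
            continue
        if parent not in concepts:
            problems.append(f"parent {parent} is not a concept")
        elif levels:
            # A child must sit at least one level below its parent.
            child_rank = levels.index(r["level"])
            parent_rank = levels.index(concepts[parent]["level"])
            if child_rank >= parent_rank:
                problems.append(f"concept {r['concept_id']} is not lower than its parent")
    return problems


names = [
    {"usage_id": 1, "concept_id": 1, "name": "Cyperaceae", "author": "Juss."},
    {"usage_id": 2, "concept_id": 2, "name": "Cyperus", "author": "L."},
    {"usage_id": 3, "concept_id": 3, "name": "Cyperus papyrus", "author": "L."},
    {"usage_id": 4, "concept_id": 3, "name": "Cyperus papyrus", "author": "L."},
    {"usage_id": 5, "concept_id": 9, "name": "Orphan name", "author": "Anon."},
]
relations = [
    {"concept_id": 1, "accepted_usage_id": 1, "parent_id": None, "level": "family"},
    {"concept_id": 2, "accepted_usage_id": 2, "parent_id": 1, "level": "genus"},
    {"concept_id": 3, "accepted_usage_id": 3, "parent_id": 2, "level": "species"},
]
levels = ["species", "genus", "family"]
print(check_validity(names, relations, levels))
```

Returning a list of messages, rather than stopping at the first error, mirrors the idea that an object is either fully valid or reported with all its problems.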
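The single-concept overview described above boils down to resolving the query to a concept and then following its links. A Python sketch of that lookup logic, assuming a hypothetical function concept_summary() and a dictionary layout of our own, might look like this; it does not reproduce the output format of taxlist's summary(), and the view reference is a placeholder.

```python
def concept_summary(names, relations, views, query):
    """Text overview of one concept, queried by concept ID (int) or name (str)."""
    if isinstance(query, str):
        # Resolve a name (accepted name or synonym) to its concept ID.
        matches = {n["concept_id"] for n in names if n["name"] == query}
        if len(matches) != 1:
            raise ValueError(f"name not found or ambiguous: {query}")
        query = matches.pop()
    rel = next(r for r in relations if r["concept_id"] == query)
    usage = {n["usage_id"]: n for n in names}
    acc = usage[rel["accepted_usage_id"]]
    syns = [f"{n['name']} {n['author']}" for n in names
            if n["concept_id"] == query and n["usage_id"] != acc["usage_id"]]
    parent_name = None
    if rel["parent_id"] is not None:
        prel = next(r for r in relations if r["concept_id"] == rel["parent_id"])
        parent_name = usage[prel["accepted_usage_id"]]["name"]
    lines = [
        f"concept ID: {query}",
        f"accepted name: {acc['name']} {acc['author']}",
        f"view: {views.get(rel['view_id'], 'none')}",
        f"synonyms: {', '.join(syns) or 'none'}",
        f"level: {rel['level'] or 'none'}",
        f"parent: {parent_name or 'none'}",
    ]
    return "\n".join(lines)


names = [
    {"usage_id": 1, "concept_id": 1, "name": "Cyperus", "author": "L."},
    {"usage_id": 2, "concept_id": 2, "name": "Cyperus papyrus", "author": "L."},
    {"usage_id": 3, "concept_id": 2, "name": "Papyrus antiquorum", "author": "Willd."},
]
relations = [
    {"concept_id": 1, "accepted_usage_id": 1, "parent_id": None, "level": "genus", "view_id": None},
    {"concept_id": 2, "accepted_usage_id": 2, "parent_id": 1, "level": "species", "view_id": 1},
]
views = {1: "hypothetical flora reference"}
print(concept_summary(names, relations, views, "Papyrus antiquorum"))
```

Querying by a synonym returns the same overview as querying by the concept ID, because both resolve to the same concept before anything is displayed.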
Several methods are provided to handle the information in taxlist objects; many of them allow adding, replacing and retrieving components of a taxonomic list (e.g. taxon_names(), taxon_relations(), taxon_traits() and taxon_views()). The function subset() works as a query built through logical operations or character matching. Using this function, a set of taxon concepts is extracted from an object while preserving the validity of the output object. The query is applied to the content of a slot (defined in argument slot), while children or parents of the retrieved taxa can be kept in the output (arguments keep_children and keep_parents). The function clean() removes orphaned entries in order to restore the validity of objects affected by direct manipulation of slots.
The functions add_concept(), add_synonym(), accepted_name() and change_concept() are suitable for adding information to a taxlist object and for modifying the relationships amongst taxon names. They facilitate common processes required for changing or tuning taxonomic classifications as proposed by
Adding parent-child relationships amongst concepts (i.e. relationships amongst taxa at different taxonomic ranks) requires the function add_parent(), while the function add_level() should be used for including taxonomic ranks or even inserting new levels among existing ranks (increasing taxonomic resolution).
Finally, the function backup_object() creates backups of taxlist objects (and of any object in an R session) as an R image stored in a zip file for recovery purposes. By default, the name of the zip file includes a time stamp (the date of the backup) and a suffix when more than one backup is produced on the same day. Backup files can be loaded into a session with the function load(), while the function load_last() automatically selects the most recent backup within a folder.
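The effect of keep_parents and keep_children can be pictured as a walk along the parent-child links. The Python sketch below is hypothetical (subset_concepts() is not a taxlist function) and only handles concept IDs; a complete implementation would also carry the matching names, traits and views into the output so that it stays valid.

```python
def subset_concepts(parent_of, selected, keep_parents=False, keep_children=False):
    """Return the selected concept IDs, optionally with ancestors/descendants.

    parent_of: dict mapping concept ID -> parent concept ID (or None)
    selected: concept IDs matched by a query
    """
    selected = set(selected)
    result = set(selected)
    if keep_parents:
        for concept in selected:
            parent = parent_of.get(concept)
            # Walk up until the root or an already kept concept is reached.
            while parent is not None and parent not in result:
                result.add(parent)
                parent = parent_of.get(parent)
    if keep_children:
        children = {}
        for concept, parent in parent_of.items():
            children.setdefault(parent, []).append(concept)
        # Depth-first walk down from every selected concept.
        stack, seen = list(selected), set(selected)
        while stack:
            for child in children.get(stack.pop(), []):
                if child not in seen:
                    seen.add(child)
                    result.add(child)
                    stack.append(child)
    return result


# Toy hierarchy: family 1 > genus 2 > species 3 and 4; family 5 unrelated.
parent_of = {1: None, 2: 1, 3: 2, 4: 2, 5: None}
accepted = {1: "Cyperaceae", 2: "Cyperus", 3: "Cyperus papyrus",
            4: "Cyperus laevigatus", 5: "Typhaceae"}
hits = {c for c, n in accepted.items() if "papyrus" in n}  # character matching
print(sorted(subset_concepts(parent_of, hits, keep_parents=True)))  # [1, 2, 3]
print(sorted(subset_concepts(parent_of, {2}, keep_children=True)))  # [2, 3, 4]
```

Children are collected only below the originally selected concepts, so keeping parents does not pull in the siblings of the matched taxa.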
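Because names and concepts live in separate tables, operations of this kind reduce to changing a few foreign keys. The Python sketch below is hypothetical: add_name_usage(), set_accepted() and move_name() only borrow the spirit of the taxlist functions, not their names or arguments, and "Aus bus" and similar names are placeholders rather than real taxa.

```python
def add_name_usage(names, concept_id, name, author):
    """Append a new name usage (a synonym) to a concept; return its ID."""
    usage_id = max(names, default=0) + 1
    names[usage_id] = {"concept_id": concept_id, "name": name, "author": author}
    return usage_id


def set_accepted(names, accepted, concept_id, usage_id):
    """Make a name of the concept its accepted name; the old one stays as synonym."""
    if names[usage_id]["concept_id"] != concept_id:
        raise ValueError("the name does not belong to this concept")
    accepted[concept_id] = usage_id


def move_name(names, accepted, usage_id, new_concept_id):
    """Reassign a synonym to another concept, e.g. after lumping two taxa."""
    if usage_id in accepted.values():
        raise ValueError("an accepted name cannot be moved; change it first")
    names[usage_id]["concept_id"] = new_concept_id


# names: usage ID -> record; accepted: concept ID -> accepted usage ID.
names = {1: {"concept_id": 1, "name": "Aus bus", "author": "Auth."},
         2: {"concept_id": 2, "name": "Aus cus", "author": "Auth."}}
accepted = {1: 1, 2: 2}
syn = add_name_usage(names, 2, "Aus dus", "Auth.")  # new synonym of concept 2
move_name(names, accepted, syn, 1)                   # now a synonym of concept 1
set_accepted(names, accepted, 1, syn)                # now accepted for concept 1
print(names[syn], accepted)
```

Refusing to move an accepted name keeps every concept pointing to a name it actually owns, which is one of the invariants a validity check would otherwise have to repair.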
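Rank handling can be sketched as an ordered list of levels plus a check on each new parent-child link. In this hypothetical Python sketch, add_level() and add_parent() only mimic the described behaviour (their names and arguments are ours), and the ranks are ordered from lowest to highest by assumption.

```python
def add_level(levels, new_level, above=None):
    """Insert a rank into a list ordered from lowest to highest.

    With above=None the new rank becomes the highest one; otherwise it is
    inserted immediately above the rank given in `above`.
    """
    if new_level in levels:
        raise ValueError(f"level already exists: {new_level}")
    if above is None:
        return levels + [new_level]
    pos = levels.index(above) + 1
    return levels[:pos] + [new_level] + levels[pos:]


def add_parent(parents, ranks, levels, child, parent):
    """Link child to parent, requiring the parent to sit at a higher level."""
    if levels.index(ranks[parent]) <= levels.index(ranks[child]):
        raise ValueError("parent must be at a higher level than child")
    parents[child] = parent


levels = ["species", "genus", "family"]
levels = add_level(levels, "subgenus", above="species")  # finer resolution
ranks = {1: "family", 2: "genus", 3: "species"}          # concept ID -> rank
parents = {}
add_parent(parents, ranks, levels, 3, 2)
add_parent(parents, ranks, levels, 2, 1)
print(levels, parents)
```

Storing ranks as positions in one ordered list makes the "at least one level lower" rule a single integer comparison.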
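The naming rule described above (a date stamp plus a suffix for repeated backups on the same day) and the "pick the latest" behaviour of load_last() can be sketched in Python. This is a hypothetical analogue built on pickle and zipfile rather than R images; the file-name pattern prefix_YYYY-MM-DD[_n].zip is an assumption and not necessarily the one used by taxlist. Pickled data should only be loaded from trusted sources.

```python
import datetime
import os
import pickle
import re
import tempfile
import zipfile


def backup_object(obj, prefix, folder=".", today=None):
    """Pickle obj into <prefix>_<YYYY-MM-DD>[_<n>].zip inside folder.

    A numeric suffix is added when a backup of the same day already exists.
    """
    stamp = (today or datetime.date.today()).isoformat()
    base = os.path.join(folder, f"{prefix}_{stamp}")
    path, n = base + ".zip", 1
    while os.path.exists(path):
        path = f"{base}_{n}.zip"
        n += 1
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("object.pkl", pickle.dumps(obj))
    return path


def load_last(prefix, folder="."):
    """Load the latest backup, ordered by date stamp and then by suffix."""
    pattern = re.compile(re.escape(prefix) + r"_(\d{4}-\d{2}-\d{2})(?:_(\d+))?\.zip$")
    found = []
    for fname in os.listdir(folder):
        m = pattern.match(fname)
        if m:
            found.append(((m.group(1), int(m.group(2) or 0)), fname))
    if not found:
        raise FileNotFoundError(f"no backups for {prefix!r} in {folder}")
    latest = max(found)[1]
    with zipfile.ZipFile(os.path.join(folder, latest)) as zf:
        return pickle.loads(zf.read("object.pkl"))


# Usage: two backups on the same (fixed) day in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    day = datetime.date(2018, 5, 2)
    first = backup_object({"version": 1}, "taxa", tmp, today=day)
    second = backup_object({"version": 2}, "taxa", tmp, today=day)
    restored = load_last("taxa", tmp)

print(os.path.basename(first), os.path.basename(second), restored)
```

Ordering by the parsed date and suffix, instead of by file modification time, keeps the choice of the latest backup deterministic even when two backups are written within the same second.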
The use of the above mentioned functions is demonstrated in Suppl. material
Since the definition of taxlist as a class in R enables its use as a component in other S4 objects, for example as a slot, it can be implemented in any newly defined class connecting diversity records with taxonomic data. This capability is demonstrated with the package vegtable (
While a series of packages has been implemented in R for handling information on species and taxonomic lists, the implementation of a structured object class and respective functions in the framework of object-oriented programming is a novel feature of taxlist. This package is suitable not only for formatting floristic lists from raw data and testing the consistency of the respective information prior to its storage in a relational database, but also for modifying data through an R script prior to analysis, without the need to modify the data source.
A combined work of taxlist with packages dealing with standardisation of nomenclature is demonstrated by the implementation of the function tnrs from the package taxize (
The package taxlist has been developed as an activity in the context of the project GlobE wetlands (FKZ 031A250, https://www.wetlands-africa.de). Special thanks to Stephan Hennekens for discussions and for his support with data exchange between Turboveg and spreadsheets. We thank Florian Jansen (University of Rostock, Germany) and Walter G. Berendsohn (Botanical Garden and Botanical Museum Berlin, Germany) for their comments on a previous version of this work.
GlobE is an initiative launched by the Federal Ministry of Education and Research of Germany (BMBF) supporting the global development of sustainable, high-output agriculture.
The GlobE-wetlands project (FKZ 031A250) aims to assess the effects of crop production on the integrity of wetland agro-ecosystems in East Africa, including ecological functions and biodiversity.
University of Bonn, Germany.