Biodiversity Data Journal:
Data Paper (Biosciences)
Corresponding author: Clara Baringo Fonseca (baringo.fonseca.clara@gmail.com)
Academic editor: Quentin Groom
Received: 30 Jan 2017 | Accepted: 27 May 2017 | Published: 30 May 2017
© 2017 David Dias, Clara Baringo Fonseca, Luiza Correa, Nayara Soto, Andrea Portela, Keila Juarez, Roque João Tumolo Neto, Murilo Ferro, João Gonçalves, Jurandir Junior
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Dias D, Baringo Fonseca C, Correa L, Soto N, Portela A, Juarez K, Tumolo Neto R, Ferro M, Gonçalves J, Junior J (2017) Repatriation Data: More than two million species occurrence records added to the Brazilian Biodiversity Information Facility Repository (SiBBr). Biodiversity Data Journal 5: e12012. https://doi.org/10.3897/BDJ.5.e12012
Primary biodiversity data records, available on-line, are essential for conservation planning. Among the megadiverse countries, Brazil has reached a high level of scientific research in describing its biodiversity. However, significant limitations remain in recovering, collating and organizing the available information on Brazil's biological diversity and its distribution. Since the colonial period, biological material has often been collected and transferred to other countries, where it was characterized, stored and maintained. As a result, natural history museums worldwide hold large amounts of primary biodiversity data originally from Brazil, which are published on-line through the international Global Biodiversity Information Facility (GBIF) infrastructure. Aiming to recover these data, the Brazilian Biodiversity Information System (SiBBr) developed a repatriation tool capable of automatically retrieving all records registered in Brazil but published outside Brazilian territory.
Thus, 2,459,366 records were added to SiBBr’s repository in one day. Europe and the United States hold about 80% of all records. The data set covers all kingdoms of life. Animalia is the best represented group, with three main phyla (Chordata, Arthropoda and Mollusca) and more than 40% of all records. Plantae also comprises a large portion of the records, with angiosperms having the largest number of entries.
Brazilian System for information on Biodiversity (SiBBr), primary biodiversity data, Global Biodiversity Information Facility (GBIF), repatriation data, occurrence records, Brazil
Primary biodiversity data are key to addressing scientific conservation and sustainability issues (
Brazil ranks at the top of the world’s 17 megadiverse countries and second in terms of species endemism (
Given the importance of making such data available to the countries of origin, the Convention on Biological Diversity (CBD) and GBIF have called for increased mutual transfer of biodiversity data between countries, also referred to as the repatriation process (
Aiming to repatriate digital data from other countries, SiBBr developed an automatic repatriation tool capable of retrieving all GBIF records within Brazilian coordinates that are published outside Brazilian territory and indexing them in the SiBBr repository as a periodically updated dataset. The present data paper describes the repatriation data set published in SiBBr’s repository through the Integrated Publishing Toolkit (IPT) and lists the steps of the automated repatriation process.
Brazilian Biodiversity Information Facility (SiBBr)
The Brazilian Biodiversity Information Facility, known as SiBBr (Fig.
The Brazilian Biodiversity Information Facility - www.sibbr.gov.br
The goal of the SiBBr project is to ensure data-driven policy design and implementation by facilitating and mainstreaming the use of biodiversity information in decision-making and policy development processes. Primary biodiversity data should be available to support strategic environmental action plans and official documents used by government agencies to identify priority areas for conservation, as well as procedures related to environmental licensing and impacts on biodiversity. Implementation relies on a collaborative network of institutions and actors, with investments focused on the digitization and modernization of biological collections and on incorporating and using the resulting information through the national on-line SiBBr repository.
SiBBr also provides instruments, tools and technology to support scientific research, expanding the knowledge base and the current capacity for learning about Brazilian biodiversity. The resulting scientific knowledge will help meet society's needs and allow decision-makers to establish policies that integrate biodiversity conservation and sustainable use objectives. SiBBr currently integrates approximately 300 datasets, including the repatriation data set, from 93 national and private publishing institutions, together sharing more than 10 million records.
Data published in GBIF provide quick and easy access to global biodiversity data. Users can search for specific data by applying filters such as publishing country or country of record, which makes it possible to find any data type. Done manually on-line, however, this procedure is slow and time-consuming. To speed up the process, repatriation of data from GBIF to Brazil is automatic and periodic: the SiBBr team developed a tool that performs this task automatically, indexing data in the SiBBr repository as they enter the system. Developed with Golang (https://golang.org/) and bash scripting, the source code applies two filters: country of origin (Brazil) and publishing country.
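The two filters can be expressed as a single query to GBIF. The sketch below is written in Python rather than the tool's Go, and it is not SiBBr's source code: it assumes GBIF's occurrence download API, which accepts a JSON request whose `predicate` combines conditions on keys such as `COUNTRY` and `PUBLISHING_COUNTRY`. The function names and the user name and e-mail address passed in are hypothetical placeholders.

```python
import json
from typing import Any, Dict


def repatriation_predicate(country: str = "BR") -> Dict[str, Any]:
    """Hypothetical helper: select occurrences recorded in `country`
    but published by institutions located elsewhere."""
    return {
        "type": "and",
        "predicates": [
            # Filter 1: country of origin of the record (ISO 3166-1 alpha-2 code).
            {"type": "equals", "key": "COUNTRY", "value": country},
            # Filter 2: the publishing country must NOT be the country of origin.
            {
                "type": "not",
                "predicate": {
                    "type": "equals",
                    "key": "PUBLISHING_COUNTRY",
                    "value": country,
                },
            },
        ],
    }


def download_request(user: str, email: str, country: str = "BR") -> str:
    """Hypothetical helper: build the JSON body of a GBIF download request.
    `user` and `email` are placeholders for a real GBIF account."""
    body = {
        "creator": user,
        "notificationAddresses": [email],
        "sendNotification": True,
        "format": "SIMPLE_CSV",
        "predicate": repatriation_predicate(country),
    }
    return json.dumps(body, indent=2)
```

The resulting JSON would be POSTed, with the account's credentials, to `https://api.gbif.org/v1/occurrence/download/request`; GBIF prepares the zipped file asynchronously and answers with a download key. Negating the publishing-country condition, rather than listing every foreign country, keeps the filter valid as new publishers join GBIF.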
First, the repatriation tool makes an API request to the GBIF database. GBIF then compiles all records that meet the predefined conditions and returns a zipped Comma Separated Values (CSV) file. Next, the CSV file is converted to an SQLite database and republished through GBIF's Integrated Publishing Toolkit (
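The conversion step can be sketched with Python's standard `zipfile`, `csv` and `sqlite3` modules. This is an illustrative assumption, not the SiBBr implementation: the function and the table name `occurrence` are hypothetical, every column is stored untyped, and the delimiter is a parameter because GBIF's "simple CSV" exports are, as far as we know, tab-separated despite the name.

```python
import csv
import io
import sqlite3
import zipfile


def zipped_csv_to_sqlite(zip_bytes: bytes, conn: sqlite3.Connection,
                         table: str = "occurrence", delimiter: str = "\t") -> int:
    """Hypothetical helper: load each .csv member of a zip archive into one
    SQLite table. Values are stored as text; returns the rows inserted."""
    inserted = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip metadata or citation files, if any
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.reader(text, delimiter=delimiter)
                header = next(reader, None)
                if header is None:
                    continue  # empty member: nothing to load
                # Column names are taken from the header row of the export.
                columns = ", ".join(f'"{col}"' for col in header)
                conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
                placeholders = ", ".join("?" for _ in header)
                insert = f'INSERT INTO "{table}" VALUES ({placeholders})'
                for row in reader:
                    conn.execute(insert, row)
                    inserted += 1
    conn.commit()
    return inserted
```

For millions of records, `executemany` would likely be faster; the explicit loop is kept so that the row count is easy to follow.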
However, data quality adjustments must be made before the data are republished in SiBBr's repository through the IPT. The IPT is open-source software developed to facilitate the sharing and usability of primary biodiversity data using Darwin Core (http://rs.tdwg.org/dwc/terms/), a vocabulary or set of terms that describes biodiversity data (
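The specific quality adjustments are not detailed here, so the following is only a hypothetical example of the kind of check that could run before republishing through the IPT. It assumes each record is a dictionary keyed by Darwin Core term names; the choice of terms in `REQUIRED_TERMS` is an assumption, and the coordinate checks only test that values are numeric and within valid latitude and longitude ranges.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical minimum set of Darwin Core terms required before republishing.
REQUIRED_TERMS = ("occurrenceID", "scientificName",
                  "decimalLatitude", "decimalLongitude")


def check_record(rec: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one record (empty list = passes)."""
    problems = []
    for term in REQUIRED_TERMS:
        if not (rec.get(term) or "").strip():
            problems.append(f"missing {term}")
    try:
        lat = float(rec.get("decimalLatitude") or "")
        lon = float(rec.get("decimalLongitude") or "")
    except ValueError:
        problems.append("non-numeric coordinates")
        return problems
    # Valid WGS84 ranges; this does not test whether the point lies in Brazil.
    if not -90 <= lat <= 90:
        problems.append("decimalLatitude out of range")
    if not -180 <= lon <= 180:
        problems.append("decimalLongitude out of range")
    return problems


def split_records(records: Iterable[Dict[str, str]]
                  ) -> Tuple[List[Dict[str, str]], List[Tuple[Dict[str, str], List[str]]]]:
    """Separate records that pass every check from those needing fixes."""
    kept, rejected = [], []
    for rec in records:
        issues = check_record(rec)
        if issues:
            rejected.append((rec, issues))
        else:
            kept.append(rec)
    return kept, rejected
```

Keeping rejected records together with their list of issues, rather than silently dropping them, leaves room to correct and re-harvest them later.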
This data paper describes the state of the data set when the procedure was first used to harvest data from GBIF, on 9 April 2016, at which time 2,459,366 records were added to the SiBBr repository.
The 2,459,366 records are distributed among publishing countries worldwide. Figs
Number of Brazilian records published outside national borders, by publishing country (logarithmic scale). US = United States of America; GB = United Kingdom; NL = Netherlands; DE = Germany; FR = France; SE = Sweden; AR = Argentina; ES = Spain; AT = Austria; CH = Switzerland; VE = Venezuela; AU = Australia; CA = Canada; BE = Belgium; JP = Japan; CO = Colombia; NO = Norway; FI = Finland; PL = Poland; EE = Estonia; DK = Denmark; ZA = South Africa; NZ = New Zealand; CR = Costa Rica; NI = Nicaragua; BG = Bulgaria; MX = Mexico; CL = Chile; CZ = Czech Republic; LU = Luxembourg; PR = Puerto Rico; PT = Portugal; GH = Ghana
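Per-country totals such as those plotted in the figure, and shares such as the roughly 80% held by Europe and the United States, can be computed from an SQLite table with a `GROUP BY` query. This sketch assumes a hypothetical `occurrence` table with a `publishingCountry` column holding ISO 3166-1 alpha-2 codes like those in the caption; the base-10 logarithm is returned alongside each count because the figure uses a logarithmic scale.

```python
import math
import sqlite3
from typing import Iterable, List, Tuple


def records_per_country(conn: sqlite3.Connection,
                        table: str = "occurrence") -> List[Tuple[str, int, float]]:
    """Return (country code, record count, log10 of count), largest first.
    Assumes a hypothetical `publishingCountry` column."""
    rows = conn.execute(
        f'SELECT publishingCountry, COUNT(*) AS n FROM "{table}" '
        "GROUP BY publishingCountry ORDER BY n DESC"
    ).fetchall()
    return [(code, n, math.log10(n)) for code, n in rows]


def share(rows: List[Tuple[str, int, float]], codes: Iterable[str]) -> float:
    """Fraction of all records published by the given country codes."""
    wanted = set(codes)
    total = sum(n for _, n, _ in rows)
    return sum(n for code, n, _ in rows if code in wanted) / total
```

Passing the set of European country codes plus `"US"` to `share` would reproduce the kind of regional proportion quoted in the abstract.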
The repatriation dataset comprises 2,459,366 records from all six kingdoms: Animalia, Plantae, Fungi, Bacteria, Protozoa and Chromista. The best represented kingdom is Animalia, with 25 phyla, of which Chordata, Arthropoda, Mollusca and Platyhelminthes have the most records. Other phyla include Cnidaria, Nematoda, Echinodermata, Annelida, Porifera, Brachiopoda, Bryozoa, Rotifera, Acanthocephala, Sipuncula, Hemichordata, Kinorhyncha, Myxozoa, Nematomorpha, Echiura, Onychophora, Kamptozoa, Phoronida, Chaetognatha, Nemertea and Tardigrada (Fig.
For Plantae, as depicted in Fig.
Regarding Fungi, the dataset includes 5 groups: Ascomycota, Basidiomycota, Glomeromycota, Zygomycota and Chytridiomycota (Fig.
The repatriated data span a collecting period from 1658 to 2016. The first record available in GBIF from Brazil is based on a specimen collected in July 1658. The specimen belongs to the phylum Spermatophyta (kingdom Plantae); it was published in GBIF by the United States and is stored in the Field Museum of Natural History in Chicago.
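The collecting period and the earliest record can likewise be read from an SQLite table. This sketch assumes a hypothetical `occurrence` table with `gbifID`, `kingdom`, `year` and `month` columns stored as text, as a plain CSV import would leave them; blank or missing years are excluded so that they are not mistaken for the minimum.

```python
import sqlite3
from typing import Optional, Tuple

# Rows without a usable year are ignored by both queries.
HAS_YEAR = "WHERE TRIM(COALESCE(year, '')) <> ''"


def collecting_period(conn: sqlite3.Connection,
                      table: str = "occurrence") -> Tuple[Optional[int], Optional[int]]:
    """Return (earliest year, latest year) among records with a year."""
    first, last = conn.execute(
        f'SELECT MIN(CAST(year AS INTEGER)), MAX(CAST(year AS INTEGER)) '
        f'FROM "{table}" {HAS_YEAR}'
    ).fetchone()
    return first, last


def earliest_record(conn: sqlite3.Connection, table: str = "occurrence"):
    """Return (gbifID, kingdom, year, month) of the oldest dated record."""
    return conn.execute(
        f'SELECT gbifID, kingdom, year, month FROM "{table}" {HAS_YEAR} '
        "ORDER BY CAST(year AS INTEGER), CAST(month AS INTEGER) LIMIT 1"
    ).fetchone()
```

Casting to `INTEGER` before comparing avoids relying on text ordering, which would only work by accident for four-digit years.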
Column label | Column description |
---|---|
Registro | Identifier of each record |