Herbarium collection of the Rio de Janeiro Botanical Garden (RB), Brazil

Abstract

Background
This paper provides a general, quantitative description of the Rio de Janeiro Botanical Garden herbarium (RB) dataset. Created over a century ago, the RB currently comprises ca. 750,000 mounted specimens, with a strong representation of the Brazilian flora, mainly from the Atlantic and Amazon forests. Nearly 100% of these specimens have been entered into the database and imaged and, at present, about 17% have been geo-referenced. This data paper focuses exclusively on RB's exsiccatae collection of land plants and algae, which is currently increasing by about twenty to thirty thousand specimens per year thanks to fieldwork, exchange and donations. Since 2005, many national and international projects have been implemented, improving the quality and accessibility of the collection. The most important facilitating factor in this process was the creation of the institutional system for plant collection and management, named JABOT. Since the RB is continuously growing, the dataset is updated weekly on the SiBBr and GBIF portals.

New information
The most represented environments are the Atlantic and Amazon forests, a biodiversity hotspot and the world's largest rain forest, respectively. The dataset described in this article contains the data and metadata of plant and algae specimens in the RB collection and the links to their respective images. Currently, the RB data are publicly available online at several biodiversity portals, such as our institutional database JABOT, the Reflora Virtual Herbarium, the SiBBr and the GBIF portal. However, a description of the RB dataset as a whole is not available in the literature.


Introduction
Created in 1890, the RB herbarium of the Rio de Janeiro Botanical Garden (JBRJ) comprises seven botanical collections: mounted specimens (RB: 750,000, with 7,500 nomenclatural types and around 3,000 paratypes), wood (RBw: ca. 10,300 specimens), fruits (RBcarpo: ca. 8,000 specimens), DNA bank (RBdna: ca. 5,700 specimens), spirit (RBspirit: ca. 2,500 specimens), seed bank (RBsem: ca. 2,700 specimens) and ethnobotany. For further details about the history and structure of the herbarium, see Marquete et al. (2001) and Forzza et al. (2015b). This data paper focuses on the main collection of mounted plant and algae specimens, which is currently increasing by about twenty to thirty thousand new specimens per year. Samples are organised alphabetically across two floors in the herbarium building: the dicot families of angiosperms from A to N are stored on the first floor; the remaining dicot families, monocots, gymnosperms, ferns, lycophytes, bryophytes, algae, fungi, lichens and the other collections (fruits, spirit, wood and ethnobotanical) are stored on the second floor.
The collection is completely digitised and is managed using the JABOT system (Silva et al. 2017, Gonzales 2009), which was developed by JBRJ staff. With the implementation of the REFLORA project (Forzza et al. 2015a, Nic Lughadha et al. 2016), nearly 100% of the mounted specimens were photographed and are available through the public interface of the JABOT system and in other biodiversity portals, such as the Reflora Virtual Herbarium (Forzza et al. 2015a), the Brazilian Biodiversity Information System (SiBBr) (Gadelha et al. 2014) and the Global Biodiversity Information Facility (GBIF) (Robertson et al. 2014). Currently, new accessions are digitised and photographed before they are incorporated into the herbarium and all new identifications are updated in the JABOT system on a daily basis. At present, about 17% of the specimens have been assigned geographic coordinates (Fig. 2).

Number of specimens per botanical family (≥ 4,000 specimens) deposited in the RB herbarium.

Temporal coverage
The RB herbarium was created by the naturalist João Barbosa Rodrigues in 1890. The first specimens came from a private collection of about 25,000 specimens donated by Emperor Dom Pedro II (Marquete et al. 2001). Aiming to make the herbarium a reference for the study of the Brazilian flora, Barbosa Rodrigues and other naturalists (e.g. A. Ducke, J.G. Kuhlmann, A.P. Duarte, A.C. Brade, D. Sucre) conducted many expeditions throughout Brazil, considerably enriching the collection. The herbarium has been a national reference since then, increasing its collections over the years (Fig. 3).

Georeferenced specimens deposited in the RB herbarium (ca. 17% of the total).

Sampling methods
Sampling description: The herbarium specimen collection combines specimens from institutional projects, undergraduate and post-graduate research and exchanges or donations from other herbaria. In addition, from 1970 onwards, major national flora documentation projects sent specimens to RB, such as the RADAMBRASIL Project (1970-1985) and the Flora Program CNPq (1975-1983; MCT/CNPq 1987). Furthermore, as the official custodian for the Ministry of Environment, RB also receives many specimens from private companies whose activities relate to environmental impact studies and phytochemical products; most of these specimens are donated in exchange for identification. The institution includes 53 associate researchers and hosts around 550 visiting taxonomists every year, standing out as the most visited herbarium in Brazil.

Quality control:
The RB uses the institutional system JABOT to perform all herbarium management functions (i.e. loans, donations, new specimen registration, database management, quality control and web publication). JABOT is a PostgreSQL-based database management system with 117 tables, created specifically for botanical collections. Data can be entered directly through the JABOT interface or by uploading spreadsheets, with both controlled and free-text fields (Silva et al. 2017).
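The distinction between controlled and free-text fields can be illustrated with a minimal sketch of a pre-upload check on spreadsheet rows. This is not JABOT's actual validation logic, which is not described here: the vocabularies in `CONTROLLED`, the choice of which fields are controlled and the helper `validate_row` are all hypothetical.

```python
# Hypothetical controlled vocabularies; the real JABOT fields and allowed
# values are not given in the text and are assumed here for illustration.
CONTROLLED = {
    "country": {"Brazil", "Argentina", "Peru"},
    "taxonRank": {"family", "genus", "species", "subspecies", "variety"},
}


def validate_row(row):
    """Return (field, value) pairs whose value is outside its controlled vocabulary.

    Fields not listed in CONTROLLED (e.g. fieldNotes) are treated as free text
    and are never flagged. Empty controlled fields are also left unflagged.
    """
    problems = []
    for field, allowed in CONTROLLED.items():
        value = (row.get(field) or "").strip()
        if value and value not in allowed:
            problems.append((field, value))
    return problems


# Two example spreadsheet rows; the second has a misspelt controlled value.
rows = [
    {"country": "Brazil", "taxonRank": "species", "fieldNotes": "Tree, 8 m tall."},
    {"country": "Brasil", "taxonRank": "species", "fieldNotes": "Shrub with white flowers."},
]

# Map each row index to the list of problems found in that row.
report = {index: validate_row(row) for index, row in enumerate(rows)}
for index, problems in report.items():
    print(index, problems)
```

Flagging rows rather than rejecting the whole spreadsheet lets a curator correct individual values before the upload is repeated.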
Step description: Plant processing procedures. The herbarium follows the usual procedures for processing specimens (Bridson and Forman 2000, Simpson 2006). Fresh materials are pressed and dried over a stove or in an oven. Once dry, specimens are glued on to acid-free paper, with gummed cloth tape or thread used for bulky plants. Bryophytes, fungi and lichens are placed into acid-free packets. Algae can also be mounted on acid-free paper with gummed cloth tape or, in the case of calcareous algae, stored in plastic boxes.

Collection digitisation history
Once JABOT was created in 2005, data entry for the herbarium started with the "Projeto de Informatização", funded by Petrobras, which lasted until 2007. Between 2008 and 2010, with no project specifically directed towards data entry, this task was performed only for new specimens of previously digitised families, by a smaller team drawn from institutional projects and core herbarium staff. At the end of 2010, the Reflora project started, boosting data entry into the system; this phase was completed during 2014. The GFJP and GUA herbaria were incorporated into RB in 2016 and 2017, respectively, which substantially increased the number of specimens entered into the system, as can be seen in Fig. 4.

Geographic coverage
Description: The majority of specimens were collected in Brazil (ca. 90%) and the country's most widely represented region is the Southeast, where the herbarium is based (ca. 349,000 specimens, 50% of the total). The south-eastern states of Rio de Janeiro (ca. 189,000 specimens) and Minas Gerais (ca. 90,000) are represented by the largest number of specimens (Fig. 5). It should be noted that most of this region is part of the Atlantic Forest and that Rio de Janeiro state lies entirely within this biome.
North Brazil ranks second in number of specimens and the states of Amazonas and Pará are the best represented, with ca. 29,000 (ca. 4%) and 28,000 (ca. 4%), respectively (Fig. 5). One of the first great plant collectors in the region, especially with regard to Amazonian flora, was Adolf Ducke, who conducted expeditions in the states of Amazonas and Pará, mainly in the first half of the 20th century.

Description:
The RB herbarium has 750,000 mounted specimens, making it the largest herbarium in Brazil (Gasper and Vieira 2015). The full database is available via the Integrated Publishing Toolkit (IPT) of Rio de Janeiro Botanical Garden (Version 84.131, published on 2017-11-22) or via the JABOT system (http://www.jbrj.gov.br/jabot). For the integration of Web-based applications, the JABOT Web service is available at http://servicos.jbrj.gov.br/jabot.
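In this dataset, the `identifier` and `references` columns hold lists of image URLs separated by "|". The following is a minimal sketch of reading such records, assuming a tab-delimited text export with a header row. The sample rows, the example.org URLs and the helper names `split_urls` and `read_records` are hypothetical, and the actual file layout served by the IPT may differ.

```python
import csv
import io

# Hypothetical two-record sample; a real export may use other delimiters,
# carry many more columns, or keep image columns in a separate file.
SAMPLE = (
    "occurrenceID\tfamily\tidentifier\treferences\n"
    "urn:example:1\tFabaceae\t"
    "http://example.org/t/1a.jpg|http://example.org/t/1b.jpg\t"
    "http://example.org/h/1a.jpg|http://example.org/h/1b.jpg\n"
    "urn:example:2\tMyrtaceae\t\t\n"
)


def split_urls(cell):
    """Split a '|'-separated cell into a list of trimmed, non-empty URLs."""
    return [part.strip() for part in (cell or "").split("|") if part.strip()]


def read_records(handle):
    """Yield one dict per row, with the image URL columns expanded into lists."""
    for row in csv.DictReader(handle, delimiter="\t"):
        row["identifier"] = split_urls(row.get("identifier"))  # thumbnails
        row["references"] = split_urls(row.get("references"))  # high resolution
        yield row


records = list(read_records(io.StringIO(SAMPLE)))
for record in records:
    print(record["occurrenceID"], len(record["identifier"]), "thumbnail URL(s)")
```

Records without images yield empty lists rather than a list containing an empty string, which keeps downstream loops over the URLs simple.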

Column label: Column description
occurrenceID: The unique identifier of the Occurrence.
identifiedBy: A list (concatenated and separated) of names of people, groups or organisations who assigned the Taxon to the subject.
dateIdentified: The date on which the subject was identified as representing the Taxon.
identificationRemarks: Comments or notes about the identification.
family: The full scientific name of the family in which the taxon is classified.
genus: The full scientific name of the genus in which the taxon is classified.
specificEpithet: The name of the first or species epithet of the scientificName.
infraspecificEpithet: The name of the lowest or terminal infraspecific epithet of the scientificName.
taxonRank: The taxonomic rank of the most specific name in the scientificName.
scientificNameAuthorship: The authorship information for the scientificName.
format: Image file format.
identifier: A list, concatenated and separated by "|", of the specimen image URLs in a low-resolution format to be used as thumbnails.
references: A list, concatenated and separated by "|", of the specimen image URLs in a high-resolution format to be integrated into other portals and websites.
license: A legal document giving official permission to do something with the resource.
rightsHolder: A person or organisation owning or managing rights over the resource.
type: The nature or genre of the resource.
modified: The most recent date-time on which the resource was changed.
institutionCode: The name (or acronym) in use by the institution having custody of the object(s) or information referred to in the record.
collectionCode: The name, acronym, coden or initialism identifying the collection or data set from which the record was derived.
basisOfRecord: The specific nature of the data record.
fieldNumber: An identifier given to the event in the field. Can be described as the number of the field campaign.
fieldNotes: The text of notes taken in the field about the specimen.
eventRemarks: Comments or notes about the field campaign.
country: The name of the country or major administrative unit in which the Location occurs.
countryCode: The standard code for the country in which the Location occurs according to ISO 3166-1-alpha-2 country codes.
stateProvince: The name of the next smaller administrative region than country (state, province, canton, department, region etc.) in which the Location occurs.
county: The full, unabbreviated name of the next smaller administrative region than stateProvince (county, shire, department etc.) in which the Location occurs.
locality: The specific description of the place. Less specific geographic information can be provided in other geographic terms (higherGeography, continent, country, stateProvince, county, municipality, waterBody, island, islandGroup). This term may contain information modified from the original to correct perceived errors or to standardise the description.
minimumElevationInMeters: The lower limit of the range of elevation (altitude, usually above sea level), in metres.
maximumElevationInMeters: The upper limit of the range of elevation (altitude, usually above sea level), in metres.

Project description: The project aimed to capture data and images of RB specimens previously unavailable in the JABOT system, especially those in the fungi, bryophyte, algae, fruit and wood collections. It also facilitated visits by taxonomists, improving the quality of identifications, particularly for groups for which there are no specialists at the RB.
Funding: The Brazilian Science, Technology and Innovation Ministry (MCTI), CNPq and The Brazilian Biodiversity Information System (SiBBr).

Duration: 2012-2015
Project title: "Contribuições do Jardim Botânico do Rio de Janeiro à implementação do SiBBr" (Contributions of the Rio de Janeiro Botanical Garden to the implementation of the SiBBr).

Project description:
The Brazilian Biodiversity Information System (SiBBr) is an initiative that intends to ensure the proper use of Brazilian biodiversity and ecosystem data, integrating information and facilitating decision-making and public policy development. It also assists national herbaria in the digitisation of their specimens and in the repatriation of images of specimens from European and North American herbaria. The RB contributes to this initiative by making available its collections' data and those of the Reflora Virtual Herbarium and the Brazilian Flora 2020, projects coordinated by this institution. This initiative currently supports daily data and image capture of new specimens incorporated into the RB herbarium.

Project description: The National Forestry Inventory (IFN) is coordinated by the Brazilian Forest Service (SFB) and aims to collect socio-economic and ecological information about the country's forest resources. It supports the formulation, implementation and execution of public policies for the development, use and conservation of these resources. In the state of Rio de Janeiro, botanical material has been identified at the RB by taxonomist consultants hired by the IFN. In addition, all the fertile specimens collected by the project in the other Brazilian states are also sent to the RB. This project financed the acquisition of imaging equipment for all participating herbaria, contributing to the digitisation of collections. Also, because of the size of the RB, three technicians were hired to help with its day-to-day herbarium activities.

Funding: Brazilian Forest Service
Duration: 2013-current

Challenges for biological collections digitisation and publication
The costs related to the maintenance and curation of a herbarium are significant and curators are always under pressure to gain financial support (Deng 2015). There is also a demand for modernisation and data sharing, which greatly increases the costs for these collections, augmenting the need for financial support. Those new costs come mainly from IT infrastructure and its maintenance but also originate from a demand for new and specialised staff and for software maintenance and development.
The first phases of the project to digitise and publish the contents of the RB herbarium took place between 2005 and 2007, with an investment of around US$254,000 for the incorporation of metadata of 291,630 specimens and the digitisation of 10,646 specimens (Gonzales 2009). From 2007 to 2011, the Global Plants Initiative focused on the type specimens, which are few in number compared with the general collection. From 2010, a new phase of the digitisation of the RB herbarium started under the umbrella project Reflora, whose main goals include the repatriation of high-resolution images of Brazilian specimens deposited in European and North American herbaria and the digitisation of many national herbaria. Because this phase is part of such an enormous initiative, the amount invested exclusively in RB cannot be quantified precisely, at least for the time being.
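For a sense of scale, the first-phase figures above (2005 to 2007) can be turned into a rough average cost per databased specimen. This is a back-of-the-envelope sketch only: the source does not say how the investment was split between metadata entry and imaging, so the calculation assumes, as a simplification, that the whole amount is attributed to the databased records.

```python
# Figures reported for the 2005-2007 phase (Gonzales 2009).
investment_usd = 254_000      # approximate total investment
databased_specimens = 291_630  # specimens with metadata incorporated
imaged_specimens = 10_646      # specimens digitised (imaged)

# Assumption: the full investment is spread over the databased specimens,
# which makes this an upper bound on the data-entry cost per record.
cost_per_record = investment_usd / databased_specimens

# Proportion of the databased specimens that were also imaged in this phase.
imaged_share = imaged_specimens / databased_specimens

print(f"Average cost per databased specimen: US${cost_per_record:.2f}")
print(f"Share of databased specimens imaged: {imaged_share:.1%}")
```

Under this assumption, the average comes to somewhat less than one US dollar per databased specimen, with only a small fraction of those specimens imaged during the phase.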
Regarding infrastructure, the dataset associated with the RB collection represents 2 5.6% of all institutional digital storage space, 6.8% of processing power (CPUs) and 9.7% of memory. The associated costs of power consumption, especially of climate control in the tropics, are also significant.
Although the literature cites a number of online open-access biodiversity database initiatives that failed for lack of funding after the initial push for resources (Costello et al. 2014), there is now a solid perception that "Without data we cannot generate information and build knowledge to make informed decisions or develop indicators to track progress towards biodiversity goals and targets" (Juffe-Bignoli et al. 2016).
Thus, the trade-off of committing a substantial portion of the institutional budget, as well as technical and scientific staff time, to the digitisation and publication of the collections is considered to have been very positive for the institution's relevance, visibility and image, with an associated gain in funding opportunities.
staff, who work day and night to keep our data and infrastructure operating, and the RB staff for their daily dedicated work in the herbarium collection. Douglas Daly, Eimear Nic Lughadha, Fernando Matos and Walter G. Berendso for critical reading of the manuscript. JL, PL, FLRF and NQ are SiBBr grant holders. RCF is a CNPq research fellow.