Unlocking the Entomological Collection of the Natural History Museum of Maputo, Mozambique

Abstract

Background
The collections of the Natural History Museum of Maputo play a crucial role in safeguarding Mozambique's biodiversity, representing an important repository of data and materials on the country's natural heritage. This paper describes a dataset based on the Museum's Entomological Collection, recording 409 species belonging to seven orders and 48 families. The available data for each specimen, such as geographical coordinates and taxonomic information, were digitised to build the dataset. The specimens included in the dataset were collected between 1914 and 2018 by collectors and researchers from the Natural History Museum of Maputo (formerly known as "Museu Álvaro de Castro") in all of the country's provinces except Cabo Delgado.

New information
This paper adds data to the Biodiversity Network of Mozambique and the Global Biodiversity Information Facility (GBIF), within the objectives of the SECOSUD II Project and the Biodiversity Information for Development Programme. The insect dataset is available on the GBIF data portal (https://doi.org/10.15468/j8ikhb). The data were also shared on BioNoMo (https://bionomo.openscidata.org), the Mozambican national biodiversity data portal developed by the SECOSUD II Project.


Introduction
More than 3000 insect species are estimated to occur in Mozambique (Ministério da Terra, Ambiente e Desenvolvimento Rural 2015). However, the country's entomological diversity is poorly documented. Despite increasing research on insect diversity, the data generated are poorly disseminated and the scientific findings are rarely used (Ministry for Coordination of Environmental Affairs 2014). Given these shortcomings, there is a clear need to document and record the country's entomological diversity.
In this context, the Entomological Collection of the Natural History Museum of Maputo (NHMM) could play a crucial role in documenting and disseminating data on Mozambican entomological diversity. Natural history museum collections (NHMCs) are important repositories of biodiversity data and a fundamental reference system for describing the natural world (Alves et al. 2014). In museum collections, organisms are conventionally identified, catalogued and stored in systematic order, providing an important and durable source of ancillary data (Sampaio et al. 2019). Moreover, they provide source material for a wide range of biological studies (Monteiro et al. 2017), and their relevance to understanding, using and protecting natural resources is growing rapidly. Specimens in museum collections and the information in related databases are used internationally to examine biodiversity changes related to habitat loss, climate change and biological invasions, and to determine the threat status of species (Hamer 2012).
Although NHMCs preserve pivotal information on biodiversity, they are often difficult to access. Digitising museum collections can help overcome this limitation by making museum heritage easier to access and making data on global biodiversity available to researchers and policy-makers (Drew et al. 2017). To this end, an efficient way to tackle the digitisation of museum records is to divide collection data into distinct subsets, each representing a specific taxonomic group (Roy and Gagnon 2016).
The NHMM houses a large zoological collection, which includes mammals, birds, reptiles, fishes, insects and other invertebrates (Natural History Museum of Maputo 2016a). The museum is part of Eduardo Mondlane University (UEM) in Maputo and serves both as a centre of scientific communication and as a hub for developing biological research activities.
Its mission is to preserve and promote Mozambique's wildlife heritage, to encourage scientific research on its fauna and ecosystems and to promote formal and informal environmental education, contributing to the sustainable use and management of the country's natural resources and ecosystems (Universidade Eduardo Mondlane 2015). Given the poor documentation of Mozambican entomological diversity, the NHMM created a dataset from its Entomological Collection (hereafter "the dataset") to compile biodiversity data and make them available to the scientific community. The dataset records primary biodiversity data, such as the taxonomic classification of the specimens, the geographical coordinates of the sampling sites and the dates of collection. A taxonomic review of the specimens was carried out, and the collection site of each occurrence was georeferenced using Google Earth 7.3. The specimens included in the dataset were sampled in all of the country's provinces except Cabo Delgado between 1914 and 2018 by collectors and researchers from the NHMM. The main contributions were made by Maria Corinta Ferreira and Gunderico da Veiga Ferreira during their work as entomologists at the NHMM. With 176,527 specimens belonging to almost all insect orders found in the country, the Museum's Entomological Collection is the largest specimen collection in Mozambique. It is of pivotal value at both the national and the Afrotropical regional level (Natural History Museum of Maputo 2016b).
Thus, by making knowledge of Mozambique's entomological diversity accessible, the dataset produced by the NHMM can support researchers and policy-makers in planning strategies to manage and conserve insect biodiversity and the associated fauna and flora.
The dataset was developed in the framework of the SECOSUD II Project, within the Biodiversity Network of Mozambique (BioNoMo, https://bionomo.openscidata.org/bionomo) initiative, which aims to provide a tool for the national aggregation of biodiversity data (SECOSUD II Italian Cooperation Project 2017b). The dataset also benefited from the Biodiversity Information for Development (BID) Programme, which supports the sharing of primary biodiversity data on the Global Biodiversity Information Facility (https://www.gbif.org/) portal, through the project Mobilizing primary biodiversity data for Mozambican species of conservation concern (Global Biodiversity Information Facility Secretariat 2017).

General description
Purpose: Mobilisation of primary biodiversity data on Mozambique's entomofauna.
Additional information: The dataset is a subset of the Entomological Collection of the NHMM. The species included in the dataset were taxonomically reviewed. All specimens in the dataset were collected in Mozambique during sampling expeditions conducted between 1914 and 2018 at 225 different localities. Approximately 93% of the specimens are georeferenced. The dataset includes taxonomic classification, locality name, sampling coordinates, catalogue number and collection date.

Project description
Title: SECOSUD II - Conservation and equitable use of biological diversity in the SADC region: Biodiversity Network of Mozambique initiative and Mobilizing primary biodiversity data for Mozambican species of conservation concern.
Study area description: The study area of the SECOSUD II Project encompasses the following countries belonging to the Southern African Development Community: Mozambique, Eswatini, South Africa and Zimbabwe. The project Mobilizing primary biodiversity data for Mozambican species of conservation concern was designed for Mozambique.

Design description:
The dataset was digitised in the framework of the BioNoMo initiative, as part of SECOSUD II Project and is one of the occurrence datasets published on GBIF through the project Mobilizing primary biodiversity data for Mozambican species of conservation concern, within the Biodiversity Information for Development Programme.
The SECOSUD II Project aims to strengthen the capacities of decision-makers responsible for land planning and natural resource management. It also aims to promote and support the harmonisation of land management processes at national, regional and international level. The main objective of the SECOSUD II Project is to promote biodiversity conservation and sustainable economic development in the SADC (Southern African Development Community) region, consistent with the goals of the Convention on Biological Diversity (CBD) (SECOSUD II Italian Cooperation Project 2017a). Project activities include support for developing a national platform for collecting, organising and sharing information on biological diversity. BioNoMo activities include initialising a database of primary biodiversity data in each partner institution to create a biodiversity information network, freely available on a dedicated web portal. This portal provides: (i) documentation on biodiversity at national level; (ii) primary biodiversity data; and (iii) species data, such as taxonomic lists and image archives. BioNoMo aims to be a tool for the national aggregation of biological diversity data, making such data available to support the development of more effective biodiversity conservation strategies. It is therefore a source of information that can support scientific research and help national institutions meet their reporting commitments under international conventions on biodiversity conservation, such as the CBD (SECOSUD II Italian Cooperation Project 2017b). SECOSUD II works with research institutes involved in the management and conservation of biodiversity in the SADC region.
The project works with some of the main collectors and suppliers of primary biodiversity data in the region. The project Mobilizing primary biodiversity data for Mozambican species of conservation concern aims to mobilise data on endemic and near-endemic species of plants, birds, reptiles, amphibians and fish (Biodiversity Information for Development 2019). Its main objective is to increase the availability and use of biodiversity information to support land-use decision-making (Global Biodiversity Information Facility Secretariat 2017) and the planning of biodiversity conservation strategies. More than 130,000 occurrence records have been digitised to date (Biodiversity Information for Development 2019). The lead institution in the country was the Instituto de Investigação Agrária de Moçambique (IIAM), with several additional partners involved at national and international level. Moreover, to promote the exchange of taxonomic knowledge, a partnership with the South African National Biodiversity Institute (SANBI) was developed, leading to a network of experts between the two countries and to capacity building in Mozambique. Another partner, the Royal Botanic Gardens, Kew, provided additional training in IUCN Red Listing through its Tropical Important Plant Areas initiative. Finally, through its BioNoMo initiative, the SECOSUD II Project provided technical assistance with data digitisation and has made the digitised data available via the BioNoMo portal. In addition, other partners who allowed data publishing are: Entomoteca - Ministério de Agricultura e Segurança Alimentar (

Sampling methods
Study extent: Sampling occurred in nine provinces of the country (Maputo, Gaza, Inhambane, Manica, Sofala, Tete, Zambezia, Nampula and Niassa). No records were collected in Cabo Delgado Province (Fig. 1).
Sampling description: The specimens in the Entomological Collection of NHMM were collected between 1914 and 2018 at 225 localities. The main contributors to the collection were Maria Corinta Ferreira Fontes de Melo Ferreira, during her work as resident entomologist at the NHMM, and Gunderico da Veiga Ferreira, an entomologist for the Board of Geographical Missions and Colonial Research. In 1949, Maria Corinta Ferreira established a programme for collecting insects in the sawmills and forests of the Maputo Region. The programme was subsequently extended (mainly in 1965 and 1973) to other southern provinces, with the aim of enriching and diversifying the museum's entomological collection (Antunes 2016).
In recent years, the collection has been expanded through contributions from national parks and reserves; most specimens were donated by Gorongosa National Park and the Maputo Special Reserve. Field expeditions conducted by the NHMM have also increased the number of specimens, particularly of the orders Coleoptera and Lepidoptera.
Quality control: Data quality control was supported by national and international entomology experts, who validated the taxonomic classification of the specimens.
Step description: In the framework of the BioNoMo initiative and the Mobilizing primary biodiversity data for Mozambican species of conservation concern project, assistants and staff of data provider institutions were trained in biodiversity database creation, digitisation and management.
The dataset was developed by digitising the labels and field cards of the specimens in the Entomological Collection of NHMM. The data included in the dataset were cleaned through an exhaustive review of the existing museum material. This review was supported by entomology specialists from Sapienza University of Rome, who performed a taxonomic verification of each specimen. Specimens without a reliable taxonomic classification were not included in the dataset. The cataloguing records of the included specimens (such as catalogue number, collector, date, collection locality, storage and state of preservation) were then revised. To allow the dataset to be updated automatically against international biodiversity databases, such as the Catalogue of Life (https://www.catalogueoflife.org/) and the Encyclopedia of Life (https://eol.org/), the taxonomic classification was updated and validated using R version 3.4.2, which allowed us to compare our data with other entomological databases. Data were digitised using Specify 7. Approximately 93% of specimens were georeferenced following the guidelines of Chapman and Wieczorek (2020). The point-radius method, applied with the Georeferencing Calculator tool, was adopted as a practical solution for georeferencing descriptive localities. This method was chosen because it facilitates managing the uncertainties associated with georeferencing older samples (Wieczorek et al. 2004).
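In the point-radius method, a locality is represented by a point plus a radius that encloses all the places the original description could refer to. The sketch below illustrates the idea; it is not the Georeferencing Calculator's code. It assumes a spherical Earth, measures coordinate-precision error as the diagonal of one precision cell, and combines error sources by a plain linear sum. The calculator itself handles more cases, such as offsets and headings. The function names are hypothetical.

```python
import math

# Mean Earth radius in metres; a spherical model is an assumption made for simplicity.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def coordinate_precision_m(lat, lon, precision_deg):
    """Diagonal of one precision cell, e.g. precision_deg = 1/60 for whole minutes."""
    return haversine_m(lat, lon, lat + precision_deg, lon + precision_deg)


def point_radius_uncertainty_m(lat, lon, extent_m, precision_deg,
                               datum_error_m=0.0, measurement_error_m=0.0):
    """Simplified point-radius uncertainty (hypothetical helper).

    Sums the extent of the named place, the coordinate-precision error, an
    unknown-datum allowance and a measurement error, then rounds up to whole metres.
    """
    total = (extent_m
             + coordinate_precision_m(lat, lon, precision_deg)
             + datum_error_m
             + measurement_error_m)
    return math.ceil(total)


# Illustrative values only: a town-level locality with an assumed 2 km extent,
# with coordinates read to the nearest minute near Maputo.
radius = point_radius_uncertainty_m(-25.97, 32.58, extent_m=2000, precision_deg=1 / 60)
print(radius)  # roughly 4.5 km
```

Rounding up is intentional. A radius that is slightly too large still encloses the true locality, but one that is too small may not.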
In addition, maps and gazetteers were used to refine the georeferencing of the sampling locations, providing coordinates and spatial boundaries for the sites described on each specimen's field card. Geographic coordinates were determined using Google Maps and are expressed in decimal degrees on the World Geodetic System 1984 (WGS84) datum.
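Maps and gazetteers often give coordinates in degrees, minutes and seconds, whereas decimal-degree fields expect signed values (south and west negative). The minimal conversion sketch below is a hypothetical helper, not part of the authors' workflow. It changes only the notation, not the datum. If a source map uses a datum other than WGS84, a datum transformation or an added uncertainty allowance would still be needed.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees, minutes and seconds to signed decimal degrees.

    Southern (S) and western (W) hemispheres become negative. The datum is
    NOT changed: the result stays on whatever datum the source used.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    limit = 90 if hemisphere in {"N", "S"} else 180
    value = abs(degrees) + minutes / 60 + seconds / 3600
    if value > limit:
        raise ValueError("coordinate out of range")
    return -value if hemisphere in {"S", "W"} else value


# Hypothetical gazetteer entry near Maputo: 25°58'S, 32°35'E.
lat = dms_to_decimal(25, 58, hemisphere="S")
lon = dms_to_decimal(32, 35, hemisphere="E")
print(round(lat, 5), round(lon, 5))  # -25.96667 32.58333
```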
The georeferencing process is consistent with the requirements of the Darwin Core standard, on which the dataset is built. Darwin Core is an open standard comprising a set of terms and definitions that facilitate the digital sharing of information about biological diversity. It is based mainly on taxa and their occurrence in nature, as documented by observations, specimens, samples and related information (Wieczorek et al. 2010).
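A Darwin Core occurrence file is essentially a table whose column headers are Darwin Core term names. The sketch below writes one such row to CSV. The choice and order of terms are assumptions for illustration and are not the dataset's actual column layout. Every value in the example record, including the identifiers and locality, is a hypothetical placeholder rather than a record from the collection.

```python
import csv
import io

# A subset of Darwin Core terms used as column headers (selection is illustrative).
DWC_TERMS = [
    "occurrenceID", "basisOfRecord", "catalogNumber", "recordedBy", "eventDate",
    "countryCode", "stateProvince", "locality",
    "decimalLatitude", "decimalLongitude", "geodeticDatum",
    "coordinateUncertaintyInMeters",
    "kingdom", "phylum", "class", "order", "family", "genus", "scientificName",
]


def write_occurrences(records, stream):
    """Write occurrence dicts as Darwin Core CSV; keys not in DWC_TERMS raise ValueError."""
    writer = csv.DictWriter(stream, fieldnames=DWC_TERMS, extrasaction="raise")
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # absent terms are written as empty cells


# Every value below is a hypothetical placeholder, not a record from the dataset.
example = {
    "occurrenceID": "urn:example:NHMM:ENT:0001",
    "basisOfRecord": "PreservedSpecimen",
    "catalogNumber": "ENT-0001",
    "eventDate": "1965-03-14",
    "countryCode": "MZ",
    "stateProvince": "Maputo",
    "locality": "Example locality",
    "decimalLatitude": -25.96667,
    "decimalLongitude": 32.58333,
    "geodeticDatum": "WGS84",
    "coordinateUncertaintyInMeters": 4493,
    "kingdom": "Animalia",
    "phylum": "Arthropoda",
    "class": "Insecta",
    "order": "Mantodea",
    "family": "Empusidae",
    "genus": "Empusa",
    "scientificName": "Empusa fasciata",
}

buffer = io.StringIO()
write_occurrences([example], buffer)
print(buffer.getvalue())
```

Setting `extrasaction="raise"` makes a misspelled term name fail loudly instead of silently dropping the column.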

Taxonomic coverage
Description: The dataset includes 7,967 specimens from seven orders, 48 families and 409 species. Orthoptera is the most represented order (39% of specimens), followed by Diptera (26%) and Lepidoptera (18%). The remaining specimens belong to the orders Blattodea, Odonata, Coleoptera and Mantodea, which account for 10%, 5%, 2% and 0.6% of the records, respectively (Fig. 3).
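Order shares such as those above can be derived from one order label per specimen. The sketch below uses a hypothetical helper and a six-specimen toy input, not the real dataset. When each share is rounded independently, the published figures need not add up to exactly 100. The values above sum to 39 + 26 + 18 + 10 + 5 + 2 + 0.6 = 100.6, presumably for this reason.

```python
from collections import Counter


def order_shares(orders):
    """Percentage of specimens per order, from one order label per specimen.

    Orders are returned from most to least represented.
    """
    counts = Counter(orders)
    total = sum(counts.values())
    return {order: 100 * n / total for order, n in counts.most_common()}


# Toy input with six hypothetical specimens, not the real dataset.
toy = ["Orthoptera", "Orthoptera", "Orthoptera", "Diptera", "Diptera", "Lepidoptera"]
shares = order_shares(toy)
for order, pct in shares.items():
    print(f"{order}: {pct:.1f}%")
```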

Rank
Scientific Name

(Hochkirch et al. 2018) and Empusa fasciata (Shcherbakov and Battiston 2020) are classified as "Endangered", "Near Threatened" and "Data Deficient", respectively.

Kingdom
The scientific name of the kingdom to which the specimen belongs

Phylum
The scientific name of the phylum to which the specimen belongs

Class
The scientific name of the class to which the specimen belongs

Order
The scientific name of the order to which the specimen belongs

Family
The scientific name of the family to which the specimen belongs

Genus
The scientific name of the genus to which the specimen belongs

Subgenus
The scientific name of the subgenus to which the specimen belongs
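The rank columns above lend themselves to a simple consistency check before publication. The sketch below flags records with missing ranks or with higher taxa that do not match an insect classification. Several details are assumptions, and the helpers are hypothetical rather than the procedure used for this dataset. The columns are taken to use the lower-case Darwin Core term names. Subgenus is treated as optional. Every record is expected to fall under Animalia, Arthropoda and Insecta.

```python
# Lower-case Darwin Core term names for the rank columns (assumed mapping).
RANK_COLUMNS = ["kingdom", "phylum", "class", "order", "family", "genus", "subgenus"]

# Higher taxa every record of an insect dataset is expected to share (assumption).
EXPECTED_HIGHER_TAXA = {"kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta"}


def missing_ranks(record, optional=("subgenus",)):
    """Rank columns that are absent, None or blank, ignoring ranks marked optional."""
    missing = []
    for rank in RANK_COLUMNS:
        if rank in optional:
            continue
        value = record.get(rank)
        if value is None or not str(value).strip():
            missing.append(rank)
    return missing


def unexpected_higher_taxa(record):
    """Rank columns whose value differs from the expected insect classification."""
    return [
        rank for rank, expected in EXPECTED_HIGHER_TAXA.items()
        if record.get(rank) != expected
    ]


record = {"kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta",
          "order": "Mantodea", "family": "Empusidae", "genus": "Empusa"}
print(missing_ranks(record), unexpected_higher_taxa(record))  # [] []
```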