Biodiversity Data Journal
Data Paper (Biosciences)
Corresponding author: Arianna Giannini (arianna.giannini@uniroma1.it)
Academic editor: Dimitris Poursanidis
Received: 03 Sep 2024 | Accepted: 09 Dec 2024 | Published: 28 Feb 2025
© 2025 Arianna Giannini, Massimo Appolloni, Luigi Romani, Marco Oliverio
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Giannini A, Appolloni M, Romani L, Oliverio M (2025) Mobilising marine biodiversity data: a new malacological dataset of Italian records (Mollusca). Biodiversity Data Journal 13: e136243. https://doi.org/10.3897/BDJ.13.e136243
The location and palaeoceanographic history of the Mediterranean Sea make it a biodiversity hotspot, prompting extensive studies in this region. However, despite the marine biodiversity of this area being apparently widely studied, a large amount of distributional data for Mediterranean taxa is still unpublished or scattered in various sources and formats, causing severe limitations to their potential reuse. This emerges as a particularly thorny issue for highly biodiverse and neglected taxa, such as invertebrates. The mobilisation of these frozen data through a process of standardisation and georeferencing could potentially support biodiversity research and conservation. The aim of this work is to provide a standardised pipeline to integrate these dispersed data, focusing on the Italian waters of the Mediterranean Sea and using molluscs as target taxa. Data were gathered from two main sources: published literature and Natural History Collections. The harmonisation process involved three key steps: 1) terminology and structure standardisation; 2) taxonomy updating and 3) georeferencing. Our efforts yielded over 44000 standardised records of mollusc species from Italian seawaters. These records encompassed primary biodiversity data from newly-digitised specimens owned by 11 different institutions and private collectors, as well as secondary biodiversity data extracted from 311 published studies.
This work is the first attempt to mobilise the available distributional information of Italian marine mollusc species from Natural History Collections and literature, converting the retrieved data into point-occurrence records through standard protocols, thus creating a FAIR (Findable, Accessible, Interoperable and Reusable) dataset collating these records from Italian marine sectors.
marine, mollusca, big data, biodiversity conservation, Natural History Collections, occurrences, Italy
Human impact on natural ecosystems is leading to several changes in global biodiversity structure and distribution. These changes include the loss of a portion of species large enough to suggest a sixth mass extinction, especially when coupled with the rate at which these extinctions are occurring (
The present work aims to collect the distributional data of marine mollusc species reported in Italy and to make them usable as point occurrences. Through harmonisation and georeferencing, it integrates both primary biodiversity data (i.e. newly-digitised specimens from public and private Natural History Collections) and secondary biodiversity data (i.e. non-databased spatial information on species reported in publicly accessible papers). The dataset concept can be visualised in Fig.
Dataset concept. The dataset's core information consists of species occurrences. Occurrences are extracted from two sources: literature and NHCs. From the original source, records are converted into Primary or Secondary Biodiversity Data (respectively, PBD and SBD) through a process of standardisation and, when necessary, georeferencing. The scientific names of species are aligned with the World Register of Marine Species (WoRMS) nomenclature so that they are comparable with other taxonomic database resources (e.g. Ocean Biodiversity Information System: www.obis.org, Catalogue of Life: www.catalogueoflife.org, Global Biodiversity Information Facility: www.gbif.org).
Data were gathered from two main sources: literature and Natural History Collections (NHCs). To collect literature data, a comprehensive search was performed on the public databases Scopus and Web of Science. In addition, we searched journals specialised in Mediterranean marine fauna, namely Iberus and all the volumes of both journals of the Italian Society of Malacology (Società Italiana di Malacologia, SIM): Bollettino Malacologico and Alleryana. Since until the publication of the Checklist of the Italian Fauna (
Literature data search constraints, dates and number of results obtained for each consulted source.
Source | Search constraints | Start date | N° of results |
---|---|---|---|
Scopus | TITLE-ABS-KEY (marine AND mollusca AND italy) | 16/05/2023 | 306 |
Web of Science | ((ALL=(marine)) AND ALL=(mollusca)) AND ALL=(italy) | 05/07/2023 | 442 |
Google Scholar | MARINE+MOLLUSCA+ITALY source:Iberus, from 1995 to 2023. | 20/10/2023 | 33 |
SIM | All volumes of Bollettino Malacologico and Alleryana from 1995 to 2023. | 24/11/2023 | 567 |
Total | | | 1348 |
To remove human-readable leading, trailing, double spaces and non-printable characters, the entire dataset was run through the Excel TRIM, CLEAN and SUBSTITUTE functions. Carriage returns were checked and removed using Notepad++ software (
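The cleaning described above was done with Excel functions and Notepad++. As a rough illustration only, the same kind of whitespace normalisation can be sketched in Python. The function name `clean_cell` and the sample string are hypothetical, and the behaviour approximates TRIM, CLEAN and SUBSTITUTE rather than reproducing them exactly (here line breaks become a space instead of being deleted).

```python
import re


def clean_cell(value: str) -> str:
    """Approximate Excel SUBSTITUTE + CLEAN + TRIM on one text cell."""
    # Treat non-breaking spaces as ordinary spaces (a typical SUBSTITUTE step).
    value = value.replace("\u00a0", " ")
    # Turn carriage returns, line feeds and tabs into a space so words do not fuse.
    value = re.sub(r"[\r\n\t]+", " ", value)
    # Drop the remaining non-printable control characters (ASCII 0-31 and 127).
    value = re.sub(r"[\x00-\x1f\x7f]", "", value)
    # Collapse runs of spaces and strip both ends, as TRIM does.
    return re.sub(r" {2,}", " ", value).strip()


# Hypothetical raw locality string with stray spaces and control characters.
print(clean_cell("  Capo\u00a0 Miseno \r\n (Napoli)\x07 "))
```

Applying one function to every text column keeps the cleaning reproducible, which is harder to guarantee with manual spreadsheet edits.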
1. Firstly, data were merged and formatted in a Darwin Core scheme (
2. With the same package, a first filter was applied to remove duplicates and records lacking essential information (i.e. identification or locality/coordinates). Then, data were manually filtered to remove records that were: out of scope (i.e. occurrences outside the Italian Marine Exclusive Economic Zone, fossils, non-marine species), too vague (i.e. broad locality, specimens with a higher level of identification than the genus) or dubious (dubious locality, ambiguous and/or unclear identification). All these cleaning steps have been consolidated into the "Invalid Records Filter" block in Fig.
Flow chart of the standardisation pipeline, with the number of input records and the number of discarded ones at each step. With the term "invalid", we identify all records out of scope or with problems (i.e. duplicates, records lacking essential information, fossils, non-marine species, specimens with a higher level of identification than the genus, ambiguous and/or unclear identification, records with dubious locality, occurrences outside study area or with broad locality).
3. Taxonomy was aligned to the one proposed by the World Register of Marine Species (
4. The remaining dubious taxonomy that was not automatically validated was checked manually and then submitted to experts, which resulted in the removal of other records with dubious identification.
5. Open Nomenclature (ON) qualifiers (
6. Subsequently, records were classified into seven groups based on the type of geographic information they contained, in order to georeference them by the most appropriate method. Georeferencing was performed following the point-radius method (
Description of the seven types of geographic information contained in the original records and georeferencing protocols and methods applied in each case. This information can be found in the dataset column dwc:georeferenceRemarks.
Type | Description | Georeference method and protocol | N° of records |
---|---|---|---|
Corrected | The original coordinates placed the raw record on land. | The record was moved to the position at sea nearest to the one defined by the original coordinates. A standard uncertainty of 100 m was assigned to the record. | 748 |
Depth driven | The original geographical information was provided as a textual locality and the depth at which the specimen was found. | The record was georeferenced following the | 21144 |
Distance driven | The original geographical information was provided as the distance from a locality. | The record was georeferenced following the | 262 |
Exact | The raw record already had exact coordinates provided in some format. | Where necessary, coordinates were converted to WGS84 decimal degrees. A standard uncertainty of 50 m was assigned to the record. | 9012 |
Locality approximation | The original geographical information was provided as a textual locality, without depth or distance from the coast. | The record was georeferenced following the | 9430 |
Map approximation | The raw record was geographically positioned through a visual representation (e.g. map, satellite imagery) in the original source. | The record has been mapped trying to recreate as closely as possible the position represented in the original source. A standard uncertainty of 500 m was assigned to the record. | 2894 |
Route | The raw record was geographically defined by the coordinates of the start and end of the route taken during the event in which the specimen was found. | The record was georeferenced following the | 606 |
7. During the georeferencing process, further records falling outside the study boundaries were identified and removed. We then excluded records with an uncertainty radius greater than 5000 m.
8. As raw temporal data from NHCs arrived in various formats, this information was handled with the R package lubridate (
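The source handled temporal data with the R package lubridate. Purely as an illustration of that step, the sketch below normalises a few date spellings to ISO 8601 using Python's standard library. The list of input formats and the function name `to_iso8601` are assumptions; the real collection labels may use other formats.

```python
from datetime import datetime
from typing import Optional

# Hypothetical (input format, output format) pairs, tried in order.
# Partial dates keep their original precision (year-month or year only).
FORMATS = [
    ("%d/%m/%Y", "%Y-%m-%d"),
    ("%d.%m.%Y", "%Y-%m-%d"),
    ("%Y-%m-%d", "%Y-%m-%d"),
    ("%m/%Y", "%Y-%m"),
    ("%Y", "%Y"),
]


def to_iso8601(raw: str) -> Optional[str]:
    """Return an ISO 8601 string for a raw date, or None if unparseable."""
    text = raw.strip()
    for fmt_in, fmt_out in FORMATS:
        try:
            return datetime.strptime(text, fmt_in).strftime(fmt_out)
        except ValueError:
            continue  # try the next candidate format
    return None


for raw in ["3/7/1921", "07/1921", " 1887 ", "s.d."]:
    iso = to_iso8601(raw)
    # dwc:year can be read from the first four characters when a date exists.
    print(repr(raw), "->", iso, "| year:", iso[:4] if iso else None)
```

Values that return `None` should be checked by hand, not guessed, so that dwc:eventDate never claims more precision than the original label.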
Collected data occurred within the Italian Exclusive Economic Zone (EEZ), which consists of a marine area of 538,216 km² (
35.06440614922465 and 45.80891370810167 Latitude; 5.889722222129848 and 18.99523827942329 Longitude.
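As a minimal sketch, the bounding box above and the 5000 m uncertainty threshold from the pipeline can be combined into a quick sanity check on downloaded records. The function name and the example coordinates are hypothetical. Falling inside the bounding box is necessary but not sufficient for a point to lie within the Italian EEZ, so this check is no substitute for a proper polygon test.

```python
# Bounding box reported for the dataset (decimal degrees, WGS84).
MIN_LAT, MAX_LAT = 35.06440614922465, 45.80891370810167
MIN_LON, MAX_LON = 5.889722222129848, 18.99523827942329

# Records with a larger uncertainty radius were excluded from the dataset.
MAX_UNCERTAINTY_M = 5000


def passes_spatial_checks(lat: float, lon: float, uncertainty_m: float) -> bool:
    """True if the point is inside the bounding box and precise enough."""
    in_box = MIN_LAT <= lat <= MAX_LAT and MIN_LON <= lon <= MAX_LON
    return in_box and uncertainty_m <= MAX_UNCERTAINTY_M


# Hypothetical records: (decimalLatitude, decimalLongitude, uncertainty in metres).
for record in [(40.8, 14.1, 500), (40.8, 14.1, 8000), (34.0, 14.1, 50)]:
    print(record, passes_spatial_checks(*record))
```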
The dataset includes 44096 occurrences of 1513 Italian marine mollusc species, covering 85% of the Italian malacofauna and six out of the eight classes reported in Italy (
Class | Genus | Species | Subspecies | Occurrences |
---|---|---|---|---|
Bivalvia | 207 | 326 | 1 | 9410 |
Cephalopoda | 28 | 35 | 0 | 682 |
Gastropoda | 457 | 1110 | 7 | 32832 |
Monoplacophora | 2 | 2 | 0 | 21 |
Polyplacophora | 12 | 28 | 0 | 820 |
Scaphopoda | 7 | 12 | 0 | 331 |
Total | 713 | 1513 | 8 | 44096 |
Rank | Scientific Name |
---|---|
kingdom | Animalia |
phylum | Mollusca |
class | Bivalvia |
class | Cephalopoda |
class | Gastropoda |
class | Monoplacophora |
class | Polyplacophora |
class | Scaphopoda |
family | Acanthochitonidae |
family | Acteonidae |
family | Addisoniidae |
family | Aegiridae |
family | Aeolidiidae |
family | Aglajidae |
family | Akeridae |
family | Alacuppidae |
family | Amathinidae |
family | Anabathridae |
family | Anatomidae |
family | Anomiidae |
family | Aplysiidae |
family | Aporrhaidae |
family | Architectonicidae |
family | Arcidae |
family | Argonautidae |
family | Arminidae |
family | Assimineidae |
family | Astartidae |
family | Atlantidae |
family | Barleeiidae |
family | Basterotiidae |
family | Bathysciadiidae |
family | Borsoniidae |
family | Brachioteuthidae |
family | Bullidae |
family | Bursidae |
family | Cadlinidae |
family | Caecidae |
family | Calliostomatidae |
family | Callistoplacidae |
family | Callochitonidae |
family | Calmidae |
family | Calycidorididae |
family | Calyptraeidae |
family | Cancellariidae |
family | Capulidae |
family | Cardiidae |
family | Carditidae |
family | Carinariidae |
family | Cassidae |
family | Cavoliniidae |
family | Cerithiidae |
family | Cerithiopsidae |
family | Chamidae |
family | Charoniidae |
family | Chauvetiidae |
family | Chilodontaidae |
family | Chitonidae |
family | Chromodorididae |
family | Cimidae |
family | Cingulopsidae |
family | Clathurellidae |
family | Clavagellidae |
family | Cliidae |
family | Cocculinidae |
family | Colloniidae |
family | Colpodaspididae |
family | Colubrariidae |
family | Columbellidae |
family | Conidae |
family | Corbulidae |
family | Cornirostridae |
family | Coryphellidae |
family | Costellariidae |
family | Cranchiidae |
family | Crassatellidae |
family | Creseidae |
family | Cuspidariidae |
family | Cylichnidae |
family | Cymatiidae |
family | Cymbuliidae |
family | Cypraeidae |
family | Cystiscidae |
family | Dendrodorididae |
family | Dentaliidae |
family | Diaphanidae |
family | Discodorididae |
family | Donacidae |
family | Dorididae |
family | Dotidae |
family | Dreissenidae |
family | Drilliidae |
family | Elachisinidae |
family | Eledonidae |
family | Ellobiidae |
family | Embletoniidae |
family | Entalinidae |
family | Epitoniidae |
family | Eratoidae |
family | Eubranchidae |
family | Eulimidae |
family | Facelinidae |
family | Fasciolariidae |
family | Fionidae |
family | Fissurellidae |
family | Flabellinidae |
family | Fusiturridae |
family | Fustiariidae |
family | Gadilidae |
family | Gadilinidae |
family | Galeommatidae |
family | Gastrochaenidae |
family | Glossidae |
family | Glycymerididae |
family | Goniodorididae |
family | Granulinidae |
family | Gryphaeidae |
family | Haliotidae |
family | Halonymphidae |
family | Haminoeidae |
family | Hancockiidae |
family | Hanleyidae |
family | Heliconoididae |
family | Hermaeidae |
family | Heroidae |
family | Hiatellidae |
family | Histioteuthidae |
family | Horaiclavidae |
family | Hyalocylidae |
family | Hyalogyrinidae |
family | Hydrobiidae |
family | Iravadiidae |
family | Ischnochitonidae |
family | Isognomonidae |
family | Janolidae |
family | Kelliellidae |
family | Laonidae |
family | Larocheidae |
family | Lasaeidae |
family | Lepetellidae |
family | Lepetidae |
family | Leptochitonidae |
family | Limacinidae |
family | Limapontiidae |
family | Limidae |
family | Limopsidae |
family | Littorinidae |
family | Loliginidae |
family | Lottiidae |
family | Lucinidae |
family | Lyonsiellidae |
family | Lyonsiidae |
family | Mactridae |
family | Malleidae |
family | Malletiidae |
family | Mangeliidae |
family | Margaritidae |
family | Marginellidae |
family | Mathildidae |
family | Mesodesmatidae |
family | Mitridae |
family | Mitromorphidae |
family | Murchisonellidae |
family | Muricidae |
family | Myidae |
family | Myrrhinidae |
family | Mytilidae |
family | Nassariidae |
family | Naticidae |
family | Neoleptonidae |
family | Neopilinidae |
family | Neritidae |
family | Newtoniellidae |
family | Noetiidae |
family | Notodiaphanidae |
family | Nuculanidae |
family | Nuculidae |
family | Octopodidae |
family | Octopoteuthidae |
family | Ocythoidae |
family | Omalogyridae |
family | Ommastrephidae |
family | Onchidiidae |
family | Onchidorididae |
family | Onychoteuthidae |
family | Orbitestellidae |
family | Ostreidae |
family | Otinidae |
family | Ovulidae |
family | Oxynoidae |
family | Pandoridae |
family | Parilimyidae |
family | Patellidae |
family | Pectinidae |
family | Pediculariidae |
family | Pendromidae |
family | Peraclidae |
family | Periplomatidae |
family | Pharidae |
family | Phasianellidae |
family | Philinidae |
family | Pholadidae |
family | Phyllidiidae |
family | Pinnidae |
family | Pisaniidae |
family | Piseinotecidae |
family | Plakobranchidae |
family | Planaxidae |
family | Platyhedylidae |
family | Pleurobranchaeidae |
family | Pleurobranchidae |
family | Polyceridae |
family | Poromyidae |
family | Potamididae |
family | Pristiglomidae |
family | Propeamussiidae |
family | Psammobiidae |
family | Pseudococculinidae |
family | Pteriidae |
family | Pterotracheidae |
family | Pulsellidae |
family | Pyramidellidae |
family | Ranellidae |
family | Raphitomidae |
family | Retusidae |
family | Rhizoridae |
family | Ringiculidae |
family | Rissoellidae |
family | Rissoidae |
family | Rissoinidae |
family | Runcinidae |
family | Samlidae |
family | Scaliolidae |
family | Scaphandridae |
family | Scissurellidae |
family | Scyllaeidae |
family | Semelidae |
family | Sepiidae |
family | Sepiolidae |
family | Siliquariidae |
family | Siphonariidae |
family | Skeneidae |
family | Skeneopsidae |
family | Solecurtidae |
family | Solemyidae |
family | Solenidae |
family | Spondylidae |
family | Tellinidae |
family | Teredinidae |
family | Tethydidae |
family | Thraciidae |
family | Thyasiridae |
family | Thysanoteuthidae |
family | Tjaernoeiidae |
family | Tonicellidae |
family | Tonnidae |
family | Tornidae |
family | Trapezidae |
family | Tremoctopodidae |
family | Trimusculidae |
family | Trinchesiidae |
family | Triphoridae |
family | Tritoniidae |
family | Triviidae |
family | Trochaclididae |
family | Trochidae |
family | Truncatellidae |
family | Tudiclidae |
family | Turbinidae |
family | Turritellidae |
family | Tylodinidae |
family | Umbraculidae |
family | Ungulinidae |
family | Vanikoridae |
family | Velutinidae |
family | Veneridae |
family | Vermetidae |
family | Verticordiidae |
family | Vitrinellidae |
family | Volvatellidae |
family | Xenophoridae |
family | Xylodisculidae |
family | Xylophagaidae |
family | Yoldiidae |
The dataset includes both recent and historical data. Regarding NHCs data, pre-1950 records come from the historical collections held in the Civic Museum of Zoology of Rome (i.e. Monterosato, Meli and Piersanti NHCs), while post-1950 records are from private collections. For literature data, on the other hand, pre-1950 records come mainly from published catalogues of historical collections, while more recent records come from faunistic and ecological studies. The list of all literature and NHCs data sources can be found respectively in Suppl. material
Creative Commons Attribution Non-Commercial (CC-BY-NC 4.0)
Column label | Column description |
---|---|
institutionID | An identifier for the institution having custody of the object(s) or information referred to in the record. For data collected from public institutions, the identifiers of the Global Registry of Scientific Collections (GRSciColl) were used. |
institutionCode | The name (or acronym) in use by the institution having custody of the object(s) or information referred to in the record. For data collected from public institutions, the identifiers of the Global Registry of Scientific Collections (GRSciColl) were used. For data collected from private collections, the following non-standard identifiers were used, marked with the prefix "PriColl_": Nofroni I. (PriColl_NI), Renda W. (PriColl_RW), Romani L. (PriColl_RL), Roncone F. (PriColl_RF), Russo P. (PriColl_RP), Tringali L. (PriColl_TL), Trono D. (PriColl_TD). |
collectionCode | The name, acronym, coden or initialism identifying the collection or dataset from which the record was derived. |
basisOfRecord | The specific nature of the data record. |
catalogNumber | An identifier for the record within the dataset or collection. |
recordedBy | A list (concatenated and separated) of names of people, groups or organisations responsible for recording the original dwc:Occurrence. |
individualCount | The number of individuals present at the time of the dwc:Occurrence. |
establishmentMeans | Statement about whether a dwc:Organism has been introduced to a given place and time through the direct or indirect activity of modern humans. |
associatedReferences | A list (concatenated and separated) of identifiers (publication, bibliographic reference, global unique identifier, URI) of literature associated with the dwc:Occurrence. This column was used to indicate the bibliographic source from which literature data was collected. |
eventDate | The date or interval during which a dwc:Event occurred. Dates are expressed following the ISO 8601 standard. |
year | The four-digit year in which the dwc:Event occurred, according to the Common Era Calendar. |
higherGeographyID | The Getty Thesaurus of Geographic Names persistent identifier for the geographic region within which the dcterms:Location occurred. |
higherGeography | The less specific geographic name of the information captured in the dwc:locality term. |
waterBody | The name of the water body in which the dcterms:Location occurs. Names of the Italian marine biogeographical areas were used (Bianchi et al. 2004). |
country | The name of the country in which the dcterms:Location occurs. |
stateProvince | The name of the next smaller administrative region than country (i.e. region) in which the dcterms:Location occurs. |
locality | The specific description of the place. |
minimumDepthInMetres | The lesser depth of a range of depth below the local surface, in metres. |
maximumDepthInMetres | The greater depth of a range of depth below the local surface, in metres. |
decimalLatitude | The geographic latitude (in decimal degrees, using the spatial reference system given in dwc:geodeticDatum) of the geographic centre of a dcterms:Location. |
decimalLongitude | The geographic longitude (in decimal degrees, using the spatial reference system given in dwc:geodeticDatum) of the geographic centre of a dcterms:Location. |
geodeticDatum | The ellipsoid, geodetic datum or spatial reference system (SRS), upon which the geographic coordinates given in dwc:decimalLatitude and dwc:decimalLongitude are based. |
coordinateUncertaintyInMetres | The horizontal distance (in metres) from the given dwc:decimalLatitude and dwc:decimalLongitude describing the smallest circle containing the whole of the dcterms:Location. |
georeferencedBy | A list (concatenated and separated) of names of people, groups or organisations who determined the georeference (spatial representation) for the dcterms:Location. |
georeferenceProtocol | A description or reference to the methods used to determine the spatial footprint, coordinates and uncertainties. |
georeferenceRemarks | Notes or comments about the spatial description determination, explaining assumptions made in addition or opposition to those formalised in the method referred to in dwc:georeferenceProtocol. |
identificationQualifier | A brief phrase or a standard term ("cf.", "aff.") to express the determiner's doubts about the dwc:Identification. The standard terminology proposed by Sigovini et al. (2016) was followed. |
identifiedBy | A list (concatenated and separated) of names of people, groups or organisations who assigned the dwc:Taxon to the subject. |
scientificNameID | An identifier for the nomenclatural (not taxonomic) details of a scientific name. WoRMS LSID persistent identifiers were used. |
scientificName | The full scientific name. |
class | The full scientific name of the class in which the dwc:Taxon is classified. |
order | The full scientific name of the order in which the dwc:Taxon is classified. |
family | The full scientific name of the family in which the dwc:Taxon is classified. |
genus | The full scientific name of the genus in which the dwc:Taxon is classified. |
specificEpithet | The name of the first or species epithet of the dwc:scientificName. |
infraspecificEpithet | The name of the lowest or terminal infraspecific epithet of the dwc:scientificName. |
taxonRank | The taxonomic rank of the most specific name in the dwc:scientificName. |
scientificNameAuthorship | The authorship information for the dwc:scientificName formatted according to the conventions of the applicable dwc:nomenclaturalCode. |
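When reusing the dataset, it can help to confirm that a downloaded file exposes the columns listed above before running any analysis. The sketch below is one possible check, not part of the original workflow. The helper name `missing_columns` is hypothetical, and the column names are copied from the table.

```python
# Column labels as listed in the data description table.
EXPECTED_COLUMNS = """
institutionID institutionCode collectionCode basisOfRecord catalogNumber
recordedBy individualCount establishmentMeans associatedReferences eventDate
year higherGeographyID higherGeography waterBody country stateProvince
locality minimumDepthInMetres maximumDepthInMetres decimalLatitude
decimalLongitude geodeticDatum coordinateUncertaintyInMetres georeferencedBy
georeferenceProtocol georeferenceRemarks identificationQualifier identifiedBy
scientificNameID scientificName class order family genus specificEpithet
infraspecificEpithet taxonRank scientificNameAuthorship
""".split()


def missing_columns(header):
    """Return the expected column labels that are absent from a file header."""
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Hypothetical header in which the first column was lost during export.
print(missing_columns(EXPECTED_COLUMNS[1:]))
```

An empty list means every documented column is present. The check says nothing about whether the values themselves are valid.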
References of the 311 publications from which data were extracted and number of records obtained from each source.
List of private collectors and institutions that own the 11 NHCs from which the data were collected, with initial number of raw records and final number of standardised records.
This list contains annotations on some ambiguous taxa (i.e. taxonomic entities that are difficult to interpret/not fully resolved, possible fossil taxa and occasional alien species for the Italian fauna) that have nevertheless been included in the dataset.