New occurrence records on the rodent species inhabiting Vietnam, based on Joint Russian-Vietnamese Tropical Research and Test Center genetic samples collection

Abstract

Background
Open access to occurrence records in a standardised format has strong potential applications for many kinds of ecological research and bioresource management, including the assessment of invasion risks and the formulation of nature protection, biomedical and management plans in the context of global climate and land-use changes, in both the short and the long term. The accumulation and aggregation of occurrence records of small mammals are relevant for the study of biogeography and for ecological surveys, including spatial distribution and ecological niche modelling of species' distributions in the context of global climate change. The author has created a dataset of 2408 rodent and tree shrew occurrence records from Vietnam, collected from November 2007 to May 2022. A number of zoologist colleagues also provided genetic samples. A considerable part of these data has been published previously in a number of papers; however, most of these data have yet to be presented. These records cover a significant part of the range of many rodent species in Southeast Asia and provide new data on their distribution. The data were obtained during a number of field expeditions; some animals were caught by the author and some were provided by other researchers. As a result, accuracy differs between records: from 10 to 1000 metres for geographic coordinates and from 1 to 100 metres for elevation. A number of samples were genetically examined to avoid inconsistencies in taxonomic identification. With the help of colleagues, the author created a set of georeferenced occurrence records, adapted it to the controlled vocabulary of Darwin Core, removed duplicates and standardised the format of the records using a commonly-used unified data structure.
This paper presents the resulting dataset of occurrence records of rodents (mostly Muridae and Sciuridae), along with other small terrestrial species (Scandentia: Tupaiidae), in the territory of Vietnam and Laos.

New information
Much of the distribution data is currently available through the open-access GBIF databases and could potentially be combined into a unified framework for better data resolution. The dataset presented here combines occurrence records of many species over a significant part of their recent natural range in Vietnam and Laos. The author presents a validated and comprehensive dataset of rodent occurrence records, based on a collection of genetic samples compiled during 15 years of work in Vietnam (from 2007 to date). Prior to this project, a considerable part of the information about Vietnamese rodents was not available to the wider research community for spatial analyses by modern methods, for example, analyses based on geographic information systems (GIS). The dataset is now available to any researcher in a format prepared in accordance with Darwin Core standards.
For different countries of Southeast Asia and beyond, many additional occurrence records exist for a number of the species listed here and could be combined with this dataset, but a considerable part of them is still scattered over separate literature sources, while others exist only as maps, field notes and a huge number of museum zoological collection records. The final set was created by combining species occurrence records in a uniform data structure and verifying the samples' geographic coordinates. Most samples were genetically and/or morphologically verified for correct taxonomic identification, because most of the samples presented were examined by the author himself, both morphologically and genetically.
Therefore, the dataset expands the available information on the spatial and temporal distribution of a number of small mammal species in Southeast Asia. All original notes and geographical localities were carefully checked and any duplicate or erroneous records were removed from the final dataset.
As of the date of publication of these data, the GBIF database (https://www.gbif.org) contained 1408 rodent occurrence records from Vietnam (Fig. 1) along with 240 Scandentia records (Fig. 2). These are primarily data on museum materials, including four large collections: the Field Museum of Natural History (Zoology) Mammal Collection (646 samples), the Australian National Wildlife Collection provider for OZCAM (537), the MVZ Mammal Collection (Arctos) (109) and the Museum of Comparative Zoology, Harvard University (69), together with six other minor collections comprising single specimens. With regard to small terrestrial mammals, Vietnam therefore remains one of the least represented regions in Southeast Asia. Here, we present new data containing 2408 occurrence records, including 2237 rodent records and 171 Scandentia records (Fig. 3). Thus, the data significantly expand our knowledge of the actual ranges of a number of species, including rare and endangered ones.

Figure 2. The Scandentia records registered for Vietnam in the GBIF database up to the date of this publication.

Introduction
Despite the long history of investigations, the faunal composition and the limits of a number of species and morphs of rodents, which make up the bulk of the small terrestrial mammal fauna of Southeast Asia, remain incompletely understood. By the end of the 20th century, based on the investigation of museum collections and on combining the distribution information accumulated over the previous period, the first reports on the fauna of the region were compiled and systematised, along with records for individual countries and territories (Medway 1965, Medway 1969, Van Peenen et al. 1969, Harrison 1974  1976, Marshall 1977b, Marshall 1977a, Marshall 1988, Musser 1981, Taylor et al. 1982, Taylor et al. 1983, Sokolov 1982, Sokolov 1986, Sokolov 1992, Musser and Newcomb 1983, Heaney 1986, Corbet and Hill 1991, Corbet and Hill 1992, Musser and Carleton 1993, Pavlinov et al. 1995). These works made it possible to establish the taxonomy in general terms (at least to the level of genera), to assess the species composition and to formulate initial ideas about the natural ranges of rodents inhabiting Southeast Asia. Based on these papers, several field guides and popular and reference works were compiled shortly afterwards (Medway 1977, Medway 1978, Medway 1983, Payne et al. 1985, Lekagul and McNeely 1977, Lekagul and McNeely 1988, Nowak 1991, Nowak 1999, Zhang et al. 1997, Wilson and Reeder 1993, Kuznetsov 2006, Pavlinov 2006, Smith and Xie 2008, Francis 2008); they have been used until recently for taxonomy and for practical issues in nature conservation and biodiversity research.
Despite these advances, the fragmentary geography of samples (incomplete coverage), the wide polymorphism and morphological similarity of many species of Muridae, primarily amongst the largest genera, such as Rattus, Niviventer, Chiromyscus, Leopoldamys, Mus and others, as well as the significant morphological variability of a number of groups of Sciuridae, make it difficult to form a correct view of the actual richness of the small mammal fauna in the region and of the natural ranges of the species. However, the main reason for the scarcity of knowledge in this field by the end of the 20th century was the lack of researchers ready to deal carefully with this very complex and interesting group in a region remote from the centres of western academic science. Despite the advances in macrosystematics, the species composition, species limits and ranges within most genera remained largely unclear for a long time.
By the beginning of the 21st century, genetic methods of analysis made it possible to clarify many issues about the correspondence of species and morphs previously described under various names, as well as to begin searching for and studying so-called cryptic species. Genetic techniques, along with classic morphological approaches, made accurate species diagnostics possible. Almost immediately, it became obvious that the species richness and faunal composition within the main genera and groups of Muridae in Southeast Asia, such as Rattus, Niviventer, Leopoldamys, Maxomys, Bandicota and Mus, were considerably underestimated. The same is true of most of the smaller and more exotic groups of mice and rats, such as Typhlomys, Chiropodomys, Chiromyscus, Dacnomys, Saxatilomys and Tonkinomys, and of most of the Sciuridae genera, including the largest squirrel genera such as Callosciurus, Tamiops and Dremomys.
The author's surveys in Vietnam between 2007 and 2022 were mainly devoted to clarifying the taxonomy and systematics of the main groups of small terrestrial rodents and resulted in a number of generic revisions and descriptions of new taxa (Balakirev and Rozhnov 2010, Balakirev and Rozhnov 2012, Balakirev and Rozhnov 2019, Balakirev et al. 2011a, Balakirev et al. 2011b, Balakirev et al. 2012, Balakirev et al. 2013, Balakirev et al. 2013b, Balakirev et al. 2021, Balakirev et al. 2022, Balakirev et al. 2017, Abramov et al. 2017a, Abramov et al. 2017b). Simultaneously, the author assembled the collection of genetic (and corresponding morphological) samples held at the A.N. Severtsov Institute of Ecology and Evolution of the Russian Academy of Sciences (Moscow, Russia) and its international department, the Joint Russian-Vietnamese Tropical Research and Test Center (Hanoi, Vietnam). On this basis, it was possible to complete the occurrence database presented in this paper.

General description
Purpose: The presented data form an essential basis for the study of natural biodiversity, including its dynamics, and for investigating how the mammalian fauna formed on both evolutionary and historical time-scales. These records are also important for developing ecological niche models that relate changes in climate and land-use parameters to the participation of recent species in faunal complexes through space and time. The publication of occurrence records, based on aggregated data in a standardised format, may also provide valid information for research on biological invasion processes.

Project description
Title: Small terrestrial mammals of eastern Indochina (Vietnam and neighbouring countries).

Study area description:
The samples lie within the geographical area bounded by the coordinates 8.6877-23.20476°N and 102.2375-109.29083°E. The area contains 164 individual geographical localities, usually determined with an accuracy of 10 to 1000 m (up to 5000 m for a few samples). From 1 to 12 species of small mammals were recorded at each locality. The study area covers the whole of Vietnam and, currently, the dataset also includes a few samples from central Laos (Fig. 4).
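The bounding box and accuracy limits above lend themselves to a simple plausibility check on incoming records. The following is a minimal sketch, not the author's procedure: the field names follow Darwin Core terms, while the `Occurrence` class, the `plausible` helper and the three example records are hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass

# Bounding box quoted in the study area description (decimal degrees).
LAT_MIN, LAT_MAX = 8.6877, 23.20476
LON_MIN, LON_MAX = 102.2375, 109.29083
# Largest coordinate uncertainty mentioned in the text, in metres.
MAX_UNCERTAINTY_M = 5000.0


@dataclass(frozen=True)
class Occurrence:
    """One record; attribute names follow Darwin Core terms."""
    scientificName: str
    decimalLatitude: float
    decimalLongitude: float
    coordinateUncertaintyInMeters: float


def in_study_area(rec: Occurrence) -> bool:
    """True if the record lies inside the stated bounding box."""
    return (LAT_MIN <= rec.decimalLatitude <= LAT_MAX
            and LON_MIN <= rec.decimalLongitude <= LON_MAX)


def plausible(rec: Occurrence) -> bool:
    """Assumed rule: inside the box and uncertainty within the stated limit."""
    return (in_study_area(rec)
            and 0 < rec.coordinateUncertaintyInMeters <= MAX_UNCERTAINTY_M)


# Hypothetical records, not taken from the dataset.
records = [
    Occurrence("Rattus sp.", 21.03, 105.85, 100.0),   # inside the box
    Occurrence("Rattus sp.", 35.00, 105.85, 100.0),   # latitude outside the box
    Occurrence("Mus sp.", 10.80, 106.65, 20000.0),    # uncertainty too large
]
flags = [plausible(r) for r in records]
print(flags)
```

A rectangular box is only a coarse filter: it also contains sea and parts of neighbouring countries, so records that pass it would still need checking against locality descriptions.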

Sampling methods
Description: Occurrence records of small terrestrial mammals were collected from various sources: field data gathered by the author over 15 years, including approximately 1500 capture records and approximately 900 other samples (records) presented to the author by colleagues or obtained during a number of expeditions. The dataset also includes some records based on samples from the collections of the Department of Theriology of the Zoological Museum of Moscow State University (ZMMU, Moscow). In total, samples were collected from 164 georeferenced sites in 35 provinces and territories of Vietnam and in Khammouane Province of Laos.
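Combining records from several sources usually requires a rule for recognising the same record in two lists. The sketch below shows one possible way to do this in plain Python. The key fields are Darwin Core terms, but the matching rule itself, the function names and the sample rows are assumptions made for illustration and are not described in the paper.

```python
def record_key(rec):
    """Key under which two rows count as the same occurrence (assumed rule)."""
    return (
        rec["catalogNumber"],
        rec["scientificName"].strip().lower(),
        round(float(rec["decimalLatitude"]), 5),
        round(float(rec["decimalLongitude"]), 5),
        rec["eventDate"],
    )


def merge_sources(*sources):
    """Concatenate sources in order, keeping the first copy of each record."""
    seen = set()
    merged = []
    for source in sources:
        for rec in source:
            key = record_key(rec)
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged


# Hypothetical rows, not taken from the dataset.
field_captures = [
    {"catalogNumber": "A-001", "scientificName": "Rattus sp.",
     "decimalLatitude": "21.03000", "decimalLongitude": "105.85000",
     "eventDate": "2010-05-01"},
    {"catalogNumber": "A-002", "scientificName": "Mus sp.",
     "decimalLatitude": "21.03000", "decimalLongitude": "105.85000",
     "eventDate": "2010-05-02"},
]
colleague_samples = [
    # Same specimen as A-001, with different letter case and number format.
    {"catalogNumber": "A-001", "scientificName": "rattus sp. ",
     "decimalLatitude": 21.03, "decimalLongitude": 105.85,
     "eventDate": "2010-05-01"},
    {"catalogNumber": "B-001", "scientificName": "Tupaia sp.",
     "decimalLatitude": 11.42, "decimalLongitude": 107.43,
     "eventDate": "2012-11-15"},
]

merged = merge_sources(field_captures, colleague_samples)
print([rec["catalogNumber"] for rec in merged])
```

Normalising names and rounding coordinates before comparison means that purely cosmetic differences between sources do not produce spurious extra records.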

Sampling description:
The presented materials are based on genetic collections (total DNA samples) under the author's supervision. The data combine 2408 records, including 1688 of Muridae, 174 of Sciuridae, 10 of Platacanthomyidae, 252 of Spalacidae and 171 of Tupaiidae, along with 113 sample records identified to genus only (Table 1). The collection also includes a number of insectivore and small carnivoran samples, which are currently not included in the dataset presented here. Although the records correspond to and partially overlap with the ZMMU museum samples (skulls, skins, ethanol-preserved bodies), as well as with others stored at the Zoological Institute of the Russian Academy of Sciences (ZIN RAS, Saint Petersburg), they do not completely cover the collections of these museums, but contain data only on the samples (treated here as records) deposited by the author or with his participation.
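The record counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 113 genus-level records are counted in addition to the family totals and that none of them are tree shrews; the variable names are illustrative only.

```python
# Record counts per family, copied from the sampling description.
family_counts = {
    "Muridae": 1688,
    "Sciuridae": 174,
    "Platacanthomyidae": 10,
    "Spalacidae": 252,
    "Tupaiidae": 171,
}
genus_only = 113      # records identified to genus only
stated_total = 2408   # total stated in the text

# Assumption: genus-level records are additional to the family totals.
species_level = sum(family_counts.values())
total = species_level + genus_only

# Tupaiidae (Scandentia) are the only non-rodent family in the dataset.
rodent_total = total - family_counts["Tupaiidae"]

print(species_level, total, rodent_total)
```

Under these assumptions the parts add up to the stated 2408 records, and the rodent subtotal agrees with the 2237 rodent records quoted in the abstract.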

Figure: Geographical distribution of occurrence records.

Table 1. Database content by species.
Columns: Species; Number of records.

Taxonomic coverage
Description: The presented data cover almost completely the fauna of Muridae, Platacanthomyidae and Spalacidae of the region, as well as the largest genera of Sciuridae, and include distribution ranges of 58 species of rodents (41 of Muridae, 15 of Sciuridae, three of Spalacidae and one of Platacanthomyidae) and two species of Scandentia (Table 1).

taxonRank: The taxonomic rank of the most specific name in the scientificName.
kingdom: The full scientific name of the kingdom in which the taxon is classified.
phylum: The full scientific name of the phylum in which the taxon is classified.
class: The full scientific name of the class in which the taxon is classified.
order: The full scientific name of the order in which the taxon is classified.
family: The full scientific name of the family in which the taxon is classified.
genus: The full scientific name of the genus in which the taxon is classified.