Distribution of alpine endemic plants of northern Asia: a dataset

Abstract

Background
We describe a dataset providing information on the geographic distribution of northern Asian endemic alpine plants. It was obtained by digitising maps from the atlas “Endemic alpine plants of Northern Asia”. Northern Asia includes numerous mountain ranges which may have served as refugia during the Pleistocene ice ages, but there have been no studies that analysed this question. We suggest that this dataset can be applied for better understanding of the alpine endemism in northern Asia.

New information
The dataset includes 13709 species distribution records, representing 211 species from 31 families and 106 genera. Each record provides data regarding the distribution of an individual species. These data provide a foundation for studying northern Asia's endemic alpine species and conducting research on the factors concerning their distribution.


Introduction
Being climatically and topographically heterogeneous, mountain ecosystems are characterised by a high degree of plant species diversity (López-Pujol et al. 2011, Hassan et al. 2005). They are often considered to have been potential refugia or buffering zones that either prevented extinction or promoted speciation during the Quaternary glacial-interglacial shifts because of their high spatiotemporal climatic stability (Sandanov et al. 2020, Harrison and Noss 2017, Feng et al. 2016, Sandel et al. 2011). During the Pleistocene glacial periods, ice sheets expanded greatly throughout northern Asia, and mountainous regions contributed to the preservation of a number of alpine species (Harrison and Noss 2017, Volkova and Baranova 1980). Malyshev identified nine mountain areas in northern Asia that served as refugia during the Pleistocene ice ages for at least 231 alpine endemic species (Malyshev 1979, Vodopyanova et al. 1974). Alpine endemism has been studied for the mountain ranges of Siberia, the Far East and the northern part of Asia (Malyshev 1972, Krasnoborov 1974, Yurtzev 1981, Schlothauer 1990). More recent studies revealed that the Far East has seven centres of endemism (Kozhevnikov 2007). However, despite numerous studies on the endemism of northern Asia's alpine plants, the region is not considered an endemism hot spot on a global scale (Harrison and Noss 2017, Hobohm et al. 2019). Moreover, to date, no studies have quantitatively assessed the correlation amongst climate, topography and alpine endemism in northern Asia. We consider the lack of baseline species distribution data to be the main reason for this gap. We have developed and are sharing this dataset to address this need and to encourage the quantitative analyses required for developing a better understanding of alpine endemism in northern Asia (Brianskaia et al. 2021).
checked (Vodopyanova et al. 1974). The editors of the atlas were Vodopyanova N.S., Malyshev L.I., Siplivinskiy V.N., Tolmachev A.I. and Yurtsev B.A. Many different cartographers were involved in preparing the published maps (Table 1). Taxonomy of species in the GBIF dataset is given both as published in the atlas (Vodopyanova et al. 1974), in the scientificName column, and as verified according to the Catalogue of Life (Roskov et al. 2019) and the Checklist of Asian Russia Flora (Baikov 2012).

Sampling methods
Study extent: Northern Asia is an extensive area, stretching from the Ural Mountains in the west to the Pacific Ocean in the east and from the Arctic Ocean in the north to Central and East Asia in the south. According to Malyshev (1979), there are nine areas with alpine flora, which include the mountains of the Russian Far East, south-eastern Siberia, the Urals and Putorana (Fig. 3).
[Figure: Example of a distribution map scan, Bergenia crassifolia.]
Sampling description: In total, 231 maps were scanned from the atlas Endemic Alpine Plants of Northern Asia (Vodopyanova et al. 1974). All maps were adjusted to the same size and horizontal position in order to obtain standardised images. Digitisation was performed in QGIS 3.10 using the georeferencing tool. The source raster distribution maps were georeferenced by snapping control points to the destination vector shapefile, which, in our case, was the border of Russia. This transformed all the maps to the WGS84 coordinate reference system. Subsequently, species distribution locations were digitised from each map. The coordinates of each location were calculated by QGIS and displayed in the attribute table.
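The georeferencing step can be illustrated with a minimal sketch of the underlying idea: control points link pixel positions on a scanned map to known geographic coordinates, and a transformation fitted to them converts any digitised point to longitude and latitude. The sketch assumes a simple affine transformation defined by exactly three control points; the source does not state which transformation type was chosen in QGIS, and the function names and control-point values below are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for k in range(3):
        mk = [list(row) for row in m]
        for r in range(3):
            mk[r][k] = rhs[r]  # replace column k with the right-hand side
        solution.append(det3(mk) / d)
    return solution


def fit_affine(pixels, lonlats):
    """Fit lon = a*col + b*row + c (and likewise lat) from three control points."""
    m = [[col, row, 1.0] for col, row in pixels]
    lon_coef = solve3(m, [p[0] for p in lonlats])
    lat_coef = solve3(m, [p[1] for p in lonlats])
    return lon_coef, lat_coef


def pixel_to_lonlat(coef, col, row):
    """Apply the fitted affine transform to one pixel position."""
    lon_coef, lat_coef = coef
    lon = lon_coef[0] * col + lon_coef[1] * row + lon_coef[2]
    lat = lat_coef[0] * col + lat_coef[1] * row + lat_coef[2]
    return lon, lat


# Hypothetical control points: (col, row) on the scan -> (lon, lat) in WGS84.
control_pixels = [(0, 0), (1000, 0), (0, 500)]
control_lonlat = [(60.0, 80.0), (160.0, 80.0), (60.0, 40.0)]
coef = fit_affine(control_pixels, control_lonlat)
print(pixel_to_lonlat(coef, 500, 250))
```

With these illustrative control points, the pixel at the centre of the scan maps to approximately 110° E, 60° N. A real workflow would normally use more than three control points and a least-squares fit, which is what a georeferencing tool does internally.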
Quality control: Final examination of the digitised species distribution maps was performed in QGIS 3.10. This step took most of the time and effort in the entire digitising process. Each digitised map was compared to the original printed map, and the habitat of each digitised record was compared with the habitat characteristics and geographic range of the species concerned, as reported in the literature. Major sources for this part of the review were the Flora of Siberia (Krasnoborov et al. 1997, Lomonosova et al. 1992, Malyshev et al. 1990, Peshkova et al. 1994, Peshkova et al. 1990, Pimenov et al. 1996, Polozhiy et al. 1994, Polozhiy et al. 1996, Timokhina et al. 1993, Vlasova et al. 1987, Vydrina et al. 1998, Doronkin et al. 1997, Kashina et al. 1988) and Vascular Plants of Soviet Far East (Kharkevich 1985, Kharkevich 1987, Kharkevich 1988, Kharkevich 1989, Kharkevich 1991, Kharkevich 1992, Kharkevich 1995, Kharkevich 1996). Almost all (97%) of the digitised maps were consistent with the printed maps. The maps that were not consistent included records near the ocean and in the Far East. These records were manually adjusted to match the printed maps. For example, such records were adjusted for Betula middendorffii, whose distribution runs along the Sea of Okhotsk (Figs 5, 6, 7, 8). Coordinate uncertainty in metres was calculated, based on three types of uncertainty (Chapman and Wieczorek 2020). The first type is the coordinate uncertainty of the species occurrence derived from the herbarium locality description. As mentioned earlier, the maps in the atlas were drawn on the basis of herbarium specimens. In order to test this type of coordinate uncertainty, the occurrence dataset from the Moscow University Herbarium (MW) was used as the reference (Seregin 2021). A total of 1500 random occurrences from the Asian part of Russia were taken from the MW herbarium and analysed. Overall, the coordinate uncertainty of the analysed occurrences ranged from 0.1 to 60 km.
The data were divided into three random groups of 500 occurrences each. The mean coordinate uncertainty for the groups was 3.86, 5.66 and 2.96 km, respectively. Thus, the mean value across the three groups was close to 4 km. Based on this result, we adopted approximately 5 km as the coordinate uncertainty.
[Figure: Example of Betula middendorffii distribution records being digitised outside the shapefile in QGIS 3.10.]
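The arithmetic behind the first uncertainty component can be checked with a short sketch. It averages the three group means reported above and shows one way to split a sample into random, equal-sized groups; the function name and the seed are hypothetical, and the underlying MW occurrence values are not reproduced here.

```python
import random
from statistics import mean


def group_means(values, n_groups=3, seed=0):
    """Shuffle the values and return the mean of each equal-sized group."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // n_groups
    return [mean(shuffled[i * size:(i + 1) * size]) for i in range(n_groups)]


# Group means reported in the text, in kilometres.
reported_group_means_km = [3.86, 5.66, 2.96]
overall_mean_km = mean(reported_group_means_km)
print(round(overall_mean_km, 2))
```

The mean of the three reported values is about 4.16 km, consistent with the statement that it is close to 4 km and with the choice of approximately 5 km as a conservative rounded value.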
The second type is the coordinate uncertainty of the drawn maps. The atlas includes four types of maps: a) for the entire northern Asia; b) for northern Asia from 120° to 170° E; c) for south Siberia from 75° to 120° E; d) for the Far East, including the Kamchatka Peninsula, Sakhalin Island and the Kuril Islands. The coordinate uncertainty of the distribution records varies between map types because of their different scales. The coordinate uncertainty of the drawn maps was calculated by measuring the distance between species distribution records and the closest river drainage in QGIS 3.10. River drainage was crucial in Soviet botanical mapping, as it was used as a reliable reference feature for locating species occurrences. To calculate the average coordinate uncertainty, the distances were summed and divided by the number of measurements. The resulting uncertainties are: a) 30 km for distribution records on the maps of the entire northern Asia; b) 25 km for the northern Asia maps from 120° to 170° E; c) 20 km for the south Siberia maps from 75° to 120° E; d) 15 km for the Far East maps, including the Kamchatka Peninsula, Sakhalin Island and the Kuril Islands.
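The averaging described above can be sketched as follows. This is a simplified illustration, not the QGIS procedure used for the dataset: it assumes the river network is represented by a list of vertices, measures the great-circle distance from each record to the nearest vertex (rather than to the nearest point on a river segment) and then takes the mean. All names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_nearest_distance_km(records, river_vertices):
    """Mean distance from each record to its nearest river vertex."""
    nearest = [
        min(haversine_km(lon, lat, rlon, rlat) for rlon, rlat in river_vertices)
        for lon, lat in records
    ]
    # Sum the distances and divide by the number of measurements.
    return sum(nearest) / len(nearest)


# Hypothetical records and river vertices as (lon, lat) pairs.
records = [(100.0, 50.0), (100.0, 51.0)]
river_vertices = [(100.0, 50.0), (105.0, 50.0)]
print(mean_nearest_distance_km(records, river_vertices))
```

Applying such a calculation separately to the records of each map type would give one average per type, which is how the four values listed above are organised.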
The third type is the coordinate uncertainty of the map digitisation in QGIS 3.10. To test it, three experts independently digitised maps of each of the four types on their own computers. The resulting coordinate uncertainty was less than 5 km for all three experts and all four map types.
The final coordinate uncertainty was calculated by summing all three above-mentioned uncertainties for each of the four types of maps.
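As a worked illustration of this summation, the sketch below adds the three components for each map type and converts the result to metres. It rests on two assumptions that are not stated as final values in the text: the herbarium component is taken as the adopted 5 km, and the digitisation component is taken at its upper bound of 5 km. The dictionary keys and function name are hypothetical labels mirroring map types a) to d).

```python
# Component uncertainties in kilometres.
HERBARIUM_KM = 5      # adopted value for locality descriptions
DIGITISATION_KM = 5   # assumption: upper bound of "less than 5 km"

# Drawn-map uncertainty per map type, as reported in the text.
MAP_KM = {
    "a": 30,  # entire northern Asia
    "b": 25,  # northern Asia, 120 to 170 E
    "c": 20,  # south Siberia, 75 to 120 E
    "d": 15,  # Far East incl. Kamchatka, Sakhalin, Kuril Islands
}


def total_uncertainty_m(map_type):
    """Sum the three components for one map type and return metres."""
    total_km = HERBARIUM_KM + MAP_KM[map_type] + DIGITISATION_KM
    return total_km * 1000


for map_type in sorted(MAP_KM):
    print(map_type, total_uncertainty_m(map_type))
```

Under these assumptions the totals are 40000, 35000, 30000 and 25000 m for map types a) to d); the values published with the dataset should be treated as authoritative if they differ.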

Taxonomic coverage
Description: In total, the dataset includes 231 species with 13709 distribution records from 31 families and 106 genera. The top 10 families hold 64% (8783 records) of the total number of endemic alpine species distribution records (Table 2). Additionally, a number of species distribution records were compiled for each species (