Taxonomic diversity and abundance of enchytraeids (Annelida, Clitellata, Enchytraeida) in the Northern Palaearctic. 1. Asian part

Abstract

Background
Enchytraeids, or potworms, are tiny oligochaetes that are distributed worldwide in many terrestrial, freshwater and marine ecosystems. Despite their key role in the functioning of ecosystems, the diversity and abundance of Enchytraeidae are rarely studied because species identification is laborious. The present study addresses this gap and sheds some light on the distribution and abundance of enchytraeids in the Northern Palaearctic. The dataset provided here constitutes the most recent comprehensive field sampling of enchytraeid assemblages across the Asiatic part of the Northern Palaearctic and comprises an original set of soil samples systematically collected throughout the region from 2019 to 2022.

New information
The dataset includes occurrences from 131 georeferenced sites, encompassing 39 species and 7,074 records. It is the first dataset to provide species-specific information about the distribution and abundance of terrestrial enchytraeids across an extensive geographic area covering the Asian sector of the Northern Palaearctic. The compiled dataset is key to exploring and understanding local and regional enchytraeid diversity. It may also serve as a valuable resource for monitoring and conserving soil biodiversity as a whole.



Introduction
Enchytraeids, also known as potworms, are tiny, yet ecologically impactful components of the biota living in soils and in freshwater and marine sediments worldwide (Erséus et al. 2010, Rota and de Jong 2015). Despite their small size, they play a vital role in terrestrial ecosystems, especially where earthworms are scarce or absent, by regulating key processes such as nutrient cycling and the maintenance of soil structure (Potapov et al. 2022). However, because taxonomic identification is highly laborious and involves in vivo morphological evaluation, and because experienced staff are lacking worldwide, there is a dramatic shortage of studies devoted to understanding their temporal and spatial distribution at the species level (Römbke et al. 2017). This is especially true for the eastern part of the Northern Palaearctic, which remains one of the least studied areas in terms of enchytraeids (Nurminen 1982, Rota and de Jong 2015). Nevertheless, the situation is changing and there is a growing interest in the ecology and taxonomy of Enchytraeidae in this part of the world (Rota et al. 2018, John et al. 2019, Degtyarev et al. 2020).

The objective of this data paper is to address this knowledge gap. To achieve this, we conducted a field survey of enchytraeid fauna and populations across various biomes within the Northern Palaearctic between 2019 and 2022. Because of the large geographical extent and the significant amount of material that still requires identification, we have chosen to split the resulting dataset into two main parts: Asian and European. The dataset dedicated to the European part will be submitted to the same journal in the near future (Saifutdinov et al., in prep.).

General description
Purpose: The purpose of this data paper is to document the distribution and abundance of enchytraeids in the Northern Palaearctic Region, particularly in its Asiatic part.

Project description
Study area description: The area under investigation is the Asian part of the Northern Palaearctic, encompassing a diverse range of biome types, from Siberian tundra in the far north to temperate forests and deserts in the south (Binney et al. 2017). The research area is bounded by the Ural Mountains in the west and by Uzbekistan and Mongolia in the south. The territory of China was excluded for organisational reasons. In total, we examined 131 sites located within various biomes as classified by the WWF (Olson et al. 2001), including: (1) tundra, (2) boreal forests, (3) temperate coniferous forests, (4) temperate broadleaf and mixed forests, (5) temperate grasslands, savannahs and shrublands, (6) flooded grasslands and savannahs and (7) deserts and xeric shrublands. The number of sites sampled differed between biomes because of logistical constraints and varying extraction capacity. Comprehensive information about each site is given in Table 1.

Sampling description:
The material for the dataset was collected between 2019 and 2022.
We selected sampling sites in areas that were not heavily disturbed by human activity. In arid regions, we chose the most humid (but not flooded) spots. The sampling protocol was developed in compliance with widely recognised methods (Ghilarov 1975, Coleman et al. 2004). At each site, we collected a random selection of 1 to 7 soil monoliths. Detailed information on the number of soil monoliths collected from each site can be found in the "samplingEffort" column within the GBIF dataset (Degtyarev et al. 2023). These soil monoliths were taken using a 5-cm-diameter steel corer down to a depth of 10 cm. After collection, the soil was carefully placed into plastic bags and transported to the laboratory at the A. N. Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, Moscow. Subsequently, the soil samples were stored in a refrigerator at +4°C until extraction.
Enchytraeids were extracted from the soil using the wet funnel technique as described by Didden et al. (1995). It is commonly known as Graefe's method and is a somewhat simplified modification of O'Connor's method (O'Connor 1962). Graefe's modification omits artificial heating of the surface of the soil sample. Otherwise, it does not differ significantly in efficiency from the O'Connor method (Kobetičová and Schlaghamerský 2003). We placed a sieve in each funnel and a soil monolith in each sieve. Tap water was then poured into the funnel until the soil monolith was completely covered. A test tube was attached to each funnel and placed in a container with room-temperature water. This precaution was intended to prevent overheating of the extracted enchytraeids in the event of random and sudden temperature fluctuations in the extraction room. Extraction was carried out for 16 to 24 hours, after which the tubes were detached from the funnels and their contents were poured into Petri dishes.

Quality control:
The samples were collected by a number of soil zoologists and ecologists from the A. N. Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, Moscow, and by trained volunteers. In total, 39 different enchytraeid species were collected. Because the number of soil monoliths varied across sites, the dataset expresses abundance as individuals per square metre. Enchytraeid species were identified in vivo immediately after the extraction procedure, according to Schmelz and Collado (2010). For species not included in this guide or described later, comparisons with original descriptions were used. We also employed molecular analysis to verify the species Fridericia bulboides and Mesenchytraeus gigachaetus. Detailed information on the methods used for molecular analysis is available in Degtyarev et al. (2020).
Some of the species we found exhibit distinct morphological differences from all known enchytraeid species. We are confident that these species have not yet been described in the literature. A comprehensive description of these species will be possible once more data have been collected. Therefore, we have decided to refer to them as Fridericia sp. 1, Enchytraeus sp. 1, Henlea sp. 1 and Henlea sp. 2 for now. Henlea sp. 1 and Henlea sp. 2 are large Henlea worms, both with unusually robust spermathecae. Fridericia sp. 1 is a medium-sized Fridericia species from mountainous Uzbekistan. Enchytraeus sp. 1 is possibly an obligate parthenogenetic species from the E. buchholzi group, characterised by underdeveloped male copulatory organs.
The taxonomy follows the WoRMS database (Timm and Erséus 2023). Scientific names were checked using the GBIF species matching tool. Subsequently, the identified enchytraeids were used for further molecular analyses (COI and/or H3 genes). As such, all instances of enchytraeid occurrences within the studied sites were recorded as dwc:basisOfRecord = "HumanObservation". Juvenile specimens were identified at the genus level. The identification of all enchytraeids was conducted by Maxim Degtyarev.
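The conversion from raw counts to individuals per square metre can be illustrated with a minimal sketch. It assumes that the counts from all monoliths at a site are pooled and divided by the total surface area sampled with the 5-cm corer; the paper does not spell out the exact calculation, and the function names are hypothetical.

```python
import math

# Corer diameter stated in the sampling description (5 cm), in metres.
CORER_DIAMETER_M = 0.05


def corer_area_m2(diameter_m: float = CORER_DIAMETER_M) -> float:
    """Surface area covered by one soil monolith, in square metres."""
    return math.pi * (diameter_m / 2) ** 2


def density_per_m2(individuals: int, n_monoliths: int) -> float:
    """Individuals per square metre, pooled over all monoliths at a site.

    Assumption: counts are pooled across monoliths before scaling.
    """
    if n_monoliths < 1:
        raise ValueError("at least one monolith is required")
    return individuals / (n_monoliths * corer_area_m2())


if __name__ == "__main__":
    # Example: 12 worms found in 3 monoliths at one site.
    print(round(density_per_m2(12, 3), 1))
```

Under this assumption, a single individual in a single monolith corresponds to roughly 509 individuals per square metre, which sets the resolution of the abundance values at sites with few monoliths.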
Step description: 1) The selection of study sites was driven by the intention to locate undisturbed areas displaying minimal or no signs of human activity.
2) Site sampling was carried out at a distance of no less than 100 m from the borders of selected zonal sites within one of the seven biome types according to WWF (Olson et al. 2001): tundra, boreal forests, temperate coniferous forests, temperate broadleaf and mixed forests, temperate grasslands, savannahs and shrublands, flooded grasslands and savannahs, as well as desert and xeric shrublands.
3) At each site, soil monoliths were collected using a steel corer with a diameter of 5 cm, reaching a depth of 10 cm.
4) The transportation of soil monoliths was conducted in isothermic containers to prevent soil overheating, which could lead to the mortality of organisms present.
5) Enchytraeids were extracted from the soil using the wet funnel method as described by Didden et al. (1995).
6) Following the extraction process, enchytraeids were identified in vivo to the species level using an Olympus BX-43 microscope. Subsequently, they were preserved in 96% alcohol for further molecular and isotopic analyses. The geographical references were obtained by recording the coordinates of the sampling sites using a mobile phone and the Organic Maps app (Organic Maps OÜ 2023). The measurement error of the coordinates was approximately 25 m. The WGS84 coordinate system was used for all records.

Taxonomic coverage
Description: Across the 131 sites studied within seven biomes in the Asiatic part of the Northern Palaearctic, we identified a total of 39 species belonging to 16 genera. The highest species richness was recorded in boreal forests (34 species in total, see Table 2).
Temperate broadleaf and mixed forests, as well as grasslands and shrublands, hosted approximately 20 species each. In the tundra biome, we found 16 species. The number of species in temperate coniferous forests and in flooded grasslands and savannahs ranged from 10 to 11. The lowest species richness was observed in xeric shrublands and deserts (Table 2). The average species richness of enchytraeids ranged from 0.25 species per site in deserts and xeric shrublands to four species per site in flooded grasslands and savannahs. The same trends were observed for the average abundance of enchytraeids (see Table 2).
Table 2. Average species richness per sampling site within a biome (mean ± SE), total species richness and average abundance (individuals per square metre ± SE) of enchytraeids in the studied biomes of the Asiatic part of the Northern Palaearctic. The numbers in brackets adjacent to specific biomes indicate the number of true replicates. Unidentified enchytraeids were excluded when counting the number of species. Specimens identified only at the genus level (Genus sp.) were included in the analysis as unique species, while juvenile specimens were only included in the counts if species from the same genus were absent at the site. The classification of biomes is given according to Olson et al. (2001).
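The counting rules in this caption can be sketched in code. The example below is a minimal illustration, not the authors' actual procedure: the record layout (a `(genus, epithet)` pair, with `epithet` set to `None` for juveniles identified to genus only and `genus` set to `None` for unidentified worms) and the function names are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev


def site_richness(records):
    """Count species at one site following the caption's rules.

    `records` is an iterable of (genus, epithet) pairs (hypothetical layout):
      - ("Fridericia", "bulboides") or ("Henlea", "sp. 1"): counted as species;
      - ("Fridericia", None): juvenile identified to genus only;
      - (None, None): unidentified enchytraeid, ignored.
    """
    species = set()
    genera_with_species = set()
    juvenile_genera = set()
    for genus, epithet in records:
        if genus is None:
            continue  # unidentified specimens are excluded
        if epithet is None:
            juvenile_genera.add(genus)
        else:
            species.add((genus, epithet))
            genera_with_species.add(genus)
    # A juvenile-only genus adds one species; otherwise juveniles add nothing.
    return len(species) + len(juvenile_genera - genera_with_species)


def mean_and_se(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("need at least two sites to estimate the SE")
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    site = [
        ("Fridericia", "bulboides"),
        ("Fridericia", None),   # juvenile, genus already represented
        ("Henlea", "sp. 1"),
        ("Henlea", "sp. 2"),
        ("Enchytraeus", None),  # juvenile, genus otherwise absent
        (None, None),           # unidentified
    ]
    print(site_richness(site))
```

Applying `mean_and_se` to the per-site richness values of one biome would give figures of the kind reported in the "mean ± SE" columns.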

Column label
Column description

stateProvince (Event core)
The name of the next smaller administrative region than country (state, province, canton, department, region etc.), in which the dcterms:Location occurs. For sampling events in Russia, this records the federal subject (republic, krai, oblast etc.) where the sample was collected. For sampling events in other countries, this records administrative regions according to the Database of Global Administrative Areas.

habitat (Event core)
This variable provides the biome classification assigned to the sampling location, based on the habitat typing system defined by the World Wildlife Fund (WWF). For additional information about the WWF biome classification system, please refer to Olson et al. (2001).

type (Event core)
The nature or genre of the resource. Constant value: event.

occurrenceID (Occurrence extension)
Each occurrence is assigned a unique identifier constructed from the sampling date, country code, region abbreviation for Russia or full name for other countries, sampling site number and occurrence number at that site. For example, the identifier "12-07-2019-RU-SL-5-14" corresponds to the 14th occurrence recorded on 12 July 2019 at sampling site #5 in Sakhalin Oblast, Russia.

basisOfRecord (Occurrence extension)
This field contains a constant value indicating the record type. All occurrences have the value "Human observation" because organisms were identified in vivo and then used for further molecular and isotopic analyses after collection.

recordedBy (Occurrence extension)
The person, group or organisation responsible for originally recording the occurrence data. For example: "Korobushkin D | Saifutdinov R".

identifiedBy (Occurrence extension)
The person, group or organisation responsible for identification. For all records in this dataset, organisms were identified by Maxim Degtyarev.

organismQuantity (Occurrence extension)
A number or enumeration value for the quantity of dwc:Organisms.

organismQuantityType (Occurrence extension)
The type of quantification system used for the quantity of dwc:Organisms.

occurrenceStatus (Occurrence extension)
A statement about the presence or absence of a dwc:Taxon at a dcterms:Location.

taxonRemarks (Occurrence extension)
Freeform remarks relevant to the taxonomy and characterisation of the documented species or taxon. Example: "Henlea cf. nasuta".

scientificName (Occurrence extension)
The full scientific name, with authorship and date information, if known.

kingdom (Occurrence extension)
The full scientific name of the kingdom in which the dwc:Taxon is classified.

phylum (Occurrence extension)
The full scientific name of the phylum or division in which the dwc:Taxon is classified.

class (Occurrence extension)
The full scientific name of the class in which the dwc:Taxon is classified.
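The structure of the occurrence identifier described in the column list above can be illustrated with a short parser. This is a sketch based only on the single example given ("12-07-2019-RU-SL-5-14"); the day-month-year order is inferred from that example, and the class and function names are hypothetical. Because regions outside Russia are written as full names, the sketch treats everything between the country code and the last two numbers as the region.

```python
from datetime import date
from typing import NamedTuple


class OccurrenceKey(NamedTuple):
    """Components of an occurrence identifier (hypothetical container)."""
    sampled_on: date
    country: str
    region: str
    site: int
    occurrence: int


def parse_occurrence_id(identifier: str) -> OccurrenceKey:
    """Split an identifier such as '12-07-2019-RU-SL-5-14' into its parts."""
    parts = identifier.split("-")
    if len(parts) < 7:
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    day, month, year, country = parts[:4]
    site, occurrence = parts[-2:]
    # The region may itself contain hyphens, so rejoin the middle fields.
    region = "-".join(parts[4:-2])
    return OccurrenceKey(
        sampled_on=date(int(year), int(month), int(day)),
        country=country,
        region=region,
        site=int(site),
        occurrence=int(occurrence),
    )


if __name__ == "__main__":
    print(parse_occurrence_id("12-07-2019-RU-SL-5-14"))
```

Parsing from both ends of the identifier keeps the sketch tolerant of multi-part region names, at the cost of assuming that the date, country code, site number and occurrence number never contain hyphens themselves.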

Description:
The research region was located in the Asian part of the Northern Palaearctic, from the Ural Mountains in the west to the Pacific coast in the Russian Far East (Fig. 1). It included biomes in West and East Siberian Russia, Kazakhstan, Mongolia, Uzbekistan and the Russian Far East. This extensive geographic area consists of diverse habitat types, including tundra, taiga, steppe, boreal forest and mountain ranges. Spanning a large latitudinal gradient, the region contains hot desert (BWh), cold desert (BWk), hot semi-arid (BSh) and cold semi-arid (BSk) climates in the south, transitioning to humid continental (Dfa) and warm summer continental (Dfb) climates in the mid-latitudes and subarctic (Dfc) and tundra (ET) climates in the far north near the Arctic, according to the Köppen-Geiger climate classification (Beck et al. 2018).

Figure 1. Enchytraeidae sampling locations in the Asian part of the Northern Palaearctic. The map was created using QGIS 3.32.2-Lima software (QGIS.org 2023).

Table 1. Locations, habitat information and number of recorded enchytraeid species for sampling sites in the Asian Northern Palaearctic Region.

Table 2 columns: Biome; Average No of Species; Total No of Species; Average abundance.
The dataset includes two related tables linked by the eventID column: Sampling Events and Associated Occurrences. The Sampling Events table consists of 131 events. The Associated Occurrences table consists of 7,074 occurrences (