MHA Herbarium: Eastern European collections of vascular plants

Abstract
Background
The world's herbaria, holding 387.5M specimens (Thiers 2019), are being rapidly digitised. At least 79.9M plant specimens (20.6%) are already databased worldwide in the standard form of GBIF-mediated data. The contribution of smaller herbaria has been growing steadily over the last few years owing to cost reduction and the use of platforms and solutions developed by the leaders. The Moscow Digital Herbarium web resource (Seregin 2020b) was launched by Lomonosov Moscow State University in October 2016 for the publication of specimens imaged and databased in the Moscow University Herbarium (MW). As of 31 December 2018, the web portal included 968,031 images of 971,732 specimens digitised in MW. This dataset is available in GBIF (Seregin 2020). The global trend is largely the same in Russia, where a dozen herbaria started to scan their holdings after imaging of the nation’s second largest herbarium (Kislov et al. 2017, Kovtonyuk et al. 2019, Seregin 2020a). In 2019, we started to use the Moscow Digital Herbarium as a web repository for digitised herbarium specimens from some Russian collections, starting with the Herbarium of the Tsitsin Main Botanical Garden, Russian Academy of Sciences (MHA). As a result, a single-university system became a multi-institutional consortium in April 2019 (Seregin 2020a). The dataset of the Moscow collections and partly of the Eastern European collections of the MHA Herbarium is now available in GBIF (Seregin and Stepanova 2020).
New information
The MHA Herbarium imaged 64,008 specimens from Moscow Region and partly from other regions of Eastern Europe at 600 dpi and provided key metadata. These data are now fully available in the Moscow Digital Herbarium and GBIF. Complete georeferencing of the specimens from the City of Moscow was a key task in 2020. As of May 2020, 50,324 specimens, including 49,732 specimens from Russia, have been georeferenced (78.6%) and 39,448 specimens have fully-captured label transcriptions (61.6%). Based on these data, we give a detailed overview of the collections, including a spatial, temporal and taxonomic description of the dataset.


Introduction
The official name of the collection is the Skvortsov Herbarium of the Main Botanical Garden, Russian Academy of Sciences (acronym MHA). In 2020, the Herbarium was named after the well-known Russian botanist Alexey Konstantinovich Skvortsov (1920-2008), who was the scientific supervisor of the MHA Herbarium for 36 years. The Herbarium was launched soon after the founding of the Main Botanical Garden of the Academy of Sciences of the USSR in 1945. Initially, some minor collections of dry plants were stored in workrooms of the staff. In 1958, the Herbarium received a hall of 280 m² in the newly-constructed main lab building. A group headed by V.N. Voroshilov formed the herbarium staff. Upon formal establishment, the MHA Herbarium received an almost complete set of the exsiccatae "Herbarium of the Flora of the USSR" from the Komarov Botanical Institute (Leningrad) and all botanical collections from the Timiryazev Institute of Plant Physiology (Moscow), including duplicates of important Moscow collections by D.P. Syreyshchikov, the first curator of the Moscow University Herbarium. These initial holdings were supplemented by the collections from Voronezh and Moscow Oblasts by V.N. Voroshilov, B.M. Kulkov and V.A. Shtamm (Stepanova et al. 2020).
In 1966, A.K. Skvortsov became the scientific supervisor of the MHA Herbarium. The main directions of the Herbarium's development were set at this time: "Our collections should provide: 1. orientation in the flora as a source of the material for introduction; 2. documentation of the introduction activities. The location of the herbarium in the center of European Russia obliges us to create a regional herbarium" (Skvortsov and Proskuryakova 1973). Skvortsov formed the main sections of the Herbarium: the Russian Far East, Siberia, Middle Asia, the Caucasus, the Moscow Region, the European part (European Russia and adjacent republics of the former USSR), the Crimea; General Herbarium (foreign countries); Herbarium of Introduction; Dendrological Herbarium; type collection; Skvortsov's personal herbarium (taxonomic collections of Salix, Populus, Betula, Epilobium, as well as materials on the flora of Middle Russia and Lower Volga).
Some Russian-language references describe the main milestones in the history of the MHA Herbarium (Skvortsov and Proskuryakova 1973, Skvortsov 1977, Belyanina and Makarov 1994, Skvortsov and Belyanina 2005, Ignatov et al. 2010, Ignatov 2015, Stepanova et al. 2020). As of January 2020, the MHA Herbarium holds 615,223 specimens of vascular plants and ca. 70,000 specimens of bryophytes. The general structure of the MHA Herbarium is given in Table 1. The Herbarium of vascular plants is located in two halls (334 m²) in the main lab building of the Garden. Duplicates and unmounted backlog are stored in several rooms (120 m²) at Botanicheskaya Street, 33-4, within a ten-minute walk from the main building.

Sampling methods
Step description: To schedule and perform the digitisation of the MHA Herbarium, we used the five key stages by Nelson et al. (2012):
• pre-digitisation curation and staging,
• specimen image capture,
• specimen image processing,
• electronic data capture,
• georeferencing specimen data.

Pre-digitisation curation and staging
The section curator reviews all incoming physical accessions to check that they meet the basic requirements of a herbarium specimen. A specimen should be a high-quality dried plant (or several individuals) with a label bearing the identification, collection site, habitat, collection date and collector. After that, unmounted new material is frozen at a temperature of -30°C for 14 days as a quarantine procedure against specific herbarium pests and then mounted. New collections are counted (and listed in the collection journal) right after mounting. Sorting and incorporation of new material takes place once a year, usually in the autumn-winter period. Right before imaging, pre-ordered self-adhesive barcodes with an acronym and a seven-digit number (e.g. MHA 0 002 094) are attached to the herbarium sheet.
Eastern European section. In December 2017, with the purchase of a specialised scanner Microtek ObjectScan 1600, we began the imaging of vascular plants in the MHA Herbarium. Since the specimens from European Russia and adjacent states constitute the largest and most used section of the Herbarium, we decided to start imaging from this section. If there were two or more taxa on a single sheet, they were remounted on separate sheets.

Specimen image capture
Specimens were imaged in accordance with international standards at a resolution of 600 dpi and with a colour checker (24 colours). After scanning, each image was automatically renamed according to the barcode, which serves as a unique identifier. In total, 14,274 specimens of the Eastern European section were digitised in 2017-2018. At that time, the images of the Eastern European collections were stored on external discs without online access.
The Moscow section was scanned more intensively under the time limit from March to October 2019. Every day, two to three operators worked on the single scanner in shifts. For each shift lasting four to five hours, 140-160 specimens were digitised. Thus, 300-400 specimens were imaged per day per scanner. In total, the herbarium team imaged 49,621 specimens within eight months and completed the mission.
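The throughput figures above can be cross-checked with simple arithmetic. The sketch below is only an illustration: it assumes that "two to three operators in shifts" means two to three shifts per day, and the function names are hypothetical.

```python
import math

# Figures quoted in the text
SPECIMENS_PER_SHIFT = (140, 160)  # specimens digitised in one four-to-five-hour shift
SHIFTS_PER_DAY = (2, 3)           # assumption: one operator per shift
DAILY_RATE = (300, 400)           # specimens per day per scanner, as reported
MOSCOW_TOTAL = 49_621             # specimens imaged from March to October 2019


def daily_bounds(per_shift, shifts):
    """Lowest and highest daily output implied by the per-shift figures."""
    return per_shift[0] * shifts[0], per_shift[1] * shifts[1]


def scanning_days(total, daily_rate):
    """Fewest and most scanning days needed at the reported daily rate."""
    fastest = math.ceil(total / daily_rate[1])
    slowest = math.ceil(total / daily_rate[0])
    return fastest, slowest


print(daily_bounds(SPECIMENS_PER_SHIFT, SHIFTS_PER_DAY))  # (280, 480)
print(scanning_days(MOSCOW_TOTAL, DAILY_RATE))            # (125, 166)
```

Under these assumptions, the reported 300-400 specimens per day falls inside the 280-480 range implied by the shift figures, and the Moscow section corresponds to roughly 125-166 scanning days, which fits within the eight-month period.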
During the imaging, we encountered a number of minor issues:
• Some specimens have large plants partly or fully covering the label text. The specimens were imaged as they are, whereas the labels will be captured later not from the image, but from the physical specimen.
• Sometimes two different species were mounted on a single sheet. In such cases, if possible, the specimens were remounted onto two sheets. If the remounting was impossible or impractical, the single sheet was scanned, but the image was duplicated and each file was assigned an additional digit ("-1" or "-2") to provide a unique identifier for each species.
• Larger labels, widely used in the exsiccatae "Herbarium of the flora of the USSR", were often folded during mounting. We tried to remount such labels to make the text fully visible on the images, but in some cases, the label partly covered the plant.
• In some cases, two or more parts of the same large plant were mounted on several sheets bearing a single label and further notes like "sheet #2", "sheet #3" etc. These sheets were initially inserted into the cupboards after being fastened with a removable paper clip. However, they have been mixed over time with other specimens, so it is now impossible to trace the correct label for these multiple "sheets #2".
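The identifier scheme described above (an acronym plus a seven-digit number, with "-1" or "-2" added for duplicated images of mixed sheets) can be illustrated with a minimal sketch. The compact spelling without spaces, the regular expression and all function names are assumptions made for this example; they are not taken from the Moscow Digital Herbarium software.

```python
import re

# Hypothetical pattern: acronym, seven digits, optional "-<n>" suffix for mixed sheets
BARCODE_RE = re.compile(r"^(?P<acronym>[A-Z]+)(?P<number>\d{7})(?:-(?P<part>\d+))?$")


def normalise_barcode(raw):
    """Collapse spaces in a printed barcode, e.g. 'MHA 0 002 094' -> 'MHA0002094'."""
    compact = re.sub(r"\s+", "", raw).upper()
    if not BARCODE_RE.match(compact):
        raise ValueError(f"not a valid barcode: {raw!r}")
    return compact


def split_identifier(identifier):
    """Return (acronym, number, part); part is None for ordinary sheets."""
    match = BARCODE_RE.match(normalise_barcode(identifier))
    part = match.group("part")
    return match.group("acronym"), int(match.group("number")), int(part) if part else None


def image_ids_for_sheet(barcode, taxa_on_sheet=1):
    """One identifier per taxon: the bare barcode, or '-1', '-2', ... for mixed sheets."""
    base = normalise_barcode(barcode)
    if taxa_on_sheet < 1:
        raise ValueError("a sheet carries at least one taxon")
    if taxa_on_sheet == 1:
        return [base]
    return [f"{base}-{i}" for i in range(1, taxa_on_sheet + 1)]


print(image_ids_for_sheet("MHA 0 002 094"))     # ['MHA0002094']
print(image_ids_for_sheet("MHA 0 002 094", 2))  # ['MHA0002094-1', 'MHA0002094-2']
```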

Specimen image processing
While scanning, the operator started a new directory for every species and named it after the corresponding herbarium folder. Before the images were uploaded into the Moscow Digital Herbarium, the structure of the directories was converted into a table of metadata. Thus, for each accession, the initial metadata included the ID (barcode identifier), the taxon name from the folder (without taxonomic authors) and the geographic code of the area.
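The conversion of a directory structure into a metadata table might look like the sketch below. The layout `<root>/<area code>/<taxon name>/<barcode>.<extension>`, the column names and the area code "MOS" are hypothetical, since the text does not specify how the directories were organised on disc.

```python
import csv
import tempfile
from pathlib import Path

FIELDS = ["id", "taxon", "area_code"]


def directories_to_metadata(root):
    """Walk root/<area code>/<taxon name>/<barcode>.<ext> and return one row per image."""
    rows = []
    for image in sorted(Path(root).glob("*/*/*")):
        if not image.is_file():
            continue
        rows.append({
            "id": image.stem,                       # barcode identifier from the file name
            "taxon": image.parent.name,             # taxon name from the folder, no authors
            "area_code": image.parent.parent.name,  # geographic code of the area
        })
    return rows


def write_metadata(rows, csv_path):
    """Store the rows as a CSV table ready for upload."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "MOS" / "Salix alba"
        folder.mkdir(parents=True)
        (folder / "MHA0002094.tif").touch()
        print(directories_to_metadata(tmp))
```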
The taxon name, according to the protocols of the Moscow Digital Herbarium, was automatically matched with the latest version of the Catalogue of Life (CoL), from which the complete accepted name, synonymy and hierarchical list of supraspecific taxa were downloaded for every entry.
Publication of images with brief metadata is a powerful tool for rapid online access to scanned herbarium collections. This approach was largely used in Paris, where the largest herbarium of the world was imaged and published online (Le Bras et al. 2017). Similar protocols were adopted in Edinburgh and at Moscow University (Haston et al. 2012, Seregin 2016, Seregin 2018).
After online publication of the Moscow Region specimens in the Moscow Digital Herbarium and GBIF, other sections of the MHA Herbarium will undergo the same procedure. To date, 64,008 images of specimens of vascular plants from the MHA Herbarium are available online.

Electronic data capture
After online publication of the images and associated brief metadata, we linked the records with the existing full-label data capture of 7,087 specimens of the Moscow section. Thus, the minimum obligatory set of metadata available for all digitised specimens of the MHA Herbarium in this dataset includes the barcode ID, complete taxonomic information, collection date, the first collector, curatorial area and geographical coordinates (if available on the label). Additionally, 18,803 specimens had full-text transcriptions of labels (29.4%) owing to earlier efforts.
Further full-text data capture was carried out by the operators of the Moscow Digital Herbarium for specimens collected within the City of Moscow (15,982 specimens). An operator entered the label data from the scanned image into a Microsoft Excel spreadsheet with 30 standard fields (including some pre-filled ones to avoid mistakes). Additionally, a commercial partner under the GBIF contract (2019) made full-text transcriptions of 4,617 specimens from the City of Moscow and Moscow Oblast.
After data entry, the scientific supervisor of the Moscow Digital Herbarium checked the spreadsheets for technical issues by a set of automatic, semi-automatic and manual operations. The IT-team, using the data migrator programme, then converted data from the Excel spreadsheet to the PostgreSQL database of the Moscow Digital Herbarium for further data storing and retrieving. This stage also includes some automatic checks of data consistency.
As of May 2020, the full text of labels has been entered for 39,448 specimens of the MHA Herbarium (61.6% of the imaged ones): 27,783 specimens of the Moscow section and 11,665 specimens of the Eastern European section. Full-text label transcriptions help to optimise further georeferencing by combining labels with identical text into groups.

Georeferencing specimen data
The operators of the Moscow Digital Herbarium and the Garden employees carried out manual georeferencing, later supplemented by the ISTRA system (Intellectual System of Toponymic Reading and Attribution), a short piece of code written in Java. This code is integrated into the Moscow Digital Herbarium and is not available as a stand-alone product.
The first algorithm of the ISTRA system combines the specimens into groups according to the matching of the captured label text. There are two options for combining: complete matching mode and letters-only mode. The results do not differ in accuracy from manual georeferencing. The second algorithm of the ISTRA system forms the specimen groups according to the matching of three fields: collection date, collector's surname and curatorial area. Since such a group corresponds to a single day's walking route, the georeferencing error in most cases does not exceed 5 km. Further data refinement will help us to replace automatic georeferencing with the more accurate manual one.
In both cases, the operator inserts the coordinates manually and the system sets the coordinates automatically for all specimens of the group. The first algorithm takes precedence over the second one. In all cases, we save the log file and note the georeferencing method in the form of the standard disclaimers:
• captured from the label;
• set manually by the operator;
• set automatically by matching of the label text;
• set automatically by matching of the collection date and collector.
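The two grouping algorithms and their precedence can be illustrated with a minimal sketch. This is not the ISTRA code, which is written in Java and is not publicly available. The record fields, the function names, the case-insensitive reading of "letters-only mode" and the rule that existing coordinates are never overwritten are all assumptions made for this example.

```python
MANUAL = "set manually by the operator"
BY_TEXT = "set automatically by matching of the label text"
BY_EVENT = "set automatically by matching of the collection date and collector"


def letters_only_key(text):
    """Assumed letters-only mode: ignore case, digits, spaces and punctuation."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def georeference(specimens, seed_id, coords, letters_only=False):
    """Set coords on one specimen by hand and propagate them to its groups.

    Text matching (first algorithm) is tried before matching of date,
    collector and area (second algorithm). Returns a log of automatic steps.
    """
    seed = next(s for s in specimens if s["id"] == seed_id)
    seed["coords"], seed["method"] = coords, MANUAL
    norm = letters_only_key if letters_only else (lambda text: text)
    seed_text = norm(seed["label_text"]) if seed["label_text"] else None
    seed_event = (seed["date"], seed["collector"], seed["area"])
    log = []
    for spec in specimens:
        if spec["coords"] is not None:  # never overwrite existing coordinates
            continue
        event = (spec["date"], spec["collector"], spec["area"])
        if seed_text and spec["label_text"] and norm(spec["label_text"]) == seed_text:
            method = BY_TEXT
        elif all(seed_event) and event == seed_event:
            method = BY_EVENT
        else:
            continue
        spec["coords"], spec["method"] = coords, method
        log.append((spec["id"], method))
    return log


def make(spec_id, text, date, collector="Ivanov", area="MOS"):
    """Build a hypothetical specimen record without coordinates."""
    return {"id": spec_id, "label_text": text, "date": date,
            "collector": collector, "area": area, "coords": None, "method": None}


if __name__ == "__main__":
    records = [
        make("A", "Moscow, Ostankino park. 12.VI.1975", "1975-06-12"),
        make("B", "Moscow , Ostankino Park, 12 VI 1975", "1975-06-12"),
        make("C", None, "1975-06-12"),
        make("D", "Kaluga, river bank. 3.VII.1980", "1980-07-03"),
    ]
    for entry in georeference(records, "A", (55.82, 37.61), letters_only=True):
        print(entry)
```

In this example, specimen B is matched by label text in letters-only mode, C (with no transcription) falls back to the date, collector and area match, and D stays without coordinates. Under the letters-only reading assumed here, labels that differ only in digits would also be grouped together, which is one reason to treat the sketch as an illustration rather than a description of the production system.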
Manual georeferencing is carried out using standard e-cartographic libraries (Yandex.Maps, Google Maps, Wikimapia, SAS.Planet etc.) for modern specimens, whereas historic collections are georeferenced using libraries of scanned maps (etomesto.ru, loadmap.net etc.) following the principle "collection date = map date". Coordinate precision (rounded to 100 m) is set and stored for each manually georeferenced point.
Complete georeferencing of the specimens from the City of Moscow was a key task in 2020 for the Moscow University team (according to the Moscow project), whereas employees of the MHA Herbarium georeferenced specimens from Moscow Oblast and Eastern Europe (starting with the most prolific collectors). In total, 50,324 specimens have been georeferenced (78.6%), including 49,732 specimens from Russia.
For 7,414 specimens, the coordinates were taken from the label (14.7% of the number of georeferenced ones), for 10,849 specimens (21.6%), they were set manually and for 32,061 specimens (63.7%), they were calculated automatically using the ISTRA system.
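The shares quoted above follow directly from the counts. A short check, using only the numbers given in the text (the variable names are illustrative):

```python
# Georeferenced specimens by method, as quoted in the text
counts = {
    "captured from the label": 7_414,
    "set manually by the operator": 10_849,
    "calculated automatically (ISTRA)": 32_061,
}
IMAGED = 64_008  # all imaged specimens of the MHA Herbarium in the dataset

georeferenced = sum(counts.values())
shares = {method: round(100 * n / georeferenced, 1) for method, n in counts.items()}
coverage = round(100 * georeferenced / IMAGED, 1)

print(georeferenced)  # 50324
print(shares)         # 14.7, 21.6 and 63.7 per cent
print(coverage)       # 78.6
```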

Geographic coverage
Description: The Eastern European section of the MHA Herbarium has its focus on European Russia (Table 2). The most sampled areas are the Moscow Region, Lower Volga, Central and Central Forest-steppe Regions (Table 3), the areas intensively studied by the Garden staff. Initially, these Regions were inextricably linked with the activities of A.K. Skvortsov, whose fruitful initial collections often formed a solid basis for the later extensive floristic research.

Collections of the MHA Herbarium from Eastern Europe by country (table columns: Rank; Country; Estimated total number of specimens).
Moscow Region forms its own section in the MHA Herbarium owing to the location of the Garden in the City of Moscow. One of the initial missions of the Herbarium was precise documentation of the local flora, including long-term observations of both native and alien plants. Based on these materials, standard floras and checklists were published by the Garden staff in collaboration with Moscow University (Voroshilov et al. 1966, Ignatov et al. 1990, Mayorov et al. 2012, Mayorov et al. 2020).
The Lower Volga Region was one of the focus areas for A.K. Skvortsov, his graduate students and the Herbarium employees. This activity resulted in the published volumes of the "Flora of the Lower Volga" (Skvortsov 2006, Reshetnikova 2018). The Central Region is also well-represented in the collection owing to another long-term floristic interest of A.K. Skvortsov, which was later continued by the Herbarium staff who published the "Kaluga Flora" (Reshetnikova et al. 2010). Belgorod Oblast in the Central Forest-steppe Region is a new area of research headed by N.M. Reshetnikova. Other Regions are less well represented and their coverage is less complete and thorough. Usually, these are either collections from various field trips of the Garden staff or gifts.
Geographical coordinates for the dataset frame are given below.

Taxonomic coverage
Description: The dataset covers vascular plants of Eastern Europe, both native and alien species. There are also some specimens of cultivated plants, especially from the Moscow Region. The Moscow section is completely digitised and can provide figures on the taxonomic representation of the MHA Herbarium collections, whereas only 14% of the Eastern European section has been digitised and, therefore, information on its taxonomic composition only shows which families have been digitised so far.
Until 2017, the taxonomic backbone of the MHA Herbarium was the standard checklist by Czerepanov (1995). In some cases, the section curators could deviate from this source depending on their taxonomic expertise. For taxonomic coverage across the leading taxa, see Table 4, Table 5 and Table 6.
Table 5. Top genera of the Moscow section, MHA Herbarium.
Eastern European section. In this section, 14% of the collections have been digitised so far; therefore, Table 7, Table 8 and Table 9 show not the taxonomic diversity of the section, but an overview of the imaged specimens. The specimens were scanned one by one following the order of the physical collection: pteridophytes, gymnosperms and angiosperms following Engler's system against the standard catalogue.
Table 7. Top families of the Eastern European section, MHA Herbarium (digitised specimens only).
Table 9. Top species of the Eastern European section, MHA Herbarium (digitised specimens only).
In recent decades, collections came mainly from four Regions: Rostov Oblast, Lower Volga Region, Central Region (mainly Kaluga Oblast) and Central Forest-Steppe Region (mainly Belgorod Oblast). These are the places of the fieldwork of the current Herbarium employees, as well as of the above-mentioned expeditions across the Lower Volga Region in the 1990s. By contrast, there have been no significant accessions from Ukraine, Belarus, Moldova, Estonia, Latvia or some regions of European Russia in recent decades.

Moscow section.
In this section, 49,550 specimens have a collection date after 1890. Their temporal distribution over decades is given in Fig. 1. Shcherbakov (65). In this period, intensive studies of the alien flora of the Moscow Region resulted in the checklist by Ignatov et al. (1990). This paper became the foundation for the further study of invasive plants around Moscow (Mayorov et al. 2012, Mayorov et al. 2020).

Eastern European section
2.1. Lower Volga Region. So far, 3,244 specimens from this Region have been digitised. We assume that the total volume of collections from the Lower Volga is 23,170+ specimens. The figures given below are based on 3,145 digitised specimens (14% of the collection volume) having a collection date after 1890. Their temporal distribution over decades is given in Fig. 2.
Notable collections from the Lower Volga began to arrive in the mid-1970s, but the peak of the major accessions stretched over the 1980s and 1990s. Especially large collections were made in 1982, 1986, 1989-1990 and 1993-1994. In   Fig. 3.
Accessions from the Central Region have two peaks: in the 1970s (especially 1971, 1974) and in the 2000s-2010s (especially 2003, 2007, 2014). In the 1970s, the main collections were received from V.V. Makarov (820 estimated number of specimens/115 digitised) and A.K. Skvortsov (290/41) Shmytov (320/45). The most sampled area is Kaluga Oblast, which has been intensively studied by the Herbarium staff. This resulted in the publication of the standard regional flora (Reshetnikova et al. 2010).
The collections from the Central Forest-steppe Region have two peaks: in the 1960s (especially 1966, 1968) and in the 2000s (especially 2006-2008). In the 1960s, the main collections were acquired from V.V. Makarov (1,510 estimated number of specimens/211
Accessions from the Northern Region are distributed more evenly across decades. Two peaks can be noted: in the 1950s-1960s (especially 1951, 1966)

Additional information
Collectors
1. Moscow section. The full list of collectors consists of 823 surnames, including 127 people who collected more than 10 specimens. The list of top collectors of the Moscow section is given in Table 11, supplemented by the portrait galleries (Fig. 4 and Fig. 5). The basis of the Moscow section was formed by ca. 2,000 specimens from D.P. Syreyshchikov and ca. 700 specimens from P.A. Smirnov, collected in the 1920s and received from the Timiryazev Institute of Plant Physiology (Moscow).
In 1970

Eastern European section.
The collection consists mainly of specimens which the Garden staff collected during field trips since the 1950s. Initially, herbarium vouchers accompanied living plants and seeds collected in the wild for the displays of the Garden. This documentation activity was later supplemented by extensive floristic and taxonomic studies, conservation research and monitoring of alien species.

Lower Volga Region.
The flora of the southeast of European Russia is the most fully represented regional flora of the Eastern European section. The Region known as Lower Volga includes Volgograd, Astrakhan and Saratov Oblasts and the Republic of Kalmykia. This is a predominantly semi-arid steppe region. The list of collectors includes 136 surnames (see the top collectors in Table 12 and Fig. 6), but for 53 people, only a single specimen has been digitised so far. A vast amount of material helped to critically assess the current state of the flora of the southeast of European Russia and to publish two volumes of the "Flora of the Lower Volga" (Skvortsov 2006, Reshetnikova 2018). The third volume of the series is expected in the near future.

Other areas.
The MHA Herbarium covers all regions of Eastern Europe within the former USSR with varying degrees of completeness. Table 13 shows the main collections from this territory, excluding Lower Volga. An additional gallery of top collectors is given in Fig. 7. A description and map of the curatorial areas used in the Moscow Digital Herbarium are available online (Seregin 2020b).