"Flora of Russia" on iNaturalist: a dataset

Abstract
Background: The "Flora of Russia" project on iNaturalist brought together professional scientists and amateur naturalists from all over the country. Over 10,000 people were involved in the data collection.
New information: Within 20 months, the participants accumulated 750,143 photo observations of 6,857 species of the Russian flora. This constitutes the largest dataset of open spatial data on the country’s biodiversity and a leading source of data on the current state of the national flora. About 87% of all project data, i.e. 652,285 observations, are available under free licences (CC0, CC-BY, CC-BY-NC) and can be freely used in scientific, educational and environmental activities.


Introduction
Since 2008, iNaturalist (https://www.inaturalist.org/) has been crowdsourcing biodiversity observations made by citizen scientists, as well as their taxonomic identifications. Hundreds of publications have already used iNaturalist data in research, conservation and policy (e.g. Ocampo-Peñuela et al. 2016, Chandler et al. 2017, Heberling and Isaac 2018). There are three key themes that iNaturalist embraces: social interaction; shareability of data, tools and code; and scalability of the platform and community (Seltzer 2019).
The advent of large, technology-based resources allows ecologists and biologists to work at spatio-temporal scales previously unimaginable (White et al. 2015). With 50M observations accompanied by photo or audio evidence, the global iNaturalist dataset is one of the largest online collections of biodiversity data. It is partially represented in the GBIF, with the exclusion of observations which remain unidentified or have unconfirmed or missing licence information. Nonetheless, the GBIF export tools provide excellent data usability and the resulting exports come with a DOI which one can use for citation in publications. The GBIF data usage counter shows that iNaturalist GBIF-mediated data gained 781 citations (as of 11 Sep 2020) making it one of the most commonly-used datasets amongst the GBIF (Ueda 2020).
Many research papers focus on the employment of iNaturalist data as a primary source (Heberling and Isaac 2018, Seregin et al. 2020). For instance, iNaturalist includes dozens of metadata fields for every observation and was employed as a case study in the theory of long-tailed datasets (Cui et al. 2019). Observations from the iNaturalist Challenge at FGVC 2017 with links to 675,000 licensed images of 5,089 species have been widely used in computer vision training (Zheng et al. 2019). iNaturalist observations and images have been employed as a data source in classical taxonomy of tracheophytes (Svoboda and Harris 2018), studies of the distribution of gecko clones (Lapwong and Juthong 2018), plant phenology (Barve et al. 2020) and fish infections on a continental scale (Happel 2019). Moreover, Skejo et al. (2020) recently published a description of a new species, based on photos from iNaturalist in addition to scarce museum material. The platform has been suggested as a suitable agent for storage of photo vouchers associated with museum specimens (Heberling and Isaac 2018).
Biodiversity documentation, by means of aggregation of individual observations, is the main goal of iNaturalist. Consistent with this are the many examples of papers dealing with new noteworthy records of either alien (Vendetti et al. 2018; Hiller and Haelewaters 2019; Liebgold 2019) or native organisms (Rosenberg 2018; Schuette et al. 2018) made by amateur naturalists. Further accumulation of data made possible precise documentation of alien species distribution on a nationwide scale (Ciceoi et al. 2017), their expansion process (Oficialdegui et al. 2020), routine monitoring of invasive species (Larson et al. 2020), documentation of at-risk species beyond the boundaries of protected areas (Young et al. 2019) and a global assessment of species' extinction risk with the inclusion of citizen science data (Gardiner and Bachman 2016). Spatial data from iNaturalist have been employed in studies of bird collisions with windows (Winton et al. 2018), global snakebite mortality (Longbottom et al. 2018) and the search for environmental triggers in orchids (Lori et al. 2018).
It has recently been shown that iNaturalist serves as a tool indispensable for avoiding biases in urban biodiversity data (Li et al. 2019), for making decisions related to the urban management of red foxes and coyotes (Mueller et al. 2019) and for testing urban biotic homogenisation with the use of data generated by the participants of the City Nature Challenge (Leong and Trautwein 2019). There are positive examples of iNaturalist usage in data accumulation by researchers (Ocampo-Peñuela et al. 2016), as well as of the citizen community helping scientists with a supply of data. In addition, there are examples of iNaturalist usage during university courses of classical zoology and botany together with standard field guides and keys (Unger et al. 2020).
The iNaturalist dataset at various taxonomic and/or geographical extents has been checked for completeness of data against complete literature data (Goldstein et al. 2018), expert-based range maps (Fourcade 2016), museum collections (Spear et al. 2017) and available inventories within protected areas (Jacobs and Zipf 2017). Vahidi et al. (2017) performed a general quality assessment of iNaturalist data which made possible the revealing of the majority of attribute and positional errors amongst the crowd-sourced biodiversity observations. Borzée et al. (2019) published a case study on cross-verification of iNaturalist observations against published georeferenced molecular data, whereas Maritz and Maritz (2020) compared Facebook versus iNaturalist as data sources in the assessment of trophic interactions. Prudic et al. (2018) verified the completeness of iNaturalist data with various field techniques of butterfly data collection.
The project "Flora of Russia", which includes all verified ("research-grade") observations of vascular plants from the country, was launched by the Moscow University team on 9 Jan 2019 to support data collection for the "Atlas of the Russian flora" (Seregin et al. 2020). During the first 20 months, the number of identified and verified iNaturalist observations of vascular plants from Russia increased 68-fold and the number of involved users increased 10-fold. Here, we present the characteristics of the dataset as of 9 Sep 2020, soon after the project reached two notable milestones of 750,000 verified observations and 10,000 observers (Fig. 1, Fig. 2 and Fig. 3). Fig. 3 shows both the number of observers and project members. Since collection projects on iNaturalist work as filters, all RG observations of vascular plants from Russia are covered by the project, giving an impressive figure of 10K observers. As of 13 Sep 2020, 1,736 members of iNaturalist have formally joined the "Flora of Russia" project by pressing the "Join" button. As a result, they clearly affiliate their data with the project by an automatically-generated logo on every observation page and receive notifications on project updates and journal posts. Those observers who are not members of the project still benefit in the form of identifications, because experts inspect all observations available on iNaturalist.
Figure 1. Dynamics of identified and verified ("research-grade") observations of the "Flora of Russia" project since the inception. Blue dots represent the research-grade observations and red dots correspond to unverified observations from the project's backlog. About 11K observations were deleted from iNaturalist by a single "mega-observer" on 25 Feb 2020.
Figure 2. The "Flora of Russia" project species number dynamics since the inception on 9 Jan 2019. From 31 Jul 2020, the number of species stabilised due to ongoing expert data cleaning activity.

General description
Purpose: For a number of years, Russian professional and amateur biologists have been using Internet-based national networking systems for collecting georeferenced data on birds, invertebrates and plants. For instance, Plantarium is the most popular Russian-language resource for collecting plant and lichen photographs from around the world with emphasis on Russia and adjacent regions. However, unlike iNaturalist, it does not allow data export, nor are its data included in the GBIF, since photos and other data lack licence indications. In addition, contributing observations to Plantarium requires more effort from the members.
After digitisation of the nation's second largest herbarium (Seregin 2018), the Moscow University team launched a public awareness campaign to support community-generated data collection for plants. We decided not to spend budget on our own crowd-sourcing system, but to use and promote the international iNaturalist platform as suitable for data collection in Russia with a number of efficient tools and a global community.
Russia on iNaturalist: By late 2018, Russia was the 18th country on iNaturalist in terms of the number of verifiable observations (47,888). After 20 months of the project activity, we can see drastic changes in the biodiversity data coverage across Russia with a strong emphasis on tracheophytes.
Currently, Russia holds fifth place amongst countries represented on iNaturalist in terms of the number of verifiable observations of all groups of organisms and third place by observations of vascular plants in particular (Table 1).
Figure 3. Observers (blue dots) and members (red dots) of the "Flora of Russia" project since the inception on 9 Jan 2019.

Amongst the top ten countries, Russia has the highest proportion of tracheophyte observations of all uploaded to iNaturalist (62.2%). A community of birdwatchers is also quite active when compared to other top countries, whereas other groups of organisms are still lacking much attention (Table 2). Birds are the primary object of attention for at least eight co-authors of this paper, whereas three of us are focused on fungi.
Table 1. The top ten countries by the number of verifiable observations on iNaturalist (as of 5 Sep 2020). In Tables 1-6 of this section, Russia is presented within the borders on 1 Jan 2014 (so called "standard places" on iNaturalist), i.e. excluding the Republic of Crimea and the City of Sevastopol claimed by Ukraine.
Russia has the highest proportion of vascular plants amongst identified and confirmed observations which are classified as "research grade" on iNaturalist (Table 3). Moreover, Russia is the leading country on iNaturalist amongst the top ten with regard to the proportion of confirmed observations amongst all tracheophyte records. As we showed in 2019 (Seregin et al. 2020), the number of unconfirmed plant observations in Russia usually rapidly increases from May to August and decreases from September to April, when experts most intensively work with the backlog of unprocessed observations.

Table 3. Identified and verified ("research grade", RG) observations for the top ten countries on iNaturalist (as of 5 Sep 2020) (columns: Rank, Country, All groups, Tracheophytes, Proportion of tracheophytes in RG observations, Proportion of RG).
To facilitate further accumulation of the project's data into the GBIF, we ask our observers to specify open Creative Commons Licences, such as 1) CC0 (http://creativecommons.org/publicdomain/zero/1.0/), 2) CC-BY (http://creativecommons.org/licenses/by/4.0/) and 3) CC-BY-NC (http://creativecommons.org/licenses/by-nc/4.0/) for their observations. We do this on a regular basis in the form of the project's journal posts available to every member of our community. As a result of this activity, 83.8% of all observations on iNaturalist from Russia (and as many as 85.3% in tracheophytes) are freely licensed, making Russia the leader in open-access biodiversity data on iNaturalist (Table 4). As a result of intense expert activity and the promotion of free licensing, 73.8% of tracheophyte records from Russia have become available in the GBIF (Table 5), which is the highest proportion amongst the leading countries on iNaturalist. Since 2020, the iNaturalist dataset has become the largest source of data on the Russian biodiversity available through the GBIF.
Table 4. Verifiable observations with free licences (CC0, CC-BY & CC-BY-NC) for the top ten countries on iNaturalist (as of 5 Sep 2020).
Table 5. iNaturalist records available in the GBIF ("research-grade" observations with free licences) for the top ten countries on iNaturalist (as of 3 Sep 2020) (Ueda 2020).
The number of observers with at least a single verifiable observation is relatively low in Russia, at just 14K (Table 6). Nonetheless, the average productivity of the members of the community is extremely high. On average, 93 verifiable observations have been created by each observer across all groups of organisms, while, with regard to vascular plants, the number is 73 "research-grade" observations per observer, which is the highest level of observer activity amongst the top ten countries on iNaturalist.
Russia is globally unique taking into account the active growth of data within the "Flora of Russia" project. Amongst the top ten countries on iNaturalist, Russia has achieved: 1. the highest proportion of tracheophytes amongst all observations; 2. the highest proportion of identified and verified ("research-grade") observations amongst tracheophytes; 3. the highest proportion of both free licences (CC0, CC-BY & CC-BY-NC) and GBIF records; 4. the highest number of observations per observer.

Project description
Title: "Flora of Russia" project on iNaturalist
Personnel: As of 13 Sep 2020, 1,736 members of iNaturalist have joined the project (see also Fig. 3). The core of the project team is formed by 129 people, who are listed simultaneously amongst the top 200 identifiers and top 500 observers of the project, including 15 project members affiliated with the Lomonosov Moscow State University. Of the 129 members, 112 confirmed their formal contribution to this data paper (see the "Author contributions" section and the "Community coverage" section for additional information). Dr. Alexey P. Seregin is the founder and an administrator of the project.
Study area description: The project covers the territory of the Russian Federation as defined by the national legislation, i.e. including the Republic of Crimea and the City of Sevastopol claimed by Ukraine.

Design description: Main features of iNaturalist as a data collection platform
Any user can register as an "observer" on iNaturalist. Users may upload observations of organisms through their account using the website https://www.inaturalist.org/ or the free mobile applications "iNaturalist" and "Seek". A total of 1.28M observers are involved in the work of the platform, including 14.3K observers with at least one observation from Russia (Table 6).
In order to meet the minimum requirements for further scientific use, an observation needs to have (1) a date; (2) a georeference; (3) a photograph or series of photographs or (for animals) an audio recording of the object's sounds, created by the observer; and (4) an organism recorded in the wild. Provided that these requirements are fulfilled, the observation is marked as "needs ID", regardless of whether the author identified the organism or not. Once an observation receives identical identifications from more than two thirds of the users who identified it, at the level of species (in some cases of genus), it becomes "research-grade", a category for verified observations. A supporting identification by a second user makes an observation "research-grade", while identification by a single user is not enough. Disagreeing identifications may once again exclude an observation from this category. Low-quality photos, or photos of plants whose accurate identification requires a study of micromorphological, anatomical or genetic traits, usually do not reach "research grade" or, in the latter case, remain identified and verified only at the level of genus.
Table 6. Observers and their average productivity for the top ten countries on iNaturalist (as of 5 Sep 2020).
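The consensus rule for "research-grade" status can be illustrated with a minimal sketch. It is a simplified model of the rule as stated in this section, not the platform's actual implementation: the function name, the label "not verifiable" and the flat list of taxon names are assumptions made for illustration, and the real system also handles identifications at different taxonomic ranks.

```python
from collections import Counter

def quality_grade(identifications, meets_minimum=True):
    """Classify an observation under a simplified consensus rule.

    identifications: list of taxon names, one per identifying user.
    meets_minimum: True if the observation has a date, a georeference,
    media evidence and shows an organism recorded in the wild.
    """
    if not meets_minimum:
        # Hypothetical label for observations failing the minimum requirements.
        return "not verifiable"
    if len(identifications) < 2:
        # Identification by a single user is never enough.
        return "needs ID"
    _, votes = Counter(identifications).most_common(1)[0]
    # Verified only if more than two thirds of the identifiers agree.
    if votes >= 2 and 3 * votes > 2 * len(identifications):
        return "research-grade"
    return "needs ID"

print(quality_grade(["Acer negundo", "Acer negundo"]))
print(quality_grade(["Acer negundo", "Acer negundo", "Acer platanoides"]))
```

In this sketch, two agreeing identifications out of three are exactly two thirds and therefore do not pass the "more than two thirds" threshold, which mirrors how a disagreeing identification can return an observation to the "needs ID" pool.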
The observers may choose a licence allowing further re-use of the data. Observations licensed with three Creative Commons Licences (CC0, CC-BY, CC-BY-NC) and of "research-grade" quality are automatically exported to the GBIF. As of 5 Sep 2020, the iNaturalist database contained 48.6M observations that have met the minimum quality requirements (Table 1), of which 29.2M have achieved "research-grade" (Table 3). Unfortunately, only 19.7M observations have been exported to the GBIF due to copyright restrictions (Table 5).
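The export condition and the resulting data funnel can be sketched as follows. The function and variable names are hypothetical, and the licence strings are an assumed encoding; only the three licence types and the three global totals (in millions of observations) come from the text above.

```python
# Licences that allow export to the GBIF, as listed in the text.
GBIF_LICENCES = {"CC0", "CC-BY", "CC-BY-NC"}

def exported_to_gbif(licence, quality_grade):
    """Return True if an observation satisfies both export conditions."""
    return quality_grade == "research-grade" and licence in GBIF_LICENCES

def share(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

# Global totals as of 5 Sep 2020, in millions of observations.
verifiable, research_grade, in_gbif = 48.6, 29.2, 19.7

print(share(research_grade, verifiable))  # research-grade share of verifiable
print(share(in_gbif, research_grade))     # exported share of research-grade
print(share(in_gbif, verifiable))         # exported share of verifiable
```

Under these figures, roughly 60% of verifiable observations are research-grade, and about two thirds of those carry a licence that permits export.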
The implementation of artificial intelligence (AI) for identification is a key feature of iNaturalist. It gives the users a suggestion about the most similar species after analysing the photos ("visually similar") and taking into account the geographical distribution of other records ("seen nearby"). Initially, the portal's AI compared the newly-uploaded photos with a basic set of images, which, in 2017, comprised 859,000 photos of more than 5,000 species. The images of varying quality had been collected using different types of cameras, but their identifications had been double-checked. Primary results showed that modern AI methods, at that time, gave an accurate identification for 67% of observations, which well illustrates the complexity of the dataset. In 2018, most images of plants and animals from any part of the world were likely to receive from the system an identification of species inhabiting North America. Over the course of 2019 and 2020, AI has almost stopped suggesting incorrect identifications for plants within European Russia (Seregin et al. 2020). It still works somewhat worse with photos from Asian Russia and the Caucasus. Millions of new photos reviewed by the expert community and constantly added to the library of standard images allow AI to improve its performance. Its capabilities are, however, still inferior to expert assessments with regard to certain groups of organisms or certain geographic territories. Nevertheless, the system's general awareness of the world flora is many times larger than that of an individual botanist. In many ways, this particular feature of iNaturalist attracts both amateurs and professionals. The success of iNaturalist has made possible the further use of AI for species recognition by photograph for millions of images in the GBIF database (Robertson et al. 2019).

Portal on the flora of Russia
To collect data on the plant distribution in the City of Moscow, the Moscow University team initially organised the "Flora of Moscow" project on iNaturalist on 29 Dec 2018. Immediate positive feedback from users and a surge of interest prompted us to create 85 more regional projects with a uniform ideology in early January 2019 and organise them as part of the "Flora of Russia" umbrella project. Each regional portal automatically includes observations of vascular plants uploaded to iNaturalist which have achieved "research-grade" and were made within the administrative boundaries of a specific federal territory. The home page of each regional portal displays its statistics and basic information.
The "Flora of Russia" homepage (Fig. 4) includes a "scoreboard" with a ranking of regional projects (ordered by the number of observations, species and observers), basic statistics, a list of the latest observations, news from the project journal and a general map of all data. There are links leading to the project description, the project journal, the rankings of the top observers (ordered by the number of observations and species), top identifiers, most often recorded species and detailed statistical reports. Thus, both regional projects and the all-Russian portal are organised in the form of ranking tables, stimulating both individual and team activity of observers in accordance with the gamification paradigm (Bowser et al. 2013).
The experts (most of whom are the authors of this paper) review the unverified and unnamed observations to suggest the correct name, which may either confirm or disprove the opinion of the observer. Typically, most clear photographs from European Russia and the Russian Far East are identified within a couple of days after uploading.

Funding:
The project functions on a voluntary basis. Although it was created at the Lomonosov Moscow State University, it does not have formal institutional funding. Members of the project find their own budget for field trips and online activity. Some grants of the co-authors are acknowledged in this paper.
Figure 4. The homepage of the "Flora of Russia" project on iNaturalist (Russian-language interface, statistics as of 15 Sep 2020).

Sampling methods
Sampling description: The standard procedure of sampling is described on iNaturalist in the form of 17 paragraphs in the "Observations" section of the help page (last revised 8 Sep 2020 by Sam Kieschnick).
Quality control: Data quality control is necessary for maintaining a high quality of records within a dataset. In the "Flora of Russia" project description, there is a well-structured, detailed and constantly improving section with recommendations for users in Russian. Apart from the general information (including short videos about iNaturalist and a description of available research tools of the portal), there are two particularly important sections, i.e. "Recommendations for new users" and "Recommendations for event curators". Both sections provide detailed instructions for the user on what, how and where to create a good-quality observation on iNaturalist. However, many users are not familiar with these guidelines. This imposes a certain responsibility on the identifiers and the project curators, who act as data stewards. The most important and/or frequently occurring issues are listed below.
For each project on iNaturalist, at least one or two curators should be assigned to review the uploaded observations and make comments, if necessary. The most frequent mistakes are:
1. low-quality or wrong-angle photos,
2. observations of cultivated plants without a relevant indication,
3. either unintentional or intentional duplication of the same observation,
4. unintentional merging of numerous observations into a single one,
5. lack of date or location of an observation,
6. lack of any original identification (at least a coarse one),
7. upload of copyright media.
In some cases, an inaccurate location for an observation shows up automatically, caused by specific GPS settings on the smartphone or camera. We report these issues to observers for further manual correction or mark such observations as "location is not accurate". We highly recommend georeferencing using the "GPS only" mode instead of either "GPS plus mobile networks" or "mobile networks only". The latter two options may shift the observation's georeference to the nearest mobile tower instead of the actual observer's location. Additionally, all records with positional accuracy exceeding 50,000 m were marked as having inaccurate location on 25 Sep 2020 and reported to users in the project journal post. A suspicious positional accuracy of 0, 1 or 2 metres, recorded in thousands of observations, is an artifact set automatically by the devices during the uploading of observations.
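A check of this kind could be sketched as below. This is only an illustration of the two thresholds mentioned above, not the procedure actually used by the project curators; the function name, the flag labels and the assumption that accuracy is given as a number of metres (or None when absent) are hypothetical.

```python
def accuracy_flag(positional_accuracy_m, max_accuracy_m=50_000):
    """Flag a record by its reported positional accuracy in metres."""
    if positional_accuracy_m is None:
        # No accuracy value was recorded for the observation.
        return "missing"
    if positional_accuracy_m > max_accuracy_m:
        # Exceeds the 50,000 m threshold: location treated as inaccurate.
        return "inaccurate"
    if positional_accuracy_m in (0, 1, 2):
        # Likely a device-generated artifact rather than a real estimate.
        return "suspicious"
    return "ok"

records = [35, 0, 120_000, None, 50_000]
print([accuracy_flag(value) for value in records])
```

Keeping the "suspicious" flag separate from "inaccurate" reflects the text: values of 0 to 2 m do not prove a wrong location, they only indicate that the accuracy value itself should not be trusted.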
Another difficult and common problem is the separation of cultivated plants from garden escapes (naturalised or casual). Cultivated plants may be well recognisable and could reach "research grade" rapidly. We ask experts and project curators to double-check "research-grade" observations to detect plants growing only in cultivation.
A well-designed and useful feature in iNaturalist is the possibility to call for attention of a specific user using the "@" prefix (for example, @krestov). This is very important for maintaining the appropriate quality as experts may respond and help in identification.
Undoubtedly, the data quality depends on the quality of the uploaded photographs and field experience of the users. We ask project curators to post links to regional checklists, field guides and illustrated atlases for interested naturalists in the project description.
Constant quality control is especially important during various events such as bioblitzes or mandatory student practices. As their numerous participants mostly lack experience in collecting biodiversity data through iNaturalist, the work of curators and teachers should be constant during the whole period of these events.

Geographic coverage
Description: Russia is a large country with an area of over 17 million km² and an unevenly distributed human population. For instance, in Chukotka, the population density is only 0.07 people per 1 km², whereas in the City of Moscow, it is 4,925 people per 1 km² (Table 7). The geographic coverage of the dataset is characterised by significant spatial disparities in the presented data for all indexes, including the number of observations, species and observers (Fig. 5).
Table 7. Human population and area of the regions of Russia (official data).
Number of observations. The key index of the "Flora of Russia" project is the number of uploaded observations (Fig. 1). The project reached 750,000 observations of "research-grade" quality on 7 Sep 2020, whereas ca. 135,000 unverified observations make up the project's backlog, which is not included in the dataset. The stable snapshot of the dataset produced on 8 Sep 2020 contains 750,143 records (see "Data resources" section).
The City of Moscow topped the project by the number of observations from 18 Aug 2019 to 15 Jun 2020, when Moscow Oblast, the region with the largest community of observers, took the lead (Table 8). Other regions of Central Russia (Bryansk, Tula, Nizhny Novgorod and Kursk Oblasts) hold the third to sixth places in the ranking.
Figure 5. A map of 750K observations from the "Flora of Russia" project showing an extreme disproportion in data coverage (source: iNaturalist.org).
Table 8. Observations of the "Flora of Russia" project distributed amongst regional projects.
The top 10 regional projects contribute 45.4% of observations of the entire project and this proportion is constantly decreasing due to the growth of the communities in other regions. For instance, the proportion of observations made in the top ten regions was 55.5% on 9 Jan 2020. However, the disproportion in the spatial coverage is obvious even within the leading regions (Fig. 6).
Observations per capita. If we normalise the number of observations per 1,000 inhabitants, it turns out that the two most active communities are in the City of Sevastopol and Kamchatka, followed by Bryansk Oblast, Kursk Oblast, Mordovia, Kostroma Oblast and, unexpectedly, Chukotka, a vast region with a very small population. In general, this index best reflects both the involvement of the local residents in the "Flora of Russia" project and the activity of this particular region's community.
Observations per recorded species. The number of observations per recorded species is the integrated index which best characterises both the data density and species representation. The gradual accumulation of observations leads to consequent revealing of all known species or, at least, of regularly observed plants. When recording a new species becomes a rare event and an active community still posts many new photos, the average number of observations per species begins to grow rapidly. According to this index, the leaders are Moscow Oblast (65), City of Moscow (61), Bryansk Oblast (31), Tula Oblast (29), Nizhny Novgorod Oblast (27) and Omsk Oblast (25). Regions with rich floras (for example, mountainous areas) outperform the relatively-poor plains because more observations need to be made there to record numerous rare species.
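The three regional indexes used in this section (share of the top regions, observations per 1,000 inhabitants and observations per recorded species) are simple ratios, sketched below. The function names are hypothetical and the input figures in the usage lines are invented purely for illustration; they are not values from Tables 7 or 8.

```python
def top_share(counts, n=10):
    """Percentage of all observations contributed by the n largest regions."""
    ranked = sorted(counts, reverse=True)
    return 100 * sum(ranked[:n]) / sum(ranked)

def per_thousand_inhabitants(observations, population):
    """Observations normalised per 1,000 inhabitants of a region."""
    return 1000 * observations / population

def per_recorded_species(observations, species):
    """Average number of observations per species recorded in a region."""
    return observations / species

# Illustrative (made-up) inputs, not data from the paper.
print(top_share([50, 30, 20], n=2))
print(per_thousand_inhabitants(5_000, 50_000))
print(per_recorded_species(130_000, 2_000))
```

Because the per-species index divides by the number of recorded species, it keeps growing in a region once new species become rare while observations continue to accumulate, which is the behaviour described in the paragraph above.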

Taxonomic coverage
Description: As of 7 Sep 2020, the "Flora of Russia" project included observations of 6,857 species of vascular plants (Fig. 2). Plants of the World Online (POWO) serves as a taxonomic backbone for tracheophytes on iNaturalist. There are some tools used for automatic, semi-automatic and manual addition of new taxa and modification of the taxonomic information. Reasonable deviations from POWO could be accepted on iNaturalist by the curators after community discussions. The taxonomic opinion of an observer, if necessary, may be recorded in the description section of an individual observation.
Unfortunately, Russia lacks both a modern checklist of vascular plants and a standard flora. Therefore, we could assume that the project covers ca. 55% of the Russian plant diversity out of 12,500 species estimated by Kamelin (2007). That is quite a satisfactory figure since the Russian flora includes many species which require collection and proper identification of herbarium specimens (Hieracium, Alchemilla, Crataegus, some Poaceae and Cyperaceae etc.). There are also many rare endemics in hardly accessible mountain areas and quite a few insufficiently-known species recorded from scattered localities.
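The coverage estimate follows directly from the two figures quoted in this section, as the short check below shows. The function name is hypothetical; the inputs are the 6,857 recorded species and the 12,500 species estimated by Kamelin (2007).

```python
def coverage_percent(recorded_species, known_species):
    """Share of the known flora already recorded, in percent."""
    return round(100 * recorded_species / known_species, 1)

# 6,857 species in the project against an estimated 12,500 for Russia.
print(coverage_percent(6_857, 12_500))
```

The same function applies to regional figures, provided that the regional species totals rest on comparable species concepts.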
The list of the most recorded species of the project includes species which are widespread, easily recognisable and identifiable during all seasons (Table 9). These are mostly perennial herbs tolerant to intensive human activity, but also some common trees. Since the observations are concentrated in European Russia (Fig. 5), top-observed species of the project perfectly match the most common plants of temperate Europe, based on the frequency of occurrences in the national grid mapping projects (Seregin 2011 ...).
Table 9. The top 20 species of the "Flora of Russia" project ordered by the number of observations.
The following users have created the greatest number of unique species and nothospecies records: R.A. Murtazaliev (195 ...
To assess the regional representation of our data, we have compiled a table on the regional diversity of the Russian flora with necessary references (Table 10). The numbers of known species across the regions are not always perfectly comparable, since the authors of regional floras, guides and checklists used various species concepts which were either "splitters" or "lumpers". The overestimate for Volgograd Oblast (Sagalaev 2008) is especially notable.
References recovered from Table 10: Krasnoborov (1988), Krasnoborov (1997), Krasnoborov and Malyshev (1988), Malyshev (1997), Malyshev and Peshkova (1987), Malyshev and Peshkova (1990), Malyshev and Peshkova (1993a), Malyshev and Peshkova (1993b), Malyshev et al. (2003), Peshkova (1996), Peshkova and Malyshev (1990), Polozhy and Malyshev (1994a), Polozhy and Malyshev (1994b), Polozhy and Peshkova (1996).
Additionally, the Caucasus is listed as the only hotspot of biodiversity, and especially of tracheophyte diversity, of global importance in Russia (Myers et al. 2000, Barthlott et al. 2007).
We present data on the taxonomic diversity of vascular plants within the regions (Table 11) using two indexes, i.e. (1) the number of the recorded species and (2) the number of the lowest-rank taxa of the taxonomic tree with "research grade". The second index includes varieties, subspecies, species and those genera which cannot be accurately identified to species rank by uploaded photos (for instance, Alchemilla, Pilosella, Hieracium, Euphrasia and some genera requiring specific features not always captured by observers, like Melilotus and Epilobium without flowers etc.). The number of the recorded species is more suitable for the further taxonomic analysis.
The community has observed the highest number of plant species in Dagestan (1,927 species), which is the richest region of Russia in terms of the number of known species (Table 10). However, the rich flora of Dagestan is still represented here by only 57.0% of known species (Murtazaliev 2016 ...
Table 11. Taxonomic diversity of the "Flora of Russia" project across regional projects (references for species known in the region are given in Table 10).
If we consider the number of species known from each region, Kursk Oblast is the leader in terms of the proportion of observed species (81.8% out of 1,409 known species). Nizhny Novgorod Oblast, Bryansk Oblast, Kostroma Oblast and the City of Sevastopol also have over 70% of known species already represented on iNaturalist, although Nizhny Novgorod Oblast lacks a modern flora checklist, since the number of taxa for the region published by Bakka and Kiseleva (2008) is out of date.
In regional lists, 53 species are counted as the leaders of the scoreboards (Table 12). This list includes some notable invasive species like Acer negundo (a leader in five regions), Heracleum sosnowskyi (two regions), Ambrosia artemisiifolia, Erigeron annuus, Hordeum jubatum, Impatiens glandulifera, and Lupinus polyphyllus (one region each). Usually, this high performance of invasive alien species is a result of intentional recording in line with a regional assessment of aliens performed by the project members.

Table 12. The most recorded species in the regional projects (as of 9-10 Sep 2020)

Temporal coverage

Uploading date
The project started on 9 Jan 2019 with 11,000 "research-grade" observations of the Russian flora. As of 8 Sep 2020, observations uploaded to iNaturalist in 2018 and earlier account for only 1.4% of all the project data (Table 13). The number of observations uploaded in the eight months of 2020 is more than three times the number uploaded in 2019. The backlog of unidentified observations from 2019 is much smaller than the proportion of unprocessed records made in 2020.

Observation date
Many participants of the "Flora of Russia" project hold large photo archives and continue to post them on iNaturalist retrospectively. Therefore, at least 14.9% of the observations were made before 31 Dec 2018 (Table 14). Since the project requires a photo of the organism, the most important limiting factor of the temporal coverage is the spread of digital cameras. Judging by the data, they became common in Russia in 2002-2003. Amongst the earlier observations, there are both scanned photographs and transparencies, as well as later photographs of preserved specimens.
We have analysed the data on the basis of dates of observation for 2019 (21.5% of all data on plants in Russia) and the eight months of 2020 from January to August (64.6%). Two graphs given below have the same scale bar.
In 2019, the most productive days were the first two days of the Team Cup final, when its participants made 3,027 (11 Aug 2019) and 2,602 (10 Aug 2019) observations of vascular plants (Fig. 7). This was mainly caused by the fact that we organised the final as a bioblitz, while in the early stages, it was possible to upload archived photos. However, the 2019 Cup overall did not attract much interest amongst the participants. For example, the third richest day by the number of observations was 17 Jun 2019, on which 2,514 observations of vascular plants were made, including 555 observations from the field trip of Lomonosov Moscow State University students to Voronezh Oblast.
In 2020, on the contrary, all six stages of the Cup are clearly visible as prominent peaks in observation numbers. Namely, on 30 May 2020, 10,780 observations of vascular plants were made during the 1/8 of the Cup (16 teams), and during the first and second days of the semi-finals, 10,724 and 10,734 observations were made by four regional teams (Fig. 8).
Figure 7. Verifiable observations from Russia on iNaturalist made in 2019: tracheophytes (green) and all other groups (brown). A major event which contributed to data collection was the Russian Team Cup on photodocumentation of wild plants 2019 (from 1/32 to final).
From the 1/8 of the Cup onwards, the rounds were held in the format of a three-day bioblitz from Saturday to Monday.
International competitions like the City Nature Challenge (CNC) and the International Biodiversity Championship (IBC) did not generate peak user interest across Russia in 2020. However, both events also made a significant contribution to our data, since they lasted four days each. During the four CNC days (24-27 Apr 2020), 20,965 observations of vascular plants were made and 20,429 observations were recorded during the four IBC days (3-6 Aug 2020). We actively promoted both events amongst the participants of the "Flora of Russia" project.
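The daily peaks and multi-day event totals discussed above can be derived from the observation dates alone. The sketch below is one possible way to do this and is not the authors' procedure. The function names are hypothetical and the sample dates are toy data, not project records.

```python
from collections import Counter
from datetime import date, timedelta


def top_days(observed_dates, n=3):
    """Return the n days with the most observations as (date, count) pairs."""
    return Counter(observed_dates).most_common(n)


def window_total(observed_dates, start, days):
    """Total number of observations made in a window of consecutive days."""
    counts = Counter(observed_dates)
    return sum(counts[start + timedelta(days=offset)] for offset in range(days))


# Toy data standing in for the observation dates of individual records.
sample = (
    [date(2020, 5, 30)] * 3
    + [date(2020, 5, 31)] * 2
    + [date(2020, 6, 1)]
)
print(top_days(sample, n=2))
print(window_total(sample, date(2020, 5, 30), days=3))
```

A fixed-length window is a natural fit for events such as a three-day bioblitz or a four-day challenge, where a single-day peak understates the total contribution.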
The COVID-19 restrictions of the spring of 2020 caused, for example, a low level of participation in CNC, which was made up for in the summer by off-campus student practices and events for high school students, all of which used iNaturalist this year.
Summing up all Russian projects on student practices over the three summer months of 2020 (the usual time for them in Russia) shows that 54,186 "research grade" observations by more than 750 observers meet the requirements of the "Flora of Russia" project. This makes a modest 17.4% contribution to the summer observations of the project. In 2020, practices in the form of independent work of students supervised remotely by teachers were held in fourteen
Another notable source of the summer data flow was the Herbarium 2.0 project, organised by Valentina Borodulina. Although initially designed for high school students, it attracted the attention of schoolteachers and teachers of out-of-school education. Of the 44,087 observations of this project (1 Jun - 31 Aug 2020), 36,307 observations were made in Russia and reached "research-grade". This accounts for 11.0% of our summer data, and the most active observers involved in the project rapidly became notable participants of the "Flora of Russia" project.

Usage licence
Usage licence: Other

Number of data sets
We amended the dataset on 25 Sep 2020 after a data audit performed by Dr Robert Mesibov (https://www.datafix.com.au) in line with preparation of the data paper. All records with positional accuracy exceeding 50,000 m were marked as having inaccurate location and reported to users. Altogether, we excluded 1,106 observations from the project's data and 587 observations from the backlog from the backup on this ground.
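The accuracy check described above amounts to a threshold filter on one column. The following is a minimal sketch of such a filter, not the audit script actually used. The column name `positional_accuracy`, the function name and the sample rows are assumptions made for illustration, as is the choice to keep records with no accuracy value.

```python
import csv
import io

MAX_ACCURACY_M = 50_000  # threshold in metres, as stated in the text


def split_by_accuracy(rows, column="positional_accuracy", limit=MAX_ACCURACY_M):
    """Split rows into (kept, flagged); flagged rows exceed the accuracy limit.

    Rows with an empty accuracy value are kept (an assumption of this sketch).
    """
    kept, flagged = [], []
    for row in rows:
        raw = (row.get(column) or "").strip()
        if raw and float(raw) > limit:
            flagged.append(row)
        else:
            kept.append(row)
    return kept, flagged


# Toy CSV standing in for an export; the ids and values are made up.
SAMPLE_CSV = """id,positional_accuracy
1,15
2,50000
3,120000
4,
"""
kept, flagged = split_by_accuracy(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print([row["id"] for row in kept], [row["id"] for row in flagged])
```

The comparison is strict, so a record with an accuracy of exactly 50,000 m is kept, matching the wording "exceeding 50,000 m".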
The "research-grade" observations with free licences (CC0, CC-BY and CC-BY-NC) are fully available in GBIF within "iNaturalist Research-grade Observations" occurrence dataset (https://doi.org/10.15468/ab3s5x). We added the last column "gbif_id" to all csv files of our dataset with URLs of GBIF records using GBIF Occurrence Download https://doi.org/10.15468/dl.msfxkn performed on 28 Sep 2020.
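Adding a "gbif_id" column of this kind is essentially a key-based join between the iNaturalist records and the GBIF download. The sketch below shows one way this could be done and does not reproduce the authors' procedure. It assumes that the GBIF download has `gbifID` and `occurrenceID` columns, that `occurrenceID` holds the iNaturalist observation URL, and that the iNaturalist file has a `url` column. The function name and all identifiers in the sample rows are made up.

```python
GBIF_RECORD_URL = "https://www.gbif.org/occurrence/{}"


def add_gbif_urls(inat_rows, gbif_rows, inat_key="url"):
    """Return copies of inat_rows with a 'gbif_id' column appended.

    Assumes gbif_rows carry 'gbifID' and 'occurrenceID' columns, where
    'occurrenceID' matches the iNaturalist observation URL. Records with
    no GBIF match receive an empty string.
    """
    lookup = {
        row["occurrenceID"]: GBIF_RECORD_URL.format(row["gbifID"])
        for row in gbif_rows
    }
    return [{**row, "gbif_id": lookup.get(row[inat_key], "")} for row in inat_rows]


# Made-up identifiers for illustration only.
inat_rows = [
    {"id": "101", "url": "https://www.inaturalist.org/observations/101"},
    {"id": "102", "url": "https://www.inaturalist.org/observations/102"},
]
gbif_rows = [
    {"gbifID": "9001", "occurrenceID": "https://www.inaturalist.org/observations/101"},
]
merged = add_gbif_urls(inat_rows, gbif_rows)
print(merged)
```

Leaving the column empty for unmatched records keeps every row of the original file, which matters because only freely licensed "research-grade" observations are present in GBIF.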
The following abbreviations are used in column descriptions:
• A: automatically generated data (usually from exif files of photos);
• M: manually inserted data;
• AM: both options are possible (automatically generated data which could be manually edited).

Community Coverage
Number of observers. The project reached a milestone of 10,000 observers with at least a single "research grade" observation on 7 Sep 2020.
Table 15. Community of the "Flora of Russia" project across regional projects
Number of members (subscribers) of regional projects. The largest regional communities of formal members are in the City of Moscow (122 participants) and Moscow Oblast (88 participants). Membership in a regional project allows a member to follow news and to affiliate their observations with a specific region on the observation page. More than 40 participants joined the projects of Tula Oblast, Crimea, Novosibirsk Oblast, Bryansk Oblast, Krasnodar Krai, Sevastopol, Altai Krai and Sverdlovsk Oblast. In Kamchatka, 30.8% of observers are subscribers to the regional project, while in St. Petersburg, on the contrary, only 3.0% have subscribed to the regional project. The number of subscribers is a result of active curation of the regional project journal.
Number of observers per 1M capita. The number of observers per 1M of the regional population shows how actively the local residents are involved in the work of the "Flora of Russia" project. However, a top list, with a few exceptions, includes regions with a small population and sites specifically noteworthy for naturalists. Due to tourist activity, a relatively high number of observers has been noted in Altai Republic, Kamchatka, Leningrad Oblast, Karelia, Chukotka, Nenets Autonomous Okrug and Kaluga Oblast. Communities mostly formed by local residents include Sevastopol, Moscow Oblast and Tver Oblast.
Number of observers per 1,000 km². This index helps assess areas with a high density of observers. The federal cities of Moscow, St. Petersburg and Sevastopol are undoubtedly in the lead here (200-700 observers per 1,000 km²). This number is reduced to 43 observers in Moscow Oblast, followed by the Crimea (16), Tula Oblast (13) and Kaliningrad Oblast (11).
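The two density indices above are plain normalisations of the observer count by population and by area. The following is a minimal sketch of both calculations. The function names are hypothetical, and the input figures are invented round numbers, not values from the project tables.

```python
def observers_per_million(observers: int, population: int) -> float:
    """Number of observers per 1M inhabitants of a region."""
    if population <= 0:
        raise ValueError("population must be positive")
    return observers * 1_000_000 / population


def observers_per_1000_km2(observers: int, area_km2: float) -> float:
    """Number of observers per 1,000 square kilometres of a region."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return observers * 1_000 / area_km2


# Hypothetical region: 430 observers, 500,000 inhabitants, 10,000 km2.
print(observers_per_million(430, 500_000))
print(observers_per_1000_km2(430, 10_000))
```

The per-capita index favours sparsely populated regions visited by tourists, while the per-area index favours compact cities, so the two are best read side by side.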

Productivity (number of observations per observer).
This index clearly demonstrates the regions with a fairly modest community, where data are received mainly from a few of the most active participants ("mega-observers").
Table 16. Top observers of the regional projects (a: author, c: contributor)

Data Usage
The project's data were cited in a number of research papers dealing with documentation and verification of new regional records (Prokopenko et al. 2019; Verkhozina et al. 2019; Leostrin and Efimova 2020; Seregin 2020b; Verkhozina et al. 2020). Other examples of dataset usage include papers on distribution of noteworthy alien plants (Zarubo and Mayorov 2020), floristic inventories of protected areas (Seregin 2020a) and phenology of plants during the extremely warm winter of 2019/2020.
Several papers on orchids of Russia employed our data to varying extents, since this showy group attracts special attention from observers (Efimov and Legchenko 2020; Efimov 2020; Popovich et al. 2020).