Historical collections of vascular plants in the Korean Peninsula by three major collectors in the early 20th century: U. J. Faurie, E. J. Taquet and E. H. Wilson

Abstract
Background
The digitisation of historical collections aims to increase global access to scientific artifacts, especially those from currently inaccessible areas. Historical collections from North Korea deposited at foreign herbaria play a fundamental role in understanding patterns of biodiversity change. However, biodiversity distribution patterns in this region remain poorly understood, given severe gaps in the available geographic species distribution records. Access to most primary biodiversity data remains difficult for the broader scientific and environmental community. The digitisation of around 60,000 specimens collected in the Korean Peninsula by foreign collectors before World War II is ongoing. In this paper, we aim to fill this gap by developing the first comprehensive, open-access database of biodiversity records for the Korean Peninsula. This paper provides a quantitative and general description of the specimens collected by Urbain Jean Faurie, Emile Joseph Taquet and Ernest Henry Wilson and now kept in several herbaria.
New information
An open-access database of biodiversity records provides a simple guide to georeferencing historical collections. The first set describes E. H. Wilson's collection of woody plants from the Korean Peninsula preserved at the Harvard University Herbaria (A). This set includes 1,087 records collected from 1917 to 1918. The other collections contain specimens collected by E. J. Taquet (4,727 specimens from Quelpaert (Jeju), 1907–1914) and U. J. Faurie (3,659 specimens from North Korea and Quelpaert, 1901, 1906 and 1907). For each specimen, we recorded the species name, locality, collection date, collector, ecology and revision label. The combined dataset contains more than 9,400 specimens, of which 22% are vascular plants from North Korea and 66% are from Quelpaert (Jeju) Island. We also included images of some of the specimens in this dataset.


Introduction
Institutions outside the Korean Peninsula hold much of the region's historical biodiversity information. With nearly 100,000 specimens, including data on specimens stored at foreign herbaria, these institutions provide comprehensive chronological, historical, taxonomic and geographic coverage of Korean plants, including those from inaccessible areas such as North Korea. Despite the abundance of biodiversity information in these collections, there remains a pressing need to make such data accessible and sufficiently integrated to support query-based enquiries and to meet regional conservation priorities. Creating this open-access database mobilises existing biodiversity information and knowledge about the Korean Peninsula. A database allows us to search historical records of foreign herbaria, generate georeferenced specimen data and produce images of North and South Korean vascular plants. With these goals, the project addressed the imbalance in biodiversity information between South and North Korea and reduced the knowledge gap on the diversity and distribution of vascular plants in the Korean Peninsula.
Historical biodiversity data provide the context for past observations. Here, we present a vascular plant dataset of the Korean Peninsula covering the early 1900s. The dataset consists of three sets: (1) E. H. Wilson's 1,087 specimens, mainly from North Korea, collected in 1917 and 1918; (2) E. J. Taquet's 4,727 specimens from Quelpaert, collected from 1907 to 1914; and (3) U. J. Faurie's 3,659 specimens from North Korea and Quelpaert, collected from 1901 to 1907. These datasets were the first attempts at archive digitisation in both North and South Korea, covering an early period and incorporating data from different sources. The objective was to identify, describe, quality-control and integrate historical data for the Korean Peninsula into standardised datasets and to make them freely available and reliable for end users in terms of fitness for use (Chapman 2005). […]
(7) SNUA for the T.B. Lee Herbarium at Seoul National University. The E (3,017) and TI (1,002) herbaria hold the largest collections of Taquet, while KYO (2,714), E (1,475) and P (869) hold the major collections of Faurie. This is the first attempt by Korean researchers to investigate specimens deposited at various foreign herbaria using a single, uniform protocol. We visited TI, A, E and KYO, photographed the specimens and recorded them in the database. We also searched for additional specimens at P, LE and K, using either published accounts of these collections (Grabovskaya-Borodina et al. 2018) or herbarium websites (Royal Botanic Gardens, Kew 2020, Chagnoux 2020).
Quality control: Neither Faurie nor Taquet numbered their collections chronologically according to their collecting activities. They appear to have sorted the collections by genus and assigned numbers to taxonomic bundles of dried plants. Some collection data, such as locality, date or collection number, were missing. The first set of specimens is at E or P, except for some families.
Duplicate specimens were widely distributed and can be found at BM, TI, KYO, A, LE and B. Faurie's collection of several thousand herbarium specimens is deposited in Paris, with duplicates at the University of Kyoto, the British Museum, Kew and elsewhere (Kitagawa 1979, Koidzumi 1936).
Georeferencing: Many historically used toponyms in Korea have Chinese-character origins and can therefore be written with the same characters in different languages (Choo 2016, Tanabe and Watanabe 2014). As a result of 36 years of Japanese colonial occupation, the Korean place names used for plant collections have become a toponymic enigma. Japanese exonyms are Japanese-language names for places in other Asian countries that differ from the names used in those countries' dominant languages. Japanese botanists and field guides often transliterated Korean toponyms into Japanese pronunciation. This has produced many unresolved botanical exonyms that are found only on herbarium labels. Some of these Japanese place names remain obscure, either because they differ greatly from the endonyms or because their etymology is unclear. We have prepared a multilingual gazetteer to resolve the inconsistencies, uncertainties and confusion surrounding the botanical exonyms that foreign explorers and botanical collectors have used in the Korean Peninsula over the past 120 years (Table 1, Chang et al. 2015).
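The exonym-resolution step can be illustrated with a minimal Python sketch, assuming the gazetteer is a lookup table from normalised label spellings to endonyms. The entries below are a small hypothetical excerpt (Quelpaert, the Japanese readings Saishū-tō and Kongō-san, and the variant spelling Konggo-san), and the names `GAZETTEER`, `normalize` and `resolve_exonym` are our own; the project's actual multilingual gazetteer (Table 1, Chang et al. 2015) is not reproduced here.

```python
import unicodedata
from typing import Optional

# Hypothetical excerpt of a multilingual gazetteer: normalised label spelling -> endonym.
GAZETTEER = {
    "quelpaert": "Jeju-do",      # Western name used for Jeju in this dataset
    "saishuto": "Jeju-do",       # Japanese reading, Saishū-tō
    "kongosan": "Kumgang-san",   # Japanese reading, Kongō-san
    "konggosan": "Kumgang-san",  # variant romanisation
}


def normalize(name: str) -> str:
    """Lower-case, strip diacritics (e.g. the macron in 'Kongō') and keep only alphanumerics."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in stripped.lower() if ch.isalnum())


def resolve_exonym(label_name: str) -> Optional[str]:
    """Return the endonym for a label toponym, or None if the gazetteer has no entry."""
    return GAZETTEER.get(normalize(label_name))


if __name__ == "__main__":
    for label in ("Quelpaert", "Kongō-san", "Saishū-tō", "Seoul"):
        print(label, "->", resolve_exonym(label))
```

Normalising before lookup lets one entry cover spellings that differ only in case, hyphenation or diacritics; genuinely different exonyms still need their own rows.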
After place names have been identified, the next step is to assign precise coordinates to each collection. We always aimed for accurate georeferencing, but this was sometimes impossible because the place names carried insufficient information. In such cases, we used the coordinates of a higher geographic unit, such as a county or city. To minimise errors, enhance data consistency and maintain integrity throughout the georeferencing process, we adapted a procedure used by the Chinese type collection project (Fig. 1, Lohonya et al. 2020). We set up a database of herbarium records in the BRAHMS system. We compared the results of geographic queries with the label information for each specimen to resolve its geographic information. We detected and corrected two types of errors: typographical errors and misidentified records. After updating the database with recent publications and cleaning the data, we obtained the cleaned collection data that make up this dataset.
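The fallback from a precise locality to a county or province, together with the detection of typographical errors, can be sketched as follows. This is a standalone Python illustration, not the BRAHMS workflow the project used: the gazetteer names and coordinates are placeholders, and fuzzy matching with the standard-library `difflib` (cutoff 0.8) is our assumed stand-in for catching misspelt place names.

```python
import difflib
from typing import Optional, Tuple

# Hypothetical gazetteer keyed by administrative level. Names and coordinates are
# placeholders for illustration, not georeferences from the published dataset.
GAZETTEER = {
    "locality": {"Mt. Example": (38.60, 128.10)},
    "county": {"Example-gun": (38.50, 128.00)},
    "stateProvince": {"Example-do": (38.00, 127.50)},
}

# Search order: most specific level first, falling back to larger units.
LEVELS = ("locality", "county", "stateProvince")


def match_name(name: str, candidates: list) -> Optional[str]:
    """Exact match first; otherwise the closest spelling above a similarity cutoff."""
    if name in candidates:
        return name
    close = difflib.get_close_matches(name, candidates, n=1, cutoff=0.8)
    return close[0] if close else None


def georeference(record: dict) -> Optional[Tuple[float, float, str]]:
    """Return (latitude, longitude, level used) for the most specific resolvable field."""
    for level in LEVELS:
        name = record.get(level)
        if not name:
            continue
        entries = GAZETTEER[level]
        matched = match_name(name, list(entries))
        if matched is not None:
            lat, lon = entries[matched]
            return lat, lon, level
    return None  # nothing resolvable; the record needs manual review


if __name__ == "__main__":
    print(georeference({"locality": "Mt. Exampel"}))  # misspelt locality, fuzzy-matched
    print(georeference({"locality": "Unknown place", "county": "Example-gun"}))
```

Returning the level used alongside the coordinates keeps track of which records were georeferenced only to a county or province, which matters when users judge fitness for use.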
Finally, we generated a Darwin Core Archive incorporating the metadata described here and published the data on GBIF using the Integrated Publishing Toolkit (IPT).
Table 1. While most South Korean place names are derived from Chinese characters, Japanese botanists transliterated these place names into Japanese pronunciation.
Flow diagram showing how to approach labels in different languages, including endonyms as well as exonyms.
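The Darwin Core Archive itself was produced with the IPT. As a rough illustration of what its core data file holds, the Python sketch below writes specimen records as tab-delimited text whose header row uses Darwin Core term names. The term subset, the function name `write_occurrence_core` and the sample record are hypothetical; a complete archive also bundles a `meta.xml` descriptor and an EML metadata file, which the IPT generates.

```python
import csv
import io

# Hypothetical subset of Darwin Core terms used as column headers.
DWC_TERMS = ["occurrenceID", "basisOfRecord", "institutionCode", "recordNumber",
             "recordedBy", "eventDate", "country", "locality"]


def write_occurrence_core(records, out):
    """Write records as tab-delimited text with a Darwin Core header row.

    Terms absent from a record are written as empty fields; keys that are not
    in DWC_TERMS are silently ignored.
    """
    writer = csv.DictWriter(out, fieldnames=DWC_TERMS, delimiter="\t",
                            restval="", extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)


if __name__ == "__main__":
    # Hypothetical record; not taken from the published dataset.
    buf = io.StringIO()
    write_occurrence_core([{"occurrenceID": "hypothetical-0001",
                            "basisOfRecord": "PreservedSpecimen",
                            "institutionCode": "A",
                            "recordedBy": "E. H. Wilson"}], buf)
    print(buf.getvalue())
```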

Geographic coverage
Description: The Korean Peninsula is located in northeast Asia, between China and Japan. To the northwest, the Amnok River separates Korea from Liaoning Province in northern China, and to the northeast, the Duman River separates Korea from Jilin Province in northern China and from the Russian Far East. Excluding the islands, the Peninsula covers about 220,847 km². The eastern and northern parts of the Peninsula are characterised by high mountains. The highest point of the Korean Peninsula is Mount Paektu (2,744 m a.s.l.; 41°59′N; 128°04′E), which stands on the border with China (Fig. 2). In the south, the territory extends from Marado Island (33°06′N; 126°16′E), south of Jeju Island, eastwards to the Dokdo islets (37°14′N; 131°52′E).
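The coordinates above are given in degrees and minutes, whereas the Darwin Core terms decimalLatitude and decimalLongitude expect decimal degrees. A small Python sketch of the conversion (the function name `dm_to_decimal` is our own) applied to the three points quoted in the text:

```python
def dm_to_decimal(degrees: int, minutes: float, hemisphere: str) -> float:
    """Convert degrees and minutes to signed decimal degrees (S and W are negative)."""
    if not 0 <= minutes < 60:
        raise ValueError("minutes must be in [0, 60)")
    value = degrees + minutes / 60.0
    return -value if hemisphere.upper() in ("S", "W") else value


# Coordinates quoted in the Geographic coverage description.
POINTS = {
    "Mount Paektu": ((41, 59, "N"), (128, 4, "E")),
    "Marado": ((33, 6, "N"), (126, 16, "E")),
    "Dokdo": ((37, 14, "N"), (131, 52, "E")),
}

if __name__ == "__main__":
    for name, (lat, lon) in POINTS.items():
        print(f"{name}: {dm_to_decimal(*lat):.4f}, {dm_to_decimal(*lon):.4f}")
```

This prints 41.9833, 128.0667 for Mount Paektu. Because the source values are rounded to whole minutes, the extra decimal places do not add precision: one minute of latitude is roughly 1.85 km.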
Occurrence points of all vascular plant specimens collected by the three collectors in the Korean Peninsula. Enlarged maps are shown for Mt. Konggo-san and Quelpaert, where Wilson, Faurie and Taquet made extensive collections.

| Column label | Column description |
|---|---|
| type | The nature or genre of the resource. |
| institutionCode | The name (or acronym) in use by the institution having custody of the object(s) or information referred to in the record. |
| basisOfRecord | The specific nature of the data record. |
| occurrenceID | An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique. |
| recordNumber | An identifier given to the Occurrence at the time it was recorded. Often serves as a link between field notes and an Occurrence record, such as a specimen collector's number. |
| recordedBy | A list (concatenated and separated) of names of people, groups or organisations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first. |
| eventDate | The date-time or interval during which an Event occurred. For occurrences, this is the date-time when the event was recorded. Not suitable for a time in a geological context. |
| year | The four-digit year in which the Event occurred, according to the Common Era Calendar. |
| month | The integer month in which the Event occurred. |
| day | The integer day of the month on which the Event occurred. |
| country | The name of the country or major administrative unit in which the Location occurs. |
| countryCode | The standard code for the country in which the Location occurs. |
| stateProvince | The name of the next smaller administrative region than country (state, province, canton, department, region etc.) in which the Location occurs. |
| county | The full, unabbreviated name of the next smaller administrative region than stateProvince (county, shire, department etc.) in which the Location occurs. |
| locality | The specific description of the place. Less specific geographic information can be provided in other geographic terms (higherGeography, continent, country, stateProvince, county, municipality, waterBody, island, islandGroup). This term may contain information modified from the original to correct perceived errors or standardise the description. |
| kingdom | The full scientific name of the kingdom in which the taxon is classified. |
| phylum | The full scientific name of the phylum or division in which the taxon is classified. |
| class | The full scientific name of the class in which the taxon is classified. |
| order | The full scientific name of the order in which the taxon is classified. |
| family | The full scientific name of the family in which the taxon is classified. |
| genus | The full scientific name of the genus in which the taxon is classified. |
| specificEpithet | The name of the first or species epithet of the scientificName. |
| infraspecificEpithet | The name of the lowest or terminal infraspecific epithet of the scientificName, excluding any rank designation. |
| taxonRank | The taxonomic rank of the most specific name in the scientificName. |
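Two of the terms above involve derived values: occurrenceID may be constructed from identifiers already in the record, and year, month and day repeat information contained in eventDate. The Python sketch below illustrates both under stated assumptions. The `urn:catalog:` prefix loosely resembles an example pattern given in the Darwin Core documentation, but combining institutionCode, collector and recordNumber is a hypothetical scheme, not the one used for this dataset; the function names are also our own.

```python
from datetime import date


def build_occurrence_id(institution_code: str, collector: str, record_number: str) -> str:
    """Compose a fallback occurrenceID from identifiers in the record (hypothetical scheme)."""
    collector_key = "".join(ch for ch in collector if ch.isalnum())
    return f"urn:catalog:{institution_code}:{collector_key}:{record_number}"


def split_event_date(event_date: str) -> dict:
    """Derive year, month and day from an ISO 8601 eventDate (YYYY, YYYY-MM or YYYY-MM-DD).

    Intervals such as '1907/1914' are left unsplit in this sketch and return {}.
    """
    if "/" in event_date:
        return {}
    parts = [int(p) for p in event_date.split("-")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported eventDate: {event_date!r}")
    if len(parts) == 3:
        date(*parts)  # raises ValueError for impossible dates such as 1917-02-30
    elif len(parts) == 2 and not 1 <= parts[1] <= 12:
        raise ValueError(f"invalid month in eventDate: {event_date!r}")
    return dict(zip(("year", "month", "day"), parts))


if __name__ == "__main__":
    print(build_occurrence_id("A", "E. H. Wilson", "123"))  # hypothetical record number
    print(split_event_date("1917-05-12"))
```

Because Faurie and Taquet assigned numbers to taxonomic bundles rather than to individual collecting events, their recordNumbers may not be unique; a real identifier scheme would need additional components, such as the herbarium barcode, to stay globally unique.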