OLICH: A reference library of DNA barcodes for Nordic lichens

Abstract Background DNA barcodes are increasingly being used for species identification amongst the lichenised fungi. This paper presents a dataset aiming to provide an authoritative DNA barcode sequence library for a wide array of Nordic lichens. New information We present 1324 DNA barcode sequences (nrITS) for 507 species in 175 genera and 25 orders. Thirty-eight species are new to GenBank and, for 25 additional species, ITS sequences are here presented for the first time. The dataset covers 20–21% of the Nordic lichenised species. Barcode gap analyses are given and discussed for the three genera Cladonia, Ramalina and Umbilicaria. The new combination Bryobilimbia fissuriseda (Poelt) Timdal, Marthinsen & Rui is proposed for Mycobilimbia fissuriseda and Nordic material of the species, currently referred to as Pseudocyphellaria crocata and Psoroma tenue ssp. boreale, are shown to belong in Pseudocyphellaria citrina and Psoroma cinnamomeum, respectively.


Introduction
Lichens are fungi living in mutualistic symbiosis with green algae and/or cyanobacteria. They are highly polyphyletic and 20-30 independent lichenisation events may have occurred during the evolution of higher Fungi (Schoch et al. 2009, Lücking et al. 2017a). The lichenised fungi currently consists of more than 19,000 species, i.e. about 17% of all known fungal species and 27% of all known ascomycete species (Lücking et al. 2017a). They occur globally in all major terrestrial ecosystems, especially on trees, soil and rock. Although a lichen thallus consists of a symbiotic association between a fungal exhabitant, the mycobiont and one or more photosynthetic algal and/or cyanobacterial inhabitants, the photobionts, the taxonomy and nomenclature follow that of the mycobiont. A lichen thallus may house additional fungi, i.e. parasites, parasymbionts and endophytes (Hawksworth 1982, Richardson 1999, Spribille et al. 2016, making a lichen thallus a complex symbiotic assemblage. Identifying lichenised fungi has traditionally been based on morphological, anatomical and chemical characters, but DNA-based identification is now increasingly being used. In 2012, the internal transcribed spacer of nuclear ribosomal DNA (ITS) was recommended as the primary fungal DNA barcode marker by the International Fungal Barcoding Consortium (Xu 2016), following studies on its suitability by for example Schoch et al. (2012). For lichenised fungi, studies reporting successful use of the ITS marker for species delimitations include Kelly et al. (2011), Zheng et al. (2011), Parnmen et al. (2013, Leavitt et al. (2014), Magain and Serusiaux (2015) and Divakar et al. (2016). Pino-Bodas et al. (2013), however, found ITS often failing for species delimitations in Cladonia and investigated alternative markers.
Funding: This project is funded by the Norwegian Research Council (RCN 226134) through the Norwegian Barcode of Life project (NorBOL). In addition, the Norwegian Species Information Centre (Artsdatabanken) has funded four projects on lichens under the Norwegian Taxonomy Initiative from which material has been collected and/or sampled (project numbers 70184216, 70184227, 70184233, and 70184237).

Sampling methods
Sampling description: The material was partly freshly collected specimens from the field and partly previously curated specimens in university herbaria. The distribution of specimen age at seqencing is given in Fig. 2. The specimens were initially determined by experts based on morphology and secondary chemistry and the identifications were reexamined at the quality control after sequence acquisition. All collected material were, after sampling, deposited in university herbaria and the entire material is housed in BG (2.1%), H (0.1%), O (94.9%), SAV (0.2%), TRH (1.7%) and TROM (1.1%) (abbreviated according to Index Herbariorum; http://sweetgum.nybg.org/science/ih).  Distribution of specimen age (years after collecting) at sequencing in the OLICH dataset.
Thin-layer chromatography (TLC) was performed for identification of secondary compounds on selected specimens in accordance with the methods of Culberson (1972), as modified by Menlove (1974) and Culberson and Johnson (1982). Solvent system B' was used for routine examination, A and C for additional confirmation.
Tissue of thallus or preferably apothecia (about 3 mm ) was assembled in 96 well plates obtained from the Canadian Centre for DNA Barcoding (CCDB; http://ccdb.ca). One well was left empty to serve as negative control. The freshly collected material was either sampled in the field or at a field station. In the field, the specimen was first photographed, then tissue was removed with a flame-decontaminated forceps and placed in the well and finally the specimen was collected. Specimens requiring more controlled handling were photographed and sampled either with forceps or scalpel under a binocular lens at a field station. Wells containing fresh material were dried for several days with lids open in a drying chamber containing silica gel. Herbarium material was sampled at the housing institutions under binocular lenses.
Field data was stored on tablets and GPSs. Tissue sampling data and photography data were transferred directly to standard BOLD spreadsheets, whereas the collection data passed through herbarium databases for obtaining institution storing, voucher ID (= sample ID), uniform nomenclature and taxonomical hierarchy and contact information. Material sampled in herbaria used a similar pipeline, although simpler, as collection data was already available.
The plates were sent to CCDB for DNA extraction (following ) and sequencing of the ITS region (following . Primers used were ITS1-F or ITS5 (forward) and ITS4 (reverse) (Gardes andBruns 1993, White et al. 1990). Trace files and sequences were uploaded to BOLD (Ratnasingham and Hebert 2007). The sequences were edited at CCDB, but occasionally inspected and re-edited at OLICH. Table 1.
List of specimens included in the OLICH dataset (sequence length >200bp, 1324 records, 507 species). Thin Layer Chromotagrophy (TLC, Chemistry-column) was only done in species complexes where chemical data is relevant for the identification of the species or where there are chemical strains within the species.

Temporal coverage
Notes: The sampled material was collected in the period 1995-2017, with 90% being after 2010 and with a peak of 532 specimens in 2014 (Fig. 3).   Depending on future funding, the OLICH project will continue sequencing Nordic lichens for the BOLD database, with the ultimate (although unrealistic) goal of complete coverage. Hence, the dataset will be updated in the future with new sequences and information. The version of the dataset at the time of writing the manuscript is included as Suppl. materials 1, 2, 3, 4 in the form of two text files for record information as downloaded from BOLD, one text file with the collecting and identification data in Darwin Core Standard format (downloaded from BOLD and formatted by the authors) and of a fasta file containing all sequences as downloaded from BOLD.

Usage rights
It should be noted that, as the BOLD database is not compliant with the Darwin Core Standard format, the Darwin Core formatted file (dwc) that can be downloaded from BOLD is not strictly Darwin Core formatted. For a proper Darwin Core formatted file, see Suppl. material 3.
All records have images and geographical coordinates in BOLD.
Column labels below follow the labels downloaded in the tsv format. Columns with no content in our dataset are left out in the list below.

Additional information Summary statistics and genetic distances
The dataset contains on average 2.6 records per species, ranging from one (184 species) to 12 (one species) records per species. Average sequence length was 541 bp (including sequences that did not include the whole ITS region due to unreadable trace files). The shortest sequence in the dataset was an incomplete sequence, 229 bp long. The species with the shortest complete ITS region was Haematomma ocroloeucum with 481 bp. The longest ITS region was 856 bp from Leptogium cochleatum. However, the two sequences of the latter species did not align with the others at the start of the sequence and it is possible that introns in the SSU rDNA are included.
We calculated intra-and interspecific genetic distances for the three genera with most sampled species and performed barcode gap analyses on those. Distances and key statistics are found in Table 2.

Ramalina Umbilicaria Cladonia
Number of species in genus in this dataset 9 19 47 Number of records 26 52 122 Average number of records per species 2.9 2.7 2.6 Average intraspecific distances 0-1.3% 0-2.3% 0-3.4% Interspecific distances 0.4-8.2% 2.5-12.1% 0.2-15.6% In Ramalina, there was an overlap between intra-and interspecific distances (Table 2, Fig.  4a). The overlap is caused by the one species pair with higher maximum intraspecific distance than the distance to the nearest neighbour (Fig. 4b); i.e. R. farinacea had an intraspecific distance of 0.9% and a distance of 0.4% to R. subfarinacea. If this species pair is excluded from the distance-dataset, a barcode gap will occur at 2% (Fig. 4a). Table 2.  Genetic distances within the three lichen genera with most sampled species in the OLICH dataset.
a: Frequency distribution of intraspecific distances (blue) and interspecific distances (orange). A gap between intra-and interspecific distances illustrates the barcode gap in the genus. b: Scatterplots representing maximum intraspecific distances versus minimum distance to closest neighbour for all species with more than one specimen per species, in each of the three genera. c: Frequency distribution of intraspecific distances (blue) and interspecific distances (orange). A gap between intra-and interspecific distances illustrates the barcode gap in the genus. d: Scatterplots representing maximum intraspecific distances versus minimum distance to closest neighbour for all species with more than one specimen per species, in each of the three genera. e: Frequency distribution of intraspecific distances (blue) and interspecific distances (orange). A gap between intra-and interspecific distances illustrates the barcode gap in the genus. f: Scatterplots representing maximum intraspecific distances versus minimum distance to closest neighbour for all species with more than one specimen per species, in each of the three genera.
For Umbilicaria, there were no species pairs with higher maximum intraspecific distance than the distance to the nearest neighbour, but one species had the same maximum intraspecific distance as the distance to the nearest neighbour, namely U. proboscidea whose closest neighbour was U. virginis (Fig. 4d). There is thus no barcode gap in our Umbilicaria dataset, as the intra-and interspecific distances overlaps around 3% (Fig. 4c).
Whereas Ramalina and Umbilicaria, despite the above-mentioned issues, show a tendency for displaying a barcode gap, Cladonia does not show the same pattern. There is no gap, but rather overlapping, smooth curves of intra-and interspecific distances in the barplot ( Table 2, Fig. 4e). Most of the species pairs that disrupt the pattern are complexes where species are known to be sometimes difficult to distinguish with traditional characters and these are found below the line in Fig. 4f: C. coniocraea/gracilis, C. maxima/ecmocyna and C. cenotea/subfurcata/crispata/squamosa.

Observations
From the phylogenetically and BLASTn based quality control, some specific taxonomic observations were made that are listed below.
Alectoria sarmentosa (Ach.) Ach. ssp. vexillifera (Nyl.) D.Hawksw.: This subspecies is currently accepted in the Nordic flora Myllys 2011, Nordin et al. 2019). It differs morphologically from spp. sarmentosa in forming a more prostrate thallus with broader and more flattened main branches and, ecologically, in growing on the ground or on rocks mainly in coastal and alpine areas. Subspecies sarmentosa grows mainly on trunks and branches of conifers in boreal forests. Our sequence of spp. vexillifera (OLICH1478) is nested within our four sequences of spp. sarmentosa, however, which does not support the recognition of the subspecies (Fig. 5).
Brodoa oroarctica (Krog) Goward : The species differs from B. atrofusca (Schaer.) Goward in the shape of the thallus and lobes and in the lack of physodic acid in most of the thallus except for in some lobe tips (Krog 1974). The former species is circumpolar arcticalpine, with southernmost occurrences in Europe in the southern Scandinavian mountains, whereas the latter is a European species known mainly from the Alps and Pyrenées and with scattered occurrences in Northern Europe. Our four barcode sequences of B. oroarctica are however nested within the four sequences of B. atrofusca (Fig. 6), and the complex seems to be in need of revision.  Maximum likelihood tree based on the OLICH sequences of Alectoria sarmentosa.
Numbers above branches and thick lines indicate bootstrap support ≥ 70%. Bold font indicates focus taxon and grey font the outgroup. The two-letter ISO country codes are included. This applies for all phylogram figures (Figs 5,6,7,8,9,10,11,12,13,14,15). This rarely collected species was described from the Alps in the large, collective genus Lecidea Ach. (Poelt 1961). It was later transferred to the segregate genus Mycobilimbia Rehm by Poelt & Hafellner (in Hafellner 1989). Partly due to the presence of characteristic blue, K+ green granules in the hymenium and the hypothecium, Poelt (1961) and Timdal (1992) (Fig. 7), shows that it is, in fact, closely related to B. diapensiae and B. hypnorum and it is hence here transferred to Bryobilimbia. Hawksw. The OLICH dataset shows the same topography ( Fig. 8): B. capillaris nested within B. implexa. However, all our specimens of the B. capillaris/implexa complex were identified by morphology prior to TLC analysis and the correlation holds: all pale specimens (B. capillaris) contain alectorialic acid (plus accessories) and barbatolic acid, whereas no dark specimens (B. implexa) do. Fumarprotocetraric and protocetraric acids occur variably in both species. Either the DNA barcode marker and the other markers used in recent phylogenetic studies do not distinguish between two species which may be recognised by the correlation of thallus colour and secondary chemistry or the two "species" are conspecific and their morphological/chemical difference is caused by other factors. For example, the presence/ absence of an additional symbiont (basidiomycete yeast) were found likely to trigger the production of vulpinic acid in the cortex of Bryoria fremontii/tortuosa by Spribille et al. (2016).  (Bjelland et al. 1997, Obermayer and Mayrhofer 2007, Nordin et al. 2019, were recently shown to represent three distinct species in the phylogenetic analysis by Mark et al. (2019). The diagnostic, main compounds of those species are imbricaric acid in C. monachorum, olivetoric acid in C. olivetorum s. str. and perlatolic acid in C. cetrarioides (Mark et al. 2019). The seven specimens sequenced in the OLICH project fell into the C. cetrarioides and C. monachorum clades (Fig. 9) and the TLC analyses of the sequenced specimens showed the chemotypes were in accordance with the species identifications (Table 1).

Figure 8.
Maximum likelihood tree of Bryoria capillaris and B. implexa based on the OLICH sequences.

Hypotrachyna afrorevoluta (Krog & Swinscow) Krog & Swinscow :
The species is morphologically and chemically quite similar to H. revoluta (Flörke) Hale and the OLICH sequences ( Fig. 10) confirm the presence of the two species in the Nordic countries. See Elix and Thell (2011) for the subtile morphological differences between the species.

Nephroma tangeriense (Maheu & A.Gillet) Zahlbr.:
The species was recently reported as new to the Nordic countries (Klepsland 2013) from two collections from Rogaland county. The OLICH project has sequenced one of those (OLICH2019), as well as four additional collections from Hordaland and Vest-Agder counties and confirms its identity and distinction from N. laevigatum Ach. (Fig. 11).
Peltigera "neocanina": BLASTn search revealed the identity of OLICH1878, which was originally identified as P. canina (L.) Willd., as P. "neocanina". This is still an invalid name despite the taxon, which must be regarded as a morphologically cryptic species, has been known since the phylogenetic study of Peltigera by Miadlikowska and Lutzoni (2000).  (Fig. 12).

Psoroma cinnamomeum Malme:
The specimen from which OLICH702 was generated was originally identified as Psoroma tenue Henssen var. boreale Henssen, but BLASTn search against GenBank indicated its identity as P. cinnamomeum. A phylogenetic analysis, based on most of the ITS sequences of Psoroma in GenBank (including those used by Park et al. 2018), supported this identification as OLICH702 and came out in a strongly supported clade with the six other specimens of P. cinnamomeum, two of which Maximum likelihood tree based on the OLICH sequences of Nephroma laevigatum and N. tangeriense, with selected ITS sequences from GenBank.
are from southernmost Chile and four from Svalbard (Fig. 13). The clade was the sister of the strongly supported clade of the two specimens of P. tenue in GenBank, both from Antarctica. Both P. cinnamomeum and P. tenue were described from Tierra del Fuego, var. boreale from USA, Colorado (cf. Henssen and Renner 1981). OLICH702 contains porphyrilic acid and two unknown compounds (apparently related to porphyrilic acid due to the colour and fluorescence on the chromatograms) and hence fits the current concept of P. tenue var. boreale (cf. Jørgensen 2007). Unless molecular studies show that there are two morphologically very similar species in the Northern Hemisphere, both containing porphyrilic acid, we believe P. cinnamomeum is the correct name for the taxon currently known as P. tenue var. boreale.
Ramalina capitata (Ach.) Nyl.: The species has often been included in R. polymorpha (Lilj.) Ach. in the past (e.g. Krog and James 1977), but is now usually accepted as distinct (e.g. Nimis et al. 2018, Nordin et al. 2019). In our phylogeny (Fig. 14), Nordic and western European material of the two species are mostly well separated, but the four sequences from Turkey in GenBank are deviant and the complex seems to be in need of further study.
Ramalina subfarinacea (Cromb.) Nyl.: Our two sequences of this species, as well as the two ITS sequences available in GenBank, are nested in the clade of Ramalina farinacea (L.) Ach. in our phylogeny (Fig. 15). According to Krog and James (1977), there are subtle morphological differences between the species, but apparently the major distinguishing feature is the ecology: mainly on coastal rock (R. subfarinacea) vs. mainly on bark of deciduous, inland trees (R. farinacea). More study is needed in order to evaluate the taxonomic status of R. subfarinacea.   Maximum likelihood tree of the Ramalina capitata/polymorpha complex, based on all available ITS sequences.

Figure 15.
Maximum likelihood tree of the Ramalina farinacea/subfarinacea complex, based on all available ITS sequences.