Biodiversity Data Journal:
Data Paper (Biosciences)
Corresponding author: Nicolas Sauvion (nicolas.sauvion@inrae.fr)
Academic editor: Colin Favret
Received: 19 May 2021 | Accepted: 16 Jun 2021 | Published: 01 Jul 2021
© 2021 Nicolas Sauvion, Jean Peccoud, Christine Meynard, David Ouvrard
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Sauvion N, Peccoud J, Meynard CN, Ouvrard D (2021) Occurrence data for the two cryptic species of Cacopsylla pruni (Hemiptera: Psylloidea). Biodiversity Data Journal 9: e68860. https://doi.org/10.3897/BDJ.9.e68860
Cacopsylla pruni is a psyllid that has been known since 1998 as the vector of the bacterium ‘Candidatus Phytoplasma prunorum’, responsible for the European stone fruit yellows (ESFY), a disease that affects species of Prunus. This disease is one of the major limiting factors for the production of stone fruits, most notably apricot (Prunus armeniaca) and Japanese plum (P. salicina), in all EU stone fruit-growing areas. The psyllid vector is widespread in the Western Palearctic and evidence for the presence of the phytoplasma that it transmits to species of Prunus has been found in 15 of the 27 EU countries.
Recent studies showed that C. pruni is actually composed of two cryptic species that can be differentiated by molecular markers. A literature review on the distribution of C. pruni was published in 2012, but it only provided presence or absence information at the country level and did not distinguish between the two cryptic species.
Since 2012, numerous new records of the vector in several European countries have been published. We ourselves have acquired a large amount of data from sampling in France and other European countries. We have also carried out a thorough systematic literature review to find additional records, including all the original sources mentioning C. pruni (or its synonyms) since its first description by Scopoli in 1763. Our aim was to create an exhaustive georeferenced occurrence catalogue, in particular for countries that are only occasionally mentioned in the literature, and then with little detail. Finally, for countries that seem suitable for the proliferation of C. pruni (USA, Canada, Japan, China etc.), we dug deeper into the literature and other reliable sources (e.g. published checklists) to better substantiate its current absence from those regions.
Information on the distribution ranges of these vector psyllids is crucial for predicting the vulnerability of stone fruit producing countries to the ESFY threat in the foreseeable future.
We give free access to a unique file of 1975 records comprising all the occurrence data in our possession concerning C. pruni, which we have gathered through more than twenty years of sampling in Europe or through intensive text mining.
We have made every effort to retrieve the source information for the records extracted from literature (1201 records). Thus, we always give the title of the original reference, together with the page(s) citing C. pruni and, if possible, the year of sampling. To make the results of this survey publicly available, we give a URL to access the literature sources. In most cases, this link allows free downloads of a PDF file.
We also give access to information extracted from GBIF (162 exploitable data points out of 245 occurrences found in the database), which we thoroughly checked and often supplemented to make it easier to use.
We give access to our own unpublished georeferenced and genotyped records from 612 samples taken over the last 20 years in several European countries (Switzerland, Belgium, Netherlands, Spain etc.). These include two countries (Portugal and North Macedonia) for which the presence of C. pruni had not been reported before. As our specimens have been genotyped (74 sites with species A only, 202 with species B only and 310 with species A+B), our new data provide a better overview of the geographical distribution of the two cryptic species at the Palaearctic scale.
Hemiptera, psyllid, Cacopsylla pruni, vector-borne plant pathogen, phytoplasma, 'Candidatus Phytoplasma prunorum', European stone fruit yellows, species distribution, epidemiology
Psyllids (Psylloidea), or jumping plant-lice, are plant sap-sucking hemipterans that could be considered a minor group in terms of species diversity (3,573 described species according to
Other bacteria transmitted by psyllids to fruit trees have major economic impacts, in Europe in particular (
Dispersal of psyllid vectors across countries poses a threat to food security, underscoring the need to anticipate the risks associated with the introduction of new psyllids. Mapping the potential distributions of vectors under introduction scenarios is crucial to an efficient pest risk assessment (PRA) framework (
At least four criteria should be considered before using occurrence data as input for SDMs (
Historical data may also constitute an important resource to help trace vector dispersal routes or simply to access specimens that can no longer be obtained (e.g. samples from an inaccessible locality). Many museums and academic institutions hold field notebooks and maintain collections that are a rich source of valuable information (e.g. collection date and locality) on insect specimens collected during scientific expeditions (
Cacopsylla (Thamnopsylla) pruni (Scopoli, 1763) has been known since 1998 as the vector of a bacterium, ‘Candidatus Phytoplasma prunorum’ responsible for ESFY (
In their review,
Establishing the geographic distribution of C. pruni, ideally for each biotype, was therefore a priority. To this end, we developed molecular markers to easily identify the C. pruni biotypes (
Our objective is to give access, through a unique dataset, to all the data we have gathered on the two cryptic species of C. pruni. In this way, we hope to contribute to a better management of ESFY in countries affected by the disease and to a better anticipation of the risk of introduction in countries not yet affected.
This dataset is a compilation that is meant to include all available information (literature, GBIF, INRAE unpublished data) on the geographical distribution of two cryptic species of the psyllid Cacopsylla pruni at the scale of the Palaearctic (Fig.
Global map of the 1716 occurrence records available in the C. pruni dataset (map generated with QGIS 3.14). The map shows the distribution of cryptic species A (green dots) and B (red dots) according to available data. However, most of the data from the literature (black dots), GBIF (orange dots) or the Psylloidea catalogue of the "Faune de France" (currently being published) do not distinguish between the cryptic species.
The data contained in this dataset have three different origins: a systematic literature review, the Global Biodiversity Information Facility [GBIF] network and field collections by researchers/students from INRAE-Montpellier. They cover several ecoregions of the Palaearctic (Fig.
Literature data
In order to extend upon the
The searches were not restricted by language and were traced back to the first description of C. pruni (1763). Each line of the dataset that we make available (see section 'Data resources') corresponds to a reference. For almost all of them, we retrieved the PDF file of the original publication (including old books), which allowed us to verify the information. The corresponding URL is given for each record in the dataset (a DOI link or similar link, generally giving direct access to the PDF). We systematically tried to specify the locality where the observation was made (see Quality control section). Whenever the information was available, we specified the cryptic species of C. pruni (A or B, according to
GBIF data
A search for the keywords "Cacopsylla pruni" returned 245 occurrences in GBIF.org (14 June 2021). The derived dataset with the filtered export of GBIF occurrence data is available at this link: https://doi.org/10.15468/dd.rm55g8. Among the 245 occurrences, we were able to extract the names of 45 localities with geographic coordinates. For 87 occurrences for which only the name of the locality was given, we retrieved the geographic coordinates from Google Earth. The database also provided images of scanned slides from the NHM collection (https://www.gbif.org/fr/occurrence/gallery?taxon_key=2012955), from which we retrieved precise information about the sampling (date, location, host plant, collector) (Fig.
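The first step of the GBIF triage described above, separating records that already carry coordinates from those that only give a locality name (to be georeferenced by hand) and from those with neither, can be sketched as follows. This is a minimal illustration, not our actual workflow: it assumes the export has been read row by row with Python's `csv.DictReader`, so that each record is a dict keyed by the Darwin Core terms `decimalLatitude`, `decimalLongitude` and `locality`, with empty cells as empty strings. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def triage_gbif_record(record: Dict[str, str]) -> str:
    """Classify one GBIF occurrence by the spatial information it carries.

    Assumes string values as produced by csv.DictReader (empty cell -> "").
    """
    lat = (record.get("decimalLatitude") or "").strip()
    lon = (record.get("decimalLongitude") or "").strip()
    locality = (record.get("locality") or "").strip()
    if lat and lon:
        return "coordinates"    # usable directly, after a consistency check on a map
    if locality:
        return "locality_only"  # coordinates must be looked up in a gazetteer
    return "unusable"           # no spatial information beyond, at best, the country


def triage_counts(records: Iterable[Dict[str, str]]) -> Counter:
    """Count records per triage category."""
    return Counter(triage_gbif_record(r) for r in records)
```

Counting the categories this way gives a quick check of how many records need manual georeferencing before any map is drawn.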
Examples of metadata accessible on the website of the Natural History Museum from links associated with GBIF references (e.g. https://www.gbif.org/occurrence/1265697015).
Sampling data
For more than 20 years, researchers (Gérard Labonne, Gaël Thébaud, Jean Peccoud, Christian Cocquempot and Nicolas Sauvion) and students of INRAE-Montpellier have collected C. pruni individuals. Using a beating tray (80 cm x 80 cm), we collected mainly on Prunus spinosa L. (blackthorn) in spring and, for the rest of the year, on Pinus nigra J.F.Arnold (Black Pine), Picea abies (L.) H.Karst. (Common Pitch-fir) and Abies alba Mill. (Common Silver Fir). Other congeneric species were sometimes caught, but C. pruni individuals were easily recognised by the colour of the forewing, which is dark brown at the apex and brown elsewhere. Soon after identification, samples were preserved in 96% ethanol until DNA extraction and then genotyped (for species determination) according to the protocol described by
We recorded the GPS coordinates of all samples collected in their wild habitat, georeferencing the bush, hedge or shrub sampled. For the few insects sampled in orchards, we assigned a single GPS coordinate, corresponding to the centre of each plot, to all the corresponding samples. The locality name given in the dataset is that of the locality nearest to the sampled point. We sampled mainly in France, without restricting ourselves to apricot-growing regions, and focused on southern regions where species A and B live in sympatry or in strict allopatry. We also collected samples in Spain, Switzerland and Italy. These 612 new occurrence records improve the picture of the geographical distribution of the two species and should therefore be valuable for risk assessment, phylogeography or population genetics studies (Fig.
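The two georeferencing rules above, one coordinate per orchard plot and the name of the nearest locality for each sampled point, can be illustrated with the sketch below. It is an assumption-laden illustration rather than our procedure (we worked with a handheld GPS and Google Maps): the plot centre is taken as the arithmetic mean of the plot's corner coordinates, which is adequate for small plots away from the antimeridian, and the nearest locality is chosen by great-circle (haversine) distance. The gazetteer, its approximate coordinates and all function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# Hypothetical gazetteer with approximate town-centre coordinates (illustration only).
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "Montpellier": (43.6119, 3.8772),
    "Nîmes": (43.8367, 4.3601),
    "Avignon": (43.9493, 4.8055),
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two WGS-84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def plot_centre(corners: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Assumed plot centre: mean latitude and mean longitude of the corners."""
    lats = [lat for lat, _ in corners]
    lons = [lon for _, lon in corners]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def nearest_locality(lat: float, lon: float,
                     gazetteer: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """Return (name, distance_km) of the gazetteer entry closest to (lat, lon)."""
    name, (glat, glon) = min(
        gazetteer.items(), key=lambda item: haversine_km(lat, lon, *item[1])
    )
    return name, haversine_km(lat, lon, glat, glon)
```

For example, `nearest_locality(43.62, 3.90, GAZETTEER)` selects Montpellier, about 2 km away.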
We have strong expertise in the taxonomy of psyllids (
All the specimens we collected in the field were first carefully examined visually and then genotyped according to
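Because genotyping results are reported per population in the `speciesA` and `speciesB` columns, each site can be assigned to one of the categories used in the abstract (A only, B only, A+B). The sketch below is a hypothetical helper, assuming the column values are exactly the three modalities listed in the column descriptions. A pair of negative results ("not species A" together with "not species B") is treated as inconsistent, since each negative modality implies that the other species was found.

```python
from collections import Counter
from typing import Iterable, Tuple

ALLOWED_A = {"not genotyped", "not species A", "species A"}
ALLOWED_B = {"not genotyped", "not species B", "species B"}


def site_composition(species_a: str, species_b: str) -> str:
    """Map the speciesA/speciesB modalities of one population to a site category."""
    if species_a not in ALLOWED_A or species_b not in ALLOWED_B:
        raise ValueError(f"unexpected modality: {species_a!r}, {species_b!r}")
    if "not genotyped" in (species_a, species_b):
        return "not genotyped"
    has_a = species_a == "species A"
    has_b = species_b == "species B"
    if has_a and has_b:
        return "A+B"
    if has_a:
        return "A only"
    if has_b:
        return "B only"
    # Both negative: contradicts the definitions of the two columns.
    raise ValueError("inconsistent record: neither species reported")


def tally_sites(rows: Iterable[Tuple[str, str]]) -> Counter:
    """Count sites per category from (speciesA, speciesB) pairs."""
    return Counter(site_composition(a, b) for a, b in rows)
```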
Wherever possible, geographic coordinates (in the WGS-84 coordinate system) refer to specific localities. We used Google Earth to search for and georeference each locality name found in the literature or GBIF, taking care with homonyms, translated names and changes in country names. We consider the precision of these geographical coordinates to be a few kilometres, as authors rarely give precise coordinates for their sampling points. Whenever we found geographical coordinates in GBIF, we plotted them on a Google Earth map to identify the closest locality and to check consistency with the other information provided (name of the region, country etc.). When no locality name was given, precision may vary from city to province, region or country (e.g. "USSR: South European Part"). In such cases, we specified that the "locality is not stated". For data points specifying only a country, we provided the GPS coordinates of the country centre extracted from Google Earth, for lack of a better option. We therefore included a column with the estimated precision of each record, stressing that some of these data should be used with caution depending on the level of precision required for analyses. Conversely, the GPS coordinates of our own samples (see previous section) are accurate to a few metres. Each point was first georeferenced with a portable GPS and then checked on Google Maps.
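The precision classes described above end up in the `coordinateUncertaintyInMetres` column. The sketch below encodes the example values given in that column's description (30 m for GPS fixes, 1000 or 10000 m for locality names, 50000 m for a region or province). The class names, the split of locality names into a "fine" (1000 m) and a "coarse" (10000 m) class, and leaving the cell empty for country-level records are all assumptions, the last one following the rule that the value is left empty when the uncertainty cannot be estimated.

```python
from typing import Dict, Optional

# Example values from the coordinateUncertaintyInMetres column description.
# Class names, the fine/coarse split and None for countries are assumptions.
UNCERTAINTY_M: Dict[str, Optional[int]] = {
    "gps": 30,                  # handheld GPS fix on the sampled plant
    "locality_fine": 1_000,     # village or precisely named locality
    "locality_coarse": 10_000,  # town, mountain or other broad locality
    "region": 50_000,           # only a region or province is known
    "country": None,            # country centre: uncertainty left empty
}


def coordinate_uncertainty(precision_class: str) -> Optional[int]:
    """Uncertainty in metres for a precision class, or None for an empty cell."""
    if precision_class not in UNCERTAINTY_M:
        raise ValueError(f"unknown precision class: {precision_class!r}")
    return UNCERTAINTY_M[precision_class]


def as_dwc_cell(value: Optional[int]) -> str:
    """Format a value for a Darwin Core CSV cell; zero is not a valid uncertainty."""
    if value is None:
        return ""
    if value <= 0:
        raise ValueError("coordinateUncertaintyInMetres must be positive")
    return str(value)
```

Keeping the mapping in one table makes it easy to revise the values if a finer classification of locality names is adopted.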
Most field names of the dataset were chosen according to the Darwin Core format (
The database covers the entire known geographic range of the two cryptic species of the psyllid C. pruni, from Morocco to Norway and from Portugal to Mongolia.
We also extended our search to other countries where either species could potentially be found, in particular countries where various Prunus species grow in the wild or in cultivation (e.g. Japan, China, USA, Canada) and where these psyllids could act as phytoplasma vectors. Whenever possible, we relied on checklists from recognised taxonomists to make sure the information was reliable before concluding "absence" (e.g.
Latitude: 33.815458 to 65.59623333; longitude: -8.383379 to 112.52588611.
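A simple validation pass can flag coordinates that fall outside the legal ranges given for `decimalLatitude` and `decimalLongitude`, or outside the bounding box above. The sketch below flags rather than rejects records, so that outliers can be reviewed by hand; the function name and the flag messages are hypothetical.

```python
from typing import List

# Geographic coverage of the dataset (decimal degrees, WGS-84 / EPSG 4326).
MIN_LAT, MAX_LAT = 33.815458, 65.59623333
MIN_LON, MAX_LON = -8.383379, 112.52588611


def coordinate_flags(lat: float, lon: float) -> List[str]:
    """Return a list of problems for one coordinate pair (empty if none)."""
    flags = []
    if not -90 <= lat <= 90:
        flags.append("latitude outside legal range")
    if not -180 <= lon <= 180:
        flags.append("longitude outside legal range")
    if not (MIN_LAT <= lat <= MAX_LAT and MIN_LON <= lon <= MAX_LON):
        flags.append("outside dataset bounding box")
    return flags
```

For instance, a point near Montpellier raises no flag, whereas a point in Japan is reported as outside the bounding box.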
This data paper focuses on two cryptic species of Cacopsylla (Thamnopsylla) pruni (Scopoli, 1763), currently referred to as A and B. These two species show clear genetic differences despite being morphologically and ecologically indistinguishable (
Literature data cover 1763 to 2020.
INRAE data cover 1998 to 2020.
Column label | Column description |
---|---|
catalogNumber | An identifier which assigns a unique code to each of the 1975 records (NS0001 to NS1975). |
phylum | The full scientific name of the phylum in which the taxon is classified. |
class | The full scientific name of the class in which the taxon is classified. |
order | The full scientific name of the order in which the taxon is classified. |
suborder | The full scientific name of the suborder in which the taxon is classified. |
superfamily | The full scientific name of the superfamily in which the taxon is classified. |
family | The full scientific name of the family in which the taxon is classified. |
subfamily | The full scientific name of the subfamily in which the taxon is classified. |
genus | The full scientific name of the genus in which the taxon is classified. |
acceptedNameUsage | The full name, with authorship and date information of the currently valid (zoological) taxon. |
Occurrence | An existence of an Organism (sensu http://rs.tdwg.org/dwc/terms/Organism) at a particular place at a particular time. Here, five modalities: "insufficient data" (i.e. insufficient information to determine presence or absence); "probable absence" (i.e. no presence data yet found in records); "probable presence" (i.e. presence very likely, but not yet confirmed); "confirmed presence". |
speciesA | Information concerning the assignment of the specimens of a population (i.e. caught on the same day in the same locality on the same host plant) to species A of C. pruni. Three modalities: "not genotyped"; "not species A" (i.e. no individual of genotype A was found in the population analysed, only individuals of species B); "species A" (i.e. at least one individual of genotype A was found in the population analysed). Genotyping was based on Peccoud et al. (2013). |
speciesB | Information concerning the assignment of the specimens of a population (i.e. caught on the same day in the same locality on the same host plant) to species B of C. pruni. Three modalities: "not genotyped"; "not species B" (i.e. no individual of genotype B was found in the population analysed, only individuals of species A); "species B" (i.e. at least one individual of genotype B was found in the population analysed). Genotyping was based on Peccoud et al. (2013). |
country | Names of the countries where the individual(s) attributed to C. pruni have been recorded, according to the universally applicable code ISO 3166-2:2013. |
countryCode | Two-letter country codes defined in ISO 3166-1, part of the ISO 3166 standard to represent countries where species have been described. |
locationRemarks | Comments or notes about the location. |
locality | The specific description of the place. The locality is given as accurately as possible (precise address, village, town), but may sometimes be imprecise (e.g. mountain, region) or even absent (NA="locality not stated"). See column "coordinateUncertaintyInMetres" for more details on uncertainty. |
coordinateUncertaintyInMetres | The horizontal distance (in metres) from the given decimalLatitude and decimalLongitude describing the smallest circle containing the whole of the Location. The value is left empty if the uncertainty is unknown, cannot be estimated or is not applicable (because there are no coordinates). Zero is not a valid value for this term. Examples: 30 m = margin of error in the measurement of coordinates using a GPS navigator; 1000 or 10000 m = uncertainty attributed to most locality names in the literature, in the absence of more precise information; 50000 m = uncertainty when only the name of the region/province is known. |
decimalLatitude | The geographic latitude (in decimal degrees according to the geodetic coordinate reference system EPSG 4326) of the geographic centre of a location. Positive values are north of the Equator, negative values are south of it. Legal values lie between -90 and 90, inclusive. |
decimalLongitude | The geographic longitude (in decimal degrees according to the geodetic coordinate reference system EPSG 4326) of the geographic centre of a location. Positive values are east of the Greenwich Meridian, negative values are west of it. Legal values lie between -180 and 180, inclusive. |
hostPlantFamily | Six modalities: "Fabaceae"; "Pinaceae"; "Rosaceae"; "Salicaceae"; "unknown" (specimens collected by sweeping or Malaise trap); "unspecified species". Here "host plant" is taken in the broadest sense, i.e. a host plant sensu stricto (a plant on which a psyllid species completes its development from immature stages to adult), a shelter plant (a plant on which adult psyllids overwinter and may feed) or a casual plant (a plant on which adult psyllids land but do not feed). |
hostPlantLatinName | Latin name of the host plant species (i.e. host plant sensu stricto, shelter plant or casual plant) according to the International Code of Nomenclature for algae, fungi and plants (https://www.iaptglobal.org/). For example, Picea abies (L.) H.Karst., Prunus spinosa L. etc. |
hostPlantVernacularName | Vernacular English name of the host plant species. |
sourceCategory | The three different sources of information used to compile the dataset: "GBIF" (i.e. data from the Global Biodiversity Information Facility); "literature" (i.e. any data resulting from a text-mining from different sources - manuscript, book, article etc. - accessible or not on the web); "INRAE" (i.e. data from collections by INRAE Montpellier, not published to date). |
ownerInstitutionCode | The name (or acronym) in use by the institution having ownership of the object(s) or information referred to in the record. |
locationAccordingTo | Information about the source of this Location information. Could be a publication (gazetteer), institution or team of individuals. Here, detailed title of the original reference associated with the locality; "no data" (i.e. no information found for a particular country, for example, Kyrgyzstan, Malta). |
dateIdentified | The date on which the subject was determined as representing the Taxon. Here, year of publication of the reference cited in the "locationAccordingTo" column. |
page | Page where the original information about the locality can be found in the reference cited in the "locationAccordingTo" column. |
eventDate | The date-time or interval during which an Event occurred. For occurrences, this is the date-time when the event was recorded. Here, year(s) or date of sampling or observation in the locality according to the information in the "locationAccordingTo" column. Examples: '1996' (some time in the year 1996); '2010-06' (some time in June 2010); '2010-02-12' (some time during 12 February 2010); '2007/2010' (some time during the interval between the beginning of the year 2007 and the end of the year 2010). |
associatedReferences | A list (concatenated and separated) of identifiers (publication, bibliographic reference, global unique identifier, URI) of literature associated with the occurrence. Here, URL by which the original information can be retrieved (downloadable PDF file in open access, link to the publisher of a non-open access reference, direct link to the original GBIF occurrence etc.). |
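The `eventDate` values mix single years, months, days and year intervals. The sketch below, a hypothetical helper assuming that only the four patterns shown in the `eventDate` description occur, converts each value into an inclusive (start, end) pair of dates, which makes records of different temporal precision comparable (for example, when selecting records that overlap a given period).

```python
import calendar
from datetime import date
from typing import Tuple


def _token_bounds(token: str) -> Tuple[date, date]:
    """First and last day covered by 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'."""
    parts = [int(p) for p in token.strip().split("-")]
    if len(parts) == 1:
        year = parts[0]
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]  # handles leap years
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:
        day = date(parts[0], parts[1], parts[2])
        return day, day
    raise ValueError(f"unsupported date token: {token!r}")


def event_date_interval(value: str) -> Tuple[date, date]:
    """Inclusive (start, end) dates for an eventDate such as '2010-06' or '2007/2010'."""
    tokens = value.strip().split("/")
    if len(tokens) == 1:
        return _token_bounds(tokens[0])
    if len(tokens) == 2:
        start, _ = _token_bounds(tokens[0])
        _, end = _token_bounds(tokens[1])
        if start > end:
            raise ValueError(f"interval ends before it starts: {value!r}")
        return start, end
    raise ValueError(f"unsupported eventDate: {value!r}")
```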
We are very grateful to Josiane Peyre for her valuable technical assistance with the genotyping of thousands of psyllids and to the following individuals for their contributions to the collection of psyllid or plant samples: N. Courtieu & J.-M. Broquaire (SICA Centrex), N. Galabert & J. Delnatte (SICA L’Edelweiss), B. Rouillé (SRPV-PACA), E. Navarro (Terroir de Crau), P. Delon (CA-Gard), E. Falezan (GIE-Tain l’Hermitage), P. Exbrayat (CA-Drôme), M. Léon-Chapoux and V. Delaunay (SEFRA) and G. Devènes (Agroscope). Many students also took part in the collection of the psyllids, for which they are warmly thanked: Ghislaine Sagna, Léa Merlet, Piroska Czibulyás, Elise Découvreur, Clara Bouchet, Clara Sauvion, Zo-Norosoa Andrianjaka-Camps, Florent Décugis and Olivier Lachenaud.
Part of this work benefitted from a postdoctoral grant to NS funded by an INRA-CIRAD SDIPS grant (Speciation and Molecular Diagnosis of Insect Pest Species Complexes). Field and molecular work for this study was supported by several projects over 15 years:
2005-07: ECOGER "Ecologie et adaptation des insectes phytophages en gestion de leurs populations" funded by le Ministère de l'Enseignement Supérieur, de la Recherche et de l'Innovation-France;
2007-08: SEE-ERA.NET, network 'Phytoplasma epidemiology', funded by the 6th EU Framework Programme for Research and Development (contract number ERA-CT-2004-515805);
2009-11: SDIPS "Mechanisms of Speciation & Molecular Diagnosis of Insect Pest Species Complexes" funded by INRA-France;
2010-12: SPEED@ID “Accurate SPEciEs Delimitation and IDentification of Eukaryotic biodiversity using DNA markers”. A project proposed by F-BoL, the French Barcode of Life initiative - Genoscope Evry-France;
2010-12: PRIMA PHACIE “Pest risk assessment for the European Community plant health: A comparative approach with case studies”, funded by the European Food Safety Authority (EFSA), grant agreement CFP⁄EFSA⁄PLH⁄2009⁄01;
2010-12: Bilateral project PIA BOSPHORUS between TUBITAK-Turkey and le Ministère des Affaires étrangères-France "Role of the vectors (psyllids) in the dissemination of the diseases due to phytoplasma on fruit trees";
2011-13: PHYLOPSYL from the project “Bibliothèque du vivant” (BdV) funded by three French institutions (the CNRS, INRA and MNHN);
2015-2018: E-SPACE project number 1504-004, Improving epidemiosurveillance of Mediterranean and tropical plant diseases, French Agropolis Foundation;
2020-21: This data paper was conceived within the stimulating framework of the KIM RIVE (Key Initiative Montpellier: Infectious Risks and Vectors, https://muse.edu.umontpellier.fr/key-initiatives-muse/rive/), supported by MUSE (Montpellier University of Excellence, https://muse.edu.umontpellier.fr/en/muse-i-site/) and the RIVOC key challenge (https://muse.edu.umontpellier.fr/2021/04/19/appel-a-projets-rivoc/), supported by the Occitanie Region (France).
NS contributed to text mining, sampling and characterisation of the insects, georeferencing, development of the dataset, map making and writing of the paper; DO provided easier access to scattered taxonomic data through his extensive expertise on psyllids, contributed to species validation and writing of the paper; JP contributed to sampling, molecular characterisation of the insects and writing of the paper; CNM contributed to the development of the dataset and writing of the paper.