A global food plant dataset for wild silkmoths and hawkmoths and its use in documenting polyphagy of their caterpillars (Lepidoptera: Bombycoidea: Saturniidae, Sphingidae)

Abstract
Background
Herbivorous insects represent a major fraction of global biodiversity and the relationships they have established with their food plants range from strict specialisation to broad generalism. Our knowledge of these relationships is of primary importance to basic (e.g. the study of insect ecology and evolution) and applied biology (e.g. monitoring of pest or invasive species) and yet remains very fragmentary and understudied. In Lepidoptera, caterpillars of families Saturniidae and Sphingidae are rather well known and considered to have adopted contrasting preferences in their use of food plants. The former are regarded as being rather generalist feeders, whereas the latter are more specialist.
New information
To assemble and synthesise the vast amount of existing data on food plants of Lepidoptera families Saturniidae and Sphingidae, we combined three major existing databases to produce a dataset collating more than 26,000 records for 1256 species (25% of all species) in 121 (67%) and 167 (81%) genera of Saturniidae and Sphingidae, respectively. This dataset is used here to document the level of polyphagy of each of these genera using summary statistics, as well as the calculation of a polyphagy score derived from the analysis of Phylogenetic Diversity of the food plants used by the species in each genus.


Introduction
Herbivorous insects represent a major fraction of global biodiversity (Fiedler 1998) and are central to studies of numerous and diverse ecological and evolutionary processes, such as resource specialisation (Devictor et al. 2008), co-evolution (Thompson 1988) and food web dynamics (Vidal and Murphy 2017). Elucidating the degree of food plant-insect specificity helps understand community assembly, ecosystem dynamics and latitudinal gradients of species richness (Ødegaard 2006). Moreover, insect-plant interactions are central to the understanding of niche breadth; they play a key role in mediating the competition that structures communities and form the backdrop to the human view of entire networks of interacting species (Devictor et al. 2008, Forister et al. 2014). The different levels of specialisation observed in phytophagous insects, from strict specialists to highly-generalist species, are traits that are also considered as possibly important drivers of speciation or adaptive radiation (Janz and Nylin 2008, Jousselin and Elias 2019, Wang et al. 2017).
The Lepidoptera families Saturniidae (wild silkmoths) and Sphingidae (hawkmoths, sphinx moths) are amongst the best-known insect families worldwide, both taxonomically and biologically and they are generally characterised by being large-bodied moths (Janzen 1984a). A recently-published taxonomic checklist (Kitching et al. 2018) revealed a combined species richness of around 5000 species globally. These two families exhibit contrasting life-history strategies both as adults (Sphingidae: feeding, long-lived adults; Saturniidae: non-feeding, short-lived adults; Janzen 1984a) and as caterpillars (Sphingidae: fast-growing, many toxic plant specialists; Saturniidae: slow-growing, many tannin and resin-rich plant specialists; Janzen 1981, Janzen 1984a). In the Neotropics, sphingid caterpillars seem to specialise on only a relatively small number of plant families, feeding on both young and old, relatively tender leaves that contain low molecular weight toxic compounds, whereas saturniid caterpillars feed on tougher, as well as younger, leaves of an often wide range of plant families that contain high levels of large polymeric molecules (tannins, resins) that interfere with digestion (Janzen 1981). Consequently, sphingid caterpillars digest more nutrients per bite and need less time to reach a given full size than do saturniids (Bernays and Janzen 1988).
A massive amount of data is available on the larval food plants in the wild of the two families, both in literature and in institutional and personal databases. For the Lepidoptera as a whole, the HOSTS database (Robinson et al. 2010) comprises the most comprehensive collation of information about what caterpillars overall are believed to eat. It contains some 180,000 records for about 22,000 Lepidoptera species extracted from 1600 documents (Robinson et al. 2010). Although HOSTS has not been updated for almost a decade, the subset of records for the superfamily Bombycoidea has been independently maintained and added to by IJK and this updated version is used here. Another spectacular effort towards gathering food plant data for Lepidoptera is the inventory of caterpillars in the Area de Conservacion Guanacaste (ACG) in north-western Costa Rica (Janzen and Hallwachs 2016, Janzen and Hallwachs 2020). It comprises ~ 70,000 records of reared wild-caught larvae of Saturniidae and Sphingidae linked to their DNA barcodes. Besides these two main public data repositories, one of the authors (JH) has built his own personal database for Sphingidae over 20 years, compiling records from literature, web resources, personal field observations and communications from collaborators. In addition, food plant information is also scattered across the published literature, including a few more recent food plant catalogues, such as those of Stone (1994), Santin (2004) and Meister (2011), as well as webpages and personal databases, all of which makes the process of collating and resolving the information very difficult and time consuming.
All three databases cited above are and remain independently maintained and updated. Here we publish a single dataset resulting from their combination. Our aim is to make this massive amount of information available as a single dataset that allows its use for ecological and evolutionary analyses. In particular, we want to investigate the role of food plant use in the evolution of the two families (Arnal et al., in prep.), especially with respect to the degree of polyphagy, defined as the plasticity in the use of different food plants for caterpillars to complete their development. We provide further details about the contents of this dataset in the following sections, as well as a number of caveats to avoid incorrect interpretation and use of these data. In addition to variables summarising the level of polyphagy of the caterpillars of sphingid and saturniid moths, we also provide a polyphagy score, based on a calculation of Phylogenetic Diversity (Faith 1992) of the food plant families used by the species included in the database.

General description
Purpose: The food plant dataset
This dataset (Suppl. material 1) is a synthesis of current knowledge regarding the food plants eaten by the caterpillars of two families of Lepidoptera (Saturniidae and Sphingidae). It aims to capture the state of knowledge at the time of assembly of the dataset so that it can be used to investigate the role of the breadth of food plant use in the spatial and temporal evolution of both families (Arnal et al., in prep.).
This dataset of larval food plant records for sphingids and saturniids worldwide is the result of the integration, with significant data reconciliation and standardisation, of three largely independent data sources: 1) Information for Sphingidae and Saturniidae embedded in the HOSTS database (Robinson et al. 2010; as further added to and refined by IJK, downloaded on 2 March 2018) (hereafter HOSTS); 2) An inventory of the caterpillars, their food plants and parasitoids of Area de Conservacion Guanacaste (ACG, Janzen DH, downloaded on 16 July 2018 for Saturniidae and 18 July 2018 for Sphingidae) (hereafter DHJ); 3) The personal database of Jean Haxaire (Associate Researcher to MNHN, imported on 17 July 2018) (hereafter JH).
A "record" refers to a unique combination of caterpillar species, plant species and source. Records in the dataset resulting from rearing experiments in captivity or from introduced plant species are listed separately as they often do not represent natural insect-plant associations. Redundancy (duplication) of records amongst the three databases following their combination was not a concern for our research objectives; the dataset should be treated as qualitative and the frequency of records ignored (see list of points in next section).
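The record definition and the qualitative treatment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`moth_species`, `plant_species`, `source`, `natural`) and the placeholder taxa are hypothetical and do not correspond to the actual columns or contents of Suppl. material 1.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    # Hypothetical field names, not the dataset's actual columns.
    moth_species: str
    plant_species: str
    source: str      # "HOSTS", "DHJ" or "JH"
    natural: bool    # False for captive rearing or introduced plants


def unique_records(rows):
    """Keep one row per (moth, plant, source, natural) combination."""
    return sorted(set(rows))


def qualitative_associations(records):
    """Map each moth species to the set of plants recorded in natural conditions.

    Using sets discards record frequency, so a plant reported by
    several sources is counted once for a given moth species.
    """
    plants_by_moth = defaultdict(set)
    for rec in records:
        if rec.natural:
            plants_by_moth[rec.moth_species].add(rec.plant_species)
    return dict(plants_by_moth)


rows = [
    Record("Moth a", "Plant x", "HOSTS", True),
    Record("Moth a", "Plant x", "HOSTS", True),  # exact duplicate
    Record("Moth a", "Plant x", "JH", True),     # same association, other source
    Record("Moth a", "Plant y", "JH", False),    # captive rearing, kept apart
    Record("Moth b", "Plant z", "DHJ", True),
]
records = unique_records(rows)
associations = qualitative_associations(records)
```

In this toy input, the association between "Moth a" and "Plant x" is reported by two sources but contributes a single entry to the qualitative set, which mirrors the recommendation to ignore record frequency.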
A total of 25,937 records was compiled from the three databases in a single dataset given as Suppl. material 1. This compilation provides information for 137 genera and 757 species of Saturniidae and 166 genera and 725 species of Sphingidae.
As an example of the uses of this dataset, we report basic polyphagy variables as well as a polyphagy score, based on the Phylogenetic Diversity (PD, Faith 1992) of the food plants used by the caterpillars of saturniid and sphingid moths. Using a recent dated angiosperm phylogeny (Magallón et al. 2015), we measured the PD score, i.e. the total length of all phylogenetic tree branches connecting the different families of plants eaten by a given moth species in natura, using the pd function of the picante R package (Kembel et al. 2010). The species scores were then averaged within each genus to obtain genus scores in Suppl. material 2. Note that gymnosperm records were excluded from our calculations of PD scores to avoid bias caused by the considerable phylogenetic distance between angiosperms and gymnosperms.
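To make the PD-based score more concrete, the following Python sketch computes Faith's PD on a small invented tree and averages species scores within a genus. It is not the picante implementation used for the dataset: the tree, branch lengths, family names and species diets are made up, and the `include_root` switch is an assumption about how "connecting branches" might be counted (with or without the path leading back to the root).

```python
from statistics import mean

# Toy tree stored as child -> (parent, branch length).
# Node names and lengths are invented for illustration only.
TREE = {
    "N1": ("root", 1.0),
    "FamA": ("N1", 2.0),
    "FamB": ("N1", 2.0),
    "FamC": ("root", 3.0),
}


def path_to_root(tip, tree):
    """Return the edges (named by their child node) from a tip up to the root."""
    if tip not in tree:
        raise ValueError(f"unknown tip: {tip}")
    edges = set()
    node = tip
    while node in tree:
        edges.add(node)
        node = tree[node][0]
    return edges


def faith_pd(families, tree, include_root=True):
    """Sum the branch lengths spanned by a set of plant families.

    With include_root=False, edges shared by every tip (those above the
    most recent common ancestor) are dropped, so one family scores 0.
    """
    paths = [path_to_root(fam, tree) for fam in set(families)]
    if not paths:
        return 0.0
    used = set().union(*paths)
    if not include_root:
        used -= set.intersection(*paths)
    return sum(tree[edge][1] for edge in used)


# Hypothetical diets: plant families eaten by two species of one genus.
diets = {"sp1": {"FamA", "FamB"}, "sp2": {"FamA"}}
species_pd = {sp: faith_pd(fams, TREE) for sp, fams in diets.items()}
genus_pd = mean(list(species_pd.values()))
```

With this toy tree, a species feeding on two sister families receives a lower score than one feeding on two distantly related families, which is the property that makes PD more informative than a simple count of families.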
The genus-level polyphagy variables and the polyphagy scores of Saturniidae and Sphingidae genera are provided as Suppl. material 2.
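As a rough illustration of what genus-level summary variables might look like, the sketch below aggregates per-species sets of food plant families by genus. The variables chosen here (number of species with records, total number of plant families, mean and maximum number of families per species) are assumptions made for the example and are not necessarily those reported in Suppl. material 2; the taxon names are placeholders.

```python
from collections import defaultdict


def genus_summary(families_by_species):
    """Summarise food plant family use per genus.

    families_by_species maps a binomial ("Genus species") to the set of
    plant families recorded for that species. The genus is taken as the
    first word of the binomial.
    """
    by_genus = defaultdict(dict)
    for species, families in families_by_species.items():
        genus = species.split()[0]
        by_genus[genus][species] = set(families)

    summary = {}
    for genus, members in by_genus.items():
        counts = [len(fams) for fams in members.values()]
        summary[genus] = {
            "n_species": len(members),
            "n_families_total": len(set().union(*members.values())),
            "mean_families_per_species": sum(counts) / len(counts),
            "max_families_per_species": max(counts),
        }
    return summary


# Placeholder taxa, not real records.
example = {
    "Genusa alpha": {"FamA", "FamB"},
    "Genusa beta": {"FamB"},
    "Genusb gamma": {"FamC"},
}
summary = genus_summary(example)
```

Simple counts of this kind treat all plant families as equally distinct, which is why they are best read alongside a phylogenetically informed score rather than in place of it.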
Additional information: Calls for caution:
1. The correctness of food plant identifications in databases and in literature should be treated with considerable caution, as they were largely made by non-botanists; food plant names used are also subject to taxonomic and nomenclatural uncertainty and their correctness and validity may be considered equivocal in some cases.
2. The previous point also applies to moth names, especially when considering species-level identifications. These may be incorrect or outdated. For example, more than 1500 new species have been described within family Saturniidae in the past decade (Kitching et al. 2018), largely with the support of DNA barcoding analyses. Thus, food plant records may not account for recently split complexes of cryptic species, members of which may have quite different natural histories (e.g. Janzen 2012).
3. The food plant dataset is derived from known food plant records at the time of its compilation; as such, it represents a snapshot of the knowledge at that time and it may differ from the data compiled in the original sources and then updated independently (e.g. new records and/or corrections (e.g. identification errors or synonymies of the moth/caterpillar or the plant or both)).
4. All records are meant to represent actual instances in which caterpillars were found feeding and developing on the food plant. Records in the DHJ database all result from rearing trials of caterpillars found in the field on the food plant in question and, in many cases, identification of the caterpillar was confirmed through DNA barcoding of the resultant adult moths. A few records, recognised as questionable (e.g. inconsistent locality/identification data) in the HOSTS and JH databases, were filtered out and are not included in the present combined dataset.
5. The food plant dataset does not account for the frequency of use of a given food plant amongst other plants also listed for the same species of moth. The DHJ database, because it is based on individual specimen records, does include quantitative data; however, this information is not incorporated into the combined dataset, although it could bring additional information on local food plant preferences of species and populations. We note that this information would nevertheless be very difficult to analyse and interpret as it is conditional upon the local availability of food plants, as well as possibly seasonal conditions, local variations through time and difficulty of collecting.
6. The previous point also brings a note of caution in that polyphagy, as calculated here from the data available for a given species, may not be translatable to the population or site level and vice versa. A species may have populations in which some caterpillars have a lower level of polyphagy than others, at least in part because the food plants that could be eaten do not occur in that ecosystem and because many species arrive by ecological fitting rather than in situ evolution (Janzen 1985a). This is especially the case with species following expanding frontier agriculture into new ecosystems or following contemporary climate changes.
7. Strictly speaking, we define polyphagy as the capacity of a given individual caterpillar to feed and develop (through its complete life cycle) on different food plants. This can only be approximated by considering sibling individuals (as is sometimes the case in the DHJ database), individuals from the same population or, ultimately, from the same species or higher taxonomic categories. We thus acknowledge that the scores of polyphagy at species and genus level should be recognised as human abstractions.
8. Polyphagy is constrained in situ by the local availability of food plants: an individual caterpillar cannot be polyphagous on species of plants that are not present.
9. Here we approximated polyphagy scores at species level for saturniid and sphingid moths and we assume that they represent valuable information about the level of plasticity of individuals of the populations of a species to use different food plants. These scores were then used to calculate polyphagy scores at the genus level. Generic level of polyphagy is a human abstraction, but it is seen as relevant information to understand the past diversification dynamics. Plasticity in the use of food plants may have favoured or impeded geographical dispersal and may have mitigated speciation or extinction processes or influenced species' natural histories in many other ways (Janzen 1985b).
10. We acknowledge that the polyphagy level derived from caterpillar plant feeding records approximates, but may not reflect precisely, the plasticity in oviposition site selection by female moths (see, for instance, Janzen 1984b). Indeed, caterpillars may be driven by starvation to feed on a different plant after consuming all leaves of the plant they started to develop on and which had been selected for oviposition by the female.

Geographic coverage
Description: The present dataset combines food plant records for saturniid and sphingid species worldwide.

NHM who undertook the original HOSTS project: Phillip Ackery, George Beccaloni, Luis Hernández, Adrian Hine, Sven Loburg, Mike Lowndes and, most of all, the late Gaden Robinson, whose dedication saw the project to completion. He is also extremely grateful to the many people who contributed their own rearing records of Lepidoptera or personal accumulations of data for inclusion in the HOSTS database, particularly Mike Bigger (UK), John W. Brown (USA), Chris Conlan (USA), Rob Ferber (USA), Konrad Fiedler (Germany), Jeremy Holloway (UK), Frank Hsu (USA), Jurie Intachat (Malaysia), Alec McClay (Canada), Bill Palmer (Australia), Pierre Plauzoles (USA) and the generous individuals who contributed rearing records through the World Wide Web and who are known to us only as an email address. IJK is particularly grateful to Julian Donahue and the Los Angeles County Museum of Natural History for allowing us to include data into HOSTS on Microlepidoptera from the card catalogue prepared by the late J.A. Comstock and C. Henne and for access to manuscript records by Noel McFarland. Full acknowledgements for the HOSTS database can be found at https://www.nhm.ac.uk/our-science/data/hostplants/#9.

Author contributions
PA, LBM, IJK and RR designed the study and organised the assembly of the dataset. JH, IJK, WH and DHJ compiled the three databases; LBM carried out their combination and computed summary statistics of polyphagy levels. PA calculated the polyphagy scores.
LBM wrote the first draft of the manuscript, then all authors contributed to its revision and to the editing of its final version.