Aquatic eDNA for monitoring French Guiana biodiversity

Abstract
Background: Environmental DNA (eDNA) metabarcoding has recently emerged as a non-destructive alternative to traditional sampling for characterising species assemblages.
New information: We here provide a consistent dataset synthesising all eDNA sampling sites in French Guiana to date. Field collections began in 2014 and continued until 2019. This dataset is, however, a work in progress and will be updated after each collecting campaign. We also provide a taxon by site matrix of fish presence/absence as inferred from eDNA. Our aim is to communicate transparently with stakeholders and to provide the foundation for a monitoring programme based on eDNA. The latest version of the dataset is publicly and freely accessible through the CEBA geoportal (http://vmcebagn-dev.ird.fr) or through the French Guiana geographic portal (https://www.geoguyane.fr).


Introduction
French Guiana is an overseas territory of France located on the north-eastern coast of South America. With ca. 84,000 km² (the size of Austria), it is the largest outermost region of the European Union. About 96% of its surface is covered by undisturbed primary rainforest. Because of its tropical humid climate, the territory harbours a very dense hydrographic network. This network comprises 112,000 km of water bodies and is divided into eight drainage basins flowing from south to north (Mourguiart and Linares 2013). Unlike Amazonia sensu stricto, where all the basins are connected to the Amazon, the basins of French Guiana are not connected to one another and each drains independently into the Atlantic Ocean. The two largest basins, the Maroni and the Oyapock, form the borders with Suriname and Brazil, respectively. Rivers (Strahler order > 3) represent 20% of the network, while the remaining 80% corresponds to streams less than 10 m wide and less than 1 m deep.
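The river/stream split above relies on Strahler stream order. The following is a minimal sketch of how that order is computed on a drainage tree. It assumes the network is supplied as hypothetical (upstream reach, downstream reach) pairs forming a tree that drains to a single outlet, which is how each French Guiana basin behaves. It illustrates the definition only. It is not the procedure used to derive the BD Carthage watercourse classes.

```python
from collections import defaultdict


def strahler_orders(edges, outlet):
    """Strahler order of every reach in a tree-shaped drainage network.

    edges: iterable of (upstream_reach, downstream_reach) pairs (hypothetical
    input format); the network must be a tree draining to `outlet`.
    Headwater reaches get order 1; a reach fed by two or more tributaries of
    the highest incoming order gets that order + 1, otherwise it keeps it.
    """
    children = defaultdict(list)  # downstream reach -> reaches flowing into it
    for upstream, downstream in edges:
        children[downstream].append(upstream)

    order = {}
    stack = [(outlet, False)]  # iterative post-order traversal (no recursion limit)
    while stack:
        reach, expanded = stack.pop()
        if not expanded:
            stack.append((reach, True))
            for tributary in children[reach]:
                stack.append((tributary, False))
        else:
            incoming = [order[t] for t in children[reach]]
            if not incoming:
                order[reach] = 1  # headwater reach
            else:
                top = max(incoming)
                order[reach] = top + 1 if incoming.count(top) >= 2 else top
    return order


def is_river(strahler_order, threshold=3):
    """Rivers are defined in the text as Strahler order > 3; smaller ones are streams."""
    return strahler_order > threshold


edges = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]
print(strahler_orders(edges, "e"))
```

Two order-1 headwaters (a, b) merge into an order-2 reach (c). Joining that reach with another order-1 tributary (d) leaves the outlet (e) at order 2.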
As a European territory, French Guiana must comply with European regulations requiring surveillance programmes on water quality (Directive 2000/60/EC). This directive was transposed into French law by law n°2004-338, mainly under article R212-22 of the Environment Code, and by the "Law on water and aquatic environment" (n°2006-1772). For the territory of French Guiana, surveillance programmes have been set up for the periods 2010-2015 and 2016-2021. These programmes have produced a characterisation of both reference physico-chemical environments and biological communities, as well as practical tools (e.g. biological indices) to evaluate and monitor water quality. A set of sites, monitored on a yearly basis, has been defined under the "Surveillance Control Network" and the "Operational Control Network".
However, quantifying the composition of species assemblages in Amazonian aquatic systems remains difficult because species inventories are harmful to the fauna. Sampling fish in small streams relies on a toxicant (rotenone) that kills all the fish within the stream reach (Allard et al. 2014). In rivers, gill nets are used and cause lethal injuries to the fish entangled in them (Murphy and Willis 1996). Such destructive sampling no longer complies with ethical standards and European laws. Non-destructive methods, such as diving and electrofishing, are not efficient in these streams and rivers because of their low water conductivity and high turbidity (Allard et al. 2014, Melki 2016). As a consequence, collecting data on entire assemblages is almost impossible using traditional sampling methods. This hinders scientific advances on ecosystem structure and function, as well as applied work on biodiversity conservation and management. Since 2014, we have used a non-destructive alternative to traditional fish sampling, characterising species assemblages using environmental DNA (hereafter eDNA) metabarcoding (Taberlet et al. 2012, Taberlet et al. 2018). eDNA metabarcoding consists of collecting the DNA released by organisms into the water and comparing the resulting sequences to reference molecular databases in order to assign them to species. This method has been shown to efficiently characterise fish faunas in temperate rivers and has recently been applied successfully in French Guiana (Cantera et al. 2019). We here provide a consistent dataset synthesising all eDNA sampling sites in French Guiana to date. We also provide a taxon by site presence/absence matrix for the fish fauna. Our aim is to communicate transparently with stakeholders and to provide the foundation for a monitoring programme based on eDNA.

Study area description: Collecting trips have been conducted in various locations throughout French Guiana.
Design description: This dataset was developed to provide the foundation for a biodiversity monitoring programme based on eDNA, and also to better understand the impact of human activities on aquatic biodiversity. Locations were therefore selected to maximise the geographic coverage of rivers and streams, including both undisturbed sites and sites subject to human disturbance (e.g. close to villages or to gold mining sites).
Funding: Data for this resource have been obtained with support from Labex CEBA (Center for the Study of Biodiversity in Amazonia), Labex DRIIHM (Dispositif de Recherche Interdisciplinaire sur les Interactions Hommes-Milieux) and Labex TULIP (Towards a Unified theory of biotic interactions: role of environmental perturbations). Labex (Laboratoires d'Excellence) are funded by "Investissement d'Avenir" grants managed by the French National Research Agency (ANR) under references ANR-10-LABX-25-CEBA, ANR-11-LABX-0010-DRIIHM and ANR-10-LABX-0041-TULIP. Additional financial support was obtained from the DEAL Guyane, the Office de l'Eau Guyane (Aquatic Metabarcoding project) and the ANR DEBIT project (ANR-17-CE02-0007-01). SPYGEN, a private company specialised in eDNA, and VigiLife, a non-governmental organisation, provided financial and laboratory support. Logistical support was provided by the Parc Amazonien de Guyane and Hydreco Laboratory (Kourou, Guyane).
Quality control: The operator always remained downstream of the filtration area and stayed on the bank (for small streams) or on emergent rocks (for larger streams and rivers). For sites located along the same river course, we sampled from downstream to upstream to avoid contamination by eDNA transported by the boat (for rivers) or on clothing. Geographical coordinates were obtained using a GPSmap 64S device (Garmin) or similar. Such devices report coordinate accuracy as CEP50 (Circular Error Probable): there is a 50% probability that the true position lies within the reported distance of X metres of the recorded position. Considering other sources of GPS error (such as ionospheric delay and signal multipath), we estimate the accuracy of the coordinates to be around 30 metres at a 95% confidence level under dense forest cover.
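Converting a CEP50 value into a 95% radius requires an assumption about the error distribution. The sketch below assumes the textbook model of circular bivariate normal errors, in which the radial error follows a Rayleigh distribution. Real GPS errors under canopy only approximate this model. Under that assumption the 95% radius is about 2.08 times CEP50. The 15 m input is a hypothetical value, not a figure reported for this dataset.

```python
import math


def radius_for_probability(sigma: float, p: float) -> float:
    """Radius containing a fraction p of position fixes, assuming circular
    bivariate normal errors with per-axis standard deviation sigma
    (radial error ~ Rayleigh: P(R <= r) = 1 - exp(-r^2 / (2 sigma^2)))."""
    return sigma * math.sqrt(-2.0 * math.log(1.0 - p))


def cep50_to_r95(cep50_m: float) -> float:
    """Convert a reported CEP50 (metres) to a 95% radius (metres)."""
    sigma = cep50_m / math.sqrt(2.0 * math.log(2.0))  # CEP50 = sigma * sqrt(2 ln 2)
    return radius_for_probability(sigma, 0.95)


# Hypothetical CEP50 of 15 m for a fix under forest canopy
print(round(cep50_to_r95(15.0), 1))
```

This conversion covers only the receiver's statistical error. The estimate of about 30 m given above also accounts for other sources of error, so the two figures measure different things and need not coincide.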

Sampling methods
Step description: At each site, we placed the inlet of the tubing in a fast-flowing part of the watercourse. Sampling was carried out in rapid hydromorphological units to ensure optimal homogenisation of the water throughout the water column. Water was pumped ca. 20 cm below the surface and each filtration lasted 30 minutes (except at a few sites where filtration time was 15 minutes). Each sample therefore results from the filtration of ~34 litres of water (~17 litres when filtration time was 15 minutes). At the end of the filtration, we emptied the filtration capsule of water, filled it with 150 ml of preservation buffer (Tris-HCl 0.1 M, EDTA 0.1 M, NaCl 0.01 M and N-lauroyl sarcosine 1%, pH 7.5-8) and stored it in the dark in an individual sterile plastic bag. Samples were then kept at room temperature until DNA extraction. Preliminary tests demonstrated that the preservation buffer was suitable for storage at room temperature for up to a month. Information on DNA extraction, amplification and sequencing, as well as on the subsequent bioinformatic pipelines, can be found in  and Cantera et al. (2019).
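The filtration volumes and buffer composition above can be turned into simple bench arithmetic. The sketch below makes several assumptions:

- The pump flow is constant, back-calculated from ~34 litres in 30 minutes.
- EDTA is taken to be the disodium dihydrate salt. The text only says "EDTA 0.1 M", so this form is an assumption.
- The 1% N-lauroyl sarcosine is taken as w/v.

The names and molar masses are illustrative. They are not the authors' documented protocol, and pH adjustment is not modelled.

```python
# Hypothetical constant pump flow, back-calculated from ~34 L filtered in 30 min
FLOW_L_PER_MIN = 34.0 / 30.0

# Assumed molar masses (g/mol); the EDTA salt form is an assumption
MOLAR_MASS_G_PER_MOL = {
    "Tris-HCl": 157.60,
    "EDTA (disodium dihydrate)": 372.24,
    "NaCl": 58.44,
}
# Target concentrations from the text (mol/L)
MOLARITY_MOL_PER_L = {
    "Tris-HCl": 0.1,
    "EDTA (disodium dihydrate)": 0.1,
    "NaCl": 0.01,
}


def filtered_volume_l(minutes: float, flow_l_per_min: float = FLOW_L_PER_MIN) -> float:
    """Expected filtered volume (litres) for a given filtration time."""
    return minutes * flow_l_per_min


def buffer_masses_g(volume_ml: float) -> dict[str, float]:
    """Grams of each solute needed for volume_ml of preservation buffer."""
    volume_l = volume_ml / 1000.0
    masses = {
        name: MOLARITY_MOL_PER_L[name] * volume_l * MOLAR_MASS_G_PER_MOL[name]
        for name in MOLARITY_MOL_PER_L
    }
    # 1% N-lauroyl sarcosine, assumed w/v (1 g per 100 ml)
    masses["N-lauroyl sarcosine"] = volume_ml / 100.0
    return masses


for minutes in (30, 15):
    print(minutes, "min ->", round(filtered_volume_l(minutes), 1), "L")
for name, grams in buffer_masses_g(150).items():  # one capsule's worth
    print(f"{name}: {grams:.3f} g")
```

In practice the buffer would be prepared in bulk. Scaling `volume_ml` to the number of capsules gives the total masses.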
Site-scale variables were measured directly in the field at the sampling location. Width was measured using a measuring tape (decameter) for small streams (less than 15 metres wide and 1 metre deep) and an electronic telemeter (Bushnell Sport 850) for larger rivers. Water depth was measured using a graduated stick in small streams and a depth sounder (Plastimo echotest II) in larger rivers. Turbidity was measured using a Eutech Instruments turbidimeter (TN-100). Temperature, O₂ saturation, O₂ concentration and pH were measured using a WTW 3420 field multimeter. Geographical coordinates were obtained using a GPSmap 64S device (Garmin) or similar. Elevation was derived from the geographic coordinates using the SRTM30 dataset.
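Deriving elevation from coordinates amounts to locating the grid cell that contains each point. The sketch below shows a nearest-cell lookup (no interpolation) on a 30 arc-second grid, i.e. 120 cells per degree. It assumes the raster has already been loaded as a north-up 2-D list of elevations in metres and that the outer upper-left corner of the grid is known. Reading the actual SRTM30 tiles is not shown. The function names are hypothetical.

```python
import math

# 30 arc-second grid => 120 cells per degree of latitude/longitude
CELLS_PER_DEGREE = 120


def grid_cell(lat, lon, ul_lat, ul_lon, cells_per_degree=CELLS_PER_DEGREE):
    """(row, col) of the cell containing (lat, lon) in a north-up grid whose
    outer upper-left corner is at (ul_lat, ul_lon), in decimal degrees."""
    row = math.floor((ul_lat - lat) * cells_per_degree)  # rows increase southwards
    col = math.floor((lon - ul_lon) * cells_per_degree)  # columns increase eastwards
    return row, col


def elevation_at(grid, lat, lon, ul_lat, ul_lon, cells_per_degree=CELLS_PER_DEGREE):
    """Nearest-cell elevation (metres) from a pre-loaded 2-D list `grid`."""
    row, col = grid_cell(lat, lon, ul_lat, ul_lon, cells_per_degree)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError(f"({lat}, {lon}) falls outside the loaded grid")
    return grid[row][col]


# Toy 2 x 2 grid with its upper-left corner at (0, 0); values are made up
toy = [[10, 20], [30, 40]]
print(elevation_at(toy, -0.01, 0.001, 0.0, 0.0))
```

A ~1 km cell is much coarser than the ~30 m positional uncertainty estimated above. Elevation values should therefore be read as cell averages rather than point measurements.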

Geographic coverage
Description: The sampling area is delimited by the current administrative boundaries of French Guiana. To the east, the Oyapock river forms the border with Brazil. To the west, the Maroni river forms the border with Suriname. This detail matters because the boundaries of the territory have not been constant throughout history: a large portion of northern Brazil was disputed between France and Brazil during the 19th century. Even though French Guiana is an overseas territory of France, all occurrences are assigned to French Guiana as a "country" to comply with the ISO 3166-1 standard.
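In practice, assigning occurrences to French Guiana as an ISO 3166-1 "country" means using its own code (alpha-2 "GF") rather than France's. The following is a minimal record check, assuming a simple hypothetical occurrence structure. The bounding box is an approximate, deliberately padded rectangle that serves only as a coarse sanity filter. It cannot reproduce the river borders described above, so points near the Maroni or Oyapock still need a proper boundary test.

```python
from dataclasses import dataclass

# ISO 3166-1 alpha-2 code for French Guiana
ISO_ALPHA2 = "GF"

# Hypothetical, padded bounding box (decimal degrees, WGS84); approximate only
LAT_MIN, LAT_MAX = 2.0, 6.0
LON_MIN, LON_MAX = -54.7, -51.5


@dataclass
class Occurrence:
    """Hypothetical minimal occurrence record."""
    site_code: str
    latitude: float
    longitude: float
    country_code: str = ISO_ALPHA2


def check_occurrence(occ: Occurrence) -> list[str]:
    """Return a list of problems found for one occurrence (empty if none)."""
    problems = []
    if occ.country_code != ISO_ALPHA2:
        problems.append(f"country code {occ.country_code!r} is not {ISO_ALPHA2!r}")
    if not LAT_MIN <= occ.latitude <= LAT_MAX:
        problems.append(f"latitude {occ.latitude} outside {LAT_MIN}..{LAT_MAX}")
    if not LON_MIN <= occ.longitude <= LON_MAX:
        problems.append(f"longitude {occ.longitude} outside {LON_MIN}..{LON_MAX}")
    return problems


print(check_occurrence(Occurrence("S1", 4.93, -52.33)))        # near Cayenne
print(check_occurrence(Occurrence("S2", 4.93, -52.33, "FR")))  # wrong code
```

Returning a list of problems, rather than a boolean, makes it easy to report every issue in a record at once.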

Taxonomic coverage
Description: The dataset provides information on eDNA sampling sites and on fish presence/absence as inferred from metabarcoding analyses. In theory, DNA extracted from the sampling cartridge could be used to amplify any taxonomic group, depending on the downstream molecular biology protocols. Local metabarcoding reference databases for French Guiana biodiversity are currently available for mammals (Kocher et al. 2017a, Kocher et al. 2017b) and insects (Kocher et al. 2016, Talaga et al. 2017), and additional databases are under active development for other groups.
Data format: ESRI Shapefile (a spreadsheet in "tab separated value" format is also provided for compatibility).
Description: This dataset provides detailed information on sampling sites and sampling events. The latest version of the dataset is available on the CEBA geoportal (http://vmcebagn-dev.ird.fr) under reference 5617a9ff-d0aa-48a9-b2c2-cb7fd5b92692.

Site code
A unique identifier of the site that could be used for downstream analyses (optional).

Site name
The name of the sampling location.

Site description
The original textual description of the site.

Latitude
The geographic latitude (in decimal degrees, WGS84) of the sampling point.

Longitude
The geographic longitude (in decimal degrees, WGS84) of the sampling point.

Elevation
Altitude in metres above sea level, inferred from the geographic coordinates and the SRTM30 dataset.

Watercourse class
The watercourse class, inferred a posteriori from the BD Carthage dataset.

Event date
The date of the sampling event.

Disturbance
Level of disturbance at the site, estimated a priori: Reference (undisturbed site), gold mining, ancient gold mining, agriculture and/or urbanisation.

Depth
Water depth in metres, measured at the sampling site.

Width
Watercourse width (in metres) measured at the sampling site.

Conductivity
Water conductivity (in µS/cm) measured at the sampling site using a WTW 3420 field multimeter.

Data format: Spreadsheet in "tab separated value" format.
Description: This dataset provides a taxon by site matrix, built after assigning sequences to the reference database. For taxa identified at the genus level or higher, the number of included species is indicated in parentheses. The latest version of the dataset is available on the CEBA geoportal (http://vmcebagn-dev.ird.fr) under reference 5617a9ff-d0aa-48a9-b2c2-cb7fd5b92692.
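A taxon by site matrix in tab-separated format can be summarised with the standard library alone. The sketch below makes three assumptions:

- The first column holds the taxon label and every other column is a site.
- Cells contain 1 (detected) or 0 (not detected).
- Higher-level taxa carry their species count in parentheses, as described above.

The column layout and the example labels ("SpeciesA", "GenusB (3)") are hypothetical and do not come from the published file. Richness here counts matrix rows (taxa), not species.

```python
import csv
import io
import re

# Matches labels such as "GenusB (3)": a taxon name followed by a species count
COUNT_IN_PARENS = re.compile(r"^(?P<name>.+?)\s*\((?P<n>\d+)\)\s*$")


def parse_taxon_label(label: str) -> tuple[str, int]:
    """Split a label into (name, number of included species); labels
    without a parenthesised count are assumed to denote a single species."""
    match = COUNT_IN_PARENS.match(label)
    if match:
        return match.group("name"), int(match.group("n"))
    return label.strip(), 1


def read_matrix(tsv_text: str) -> tuple[list[str], dict[str, dict[str, int]]]:
    """Parse the assumed layout: first column = taxon, other columns = sites (0/1)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    sites = header[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        taxon, values = row[0], row[1:]
        matrix[taxon] = {site: int(v) for site, v in zip(sites, values)}
    return sites, matrix


def taxon_richness(sites, matrix):
    """Number of taxa (rows) detected at each site."""
    return {site: sum(presence[site] for presence in matrix.values()) for site in sites}


# Toy example, not taken from the dataset
example = "Taxon\tS1\tS2\nSpeciesA\t1\t0\nGenusB (3)\t1\t1\n"
sites, matrix = read_matrix(example)
print(taxon_richness(sites, matrix))  # {'S1': 2, 'S2': 1}
print(parse_taxon_label("GenusB (3)"))  # ('GenusB', 3)
```

A genus-level row such as "GenusB (3)" is counted once. The assignment cannot tell which of its three candidate species was present, so summing the parenthesised counts would overstate richness.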