Biodiversity Data Journal :
Data Paper (Biosciences)
Corresponding author: Jonas Mortelmans (jonas.mortelmans@vliz.be)
Academic editor: Yasen Mutafchiev
Received: 04 Aug 2020 | Accepted: 28 Oct 2020 | Published: 16 Dec 2020
© 2020 Luz Amadei Martínez, Jonas Mortelmans, Nick Dillen, Elisabeth Debusschere, Klaas Deneudt
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Amadei Martínez L, Mortelmans J, Dillen N, Debusschere E, Deneudt K (2020) LifeWatch observatory data: phytoplankton observations in the Belgian Part of the North Sea. Biodiversity Data Journal 8: e57236. https://doi.org/10.3897/BDJ.8.e57236
This paper describes a phytoplankton data series generated through systematic observations in the Belgian Part of the North Sea (BPNS). Phytoplankton samples were collected during multidisciplinary sampling campaigns, visiting nine nearshore stations with monthly frequency and an additional eight offshore stations on a seasonal basis.
The data series contains taxon-specific phytoplankton densities determined by analysis with the Flow Cytometer And Microscope (FlowCAM®) and associated image-based classification. The classification is performed by two separate semi-automated classification systems, followed by manual validation by taxonomic experts. To date, 637,819 biological particles have been collected and identified, yielding a large dataset of validated phytoplankton images. The collection and processing of the 2017–2018 dataset are described, along with its data curation, quality control and data storage. In addition, the classification of images from 2019 onwards, using algorithms based on convolutional neural networks (CNN), is described. Data are published in a standardised format together with environmental parameters, accompanied by extensive metadata descriptions and labelled with digital identifiers for traceability. The data are published under a CC‐BY 4.0 licence, allowing the use of the data under the condition of providing the reference to the source.
phytoplankton, Belgium, marine, LifeWatch Belgium, FlowCAM, image recognition
Phytoplankton contributes to almost half of the Earth’s total primary production (
The availability of long-term phytoplankton observational data for the Belgian Part of the North Sea (BPNS) is limited. In the last decades, several studies have described the Belgian phytoplankton community structure (
In general, long-term time series of phytoplankton are hard to come by (
In response to the identified data gap for the BPNS and taking into account the availability of the newest imaging technology, a long-term phytoplankton observation effort was initiated as part of the Flemish contribution to LifeWatch. Multidisciplinary sampling campaigns are organised in the BPNS on a regular basis, collecting phytoplankton samples that are processed with a digital imaging flow cytometer (FlowCAM). The procedures put in place for automated processing and manual validation provide a durable approach for generating a long-term, high-quality phytoplankton time series.
LifeWatch observatory data: phytoplankton observations by imaging flow cytometry (FlowCam) in the Belgian Part of the North Sea
Deneudt K.; Mortelmans J.; Muyle J.; Debusschere E.; Dillen N.; Amadei Martínez L.
The BPNS is located in the Southern Bight of the North Sea. It is characterised by shallow waters (< 40 m) and strong semi-diurnal tidal currents resulting in a vertically homogeneous water column (
Stations are visited during one- to three-day sampling cruises with the RV Simon Stevin at a monthly or seasonal frequency. Sampling activities onboard are registered in the Marine Information Data Acquisition System (MIDAS). Through MIDAS, scientists can record the metadata of their scientific actions (e.g. time, coordinates, action type, start and stop of the action, station, status of deployment and notes). MIDAS also registers navigation data (heading, current time, latitude, longitude, speed, course over ground, navigation depth and draught), together with meteorological data (air temperature, relative humidity, wind direction and speed) and oceanographic data (sea surface water temperature, salinity, chlorophyll-a and sound velocity). This information is synchronised with the VLIZ ICT network every 24 hours and is made available online through the VLIZ website.
A spatial grid of 17 stations, spread over the BPNS, has been sampled since May 2017 (Fig.
Study sites in the Belgian Part of the North Sea (BPNS). Nine nearshore stations (black points), visited monthly: 120 (
Surface water samples are collected at every station, fixed with acid Lugol's solution (5%) and stored in cold (4°C) and dark conditions. Once in the lab, samples are processed with the FlowCAM using the 300-µm deep flow cell with the 4X objective, capturing particles with an Equivalent Spherical Diameter (ESD) between 70 and 300 µm in 2017 and between 55 and 300 µm from 2018 onwards. In 2017 and 2018, the collected images were assigned to a taxon using the autoclassification tool of VisualSpreadsheet, after which a taxonomist validated the automatic classification. From 2019 onwards, images are classified using algorithms based on convolutional neural networks (CNN), trained on the validated images from 2017 and 2018.
The output of both classification processes is manually validated by an experienced taxonomist to remove errors in the automatic prediction. In this step, the taxonomist checks that every imaged particle has been assigned to the correct category by the automatic classification; if not, the particle is manually moved to the right category. The taxonomist reviews all particles twice to correct possible misclassifications. The species identification is done with the help of
Sampling at sea
The phytoplankton samples are collected with a stainless steel bucket. In total, either 50 or 70 litres of surface water are hauled onboard and poured into an Apstein net (1.2 m long, 55 µm mesh size and 50 cm diameter). The volume of water collected is documented in MIDAS. The sample is concentrated in a plastic jar at the cod-end of the net, where sample water and rinsing water escape through a 55 µm mesh window. Immediately afterwards, the sample is preserved in acid Lugol’s solution at a 5% final concentration and stored onboard in dark conditions at 4°C. At the end of the sampling campaign, the samples are transported to and stored in the Marine Station Ostende (MSO) at 4°C until processing. The sample material remaining after processing is available to researchers for re-use.
FlowCAM processing
Within three months after collection, the samples are processed using the FlowCAM VS-4 (Fluid Imaging Technologies, Yarmouth, Maine, U.S.A.) and the software VisualSpreadsheet® Version 4.2.52. FlowCAM combines the technologies of flow cytometry, microscopy and image analysis (
For this dataset, the 300-µm deep flow cell with the 4X objective and the 5 ml syringe pump are used. This combination maximises the taxonomic resolution for the size range of interest without compromising the running time. Sample preservation with Lugol negates the ability to discriminate cells from detritus through the detection of chlorophyll (
Attachment of diatoms with spines to the flow cell wall (e.g. Chaetoceros Ehrenberg) and aggregation of chain-forming diatoms (e.g. Bellerochea) often interfere with the sample processing. To minimise clogging and to increase the durability of the flow cell, each sample is pre-filtered through a 300-µm mesh-size net (
To convert FlowCAM cell counts to phytoplankton abundance (cells/L), we used the following formula:
\(\text{Abundance (cells/L)} = \dfrac{\text{count} \times \text{Vol. sample (mL)}}{\text{Vol. imaged (mL)} \times \text{Dilution factor} \times \text{Vol. filtered (L)}}\)
where Abundance is defined as the number of cells in a litre of the unfiltered water sample, Vol. imaged is the volume in the field of view of each sample, Vol. filtered is the volume poured into the Apstein net and Vol. sample is the volume of sample remaining after filtration in the Apstein net.
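The conversion can be illustrated with a minimal Python sketch. It assumes the dimensionally consistent reading of the formula: the count is divided by the imaged volume and the dilution factor to give cells per mL of concentrated sample, scaled by the concentrated sample volume, and divided by the volume of water poured into the net. Treating the dilution factor as a divisor is an assumption, as are the function name, the parameter names and the example numbers, none of which come from the dataset.

```python
def abundance_cells_per_litre(count, vol_imaged_ml, dilution_factor,
                              vol_filtered_l, vol_sample_ml):
    """Convert a FlowCAM particle count to cells per litre of unfiltered water.

    All names are hypothetical; the dilution factor is assumed to act as a divisor.
    """
    if min(vol_imaged_ml, dilution_factor, vol_filtered_l, vol_sample_ml) <= 0:
        raise ValueError("volumes and dilution factor must be positive")
    # Cells per mL of the concentrated sample that was run through the FlowCAM.
    cells_per_ml = count / (vol_imaged_ml * dilution_factor)
    # Total cells retained in the concentrated sample.
    total_cells = cells_per_ml * vol_sample_ml
    # Cells per litre of the seawater that was poured into the net.
    return total_cells / vol_filtered_l


# Invented example values: 200 cells counted in 0.5 mL imaged, no dilution,
# 250 mL of concentrate obtained from 50 L of seawater.
example = abundance_cells_per_litre(
    count=200, vol_imaged_ml=0.5, dilution_factor=1.0,
    vol_filtered_l=50.0, vol_sample_ml=250.0,
)
print(example)  # 2000.0 cells/L
```

With these invented numbers, 400 cells/mL in the concentrate correspond to 100,000 cells retained from 50 L, i.e. 2,000 cells/L.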
Semi-automatic classification with VisualSpreadsheet (2017-2018)
A reference library with phytoplankton images for the Southern Bight of the North Sea is created using the autoclassification tool of VisualSpreadsheet and manual validation. Following software recommendations, the reference library consists of various categories, each containing 10-20 images (regions of interest; ROIs) and covering a species or, where identification at species level is not possible, a higher taxon group. Such a category is called a "class" in VisualSpreadsheet and, based on the images in each category, filters are defined. A category can contain several filters to represent different orientations or developmental stages of the same taxon (e.g. Chaetoceros in valve view or girdle view). The combination of categories and their filters is stored as a learning set that is used to run an Auto Classification and assign the sample particles to different categories and taxon groups. In addition, separate library categories are created for non-phytoplanktonic particles (e.g. crustacea, eggs, detritus). Due to the large diversity of taxa in the samples and the variation in species composition over the year, the combination of categories used in the learning set needs to be adapted regularly. Only the categories of the taxa expected to be present are used. Categories and their filters are applied in order from the most abundant to the least abundant taxon. The obtained classification is validated manually by taxonomic experts.
Semi-automatic classification with CNNs (2019 - current)
Since 2019, the classification of our FlowCAM images is facilitated by using deep learning classifiers, more specifically CNNs. One of the prerequisites for allowing the use of deep learning classifiers is the availability of a large training dataset. Once our validated FlowCAM dataset (2017-2018) was sufficiently large, it became possible to shift towards CNNs for class prediction of the images. The main benefit of using CNNs is the increased classification accuracy, reducing the time spent by trained taxonomists to validate the data afterwards. Consequently, this also allows the data to be released to the public sooner.
The current iteration of the CNN in use is the one provided and trained by Instituto de Física de Cantabria (IFCA, Spain) (
Moving towards a new classification methodology also offers opportunities to further automate and standardise our FlowCAM data processing pipeline. In the new setup, raw output files from the FlowCAM are processed directly by a set of Python scripts. The typical “FlowCAM-collages” are cropped into separate ROIs, a clean data table describing all ROIs is generated and additional sample processing metadata are incorporated into the output directory. This avoids the use of VisualSpreadsheet, allowing greater and easier control over the data, as well as enabling automation of the dataflow. The generated files are uploaded to a MongoDB server, where they are classified by the CNN.
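The cropping step can be sketched as follows. The actual scripts are not reproduced in this paper, so this is only an illustration of the idea under stated assumptions: the collage is represented as a nested list of pixel rows (in practice an imaging library would be used), and the ROI table keys `id`, `x`, `y`, `width` and `height` are hypothetical names, not the FlowCAM export columns.

```python
def crop_rois(collage, roi_table):
    """Cut each region of interest out of a collage image.

    collage   -- list of pixel rows (a stand-in for a real image array)
    roi_table -- list of dicts with hypothetical keys id, x, y, width, height
    Returns a dict mapping each ROI id to its cropped pixel rows.
    """
    n_rows = len(collage)
    n_cols = len(collage[0]) if collage else 0
    rois = {}
    for row in roi_table:
        x, y, w, h = row["x"], row["y"], row["width"], row["height"]
        # Reject boxes that fall outside the collage instead of silently clipping.
        if x < 0 or y < 0 or x + w > n_cols or y + h > n_rows:
            raise ValueError(f"ROI {row['id']} lies outside the collage")
        rois[row["id"]] = [line[x:x + w] for line in collage[y:y + h]]
    return rois


# Toy 3 x 4 collage holding three particles, with invented pixel values.
collage = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
]
roi_table = [
    {"id": "roi_1", "x": 0, "y": 0, "width": 2, "height": 2},
    {"id": "roi_2", "x": 2, "y": 0, "width": 2, "height": 2},
    {"id": "roi_3", "x": 0, "y": 2, "width": 4, "height": 1},
]
rois = crop_rois(collage, roi_table)
print(rois["roi_2"])  # [[1, 1], [1, 1]]
```

Keeping the ROI identifier as the dictionary key mirrors the idea of a data table that describes every cropped image, so that each image can later be matched to its predicted class.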
Data were collected at 17 stations across the BPNS (Fig.
51°5'21.5"N and 51°52'34"N Latitude; 3°22'13.4"E and 2°14'8"E Longitude.
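Because the published tables store positions as decimalLatitude and decimalLongitude, it may help to see the bounding coordinates above expressed in decimal degrees. The sketch below is a generic degrees-minutes-seconds conversion; the function name is hypothetical and the rounded values in the comments are approximate.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes and seconds to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative, matching the Darwin Core convention.
    return -value if hemisphere in ("S", "W") else value


lat_south = dms_to_decimal(51, 5, 21.5, "N")   # about 51.0893
lat_north = dms_to_decimal(51, 52, 34, "N")    # about 51.8761
lon_east = dms_to_decimal(3, 22, 13.4, "E")    # about 3.3704
lon_west = dms_to_decimal(2, 14, 8, "E")       # about 2.2356
print(round(lat_south, 4), round(lat_north, 4),
      round(lon_west, 4), round(lon_east, 4))
```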
The dataset is composed of 55 categories identified at species level or higher taxon group if the identification at species level is not possible. Bacillariophyceae (33 taxa) and Dinophyceae (7 taxa) are the most abundant phytoplankton classes in the dataset, the rest of the dataset being formed by non-phytoplanktonic categories (15).
The validated dataset shows that, from May 2017 to December 2018, diatoms (Bacillariophyceae; 310,132 ROIs) such as Rhizosolenia (117,183 ROIs), Guinardia flaccida (32,486 ROIs), Pseudo-nitzschia (28,285 ROIs) and Ditylum brightwellii (24,989 ROIs) were the most abundant taxa. Among dinoflagellates (Dinophyceae; 6,044 ROIs), Tripos fusus (4,616 ROIs) was the most abundant species (Fig.
Rank | Scientific Name |
---|---|
class | Appendicularia |
species | Corethron criophilum Castracane, 1886 |
genus | Licmophora C.A. Agardh, 1827 |
genus | Diploneis (C. G. Ehrenberg) P.T. Cleve, 1894 |
species | Plagiogramma vanheurckii Grunow, 1881 |
species | Triceratium alternans f. alternans J.W. Bailey, 1851 |
genus | Leptocylindrus P.T. Cleve in C.G.J. Petersen, 1889 |
species | Triceratium favus Ehrenberg, 1839 |
genus | Plagiogramma / Bellerochea |
species | Plagiogramma brockmanni var. brockmanni Hustedt, 1939 |
species | Lithodesmium undulatum Ehrenberg, 1839 |
species | Rhizosolenia robusta var. robusta Norman ex Ralfs in Pritchard, 1861 |
species | Navicula membranacea Cleve, 1897 |
genus | Skeletonema R.K. Greville, 1865 |
genus | Proboscia B.G. Sundstrom, 1986 |
genus | Asterionella A.H. Hassall, 1850 |
genus | Bacteriastrum G. Shadbolt, 1854 |
species | Rhizosolenia delicatula Cleve, 1900 |
genus | Paralia P.A.C. Heiberg, 1863 |
species | Bellerochea horologicalis Stosch, 1980 |
species | Vibrio paxillifer O.F.Müller, 1786 |
species | Stephanopyxis turris (Greville) Ralfs, 1861 |
species | Helicotheca tamesis (Shrubsole) M.Ricard, 1987 |
genus | Synedra / Thalassionema |
genus | Eucampia C.G. Ehrenberg, 1839 |
species | Eucampia striata Stolterfoth, 1879 |
species | Lauderia annulata Cleve, 1873 |
genus | Chaetoceros C.G. Ehrenberg, 1844 |
order | Eupodiscales / Biddulphiales / Triceratiales |
species | Ditylum brightwellii (T.West) Grunow, 1885 |
genus | Pseudo-nitzschia H. Peragallo in H. Peragallo & M. Peragallo, 1900 |
species | Rhizosolenia flaccida Castracane, 1886 |
genus | Rhizosolenia T. Brightwell, 1858 |
genus | Acineta Ehrenberg, 1834 |
species | Favella ehrenbergii (Claparède & Lachmann, 1858) Jörgensen, 1924 |
subphylum | Crustacea |
genus | Pyrocystis J.Murray ex Haeckel, 1890 |
species | Tripos fusus (Ehrenberg) F.Gómez, 2013 |
species | Tripos lineatus (Ehrenberg) F.Gómez, 2013 |
genus | Tripos Bory de Saint-Vincent, 1823 |
class | Dinophyceae |
genus | Noctiluca Suriray, 1836 |
phylum | Foraminifera |
phylum | Cnidaria |
phylum | Echinodermata |
class | Polychaeta |
See Fig.
The dataset is licensed under a Creative Commons CC-BY 4.0 licence, allowing the use of the data under the condition of providing the reference to the original source. When using the data in publications, acknowledgement of LifeWatch is required. This can be done by citing the dataset version used, for example, “Flanders Marine Institute (VLIZ), Belgium (2020): LifeWatch observatory data: phytoplankton observations by imaging flow cytometry (FlowCAM) in the Belgian Part of the North Sea. https://doi.org/10.14284/424”, and by referring to the current data paper.
Column label | Column description |
---|---|
id | An identifier for the set of information associated with an Event (something that occurs at a place and time). May be a global unique identifier or an identifier specific to the dataset. |
type | The nature or genre of the resource. |
modified | The most recent date-time on which the resource was changed. |
language | The language of the resource. |
rightsHolder | A person or organisation owning or managing rights over the resource. |
accessRights | Information about who can access the resource or an indication of its security status. Access Rights may include information regarding access or restrictions based on privacy, security, or other policies. |
datasetName | The name identifying the dataset from which the record was derived. |
ownerInstitutionCode | The name (or acronym) in use by the institution having ownership of the object(s) or information referred to in the record. |
eventID | An identifier for the set of information associated with an Event (something that occurs at a place and time). May be a global unique identifier or an identifier specific to the dataset. |
parentEventID | An identifier for the broader Event that groups this and potentially other Events. |
samplingProtocol | The method or protocol used during an Event. |
eventDate | The date-time or interval during which an Event occurred. For occurrences, this is the date-time when the event was recorded. Not suitable for a time in a geological context. |
locationID | An identifier for the set of location information (data associated with dcterms:Location). May be a global unique identifier or an identifier specific to the dataset. |
waterBody | The name of the water body in which the Location occurs. |
country | The name of the country or major administrative unit in which the Location occurs. |
countryCode | The standard code for the country in which the Location occurs. |
minimumDepthInMeters | The lesser depth of a range of depth below the local surface, in metres. |
maximumDepthInMeters | The greater depth of a range of depth below the local surface, in metres. |
decimalLatitude | The geographic latitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a Location. Positive values are north of the Equator, negative values are south of it. Legal values lie between -90 and 90, inclusive. |
decimalLongitude | The geographic longitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a Location. Positive values are east of the Greenwich Meridian, negative values are west of it. Legal values lie between -180 and 180, inclusive. |
Column label | Column description |
---|---|
id | An identifier for the MeasurementOrFact (information pertaining to measurements, facts, characteristics or assertions). May be a global unique identifier or an identifier specific to the dataset. |
occurrenceID | An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique. |
measurementType | The nature of the measurement, fact, characteristic or assertion. |
measurementTypeID | An identifier for the nature of the measurement, fact, characteristic or assertion. |
measurementValue | The value of the measurement, fact, characteristic or assertion. |
measurementValueID | An identifier for the value of the measurement, fact, characteristic or assertion. |
measurementUnit | The units associated with the measurementValue. |
measurementUnitID | An identifier for the units associated with the measurementValue. |
measurementDeterminedBy | A list (concatenated and separated) of names of people, groups or organisations who determined the value of the MeasurementOrFact. |
measurementMethod | A description of or reference to (publication, URI) the method or protocol used to determine the measurement, fact, characteristic or assertion. |
measurementRemarks | Comments or notes accompanying the MeasurementOrFact. |
Column label | Column description |
---|---|
id | An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique. |
modified | The most recent date-time on which the resource was changed. |
basisOfRecord | The specific nature of the data record. |
occurrenceID | An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique. |
occurrenceStatus | A statement about the presence or absence of a Taxon at a Location. |
eventID | An identifier for the set of information associated with an Event (something that occurs at a place and time). May be a global unique identifier or an identifier specific to the dataset. |
scientificNameID | An identifier for the nomenclatural (not taxonomic) details of a scientific name. |
scientificName | The full scientific name, with authorship and date information, if known. When forming part of an Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the IdentificationQualifier term. |
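
The three tables above are linked through shared identifiers: each occurrence points to its sampling event through eventID, and each measurement points to its occurrence through occurrenceID. The following minimal Python sketch shows one way to join them into flat rows. All record values (identifiers, dates, station label, measurement type and abundances) are invented for illustration and are not taken from the dataset; only the column names come from the tables.

```python
# Invented example records using the column names described above.
events = [
    {"eventID": "evt-1", "eventDate": "2018-05-14", "locationID": "station-A"},
]
occurrences = [
    {"occurrenceID": "occ-1", "eventID": "evt-1",
     "scientificName": "Ditylum brightwellii"},
    {"occurrenceID": "occ-2", "eventID": "evt-1",
     "scientificName": "Tripos fusus"},
]
measurements = [
    {"occurrenceID": "occ-1", "measurementType": "abundance",
     "measurementValue": "1200", "measurementUnit": "cells/L"},
    {"occurrenceID": "occ-2", "measurementType": "abundance",
     "measurementValue": "35", "measurementUnit": "cells/L"},
    # A measurement whose occurrence is missing is skipped by the join.
    {"occurrenceID": "occ-9", "measurementType": "abundance",
     "measurementValue": "7", "measurementUnit": "cells/L"},
]


def flatten(events, occurrences, measurements):
    """Join measurement -> occurrence -> event into one flat row per measurement."""
    event_by_id = {e["eventID"]: e for e in events}
    occurrence_by_id = {o["occurrenceID"]: o for o in occurrences}
    rows = []
    for m in measurements:
        occ = occurrence_by_id.get(m["occurrenceID"])
        if occ is None:
            continue  # no matching occurrence record
        evt = event_by_id.get(occ["eventID"])
        if evt is None:
            continue  # no matching event record
        rows.append({
            "eventDate": evt["eventDate"],
            "locationID": evt["locationID"],
            "scientificName": occ["scientificName"],
            "measurementType": m["measurementType"],
            "measurementValue": m["measurementValue"],
            "measurementUnit": m["measurementUnit"],
        })
    return rows


rows = flatten(events, occurrences, measurements)
for r in rows:
    print(r["eventDate"], r["scientificName"],
          r["measurementValue"], r["measurementUnit"])
```

Building lookup dictionaries first keeps the join linear in the number of records, which matters for a dataset with several hundred thousand particles.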
Data are made available through the LifeWatch data explorer (
Monitoring of phytoplankton via the FlowCAM is part of a long-term ESFRI initiative. Regular updates of the validated data are accessible on the LifeWatch data explorer and a yearly dataset is published on the Marine Data Archive (MDA). Valorisation of these data is ongoing in the framework of the Marine Strategy Framework Directive (MSFD) and in support of blue economy research (for example, fouling management, nature-based solutions and aquaculture), and the data are also used in an artificial intelligence application study.
Funding for the data collection and management is provided by the Research Foundation - Flanders (FWO) in the framework of the Flemish contribution to LifeWatch, which is a landmark European Research Infrastructure on the European Strategy Forum on Research Infrastructures (ESFRI) roadmap. Scientists and RV Simon Stevin crew joining the LifeWatch sampling campaigns are acknowledged for their practical support. The authors thank the Flemish Ministry of Mobility and Public Works (VLOOT) for operating the RV Simon Stevin and facilitating the surveys. We thank the reviewer for the very helpful comments.