Biodiversity Data Journal :
Software Description
|
Corresponding author:
Academic editor: Dmitry Schigel
Received: 28 Dec 2015 | Accepted: 22 Mar 2016 | Published: 23 Mar 2016
© 2016 Sonja Leidenberger, Martin Käck, Björn Karlsson, Oskar Kindvall
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Leidenberger S, Käck M, Karlsson B, Kindvall O (2016) The Analysis Portal and the Swedish LifeWatch e-infrastructure for biodiversity research. Biodiversity Data Journal 4: e7644. https://doi.org/10.3897/BDJ.4.e7644
|
During the last years, more and more online portals were generated and are now available for ecologists to run advanced models with extensive data sets. Some examples are the Biodiversity Virtual e-Laboratory (BioVel) Portal (https://portal.biovel.eu) for ecological niche modelling and the Mobyle SNAP Workbench (https://snap.hpc.ncsu.edu) for evolutionary and population genetics analysis. Such portals have the main goal to facilitate the run of advanced models, through access to large-capacity computers or servers. In this study, we present the Analysis Portal (www.analysisportal.se), which is a part of the Swedish LifeWatch e-infrastructure for biodiversity research that combines a variety of Swedish web services to perform different kinds of dataprocessing.
For the first time, the Swedish Analysis Portal for integrated analysis of species occurrence data is described in detail. It was launched in 2013 and today, over 60 Million Swedish species observation records can be assessed, visualized and analyzed via the portal. Datasets can be assembled using sophisticated filtering tools, and combined with environmental and climatic data from a wide range of providers. Different validation tools, for example the official Swedish taxon concept database Dyntaxa, ensure high data quality. Results can be downloaded in different formats as maps, tables, diagrams and reports.
Analysis Portal, biodiversity informatics, e-infrastructure, e-sciences, Swedish LifeWatch, web services
Biodiversity informatics is a young discipline. The term was probably firstly used in 1996 by J.L Edwards in an unpublished document by the OECD Working Group on Biological Informatics – 1st meeting, notes on the draft agenda, Washington (see www.bgbm.org/BioDivInf/TheTermn.htm). It encompasses information techniques for all kinds of biodiversity information and a framework for data sharing to address questions in the field of conservation management and ecosystem services.
The LifeWatch concept was begun to build an e-Science European infrastructure for biodiversity and ecosystem research in January 2011, where representatives from Hungary, Italy, the Netherlands, Romania and Spain signed the first Memorandum of Understanding. The idea was to construct a European research e-infrastructure project for biodiversity science and ecosystems research (www.lifewatch.eu). The LifeWatch architecture is described in detail elsewhere (
The Swedish LifeWatch consortium was founded in 2010 and is a national collaboration between four universities (the Swedish University of Agricultural Sciences (SLU), the University of Gothenburg (GU), Lund University (LU) and Umeå University (UMU)), the Swedish Meteorological and Hydrological Institute (SMHI) and the Swedish Museum of Natural History (SMNH), including the Swedish GBIF node. The project is coordinated by the Swedish Species Information Centre (ArtDatabanken) at SLU and financed by the Swedish Research Council (VR) and the Swedish Environmental Protection Agency.
The principal aim was to create a national web service oriented e-infrastructure for biodiversity data covering terrestrial, freshwater and marine habitats, and where all components are accessible through open web services (
This article describes the Analysis Portal (version 1.0.5821.25115, release date: 09/12/2015), which is a part of the Swedish national e-infrastructure of biodiversity data. In this software description, we refer to the current version named above. We describe the structure of this web application and the functionality of its different components. We also give a detailed presentation of all the web services the Analysis Portal is built upon (Supplementary materials 1-3).
Swedish LifeWatch Analysis Portal - a national e-infrastructure for biodiversity research
The Analysis Portal (AP) (Fig.
Data providers
Numerous biodiversity occurrence databases are connected to the Analysis Portal via the Taxon Observation Service. The AP offers an entry node to all of these databases, currently providing access to around 60 million species observation records (Fig.
The most comprehensive database on Swedish biodiversity data is the Swedish Species Observation System (http://www.artportalen.se) financed by the Swedish Environmental Protection Agency. Since 2000 it has been an internet-based, freely accessible reporting system and data repository for geo-referenced species observations of the major organisms, i.e. animals, plants and fungi. Today it provides more than 50,000,000 observations of 28,660 different species (
The Swedish Species Observation Centre also collaborates on the observation database of red-listed species with around 1 million occurrence data. The database contains only protected species and permission is required to have access to it. This is handled by the UserAdministration (see below: Data Management and Cleaning) where the rapporteur (http://www.artportalen.se/Home/TermsOfUse) and the user have to follow a certain policy formulated by the Swedish Species Information Centre (http://www.artdatabanken.se/verksamhet-och-uppdrag/arter-kunskapsinsamling/fynd-av-arter/skyddsklassade-arter/). In future, this data set will be integrated into the Swedish Species Observation System. A similar integration is planned for the freshwater fish database PIKE (Umeå University; http://www.emg.umu.se/english/research/research-projects/pike/?languageId=1#).
Another extensive database that is available via the AP is the national register of survey test-fishing (NORS, http://www.slu.se/en/departments/aquatic-resources/databases/national-register-of-survey-test-fishing-nors/) with more than 2,000,000 records on caught fish dating back to the 1950s. The standardized test-fishing with Nordic multi-mesh gillnets (12 mesh sizes) became a European standard (EN 14757) in 2005 and gives a good overview of species occurrence and abundance of fish fauna in lakes. For rivers, the Swedish Electrofishing RegiSter (SERS, http://www.slu.se/en/faculties/nl/about-the-faculty/departments/department-of-aquatic-resources/databases/database-for-testfishing-in-streams/) exists in Sweden. The database was begun in 1989 and most electric fishing is ordered by the Swedish Environmental Protection Agency and Swedish Agency for Marine and Water Management, that also finance part of data hosting. Both fish databases provide fairly good predictions of absence data. The coastal fish database (KUL, http://www.slu.se/en/departments/aquatic-resources/databases/database-for-coastal-fish-kul/) is currently in line to be linked to the AP. The data consist of fish catches with net and fyke net in Swedish coastal waters. The Department of Aquatic Resources at the Swedish University of Agricultural Sciences and the Swedish Agency for Marine and Water Management host the data.
Marine environmental monitoring data are represented by the Swedish Ocean Archive, the so-called SHARK database (www.sharkdata.se), with around 70,000 species occurrence data of zoo- and phytoplankton recorded by the Swedish Meteorological and Hydrological Institute (SMHI). Species observations on freshwater species are represented by the environmental data web service MVM (http://miljodata.slu.se/mvm/) linked with nearly 500,000 species occurrence data to the AP. The Department of Aquatic Sciences and Assessment at the Swedish University of Agricultural Sciences is the data host for inland waters and the MVM database containing data on national and regional monitoring of phytoplankton, benthic diatoms, macrophytes, zooplankton and benthic fauna.
Historical data sets are also linked to the AP through DINA (Digital Information system for Natural history collections; www.dina-web.net/naturarv/), a Swedish system for digitalization of natural history collections in Stockholm (Swedish Museum of Natural History, SMNH) and Gothenburg (Göteborg Natural History Museum, GNHH). At the moment, more than 20,000 objects with coordinates are linked to the AP, but digitalization will increase substantially in the near future.
By using the GBIF web service (www.gbif.org), several databases of different Swedish museums (e.g. the herbariums of the Oskarshamn Botanical Museum and Umeå University) are available with all geo-referenced records in the AP. Furthermore, the Swedish database on harbour porpoises (SMNH) (> 2,000) is linked to the AP. The Swedish bird ring centre´s (SMNH) species occurrence records, numbering approx. 5 million, has recently be linked.
Wireless Remote Animal Monitoring (WRAM) is a national e-infrastructure for automatic reception, long-time storage, sharing and analysing biotelemetry sensor data from animals like mammals, birds and fish (
The long-time goal is to connect all national and regional Swedish databases to the AP via the Artportalen or directly to the Swedish Species Observation Service to make them accessible to researchers, policy-makers and citizen scientists through a single entry point. The number of occurrence reports in most of the databases increases substantially every year. Additionally, numerous Swedish databases at different host institutes are not connected today, for example national databases on seals, butterflies, hedgehog, moose and other game animals, as well as databases on invasive species, mussels, freshwater jellyfish and crayfish or the tree portal.
Data Management and Cleaning
The biodiversity data are structured in very different ways in the various databases that are connected to the Swedish LifeWatch e-infrastructure. In order to make all the data searchable at one single focal point, the data have to be restructured. Many of the data sources are sample-based and only contain partial information on taxon-specific occurrences. Darwin Terms (TDWG, http://rs.tdwg.org/dwc/terms/index.htm) provide a practical format suitable for sharing data on species observations and it is the main format adopted by the GBIF. For the Swedish LifeWatch e-infrastructure, a data model was constructed that could handle an infinite number of classes and data properties, including the set defined in Darwin Core (
The process of restructuring the data to fit Darwin Core and to enable a uniform search mechanism is supported by a set of web services. Each provider has published a web service where data can be retrieved according to the Darwin Core format. Regarding the taxonomic information only taxon id and the verbatim form of the scientific name, if available, is retrieved from the providers. Based on the taxon ids, the rest of the taxon-related information is retrieved directly from the Taxon Service, i.e. the SOAP-based web service behind the web application Dyntaxa (www.Dyntaxa.se). This solution ensures that the taxon-related search mechanisms provided by the Swedish LifeWatch focal point services are always based on taxon hierarchies and naming that is up to date irrespective of the status of local providers. As Dyntaxa is used when administrating changes of taxon concepts, the Taxon Service can provide the necessary information to resolve effects of lumps and splits in a correct manner.
A specific service named the Species Observation Harvest Service has been constructed in order to perform the harvesting and data processing necessary for compiling a unified dataset fulfilling all the requirements set by Swedish LifeWatch based on what is retrieved from each provider. The Harvest Service has methods to start and cancel harvests of single providers. To connect a new data provider service, one only needs to specify a mapping protocol that determines which field(s) in the Darwin Core set should correspond to the properties in the provider´s interface. Sometimes, data from the provider needs processing and sometimes default values are set for a single provider for certain fields.
The Harvest Service cleans the data in various ways by checking the content of each data field. If a data field contains values that do not correspond to the rules specified, the record is discarded and logged as erroneous. The error logs is a useful source of information in the process of increasing the quality of the harvesting mechanisms (Fig.
Overview of the web services used in the AP. The core web services (SOAP) are shown in blue boxes. Two of them handle species observations harvested from a number of data provider services (white boxes). Environmental data, maps and metadata of different kinds are connected to OGC services (WFS, WMS, CSW).
The user administration provides the user options for personalized settings and allows certain search enquiries to be saved and reused. The login is not necessary. The same user administration system is used for the Swedish Species Observation System (Artportalen), Dyntaxa and the AP. For each user, over 90% of observations are publicly accessible; exceptions exist only for a few sensitive species (http://www.artdatabanken.se/media/2157/skyddsklassade-arter-artdatabanken-150505.pdf). Access to sensitive species is subject to certain user requirements and permissions are handled via the user administration system (
Systems architecture
The AP is a web application that has been constructed according to the Model-View-Controller (MVC) design pattern using ASP.NET MVC. The application is not connected directly to any database; all data retrieval mechanisms, statistical calculations, and aggregations are instead performed by various web services (Fig.
Another implication of the service-oriented architecture is that the software is only loosely dependent on the underlying data sources. It should be possible to connect the AP to another set of services providing data from another region. Actually, if for example the GBIF web services were to be complemented with corresponding functionality, the AP could work on a global scale without much reconfiguration. This is an interesting idea indeed, as some of the most powerful methods provided by the AP or rather its underlying web services, would in many cases require uploads of hundreds of millions of records in order to perform the statistical processing based on the functions available in the RESTful web service API currently provided by GBIF. This is of course not really a feasible solution. However, by adding a couple of analytical functions (Table 1, Suppl. material
Services
The AP currently uses three types of service. Two belong to the suite of standardized OGC (Open Geospatial Consortium http://www.opengeospatial.org) services, i.e. the Web Feature Service (WFS), which is suitable for retrieving geo-referenced data of any kind, and the Web Map Service (WMS), which provides raster representations of the data. The third type of web service that the AP uses is a set of very specific SOAP services that are considered to be the core services in the Swedish LifeWatch e-infrastructure for biodiversity data.
The generic REST-based interoperability of the OGC Services makes them easy to integrate dynamically in web applications using, for example, Open Layers. In the AP, this was done mainly for the purpose of giving the users the flexibility of adding whatever environmental or climatic data and maps that are made available by data providers all over the world. To find detailed information about these services, we recommend a set of Swedish metadata portals (Environment Climate Data Sweden, ECDS; the Geodata Portal; the Swedish Metereological and Hydrological Institute, SMHI Open Data Catalogue; Geological Survey of Sweden, SGU Open Data Catalogue) (Fig.
The medium-term ambition is to implement support for all major OGC services in the AP, including direct searches for data using the Catalogue Services mentioned above. Currently, the AP only supports OGC for connecting environmental data via the OGC Web Feature Service (WFS) and as background maps using the OGC Web Map Services (WMS). Most available OGC types of services that are relevant for the LifeWatch community are provided by organizations that do not belong to the Swedish LifeWatch Consortium and the services have been published for quite different reasons. Many of them have been published due to the INSPIRE directive (http://inspire.ec.europa.eu). Swedish LifeWatch, however, does also publish some data, e.g. WFS. One of the layers SLW provides consists of all public species observations as Darwin Core (http://slwgeo.artdata.slu.se:8080/geoserver/web/?wicket:bookmarkablePage=:org.geoserver.web.demo.MapPreviewPage).
As the Core Services are totally essential for the functionality in the AP, we describe each of them briefly below. We will also explain in detail the web service functions that are actually used in the current version of the AP in Table 1 (Suppl. material
In general, there exist six main services:
1) User Service
Url: https://user.artdatabankensoa.se/UserService.svc?wsdl
This is the main service used by the web application UserAdmin (https://UserAdmin.ArtDatabanken.se), which is the main administration tool for authorization within the Swedish LifeWatch. The service supports generic construction of application specific authority objects and user roles which can be used in order to regulate which users are allowed to retrieve or edit data of different kinds. The service supports regulation of data handling based on taxonomic, temporal and spatial specifications. It also supports delegation of authorization in a rather simple and effective way, making it possible, for example, for certain members of the Swedish County Administrations to authorize their own trusted employees to handle sensitive data within their geographical region.
The User Service is used by all other core services for authentication and in order to check the user’s roles and authorizations.
2) Taxon Service
Url: https://taxon.artdatabankensoa.se/TaxonService.svc?wsdl
This is the main service used by the web application Dyntaxa (www.dyntaxa.se) to search and present taxonomic information on practically all multicellular taxa naturally occurring in Sweden. This service also provides all methods utilized by Dyntaxa for all editing of the content in the Swedish Taxonomic Database. Taxon Service provides the infrastructure with globally unique identifiers of all the taxonomic concepts that are handled by the core web services within Swedish LifeWatch. The Dyntaxa taxon id, which is the name of the taxon concept id in Dyntaxa, is the main link between the different data types essential for biodiversity in Sweden, such as species observations, taxon names and hierarchies, taxon traits and other attributes, as well as information on legislation and Red List status.
3) Taxon Attribute Service
Url: https://taxonattribute.artdatabankensoa.se/TaxonAttributeService.svc?wsdl
This service has the potential to handle all generalizations about taxa including habitat and substrate preferences and usage, interspecific interactions, life history traits, threats, Red List classification, and legislation. It handles more than 2,000 factors that are evaluated in relation to the Swedish taxa provided by the Taxon Service. In the scope of LifeWatch, the service is mainly used for retrieving taxon lists determined by different factors or combinations of factors and taxonomic hierarchies.
4) Geo Reference Service
Url: https://georeference.artdatabankensoa.se/GeoReferenceService.svc?wsdl
This service provides information on existing Swedish regions of different kinds, e.g. counties and municipalities.
5) Swedish Species Observation Service
Url: https://swedishspeciesobservation.artdatabankensoa.se/SwedishSpeciesObservationService.svc?wsdl
This service constitutes the main focal point of Swedish biodiversity data in terms of species observations and occurrence data. It provides a couple of methods that can be used in order to retrieve species observations originating from several data sources, e.g. Darwin Core records. In addition to the methods (9-16) listed in Table 1 Suppl. material
6) Analysis Service
Url: https://analysis.artdatabankensoa.se/AnalysisService.svc?wsdl
In contrast to the other services, this service does not handle a particular type of data. Instead, it is dedicated for all sorts of data processing or data retrieval tasks that involve transformations of data types from their basic representation to something else (
The Analysis Service also includes a number of processing methods that act on data from OGC WFS. These methods can be used for calculating grid statistics based on the features in a specified data layer (Functions no. 25-26, Table 1, Suppl. material
More information on the usage of the SLW Core Service can be obtained from the Swedish LifeWatch homepages (SLW Data Management, 2015 or www.servicecentrelifewatch.eu/products).
Portal functionalities
The AP has numerous functionalities. Species occurrence records can, for example, be filtered by present and absence data, or after taxa name, polygon, time and red listed category. Different environmental layers (WMS, WFS) can be uploaded to the AP and combined with species observation records. All functionalities of the current version are described in the user manual (Suppl. material
Users
At present, the AP has three different kinds of main users: researchers, biologists at the county/municipality/other administrations and consulting companies (Fig.
The three different user groups and their connection: 1) Researchers that combine data from the portal with their own data, running models and publishing the results; 2) Consulting companies getting the order to analyse a certain area for environmental effects or have to inventorise red-listed species. Beside writing reports, scientific publishing can be found as well (e.g.
Ecologists and biologists working for an authority or a consulting company, working with environmental impact analysis and decision-making processes, often use the taxon id filter function to download, for example, red-listed species in a certain area (polygon filter). User administration handles access to protected species, where different authorities have access to several existing protection levels (1-5) of the species, depending on their responsibilities in the decision-making processes.
Today, 2 years since the release of the AP, it is not clear to what extent citizen scientists use the portal. From Artportalen, an important Swedish data provider, it is known that numerous ornithologists, members of entomological and botanical associations report to and use the database to compare and share local species information. In future, it is hoped that these user groups will join the company of the AP to run simple analyses for their target species.
In the future, many analyses can be run through the portal and the use of the available data in the AP may lead to further ideas, observations and investigations of species distribution patterns.
On-going research that use the AP includes, for example, phenology studies on birds and effects of habitat restorations on wetland bird trends. Another study is attempting to identify forest regions based on species composition in order to make regionalized conservation recommendations. Studies have already been published that have been done on the SLW infrastructure (http://www.svenskalifewatch.se/sv/publicerat/). One recently published study deals with the identification of candidate responsibility species that are of particular interest in connection with the planning of road and railway constructions (
Outlook
The Analysis Portal described in this paper is a first national step towards making biodiversity data free available to scientists, companies and decision-makers dealing with questions of and related to nature conservation management (Fig.
The AP will continue to be developed in response to users` feedbacks and requests. An interesting idea would be to adjust the AP to function on a global scale instead of only for Sweden. Practically, this could, for example, be done by complementing the GBIF web service with the functionalities listed in Table 1, Suppl. material
One vision of Swedish LifeWatch is that all Swedish biodiversity data, or at least those that are produced with taxpayers` money, are freely accessible via a portal like the AP that is based on numerous web services. Today, the technical know-how is available, but co-operation between all institutes, universities, scientists and other responsible people is still under development. Perhaps it also needs a generation shift, where younger scientists grow up with a better data-sharing mentality. Bio(diversity)informatics is in demand to be taught more at university level (
Following the basic concept of LifeWatch, Swedish LifeWatch is actively working on capacity building, giving trainee workshops and support, and building up a Nordic LifeWatch co-operation between the Scandinavian countries to combine existing e-infrastructures into a Nordic network, similar to the AP. Long-term perspectives might be to create related e-infrastructures on biodiversity data on the European level.
Species do not stop in front of political borders and successful biodiversity research and management therefore need more free access and availability of data on a national and international level. This can push collaboration forward and will help us solve prospective challenges in the field of nature conservation management in future.
Swedish LifeWatch is funded by the Swedish government through the Environmental Protection Agency and the Swedish Research Council (grant no. 829-2009-6278).
A list of all core web service methods used in the AP. Here, the published methods name includes the name of the SOAP service. The functional description gives details about what can be arbieved with each method.
List of criteria that can be specified when searching for species observations in the Swedish LifeWatch e-infrastructure core web services, i.e. the Analysis Service and the Swedish Species Observation Service.
User manual - portal functionalities.