Corresponding author: Sonja Leidenberger (
Academic editor: Dmitry Schigel
In recent years, a growing number of online portals have been developed that allow ecologists to run advanced models on extensive data sets. Some examples are the Biodiversity Virtual e-Laboratory (BioVeL) Portal (
For the first time, the Swedish Analysis Portal for integrated analysis of species occurrence data is described in detail. It was launched in 2013, and today over 60 million Swedish species observation records can be accessed, visualized and analyzed via the portal. Datasets can be assembled using sophisticated filtering tools and combined with environmental and climatic data from a wide range of providers. Different validation tools, for example the official Swedish taxon concept database Dyntaxa, ensure high data quality. Results can be downloaded in different formats as maps, tables, diagrams and reports.
Biodiversity informatics is a young discipline. The term was probably first used in 1996 by J.L. Edwards in an unpublished document of the OECD Working Group on Biological Informatics – 1st meeting, notes on the draft agenda, Washington (see
The LifeWatch initiative to build a European e-Science infrastructure for biodiversity and ecosystem research began in January 2011, when representatives from Hungary, Italy, the Netherlands, Romania and Spain signed the first Memorandum of Understanding. The idea was to construct a European research e-infrastructure for biodiversity science and ecosystem research (
The Swedish LifeWatch consortium was founded in 2010 and is a national collaboration between four universities (the Swedish University of Agricultural Sciences (SLU), the University of Gothenburg (GU), Lund University (LU) and Umeå University (UMU)), the Swedish Meteorological and Hydrological Institute (SMHI) and the Swedish Museum of Natural History (SMNH), including the Swedish GBIF node. The project is coordinated by the Swedish Species Information Centre (ArtDatabanken) at SLU and financed by the Swedish Research Council (VR) and the Swedish Environmental Protection Agency.
The principal aim was to create a national, web-service-oriented e-infrastructure for biodiversity data covering terrestrial, freshwater and marine habitats, in which all components are accessible through open web services (
This article describes the Analysis Portal (version 1.0.5821.25115, release date: 09/12/2015), which is a part of the Swedish national e-infrastructure of biodiversity data. In this software description, we refer to the current version named above. We describe the structure of this web application and the functionality of its different components. We also give a detailed presentation of all the web services the Analysis Portal is built upon (Supplementary materials 1-3).
Swedish LifeWatch Analysis Portal - a national e-infrastructure for biodiversity research
The Analysis Portal (AP) (Fig.
Numerous biodiversity occurrence databases are connected to the Analysis Portal via the Taxon Observation Service. The AP offers an entry node to all of these databases, currently providing access to around 60 million species observation records (Fig.
The most comprehensive database on Swedish biodiversity data is the Swedish Species Observation System (
The Swedish Species Observation Centre also collaborates on the observation database of red-listed species, which holds around 1 million occurrence records. The database contains only protected species, and permission is required to access it. This is handled by the UserAdministration (see below: Data Management and Cleaning) where the rapporteur (
Another extensive database that is available via the AP is the national register of survey test-fishing (NORS,
Marine environmental monitoring data are represented by the Swedish Ocean Archive, the so-called SHARK database (
Historical data sets are also linked to the AP through DINA (Digital Information system for Natural history collections;
By using the GBIF web service (
Wireless Remote Animal Monitoring (WRAM) is a national e-infrastructure for automatic reception, long-term storage, sharing and analysis of biotelemetry sensor data from animals such as mammals, birds and fish (
The long-term goal is to connect all national and regional Swedish databases to the AP via Artportalen or directly to the Swedish Species Observation Service, to make them accessible to researchers, policy-makers and citizen scientists through a single entry point. The number of occurrence reports in most of the databases increases substantially every year. Additionally, numerous Swedish databases at different host institutes are not yet connected, for example national databases on seals, butterflies, hedgehog, moose and other game animals, as well as databases on invasive species, mussels, freshwater jellyfish and crayfish or the tree portal.
The biodiversity data are structured in very different ways in the various databases that are connected to the Swedish LifeWatch e-infrastructure. In order to make all the data searchable at one single focal point, the data have to be restructured. Many of the data sources are sample-based and only contain partial information on taxon-specific occurrences. Darwin Terms (TDWG,
The process of restructuring the data to fit Darwin Core and to enable a uniform search mechanism is supported by a set of web services. Each provider has published a web service where data can be retrieved according to the Darwin Core format. Regarding the taxonomic information, only the taxon id and the verbatim form of the scientific name, if available, are retrieved from the providers. Based on the taxon ids, the rest of the taxon-related information is retrieved directly from the Taxon Service, i.e. the SOAP-based web service behind the web application Dyntaxa (
A specific service, the Species Observation Harvest Service, has been constructed to perform the harvesting and data processing needed to compile, from what is retrieved from each provider, a unified dataset that fulfils all the requirements set by Swedish LifeWatch. The Harvest Service has methods to start and cancel harvests of single providers. To connect a new data provider service, one only needs to specify a mapping protocol that determines which field(s) in the Darwin Core set should correspond to the properties in the provider's interface. In some cases, data from the provider need processing, and in others default values are set for certain fields of a single provider.
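The idea of a mapping protocol can be illustrated with a minimal Python sketch. It is not the Harvest Service implementation: the provider field names (`obs_id`, `species_name`, `lat`, `lon`, `observed_on`), the default value and the per-field processing steps are all hypothetical, and only the Darwin Core term names are taken from the TDWG standard.

```python
from typing import Any, Callable, Dict

# Hypothetical mapping protocol: Darwin Core term -> provider field name.
# The provider field names are invented for illustration only.
FIELD_MAPPING: Dict[str, str] = {
    "occurrenceID": "obs_id",
    "scientificName": "species_name",
    "decimalLatitude": "lat",
    "decimalLongitude": "lon",
    "eventDate": "observed_on",
}

# Assumed per-provider default values for terms the provider does not supply.
DEFAULTS: Dict[str, Any] = {"basisOfRecord": "HumanObservation"}

# Assumed per-term processing steps (trim whitespace, cast coordinates).
PROCESSORS: Dict[str, Callable[[Any], Any]] = {
    "scientificName": lambda value: str(value).strip(),
    "decimalLatitude": float,
    "decimalLongitude": float,
}


def to_darwin_core(
    provider_record: Dict[str, Any],
    mapping: Dict[str, str] = FIELD_MAPPING,
    defaults: Dict[str, Any] = DEFAULTS,
    processors: Dict[str, Callable[[Any], Any]] = PROCESSORS,
) -> Dict[str, Any]:
    """Translate one provider record into a dict keyed by Darwin Core terms."""
    record = dict(defaults)  # start from the defaults, then overwrite
    for dwc_term, provider_field in mapping.items():
        value = provider_record.get(provider_field)
        if value is None:
            continue  # missing in the provider data: keep the default, if any
        processor = processors.get(dwc_term)
        record[dwc_term] = processor(value) if processor else value
    return record


raw = {
    "obs_id": "A-1",
    "species_name": " Parus major ",
    "lat": "59.33",
    "lon": "18.07",
    "observed_on": "2015-05-01",
}
harmonized = to_darwin_core(raw)
```

Under these assumptions, connecting a further provider would only require a new `mapping` dictionary, plus optional defaults and processors; the translation function itself stays unchanged.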
The Harvest Service cleans the data in various ways by checking the content of each data field. If a data field contains values that do not correspond to the rules specified, the record is discarded and logged as erroneous. The error logs are a useful source of information for improving the quality of the harvesting mechanisms (Fig.
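The check-discard-log pattern described here can be sketched in Python as follows. The rules shown (coordinate ranges and an ISO 8601 date check on three Darwin Core terms) are assumptions chosen for illustration; the actual rules applied by the Harvest Service are not listed in this paper.

```python
from datetime import date
from typing import Any, Callable, Dict, List, Tuple


def _in_range(low: float, high: float) -> Callable[[Any], bool]:
    """Build a rule that accepts numeric values within [low, high]."""
    def check(value: Any) -> bool:
        try:
            return low <= float(value) <= high
        except (TypeError, ValueError):
            return False
    return check


def _is_iso_date(value: Any) -> bool:
    """Accept values that parse as an ISO 8601 calendar date (YYYY-MM-DD)."""
    try:
        date.fromisoformat(str(value))
        return True
    except ValueError:
        return False


# Assumed validation rules, keyed by Darwin Core term.
RULES: Dict[str, Callable[[Any], bool]] = {
    "decimalLatitude": _in_range(-90.0, 90.0),
    "decimalLongitude": _in_range(-180.0, 180.0),
    "eventDate": _is_iso_date,
}


def clean(
    records: List[Dict[str, Any]],
    rules: Dict[str, Callable[[Any], bool]] = RULES,
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split records into accepted records and an error log of discarded ones."""
    accepted: List[Dict[str, Any]] = []
    error_log: List[Dict[str, Any]] = []
    for index, record in enumerate(records):
        failed = [
            field for field, rule in rules.items()
            if field not in record or not rule(record[field])
        ]
        if failed:
            # Discard the record but remember which fields broke which rules.
            error_log.append({"index": index, "failed_fields": failed})
        else:
            accepted.append(record)
    return accepted, error_log


records = [
    {"decimalLatitude": 59.3, "decimalLongitude": 18.1, "eventDate": "2015-05-01"},
    {"decimalLatitude": 120, "decimalLongitude": 18.1, "eventDate": "2015-05-01"},
    {"decimalLatitude": 59.3, "decimalLongitude": 18.1, "eventDate": "not a date"},
]
accepted, error_log = clean(records)
```

Recording the failing field names next to the record index is what makes such a log useful for tracing recurring problems back to a specific provider or mapping.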
The user administration offers users personalized settings and allows certain search queries to be saved and reused. Login is not required. The same user administration system is used for the Swedish Species Observation System (Artportalen), Dyntaxa and the AP. For each user, over 90% of observations are publicly accessible; exceptions exist only for a few sensitive species (
The AP is a web application that has been constructed according to the Model-View-Controller (MVC) design pattern using ASP.NET MVC. The application is not connected directly to any database; all data retrieval mechanisms, statistical calculations, and aggregations are instead performed by various web services (Fig.
Another implication of the service-oriented architecture is that the software is only loosely dependent on the underlying data sources. It should be possible to connect the AP to another set of services providing data from another region. Actually, if for example the GBIF web services were to be complemented with corresponding functionality, the AP could work on a global scale without much reconfiguration. This is an interesting idea indeed, as some of the most powerful methods provided by the AP or rather its underlying web services, would in many cases require uploads of hundreds of millions of records in order to perform the statistical processing based on the functions available in the RESTful web service API currently provided by GBIF. This is of course not really a feasible solution. However, by adding a couple of analytical functions (Table 1, Suppl. material
The AP currently uses three types of service. Two belong to the suite of standardized OGC (Open Geospatial Consortium
The generic REST-based interoperability of the OGC Services makes them easy to integrate dynamically in web applications using, for example, Open Layers. In the AP, this was done mainly for the purpose of giving the users the flexibility of adding whatever environmental or climatic data and maps that are made available by data providers all over the world. To find detailed information about these services, we recommend a set of Swedish metadata portals (
The medium-term ambition is to implement support for all major OGC services in the AP, including direct searches for data using the Catalogue Services mentioned above. Currently, the AP only supports OGC for connecting environmental data via the OGC Web Feature Service (WFS) and as background maps using the OGC Web Map Services (WMS). Most available OGC types of services that are relevant for the LifeWatch community are provided by organizations that do not belong to the Swedish LifeWatch Consortium and the services have been published for quite different reasons. Many of them have been published due to the INSPIRE directive (
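To illustrate how a client addresses the two supported OGC service types, the following Python sketch builds a WMS 1.3.0 `GetMap` URL and a WFS 2.0.0 `GetFeature` URL from the standard key-value parameters. The endpoint URLs and layer names are placeholders, not services used by the AP, and the sketch only constructs the request strings without sending them.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def wms_getmap_url(
    base_url: str,
    layer: str,
    bbox: Sequence[float],
    width: int = 512,
    height: int = 512,
    crs: str = "CRS:84",  # longitude/latitude axis order
    image_format: str = "image/png",
) -> str:
    """Build a WMS 1.3.0 GetMap request URL for a background map image."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


def wfs_getfeature_url(
    base_url: str,
    type_name: str,
    count: Optional[int] = None,
) -> str:
    """Build a WFS 2.0.0 GetFeature request URL for vector features."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "2.0.0",
        "REQUEST": "GetFeature",
        "TYPENAMES": type_name,
    }
    if count is not None:
        params["COUNT"] = count  # limit the number of returned features
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoints and layer names (hypothetical).
map_url = wms_getmap_url(
    "https://example.org/wms", "landcover", (11.0, 55.0, 24.0, 69.0)
)
feature_url = wfs_getfeature_url(
    "https://example.org/wfs", "env:lakes", count=100
)
```

Note that in WMS 1.3.0 the axis order of `BBOX` follows the definition of the chosen coordinate reference system; `CRS:84` is used here because it is defined as longitude first, latitude second.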
As the Core Services are totally essential for the functionality in the AP, we describe each of them briefly below. We will also explain in detail the web service functions that are actually used in the current version of the AP in Table 1 (Suppl. material
There are six main services:
1)
Url:
This is the main service used by the web application UserAdmin (
The User Service is used by all other core services for authentication and in order to check the user’s roles and authorizations.
2)
Url:
This is the main service used by the web application Dyntaxa (
3)
Url:
This service has the potential to handle all generalizations about taxa including habitat and substrate preferences and usage, interspecific interactions, life history traits, threats, Red List classification, and legislation. It handles more than 2,000 factors that are evaluated in relation to the Swedish taxa provided by the Taxon Service. In the scope of LifeWatch, the service is mainly used for retrieving taxon lists determined by different factors or combinations of factors and taxonomic hierarchies.
4)
Url:
This service provides information on existing Swedish regions of different kinds, e.g. counties and municipalities.
5)
Url:
This service constitutes the main focal point of Swedish biodiversity data in terms of species observations and occurrence data. It provides a couple of methods that can be used in order to retrieve species observations originating from several data sources, e.g. Darwin Core records. In addition to the methods (9-16) listed in Table 1 Suppl. material
6)
Url:
In contrast to the other services, this service does not handle a particular type of data. Instead, it is dedicated to all sorts of data processing or data retrieval tasks that involve transformations of data types from their basic representation to something else (
The Analysis Service also includes a number of processing methods that act on data from OGC WFS. These methods can be used for calculating grid statistics based on the features in a specified data layer (Functions no. 25-26, Table 1, Suppl. material
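The principle behind such grid statistics can be shown with a short Python sketch that counts features per square grid cell. It is a simplification under stated assumptions: features are reduced to representative points in a projected coordinate system, the grid is anchored at the coordinate origin, and the function name and signature are illustrative rather than those of the Analysis Service.

```python
from collections import Counter
from math import floor
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def grid_counts(
    points: Iterable[Tuple[float, float]],
    cell_size: float,
) -> Dict[Cell, int]:
    """Count points per square grid cell.

    A cell is identified by the integer index pair (column, row); its lower
    left corner lies at (column * cell_size, row * cell_size).
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    counts: Counter = Counter()
    for x, y in points:
        # floor() keeps negative coordinates in the correct cell.
        counts[(floor(x / cell_size), floor(y / cell_size))] += 1
    return dict(counts)


# Four feature points in a projected CRS (metres), binned into 1 km cells.
points = [(500.0, 500.0), (1500.0, 500.0), (1700.0, 900.0), (-10.0, 10.0)]
cells = grid_counts(points, cell_size=1000.0)
```

The same binning step generalizes to other per-cell statistics, for example by collecting attribute values per cell instead of incrementing a counter.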
More information on the usage of the SLW Core Service can be obtained from the Swedish LifeWatch homepages (SLW Data Management, 2015 or
The AP has numerous functionalities. Species occurrence records can, for example, be filtered by presence and absence data, or by taxon name, polygon, time and Red List category. Different environmental layers (WMS, WFS) can be uploaded to the AP and combined with species observation records. All functionalities of the current version are described in the user manual (Suppl. material
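How a combination of taxon, time and polygon filters narrows a set of records can be sketched in Python as below. The record layout (`taxon_id`, `date`, `x`, `y`) is hypothetical, and the point-in-polygon test uses the standard ray-casting method as a stand-in; the paper does not state how the AP or its services evaluate spatial filters.

```python
from datetime import date
from typing import Any, Dict, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how often a ray to the right crosses the boundary."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_observations(
    observations: List[Dict[str, Any]],
    taxon_ids: Optional[Set[int]] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
    polygon: Optional[Sequence[Point]] = None,
) -> List[Dict[str, Any]]:
    """Keep observations that satisfy every filter that is not None."""
    result = []
    for obs in observations:
        if taxon_ids is not None and obs["taxon_id"] not in taxon_ids:
            continue
        if start is not None and obs["date"] < start:
            continue
        if end is not None and obs["date"] > end:
            continue
        if polygon is not None and not point_in_polygon(obs["x"], obs["y"], polygon):
            continue
        result.append(obs)
    return result


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
observations = [
    {"id": 1, "taxon_id": 1, "date": date(2014, 6, 1), "x": 5.0, "y": 5.0},
    {"id": 2, "taxon_id": 1, "date": date(2014, 6, 1), "x": 15.0, "y": 5.0},
    {"id": 3, "taxon_id": 2, "date": date(2014, 6, 1), "x": 5.0, "y": 5.0},
    {"id": 4, "taxon_id": 1, "date": date(2010, 6, 1), "x": 5.0, "y": 5.0},
]
selected = filter_observations(
    observations, taxon_ids={1}, start=date(2013, 1, 1), polygon=square
)
```

Treating each filter as optional mirrors the way a user combines only the criteria relevant to a given question; here only the first record passes all three filters.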
At present, the AP has three different kinds of main users: researchers, biologists at the county/municipality/other administrations and consulting companies (Fig.
Ecologists and biologists working for an authority or a consulting company, working with environmental impact analysis and decision-making processes, often use the taxon id filter function to download, for example, red-listed species in a certain area (polygon filter). User administration handles access to protected species, where different authorities have access to several existing protection levels (1-5) of the species, depending on their responsibilities in the decision-making processes.
Today, two years after the release of the AP, it is not clear to what extent citizen scientists use the portal. From Artportalen, an important Swedish data provider, it is known that numerous ornithologists and members of entomological and botanical associations report to and use the database to compare and share local species information. It is hoped that these user groups will in future also use the AP to run simple analyses for their target species.
In the future, many analyses can be run through the portal and the use of the available data in the AP may lead to further ideas, observations and investigations of species distribution patterns.
Ongoing research that uses the AP includes, for example, phenology studies on birds and the effects of habitat restoration on wetland bird trends. Another study is attempting to identify forest regions based on species composition in order to make regionalized conservation recommendations. Studies based on the SLW infrastructure have already been published (
The Analysis Portal described in this paper is a first national step towards making biodiversity data freely available to scientists, companies and decision-makers dealing with questions of and related to nature conservation management (Fig.
The AP will continue to be developed in response to users' feedback and requests. An interesting idea would be to adjust the AP to function on a global scale instead of only for Sweden. Practically, this could, for example, be done by complementing the GBIF web service with the functionalities listed in Table 1, Suppl. material
One vision of Swedish LifeWatch is that all Swedish biodiversity data, or at least those that are produced with taxpayers' money, are freely accessible via a portal like the AP that is based on numerous web services. Today, the technical know-how is available, but co-operation between all institutes, universities, scientists and other responsible people is still under development. Perhaps it also needs a generation shift, where younger scientists grow up with a better data-sharing mentality. There is a demand for more teaching of bio(diversity) informatics at university level (
Following the basic concept of LifeWatch, Swedish LifeWatch is actively working on capacity building, offering training workshops and support, and building up a Nordic LifeWatch co-operation between the Scandinavian countries to combine existing e-infrastructures into a Nordic network, similar to the AP. A long-term perspective might be to create related e-infrastructures for biodiversity data at the European level.
Species do not stop at political borders, and successful biodiversity research and management therefore need freer access to and better availability of data at national and international levels. This can push collaboration forward and will help us solve future challenges in the field of nature conservation management.
Swedish LifeWatch is funded by the Swedish government through the Environmental Protection Agency and the Swedish Research Council (grant no. 829-2009-6278).
Homepage:
Creative Commons Public Domain Waiver (CC-Zero)
The time schedule of the Analysis Portal. From the start-up in 2011, through a three-year construction phase including its release, to a new operational phase and further construction.
Start page of the Analysis portal.
Overview of the connected national databases and their occurrence records to which the Analysis Portal offers access.
Overview of the web services used in the AP. The core web services (SOAP) are shown in blue boxes. Two of them handle species observations harvested from a number of data provider services (white boxes). Environmental data, maps and metadata of different kinds are connected to OGC services (WFS, WMS, CSW).
Overview of linked metadata search functions of the AP.
The three different user groups and their connection: 1) Researchers who combine data from the portal with their own data, run models and publish the results; 2) Consulting companies commissioned to analyse a certain area for environmental effects or to inventory red-listed species. Besides writing reports, they sometimes publish scientific papers as well (e.g.
Core Web Service Methods
A list of all core web service methods used in the AP. Here, the published method name includes the name of the SOAP service. The functional description gives details about what can be achieved with each method.
File: oo_82227.pdf
List of criteria that can be specified when searching for species observations
List of criteria that can be specified when searching for species observations in the Swedish LifeWatch e-infrastructure core web services, i.e. the Analysis Service and the Swedish Species Observation Service.
File: oo_81905.pdf
The Analysis Portal - user manual
User manual - portal functionalities.
File: oo_82233.pdf