Biodiversity Data Journal :
Forum Paper
Corresponding author: Donald Hobern (dhobern@gbif.org)
Academic editor: Vincent Smith
Received: 05 Feb 2019 | Accepted: 04 Mar 2019 | Published: 08 Mar 2019
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Hobern D, Baptiste B, Copas K, Guralnick R, Hahn A, van Huis E, Kim E, McGeoch M, Naicker I, Navarro L, Noesgaard D, Price M, Rodrigues A, Schigel D, Sheffield C, Wieczorek J (2019) Connecting data and expertise: a new alliance for biodiversity knowledge. Biodiversity Data Journal 7: e33679. https://doi.org/10.3897/BDJ.7.e33679
There has been major progress over the last two decades in digitising historical knowledge of biodiversity and in making biodiversity data freely and openly accessible. Interlocking efforts bring together international partnerships and networks, national, regional and institutional projects and investments and countless individual contributors, spanning diverse biological and environmental research domains, government agencies and non-governmental organisations, citizen science and commercial enterprise. However, current efforts remain inefficient and inadequate to address the global need for accurate data on the world's species and on changing patterns and trends in biodiversity. Significant challenges include imbalances in regional engagement in biodiversity informatics activity, uneven progress in data mobilisation and sharing, the lack of stable persistent identifiers for data records, redundant and incompatible processes for cleaning and interpreting data and the absence of functional mechanisms for knowledgeable experts to curate and improve data.
Recognising the need for greater alignment between efforts at all scales, the Global Biodiversity Information Facility (GBIF) convened the second Global Biodiversity Informatics Conference (GBIC2) in July 2018 to propose a coordination mechanism for developing shared roadmaps for biodiversity informatics. GBIC2 attendees reached consensus on the need for a global alliance for biodiversity knowledge, learning from examples such as the Global Alliance for Genomics and Health (GA4GH) and the open software communities under the Apache Software Foundation. These initiatives provide models for multiple stakeholders with decentralised funding and independent governance to combine resources and develop sustainable solutions that address common needs.
This paper summarises the GBIC2 discussions and presents a set of 23 complementary ambitions to be addressed by the global community in the context of the proposed alliance. The authors call on all who are responsible for describing and monitoring natural systems, all who depend on biodiversity data for research, policy or sustainable environmental management and all who are involved in developing biodiversity informatics solutions to register interest at https://biodiversityinformatics.org/ and to participate in the next steps towards establishing a collaborative alliance.
The supplementary materials include brochures in a number of languages (English, Arabic, Spanish, Basque, French, Japanese, Dutch, Portuguese, Russian, Traditional Chinese and Simplified Chinese). These summarise the need for an alliance for biodiversity knowledge and call for collaboration in its establishment.
Biodiversity, biodiversity data, biodiversity informatics, GBIC2, alliance, collaboration, data quality, sustainability, research infrastructure, open science, open data, investment
Biodiversity, the variation within species, between species and of ecosystems*
Addressing this challenge is difficult because our fundamental understanding of the complexity and dynamics of biodiversity remains inadequate. Over the past quarter of a millennium, taxonomists have described around 2 million distinct species. Estimates of the actual number of extant species vary (
For centuries, biologists have been collecting specimens and recording observations that offer the historical context for modern scientific biodiversity-based investigations and predictive modelling. Many complementary sources of observations and measurements provide contemporary perspectives, including, amongst others, ecological research and field-based monitoring, citizen science and local community initiatives, molecular studies, automated cameras and sensors and satellite imagery.
Over the last twenty years, a steady growth in informatics investments, on many different scales, has served to make biodiversity information from all of these sources accessible online. During 2018, data published and aggregated through the Global Biodiversity Information Facility (GBIF) surpassed one billion records, each serving as evidence for the occurrence of a species at a given time and place. Associated networks, such as the Ocean Biogeographic Information System (OBIS) and national biodiversity information facilities, also each provide access to significant volumes of occurrence data. The 2018 edition of the Catalogue of Life Annual Checklist includes taxonomic information on 1,803,488 living and extinct species (around 75% of known species). The Biodiversity Heritage Library now includes more than 55 million pages of scanned biodiversity literature. The Barcode of Life Data System includes 6,293,000 barcode sequences representing 280,000 species. The recently established Global Genome Biodiversity Network*
These efforts have made significant progress in facilitating open access to biodiversity data, increasing scientific understanding and providing relevant information for policy on the environment, conservation and sustainability. Improved modelling of species distributions at all scales supports evidence-based planning and decision-making. However, as acknowledged on multiple occasions by major biodiversity informatics initiatives*
A further concern is the disconnection between these systems and the communities which have the necessary expertise to validate, curate and improve data from diverse sources. This leads to quality issues with implications for subsequent analysis and use. The causes of this disconnection include both sociological (
Stakeholders have been able to develop a common vision for aligning their activities, as described in the 2012 Global Biodiversity Informatics Outlook (GBIO) (see
The Senckenberg Gesellschaft für Naturforschung hosted a workshop on Exploring Synergies and Sustainability for Biodiversity Information Systems in March 2017. Attendees representing global data infrastructures, national data centres and major research institutions agreed that the global community needed to develop a shared mechanism for planning, delivering and sustaining a linked and open global biodiversity data infrastructure. This workshop did not determine what form such a mechanism should take or whether the desired outcomes would require a sustainable long-term, persistent decision-making body or could be achieved through shorter time-bound collaboration. Accordingly, the Global Biodiversity Information Facility (GBIF), as part of its 2018 Work Programme *
GBIC2 began with an overview of the complex landscape of stakeholders that contribute to or make use of biodiversity information. These stakeholders included a broad range of scientific communities, politicians, non-governmental organisations, commercial entities and public stakeholders, including indigenous and local communities.
The earlier GBIO framework (Fig.
The GBIO Framework (Hobern et al. 2012) identified 20 components as essential elements of biodiversity informatics and organised them into four tiers: Culture, Data, Evidence and Understanding. The framework offered a coordinated model for delivering the data products required to support biodiversity assessments and indicators. An organised approach of this kind also facilitated the necessary bidirectional linkages with other research domains (environment, climate, agriculture etc.), particularly to refine modelled representations of biodiversity. The GBIO Framework (and the efforts of other research domains) should benefit from and build on research infrastructure investments that are in place or planned in many countries and regions.
The Culture tier addresses the open data and open science context required to enable effective collaborative information sharing, with the following components:
Open access and reuse culture
Data standards
Persistent storage and archives
Policy incentives
Biodiversity knowledge network
The Data tier focuses on digital access to well-formed streams of data from all relevant sources of biodiversity observations and measurements, with the following components:
Published materials
Collections and specimens
Field surveys and observations
Sequences and genomes
Automated, remote-sensed observations
The Evidence tier focuses on organising these disparate streams into accessible, integrated information resources, with the following components:
Fitness-for-use and annotations
Taxonomic and phylogenetic frameworks
Integrated occurrence data
Aggregated species trait data
Comprehensive knowledge access
The Understanding tier focuses on building modelled representations of biodiversity patterns and properties, based on all available evidence, with the following components:
Multiscale species modelling
Trends and predictions
Modelling biological systems
Visualisation and dissemination
Prioritising new data capture
Each of the twenty components in this framework encompasses the activities of many existing organisations, agencies or projects as well as individual researchers.
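The tier and component structure listed above can be written down as a simple lookup table. The following is a minimal illustrative sketch, not part of the GBIO framework itself: the tier and component names are taken from the lists above, while the names `GBIO_FRAMEWORK`, `all_components` and `tier_of` are hypothetical and introduced only for this example.

```python
# Illustrative sketch only: the four GBIO tiers and their components,
# copied from the lists above. Variable and function names are
# hypothetical and not part of the GBIO framework.
GBIO_FRAMEWORK = {
    "Culture": [
        "Open access and reuse culture",
        "Data standards",
        "Persistent storage and archives",
        "Policy incentives",
        "Biodiversity knowledge network",
    ],
    "Data": [
        "Published materials",
        "Collections and specimens",
        "Field surveys and observations",
        "Sequences and genomes",
        "Automated, remote-sensed observations",
    ],
    "Evidence": [
        "Fitness-for-use and annotations",
        "Taxonomic and phylogenetic frameworks",
        "Integrated occurrence data",
        "Aggregated species trait data",
        "Comprehensive knowledge access",
    ],
    "Understanding": [
        "Multiscale species modelling",
        "Trends and predictions",
        "Modelling biological systems",
        "Visualisation and dissemination",
        "Prioritising new data capture",
    ],
}


def all_components():
    """Return every component name across all tiers, in tier order."""
    return [c for components in GBIO_FRAMEWORK.values() for c in components]


def tier_of(component):
    """Return the tier that a given component belongs to."""
    for tier, components in GBIO_FRAMEWORK.items():
        if component in components:
            return tier
    raise KeyError(component)
```

A structure of this kind could, for instance, be used to tag projects or datasets with the components they address, so that coverage across the twenty components can be tallied.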
The challenge presented for the GBIC2 attendees was to propose a collaborative approach for the global community for planning and agreeing on an optimal set of new or improved policies, data standards, processes, governance arrangements, software tools, informatics infrastructure investments and research programmes, with sufficient clarity to deliver an interoperable global infrastructure of the kind envisioned by the GBIO framework.
Four parallel working groups reviewed different components from the GBIO framework, each selected to capture a broad range of different challenges and opportunities:
Biodiversity knowledge network (Culture)
Published materials (Data)
Integrated occurrence data (Evidence)
Trends and predictions (Understanding)
Each of these working groups sought to identify major sociological and technical issues limiting the effectiveness of current solutions in the area concerned. They then considered possible responses by the global community to resolve these issues and accelerate delivery as part of a broader biodiversity informatics landscape. The outputs from these four working groups are summarised in Annex A - Outputs from GBIC2 Working Groups (Suppl. material
Plenary presentations helped guide discussion of governance models that could enable such coordination:
Donald Hobern: GBIC2 - Realising the Vision*
Robert Hanisch: International Collaboration for Open Data and Open Science*
Maria Uhle: A Funders’ Perspective*
Jerry Lanfear: ELIXIR: the European ELIXIR consortium and delivery of core data services*
Ismaêl Mejía: The Apache Way and collaborative development*
The attendees at GBIC2 reached consensus*
Support for science and evidence-based planning
Support for open data and open science
Support for highly-connected biodiversity data
Support for international collaboration
Although the stakeholders at GBIC2 all represented organisations that focus on a subset of the listed ambitions, each recognised the role that their work programmes can play as components within this larger shared vision. No single stakeholder has responsibility for coordination and delivery of this integrated vision, nor is any stakeholder positioned to address these challenges for all communities and at all scales.
The potential strategies and investments identified by the four working groups to address current impediments in biodiversity informatics reinforced this vision. Many of the suggested solutions depended fundamentally on all stakeholders adopting a single shared process or implementation. Most of the remaining suggestions would deliver significant multiplier effects as stakeholders converged on more standardised approaches. The robust and sustainable solutions needed to address many well-understood challenges are unimaginable without an open and inclusive approach that maximises cooperation.
Accordingly, GBIC2 attendees re-asserted the need for a coordination mechanism that helps all parties align their missions and work programmes in a complementary fashion.
Many diverse organisations and institutions, including governments, academia, industry and public stakeholders, have an interest in some aspects of biodiversity knowledge management. In particular, independent national investments form a major part of overall funded activity and frequently deliver tools, standards and services later adopted in other countries and regions. Examples include the services and tools provided by the Flanders Marine Institute (VLIZ) to support the World Register of Marine Species, the Living Atlases software originally developed for the Atlas of Living Australia and tools built by the Humboldt Institute in Colombia to support GEO BON. As a result, any coordination mechanism must provide the flexibility both to accommodate and to benefit from this diversity, rather than seeking to implement a prescriptive programme of planned deliverables.
The design of any viable model for coordinating activity should therefore support:
open discussion of requirements and open communication and participation by all stakeholders
development of a shared vision and roadmap for increasing alignment
collaborative approaches to design, fund, implement and sustain infrastructure components and tools required by multiple stakeholders
It is possible that some aspects of this coordination could be handled within the mission of existing international organisations.
The final report of the OECD Megascience Forum Working Group on Biological Informatics*
The Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) and the Convention on Biological Diversity (CBD) may provide relevant settings in which further necessary discussions could take place, though these bodies are less closely involved in the technical development of data infrastructures. GEO BON is also an important international network, particularly for the earth-observing and monitoring aspects.
As a result, the GBIC2 attendees agreed to explore the establishment of a lightweight alliance, building on lessons learned from different collaborative models, including open-source software communities such as the Apache Software Foundation (ASF) and open science partnerships such as the Global Alliance for Genomics and Health (GA4GH). GBIC2 tasked GBIF with facilitating the exploration and establishment phase for a global alliance to transform our understanding of biodiversity by connecting all efforts to observe, measure and model the living planet.
By facilitating open discussion between interested and aligned parties, this alliance will seek to refine the GBIO framework and construct a shared roadmap for interoperable infrastructure components, as with the GA4GH Strategic Roadmap*
Such an alliance should be expansive enough to include representation and perspectives outside those actively delivering data or building informatics infrastructure. In particular, it should facilitate closer collaboration with practitioners working in conservation, environmental management and sustainable development.
GBIC2 attendees identified five initial steps to advance the concept of this proposed alliance for biodiversity knowledge. The GBIF Secretariat will provide administrative and technical support for this exploratory phase.
The goal following GBIC2 is to refine the concept and to develop a broad and inclusive community of stakeholders interested in mobilising, improving and using biodiversity information. The immediate priority is to communicate this goal and expand engagement and support for implementing an appropriate model.
This paper is therefore presented as an invitation to the global community to develop a new coordination and collaboration model and to work towards the multifaceted vision outlined above. We call below for institutions and individuals to indicate their support and to participate in online discussions around possible models for such an alliance. An alliance site has been established at https://biodiversityinformatics.org/ to organise supporting materials and to host discussions for the other steps outlined below. Contributions in languages other than English are welcomed. Recommendations arising from these discussions will be synthesised within a proposed Charter and Mission for the alliance. We urge readers to participate in this process and to contribute to developing a model that will benefit all parties.
Although GBIC2 discussions wholeheartedly supported the establishment of a lightweight alliance and identified relevant examples from other domains, more work is required to meet the needs of this complex and diverse stakeholder community. Review of other similar alliances, coalitions and consortia should guide and assist planning for longer term approaches.
GBIC2 attendees acknowledged the remarkable success of the Apache Software Foundation in fostering healthy and productive international projects involving many organisations and individuals. The biodiversity informatics community might benefit from adoption of elements from ‘The Apache Way’, in particular its merit- or reputation-based model for individual contributors to participate within a community. These approaches could be relevant, not only to the development of components of a collaborative infrastructure, but also to cooperation around curation of major datasets. A major aspect of ASF’s success comes from the culture that they promote through their Code of Conduct*
Important questions relating to the basis of membership remain open, such as whether both individuals and organisations should be eligible to become members. Both models may be valid, as exemplified by GA4GH, which is an alliance of member organisations, and by ASF, which operates through the participation of individuals, even when many of them represent organisations.
The GBIO framework and other reviews offer clear conceptual precedents for fitting disparate components together within an integrated biodiversity informatics infrastructure. The multifaceted vision outlined above shows the convergence of understanding amongst international stakeholders. Calls for the development of Essential Biodiversity Variables (
However, these perspectives are not adequate by themselves to establish the immediate and medium-term goals for this alliance. It is important to establish a set of significant but achievable use cases to guide thinking and prioritisation over a five- to ten-year period.
GBIC2 attendees therefore proposed a consultation with diverse stakeholders—including research groups, taxonomic facilities, the CBD, IPBES, FAO, conservation bodies and other user communities—to develop a set of defining questions or use cases against which to measure progress. These questions should address a range of research and societal needs with sufficient detail and precision to guide priorities for collaborative planning, development and implementation, while serving as milestones for improved collaborative delivery.
GBIF will work with representatives from GBIC2 and with other interested parties to consult widely in developing this set of defining questions online at https://biodiversityinformatics.org/.
The stakeholder landscape is complex. Overlapping and changing missions, work programmes and responsibilities applying at different scales make it difficult to identify all parties with existing interests in addressing particular challenges—increasing the risk of inadvertent conflict or duplication of effort.
GBIC2 attendees therefore agreed on the importance of a network analysis to understand the roles and responsibilities of major organisations, particularly at global, regional and national scales, including major components in their work programmes and deliverables and dependencies.
This challenging problem could readily absorb significant resources, especially if the proposed alliance seeks to maintain an up-to-date overview over time. It is critical, therefore, that the entities and information to be included in a stakeholder map are clearly and tightly scoped. This work will benefit from expertise within the social sciences in mapping and analysing the structure of community networks.
GBIF will coordinate initial discussion of possible models to capture appropriate landscape information. Successful understanding of stakeholder relationships will assist the proposed alliance to identify critical services that need to be created or sustained and indicate opportunities for better alignment or unification of services developed by different parties.
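One way to picture a tightly scoped stakeholder map is as a small set of records that capture only each organisation's name, scale and the framework components it works on. The sketch below is purely illustrative and assumes this minimal scope; the `Stakeholder` class, the `overlaps` function and the "Org A/B/C" entries are hypothetical placeholders, not real organisations or an agreed model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Stakeholder:
    """Hypothetical, deliberately minimal stakeholder record."""
    name: str
    scale: str  # assumed values: "global", "regional" or "national"
    components: frozenset  # GBIO component names the stakeholder works on


def overlaps(stakeholders):
    """Map each component to its stakeholders, keeping only shared ones.

    Components addressed by more than one stakeholder are candidates for
    closer alignment or for unifying separately developed services.
    """
    by_component = defaultdict(list)
    for stakeholder in stakeholders:
        for component in sorted(stakeholder.components):
            by_component[component].append(stakeholder.name)
    return {c: names for c, names in by_component.items() if len(names) > 1}


# Placeholder data for illustration only; these are not real organisations.
EXAMPLE = [
    Stakeholder("Org A (hypothetical)", "global",
                frozenset({"Integrated occurrence data", "Data standards"})),
    Stakeholder("Org B (hypothetical)", "national",
                frozenset({"Integrated occurrence data"})),
    Stakeholder("Org C (hypothetical)", "regional",
                frozenset({"Published materials"})),
]
```

Calling `overlaps(EXAMPLE)` reports only the component shared by the first two placeholder organisations. A real stakeholder map would need the scoping decisions and social-science input described above before any such structure could be fixed.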
Several existing collaborative activities are representative of the range of possible projects that an alliance might incubate. In order to raise visibility and to make progress in a more open and transparent way, GBIC2 participants and other interested parties are encouraged to propose existing activities as early proof-of-concept projects for the alliance model.
During the establishment phase of the alliance, governance processes will not be in place to support all aspects of such projects. However, adoption of practical examples will likely help develop the concept for and approaches to the alliance itself.
A transparent process is required for selecting even proof-of-concept projects. This should take into account the relevance of the project to a broad range of stakeholders, as well as the openness of the project for new partners to join and contribute to the plans and deliverables. The incubation process*
One existing collaboration that might serve as a proof-of-concept project, and that would benefit from increased exposure and openness, is Catalogue of Life Plus (CoL+). Under this project, Catalogue of Life, GBIF, Encyclopedia of Life, Barcode of Life Data Systems and Biodiversity Heritage Library are currently working to develop a new collaborative model for building a shared taxonomic framework. This activity is currently funded by GBIF Netherlands and is seeking wider engagement with other bodies that are responsible for checklists or nomenclatural datasets. Increasing the transparency of this activity and involving other biodiversity informatics initiatives would allow CoL+ to engage more broadly with taxonomic authorities and maximise expertise contributed across different taxonomic groups. CoL+ could therefore serve as an early proof of concept for an alliance-led incubator project.
Additionally, there are growing efforts in several regions, particularly the NSF Advancing Digitization of Biodiversity Collections (ADBC) program*
Stakeholders are encouraged to identify other suitable collaborative projects, not just in the development of software or management of data, but also in other areas of activity, such as capacity enhancement and sustainability planning.
GBIF has allocated funds as part of its 2019 Work Programme for work on the initial steps identified above. The SYNTHESYS+ project, funded by the European Commission, includes additional funding for coordination activity and workshops. The biodiversity_next conference, which is planned as a major international forum for biodiversity informatics and which will take place in Leiden in October 2019, will serve as a venue for updates on progress.
Information on these activities and opportunities to participate in developing the alliance concept will be shared through an alliance website at https://biodiversityinformatics.org/. As far as possible, to maximise input from stakeholders in all regions, all discussions will take place openly online. A brochure for wider communication, an alliance for biodiversity knowledge, is included here (Suppl. material
We urge all stakeholders with an interest in the production, management and use of data on the world’s biodiversity to visit this site and indicate their interest and support for collaborating on development of an alliance for biodiversity knowledge. Discussion threads have also been opened to facilitate exploration of the initial steps identified above (Expand engagement; Evaluate models; Clarify scope and target outcomes; Map stakeholders; Adopt proof-of-concept projects). These discussions aim to define the parameters for workshops, white papers and online consultations starting in the coming months, so that, by late 2019, a model can be proposed for the operation of the alliance, including a framework for developing the shared roadmap for implementing infrastructure components and services.
The GBIC2 workshop was an activity under the GBIF Annual Work Programme 2018, with additional funding from the Atlas of Living Australia, DiSSCo, Field Museum, Harvard Museum of Comparative Zoology, iDigBio, JRS Biodiversity Foundation, L’Agence Française pour la Biodiversité, Netherlands Biodiversity Information Facility, Pensoft Publishers and UN Environment and using facilities made available by the University of Copenhagen.
This paper builds on the active contributions of the workshop attendees (Annex B - List of GBIC2 Attendees) during GBIC2 and through subsequent online communications.
The brochures included as supplementary materials were translated by the following GBIC2 attendees and representatives of the GBIF community: Anne-sophie Archambeau, Dimitri Brosens, Leonardo Buitrago, Felipe Castilla, Katia Cezón, Dairo Escobar, Rui Figueira, Nina Filippova, Javier Andres Gamboa Martínez, André Heughebaert, Tsuyoshi Hosoya, Natalya Ivanova, Niu Jingmei, Chihjen Ko, Patricia Koleff Osorio, Melissa Liu, Carmen Lujano, Maofang Luo, Tai Messina, Nicolas Noé, Sophie Pamerlon, Anabela Plos, Andrea Ferreira Portela Nunes, Pierre Radji, Niels Raes, Dmitry Schigel, Maxim Shashkov, Carole Sinou, Kumiko Totsu, William Ulate, Miguel Vega, Cristina Villaverde, Paula Zermoglio, Jinfeng Zhou and the Official Service of Translators of the Basque Government.
Outputs from GBIC2 Working Groups
List of GBIC2 Attendees
English PDF brochure presenting a call to participate in an alliance for biodiversity knowledge
Arabic PDF brochure presenting a call to participate in an alliance for biodiversity knowledge
Spanish PDF brochure presenting a call to participate in an alliance for biodiversity knowledge
Basque PDF brochure presenting a call to participate in an alliance for biodiversity knowledge
French PDF brochure presenting a call to participate in an alliance for biodiversity knowledge
Japanese PDF brochure presenting a call to participate in an alliance for biodiversity knowledge
Dutch PDF brochure presenting a call to participate in an alliance for biodiversity knowledge
Portuguese PDF brochure presenting a call to participate in an alliance for biodiversity knowledge
Russian PDF brochure presenting a call to participate in an alliance for biodiversity knowledge
Simplified Chinese PDF brochure presenting a call to participate in an alliance for biodiversity knowledge
Traditional Chinese PDF brochure presenting a call to participate in an alliance for biodiversity knowledge
Consensus was reached through open discussion by all participants during closing plenary sessions of the GBIC2 workshop and confirmed through input, open review and feedback to this document by attendees.
Societal goals will evolve over time and systems and services will need to evolve to meet new requirements.
The term "expert" is intended to cover any contributor with relevant expertise and is not restricted to professional researchers.
This "distributed global infrastructure" will necessarily depend on many underlying tools and services developed and maintained by many institutions and agencies. The community must establish processes to identify components that are important to the operation and sustainability of this overall infrastructure and to manage the lifecycle of such components to ensure continuity. ELIXIR offers one community model for identifying "Core Data Resources"*