Biodiversity Data Journal
Software Description
Corresponding authors: Arnald Marcer (arnald.marcer@uab.cat), Agustí Escobar (a.escobar@creaf.uab.cat)
Academic editor: Vincent Smith
Received: 27 Jan 2022 | Accepted: 07 Apr 2022 | Published: 28 Apr 2022
© 2022 Arnald Marcer, Agustí Escobar, Víctor Garcia-Font, Francesc Uribe
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Marcer A, Escobar A, Garcia-Font V, Uribe F (2022) Ali-Bey - an open collaborative georeferencing web application. Biodiversity Data Journal 10: e81282. https://doi.org/10.3897/BDJ.10.e81282
Georeferencing preserved specimens represents a major effort at the Museu de Ciències Naturals de Barcelona (MCNB), given the limited resources and staff that can be allocated to the task. Georeferencing is a labour-intensive and hard-to-automate task that requires software tools to make it as efficient as possible. The tool we present, Ali-Bey, has been developed gradually over 15 years, in line with scarce yearly funding. Its functionality has been built through repeated rounds of development, testing, use in production and refinement, rather than in a single development cycle driven by a comprehensive requirements specification. At the start, the MCNB could not find a tool that fully satisfied the requirements it considered essential and decided to develop a custom one. In the end, the initiative has proved successful, since it has delivered a new georeferencing tool that meets the MCNB's needs. Only recently, after the tool had reached a notable set of novel features, did we consider releasing it as an open-source project. The MCNB has supported its development to date and decided to open it in order to give the natural history collections (NHC) community the opportunity to contribute to its development.
We present the software tool Ali-Bey, which provides new functionality for the georeferencing of specimens in Natural History Collections, namely the possibility of cooperation between different institutions, the traceability of georeferences and the capability of managing different versions of the same site name, for instance for historical reasons. The tool is an open-source web application implemented in Python with the Django framework that leverages other commonly-used specialised geodatabase and map server tools. An API gives externally-developed tools access to the geodatabase. In addition, for easy installation, the tool is provided as a multi-container Docker application.
georeferencing, web application, site name versioning, collaborative database, digital specimens, natural history collections, traceability, Museu de Ciències Naturals de Barcelona
Institutions holding Natural History Collections (NHC) have recently embarked on new and demanding digitisation projects that can provide important societal benefits in research, conservation, education and outreach (
Assigning coordinates to preserved specimens of NHC is an important task that allows us to relate these specimens to the environmental conditions in which they were collected, presumably their living habitat or ecological niche. Once set in their ecological context, these specimens can help address fundamental questions in ecology and evolution and pressing societal issues, such as global change (
Retrospective georeferencing is a skill-demanding, labour-intensive and hard-to-automate task (
The job of georeferencing entails the need to enter, store, view and query alphanumeric and spatial data, navigate and query reference digital cartography, perform spatial operations, manage user access and permissions, work collaboratively and make results public and easily accessible. Software applications can provide tools that can greatly help georeferencers and their institutions in this undertaking.
The Museu de Ciències Naturals de Barcelona (MCNB) holds a collection of about three million preserved samples of specimens (
Usually, georeferencing procedures are supported by external and specific tools, like GEOLocate (
Ali-Bey - an open collaborative georeferencing web application.
Ali-Bey is a web application that supports the management of the georeferencing process, i.e. the assignment of coordinates to textual descriptions of the sites where preserved specimens from natural history collections were collected. It provides tools to help in determining georeferences, assigning metadata to them and querying, visualising, storing and publishing them. Site names can be hierarchically organised in a tree-like structure and they can be versioned, i.e. newer improved versions of georeferenced site names are kept on top of all their older versions. The application also provides user administration capabilities and the possibility of collaboration between institutions in order to pool resources towards a common, improved list of georeferenced site names. Georeferenced sites can be made publicly available through a public API. The following list describes its main features.
Latest version: 1.0.0
Features
Hierarchical structure
All site names descend from a top node called World. Each site name can be assigned to a parent node and each node can be georeferenced, i.e. each node can be in itself a site name. For instance, the municipality Vic is a site name nested under the county node Osona, the region node Catalonia, the country node Spain, the continent node Europe and the top node World. All of these nodes can also have their corresponding set of coordinates and metadata. The hierarchy can be based on political-administrative entities as the above-mentioned example or on other user-defined hierarchies, such as physical or ecological features. This hierarchical structure not only allows the easy retrieval of all georeferenced site names for a specific world region, but also the assignment of editing permissions for different parts of the world to different users. The site names hierarchy is shown within the application in a tree-like control with which users can interact (Fig.
Site names with a terrestrial/aquatic border may deserve two distinguishable georeferencing estimations: one for the terrestrial surface and another for the collecting places of aquatic organisms, for instance, fishes caught off the coast of Barcelona.
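The hierarchy described in this section can be illustrated with a minimal Python sketch. The class `SiteNode` and its methods are hypothetical names chosen for illustration; they are not taken from the Ali-Bey code base, whose actual data model may differ.

```python
from __future__ import annotations

from typing import Iterator, List, Optional


class SiteNode:
    """Hypothetical site-name node; not Ali-Bey's actual data model."""

    def __init__(self, name: str, parent: Optional[SiteNode] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List[SiteNode] = []
        if parent is not None:
            parent.children.append(self)

    def path(self) -> List[str]:
        """Return the names from the top node down to this node."""
        names: List[str] = []
        node: Optional[SiteNode] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]

    def descendants(self) -> Iterator[SiteNode]:
        """Yield every node nested under this one, depth first."""
        for child in self.children:
            yield child
            yield from child.descendants()


# The example from the text: Vic nested under Osona, Catalonia, Spain, Europe, World.
world = SiteNode("World")
europe = SiteNode("Europe", world)
spain = SiteNode("Spain", europe)
catalonia = SiteNode("Catalonia", spain)
osona = SiteNode("Osona", catalonia)
vic = SiteNode("Vic", osona)

print(" > ".join(vic.path()))
print([node.name for node in catalonia.descendants()])
```

Retrieving all site names for a world region then amounts to walking the descendants of the node that represents that region.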
Versioned locations
Georeferenced site names are versioned, i.e. historical extents or newer improved versions of a given georeferenced site name are kept on top of the older ones (Fig.
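As a rough illustration of versioning, the following Python sketch keeps every georeference of a site name and treats the most recently added one as current. The names `GeorefVersion` and `VersionedSite`, the fields and the coordinate values are all assumptions made for this example, not Ali-Bey's schema or data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class GeorefVersion:
    """One georeference of a site name (hypothetical fields)."""
    latitude: float
    longitude: float
    uncertainty_m: float
    entered_on: date
    note: str = ""


@dataclass
class VersionedSite:
    """A site name with all of its georeference versions, oldest first."""
    name: str
    versions: List[GeorefVersion] = field(default_factory=list)

    def add_version(self, version: GeorefVersion) -> None:
        # Older versions are never overwritten, only superseded.
        self.versions.append(version)

    @property
    def current(self) -> Optional[GeorefVersion]:
        return self.versions[-1] if self.versions else None

    def history(self) -> List[GeorefVersion]:
        """Return all versions, newest on top."""
        return list(reversed(self.versions))


# Illustrative values only.
site = VersionedSite("Vic")
site.add_version(GeorefVersion(41.93, 2.25, 5000.0, date(2015, 3, 1), "first estimate"))
site.add_version(GeorefVersion(41.93, 2.25, 1500.0, date(2021, 6, 15), "refined extent"))

print(site.current)
print(len(site.history()))
```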
Coordinates and uncertainty
Uncertainty is a measure of how close an estimated measure is to its real value (
The coordinates and uncertainty representing a given location can be entered in three alternative ways: a) they can be determined externally and manually entered into the system through a set of textbox controls, b) a spatial object representing the location can be entered directly using the built-in map viewer, which allows for digitising points, lines and polygons, or c) a shapefile can be imported. In cases b) and c), Ali-Bey automatically calculates the centroid and the spatial uncertainty derived from the spatial object representing the site name.
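The text does not state how the centroid and uncertainty are derived from a spatial object, so the following Python sketch shows only one plausible approach under explicit assumptions: the centroid is the area-weighted centroid of a simple polygon computed on longitude/latitude treated as planar coordinates (reasonable only for small extents), and the uncertainty is the largest great-circle distance from that centroid to any vertex. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula).

    Assumption: coordinates are treated as planar, which is only
    acceptable for polygons of small geographical extent.
    """
    area2 = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon: zero area")
    return cx / (3 * area2), cy / (3 * area2)


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def centroid_and_uncertainty(ring: List[Point]) -> Tuple[Point, float]:
    """Centroid plus the distance to the farthest vertex, in metres."""
    centre = polygon_centroid(ring)
    radius = max(haversine_m(centre, vertex) for vertex in ring)
    return centre, radius


# Illustrative 0.1 x 0.1 degree square; not a real site extent.
square = [(2.0, 41.0), (2.1, 41.0), (2.1, 41.1), (2.0, 41.1)]
centre, radius_m = centroid_and_uncertainty(square)
print(centre, round(radius_m))
```

In a deployment backed by PostGIS, the equivalent calculations would more likely be delegated to the database than computed in application code.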
Georeferencing resources
Resources used in the georeferencing process (e.g. maps, gazetteers, web map servers etc.) can be entered and stored in the system’s database. They can be assigned a spatial object representing the geographical extent covered by the resource. Hence, georeferencers can find which resources are available for the part of the world where a given specimen was collected. Additionally, each georeferenced site name version that is entered can be tagged with the resource used in the georeferencing process. Thus, any site name georeference history can be traced back to the georeferencing resources used in the process. In the case of web map servers, they can be registered with their corresponding URL or they can be created by importing a spatial file, such as a TIFF or shapefile. In the latter case, the spatial file is automatically entered into the system’s map server and made available as an additional layer in the map viewer.
Map viewer
The application provides navigation map windows for visualising, querying and editing both geographical resources and site names (Fig.
Detail of the map viewer for: a) georeferencing resources and b) site names. The polygon encompassing the georeferencing resource “Nomenclator of Site Names of the Balearic Islands” is shown in a) and a detailed view of the georeferenced site names of the eastern part of the Island of Minorca is shown in b). Tools for zooming and editing can be seen on the left side of the viewer and for layer selection on the top-right corner.
Querying and filtering
Site names and georeferencing resources can be queried by clicking on them in their corresponding map viewer windows and their associated information can be accessed by clicking on the corresponding icon in the row of the tables listing them. Both site names and resources can be filtered with the use of a multicriteria filter. An arbitrary number of conditions can be added to the filter with "and/or" and negation operators. Filters can be saved and retrieved for later use (Fig.
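A multicriteria filter of this kind can be sketched in Python as a nested structure of conditions combined with "and", "or" and negation, which also makes it straightforward to save and retrieve as JSON. The structure, the operator names (`eq`, `contains`) and the record fields below are assumptions for illustration and do not reflect Ali-Bey's internal representation.

```python
import json
from typing import Any, Dict


def matches(record: Dict[str, Any], node: Dict[str, Any]) -> bool:
    """Evaluate a nested filter against one record."""
    op = node["op"]
    if op == "and":
        return all(matches(record, child) for child in node["conditions"])
    if op == "or":
        return any(matches(record, child) for child in node["conditions"])
    if op == "not":
        return not matches(record, node["condition"])
    value = record.get(node["field"])
    if op == "eq":
        return value == node["value"]
    if op == "contains":
        return str(node["value"]).lower() in str(value or "").lower()
    raise ValueError(f"unknown operator: {op}")


# Illustrative records; field names are hypothetical.
sites = [
    {"name": "Vic", "country": "Spain", "type": "municipality", "aquatic": False},
    {"name": "Barcelona coast", "country": "Spain", "type": "coast", "aquatic": True},
    {"name": "Lisbon", "country": "Portugal", "type": "municipality", "aquatic": False},
]

# Spanish site names that are not municipalities.
spanish_non_municipal = {
    "op": "and",
    "conditions": [
        {"op": "eq", "field": "country", "value": "Spain"},
        {"op": "not", "condition": {"op": "eq", "field": "type", "value": "municipality"}},
    ],
}

# Saving and retrieving a filter is a JSON round trip.
saved = json.dumps(spanish_non_municipal)
restored = json.loads(saved)

selected = [site["name"] for site in sites if matches(site, restored)]
print(selected)
```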
Import / Export
Georeferenced site names can be automatically imported from comma-delimited files (CSV) and exported to comma-delimited (CSV), spreadsheet (XLS), GIS (shapefile and KML) and document (PDF) formats. Georeferencing resources can also be exported to CSV, XLS and PDF.
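The CSV layout expected by the import is not specified here. Purely as an illustration, the following Python sketch parses a file with the hypothetical columns `name`, `latitude`, `longitude` and `uncertainty_m`, and separates rows that pass basic range checks from those that do not.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical column names; the real import format may differ.
REQUIRED = ("name", "latitude", "longitude", "uncertainty_m")


def parse_sites(csv_text: str) -> Tuple[List[Dict[str, object]], List[Tuple[int, str]]]:
    """Return (accepted rows, rejected (line number, reason) pairs)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    accepted: List[Dict[str, object]] = []
    rejected: List[Tuple[int, str]] = []
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
            unc = float(row["uncertainty_m"])
        except (TypeError, ValueError):
            rejected.append((line_no, "non-numeric value"))
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180) or unc < 0:
            rejected.append((line_no, "value out of range"))
            continue
        accepted.append({"name": row["name"], "latitude": lat,
                         "longitude": lon, "uncertainty_m": unc})
    return accepted, rejected


# Illustrative input only.
sample = (
    "name,latitude,longitude,uncertainty_m\n"
    "Vic,41.93,2.25,3000\n"
    "Nowhere,95.0,2.0,100\n"
    "Broken,abc,2.0,100\n"
)
accepted, rejected = parse_sites(sample)
print(accepted)
print(rejected)
```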
Usage statistics
The application provides charts that summarise georeferencing activity, such as the number of georeferenced site names by georeferencer, country, type and terrestrial/aquatic nature. For georeferencing resources, their number by type is given.
Look-up tables
Shared lists of terms in the form of look-up tables (key-value tables), such as authors, organisations, keywords, version qualifiers, content types, support types, site name types and unit types, can be edited and curated with specific forms.
User management and permissions
The application provides user management functionality. A user given the administrative role can add and remove users and assign editing privileges. Users can also be given geographically-bounded permissions for editing locations, facilitated by the hierarchical structure of the site names. This allows the spatial compartmentalisation of the world into different regions which can be assigned to different georeferencers and organisations.
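One way to picture geographically-bounded permissions is that a user may edit a site name if the user has been granted that node or any of its ancestors in the hierarchy. The Python sketch below encodes this rule; the parent table, the user names and the function names are hypothetical and are not taken from Ali-Bey.

```python
from typing import Dict, Iterator, Optional, Set

# Hypothetical fragment of the site-name hierarchy: child -> parent.
PARENT_OF: Dict[str, Optional[str]] = {
    "World": None,
    "Europe": "World",
    "Spain": "Europe",
    "Portugal": "Europe",
    "Catalonia": "Spain",
    "Osona": "Catalonia",
    "Vic": "Osona",
    "Lisbon": "Portugal",
}

# Hypothetical grants: user -> nodes whose whole subtree the user may edit.
GRANTS: Dict[str, Set[str]] = {
    "georeferencer_a": {"Catalonia"},
    "georeferencer_b": {"Portugal"},
    "administrator": {"World"},
}


def self_and_ancestors(site: str) -> Iterator[str]:
    """Yield the site itself and then each ancestor up to the top node."""
    node: Optional[str] = site
    while node is not None:
        yield node
        node = PARENT_OF[node]


def can_edit(user: str, site: str) -> bool:
    """A grant on a node covers that node and everything nested under it."""
    granted = GRANTS.get(user, set())
    return any(node in granted for node in self_and_ancestors(site))


print(can_edit("georeferencer_a", "Vic"))
print(can_edit("georeferencer_a", "Lisbon"))
```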
Institutional collaboration through federated lists
Different institutions can collaborate towards a shared georeferenced gazetteer and, thus, pool resources and divide and distribute the overall effort. Moreover, this can be combined with the corresponding permissions to assign different geographical zones to different institutions and georeferencers. This takes advantage of the better direct knowledge that different actors have of parts of the territory and can potentially result in better georeferencing. Georeferenced site names are tagged with the name of the institution of their corresponding georeferencers. Users can choose to see only the site names georeferenced by their institution or the whole set of site names from all federated institutions.
Multi-language support
The application supports multiple languages. They can be added to the list of available languages by translating a language configuration file.
Public API
Ali-Bey also provides a public application programming interface (API) through which other applications can query the system, for example to return all locations within a given geometry or all data concerning a given locality.
A working example that uses Ali-Bey's API can be found at https://www.bioexplora.cat/en/geocoding. This is a gateway to the MCNB's Ali-Bey database of georeferenced site names. The web page also allows users to submit comments to improve the georeferencing.
Containerised deployment
Ali-Bey can be deployed as a multi-container Docker application. This allows for easy set-up on any operating system that supports Docker and avoids conflicts with existing libraries and language interpreter versions in the host operating system.
The application is developed around the Model-View-Controller pattern on a three-tier architecture.
Presentation tier
Ali-Bey can be accessed with any web browser, although the most common ones are recommended (Chrome, Firefox, Safari, Edge). The user interface is implemented in HTML5 and JavaScript using the Bootstrap CSS framework (https://getbootstrap.com) and the jQuery JavaScript library (https://jquery.com). Map navigation is provided via the Leaflet JavaScript library (https://leafletjs.com). The user interface is responsive, i.e. it adapts itself to different device screen sizes. Ali-Bey uses the Nginx (https://www.nginx.com) web server as a proxy for the Django application, which renders the HTML pages delivered to the user.
Application tier
The application layer is implemented in Python using the Django web framework (https://www.djangoproject.com). Django provides data access, processing and presentation functionality. Spatial operations, such as centroid calculation, are done using PostGIS (https://postgis.net) and the Geospatial Data Abstraction Library (GDAL, https://gdal.org). Cartography is served through the Web Map Service Interface Standard (WMS), from either locally-stored geospatial files or externally accessible WMS sites. Internal WMS functionality is provided by GeoServer (http://geoserver.org), an open-source server for sharing geospatial data. Apache Tomcat (https://tomcat.apache.org) acts as an internal application server for GeoServer, the WMS map server.
Data tier
The data layer is composed of two main parts, the geodatabase and the digital cartography. The geodatabase is a PostgreSQL (https://www.postgresql.org) relational database with the PostGIS (https://postgis.net) spatial extension. Digital cartographic data, corresponding to internal georeferencing resources, are stored as geospatial data files.
API
Although Ali-Bey is meant to be an internal tool for georeferencing within an institution holding natural history collections, other applications can be built on the public API to give access to the database of georeferenced site names. The Museu de Ciències Naturals de Barcelona provides public access to its Ali-Bey database of georeferenced site names through a web page that uses the Ali-Bey API.
The application programming interface is based on the REST (REpresentational State Transfer) architectural style which allows the use of HTTP methods to perform create, retrieve, update and delete operations. Currently, in the Ali-Bey API only retrieve operations have been implemented.
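The effect of implementing only retrieve operations can be sketched in Python as a dispatcher that answers GET requests and rejects every other HTTP method. The resource paths (`/sites`, `/sites/<id>`), the in-memory data and the function `handle` are hypothetical; they are not Ali-Bey's actual endpoints or code.

```python
import json
from http import HTTPStatus
from typing import Dict, Tuple

# Illustrative in-memory data standing in for the geodatabase.
SITES: Dict[int, Dict[str, str]] = {
    1: {"name": "Vic", "parent": "Osona"},
    2: {"name": "Osona", "parent": "Catalonia"},
}


def handle(method: str, path: str) -> Tuple[HTTPStatus, str]:
    """Return (status, JSON body) for a request to a read-only REST resource."""
    if method != "GET":
        # Create, update and delete are not implemented in a read-only API.
        return HTTPStatus.METHOD_NOT_ALLOWED, json.dumps({"error": "read-only API"})

    parts = path.strip("/").split("/")
    if parts == ["sites"]:
        return HTTPStatus.OK, json.dumps(list(SITES.values()))
    if len(parts) == 2 and parts[0] == "sites" and parts[1].isdigit():
        site = SITES.get(int(parts[1]))
        if site is not None:
            return HTTPStatus.OK, json.dumps(site)
    return HTTPStatus.NOT_FOUND, json.dumps({"error": "not found"})


print(handle("GET", "/sites/1"))
print(handle("POST", "/sites"))
```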
Dockerised multi-container platform
Ali-Bey uses the Docker containerisation platform (https://www.docker.com) to bundle its components and provide for an easy deployment (Fig.
Application architecture of Ali-Bey deployed as a Docker multi-container application. A user connects with a web browser through the Internet to an Nginx web server that gives access to the application. The application is implemented with the Django web framework, served by the Python HTTP server Gunicorn, and interacts with the map server GeoServer and with the PostgreSQL/PostGIS database. The PostgreSQL alphanumeric and spatial database files and the layer metadata are all persistently stored in the host filesystem, configured as a Docker volume.
Miguel Prieto, Anna Díaz, Cristina González and Olga Boet form part of the team of the Museum of Natural Sciences of Barcelona devoted to georeferencing collections. They have been crucial in maturing the initial ideas for a georeferencing tool through to completion. Miguel Martínez has been a key player in the integration of Ali-Bey in the MCNB infrastructure. The authors also want to express their gratitude to the reviewers who helped improve the manuscript. Finally, participation in the EU COST Action CA17106: “MOBILISE - Mobilizing Data, Experts and Policies in Scientific Collections” has provided useful insights into the georeferencing needs of the NHC community.
FU conceived the need and idea for the tool. AM led and coordinated the project. AE, VG and AM designed the software application. AE and VG led and developed the implementation of a previous Java version of the tool. AE led and developed the current Python version. AM drafted the manuscript with contributions from all authors.