Corresponding author: John Stephen Wood (
Academic editor: Quentin Groom
Background
Traditional sources of species occurrence data such as peer-reviewed journal articles and museum-curated collections are included in species databases after rigorous review by species experts and evaluators. The distribution maps created in this process are an important component of species survival evaluations, and are used to adapt, extend and sometimes contract polygons used in the distribution mapping process.
New Information
During an IUCN Red List Gulf of Mexico Fishes Assessment Workshop held at The Harte Research Institute for Gulf of Mexico Studies, a session included an open discussion on the topic of including other sources of species occurrence data. During the last decade, advances in portable electronic devices and applications have enabled 'citizen scientists' to record images, locations and data about species sightings, and to submit those data to larger species databases. These applications typically generate point data. Attendees of the workshop expressed an interest in how such data could be incorporated into existing datasets, how best to ascertain their quality and value, and what other alternate data sources are available. This paper addresses those issues, and provides recommendations to ensure quality data use.
Species, Distribution, Crowdsource, IUCN, Red List, Protocol, Geographic Information Systems, GIS, Biodiversity Databases, citizen science.
“How can we standardize the methods used to incorporate point data into distribution range polygons? How can we accelerate the collection of observation (point) data?” These questions were posed to an international group of taxonomists during a workshop held in conjunction with the IUCN Red List Gulf of Mexico Fishes Assessment Workshop at the Harte Research Institute for Gulf of Mexico Studies on the campus of Texas A&M University-Corpus Christi in January 2014. Table
The goal of this workshop was to discuss a methodology for community-based recording of observations of marine species. These data, if collected in a repeatable and consistent manner over a long period of time, will become a valuable reference for distribution mapping for marine species ranges (adapted from
The IUCN Red List of Threatened Species™ is essentially a checklist of taxa that have undergone an extinction risk assessment using the IUCN Red List Categories and Criteria, as shown in Fig.
The majority of assessments appearing on the IUCN Red List are carried out by members of the IUCN Species Survival Commission (SSC), appointed Red List Authorities (RLAs), Red List Partners, or participants of IUCN-led assessment projects (
A detailed guidance document ‘Documentation standards and consistency checks for IUCN Red List assessments and species accounts’ (
Extent of Occurrence (EOO) is defined as the area contained within the shortest continuous imaginary boundary which can be drawn to encompass all the known, inferred or projected sites of present occurrence of a taxon, excluding cases of vagrancy (species far out of their typical range). This measure may exclude discontinuities or disjunctions within the overall distributions of taxa (e.g., large areas of obviously unsuitable habitat; but see 'area of occupancy'). Extent of occurrence can often be measured by a minimum convex polygon (the smallest polygon in which no internal angle exceeds 180 degrees and which contains all the sites of occurrence).
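The minimum convex polygon mentioned in the EOO definition can be illustrated with a short sketch. It assumes the occurrence points have already been projected to planar coordinates in kilometres (an assumption made here for simplicity; longitude/latitude pairs would first need an equal-area projection), and the names `convex_hull`, `polygon_area` and `sightings_km` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Minimum convex polygon (Andrew's monotone chain), counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop the last vertex while it would create a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula; the result is in squared coordinate units."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Illustrative (made-up) sightings in projected kilometres.
sightings_km = [(0, 0), (10, 0), (10, 10), (0, 10), (5, 5), (2, 7)]
hull = convex_hull(sightings_km)
eoo_km2 = polygon_area(hull)
```

Interior points such as `(5, 5)` do not affect the hull, which is why a few outlying records (for example vagrants, which the definition excludes) can dominate an EOO estimate.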
Area of Occupancy (AOO) is defined as the area within its 'Extent of Occurrence' (see definition above) which is occupied by a taxon, excluding cases of vagrancy. The measure reflects the fact that a taxon will not usually occur throughout the area of its extent of occurrence, which may, for example, contain unsuitable habitats. The area of occupancy is the smallest area essential at any stage to the survival of existing populations of a taxon (e.g. colonial nesting sites, feeding sites for migratory taxa). The size of the area of occupancy will be a function of the scale at which it is measured, and should be at a scale appropriate to relevant biological aspects of the taxon. The criteria include values in km2, and thus to avoid errors in classification, the area of occupancy should be measured on grid squares (or equivalents) which are sufficiently small.
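The scale dependence of AOO can be shown with a small grid-counting sketch. It assumes planar coordinates in kilometres and a default 2 km cell size; the cell size is an assumption chosen for illustration and is not specified in the text above. The function names are hypothetical.

```python
import math
from typing import Iterable, Set, Tuple


def occupied_cells(points_km: Iterable[Tuple[float, float]],
                   cell_km: float = 2.0) -> Set[Tuple[int, int]]:
    """Return the set of grid cells (column, row) containing at least one record."""
    return {(math.floor(x / cell_km), math.floor(y / cell_km))
            for x, y in points_km}


def area_of_occupancy(points_km: Iterable[Tuple[float, float]],
                      cell_km: float = 2.0) -> float:
    """AOO estimate: number of occupied cells times the area of one cell (km^2)."""
    return len(occupied_cells(points_km, cell_km)) * cell_km ** 2


# Illustrative (made-up) records in projected kilometres.
records_km = [(0.5, 0.5), (1.9, 1.2), (2.1, 0.3), (7.0, 7.0)]
aoo_fine = area_of_occupancy(records_km, cell_km=2.0)
aoo_coarse = area_of_occupancy(records_km, cell_km=10.0)
```

With these four records the 2 km grid has three occupied cells, while a 10 km grid places every record in one cell, so the same data give a much larger area at the coarser scale.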
The definitions above are taken directly from:
Distribution maps display a polygon intended to communicate that a species probably only occurs within its extent, which is based on known occurrences, knowledge of habitat preferences, remaining suitable habitats, elevation (or depth) limitations, and other expert knowledge. The point data from which these polygons are derived can include line-based data from transects, polygon data from a defined area such as a national park, and grid data (observations or survey records from a regular grid). These data are obtained from published peer-reviewed literature, ‘grey’ literature (academic or government literature that is not formally published), field observations, biodiversity and taxonomic databases such as the Global Biodiversity Information Facility (GBIF) and Ocean Biodiversity Information System (OBIS), museum and other curated collections, or from taxonomic expert knowledge. The quality and quantity of these data vary widely. There are also online utilities, such as GeoCAT (
The existing IUCN Red List mapping protocol for marine species differs from that for terrestrial species, primarily in that bathymetry may be used to delineate species range limits, much like elevation limitations may be used to limit the ranges of terrestrial species. It also differs from the mapping protocol for freshwater fishes, where drainage basins are typically used for determining and delineating range extents. The IUCN protocol for converting marine observation point data into distribution polygons involves a three-step process: Step One: Plot Observation Points, Step Two: Expand the Range, and Step Three: Refine the Range.
In Step One, point observation data are plotted. Since data often come from a diverse range of formats and sources, methods for plotting data points will vary. All data should be plotted in the Geographic Coordinate System, WGS-1984.
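Because source records often arrive in mixed coordinate formats, a first practical task in Step One is converting them to decimal degrees before plotting. The following is a minimal sketch, assuming input strings in degrees/minutes/seconds with a hemisphere letter; the helper name `dms_to_decimal` is hypothetical, and the sketch does not perform any datum transformation to WGS 1984.

```python
import re

_NUM = r"(\d+(?:\.\d+)?)"
# Degrees are required; minutes and seconds are optional; hemisphere letter is required.
_DMS = re.compile(
    r"^\s*" + _NUM + r"\s*°\s*"
    + r"(?:" + _NUM + r"\s*['′]\s*)?"
    + r'(?:' + _NUM + r'\s*["″]\s*)?'
    + r"([NSEW])\s*$",
    re.IGNORECASE,
)


def dms_to_decimal(text: str) -> float:
    """Convert e.g. 27°48'N or 97°24'00"W to signed decimal degrees."""
    match = _DMS.match(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees = float(match.group(1))
    minutes = float(match.group(2) or 0)
    seconds = float(match.group(3) or 0)
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes/seconds out of range: {text!r}")
    hemisphere = match.group(4).upper()
    value = degrees + minutes / 60 + seconds / 3600
    limit = 90 if hemisphere in "NS" else 180
    if value > limit:
        raise ValueError(f"coordinate out of range: {text!r}")
    # South and west are negative in the usual signed convention.
    return -value if hemisphere in "SW" else value


latitude = dms_to_decimal("27°48'N")
longitude = dms_to_decimal("97°24'W")
```

Records that fail to parse raise an error rather than being silently plotted, so they can be set aside for manual checking.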
In Step Two of this protocol, the range is extrapolated based on the extent of suitable habitat (ESH) in the area and expert knowledge of the species and its requirements. Surrounding areas of similar habitat may be included. For terrestrial species, there are various other factors such as elevation, temperature, and even natural physical barriers, such as oceans. Marine species range may be affected by depth, water temperature gradients, salinity ranges, photic zone depths, and O2 concentrations. Often, this extrapolation is accomplished by buffering the point data, and then creating a convex polygon that surrounds the available points.
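The buffering and convex-polygon step described above can be sketched as follows. This is an illustration only: it assumes planar coordinates in kilometres, approximates each buffer with a regular 16-sided polygon, and uses a made-up 5 km buffer distance; in practice a GIS package would normally do this work. All names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def buffer_point(center: Point, radius: float, segments: int = 16) -> List[Point]:
    """Approximate a circular buffer around a point with a regular polygon."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / segments),
             cy + radius * math.sin(2 * math.pi * k / segments))
            for k in range(segments)]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def turn(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq: List[Point]) -> List[Point]:
        chain: List[Point] = []
        for p in seq:
            while len(chain) >= 2 and turn(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(pts[::-1])


def expand_range(points: List[Point], buffer_km: float,
                 segments: int = 16) -> List[Point]:
    """Buffer every observation, then wrap all buffers in one convex polygon."""
    ring_vertices: List[Point] = []
    for p in points:
        ring_vertices.extend(buffer_point(p, buffer_km, segments))
    return convex_hull(ring_vertices)


# Illustrative (made-up) observations in projected kilometres.
observations_km = [(0.0, 0.0), (30.0, 5.0), (12.0, 20.0)]
expanded = expand_range(observations_km, buffer_km=5.0)
```

Because each buffer is a polygon rather than a true circle, the result slightly underestimates the exact buffered hull; increasing `segments` reduces the difference.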
In Step Three, areas that are deemed unsuitable for a species are removed from the extrapolated habitat polygon(s). Note that this extrapolation and elimination of areas may result in discontinuous or non-contiguous polygons. This may result in different Extent of Occurrence and Area of Occupancy, and results in the best representation of the species’ likely occurrence or distribution based on currently available information. The Area Of Occupancy (AOO) will reflect influences from both biotic and abiotic factors.
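One way to picture the refinement in Step Three is as a filter over candidate grid cells using depth, one of the marine limiting factors listed earlier. The sketch below assumes a hypothetical lookup of depth per cell and a hypothetical species depth range of 10 to 100 m; neither comes from the protocol itself.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]


def refine_range(candidate_cells: Set[Cell],
                 depth_m: Dict[Cell, float],
                 min_depth_m: float,
                 max_depth_m: float) -> Tuple[Set[Cell], Set[Cell]]:
    """Split candidate cells into those within the depth range and those
    lacking depth data, which are returned separately for expert review."""
    kept: Set[Cell] = set()
    needs_review: Set[Cell] = set()
    for cell in candidate_cells:
        depth = depth_m.get(cell)
        if depth is None:
            needs_review.add(cell)
        elif min_depth_m <= depth <= max_depth_m:
            kept.add(cell)
    return kept, needs_review


# Illustrative (made-up) strip of four cells with depths in metres.
candidate = {(0, 0), (1, 0), (2, 0), (3, 0)}
depths = {(0, 0): 20.0, (1, 0): 150.0, (2, 0): 60.0}
kept, unknown = refine_range(candidate, depths, min_depth_m=10.0, max_depth_m=100.0)
```

Here the deep cell `(1, 0)` is removed, leaving `(0, 0)` and `(2, 0)` as a non-contiguous range, which is the kind of discontinuity noted above.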
A ‘Best Practices’ section of the tutorial offers several ‘rules of thumb’ to go by:
Always name the polygon shapefiles by the taxon’s scientific name (using “genus_species” format).
Smooth all polygons and check for irregularities before submitting.
Provide GIS data in geographic coordinates (specifically WGS84, the default setting for most GPS units).
Remember that data attributes are absolutely required with spatial data, including codes for presence, origin and seasonality.
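The naming and attribute rules of thumb lend themselves to a simple automated check before submission. The sketch below is an assumption-laden illustration: it keeps the usual capitalisation of a binomial (capital genus, lower-case epithet), which the tutorial's “genus_species” wording does not explicitly specify, and the function names are hypothetical.

```python
import re
from typing import Dict, List

_NAME_PART = re.compile(r"^[A-Za-z-]+$")
REQUIRED_CODES = ("PRESENCE", "ORIGIN", "SEASONAL")


def shapefile_name(scientific_name: str) -> str:
    """Build a 'Genus_species.shp' file name from a binomial."""
    parts = scientific_name.strip().split()
    if len(parts) < 2:
        raise ValueError(f"expected a binomial, got {scientific_name!r}")
    genus, species = parts[0], parts[1]
    if not (_NAME_PART.match(genus) and _NAME_PART.match(species)):
        raise ValueError(f"unexpected characters in {scientific_name!r}")
    return f"{genus.capitalize()}_{species.lower()}.shp"


def missing_codes(attributes: Dict[str, object]) -> List[str]:
    """List the required code fields that are absent or empty."""
    return [field for field in REQUIRED_CODES if attributes.get(field) is None]


file_name = shapefile_name("  lutjanus Campechanus ")
gaps = missing_codes({"PRESENCE": 1, "ORIGIN": 1})
```

A check like this only confirms that the fields exist; whether the codes are appropriate for the polygon remains a matter for the assessor.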
The distribution map produced with this protocol represents the taxon’s distribution within its overall range for communication and/or conservation planning purposes; it may not equate to either the spread of extinction risk (Extent Of Occurrence) or the occupied range area (Area Of Occupancy) as defined by the IUCN Red List Categories and Criteria, but can be used to support these measurements.
There are several questions that this protocol does not address, such as: what point data sources should be included, what areas should be eliminated, and how should seasonality be represented (separate GIS file, separate polygon within the same GIS file…)?
Much of the workshop discussion focused on bringing in additional data and data sources. With the advent of numerous portable electronic devices, including smartphones, with different applications and interfaces and GPS/mapping capabilities, new and exciting sources of species and species/location data are available, which could be included with current datasets. Crowdsourcing, commonly known as ‘citizen science’, is a manner of collecting data and observations in which collaborators who may lack credentials and formal institutional affiliation can contribute to the work of taxonomists and scientists. For example, rather than requiring a master’s degree in ichthyology, a citizen science project might ask if a candidate can learn to identify a particular species of fish using a dichotomous key (
The
To illustrate crowdsourcing, consider several examples:
An enterprising PhD student, Devin Bloom, from the University of Toronto Scarborough successfully used Facebook to post images of, and identify, almost 5,000 fish specimens collected during the first ichthyological survey on Guyana’s Cuyuni River. This feat was accomplished in less than 24 hours by a network of friends, many of whom had PhDs in ichthyology.
The National Museum of Natural History and the Smithsonian Institution (
Dr. Amanda Vincent, Director of
Perhaps even more telling: an ‘app’ called ‘
The
This small sampling of crowd-sourced data collection applications emphasizes the need to achieve a consensus on whether information collected in this manner can be used to enhance the current point and polygon observation data used to determine range and distribution extent information.
Crowd-sourced data collected with some of the ‘apps’ mentioned above would require that a minimum set of data fields be filled; other attributes should be added from existing and expert knowledge or specimen voucher information. The scientific name (binomial), the name of the compiler or submitter, and the citation (organization or app name) should be collected and added to the geo-tagged image information when available. Many of the apps and website entry points currently available fail to generate usable data, because they do not conform to taxonomic standards or lack georeferencing. Database curators and developers now have access to several 'toolkits', such as that available from
The Museum of Vertebrate Zoology (MVZ) at Berkeley publishes a
The traditional sources of biodiversity data include, but certainly aren't limited to, museum collections, taxonomic monographs, and biodiversity databases, which obtain much of their data from the first two sources. Individual specimens and observations within these collections come from a variety of sources, including published and unpublished (grey) literature, amateur naturalists, and volunteer recorders (
In addition to biodiversity databases, surveys, often conducted by state and federal fisheries agencies, are another source of biodiversity data. The Texas Parks and Wildlife Department has been conducting seine, gill net and trawl surveys since the 1970s. The Louisiana Department of Wildlife and Fisheries has been collecting fishery-independent data since 1988, from programs utilizing various gear and sampling techniques. The Florida Fisheries-Independent Monitoring program began the same year. The Southeast Area Monitoring and Assessment Program (
Biodiversity databases and literature contain vast amounts of distribution and taxonomic information; however, the quality, scope and scale of the data vary. To address this potential problem, data should be verified and vetted by species experts and other knowledgeable workers before the information can be incorporated into Red List assessments. Taxonomic information can be verified in authoritative taxonomic databases such as the Integrated Taxonomic Information System (ITIS), the World Register of Marine Species (WoRMS), and other initiatives, which count on the assistance of taxonomic experts to keep the information as current as possible.
Biodiversity databases such as GBIF, OBIS, and the Red List usually rely on multiple biodiversity and taxonomic databases to keep information current. It is recommended that any change in taxonomy be fully documented and linked to the source of the taxonomic authority. Similarly, information on potential misidentifications should be provided to avoid potential problems.
Distributional data can have several sources of errors including incomplete or vague locality descriptions, wrong information from original source, transcription of data from hand-written labels and field log books, transposition of latitude and longitude, and GPS or other instrument error or calibration problems. Therefore, biodiversity databases should have fields for accuracy and data confidence, ideally reviewed by staff or an expert. If the data point is considered problematic, it should be flagged as such so that users can evaluate its usefulness. There are numerous database validation tools available.
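Several of the error types listed above, notably transposed latitude and longitude, can be flagged automatically so that reviewers see them. The sketch below is a rough illustration: the flag names are hypothetical, and the bounding box used for the Gulf of Mexico (18 to 31 degrees north, 98 to 80 degrees west) is an approximate assumption, not an official boundary.

```python
from typing import List, Tuple

# (min_lat, max_lat, min_lon, max_lon); approximate, for illustration only.
GULF_OF_MEXICO_BBOX = (18.0, 31.0, -98.0, -80.0)


def in_bbox(lat: float, lon: float, bbox: Tuple[float, float, float, float]) -> bool:
    min_lat, max_lat, min_lon, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def flag_coordinates(lat: float, lon: float,
                     bbox: Tuple[float, float, float, float] = GULF_OF_MEXICO_BBOX
                     ) -> List[str]:
    """Return a list of quality flags; an empty list means no issue was detected."""
    flags: List[str] = []
    if lat == 0 and lon == 0:
        flags.append("zero_coordinates")
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        flags.append("out_of_range")
    if not in_bbox(lat, lon, bbox):
        if in_bbox(lon, lat, bbox):
            # Swapping the two values lands inside the study region.
            flags.append("possible_lat_lon_transposition")
        elif in_bbox(lat, -lon, bbox) or in_bbox(-lat, lon, bbox):
            flags.append("possible_sign_error")
        else:
            flags.append("outside_expected_region")
    return flags


ok = flag_coordinates(27.8, -97.4)
swapped = flag_coordinates(-97.4, 27.8)
wrong_sign = flag_coordinates(27.8, 97.4)
```

Flagged records are kept rather than deleted, in line with the recommendation that problematic points be marked so users can judge their usefulness.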
This paper is not intended to present all the possible combinations of crowdsourcing data or species evaluations, but instead should serve as a starting point for further discussion.
The review process:
In addition to the attributes currently required for inclusion in the IUCN database, spatial data on distribution during different life stages, seasonality, and depth ranges would be helpful in the evaluation and assessment process.
Ancillary information such as competitor and predator expansion and invasions, disease ‘hotspots’, and environmental and habitat degradation is vital to the distribution mapping and evaluation processes.
The existing Red List mapping protocol for marine species (Steps One, Two and Three, as described briefly above) is sufficient for adding point data to, and redrawing, existing species distribution polygons, provided species experts and evaluators confirm the validity and accuracy of the data. Expert evaluation and acceptance are key to this process!
Maps are produced and data collected for specific purposes, and should be used for other purposes only with extreme caution. Users should understand the purpose of a distribution map (is it showing the possible range (EOO), or the area to which the species is presumed to be limited by habitat and other factors (AOO)?) and the limitations inherent in that map.
There are millions of data points for marine species currently available in biodiversity databases such as GBIF, OBIS, and others, which can complement and confirm distributions. Also, efforts to digitize museum collections, such as iDigBio, will produce millions of additional data points. Inclusion of these points also requires expert vetting.
Experts should be aware of these possible sources, and relevant sites should be reviewed during the planning stages of the review process.
Crowd-sourced data:
Crowdsourced and ‘citizen science’ data can considerably increase the amount of data available for evaluating species distribution. Auto-collection of geo-locations and the use of autocomplete functions and drop-down lists can substantially improve the accuracy of those data.
Crowdsourcing can be used to screen the majority of data on common species to reduce the workload of species experts, leaving the data on less common species or the observations outside of current polygons to be reviewed by experts.
Points located outside of the existing polygons should be examined carefully. Points located inside existing distribution polygons may confirm that the information is still current. Point sets should be examined in relation to the date of collection.
In most cases, points are only an indication of a successful sampling effort. Points alone only indicate that a species was sighted at a given time and location. Lack of a point in a location does not preclude the species being there. Further analysis is often possible with mixed and 'noisy' data sets, using spatial and statistical techniques. Inference of absence requires further analysis.
New records of species not previously reported from an area could reflect a gap in knowledge, a range extension due to introduction or climate change, or an error; care should be taken to verify the information with additional observations or to rely on expert knowledge.
Projects that count on ‘citizen scientist’ contributions to produce biodiversity data or to check data quality should be focused in scope and establish a minimum data quality standard; however, the required information should be as simple as possible to provide, to avoid losing crowdsourced data (i.e., if too much information is required for contributions, the public may not get engaged).
Crowdsourcing applications should include ample instructions, use drop-down fields and auto-fill options where practical, leave little room for error, and strive for accuracy wherever possible.
Data should be vetted by experts in the field of study.
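The triage suggested above, in which points inside an existing polygon tend to confirm it and points outside are passed to experts, can be sketched with a standard ray-casting test. The polygon and records below are made up, the coordinates are treated as planar (reasonable only for small areas away from the poles and the antimeridian), and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: count crossings of a ray running from (x, y) towards +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def triage(records: Sequence[Point], polygon: Sequence[Point]) -> Dict[str, List[Point]]:
    """Separate records that fall inside the current range from those outside it."""
    result: Dict[str, List[Point]] = {"confirms_range": [], "expert_review": []}
    for x, y in records:
        key = "confirms_range" if point_in_polygon(x, y, polygon) else "expert_review"
        result[key].append((x, y))
    return result


# Illustrative range polygon and records as (longitude, latitude) pairs.
range_polygon = [(-97.0, 26.0), (-94.0, 26.0), (-94.0, 29.0), (-97.0, 29.0)]
new_records = [(-95.5, 27.5), (-90.0, 28.0), (-96.9, 28.9)]
sorted_records = triage(new_records, range_polygon)
```

Points that land exactly on a polygon edge may be classified either way by this test, so borderline records are best treated as needing review.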
The authors wish to acknowledge the International Union for Conservation of Nature Red List Species Evaluators, team members, and leaders, especially the Gulf of Mexico Fishes Assessment Workshop attendees, experts and facilitators from the Species Survival Commission.
Red List Workshop (Jan. 2014) Attendees
First name | Last name | Affiliation |
Beth | Polidoro | IUCN/Arizona State University |
Bruce | Collette | Smithsonian Institution/ chair of Tuna and Billfishes SSG |
Christi | Linardich | IUCN/Old Dominion University |
Fabio | Moretzsohn | Harte Research Institute |
George | Sedberry | NOAA Office of National Marine Sanctuaries |
Gina | Ralph | IUCN/Old Dominion University |
Heather | Harwell | IUCN/Christopher Newport University |
Hector | Espinosa-Perez | Instituto de Biología, UNAM, Mexico |
Howard | Jelks | USGS Southeast Ecological Science Center |
James | Tolan | Texas Parks and Wildlife Department, Coastal Fisheries Division |
Jeff | Williams | Smithsonian Institution |
Jim | Cowan* | Louisiana State University |
John | McEachran | Texas A&M University |
John | Wood | Harte Research Institute |
Jorge | Brenner | The Nature Conservancy, Corpus Christi |
Kathy | Goodin | NatureServe |
Ken | Lindeman* | Florida Institute of Technology/Co-Chair of Snapper, Sea Bream, Grunt SSG |
Kent | Carpenter | Old Dominion University/Manager IUCN Marine Biodiversity Unit |
Kyle | Strongin | IUCN/Arizona State University |
Labbish | Chao | Museum of Marine Biology & Aquarium, Taiwan/ |
Luiz | Rocha | California Academy of Sciences/ member of Groupers and Wrasses SSG |
Luke | Tornabene | Texas A&M University |
Maria | Vega Cendejas | CINVESTAV-IPN, Unidad Merida, Mexico |
Mia | Comeros-Raynal | IUCN/Old Dominion University |
Michelle | Zapp Sluis | Harte Research Institute |
Riley | Pollom | Project Seahorse – University of British Columbia Fisheries Centre, Canada |
Rodolfo | Claro | Instituto de Oceanología CITMA, La Habana, Cuba |
Roger | McManus | IUCN, Arizona |
Ross | Robertson | Smithsonian Tropical Research Institute, Panama |
Tomas | Camarena Luhrs | National Commission of Natural Protected Areas–SEMARNAT, Mexico |
Required Attributes for IUCN Distribution Shapefiles (
Field | Data type | Description | Notes |
ID_NO | Integer | Internal Record ID | Assigned by IUCN |
BINOMIAL | String | Scientific name of the species | Recommended but not necessary |
BASINID (for freshwater species only) | Integer | River Basin ID (Hydrosheds). (Note that this field is only included when species are mapped using the freshwater mapping protocol) | |
PRESENCE | ShortInt | Is/Was the species in this area, codes listed below | Assigned by IUCN |
ORIGIN | ShortInt | Why/ How the species is in this area, codes listed below | Assigned by IUCN |
SEASONAL | ShortInt | What is the seasonal presence of the species in the area, codes listed below | Assigned by IUCN (by date/time stamp?) |
COMPILER | String | Name of the individual/s or institution/s responsible for generating the polygon, if not IUCN. | Yes, with contact information (usually email address) |
YEAR | ShortInt | Year in which the polygon was mapped, compiled, or modified | Date Field |
CITATION | String | Individual/s or institution /s responsible for providing the data | Assigned by IUCN/app? |
SOURCE | String | Source of distribution range given. | Yes (app name?) |
DIST_COMM | String | Distribution comments that refer directly to the polygon. | Optional |
ISLAND | String | Name of the island the polygon is on | Bay system or other geography? |
SUBSPECIES | String | Epithet | Optional |
SUBPOP | String | Epithet | Optional |
TAX_COMM | String | Taxonomic comments that refer directly to the polygon. Includes notes on polygons pertaining to subspecies or subpopulations. | Assigned by IUCN |
LEGEND | String | Code containing the combinations of the presence, origin and seasonality fields determining how the map will be displayed on The IUCN Red List website. | Assigned by IUCN |
Coded Domain Values for Presence (
Code | Presence |
1 | Extant |
2 | Probably Extant (discontinued) |
3 | Possibly Extant |
4 | Possibly Extinct |
5 | Extinct (post 1500) |
6 | Presence Uncertain |
Coded Domain Values for Origin (
Code | Origin |
1 | Native |
2 | Reintroduced |
3 | Introduced |
4 | Vagrant |
5 | Origin Uncertain |
Coded Domain Values for Seasonality.
Code | Seasonality |
1 | Resident |
2 | Breeding Season |
3 | Non-breeding Season |
4 | Passage |
5 | Seasonal Occurrence Uncertain |
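The coded domain values in the three tables above can be used directly to validate the PRESENCE, ORIGIN and SEASONAL fields of a submitted record. The following is a minimal sketch; the dictionaries copy the codes from the tables, while the function name and the wording of the messages are hypothetical.

```python
from typing import Dict, List

PRESENCE_CODES = {
    1: "Extant", 2: "Probably Extant (discontinued)", 3: "Possibly Extant",
    4: "Possibly Extinct", 5: "Extinct (post 1500)", 6: "Presence Uncertain",
}
ORIGIN_CODES = {
    1: "Native", 2: "Reintroduced", 3: "Introduced",
    4: "Vagrant", 5: "Origin Uncertain",
}
SEASONAL_CODES = {
    1: "Resident", 2: "Breeding Season", 3: "Non-breeding Season",
    4: "Passage", 5: "Seasonal Occurrence Uncertain",
}
DOMAINS: Dict[str, Dict[int, str]] = {
    "PRESENCE": PRESENCE_CODES,
    "ORIGIN": ORIGIN_CODES,
    "SEASONAL": SEASONAL_CODES,
}


def check_coded_fields(record: Dict[str, object]) -> List[str]:
    """Return one message per coded field that is missing or outside its domain."""
    problems: List[str] = []
    for field, domain in DOMAINS.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif value not in domain:
            problems.append(f"{field}: {value!r} is not a valid code")
    return problems


good = check_coded_fields({"PRESENCE": 1, "ORIGIN": 1, "SEASONAL": 1})
bad = check_coded_fields({"PRESENCE": 7, "ORIGIN": 3})
```

A crowdsourcing app could apply the same lookup in reverse, presenting the labels in a drop-down list and storing only the numeric code.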