An Australian collector's authority file, 1973–2020

Abstract
Background: Biodiversity databases contain omissions and errors, including those resulting from data entry mistakes and from the use of outdated or incorrect data sources. Some of these omissions and errors can be minimised by the use of authority files, such as expert-compiled taxonomic name databases. However, there are few publicly available authority files for collecting events, and the "where", "when" and "by whom" of specimen data are typically entered into biodiversity databases separately and directly, item by item from specimen labels.
New information: Here I describe a publicly available compilation of 3829 of my own collecting events over a 48-year period in Australia. Each record contains a unique combination of date, georeferenced location and location notes.


Introduction
Most of the authority files used in biodiversity informatics are databases of taxonomic names, such as those made available by WoRMS and the Catalogue of Life. An index of botanists has been developed by Harvard University Herbaria and Libraries that includes both taxonomists and collectors, and Groom et al. (2020) have argued for online resources that would include all specimen collectors.
Publicly available authority files for collecting events are much rarer, although these do exist within individual institutions. For example, the Queensland Museum Entomology Collection can provide to interested researchers an events table with locality name, georeference, upper and lower elevation, starting and finishing date, collector name(s) and collecting method. The table covers thousands of collecting events, mainly in Queensland, dating back to the early years of the 20th century.
In the past, tables of collecting events were familiar items in the reports of major scientific expeditions. Documentation of the HMS Challenger voyage in the 1870s includes a table (Anonymous 1885) with date, location, collecting method and sea and bottom conditions for more than 500 sampling locations. The 1905 Michaelsen-Hartmeyer expedition to Western Australia reported its results with a list of numbered collection stations and dates, and a map with the numbered stations clearly marked (Michaelsen and Hartmeyer 1907).
In contrast, the Bush Blitz program in Australia has not required participants to report collecting event data from the 40 Bush Blitz expeditions undertaken since 2010. Each expedition produces a report with a narrative of field work and a summary of results, but individual collecting events might only be documented as uniform dots on maps, e.g. "Appendix B: Collecting sites" (Bush Blitz 2020). Records of collecting events are the responsibility of individual Bush Blitz participants, and specimen occurrence records are only expected to appear, eventually, in the Atlas of Living Australia (ALA), or in scientific publications. A complete, correct and consistently formatted authority file of collecting events for each Bush Blitz expedition would be useful to participants, natural history collections and ALA.
Less formal but still valuable sources of collecting event data are digitised field books, such as those made available by The Field Book Project of the Smithsonian Institution Archives. Field book data typically require interpretation and reformatting, but they are excellent raw material for authority files.
As with other look-up tables, an authority file for collecting events allows users to avoid database entry errors and to correct errors in data items already in databases. Users may also find information in the authority file that was unintentionally omitted from database entries based on specimen labels, or deliberately excluded in compliance with institutional databasing rules.
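The look-up idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two event rows are invented, the function names are hypothetical, and it assumes the authority file carries the Darwin Core terms eventDate, decimalLatitude and decimalLongitude alongside verbatimLocality. A database record whose date matches no event would be a candidate for a wrong date or a "false positive" collector attribution.

```python
from collections import defaultdict

# Invented rows for illustration only; they are NOT taken from the authority file.
AUTHORITY_EVENTS = [
    {"eventDate": "1995-03-14", "verbatimLocality": "Example Creek",
     "decimalLatitude": -41.2000, "decimalLongitude": 146.5000},
    {"eventDate": "1995-03-15", "verbatimLocality": "Example Hill",
     "decimalLatitude": -41.3000, "decimalLongitude": 146.6000},
]


def index_by_date(events):
    """Group authority-file events by their eventDate string."""
    index = defaultdict(list)
    for event in events:
        index[event["eventDate"]].append(event)
    return index


def candidate_events(record, index):
    """Return the authority-file events sharing the record's eventDate.

    An empty list flags a record that needs checking: the date may have
    been entered incorrectly, or the collector may be wrongly attributed.
    """
    return index.get(record["eventDate"], [])


index = index_by_date(AUTHORITY_EVENTS)
matched = candidate_events({"eventDate": "1995-03-14"}, index)
unmatched = candidate_events({"eventDate": "1999-01-01"}, index)
print(len(matched), len(unmatched))
```

Matching on date alone is a deliberate simplification; in practice a curator would probably also compare locality text and georeference before changing a database entry.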
I compiled the authority file presented here in Darwin Core format after noticing that museum data in ALA included occurrence records attributed to me as collector with wrong dates and georeferences. The errors presumably arose when museum staff or volunteers entered data incorrectly from labels or data files I had provided. I also found "false positive" records in which I was erroneously listed as a collector. I contacted the relevant institutions asking that these errors be corrected, but there are undoubtedly other such mistakes and omissions in ALA and in museum databases.
I hope the dataset offered here will be of value in future for checking and completing museum database entries, and for avoiding the need for individual institutions to independently georeference my collecting localities. How authority-file information can best be incorporated into collection databases is a matter for institutional data managers to decide. Users are welcome to contact me for clarifications and for additional information about particular collecting events, and minor updates to the authority file (additional events) may appear in future as new versions in Zenodo (https://doi.org/10.5281/zenodo.4990402).

General description
Purpose: Beginning in 1975 I kept written records of my plant and invertebrate collecting events in Australia. In the mid-1990s the accumulated written records, backdated to 1973, were entered in digital files, together with more recent digital records. The data items in the records were primarily the locality names or descriptors, the georeferences and the dates, i.e. the "where" and "when" of each event. For many events I also recorded landform and vegetation details in field books and diaries, but almost never the "what" of the event, namely the identity and number of specimens collected. My digital "where" and "when" records were maintained, corrected and updated until I stopped collecting in 2020.
From 1987 to 1997 I was employed as a contract collector on 27 invertebrate sampling projects (noted in the authority file), each of which generated a contractor's report. These reports are mainly unpublished "gray literature" held in institution and government agency libraries. Most of the reports contain more information about individual collecting sites than is included in the authority file.
The authority file is not quite complete. I sometimes collected particular taxa on request for specialists in Australia and elsewhere, and although I always passed on collection details with the specimens, I did not always enter those details in my digital records. These missing events are mainly from the 1980s and 1990s; if I learn more about these events I will add them to the latest version of the authority file in Zenodo (https://doi.org/10.5281/zenodo.4990402).
There are also a few events in the authority file which do not have any associated specimens. These events were failed searches for target species in millipede mapping studies. Unfortunately I cannot yet reliably identify which of these failed searches were true blanks, with nothing collected, and which were "target blanks", with non-target species collected and sent to a specialist or museum.
For convenience of use, all fieldnames in the authority file are Darwin Core terms. The only non-standard usage is in samplingProtocol, which includes data items with sampling plot size.

Geographic coverage
Description: The dataset contains three events from the Australian Capital Territory, 122 from New South Wales, 17 from Queensland, 41 from South Australia, 3332 from Tasmania and 314 from Victoria. There are no events from the Northern Territory, Western Australia or the Australian island territories. All locations are terrestrial habitats.
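As a quick consistency check, the state and territory counts above can be summed and compared with the 3829 events stated in the abstract. The short Python sketch below uses only the figures given in the text; the variable names are illustrative.

```python
# Event counts per state or territory, as given in the text above.
EVENTS_BY_STATE = {
    "Australian Capital Territory": 3,
    "New South Wales": 122,
    "Queensland": 17,
    "South Australia": 41,
    "Tasmania": 3332,
    "Victoria": 314,
}

total_events = sum(EVENTS_BY_STATE.values())
tasmania_share = EVENTS_BY_STATE["Tasmania"] / total_events

print(total_events)             # 3829
print(f"{tasmania_share:.0%}")  # 87%
```

The total agrees with the abstract, and roughly 87% of the events are from Tasmania.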

Additional information
Location text. The verbatimLocality entries are the location text strings I used in my field notes and in most of my publications. The text strings may differ in detail from those on hand-written or printed specimen labels I prepared.
Spatial data. From the 1970s through the 2000s I used paper maps to estimate locations, and beginning in 2001 I also used a handheld GPS unit and online digital maps. In all cases I later checked the estimated locations using better spatial data resources, e.g. Google Earth or the online mappers provided by Australian state governments. These checks sometimes resulted in small changes in georeferences or spatial uncertainties. For practical reasons I did not provide updated spatial data to the museums holding original specimen labels, and label locations may therefore differ slightly from locations in the authority file. The originally recorded format for spatial data also changed during the dataset period, from UTM coordinates to latitude/longitude, and from the AGD66 datum to GDA94 (equivalent to WGS84 during the sampling period).
Spatial uncertainty. The minimum value in the coordinateUncertaintyInMeters field is 25 (apart from eight locations measured along a transect), i.e. the collecting site is within a circle with diameter 50 m centered on the indicated coordinates. This minimum is meant to cover both the area searched for specimens and the GPS uncertainty, which can be considerable in dense forest locations in hilly terrain in Tasmania, where I did most of my collecting. Larger uncertainty values in this field mean that I collected over a larger area or that the scale of the map used to locate collecting sites did not allow for more exact georeferencing.
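One way a user might apply the uncertainty radius is to test whether a georeference transcribed from a specimen label falls within coordinateUncertaintyInMeters of the authority-file location. The following Python sketch is a hedged illustration, not a method used in compiling the dataset: it assumes the file provides decimalLatitude and decimalLongitude, it uses the haversine great-circle formula on a spherical Earth (adequate at these distances), and the coordinates and function names are invented.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_uncertainty(label_lat, label_lon, event):
    """True if the label location lies inside the event's uncertainty circle."""
    distance = haversine_m(label_lat, label_lon,
                           event["decimalLatitude"], event["decimalLongitude"])
    return distance <= event["coordinateUncertaintyInMeters"]


# Invented event with the minimum uncertainty of 25 m described above.
event = {"decimalLatitude": -41.2000, "decimalLongitude": 146.5000,
         "coordinateUncertaintyInMeters": 25}

near = within_uncertainty(-41.2001, 146.5000, event)  # about 11 m away
far = within_uncertainty(-41.2010, 146.5000, event)   # about 111 m away
print(near, far)
```

A label location that fails this test is not necessarily wrong, since label georeferences may predate the later checks described under "Spatial data", but it would be worth a closer look.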