A dataset for examining trends in publication of new Australian insects

Abstract
Australian Faunal Directory data were used to create a new, publicly available dataset, nai50, which lists 18318 species and subspecies names for Australian insects described in the period 1961–2010, together with associated publishing data. The number of taxonomic publications introducing the new names varied little around a long-term average of 70 per year, with ca 420 new names published per year during the 30-year period 1981–2010. Within this stable pattern there were steady increases in multi-authored and 'Smith in Jones and Smith' names, and a decline in publication of names in entomology journals and books. For taxonomic works published in Australia, a publications peak around 1990 reflected increases in museum, scientific society and government agency publishing, but a subsequent decline is largely explained by a steep drop in the number of papers on insect taxonomy published by Australia's national science agency, CSIRO.


Introduction
This paper examines trends in publication of new Australian insects over the past 50 years. It is based on 'nai50' (Suppl. material 1), a dataset compiled by the author and freely available at Zenodo (https://zenodo.org/record/10481; http://dx.doi.org/10.5281/zenodo.10481) and as a supplement to this paper. The raw materials for nai50 were names lists for insects from the Australian Faunal Directory (AFD). The AFD is an online resource compiled by taxon specialists, and is maintained and updated by the Department of the Environment, Australia (http://www.environment.gov.au/biodiversity/abrs/online-resources/fauna/afd/home).
The nai50 dataset is based on a snapshot of AFD insect data in early 2014, and will not be updated. It should therefore not be used as a substitute for the AFD as a source of taxonomic information on Australian insects. Another reason not to use nai50 as a taxonomic resource is that the AFD contains errors and formatting inconsistencies (see below). While I tried to correct as many of these as possible in nai50, I may have inadvertently introduced new errors.

AFD downloads and initial processing
In early 2014, the AFD allowed export of names lists for any higher taxon, e.g. an insect order or superfamily, as comma-separated value (CSV) files. A download limit prevented export of a names list for all insects, so I chose the largest higher taxa for which export was possible. I downloaded most of the lists in March 2014, more in April 2014 while AFD staff addressed an access problem on the AFD Web portal, and the remainder in June 2014.
The CSV files were merged, converted to a tab-separated value (TSV) file and processed as described below. Processing was done on the command line using GNU/Linux utilities and the AWK programming language, or in Gnumeric spreadsheet software. The merged AFD names lists were filtered to: contain replacement names for species or subspecies described in the period 1961–2010; valid names with minor emendations (e.g. a published change of species epithet to correct lack of agreement in gender with genus) were left as emended. 'Synonym of...' entries contain valid names for species and subspecies described in the period 1961–2010 and subsequently placed in synonymy. In both cases (replacement and synonymy) the information in nai50's Name_type field comes from the relevant AFD names list.
• Rank was modified to reflect the rank of the taxon as described; e.g. if an author described a subspecies and the name was later synonymised with a species, the Rank field in nai50 contains 'Subspecies', not 'Species'.
• Citation (PUB_PUB_FORMATTED in AFD) was modified to remove AFD markup, correct errors and make formatting more consistent for unique publications (i.e., I made the text string for a particular publication the same for all entries where it appeared).
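The merge, year filter and CSV-to-TSV conversion described above can be illustrated with a minimal Python sketch. It is not the AWK and GNU/Linux workflow actually used for nai50. The YEAR column is an AFD field named in this paper; the NAME column and the sample rows are invented for illustration, and only the 1961–2010 year filter is shown.

```python
import csv
import io

def merge_and_filter(csv_texts, first_year=1961, last_year=2010):
    """Merge several CSV exports and keep rows whose YEAR is in range."""
    kept = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            year = row.get("YEAR", "").strip()
            # Rows with a missing or non-numeric year are dropped here;
            # in practice they would need manual checking instead.
            if year.isdigit() and first_year <= int(year) <= last_year:
                kept.append(row)
    return kept

def to_tsv(rows, fieldnames):
    """Write the merged rows as tab-separated text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Invented sample exports standing in for two AFD names lists.
part_a = "NAME,YEAR\nAus bus,1975\nAus cus,1950\n"
part_b = "NAME,YEAR\nDus eus,2010\nDus fus,2012\n"

rows = merge_and_filter([part_a, part_b])
tsv = to_tsv(rows, ["NAME", "YEAR"])
print(tsv)
```

Using a CSV parser rather than splitting on commas matters here because citation and author fields routinely contain commas inside quoted values.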

Further processing
In the following list of fields in nai50, the fields I added are marked with an asterisk and described in bracketed notes:
*Pub_country (the country in which the work describing the species or subspecies was published)
*Pub_ento (yes/no, for whether the work describing the species or subspecies was published in an entomological journal or book)
*J_title (journal title from Citation field, or 'na' for Book and Chapter in Book)
*A_publr_class (for works published in Australia, whether the publisher was a government agency, a museum, a scientific society or 'other', e.g. a private individual; 'na' for works not published in Australia)
*A_publr (for works published in Australia, the name of the publisher, e.g. CSIRO; 'na' for works not published in Australia)

Data cleaning
I used programmatic checks on nai50 to find AFD omissions, errors and formatting inconsistencies, which were numerous. Omissions and errors were corrected with the aid of original publications and online resources compiled by taxon specialists. I am not a taxon specialist for any insect group, so the taxonomic errors I corrected were mainly those detected by programmatic checks, such as comparing the AFD fields YEAR and PUB_PUB_YEAR, and AUTHOR and PUB_PUB_AUTHOR for all names. For checking the link between name and citation, the BioNames project (http://bionames.org/) was particularly helpful. Bibliographic data on publications came from Worldcat (https://www.worldcat.org/) and the National Library of Australia (http://trove.nla.gov.au/).
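A field-comparison check of the kind mentioned above might look like the following Python sketch. It is an illustration, not the commands used for nai50. The four compared field names come from the AFD as cited in this paper; the NAME column and the sample rows are invented, and exact string equality is a simplifying assumption, since real author strings can differ in formatting without being wrong.

```python
import csv
import io

def find_mismatches(tsv_text,
                    pairs=(("YEAR", "PUB_PUB_YEAR"),
                           ("AUTHOR", "PUB_PUB_AUTHOR"))):
    """Return (line number, field, value, paired value) for each disagreement."""
    problems = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        for left, right in pairs:
            if row[left].strip() != row[right].strip():
                problems.append((line_no, left, row[left], row[right]))
    return problems

# Invented sample: the second record has conflicting years.
sample = (
    "NAME\tYEAR\tPUB_PUB_YEAR\tAUTHOR\tPUB_PUB_AUTHOR\n"
    "Aus bus\t1975\t1975\tSmith\tSmith\n"
    "Aus cus\t1981\t1982\tJones\tJones\n"
)

flagged = find_mismatches(sample)
for item in flagged:
    print(item)
```

A check like this only flags candidate rows; deciding which of the two values is correct still requires the original publication.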

Analysis
Data summaries and working tables were generated from nai50 using AWK commands. For charting and consistency checks, working tables were imported into Gnumeric spreadsheet software. The statistics in the analysis are purely descriptive.

Overview and summary statistics
The dataset 'nai50' is a 6.3 MB plain-text, tab-separated table with 22 columns and 18319 rows (including header row). It contains the names of 17905 species (97.7% of all names) and 413 subspecies (2.3%) of Australian insects described in the period 1961-2010. The 18318 total may include some introduced insects and is likely to omit some recently described species and subspecies, as well as species and subspecies in data gaps in the AFD (see 'AFD limitations' above). However, I regard the total as large enough for the primary purpose of this study, which is to identify trends in publication.
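The species and subspecies shares quoted above can be reproduced from a tally of the Rank field. The Python sketch below is illustrative only: it builds the list of ranks from the totals reported in this paper rather than reading nai50 itself.

```python
from collections import Counter

def rank_shares(ranks):
    """Map each rank to (count, percentage of all names to one decimal)."""
    counts = Counter(ranks)
    total = sum(counts.values())
    return {rank: (n, round(100 * n / total, 1)) for rank, n in counts.items()}

# Rank values rebuilt from the totals reported in the text.
ranks = ["Species"] * 17905 + ["Subspecies"] * 413
shares = rank_shares(ranks)
print(shares)
```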
The 18318 species and subspecies in nai50 were described in 3628 publications by ca 1460 taxonomic authors. The latter number is tentative because it is not always clear from a citation alone whether 'Smith, A.' is the same author as 'Smith, Publications in the USA, UK, Germany and New Zealand contributed more than half of the remaining taxonomic works and names (Fig. 1).
The AFD recorded 523 of the 17905 species names (2.9%) and 68 of the 413 subspecies names (16.5%) as synonymised. Synonymised names have been included in the trend analysis below.
The 18318 species and subspecies were in 26 orders, with Coleoptera, Diptera, Hymenoptera and Hemiptera contributing more than three-quarters of all names (Fig. 2).
Publications were almost entirely 'order-loyal'; only four of the 3628 works (naming 29 species or subspecies) included new names from more than one insect order.

Trends in authorship
A single publication can contain names with different numbers of taxonomic authors, e.g. 'Smith in Jones & Smith, 1998' for one name and 'Jones & Smith, 1998' for another. The following trends count publications in a year, but nine of the 3628 publications have been double-counted for the reason just mentioned.
There was a marked and steady decline in the proportion of publications with single-authored names (Fig. 3), from ca 90% of publications in the 1960s to about half of all publications today. The number of publications introducing names with two taxonomic authors increased steadily over the 50-year sampling period, with an increase in three-author names starting in the 1990s (Fig. 4). A six-author name was published in 1998, and a seven-author name in 2005.
Having more taxonomic authors, however, did not lead to correspondingly greater taxonomic productivity, i.e. more new names per publication, and single taxonomic authors consistently introduced the majority of new Australian species and subspecies (Table 1). Another strong trend in authorship was a steady increase in the number of publications containing names with 'A in B' authorship, e.g. 'Smith in Jones & Smith, 1998' (Fig. 5). The first 'A in B' authorship in nai50 was in 1974.

Table 1.
Mean new names/publication by decade in nai50 with 1, 2 and 3 taxonomic authors (number of new names).
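The authorship forms discussed in this section can be classified mechanically. The following Python sketch counts the taxonomic authors of a name and flags 'A in B' authorship. It assumes surname-only strings in the style of the examples above ('Smith in Jones & Smith, 1998'); strings with initials after a comma, such as 'Smith, A.', would need extra handling, and the function name is hypothetical.

```python
import re

def parse_authorship(authorship):
    """Return (number of taxonomic authors, True if 'A in B' form)."""
    # Drop a trailing year such as ", 1998".
    names = re.sub(r",\s*\d{4}\s*$", "", authorship.strip())
    a_in_b = " in " in names
    # In 'A in B', only the part before ' in ' names the taxonomic authors.
    naming_part = names.split(" in ", 1)[0]
    authors = [a for a in re.split(r"\s*(?:,|&)\s*", naming_part) if a.strip()]
    return len(authors), a_in_b

for example in ("Smith in Jones & Smith, 1998",
                "Jones & Smith, 1998",
                "Jones, Smith & Brown, 2005"):
    print(example, parse_authorship(example))
```

Because one publication can yield both forms, tallying publications by author count will count such a publication once per form, which is the double-counting noted above.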

Trends in publishing
Following a strong increase at the beginning of the 1980s, the number of new species and subspecies varied around an average of ca 420 per year for the last 30 years of the sampling period, with an average of 430 in the 1980s, 427 in the 1990s and 410 in the 2000s (Fig. 6). The publication of new names in journals was fairly steady over the 50-year sampling period (Fig. 7). New names also appeared fairly steadily in books, but books first became significant outlets for new names only in the 1980s (Fig. 7). Beginning in the mid-1980s, the proportion of publications with new Australian insect names which were devoted to entomology abruptly declined, reaching a 50-year low of about one-third of publications in 2009 (Fig. 8).
There was a small increase in the average number of new names per publication over the 50 years (Table 2). Average new names per publication was greater in non-Australian publications than in Australian publications in nine of the 50 years, and five of the nine were in the 2000s. The proportion of publications containing only one new Australian insect name declined, but not dramatically, from ca 55% in the 1960s to ca 45% in the 2000s (Fig. 9). Note that this is not the same as the proportion of publications containing a single new name; a publication reviewing a regional fauna, for example, might contain new names for many species or subspecies, only one of which is Australian.

Table 2.
Mean new names/publication by decade in nai50 for Australian, non-Australian and all publications (number of publications).

Figure 9.
Percentage of publications by year in nai50 containing only one new species or subspecies name (Suppl. material 10).
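The percentage plotted in Fig. 9 is a count of publications, not of names. As a minimal Python sketch (not the AWK commands used for the analysis), it can be derived by grouping names on the Citation field, which holds one consistent text string per publication. The sample citations are invented placeholders.

```python
from collections import Counter

def percent_single_name(citations):
    """Percentage of publications that contribute exactly one name."""
    names_per_publication = Counter(citations)
    singles = sum(1 for n in names_per_publication.values() if n == 1)
    return 100 * singles / len(names_per_publication)

# One entry per name; each string identifies the describing publication.
citations_in_year = ["Pub A", "Pub A", "Pub B", "Pub C", "Pub D", "Pub D"]
print(percent_single_name(citations_in_year))
```

This grouping only works if every entry for a publication carries an identical citation string, which is why the Citation field needed to be made consistent first.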

Trends in Australian publishing
The number of publications containing new Australian insect names varied surprisingly little around a long-term average of ca 70 publications per year (Fig. 10). Higher values around 1990 and lower values more recently are largely explained by a distinct peak in Australian publications (Fig. 11). At the peak, a little more than half the taxonomic works containing new names were Australian publications, while the proportion had dropped to about a third by the end of the 2000s. To explore this trend more closely, I categorised Australian publications by publisher and publisher class (see 'Further processing', above). Overall statistics are given in Table 3.
Five-year moving averages for the three main Australian publisher classes (Fig. 12) show that the 1990 peak in Australian publications reflected broadly synchronous peaks in agency, museum and society publishing. While museum and society publishing later declined only slightly from their peaks, agency publishing dropped precipitously (Fig. 12). All but two of the 438 agency publications from 1961–2010 were produced by CSIRO, the Australian government's national science agency, and the 436 CSIRO publications contained 5366 new species and subspecies, or 29% of the total for the 50-year period.
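A five-year moving average smooths annual publication counts so that broad peaks stand out. The Python sketch below assumes a centred window (the year itself plus two years on either side), which this paper does not specify; a trailing window would shift the curve. The annual counts are invented.

```python
def moving_average(values, window=5):
    """Centred moving average; positions without a full window get None."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            smoothed.append(None)  # not enough neighbours on one side
        else:
            chunk = values[i - half:i + half + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed

# Invented annual publication counts for seven consecutive years.
counts = [10, 20, 30, 40, 50, 60, 70]
print(moving_average(counts))
```

With a centred five-year window, the first two and last two years of a series have no smoothed value.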
Tracking publications (Fig. 13) and names (Fig. 14) in three of the CSIRO journals reveals an interesting pattern. Australian Journal of Zoology published an increasing number of    (Figs 13, 14).

Discussion
The nai50 dataset provides an objective basis for identifying long-term research and publishing trends in Australian entomology, and is readily extendable. Since the dataset is now in the public domain, interested users are welcome to keep it up to date, extend it backwards in time and add new fields, such as author age and affiliation at time of publication.
Users are also welcome to search for and correct errors, which are undoubtedly still present in nai50. For every hour exploring and analysing nai50, I spent several hours detecting, investigating and correcting omissions, errors and formatting inconsistencies in AFD data. AFD data validation, both at the time of data entry by specialist compilers and later by AFD staff, could usefully be extended and improved. In correspondence with the author, AFD staff have said they are aware of the data cleaning issues in AFD and hope to address them more effectively when additional resources are made available to the project.