TaxaGloss - A Glossary and Translation Tool for Biodiversity Studies

Abstract

Background
Correctly identifying organisms is key to most biological research, and is especially critical in areas of biodiversity and conservation. Yet it remains one of the greatest challenges when studying all but the few well-established model systems. The challenge is due in part to the fact that most species have yet to be described, to vanishing taxonomic expertise, and to the relative inaccessibility of taxonomic information. Furthermore, identification keys and other taxonomic resources are based on complex, taxon-specific vocabularies used to describe important morphological characters. Using these resources is made more difficult because taxonomic documentation of the world's biodiversity is an international endeavour, and keys and field guides are not always available in the practitioner's native language.

New information
To address this challenge, we have developed a publicly available online illustrated multilingual glossary and translation tool for technical taxonomic terms using the Symbiota Software Project biodiversity platform. Illustrations, photographs and translations have been sourced from the global community of taxonomists working with marine invertebrates and seaweeds. These can be used as single-language illustrated glossaries or to make customized translation tables. The glossary has been launched with terms and illustrations of seaweeds, tunicates, sponges, hydrozoans, sea anemones, and nemerteans, and already includes translations into seven languages for some groups. Additional translations and development of terms for more taxa are underway, but the ultimate utility of this tool depends on active participation of the international taxonomic community.


Introduction
The correct identification of organisms is vital to all fields in biology. Species identification is a particular challenge for those working in biodiversity science, where researchers and conservation practitioners often encounter a broad diversity of species from a variety of taxa. These often include rare, poorly known and undescribed organisms as well as those that are outside the researcher's area of expertise. In these cases workers must draw on scattered taxonomic keys, field guides, and primary taxonomic literature to assist with identifications. This literature uses complex, taxon-specific vocabularies, and is inaccessible to those who are not familiar with the technical terms. For many groups, glossaries explaining these terms are difficult to find and are commonly published in journals with limited circulation. Few glossaries are available in a searchable online format and they are seldom illustrated. In addition, researchers must often consult species descriptions or taxonomic monographs in multiple languages, but glossaries are available in only a few languages, mainly English, French and German.
Biologists working with ascidians provide an example of the scope of this linguistic challenge. The 780 publications available in the source list of the Ascidiacea World Database as of October 2015 (Shenkar et al. 2015) are written in 12 languages. While slightly more than half have been published in English, a significant number are in German or French (Fig. 1a). The primary tunicate taxonomists working and training students today are based in Brazil, Israel, Spain, Japan and the United States. The community of students and taxonomists in training is even more linguistically diverse. Four courses in tunicate taxonomy held between 2006 and 2014 at the Smithsonian Tropical Research Institute's Bocas del Toro Research Station hosted students from 21 countries including native speakers of eight languages (Fig. 1b). Expert instructors participating in the same four courses came from five countries with four native languages. Similar examples could be drawn from a number of other taxa.
The diversity of languages in use in the taxonomic literature, as well as among the current practitioners and students of a taxon, means that translations between a large number of language pairs are needed. To a large extent Google Translate can be used to obtain translations of background text in scientific publications between most pairs of languages, and taxon names are given in multiple languages in the World Register of Marine Species (Costello et al. 2013). However, species descriptions and keys, which are largely based on morphology and involve a number of highly specialized taxon-specific terms, generally fail to translate adequately with such online tools. The idiosyncratic nature of the taxonomic literature means that in some cases researchers may have to translate technical terms by going through a third language, if they can find any translated glossaries. To aid in this process we developed TaxaGloss, a multilingual illustrated glossary and translation tool. We hope that it will increase access to definitions of morphological terms and aid taxonomists, biodiversity researchers, students and others in their understanding of the taxonomic literature.
[Fig. 1a. Publications by language: English 408; German 134; French 181; Italian 16; Russian 10; Spanish 4; Portuguese 1; Latin 10; Norwegian 1; Danish 10; Japanese 1; Swedish 1.]

Design and Implementation
TaxaGloss has been designed as a PHP/JavaScript web application integrated into the Symbiota software platform (Gries et al. 2014). Symbiota follows the open-source paradigm (Gries et al. 2014), which allows the application to be implemented easily within any Symbiota portal. TaxaGloss is currently implemented as part of the Marine Life of Panama Portal. Data consisting of terms and definitions in any language, images, citations and links to formal ontologies can be entered manually or batch processed. Data can be used to produce output for a single term, a single-language illustrated glossary, or a custom translation table (Figs 2, 3, 4). Single-language glossaries contain the definition and an optional illustration for every term pertaining to a selected taxon, as well as citations and acknowledgements (Fig. 3). Custom translation tables, with up to four data fields, can be created by selecting terms and/or definitions in multiple languages (Fig. 4).
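The two tabular outputs described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the TaxaGloss or Symbiota implementation, which is a PHP/JavaScript application. The `Term` record, the function names, the two-letter language codes and the sample entries are all assumptions made for illustration; only the limit of four data fields per translation table comes from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

MAX_FIELDS = 4  # the text states that translation tables hold up to four data fields


@dataclass
class Term:
    """Hypothetical record for one glossary concept (not the Symbiota schema)."""
    taxon: str
    terms: dict[str, str]                                        # language code -> term
    definitions: dict[str, str] = field(default_factory=dict)    # language code -> definition
    image: Optional[str] = None                                  # optional illustration


def single_language_glossary(entries, taxon, language):
    """Return (term, definition, image) rows for one taxon in one language."""
    rows = [
        (e.terms[language], e.definitions.get(language, ""), e.image)
        for e in entries
        if e.taxon == taxon and language in e.terms
    ]
    return sorted(rows, key=lambda row: row[0].casefold())


def translation_table(entries, taxon, columns):
    """Build a table from 1 to MAX_FIELDS columns.

    Each column is a (kind, language) pair, where kind is "term" or "definition".
    Missing translations are left as empty strings.
    """
    if not 1 <= len(columns) <= MAX_FIELDS:
        raise ValueError(f"choose between 1 and {MAX_FIELDS} columns")
    for kind, _ in columns:
        if kind not in ("term", "definition"):
            raise ValueError(f"unknown column kind: {kind!r}")
    rows = []
    for e in entries:
        if e.taxon != taxon:
            continue
        row = []
        for kind, lang in columns:
            source = e.terms if kind == "term" else e.definitions
            row.append(source.get(lang, ""))
        rows.append(tuple(row))
    return sorted(rows, key=lambda row: row[0].casefold())


# Illustrative sample data only; these entries are not copied from TaxaGloss.
ENTRIES = [
    Term("Tunicata", {"en": "tunic", "es": "túnica"},
         {"en": "Outer covering of the body."}),
    Term("Tunicata", {"en": "siphon", "es": "sifón"},
         {"en": "Tubular opening through which water enters or leaves."}),
    Term("Nemertea", {"en": "proboscis"}),
]

if __name__ == "__main__":
    for row in translation_table(ENTRIES, "Tunicata", [("term", "en"), ("term", "es")]):
        print(" | ".join(row))
```

Keeping terms and definitions in per-language dictionaries means a new translation only adds a key to an existing record, and any pair of languages can be placed side by side without a dedicated lookup table for each language pair.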
TaxaGloss currently includes glossaries for six marine taxa: macroalgae (divisions Rhodophyta and Chlorophyta, and class Phaeophyceae), sponges (phylum Porifera), hydroids (phylum Cnidaria, class Hydrozoa), sea anemones (phylum Cnidaria, class Anthozoa, order Actiniaria), nemerteans (phylum Nemertea), and tunicates (phylum Tunicata, class Ascidiacea). These foundation taxa were selected because they are the focal groups in the BocasARTS project and they have been or will be subjects of taxonomy training courses as part of the Smithsonian Tropical Research Institute's (STRI) Training in Tropical Taxonomy program. Each glossary began with a core list of terms and definitions compiled by the taxon editor for each group. These lists include original content as well as terms or definitions drawn from previously published lists of terms (Table 1) and online glossaries. Translations of the terms and the definitions were provided by multilingual taxon editors or solicited from qualified colleagues and collaborators. Images were provided by taxon editors, colleagues and collaborators, and students participating in the taxonomy training program. Finally, a scientific illustrator was commissioned to produce bauplan illustrations and schematics for a subset of the terms. At its launch in 2016, TaxaGloss included over 1,300 English terms, 500 of which are connected to a photograph or illustration (Table 1). Depending on the taxon, each term had been translated into 2-7 languages (Table 1).
[Figure: The TaxaGloss display of a single term from the Nemertea.]
[Table 1 sources, partial: Kott 1985, Kott 1990, Kott 1992, Kott 2001, Nishikawa 1995, Ishikawa et al. 1986, Berrill 1950]

Utility, Discussion and Future Prospects
TaxaGloss offers an open-access, web-based, illustrated glossary for technical terms used in biodiversity studies and systematics. This information is otherwise scattered throughout an often difficult-to-access literature. By making this information more accessible, TaxaGloss will provide an entrée to the literature for researchers and students who do not have access to mentors with expertise in their taxon of interest. Easy access to a single glossary may also help to standardize usage of obscure terms, as the inconsistent use of terms that plagues the taxonomic literature in some groups may stem from a lack of accessible reference sources.
Although English is the current language of taxonomy and systematics, this has not always been the case. Much useful data and many original species descriptions have been published in other European languages, and for some taxa there is a significant body of literature in Russian and a number of Asian languages. Today, many of the users of taxonomic and biodiversity data are not native English speakers. This poses a challenge for working taxonomists as well as for training local taxonomists and parataxonomists, who may have expert knowledge of the organisms but may face a linguistic impediment to understanding the literature. We expect that TaxaGloss will assist such practitioners in their daily work and in training students. Primarily it provides an open-access, user-friendly illustrated list of terms, accessible in a number of languages. For many taxa, glossaries have been published in only a few languages and are available only in obscure publications that may not be easily accessible. For other taxa no such glossaries are currently available. The illustrations may be particularly valuable for helping undergraduate students understand the internal anatomy of marine invertebrates in lab classes that involve dissections. Students in R. M. Rocha's laboratory in Brazil have already found the glossary to be useful in their daily work. They access it on their smartphones and tablet computers as they work their way through dissections. Secondarily, the translation table tool will facilitate translations of the literature, and will possibly result in a more congruent and homogeneous use of taxonomic terms. We also anticipate that this electronic tool will assist scientists at any stage of their career who work on understudied groups such as those available on TaxaGloss, especially younger generations of researchers in countries with few or no remaining taxonomic experts.
Table 1. Summary of taxa and sources in the first version of TaxaGloss.