Biodiversity Data Journal
Software Description
Corresponding author:
Academic editor: Viktor Senderov
Received: 16 Jun 2016 | Accepted: 29 Jul 2016 | Published: 01 Aug 2016
© 2016 Theresa Dellinger, Victoria Wong, Paul Marek
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Dellinger T, Wong V, Marek P (2016) Makelabels: a Bash script for generating data matrix codes for collection management. Biodiversity Data Journal 4: e9583. https://doi.org/10.3897/BDJ.4.e9583
Digitization of natural history collections allows easy access and reuse of the invaluable biodiversity data held within a collection by providing access to specimen-level data through the Internet. Each digitized specimen in a database requires a unique catalog number to distinguish it from the many other biologically unique specimens within the collection. However, few open source barcode generators are available, and even fewer support the mass production of barcode labels that natural history collections require. We developed a low-cost, open source solution for generating data matrix barcodes with unique catalog numbers for use in the Virginia Tech Insect Collection.
Here we describe the makelabels script, which uses the open source Unix packages libdmtx and ImageMagick to generate unique specimen labels containing both a human-readable catalog number and a machine-readable data matrix barcode. The mass production of labels and use of both types of catalog symbology provides flexibility in specimen management and increased efficiency in digitization and specimen processing workflows.
collections management, curation, data matrix barcode, digitization, entomology collection, natural history collection, open source
Natural history collections (NHCs) are invaluable repositories of biological information documenting the biodiversity of our planet (
One force driving the acceleration of interest in NHCs is enhanced accessibility. In the past, using a biological collection has been limited to: physically visiting a museum’s holdings; requesting a loan of material; or less often, obtaining a spreadsheet or email detailing the label data associated with a specimen. These traditional means of access created an unintentional bottleneck where only a limited number of people were privy to the specimens and their associated data. Even if a museum had developed a catalog, frequently the included information was not very extensive and often limited to a species list that lacked specimen-level data. In contrast, digitization of an NHC serves specimen photographs, a database of associated label data (collection localities, coordinates, dates of collection, habitat notes, etc.), and georeferenced distribution maps online, where these data can be read, retrieved, and reused by anyone with an Internet connection. Complementary data such as field journal entries made by collectors or audio recordings of species’ songs can be added to the entry for a specimen. This approach places a wealth of previously inaccessible “dark” information into the hands of all interested parties (
Researchers are using these digitized databases to answer a wide range of important questions addressing: ecological processes and the biodiversity of species (e.g.,
A useful model of the digitization process has been described by
The Virginia Tech Insect Collection (VTEC) has begun digitizing its specimen data online to serve a wider audience. A member of the Symbiota Collections of Arthropods Network (SCAN,
However, like many NHCs, VTEC faces limited funding that impacts basic curation and management of the collection. To conserve funds, and to share an open source resource with other collections, we wrote our own script to generate labels with unique specimen identifiers as a low-cost alternative to commercially available software or costly pre-printed labels. The VTEC uses 9-digit locally unique identifiers printed on labels and 36-character universally unique identifiers (UUIDs) automatically assigned and stored in SCAN for specimen management. Here we describe this script, which uses open source libraries and a graphics package to create specimen labels with a unique alphanumeric reference number that is also encoded in a data matrix barcode.
We developed a Bash shell script, makelabels, which uses readily available open source software packages to create labels for specimens in the VTEC. Each label consists of a locally unique number and the corresponding data matrix barcode (Fig.
We used runnable programs from the two open source packages, libdmtx 0.7.4 (
When executed, makelabels first creates a blank image to contain a page of labels and adds the lines that make up the grid. The dmtxwrite program from libdmtx then generates each barcode into a temporary file, and makelabels merges the barcode and the alphanumeric VTEC number into the page image using the ImageMagick composite command. Because makelabels uses individual commands and intermediate files, the label appearance is easy to modify as needed, and the approach scales to UUIDs longer than our 13-character alphanumeric code.
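The sequence described above can be sketched in Bash roughly as follows. This is a minimal illustration and not the published makelabels code: the function names, file names, page size in pixels, offsets, font size, and the sample code value are all assumptions made for the example. Only convert and composite (ImageMagick) and dmtxwrite (libdmtx utilities) are real commands. By default the sketch only prints the commands it would run (DRY_RUN=1), so it can be followed without either package installed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the page-assembly steps; not the published script.
# All names, sizes, and offsets below are illustrative assumptions.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
  # Print the command in dry-run mode; otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

new_page() {
  # usage: new_page PAGE_FILE
  # Blank white page (2550x3300 px assumes letter size at 300 dpi)
  # with one example grid line drawn across it.
  run convert -size 2550x3300 xc:white -stroke black \
      -draw "line 0,150 2550,150" "$1"
}

place_label() {
  # usage: place_label CODE X Y PAGE_FILE
  local code="$1" x="$2" y="$3" page="$4" tmp="barcode_tmp.png"
  # Step 1: dmtxwrite reads the text on stdin and writes a temporary PNG.
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "printf %s $code | dmtxwrite -o $tmp"
  else
    printf '%s' "$code" | dmtxwrite -o "$tmp"
  fi
  # Step 2: merge the barcode into the page at the cell offset.
  run composite -geometry "+${x}+${y}" "$tmp" "$page" "$page"
  # Step 3: write the human-readable code 90 px below the barcode origin.
  run convert "$page" -pointsize 20 -annotate "+${x}+$((y + 90))" "$code" "$page"
}
```

Calling new_page page1.png followed by place_label VTEC000000001 100 200 page1.png prints the four commands in order; setting DRY_RUN=0 would execute them instead, assuming both packages are installed. Keeping each step as a separate command mirrors the point made above: changing the label appearance means editing one command rather than restructuring the script.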
Including both the symbology of the barcode and the alphanumeric code on a label allows for maximum flexibility as the label can be read by a person or an optical reader. The resulting labels are easily read by a number of free barcode scanner applications available for iOS and Android operating systems, eliminating the need to purchase a commercial optical reader.
A README.txt file accompanies the Bash script online at GitHub and describes the process of using makelabels. The printable page is encoded as a PNG file with letter-sized dimensions (8.5 x 11 inches or 215.9 x 279.4 mm). While printing preferences vary according to institution and conservation protocol, we use an Epson Stylus C88 inkjet printer with black DURABrite pigment-based ink (Epson America, Inc., Long Beach, CA). We print the labels on cotton archival 32 lb. paper using the photo quality mode of the printer to maintain anti-aliasing and resolution. Labels are cut by hand, according to the grid marks on the printed label page.
This project is supported by an NSF Collections in Support of Biological Research (CSBR) award, DBI #1458045.
The script is fully documented and includes configuration parameters in README.txt. As written, a user can edit the script to change the overall size of the cell, which includes the alphanumeric VTEC code, the data matrix barcode, and the placement of the VTEC code and data matrix barcode within the cell. Font style and size can also be changed as needed. The complete path for a font is needed if the code is run in Linux.
Although the makelabels script is recommended for use on a Unix operating system, a Windows system may be used with a Unix-like command-line interface such as Cygwin (
Care should be taken when adjusting some of the configuration parameters. For example, the resolution is set to work with ImageMagick commands. Cell X and Y margins are defined in pixels to avoid handling problems with some printers, but changing the resolution will alter the pixel description and the overall appearance of the labels.
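The dependence of pixel-defined margins on resolution can be illustrated with a small conversion helper. This is a sketch under stated assumptions: the function name to_px and the 300 and 600 dpi values are hypothetical and are not taken from the makelabels configuration. Lengths are passed in tenths of a millimetre so that Bash integer arithmetic is sufficient.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a physical length to pixels at a given resolution.
# Not part of the published script; the name and units are assumptions.

to_px() {
  # usage: to_px LENGTH_IN_TENTHS_OF_MM DPI
  # 1 inch = 25.4 mm = 254 tenths of a mm; adding 127 rounds to the nearest pixel.
  echo $(( ($1 * $2 + 127) / 254 ))
}
```

For example, to_px 2159 300 gives 2550 pixels for the 215.9 mm width of a letter page, while to_px 2159 600 gives 5100. A 5 mm margin is 59 pixels at 300 dpi (to_px 50 300) but 118 pixels at 600 dpi (to_px 50 600), so a margin fixed at 59 pixels would print at roughly half its intended physical width if the resolution alone were doubled.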
When initiating the software, the user can specify the beginning and ending sequence numbers to be printed on the labels. If no end number is provided, the code will stop after generating a total of 26 labels (start number plus 25 additional labels). A printer name can be specified as well for immediate printing. Otherwise, labels will be generated and saved as PNG files (e.g., page1.png, page2.png, …). Because the user specifies the beginning and ending number for each batch of labels, the user must keep track of which labels have been created previously to avoid duplicates.
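The start and end handling described above might look like the following sketch. It is not the published script: the function names, the "VTEC" plus nine zero-padded digits format, and the labels-per-page value used in the usage note are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the start/end handling; not the published script.

label_range() {
  # usage: label_range START [END]
  # Without END, print START plus 25 more codes (26 in total).
  local start="$1"
  local end="${2:-$((start + 25))}"
  local n="$start"
  while [ "$n" -le "$end" ]; do
    # "VTEC" + 9 zero-padded digits is an assumed format for illustration.
    printf 'VTEC%09d\n' "$n"
    n=$((n + 1))
  done
}

page_file() {
  # usage: page_file LABEL_INDEX LABELS_PER_PAGE   (index counts from 0)
  # Name of the PNG that the label at this index would land on.
  echo "page$(( $1 / $2 + 1 )).png"
}
```

Here label_range 1 prints 26 codes ending in VTEC000000026, and label_range 5 7 prints three. With an assumed 40 labels per page, page_file 0 40 returns page1.png and page_file 40 40 returns page2.png. Start numbers should be given without leading zeros, because Bash arithmetic would otherwise interpret them as octal. Nothing in this sketch records which ranges were already printed, which matches the caveat above that the user must track previously generated labels.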
The disadvantage of makelabels is that the file operations are assembled into a Bash script that is relatively inefficient and slow when compared to compiled programs. Now that the steps to use dmtxwrite and the commands in ImageMagick are known, a compiled program could be written using the libdmtx library to create a page of labels in memory and without using intermediate files in each run.
While we chose to write labels encoding only the unique VTEC specimen number, the makelabels script can be modified to include more information, such as a URL to a database containing a specimen record; a data matrix barcode can hold up to 2,335 alphanumeric characters (approximately two-thirds of a page of text in 12-point font). Additionally, some barcode scanning applications will scan, track, and automatically export coded information into a file. This would obviate the need to manually enter each unique code into a database to pull the associated information for that item.
The concept of a natural history collection has expanded over time from whole, preserved specimens to include other biological collections, such as living stocks and cultures, and preserved tissues. Our labels can be used with these types of collections, and also as part of collection management in geology, paleontology, archeology, and related fields. For our purposes, the data matrix barcoded labels created by makelabels are an integral part of digitizing the VTEC collection (Fig.
We thank an anonymous software engineer for assistance in developing the makelabels script and reviewing the manuscript, and Patricia Shorter for her assistance with proofing different barcode symbologies, the preliminary generation of data matrix barcodes for this project, and testing the final codes with barcode readers. We greatly appreciate the helpful suggestions of two reviewers that improved an earlier version of this paper.