A visual identification key utilizing both gestalt and analytic approaches to identification of Carices present in North America (Plantae, Cyperaceae)

Abstract
Images are a critical part of the identification process because they enable direct, immediate and relatively unmediated comparisons between a specimen being identified and one or more reference specimens. The Carices Interactive Visual Identification Key (CIVIK) is a novel tool for identifying North American species of Carex, the largest vascular plant genus in North America, and of two smaller, closely related genera, Cymophyllus and Kobresia. CIVIK incorporates 1288 high-resolution tiled image sets that allow users to zoom in to view the minute structures that are at times crucial for identification in these genera. Morphological data are derived from the earlier Carex Interactive Identification Key (CIIK), which in turn used data from the Flora of North America treatments. In this new iteration, images can be viewed in a grid or histogram format, allowing multiple representations of the data. In both formats the images are fully zoomable.


Introduction
The last ten years may be remembered for the rebirth of plant taxonomy and systematics in a new guise, computational biodiversity informatics. For much of the earth, and North America in particular, botanical information that once required substantial effort to acquire is now reliably provided in seconds by such websites as the Global Biodiversity Information Facility (GBIF), Flora of North America, Missouri Botanical Garden's Tropicos, Encyclopedia of Life, United States Plants Database, and emerging regional herbarium networks. Plant biodiversity is now literally at everyone's fingertips.

State of the art plant identification systems
Traditional biological identification systems today are of two primary types: analytic and gestalt (K. Thiele, pers. comm. 2013). Two forms of analytic keys commonly used today are dichotomous and interactive matrix-based keys. Both are primarily text-based question systems that can yield static images upon the final determination. Conversely, gestalt keys use an identifiable image of the organism in question, similar to what is seen in field guides.
Analytic matrix-based keys are considered to be state of the art today (The University Of Queensland 2006) due to their ability to scale up across hundreds of taxa. Users select characters to arrive at a determination of the unknown taxon using a four-panel informational interface. The panels usually represented are 'characters available', 'characters chosen', 'entities available', and 'entities discarded'. Within this format, it is possible to insert thumbnail-sized, static images to accompany the text if the number of taxa is relatively small (< 100). When the number of taxa is higher (> 100), including the images makes the information panel too long to be usable; e.g., the Carices treated here would require copious scrolling across a panel many meters in length.
Visual keys borrow from both gestalt and analytic methods. They use character matrices to prune the image set analytically at the outset. After a few character choices, the many hundreds of small images are reduced to a manageable set of bigger images. Gestalt methods then take over as the images become larger and truly informative. This hybrid, featuring the best of both gestalt and analytic approaches, yields a novel identification method that can cater to the neophyte as well as the expert.

Carex, Kobresia, and Cymophyllus: a model for scalability
Carex is the largest vascular plant genus in North America (Ball and Reznicek 2002). With two closely related genera, Kobresia and Cymophyllus, it forms the Carices of North America; all three are members of the family Cyperaceae, commonly called sedges but often erroneously referred to as grasses. These three genera share a number of basic morphological characteristics, including linear leaves and a fruit enclosed in a bag-like structure called a perigynium. All have small flowers that lack large, colorful petals and sepals. They also share one other important characteristic: they are difficult to identify. Nevertheless, they are morphologically distinct and relatively easily recognizable as a group.

The new visual key
The data used in this project are primarily derived from an interactive identification program for Carex that has been online since 2006 at both Utah State University and Louisiana State University (http://www.herbarium.lsu.edu/keys/carex/carex.html). During this time it has been consistently revised and is currently in version 21 (Suppl. materials 3, 4). Web statistics have been tracked since 2007. The data show that numerous individuals worldwide, government agencies, students in classrooms, and participants in identification workshops have repeatedly used the keys. Many users have graciously suggested revisions and clarifications that have increased the keys' usability and performance. The key presented here reflects contributions from several individuals, innumerable field trips, and countless hours in herbaria both identifying and imaging specimens. It is only with such collaboration and effort that an image key to such a large genus can be created.

Goals
My goal in this project was to create an easy-to-use identification resource that maximizes the value of high-resolution images while enabling users to explore the distribution of morphological diversity within the genera, in effect making the images queryable. For example, it can answer questions such as: How are species with trigonous achenes geographically distributed across Canada by province or territory? How common are two-sided achenes among species with leaf blades more than 10 mm wide? These sorts of questions are easily answered in histogram mode (Fig. 4). For the first time, side-by-side image comparisons are possible across species, permitting comparative examination and discrimination among closely related members of any complex, of which there are many within the Carices. CIVIK is available at http://www.herbarium2.lsu.edu/aba/
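The kind of question posed above amounts to filtering taxa on one character state and then counting the survivors by another. The following is a minimal sketch of that idea only, not the PivotViewer implementation; the records, facet names, and the helper functions `filter_by` and `histogram` are hypothetical placeholders rather than data from the key.

```python
from collections import Counter

# Hypothetical records: placeholder taxa with made-up facet values,
# standing in for rows of a character matrix.
taxa = [
    {"name": "Taxon A", "achene": "trigonous", "provinces": ["Ontario", "Quebec"]},
    {"name": "Taxon B", "achene": "two-sided", "provinces": ["Ontario"]},
    {"name": "Taxon C", "achene": "trigonous", "provinces": ["Quebec", "Manitoba"]},
]


def filter_by(records, facet, value):
    """Keep only the records whose single-valued facet equals `value`."""
    return [r for r in records if r.get(facet) == value]


def histogram(records, multi_facet):
    """Count how many records carry each value of a multi-valued facet."""
    counts = Counter()
    for r in records:
        counts.update(r.get(multi_facet, []))
    return counts


# "How are taxa with trigonous achenes distributed by province?"
trigonous_by_province = histogram(
    filter_by(taxa, "achene", "trigonous"), "provinces"
)
for province, count in sorted(trigonous_by_province.items()):
    print(province, count)
```

In histogram mode the same grouping is drawn as columns of images rather than printed counts.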

Study area description:
This key is designed for use in North America, including Mexico. The original descriptive data were derived from Flora of North America (Ball and Reznicek 2002) and Mackenzie (1940). My images come from fieldwork focused in eastern North America, while other individuals have contributed images from other locations across North America.

Contributors
Steve Matson and Tony Reznicek each sent a DVD copy of their Carex field images. Lowell Urbatsch contributed his teaching microscopy images (http://www.herbarium.lsu.edu/keys/eee/b52.html). My images were collected from many field sites, primarily in the northeastern United States. The New York Botanical Garden Press granted the use of the plates of both North American Cariceae volumes (Mackenzie 1940). The remaining images were found on the World Wide Web (WWW), and their owners (Forest Starr, Kim Starr, Nhy Nyugen, Ann Debolt) were contacted by email to request permission for their use. The final image contributor, Robert Mohlenbrock, had made the image used here available on http://www.plants.usda.gov/ so it could be used without seeking permission.

Processing of images
To manage the large image numbers (e.g., Matson, hundreds of images; Jones, many thousands), each owner's set of images was segregated on a local drive. Predictably, across this many image contributors, naming conventions differed greatly, so significant renaming of image files was required. The basic convention used was to include the taxon name, type of image, and the author in the file name. Another issue of note was that many of these images had been prepared for delivery via the WWW and had been resized. Larger files were selected for inclusion, while those originally designed as thumbnails were not used. Rarely, older images that had been scanned from slides were cropped or otherwise manipulated with Photoshop CS3. Lastly, images often had to be rotated into an appropriate orientation.
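A renaming convention of this kind can be sketched as a small helper. The text states only that the taxon name, image type, and author appear in the file name; the underscore separator, the hyphen for internal spaces, the ordering, and the function name `build_image_name` are all assumptions made for illustration.

```python
import re


def build_image_name(taxon, image_type, author, extension="jpg"):
    """Compose a file name from taxon, image type, and author.

    Separator and ordering are assumed; the source describes only
    which three pieces of information the name contains.
    """
    parts = [taxon, image_type, author]
    # Replace runs of non-alphanumeric characters with a single hyphen
    # so the name is safe for file systems and URLs.
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "-", p.strip()).strip("-") for p in parts]
    return "_".join(cleaned) + "." + extension.lower()


# Illustrative values only.
example_name = build_image_name("Carex lurida", "perigynium", "Jones")
print(example_name)
```

Keeping the three fields in a fixed order makes it possible to recover the taxon and author from the file name later, for example when populating an 'Image by' character.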

Image sizes
Image sizes are variable and range from 40 KB to over 13 MB. Line drawings and most images by Jones are 2848 × 4288 pixels with a maximal bit depth of 24. Matson's images were more variable, as some had been prepared for web use. They range from 2592 × 3888 to 550 × 689 pixels with variable bit depths. Other contributed images are of intermediate sizes.

Imaging of Mackenzie's plates
New York Botanical Garden Press gave permission to image the plates in K. K. Mackenzie's two-volume treatment of Carices of North America (Mackenzie 1940) for use in this project. All plates were imaged in JPEG format on a traditional copy stand, using a Nikon 300D camera with a 1:1 macro lens and two halogen desk lamps for illumination. All images required batch processing in Photoshop CS3 to correct color and a minor skew defect. Additionally, to limit the total file size of the project, the images were resized from approximately three megabytes to approximately one megabyte.

Primary data via export
The dataset was derived from an export of CIIK (http://www.herbarium.lsu.edu/keys/carex/carex.html) in comma-separated values (CSV) format from LUCID 3.4 Identification Software (The University Of Queensland 2006). These data were the template for the new secondary dataset (Fig. 1). The exported data were imported into Excel 2010, and the Excel PivotViewer plug-in generated the Collection XML (CXML) version of the data (Suppl. material 1). This plug-in has since been deprecated in favor of a command-line tool, Pauthor (Microsoft 2010a, Microsoft 2010b).
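The CSV-to-CXML step described above was performed by the Excel plug-in; the following is only a rough sketch of what such a conversion involves, not the plug-in's or Pauthor's actual logic. The element and attribute names follow my understanding of the published CXML schema and should be verified against it. The column names, the sample row, and the function `csv_to_cxml` are hypothetical, and the `Img` attribute here holds a placeholder file name, whereas a real collection would reference an entry in a Deep Zoom collection.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace as I recall it from the CXML schema; treat as an assumption.
CXML_NS = "http://schemas.microsoft.com/collection/metadata/2009"


def csv_to_cxml(csv_text, name_column, image_column, collection_name):
    """Turn character-matrix rows into a CXML-like document (string facets only)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    facet_names = []
    if rows:
        facet_names = [c for c in rows[0] if c not in (name_column, image_column)]

    collection = ET.Element(
        "Collection",
        {"xmlns": CXML_NS, "Name": collection_name, "SchemaVersion": "1.0"},
    )
    # Declare each character as a filterable facet category.
    categories = ET.SubElement(collection, "FacetCategories")
    for facet in facet_names:
        ET.SubElement(categories, "FacetCategory", {"Name": facet, "Type": "String"})

    # One item per taxon row, with its character states as facets.
    items = ET.SubElement(collection, "Items")
    for index, row in enumerate(rows):
        item = ET.SubElement(
            items,
            "Item",
            {"Id": str(index), "Name": row[name_column], "Img": row[image_column]},
        )
        facets = ET.SubElement(item, "Facets")
        for facet in facet_names:
            if row[facet]:  # skip empty cells
                facet_el = ET.SubElement(facets, "Facet", {"Name": facet})
                ET.SubElement(facet_el, "String", {"Value": row[facet]})
    return ET.tostring(collection, encoding="unicode")


# Hypothetical one-row export.
sample_csv = "Name,Image,Achene shape\nTaxon A,a.jpg,trigonous\n"
cxml_text = csv_to_cxml(sample_csv, "Name", "Image", "Demo collection")
print(cxml_text)
```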

Interface considerations in a micro-ontology
In PivotViewer with the Silverlight 4 format, the characters and states (C&S) are located in the searchable information pane on the left, with the displayable information pane on the right. The left pane is of a fixed width and lacks word wrapping (Fig. 2). If all of the C&S data mined were used, extensive scrolling would be required, reducing the usability of the key. For this reason, long text strings in the C&S were edited for brevity. A 'less is more' approach was taken, with C&S being restricted to those that would be appropriate in an ontology.
[Figure caption: Workflow of project information.]

Clustering issues in the graphical mode require a "normalization character state"
*Visual keys require a normalization character state; or the image numbers must be standardized for graphical display*
If the number of images differs between species, a representative or semantic image is required. This leading image permits true one-to-one comparisons over any number of taxa. Without it, clustering would obscure an accurate representation of the data. For this reason, only those taxa with a line drawing are presented here, allowing a one-to-one comparison across taxa. This was done early in development as a work-around for the problem of differing numbers of images per taxon. Later unpublished works of this type deal with this issue in multiple ways (see 'Additional information').
To use this normalization feature, select 'Image by' at the base of the left information pane, then select 'Mackenzie, K. K.' from the information panel. Only greyscale images are then shown, in portrait format and with consistent attention to aspect ratio. All images are presented uniformly in a greyscale that is easy to interpret visually. The ad hoc commitment to Mackenzie's species list was made for this reason.
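The effect of the normalization character state can be illustrated as a filter that leaves exactly one image per taxon. This is a minimal sketch under assumed data structures; the record layout, file names, and the function `normalize_by_author` are hypothetical, and in the key itself the same result is obtained by selecting 'Mackenzie, K. K.' under 'Image by'.

```python
def normalize_by_author(images, author):
    """Return one image per taxon, keeping only images credited to `author`.

    Taxa with no image by that author drop out, which mirrors the
    restriction to taxa that have a line drawing.
    """
    chosen = {}
    for img in images:
        if img["author"] == author and img["taxon"] not in chosen:
            chosen[img["taxon"]] = img
    return chosen


# Placeholder records: uneven image counts per taxon.
images = [
    {"taxon": "Taxon A", "author": "Mackenzie, K. K.", "file": "a_plate.jpg"},
    {"taxon": "Taxon A", "author": "Jones", "file": "a_field1.jpg"},
    {"taxon": "Taxon A", "author": "Jones", "file": "a_field2.jpg"},
    {"taxon": "Taxon B", "author": "Mackenzie, K. K.", "file": "b_plate.jpg"},
    {"taxon": "Taxon C", "author": "Jones", "file": "c_field1.jpg"},
]

normalized = normalize_by_author(images, "Mackenzie, K. K.")
for taxon, img in sorted(normalized.items()):
    print(taxon, img["file"])
```

Without the filter, Taxon A would contribute three tiles to any histogram column and Taxon B only one, so column heights would reflect image counts rather than species counts.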

Data and images together
Images were added in small batches in a new Excel file. Working across multiple monitors, character data were copied and pasted from the secondary spreadsheet into this third instance of Excel to form the final build file.
[Figure caption: The Visual Carices of North America upon instantiation in default grid setting.]

Tertiary data
The completed third spreadsheet is then processed with the 'New collection tool' by selecting its icon in the ribbon panel of Excel. It generates two primary products: image tiles in numerous folders and a CXML file (Suppl. material 1). The control leverages Deep Zoom technology (Microsoft 2008) to create Deep Zoom image (DZI) and Deep Zoom collection (DZC) files that behave like the tiled imagery seen on Google or Bing maps (Fig. 3). This geometric series of images is what makes the images zoomable. As the user zooms in, progressively finer detail is resolved without the penalty associated with a large image download. As users pan through a collection, they load only what they choose to view.
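The geometric series behind a tiled image can be worked through for one image size. The sketch below follows the usual Deep Zoom pyramid layout, in which each level halves the dimensions of the one above it until a single pixel remains. The 256-pixel tile size is an assumption (the tiling tool's actual tile size and overlap are not stated in the text), and the function name `deep_zoom_levels` is illustrative.

```python
def deep_zoom_levels(width, height, tile_size=256):
    """List the pyramid levels for an image, from 1 x 1 up to full size.

    tile_size is an assumed value; tile overlap is ignored because it
    does not change the number of tiles.
    """
    # Highest level index: smallest L such that 2**L >= longest side.
    max_level = (max(width, height) - 1).bit_length()
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        w = -(-width // scale)       # ceiling division
        h = -(-height // scale)
        cols = -(-w // tile_size)
        rows = -(-h // tile_size)
        levels.append({"level": level, "width": w, "height": h, "tiles": cols * rows})
    return levels


# Dimensions reported in the text for line drawings and most Jones images.
pyramid = deep_zoom_levels(2848, 4288)
total_tiles = sum(entry["tiles"] for entry in pyramid)
for entry in pyramid:
    print(entry)
print("total tiles:", total_tiles)
```

Under these assumptions a single 2848 × 4288 image yields 14 levels, with 204 tiles at full resolution alone, which helps explain how a collection of over a thousand images grows to hundreds of thousands of tile files.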

Issues completing tertiary data for image tiles and CXML
Hardware and software issues were experienced at all stages. Testing revealed that while tiling a few hundred high-resolution images with PivotViewer is manageable, using over a thousand high-resolution images made Excel unstable. Memory allocation and processor spiking issues limited development time and resulted in extended periods of waiting for test builds that ran overnight or across many days. The creation of the image tiles is best attempted with a state-of-the-art computer with a solid-state drive. The total build time for the CIVIK tile set and CXML was approximately 12 hours for the final presented build (Fig. 4).

Deployable image tiles sizes
The DZI files are nearly four gigabytes in size and comprise over 250,000 image tile files in over 18,000 folders, with an associated CXML file of 3.3 megabytes. A Silverlight application package (XAP) file is also required to drive the application.

Compile with Visual Studio
To compile with Visual Studio, create a new Silverlight application for the web in Visual Studio. Add the references to PivotViewer on the main Extensible Application Markup Language (XAML) page in UserControl. Next, add the URL of the CXML file to the XAML.CS code-behind file. Finally, build or compile the deployment package for placement on the server.

XAML and XAML.CS Code behind Files
See 'software technical features'

Deploy to web server
Ensure that the following Multipurpose Internet Mail Extensions (MIME) types are configured on the server; significant development time was lost due to one of these settings not being in place.

History of Use
CIVIK has been tracked via Google Analytics together with the other, later visual keys of this type. Combined, these works received 13,933 visits from 2464 cities in 116 countries over a three-year period. The average dwell time across the three works was two minutes (see Additional information and Suppl. material 6).

Considerations and discussion
While Silverlight is ideal for this data format, it will be deprecated (see http://support.microsoft.com/gp/lifean45), as no future versions are scheduled for release. It will, however, be supported for ten years, which will aid future works of this kind. Thankfully, HTML 5 versions of PivotViewer are also now available that enable the CXML format across all devices in a device-agnostic fashion. This cross-platform capability is exciting because it does not require the Silverlight runtime, so phones and tablets are supported as well. HTML 5 versions have one other important advantage: a Google Translate function covering over 70 languages can be added in minutes (see http://translate.google.com/about/). This opens the door to future iterations in which high-resolution images are supported by translatable text.

Geographic coverage
Description: The identification key can be used for species occurring in the United States, Canada, and Mexico. Several species have a much wider distribution; hence the key has some value in other regions as well.
Coordinates: 90 and 15 Latitude; -180 and -45 Longitude.

Figure 3. Tiled image set illustrating the change in file size and in number of images that results from creating a geometric series of images.

Figure 4. An Interactive Visual Identification Key to Carices of North America, beta version.