Corresponding author: Cynthia S. Parr (
Academic editor: Edward Baker
The Encyclopedia of Life (EOL,
In this paper, we review recent developments added for Version 2 of the web site and subsequent releases through Version 2.2, which have made EOL more engaging, personal, accessible and internationalizable. We outline the core features and technical architecture of the system. We summarize milestones achieved so far by EOL to present results of the current system implementation and establish benchmarks upon which to judge future improvements.
We have shown that it is possible to successfully integrate large amounts of descriptive biodiversity data from diverse sources into a robust, standards-based, dynamic, and scalable infrastructure. Increasing global participation and the emergence of EOL-powered applications demonstrate that EOL is becoming a significant resource for anyone interested in biological diversity.
Biodiversity science has produced hundreds, if not thousands, of isolated database resources (
The Encyclopedia of Life (EOL,
EOL’s focus on description and illustration complements several related global efforts. The Catalogue of Life Partnership (CoL,
The task of documenting all life is vast, perhaps too vast for the relatively small community of formally-trained biodiversity experts (
EOL’s first phase established a basic content aggregation and curation infrastructure with the original website launching in 2008 (
In this paper, we review recent developments added for Version 2 and subsequent releases through Version 2.2. We outline the core features and technical architecture of the system. We summarize milestones achieved so far, both to present results of the system implementation and to establish baselines upon which to judge future improvements and comparisons with other systems. Finally, we discuss the significance of the Encyclopedia of Life to the landscape of biodiversity informatics.
EOL Version 2 involved a complete redesign of page styles to be more personal and engaging. In addition to the “March of Life” (a changing set of images linked to selected EOL pages), the homepage (
Commenting by users was available in the first version of EOL, but it has become a more central feature in EOL Version 2. Comments are now displayed much more prominently and are incorporated into EOL Newsfeeds, which also aggregate user actions relevant to the topic of the newsfeed. Newsfeed topics include users, taxa, collections and communities. EOL members (users who register for accounts on the site) are notified of responses to their comments and actions, and email notifications from newsfeeds can be customized in a preferences panel. The new EOL commenting system resulted in a roughly 4-fold increase in the rate of commenting compared to Version 1.
With the addition of a WYSIWYG editor to the existing text contribution interface, the authoring of taxon descriptions in the EOL interface has become easier in Version 2, and over 7,000 articles have been contributed in this way. In addition, we have introduced a link object so that contributors can submit well-described links to external resources; these are found on the Resources tab.
EOL Version 2 introduced the ability for members to form communities and build collections (of taxa, of image objects, of other collections, etc.) on EOL, as described more fully below (Implementation). EOL collections allow users to collaborate on projects and to annotate and arrange EOL content from a personal point of view. Since the content of collections is available through the EOL API (see
Most of the 1.9 million species described by science (
The EOL Version 2 redesign included a complete rewrite of EOL’s presentation layer with the goal of delivering content in meaningful ways to the widest possible audience regardless of the recipient's device, ability or location. The structure, style and client-side behavior components of each page were separated and rewritten using progressive enhancement techniques (
Design and architectural changes meet the World Wide Web Consortium (W3C) recommended Web Content Accessibility Guidelines (WCAG) 2.0 (
In partnership with
To better support beginning users, EOL now provides pages on general topics such as “What is biodiversity?” and introductory pages to major groups of organisms. Some of these pages are adapted from partner projects such as the
Support was provided by John D. and Catherine T. MacArthur Foundation (93466-0 amendment to grant 06-89123-000-GEN), Alfred P. Sloan Foundation (2009-6-076), Smithsonian Institution, Marine Biological Laboratory, and Harvard University.
Homepage:
Wiki:
Blog:
Programming language: Ruby on Rails, PHP
Service endpoint:
Type: Git
Other
Third-party content copyright remains with rightsholders. All content is either in the public domain or licensed for re-use with Creative Commons licenses. All Creative Commons licenses except no-derivatives (ND) variants are accepted for third-party content (see
Fig.
Resource documents made available by content partners define the text and multimedia being provided as well as the taxa to which the content refers, the associations between content and taxa, and the associations among taxa (i.e. taxonomies). Expert taxonomists often disagree about the best classification for a given group of organisms, and there is no universal taxonomy for partners to adhere to (
This taxonomic reconciliation process involves comparing the preferred scientific names, synonymy, and taxonomy from an incoming resource document to the same information from all previously indexed resources. It is designed to merge taxa based on synonymy (for example when the preferred name of one taxon is in the synonymy of another) and keep taxa separated that are homonyms (the same scientific name appearing in two distinctly different clades like
Partners can provide common names and synonyms as part of their taxon definitions. Synonyms are used by EOL to help determine which taxon definitions should be aggregated into the same Taxon Pages. They are also valuable search keywords that help users find the pages they are looking for.
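The merge rule described above can be illustrated with a minimal sketch. It is not EOL's actual reconciliation code: the `TaxonDef` structure, the `should_merge` function, and the "shared ancestor" test for telling homonyms apart are all assumptions made for illustration, and the example names are ours rather than taken from the resource documents discussed here.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonDef:
    """One taxon definition from a partner resource (hypothetical structure)."""
    preferred_name: str
    synonyms: set = field(default_factory=set)
    ancestors: tuple = ()  # higher classification, root first; may be empty


def should_merge(a: TaxonDef, b: TaxonDef) -> bool:
    """Return True if two definitions would share a Taxon Page in this sketch."""
    names_a = {a.preferred_name} | a.synonyms
    names_b = {b.preferred_name} | b.synonyms
    # Synonymy link: the preferred name of one taxon appears among the
    # names (preferred or synonym) of the other.
    linked = a.preferred_name in names_b or b.preferred_name in names_a
    if not linked:
        return False
    # Homonym guard (assumed heuristic): when both classifications are known
    # and share no ancestor at all, treat the names as homonyms.
    if a.ancestors and b.ancestors and not set(a.ancestors) & set(b.ancestors):
        return False
    return True


# A synonym links two definitions of the same animal.
puma = TaxonDef("Puma concolor", {"Felis concolor"}, ("Animalia", "Chordata"))
felis = TaxonDef("Felis concolor", set(), ("Animalia", "Chordata", "Mammalia"))
# The same genus name used for birds and for plants stays separate.
bird = TaxonDef("Morus", set(), ("Animalia", "Aves"))
plant = TaxonDef("Morus", set(), ("Plantae", "Rosales"))
```

Here `should_merge(puma, felis)` is true because of the synonym, while `should_merge(bird, plant)` is false because the two classifications have nothing in common. A production system would need a finer test than "no shared ancestor", since homonyms can also occur within a single kingdom.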
Previous studies suggest that common names are often more valuable for search than scientific names or synonyms (
Taxon Pages are the main organizational unit of EOL, presenting a standardized page for every taxonomic entity that the system recognizes. Each Taxon Page has 9 tabs: Overview, Details, Media, Maps, Names, Community, Resources, Literature, and Updates, plus an additional tab for EOL curators, Worklist. The default tab, Overview (Fig.
The Community tab offers information about what EOL Communities and Collections are interested in the taxon, and who the curators of the taxon have been. The Updates tab lists all of the comments on the Taxon Page as well as statistics about the content on the page, including the page’s Richness Score (see
Images, text, videos, sound files, and maps provided by content providers and EOL members are referred to as “Data Objects”. Data Objects are the building blocks of EOL. Taxon Pages are populated through the aggregation of relevant Data Objects from multiple sources. Each Data Object also has its own dedicated page that contains information about the taxon (or taxa) the Data Object is associated with, license information, all available source and attribution information, a tool for rating the Data Object, links to other versions of the Data Object, comments on the Data Object, and, for non-text objects, a text description (caption) if available. These Data Object Pages are accessible through links from EOL Taxon Pages and through their own unique URLs (e.g.
Initially, EOL harvested resource documents formatted according to an XML transfer schema drawing from standards such as
Most EOL content is aggregated via content partner tools (designed for projects that have large amounts of content to share) or added directly to the web site by users. Any EOL member can add and manage an EOL content partner account through their member profile (see
Currently, EOL members can add text objects, also known as articles, directly to EOL using the “Add an Article” button on the Details tab. Multimedia objects cannot be uploaded directly to EOL but must be added through partners such as Flickr, Wikimedia Commons, iNaturalist, Vimeo, YouTube, and Soundcloud.
EOL has developed a Richness Score for taxon pages (Fig.
EOL Communities provide a way to group users. The primary value of this feature at the moment is to share the management of different EOL Collections. They also provide a simple forum through the associated newsfeed. Collections provide a way for users to organize, annotate, and share the content on the site. Collections may range from species lists for local areas (e.g.
EOL provides curation tools for volunteer data curators. All curators must register under their real names. To facilitate participation of EOL members with different levels of expertise, three different curator levels are distinguished. As of April 2014, almost 300 EOL members have registered as assistant curators and over 1,300 members have been approved as full or master curators.
The Assistant Curator status requires no qualifications and conveys limited curation powers. Assistant Curators can add taxon associations to data objects (e.g., to identify organisms shown in an image), but these associations are marked as "unreviewed" until confirmed by a Full Curator. Assistant Curators can also add common names, select preferred common names, select exemplar images and articles, and crop image thumbnails. They are encouraged to add text and help find problems that Full Curators can resolve. Full Curators must have credentials (e.g. relevant professional affiliations, publications, membership in a professional association). In addition to the powers of Assistant Curators, they can trust or untrust text or multimedia objects and select preferred classifications for taxon pages. Master Curators can manage taxon concepts (overriding the automated reconciliation process by merging or splitting classifications featured on a given taxon page) and delete comments that do not adhere to EOL community policies.
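The three curator levels can be read as a simple tiered permission model. The sketch below is illustrative only: the action names and the `can` helper are hypothetical, and it assumes that powers are cumulative, i.e. that Master Curators also hold all Full Curator powers, which the description implies but does not state outright.

```python
from enum import IntEnum


class CuratorLevel(IntEnum):
    MEMBER = 0      # registered EOL member without curator status
    ASSISTANT = 1
    FULL = 2
    MASTER = 3


# Minimum level required for each action (action names are hypothetical).
MIN_LEVEL = {
    "add_taxon_association": CuratorLevel.ASSISTANT,
    "add_common_name": CuratorLevel.ASSISTANT,
    "select_preferred_common_name": CuratorLevel.ASSISTANT,
    "select_exemplar": CuratorLevel.ASSISTANT,
    "crop_thumbnail": CuratorLevel.ASSISTANT,
    "trust_or_untrust_object": CuratorLevel.FULL,
    "select_preferred_classification": CuratorLevel.FULL,
    "merge_or_split_taxon_concept": CuratorLevel.MASTER,
    "delete_comment": CuratorLevel.MASTER,
}


def can(level: CuratorLevel, action: str) -> bool:
    """True if a curator at `level` may perform `action` (cumulative powers assumed)."""
    return level >= MIN_LEVEL[action]


def association_needs_review(level: CuratorLevel) -> bool:
    """Associations added below Full level stay "unreviewed" until confirmed."""
    return level < CuratorLevel.FULL
```

For example, `can(CuratorLevel.ASSISTANT, "add_common_name")` is true, whereas `can(CuratorLevel.FULL, "merge_or_split_taxon_concept")` is false because managing taxon concepts is reserved for Master Curators.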
Untrusted content is hidden from public view but still visible to Full and Master Curators for further review. Curation actions and comments are reported to content providers (Feedback, in Fig.
The EOL website search is configured to find scientific names and common names. Results are ordered with preferred scientific names (names that have been manually selected by curators as “preferred” for a taxon) first, followed by preferred common names and then synonyms. EOL search also indexes Communities, Collections, EOL members, Data Objects, and EOL documentation pages, and search results can be filtered by these categories. If there is a best result, the system takes the user directly to that taxon page, with an option to return to the search results page to view other results.
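The ordering of name matches can be sketched as a stable sort on the kind of name that matched. This is only an illustration of the stated preference order, not the search engine configuration EOL uses; the `NameHit` structure, the kind labels, and the page identifiers are hypothetical.

```python
from dataclasses import dataclass

# Lower rank sorts first: preferred scientific names, then preferred
# common names, then synonyms.
RANK = {"preferred_scientific": 0, "preferred_common": 1, "synonym": 2}


@dataclass(frozen=True)
class NameHit:
    page_id: int   # hypothetical Taxon Page identifier
    name: str
    kind: str      # one of the keys of RANK


def order_hits(hits):
    """Order hits by name kind; ties keep their incoming order (stable sort)."""
    return sorted(hits, key=lambda hit: RANK[hit.kind])


hits = [
    NameHit(3, "Felis concolor", "synonym"),
    NameHit(2, "cougar", "preferred_common"),
    NameHit(1, "Puma concolor", "preferred_scientific"),
]
ordered = order_hits(hits)
```

Because Python's `sorted` is stable, any relevance order produced by an upstream text index would be preserved within each kind of name.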
The
EOL Version 2 provided an opportunity to significantly improve the hardware and software infrastructure of EOL. The entire software and hardware stack supporting the serving of
The EOL technical team uses a modified version of the Scrum software development framework (
EOL has a worldwide audience including experts, enthusiasts and casual visitors. About 39% of user sessions originate in the United States and more than 47% of user sessions originate in countries where English is not an official language. Starting with v2, visitors registering to become EOL members were invited to select one or more audience categories to describe themselves. Of 6,410 people who self-identified by 18 April 2014, 47% chose "enthusiast", 36% chose "student", 20% chose "educator", 18% chose "citizen scientist", and 20% chose "professional scientist". However, this distribution may not reflect the more than 73,000 current EOL members or the vastly larger number of visitors who never register or who encounter EOL content primarily via social media channels.
Experts and enthusiasts are encouraged to participate in EOL as content curators. As of April 2014, almost 300 EOL members have registered as assistant curators and over 1,300 members have been approved as full or master curators.
At least in North America, the formal education audience is an important demographic for EOL. Google Analytics data show that use of the site increases when most schools are in session. The EOL Learning & Education group also actively posts information on about 15 listservs, including the National Science Teachers Association (NSTA), Scuttlebutt (NOAA Marine Education site) and the Ecological Society of America's EcoLogic Listserv.
EOL’s overall information, provider resources, and membership have grown steadily (Fig.
Still, most EOL pages remain without content, i.e., EOL provides nothing but a taxon name, and in some cases author information and a reference. Overall, EOL has indexed about 3.5 million taxa. This represents most of the 1.9 million extant (
Closer examination indicates that EOL has an uneven distribution of content across languages, licenses, and topics. While EOL has vernacular names in 163 languages (Table
To date, users have created more than 5,000 EOL Collections. Many collections (approximately 35%) are for specific geographic regions and represent user-generated checklists that could be useful for refining map queries in areas where occurrence data are not yet available. Presence of a taxon or object in many user-generated Collections could be used (by EOL or by others) to sort or filter search results so that they are most relevant to user needs. Collection statistics, along with traffic statistics, could also help researchers explore the factors that make an organism or data object more engaging to broad audiences.
Though there is room for growth in curation activity, EOL is increasingly in a position to improve data quality across its network of providers. In July 2013, EOL had 1,258 registered curators (250 Assistant, 1,001 Full, 7 Master), of whom 163 had been active in the preceding 12 months. In comparison, iNaturalist has 94 curators and the World Register of Marine Species has 826 editors (a thoughtful analysis of curation power across projects with different models is beyond the scope of this paper). The majority of data objects (92%) are considered trusted, most having been acquired from authoritative sources. An average of 905 objects per month are being curated. Unsurprisingly, given their different access to tools, Assistant and Full Curators have different patterns of activity (Fig.
In the period from August 2012 through July 2013, EOL was visited by 3.7 million unique users. About 44% of visits are from North America (including Mexico). Thirteen countries on other continents contributed a significant number of visits.
EOL has established its role of improving access to biodiversity information by aggregating and standardizing descriptive information and multimedia objects currently available across many otherwise isolated resources. It provides the infrastructure to connect both major hubs and independent projects (
EOL complements long-term archives and metadata registries, e.g. DataONE (
By taking a phased approach (phase 1 of core infrastructure and phase 2 of engagement), EOL has successfully built a professional, usable platform at a scale appropriate to its task of serving global biological information to multiple international audiences. Because it is scalable, as EOL grows, its Richness Scores can be used to assess the availability and quality of knowledge across the tree of life, especially when extended to structured data. The scores could also enable assessment of individual contributions and standardization (
Several challenges remain to be tackled in future phases. While there is some evidence (growth in collections, emergence of third party applications, curator activity, user traffic) of effective impact on and engagement by various audiences, tools for community and curator engagement are not as successful as hoped and so they may require more tailored experiences and effective feedback (
The next phase of EOL moves beyond the limits of encyclopedic text and multimedia to add the ability to ingest and serve highly structured data (numeric and controlled vocabulary terms with rich semantics) about the attributes of and relationships among organisms (Parr et al. in review). In the same way that EOL has helped to bring together and connect text and media from isolated sources, we aggregate structured data to provide a broad-scale view of analyzable biodiversity data. EOL’s standardized open access also facilitates new text mining or crowd-sourcing efforts to extract structured data about biological diversity, e.g.
Support was provided by John D. and Catherine T. MacArthur Foundation, Alfred P. Sloan Foundation, Smithsonian Institution, Marine Biological Laboratory, and Harvard University. The production hardware infrastructure for the EOL website is supported by the Harvard Faculty of Arts and Sciences (FAS) Sciences Division Research Computing Group. We thank all of our providers and global partners, Eli Agbayani, Tracy Barbaro, Dana Campbell, Vitthal Kudal, Erick Mata, David Patterson, and Mark Westneat. Leo Shapiro and Dawn Field provided helpful comments on the manuscript.
Conceived and designed the experiments: CSP NW BC MS. Coded the software: PL JR LW AG. Managed data ingestion: JAH KSS CSP PL. Provided detailed requirements and tested the software: JAH KSS CSP JTGH MS. Performed analyses: KSS CSP. Wrote the paper: CSP NW JAH KSS KL PL LW AG.
Google PageRank™ of various biodiversity websites, per
Languages of common (vernacular) names.
Language | Common Names |
---|---|
English | 690163 |
Spanish | 114579 |
Chinese | 87643 |
French | 85973 |
German | 69945 |
Japanese | 51432 |
Portuguese | 42497 |
Italian | 39264 |
Czech | 37455 |
Russian | 35379 |
Danish | 30775 |
Dutch | 30775 |
Finnish | 29785 |
Polish | 24918 |
Other | 280057 |
Languages of text articles.
Language | Articles |
---|---|
English | 3096313 |
Spanish | 58978 |
Chinese | 11678 |
Arabic | 4807 |
Portuguese | 2373 |
Dutch | 1143 |
Indonesian | 173 |
French | 107 |
Other | 10180 |
Subject | Articles |
---|---|
Distribution | 805503 |
Molecular Biology | 434545 |
Combined Topics | 354322 |
Type Information | 326720 |
Habitat | 292478 |
Conservation Status | 144969 |
Threats | 94140 |
Morphology | 66571 |
Conservation | 65618 |
Diagnostic Description | 61512 |
Management | 57894 |
Trends | 57888 |
Size | 55453 |
Description | 49074 |
Associations | 38677 |
Taxon Biology | 26861 |
Uses | 24458 |
Trophic Strategy | 21563 |
Population Biology | 17767 |
Taxonomy | 16301 |
Ecology | 15060 |
Reproduction | 14996 |
Notes | 14440 |
Migration | 13991 |
Cyclicity | 11880 |
Life Cycle | 9759 |
Life Expectancy | 8875 |
Behavior | 6391 |
Key | 6118 |
Diseases | 4325 |
Use | 4283 |
Evolution | 2158 |
Risk Statement | 2022 |
Look Alikes | 1897 |
Dispersal | 1649 |
Functional Adaptations | 1438 |
Genetics | 1000 |
Growth | 785 |
Barcode | 720 |
Education Resources | 646 |
Physiology | 269 |
Cytology | 129 |
Taxa with Content
Comma-Separated-Values
Taxon pages with content (at least one text article, image, map, video, or sound).
File: oo_6172.csv
Number of Resources
Comma-Separated-Values
Published resources (content import files). A provider may submit more than one resource file, for example when providing different kinds of content.
File: oo_6173.csv
Registered Members
Comma-Separated-Values
Registered EOL members.
File: oo_6174.csv
License Distribution
Comma-Separated-Values
Distribution of Creative Commons and other licenses for data objects on EOL.
File: oo_6175.csv
Curator Activity
Comma-Separated-Values
Activity patterns of EOL Assistant Curators compared to Full and Master Curators.
File: oo_6176.csv
Data Object Rating
Comma-Separated-Values
Data Object rating patterns of EOL members in relation to their curator status.
File: oo_6177.csv