Digging for historical data on the occurrence of benthic macrofaunal species in the southeastern Mediterranean

Abstract
Background
The benthic macrofaunal biodiversity of the southeastern Mediterranean is considerably understudied compared to other Mediterranean regions. Monitoring biodiversity in this area is crucial as this region is particularly susceptible to biological invasions and temperature alteration. Historical biodiversity data could provide a useful baseline for monitoring potential changes and provide information to support a better understanding of the possible effects of anthropogenic activities on marine benthic communities.
New information
In this study, performed under the LifeWatchGreece Research Infrastructure, we present historical benthic occurrence data obtained from the sampling expedition carried out in 1933 by Adolf Steuer in the coastal area around Alexandria, Egypt, eastern Mediterranean. The occurrences were geo-referenced to more than 170 stations, mostly located in the area of Alexandria and the nearby coasts and lakes. All records were digitized and species names were cross-checked and taxonomically updated using the World Register of Marine Species. The outcome clearly shows that such initiatives can reveal an unexpected amount of highly valuable biodiversity information for “data-poor” regions.


Introduction
At the beginning of the 20th century, the importance of recording marine biodiversity was already recognized. Numerous expeditions had been organized with the aim of investigating "local fauna and flora" in various areas of the world. In 1924, the Cambridge Expedition to the Suez Canal recorded the fauna of the Red Sea (Fox 1926), while the Danish Oceanographical Expedition of 1908-1910 provided biological and hydrographical information for the Mediterranean and Adjacent Seas (Schmidt 1912). During these scientific expeditions, specimens of various taxonomic groups were collected and recorded, and the outcomes were published in many scientific volumes. These historical occurrence data could provide a useful baseline for monitoring potential alterations, although they are often fragmented and found only in hard copy and grey literature. Such information is invaluable and needs to be digitized, as it can provide the historical context for present observations and facilitate the process of setting correct reference conditions (Borja et al. 2012); it can also support predictive modeling of the consequences of human activities for the environment and biodiversity (Costello et al. 2013a). Additionally, historical datasets often contain descriptions of new species that are important for taxonomy, as the first description of a species has legal priority for the name of this species (Costello et al. 2013b).
In this study, we present occurrence data which were digitized from 14 publications on the Egypt Expedition under the general report "The fishery grounds near Alexandria", made by Adolf Steuer and his colleagues and published between 1935 and 1940. Twelve of these publications included occurrence data on twelve macrofaunal groups and two of them were preliminary reports which described the sampling protocols that were followed during the expedition (Table 1). The digitization of "The fishery grounds near Alexandria" (Egypt Expedition) is part of a broader strategy of the LifeWatchGreece Research Infrastructure, which aims at the digitization of historical datasets that contain biodiversity information from the Mediterranean region. In rare cases, occurrence data for planktonic species were available in these volumes and were included in the digitized datasets.
Personnel: The datasets were digitized by the LifeWatchGreece data management team. Irini Tsikopoulou (data manager), Stamatina Nikolopoulou (data, database and webgis application manager) and Aglaia Legaki (data manager) were the resource creators; Panagiotis D. Dimitriou (data manager) and Evangelia Avramidou (data manager) were content providers. Nicolas Bailly checked difficult taxonomic cases.
The original data were collected by Dr. Adolf Steuer, professor at the University of Innsbruck, who organized and led the sampling expedition to the coasts near Alexandria, Egypt. After sampling, all collected specimens were preserved and sent to several experts for taxonomic identification. Each expert was responsible for the publication of his macrofaunal report.

Study area description:
The study area of the Egypt Expedition is located between the Western and Eastern harbors of Alexandria, including nearby localities such as Abukir Bay, the Suez Canal and the lakes Edku and Mariout (Fig. 1). The majority of the sampling stations do not exceed the 200 m isobath. The coasts that were investigated were in part shallow and sandy, in part steep. Information concerning the sediment characteristics and vegetation of the studied area was also available and included in the digitized dataset.
Figure 1. Stations as they were mapped in Steuer's preliminary report (1935).
... the harbor area of the Alexandria city, as well as a small row boat, an automobile and the sampling equipment.

Sampling methods
Sampling description: Sampling took place at 172 locations in the marine area off Alexandria, in the Suez Canal, in the Nile river and in two lagoons (Lake Mariout and Lake Edku). Adolf Steuer was in charge of the sampling, which lasted from April to November 1933. A motor-launch (small military vessel) 15 m long, named "El Hoot", belonging to the Marine Laboratory, was used for the one-day trips at sea. Since it was difficult to sail too far from the shore, only two stations (station 26 and station 64) surpassed the 200 m isobath. In some cases a small rowing boat was also used. Benthic samples were collected almost exclusively with a dredge with an opening of 20x70 cm. In only one case was sampling performed with a large otter trawl (bottom trawling), in the eastern part of the Bay of Abukir, at a depth of 20 meters. A bottom sampler (Petersen's grab) with a surface area of 0.2 m² was used only once, in the Eastern harbor, due to difficulties in its manipulation (Vatova 1935). In shallow water, where no other equipment could be used, the samples of benthos were taken by diving. The sites where the sampling was performed along the coast were: the mouth of the Nile near Rosetta (Rashid), Lake Mariout and Lake Edku. Concerning planktonic samples, vertical hauls were operated using a medium sized net with buckets of celluloid with a gauge bottom.
Quality control: Every dataset was digitized manually from scanned documents. Some publications were in French or in German, depending on the author, and therefore the information was translated into English. Species names and sampling location names in the digitized datasets were kept the same as in the original paper. Afterwards, all scientific names were cross-checked and taxonomically updated using the Taxon Match tool of the World Register of Marine Species (WoRMS) (WoRMS Editorial Board 2016). Station coordinates were produced by georeferencing maps from Fauvel (1937) using a Geographic Information System (GIS). The digitized datasets are presented in a standardised way, using Darwin Core terminology, with information on taxonomy, locality, sampling date, sampling protocol and individual measurements where these were available.
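The way an original name and its updated name can sit side by side in one record can be sketched as follows. This is a minimal illustration and not the project's actual tooling: the function to_darwin_core, the lookup table ACCEPTED_NAMES and all field values are hypothetical, and the choice of the Darwin Core terms scientificName (name as printed) and acceptedNameUsage (name after the WoRMS check) is an assumption about the column layout. The synonym pair is a well-known example and is not taken from the digitized datasets.

```python
# Hypothetical result of a name-matching step (original name -> accepted name).
# The pair below is a well-known synonym, not a record from the Steuer datasets.
ACCEPTED_NAMES = {"Cardium edule": "Cerastoderma edule"}


def to_darwin_core(original_name, accepted_names, **fields):
    """Build one occurrence record keyed by Darwin Core terms.

    The name printed in the historical report is kept untouched in
    `scientificName`; the updated name goes to `acceptedNameUsage`.
    Unmatched names are flagged for a manual taxonomic check instead
    of being silently accepted.
    """
    accepted = accepted_names.get(original_name)
    record = {"scientificName": original_name, "acceptedNameUsage": accepted}
    if accepted is None:
        record["taxonRemarks"] = "unmatched name, needs manual check"
    # Optional terms (locality, eventDate, ...) are added only when present.
    record.update({term: value for term, value in fields.items() if value is not None})
    return record


# Placeholder values only, for illustration.
matched = to_darwin_core(
    "Cardium edule",
    ACCEPTED_NAMES,
    locality="EXAMPLE locality",
    samplingProtocol="dredge",
    individualCount=None,
)
unmatched = to_darwin_core("Exampleus hypotheticus", ACCEPTED_NAMES)
```

Keeping both names in the record means that a later taxonomic revision only requires the accepted name to be refreshed, while the link to the printed source stays intact.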
Step description: Digitization process
The digitization of the historical publications concerning the Egypt Expedition was a challenging process due to their complexity and the variety of formats across the different faunistic reports. Information on the sampling protocol and the sampling sites was digitized mainly from the preliminary reports of Steuer (1935) and Vatova (1935), enriched with information from maps and the main text in the rest of the publications.
Occurrence data were digitized from the individual faunistic reports, using Darwin Core terminology.
The digitization process of the Egypt Expedition datasets included several steps that are described below:
1. Data managers read and comprehended the individual faunistic publications, in order to overcome difficulties originating from the heterogeneity in format and content among the historical papers. The original authors did not follow a specific format for the presentation of their results. Some of them included species distribution maps, some reported species lists, sampling dates and depths, while others also recorded individual species counts. If there was a species list in the historical papers, species were recorded according to their taxonomical classification. Similarly, if there was a station list, stations were reported chronologically.
2. A spreadsheet was created for each faunistic report and was populated with the original species names found at each location. At this stage of the digitization process, obvious typographic errors were corrected. The spreadsheets also contained information on the sampling depth (minimum and maximum depth), sampling date (year, month, day), sampling protocol and habitat (substrate type and vegetation). For benthic samples, station depth and sampling depth were matched. In some faunistic reports, station depths were given in fathoms (one fathom is approximately 1.8 meters). In these cases, station depths in the datasets were converted to meters. For some taxonomic groups, additional information such as sex, life stage, individual counts or body length measurements was available, either at species level or at specimen level. Accepted taxon names and taxonomic classification, as derived from the World Register of Marine Species, were also included in the spreadsheets.
3. After the digitization of all available information contained in the main text and tables of the publications, sampling stations were georeferenced using the species distribution maps in every faunistic report. Since there were no station coordinates, latitude, longitude and coordinate uncertainty were estimated using a GIS, based on the distribution maps in each publication and in Fauvel (1937). In cases where a station was only referred to as a specific locality in the text, and not accompanied by a symbol on a map, a new station with higher uncertainty was created based on the locality description.
4. In the next step, a code (fieldNumber) was created for each sampling event. A unique event was defined as a sampling event that took place at a specific station at a specific time and sampling depth using a specific sampling protocol. In some cases several samples had been taken at a location without defining the sampling station but only the wider area. To represent these, a new station ID was created, accompanied by a corresponding location remark. A code (occurrenceID) was also created for each species occurrence record.
5. The outcome of the above digitization steps was twelve spreadsheets with 56 columns containing occurrence data of twelve benthic macrofaunal taxa. These tables were combined in the MedOBIS PostgreSQL database in order to correct mistakes originating from differences in the information, or absence of information, between Steuer's preliminary report (1935) and the individual faunistic reports. In cases of corrections, the original information was always kept as a remark in the dataset.
6. The
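The depth conversion in step 2 and the event definition in step 4 can be illustrated with a short sketch. It is written under stated assumptions: the function names, the row keys and the code format "EVT-0001" are hypothetical, the stations "A" and "B" are placeholders, and the default factor of 1.8 m per fathom follows the approximation quoted in the text (the international fathom is defined as exactly 1.8288 m).

```python
APPROX_FATHOM_IN_METERS = 1.8  # approximation quoted in the text; exact value is 1.8288


def fathoms_to_meters(fathoms, factor=APPROX_FATHOM_IN_METERS, ndigits=1):
    """Convert a depth printed in fathoms to meters."""
    return round(fathoms * factor, ndigits)


def assign_field_numbers(rows, prefix="EVT"):
    """Give each distinct sampling event one code.

    An event is identified by station, date, depth range and protocol,
    so rows that share all of these receive the same (hypothetical) code.
    """
    codes = {}
    for row in rows:
        key = (
            row["station"],
            row["eventDate"],
            row["minimumDepthInMeters"],
            row["maximumDepthInMeters"],
            row["samplingProtocol"],
        )
        if key not in codes:
            codes[key] = f"{prefix}-{len(codes) + 1:04d}"
        row["fieldNumber"] = codes[key]
    return rows


# Placeholder rows, not records from the digitized datasets.
depth = fathoms_to_meters(50)
rows = assign_field_numbers([
    {"station": "A", "eventDate": "1933-10-01", "minimumDepthInMeters": depth,
     "maximumDepthInMeters": depth, "samplingProtocol": "dredge"},
    {"station": "A", "eventDate": "1933-10-01", "minimumDepthInMeters": depth,
     "maximumDepthInMeters": depth, "samplingProtocol": "dredge"},
    {"station": "B", "eventDate": "1933-10-01", "minimumDepthInMeters": depth,
     "maximumDepthInMeters": depth, "samplingProtocol": "dredge"},
])
```

With the approximate factor, 50 fathoms convert to 90 m, the figure quoted for station 61; passing factor=1.8288 would give 91.4 m instead.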

Difficulties regarding data digitization
During the digitization process, several issues with the data were encountered. The majority of these problems were similar across datasets. In the following paragraphs, we will highlight the most common ones and explain how they have been dealt with.
1. Data on the same sampling event were scattered and repeated in different publications. This created inconsistencies both across and within publications. Information on station characteristics and sampling protocol was often repeated with small differences, omissions or typographic errors, owing to the different languages used in the publications and in the preliminary reports. Within each faunistic report, occurrence records were often presented in two different ways, once as a list of species by station and again as a list of stations by species, leading to small differences or typographic errors. For practical reasons, we decided to consider as correct the information on sampling protocol obtained from the preliminary reports, and the information on species distribution obtained from the species list rather than the station list. In any case, differing information was always kept as a remark in the datasets.

2. The final number of stations recorded was 172: a total of 150 benthic stations reported in Steuer's preliminary report (1935), enriched with 10 planktonic stations derived from the 12 faunistic reports and with 10 new stations generated during the digitization process. Some stations were described only verbally in the faunistic reports and did not appear on a map. For example, some species were reported as collected from the "eastern harbour, on the body of a ship" or "eastern harbour, epifauna" without the position being displayed on the map. In such cases, a new station was created (e.g. easterharbour1) in order to include all available information. Other examples were the LacMarioutCenter and westernharbour1 stations. In addition, new station IDs had to be created by the data management team because some stations were shown on the historical maps without a station name. For example, a sampling position "near the bath" was mentioned and mapped in many reports without a specific station name. Other stations without a station name were coastAbuQir-nearRosetta, LakeEdku_marinebeach, LakeEdkubridge, offSidiBishr and Silsila.
3. Besides the general difficulties described above, some sampling stations needed extra consideration.
• For station D2, the sampling date was not recorded in the mollusks report (Steuer 1939b). This gap was filled using the 1st of October 1933 (1/10/1933) as the sampling date, because all the trips were one-day trips and in all papers D2 was visited only on that day.
• Another problematic station was station 104. This station was reported in four faunistic reports: two of them without a sampling date and the other two with different dates, 1/11/1933 in Sipuncula and 8/11/1933 in Polychaeta. Eventually, the date 8/11/1933 was considered correct instead of 1/11/1933 and used for all the reports. This decision was made because, as mentioned above, stations were reported in chronological order and [...] maps (lack of station name, landscape changes) new points were placed on the map manually, with higher uncertainty. The planktonic stations were also placed manually on the map.
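The practice described above, of adopting one value and keeping the differing information as a remark, can be sketched as follows. This is only an illustration: the function reconcile and the remark format are hypothetical, and the choice of the preferred source is assumed to be made by the data manager (for station 104 the decision rested on the chronological order of the stations, not on a ranking of the reports).

```python
def reconcile(values_by_source, preferred_source):
    """Return the adopted value and a remark listing conflicting values.

    `values_by_source` maps a report name to the value it prints, or to
    None when the report gives no value. Values that differ from the
    adopted one are preserved in the remark rather than discarded.
    """
    adopted = values_by_source.get(preferred_source)
    conflicts = [
        f"{source}: {value}"
        for source, value in values_by_source.items()
        if value is not None and value != adopted
    ]
    return adopted, "; ".join(conflicts)


# Sampling dates printed for station 104 in the two reports that give a date.
event_date, event_remarks = reconcile(
    {"Sipuncula report": "1933-11-01", "Polychaeta report": "1933-11-08"},
    preferred_source="Polychaeta report",
)
```

Here event_date holds the adopted date (8 November 1933) and event_remarks preserves the date printed in the Sipuncula report, so the original information remains traceable.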

Geographic coverage
Description: The Egypt expedition covered, with 162 benthic and 10 planktonic stations, the area along the coasts of Alexandria, the Suez Canal, the Nile river and the lakes Edku and Mariout (Fig. 2).

Taxonomic coverage
Description: This set of historical data includes distribution information for 571 marine macrobenthic species belonging to 10 phyla, 21 classes and 257 families (Fig. 3). Malacostraca was the most speciose class with 26% of the total species found, followed by Polychaeta (21%), Gastropoda (20%) and Bivalvia (14%) (Fig. 4). The most species-rich family was Syllidae (17 species), followed by Trochidae (14 species) and Veneridae (11 species). More than half of the families (146 of 257) were represented by a single species.
Figure 2. Georeferenced map of all stations from "The fishery grounds near Alexandria" macrofaunal reports.
These macrofaunal species were distributed across 172 stations located in the marine area off Alexandria, Egypt (Table 2). Species richness at the different sampling stations was very heterogeneous. The most species-rich stations were station 61, located above the isobath of 50 fathoms (90 m), station 35, off Sidi Bishr, and station 7, located close to the Eastern Harbour of Alexandria. The ten most common species in the study area are presented in Table 3. These species were found in more than 10% of the total number of stations.

Table 4. Column description of 'Darwin Core Event' table

Conclusions
Data rescue is an increasing need with expected effects on the scientific and societal perception of biodiversity. Despite the many challenges encountered during the digitization of historical datasets (e.g. taxonomic updates, georeferencing, misspellings of taxa and places, compiling overlapping information from different publications), the outcome clearly shows that such initiatives are invaluable in making previously unavailable biodiversity data accessible. Concerning the Egypt Expedition, this paper is the first step towards the digitization of the whole set of publications from "The fishery grounds near Alexandria". In the Eastern Mediterranean, these data could be used to set the reference conditions for checking the invasion of alien species through the Suez Canal or to compare past species occurrences with current ones. In addition, the availability of these historical data through public databases (such as the LifeWatchGreece Research Infrastructure and MedOBIS) provides useful tools for present observations or for monitoring potential change in benthic communities. Through virtual labs, scientists or other users could search, visualize on a map, combine and download species occurrences from all over the Mediterranean in several different formats.
Digitizing historical datasets also offers valuable information on functional species traits, as these datasets usually contain individual characteristics, such as maturity and body length, and habitat characteristics, such as sediment type and vegetation. Information on functional species traits is required for describing species patterns and assessing the future evolution of benthic communities.