A new dataset on plant occurrences on small islands, including species abundances and functional traits across different spatial scales

Abstract
Background: We introduce a new dataset of woody plants on 60 small tropical islands located in the Raja Ampat archipelago in Indonesia. The dataset includes incidence, abundance and functional trait data for 57 species. All islands were sampled using a standardised transect and plot design providing detailed information on plant occurrences at different spatial scales ranging from the local (plot and transect scale) to the island scale. In addition, the dataset includes information on key plant functional traits linked to species dispersal, resource acquisition and competitive strategies. The dataset can be used to address ecological questions connected to the species-area relationship and community assembly processes on small islands and in isolated habitats.
New information: The dataset yields detailed information on plant community structure and links incidence, abundance and functional trait data at different spatial scales. Furthermore, this is the first plant-island dataset for the Raja Ampat archipelago, a remote and poorly studied region, and provides important new information on species occurrences.


Introduction
Islands are ideal research models to study ecological processes in spatially discrete arenas (Whittaker and Fernández-Palacios 2007). Detailed understanding of island ecology has led to influential theories in biodiversity research, such as the Equilibrium Theory of Island Biogeography (MacArthur and Wilson 1967) or the General Dynamic Model (Whittaker et al. 2008). These theories use species richness on islands to discern assembly processes and biodiversity patterns across islands. However, recent advances in island biogeography have advocated incorporating other biodiversity measures to separate the underlying processes of species assembly on islands. These measures include species abundances (Chase et al. 2019), functional traits (Schrader et al. 2020a) and community structure at different spatial scales. For instance, incorporating species abundances provides information on ecological mechanisms behind the species-area relationship. The species-area relationship describes the increase of species richness with island area and is one of the most fundamental patterns in ecology (Rosenzweig 1995). Functional traits characterise morphological, physiological or phenological features of a species and can offer detailed understanding of ecological filtering (Cadotte and Tucker 2017) and ecosystem functioning (Díaz and Cabido 2001). However, open access datasets that include multiple facets of island biodiversity, such as abundance data and functional traits at different spatial scales, remain scarce.
Here, we provide a novel island dataset that features occurrences, abundances and key functional traits of 57 plant species on 60 small tropical islands. Species occurrences were recorded at three different spatial scales ranging from small-scale plot and transect level data to species communities for the whole island. Furthermore, the study area, lying in the western part of the island of New Guinea, is biologically largely uncharted and the dataset can be used to map species occurrences in this biologically rich region.

General description
Purpose: The dataset was assembled to investigate the processes underlying the island species-area relationship, the small-island effect and community assembly on small islands (e.g., Schrader et al. 2019b, Schrader et al. 2019a). Small islands can form a notable exception to the species-area relationship: there, species richness varies independently of area or increases with area at a different rate than on larger islands, a pattern termed the small-island effect (Lomolino and Weiser 2001). The ecological mechanisms behind the small-island effect are still poorly understood. To test whether a small-island effect prevails in the study system, we also included islands with no species in the dataset, as these are important for the correct detection of the small-island effect (Wang et al. 2016).
For all islands, we provide information on island area, island perimeter, island distance to the nearest larger landmass, neighbouring landmass proportion around each island, mean soil depth and proportion of leaf litter coverage on each island. The dataset includes species occurrence and abundance information for woody plants with a diameter at breast height ≥2 cm for each island at three different spatial scales. For each plant species, we measured key functional traits from samples collected on the islands. Species occurrences are also available in the Global Biodiversity Information Facility database (GBIF; DOI: https://doi.org/10.15468/zjq49b) and the trait data in the TRY database (Kattge et al. 2020).

Sampling methods
Study extent: The dataset includes 60 islands ranging in area from 3 m² to 11,806 m². All islands included in the dataset are located in the Raja Ampat archipelago in West Papua, Indonesia (Fig. 1). Botanical field surveys and trait sampling were conducted during six months between June 2016 and February 2018. We sampled only islands that were undisturbed by people and covered with woody vegetation, which we ensured by checking for any signs of human use (e.g., clear-cuts, gardens, habitations) or cutting of woody vegetation (see also Schrader et al. 2019a). This excluded all islands that featured gardens, clear-cuts or buildings, as well as the main island of Gam (Fig. 1), and limited the maximum island size sampled to <12,000 m².

Sampling description: Island metrics
We georeferenced all islands in Gam Bay in ArcGIS (v.10.3) using satellite images (World Imagery, ESRI 2017). For islands <100 m², we additionally measured the island's dimensions in the field and matched them with the ArcGIS georeferenced shapes. Based on the georeferenced shapes, we calculated island area (m²) and the perimeter of each island (m). To assess the level of isolation of each island, we calculated two alternative isolation metrics following Weigelt and Kreft (2013). The first isolation metric indicated the minimum distance (m) to the next larger landmass (i.e., calculated as minimum distance from island edge to landmass edge), which was the large island of Gam (Fig. 1). The second metric considered the surrounding landmass proportion within a 1000 m radius around each focal island.

Plot design
To sample species occurrences, we used a transect design subdivided into plots (Fig. 1). We used a nested sampling design to obtain information on species assemblages at different spatial scales on the islands (Schrader et al. 2019a). All transects measured 2 × 10 m and comprised five 2 × 2 m plots. The number of transects on an island was roughly proportional to the island area and ranged from one to six transects: one transect was placed on islands <500 m² (40 islands); two transects on islands between 500 m² and 750 m² (two islands); three transects on islands between 750 m² and 1000 m² (two islands); four transects on islands between 1000 m² and 3000 m² (nine islands); five transects on islands between 3000 m² and 5000 m² (three islands); six transects on islands >5000 m² (four islands) (see also Suppl. material 1). For islands with a maximum extension of <10 m, we placed as many plots as possible along the longest extension of the island. This was the case for the 36 smallest islands. Larger islands had two transects oriented towards the island centre on opposite sides of the island. The interior was covered with a varying number of transects (depending on the island size) of perpendicular orientation, ranging from one to four transects. The distance between transects on each island with multiple transects was held constant but was related to the longest extension of an island, and hence varied among islands. With this method we ensured sampling of the island edge as well as the interior, which likely harbour different species communities (Schrader et al. 2019b). Soil depth was recorded in all plots at five spots at equal distance to each other (33 cm) and spaced along the central axis of the transect. At each spot where we measured soil depth, we also recorded the presence or absence of leaf litter.
Figure 1. Map of the study region and schematic representation of the study design. a) Location of the 60 islands studied in Gam Bay in the Raja Ampat archipelago, Indonesia. The 25 largest sampled islands are highlighted in dark grey. The 35 islands smaller than 100 m² are not visible at this scale. b) Species richness and number of stems were recorded in plots (2 × 2 m) and transects (10 × 2 m). The number of transects placed on an island depended on island area, whereby larger islands received more transects. On islands smaller than the area of a single transect, we placed as many plots as possible.
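The area classes used to allocate transects can be written as a small lookup. The following is a minimal sketch only: the function name is hypothetical, and the handling of islands lying exactly on a class boundary (500, 750, 1000, 3000 or 5000 m²) is an assumption, since the text does not say to which class such islands were assigned. It also does not model the special case of islands too small to hold a full transect, where individual plots were placed instead.

```python
def transects_for_area(area_m2: float) -> int:
    """Return the number of transects for an island of the given area (m²).

    Assumption: an island exactly on a class boundary falls into the
    larger class; the source text does not specify this.
    """
    if area_m2 < 0:
        raise ValueError("island area must be non-negative")
    # (upper bound of the area class in m², number of transects)
    classes = [(500, 1), (750, 2), (1000, 3), (3000, 4), (5000, 5)]
    for upper_bound, n_transects in classes:
        if area_m2 < upper_bound:
            return n_transects
    return 6  # islands larger than the last class boundary


if __name__ == "__main__":
    for area in (3, 600, 900, 2500, 4000, 11806):
        print(area, transects_for_area(area))
```
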
We recorded all species with a diameter at breast height ≥2 cm rooted within the plots. This allowed us to assess species occurrences at different spatial scales. These scales were i) the plot scale (species sampled in each plot), ii) the transect scale (species sampled along each transect) and iii) the island scale (pooled species occurrences of all transects for each island) (see also Schrader et al. 2019a). For each individual, we recorded the diameter at breast height (cm; measured by convention at 1.3 m above ground) and the plant height (m). Based on these metrics, we calculated the tree basal area per ha (m² ha⁻¹) for each island.
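The pooling across the three scales and the basal-area calculation can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the record layout and function names are hypothetical, the stem records are invented, and the basal area is scaled by the sampled plot area, which the text does not state explicitly. The stem cross-section itself follows the standard formula π · (DBH / 2)².

```python
import math
from collections import defaultdict

# Invented stem records: (island, transect, plot, species, dbh_cm).
stems = [
    ("A", 1, 1, "sp1", 2.5),
    ("A", 1, 1, "sp2", 4.0),
    ("A", 1, 2, "sp1", 10.0),
    ("A", 2, 1, "sp3", 3.0),
]


def richness_by_scale(records):
    """Count species per plot, per transect and per island."""
    plot, transect, island = defaultdict(set), defaultdict(set), defaultdict(set)
    for isl, tr, pl, species, _dbh in records:
        plot[(isl, tr, pl)].add(species)   # i) plot scale
        transect[(isl, tr)].add(species)   # ii) transect scale
        island[isl].add(species)           # iii) island scale (all transects pooled)
    count = lambda groups: {key: len(spp) for key, spp in groups.items()}
    return count(plot), count(transect), count(island)


def basal_area_per_ha(dbh_cm_values, sampled_area_m2):
    """Basal area in m² per hectare.

    Assumption: the denominator is the area actually sampled (m²).
    """
    # Cross-section of one stem in m²: pi * (dbh in m / 2)^2 = pi * (dbh_cm / 200)^2
    total_m2 = sum(math.pi * (dbh / 200.0) ** 2 for dbh in dbh_cm_values)
    return total_m2 / sampled_area_m2 * 10_000.0


if __name__ == "__main__":
    per_plot, per_transect, per_island = richness_by_scale(stems)
    print(per_plot, per_transect, per_island)
    # Two hypothetical transects of 20 m² each give 40 m² of sampled area.
    print(basal_area_per_ha([s[4] for s in stems], sampled_area_m2=40.0))
```
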

Geographic coverage
Description: All islands were located in Gam Bay, a large bay of Gam Island, and are sheltered from the open ocean (Fig. 1a). The climate is tropical, mostly calm and lacking pronounced seasonality, with a mean annual temperature of 27.4 °C and annual precipitation of around 2768 mm (weather station Sorong/Jefman; www.worldclimate.com 2020). All islands are composed of coralline limestone, belong to the same limestone plateau and are likely of similar age. Differences in topographic heterogeneity and elevation across islands were small, with elevation ranging from c. 1 to 8 m a.s.l. Mineral soil was absent on all islands. Organic litter, mostly accumulating from dead plant material, was the only basis for soil development on the islands. Stages of decomposition depend on leaf litter depth, which was highly variable, ranging from a few cm to >1 m.

Taxonomic coverage
Description: We inventoried all woody plants with a diameter at breast height ≥2 cm (Fig. 2). This included 57 species from 26 families. The most common species were Rapanea rawacensis (Primulaceae) and Eugenia reinwardtiana (Myrtaceae), accounting for almost 50% of all records. Four species were only recorded once (Fig. 2). All recorded species were native, and alien species are not known to occur on the islands (Takeuchi 2003). The community data for all islands and species can be found in Suppl. material 3. Species occurrence data formatted following the Darwin Core standard are also available in Suppl. material 5 and in the Global Biodiversity Information Facility database (GBIF; http://ipt.pensoft.net/resource?r=plant-occurrences_raja-ampat_j-schrader_2020; DOI: https://doi.org/10.15468/zjq49b).

Traits coverage
We sampled data of ten plant functional traits that cover important dimensions of species life-history strategies (Reich 2014, Díaz et al. 2016, Westoby 1998, Wright et al. 2004): tree height, wood density, leaf area, leaf mass per area (LMA), chlorophyll content, leaf chemical contents (leaf nitrogen, carbon and phosphorus) and seed and fruit mass.
Wood density (g cm⁻³) describes the oven-dry mass of the main stem divided by its volume. Wood samples were dried for 48 h at 100 °C. Branches, bark and green parts were removed prior to measurements. We measured wood density of two mature individuals per species. Including more samples was impossible due to the rarity of many species (Fig. 2).
Figure 2. Relative abundance (proportion of individuals) of the 57 woody plant species recorded across all studied islands.
All leaf traits were measured on ten mature and sun-exposed leaves from several individuals when available. We measured leaf area (cm²) using the Android application Leaf-IT (Schrader et al. 2017), and leaf dry mass using a digital scale (± 0.001). We oven-dried leaves for 48 h at 80 °C. For leaf mass per area (LMA; g cm⁻²), we divided the dry mass of each leaf by its area.
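As the name of the trait implies, LMA is dry mass per unit leaf area. A minimal sketch of the calculation follows; the function names and leaf measurements are hypothetical, and averaging per-leaf LMA values to a species mean is an assumption, as the text does not describe how leaves were aggregated.

```python
def leaf_mass_per_area(dry_mass_g: float, area_cm2: float) -> float:
    """LMA in g cm^-2: oven-dry leaf mass divided by fresh leaf area."""
    if area_cm2 <= 0:
        raise ValueError("leaf area must be positive")
    return dry_mass_g / area_cm2


def mean_lma(leaves):
    """Mean LMA over (dry_mass_g, area_cm2) pairs.

    Assumption: the species value is the mean of per-leaf LMA values.
    """
    values = [leaf_mass_per_area(mass, area) for mass, area in leaves]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Invented measurements for two leaves: (dry mass in g, area in cm²).
    print(mean_lma([(0.5, 50.0), (0.3, 20.0)]))
```
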
For chlorophyll content, we used a chlorophyll meter (Konica Minolta, SPAD-502DI Plus). We provide the original SPAD units as well as chlorophyll concentrations (µg cm⁻²) converted from the SPAD measurements using the equation by Coste et al. (2010).
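The conversion from SPAD units can be sketched as below. The text does not print the equation, so the functional form and coefficients used here are an assumption: they are the values commonly quoted for the homographic model of Coste et al. (2010), Chl = 117.1 · SPAD / (148.84 − SPAD) in µg cm⁻², and should be checked against the original publication before use. The function name is hypothetical.

```python
def spad_to_chlorophyll(spad: float, a: float = 117.1, b: float = 148.84) -> float:
    """Convert a SPAD reading to chlorophyll content (µg cm^-2).

    Assumption: homographic model Chl = a * SPAD / (b - SPAD) with the
    coefficients commonly cited for Coste et al. (2010); verify them
    against the original paper.
    """
    if not 0 <= spad < b:
        raise ValueError("SPAD reading outside the valid range of the model")
    return a * spad / (b - spad)


if __name__ == "__main__":
    for reading in (20.0, 40.0, 60.0):
        print(reading, round(spad_to_chlorophyll(reading), 2))
```
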
Leaf chemical contents (nitrogen, carbon and phosphorus) were measured for the same leaves used for leaf area measurements, by grinding the oven-dried leaves. Leaf nitrogen and carbon concentrations (mg g⁻¹) were determined by automated dry combustion (Elementar, Vario EL Cube). Leaf phosphorus concentrations (mg g⁻¹) were measured using an inductively coupled plasma-atomic emission spectrometer (iCAP 6300 Duo VIEW ICP Spectrometer, Thermo Fisher Scientific GmbH, Germany).
We collected and measured the dry fruit and seed mass (g) of 44 and 38 species, respectively. We aimed for at least ten fruits per species, which was difficult for some species when fruiting was scarce (the number of fruits sampled per species ranged from 1 to 40; mean = 11.6). Fruits and seeds were oven-dried for 72 h at 80 °C. The fruits of most plants were eaten and dispersed by birds. A checklist of the birds occurring in the study region is provided by .
Description: This dataset describes the occurrence of all taxa that are identified at least to the level of genus (nine unidentified taxa are excluded here but can be found in Suppl. material 2) and can be used as occurrence records and as a taxonomic list for all studied islands. However, the occurrence records cannot be regarded as a comprehensive checklist for the flora of the islands. Data are formatted according to the Darwin Core standard (https://dwc.tdwg.org/terms). This dataset is available in the Global Biodiversity Information Facility, GBIF (Schrader 2020).

Usage rights
The dataset is also available in Suppl. material 5.

Column label Column description
id Unique ID for each occurrence record.
basisOfRecord The specific nature of the data record. All samples were obtained from living specimens.
occurrenceID Occurrence ID for GBIF: An identifier for the occurrence (as opposed to a particular digital record of the occurrence).
recordedBy Names of collectors.
eventDate Time frame of sampling.
islandGroup The name of the island group in which the location occurs.
country The name of the country in which the location occurs.
countryCode The standard code for the country in which the location occurs (here ISO 3166-1 alpha-2).
decimalLatitude The geographic latitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a location.
decimalLongitude The geographic longitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic centre of a location.
geodeticDatum The ellipsoid, geodetic datum, or spatial reference system (SRS) upon which the geographic coordinates given in decimalLatitude and decimalLongitude are based. Here: WGS84.
coordinateUncertaintyInMeters Indicator for the accuracy of the coordinate location, described as the radius of a circle around the stated point location in metres.
identificationQualifier "cf." to express doubt about the species identification.
scientificName The full scientific name of a taxon.
kingdom The full scientific name of the kingdom in which the taxon is classified.
family The full scientific name of the family in which the taxon is classified.
taxonRank The taxonomic rank of the most specific name in the scientificName.
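A short sketch of how records with these columns might be read and sanity-checked is given below. It is illustrative only: the sample record is invented (including the coordinates, collector name, occurrenceID and the basisOfRecord value), the checks are a hypothetical selection and not part of the Darwin Core standard or the published dataset, and the column list covers only the terms described above.

```python
import csv
import io

# Column labels as described in the table above.
COLUMNS = [
    "id", "basisOfRecord", "occurrenceID", "recordedBy", "eventDate",
    "islandGroup", "country", "countryCode", "decimalLatitude",
    "decimalLongitude", "geodeticDatum", "coordinateUncertaintyInMeters",
    "identificationQualifier", "scientificName", "kingdom", "family",
    "taxonRank",
]

# Invented example record; none of these values are taken from the dataset.
SAMPLE_VALUES = [
    "1", "LivingSpecimen", "example-occ-0001", "Example Collector",
    "2016-06/2018-02", "Raja Ampat", "Indonesia", "ID", "-0.5", "130.6",
    "WGS84", "50", "", "Eugenia reinwardtiana", "Plantae", "Myrtaceae",
    "species",
]
SAMPLE_CSV = ",".join(COLUMNS) + "\n" + ",".join(SAMPLE_VALUES) + "\n"


def validate(row):
    """Return a list of problems found in one occurrence record."""
    problems = [f"missing column: {c}" for c in COLUMNS if c not in row]
    try:
        lat = float(row.get("decimalLatitude") or "")
        lon = float(row.get("decimalLongitude") or "")
        if not -90.0 <= lat <= 90.0:
            problems.append("decimalLatitude out of range")
        if not -180.0 <= lon <= 180.0:
            problems.append("decimalLongitude out of range")
    except ValueError:
        problems.append("coordinates are not numeric")
    code = row.get("countryCode") or ""
    # ISO 3166-1 alpha-2 codes are two upper-case letters.
    if not (len(code) == 2 and code.isalpha() and code.isupper()):
        problems.append("countryCode is not a two-letter upper-case code")
    if (row.get("identificationQualifier") or "") not in ("", "cf."):
        problems.append("unexpected identificationQualifier")
    return problems


if __name__ == "__main__":
    for record in csv.DictReader(io.StringIO(SAMPLE_CSV)):
        print(record["scientificName"], validate(record))
```
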