Corresponding author: Ingolf Kühn (

Academic editor: Scott Chamberlain

Grid-based data sets, whether they are biodiversity data such as species distributions, species abundance or trait data or sets of ecosystem service values, sociological or economic indices, of concentrations of pollutants, soil or hydrological properties, are often used in spatial modelling and may exhibit some degree of spatial dependence. This means that sampling at nearby locations may lead to sample values that are more similar than those further apart (

Various spatial methods are available to account for spatial autocorrelation in multiple linear regressions (e.g.

Package

Next, the other main model that is introduced in this package is explained- the wavelet-revised model (

This means that wavelet analysis can be used for spatial filtering similar to the principal coordinates of neighbour matrices (PCNM) analysis (e.g.

The number and kind of coefficients in discrete wavelet transforms depend on the number and kind of wavelets used in the analysis. Briefly, there are two kinds of coefficients:

Different from other methods such as PCNM/MEM, the wavelet filters in thisapproach are applied to the response variable as well as all explanatory variables in a multiple regression. Moreover, they are applied within every step of GLM iteration before the regression coefficients are computed (

Spatial autocorrelation occurs when data sampled at adjacent locations exhibit more similar values than distant ones. This feature is detectable by the coefficients of smooth and relatively small wavelets. In WRMs, therefore, such kind of coefficients are always removed and thus smooth (i.e. low frequency) parts of data and, as a consequence, spatial autocorrelation. The tuning parameter

For illustration, WRM is presented using the same musdata data set as above.

All calculations can be performed using different types of wavelet families and different types of wavelet transforms, where

WRM has many of the same features as GEE. Setting

Having filtered all data sets in relation to their frequency characteristics and removed smooth parts with a WRM, one can additionally decompose the detail (i.e. high frequency) parts into scale-specific subcomponents. One is then able to develop a scale-specific regression technique, subsequently known as wavelet multiresolution regression (

It is important to mention that it is essential to avoid comparisons of significance tests across scales. This is due to the fact that the sample size in general changes when scale-specific subcomponents are eliminated. However, in order to provide a consistently good quality criterion, there is the possibility to calculate log likelihoods and AIC values and then estimate the relative importance of a variable using the approach of model selection based on multimodel inference (MMI)(see below).

The generalised estimating equations approach developed by

To illustrate the use of the package, a GEE model is presented using the generated musdata data set included in the package. The following code uses already available datasets and provides

One drawback to non-clustered methods for GEEs arises from the way that

The package includes other functions that may be useful in diagnosing scale-specific features.

For example, the user might want to plot the wavelet variance or covariance of each of the variables as a function of level. The

The user may also want to view the smooth (i.e. slowly varying) components of any spatial data set at different resolutions. For this, the upscale function is offered, which allows the user to visually examine his/her data. The data sampled on a grid of squared cells and mostly sampled on adjacent cells may have any external contour. A square is used that embeds or completely encapsulates the data. Therefore, it becomes visible as a surface in a square. This square consists of 2^{n }x 2^{n} grid cells allowing a dyadic up-scaling for a number of different levels of

Currently, the function only supports backwards model selection. In other words, one has to start with a full model (i.e. all of the variables in your model formula) and they are removed in a stepwise fashion. It is intended to add forward model selection methods shortly. Additionally, step.spind is written to always respect the hierarchy of variables in the model and the user cannot override this currently. For example,

An example of

Additionally, multimodel inference tools are offered for GEEs, WRMs and WMRRs which are loosely based on the

Finally, one further tool is offered for model selection specific to WMRRs.

Using an appropriate accuracy measure is essential for assessing prediction quality in modelling spatially explicit data. Goodness of fit measures such as Cohen's kappa coefficient, receiver operating characteristic (ROC), the area under the ROC curve (AUC) and maximum true skill statistic (TSS) are widely used to assess prediction errors in presence/absence models. There are some problems, especially related to prevalence, with more traditional measures such as AUC and hence TSS is recommended nowadays (e.g.

In

Finally, the authors would like to mention that, in many of these analyses, it is necessary to calculate spatial autocorrelation using Moran's I function (

Note that you can adjust the number of distance bins to examine in

First, while there are many packages and functions available accounting for spatial autocorrelation in linear modeling or generalised linear modeling like frameworks, this is the only one that offers the user-optimised regression methods: spatial GEE, spatial WRM and scale-specific WMRR in conjunction with extended methods, step-wise model selection and multi-model inference analysis. Advantages are its simplicity to use as well as it computationally efficiency.

Second, to the best of the authors' knowledge, no other package or commercially available software offers the possibility to assess the accuracy of model predictions taking spatial dependence into account and being comparable to classical measures of model accuracy.

It is therefore believed that

The package, together with documentation, is available on CRAN:

It is open-source software (published under the GPL public licence, ver. 3).

We thank Joaquín Hortal, Carsten Dormann and Scott Chamberlain for critical and constructive comments on an earlier version of the manuscript. We acknowledge support by the EU Collaborative project 'EU BON: Building the European Biodiversity Observation Network' (grant 308454) funded under the European Commission Framework Programme (FP) 7.

Autocorrelation of residuals from GEE and in comparison to GLM. GEE with correlation structure: fixed performed best for the musdata data set. Spatial autocorrelation is computed as Moran’s I using the

Autocorrelation of residuals from WRM in comparison to GLM. WRM with level 1 performed best for the musdata data set. Spatial autocorrelation is computed as Moran’s I using the

Smooth components of wavelet decompositions at different scale levels. The upscaling is performed by the upscale function for variable aridity belonging to ^{n}x2^{n }grid cells and is padded with predefined values, default is mean value, by the function provided. Thus the data recorded is available in a form that enables wavelet analyses.) Level=0 displays the raw, full-resolution predictor values, which are then "aggregated" by wavelets to ever coarser resolutions. Values increase from black to white. This function can be applied to any variable of interest, e.g. predictor, response or residuals.