# Develop best practice recommendations for combining seabird study data collected from different platforms: study

This study developed best practice guidance to combine seabird survey data collected from different platforms based on a literature review, expert knowledge and a bespoke model development including sensitivity analysis. This can be used in environmental assessments for planning and licensing.

## 5 Challenges and opportunities in multi-survey modelling for seabirds

Some of the challenges faced in seabird modelling are common to all SDMs. Errors in the observation of usage data (Section 5.1) and imprecise, or missing, environmental data (Section 5.2) can plague any analysis. Equally common to all analyses, although less frequently discussed, is the issue of model transferability (Section 5.5), the task of using insights gleaned from one place at one time, in other spatiotemporal frames. Beyond these common problems, the natural history of seabirds poses unique challenges to distribution modelling that are accentuated when trying to analyse multi-survey data. The high mobility of individuals, the stochastic nature of their local spatial distribution with ephemeral aggregations of large numbers at temporally transient foraging opportunities, combined with their association with breeding colonies for parts of the year gives rise to complex processes of density dependence between individuals, colonies and species (Section 5.3). These processes may manifest spatially, and the detail used to describe and estimate them from spatial data generates much of the complexity (non-linearity) in seabird SDMs. Model complexity is compounded by the need to deal with (and take advantage of) spatial and temporal autocorrelation in multi-survey data (Section 5.4). Therefore, the key issue in deriving appropriate frameworks for seabird multi-survey data is navigating the sources of computational complexity to achieve a workflow that makes the most of the data but reaches results in plausible times (Section 5.6).

### 5.1 Imperfect observation and multi-survey modelling

Different survey methodologies (boat-based, visual aerial and digital aerial) are affected by different types of biases and imprecisions. The behaviour of seabirds can amplify the differences between survey methods. For example, the strong attraction of scavenging seabirds to boats at some times of year but not others, and in some regions but possibly not in others can caused strong fluctuations to detectability (potentially even boosting it above 100%). These effects need to be explicitly accounted for during analysis. In general, observation error types found in survey data comprise false negatives, false positives, effort imbalances and location error (Miller et al. 2011, 2019, Hefley and Hooten 2016).

#### False negatives and false positives

False negatives involve errors of omission and misidentification, whereas false positives are predominantly due to double-counting or misidentifications of one species as another. There is some debate about the relative importance of these two types of error. For example, (Tyre et al. 2003) argue that false negatives will be less frequent for two reasons: First, misidentifications are conditional on a detection happening, and their frequency can therefore be reduced by observer training and improvement in survey protocols. Second, it is common practice in several designs to not record a detection (or to exclude it from the analysis) if there is any doubt about its identity – hence converting a potential false positive into a potential false negative. However, it could be argued that observer training could be as beneficial for detection as it is for identification. Equally, detections can be recorded and analysed in conjunction with a degree of certainty for their correct identification (Miller et al. 2011). Therefore, it is perhaps ideal to proceed with the assumption that analysis frameworks will need to deal with both false positives and negatives (Miller et al. 2011).

In the setting of transect surveys, both false negatives and false positives relate clearly to the baseline detection probability (the intercept of the model). Most obviously, false negatives give the impression that an organism is less prevalent than it actually is. But even false positives can create consistent biases. If two species are easily mistaken for each other, then the bias will be positive or negative depending on whether the true prevalence of the focal species is respectively smaller or larger than the prevalence of the non-focal species. In logistic habitat models, focusing purely on occupancy, they have an even greater effect on the slope of the relationship with covariates (Tyre et al. 2003). In such studies, it is suggested that at least three repeat visits are required to correctly estimate the probability of omission. Furthermore, a statistical trade-off between survey extent and survey overlap is expressed. Simulated results indicate that when the rate of false negatives is low (e.g. <50%), it may be better to increase the extent of the surveys rather than the number of visits. As false-negative rates increase, the variance of parameter estimates is reduced more by increasing the number of visits, especially when the overall extent is low. There are several reasons for being careful about taking these quantitative recommendations immediately on-board in developing best practice for seabird surveys. First, the probability of detection will generally not be the same in different seabird surveys. Second, these recommendations were based on the assumption of near homogeneity in distribution. Third, occupancy data analysed by logit or probit models are considerably more vulnerable to observation errors and therefore more limited in their inferential abilities than models of abundance. Nevertheless, similar effects have been recovered in broader classes of models, including the Poisson point process (Lahoz-Monfort et al. 2014).

Discussion of errors with higher relevance to seabird surveys can be found in commissioned reports. For example (Camphuysen et al. 1995), carried out an inter-calibration exercise for 20 observers on ten ships for ship-based ESAS surveys in North Sea. The authors reported major differences in detections from different observer teams. For example, team A reported about 6-10 times more kittiwakes than teams B, C and F in the same region and time period. Anecdotally of relevance is the misidentification of guillemot and razorbill with high variance in the ratio reported by different observers. Equally important is the practice of assigning detections to group rather than identifying them to species. For example, divers are often recorded as 'diver species'. These can be apportioned to species later, according to the ratio of red-throated, black-throated and great northern in the sample that were identified to species. However, great northern is easy to ID to species whereas many black-throated divers are difficult to separate from red-throated divers. So, apportioning 'divers' to species based on proportions ID to species tends to result in overestimating numbers of great northerns.

These are real problems where understanding the natural history might help resolve some of the biases, whereas others (especially observer effects) can't easily be taken into account (ESAS has too many individual observers for example so quantifying observer effects would not be statistically feasible).

#### Location errors

Errors in the detection, identification and location of individuals may vary temporally (see example of diurnal and seasonal patterns along distance sampling transects in (Furnas et al. 2019)), according to environmental conditions (such as weather and ambient light) or according to geomorphology (Frair et al. 2010). Note that, contrary to some practices (e.g. Oppel et al. 2012) when the detection probability varies with environmental conditions, the counts cannot be used as a relative index of abundance without a correction to the effective detection distance and the baseline probability of detection, or much better, incorporation of these influences as of detection in the distance sampling analysis (Marques and Buckland 2003, Buckland et al. 2008). In general, both the probability of detection and the abundance of a species will depend on environmental covariates. For example geomorphology (e.g. proximity to land) will affect both, bathymetry will affect only abundance and ambient light conditions may affect only detection. This gives rise to potential problems of identifiability (our ability to pinpoint estimates for coefficients pertaining to both detection and species abundance). The statistical requirements for achieving identifiability have been examined by (Lele et al. 2012). For occupancy models, these authors concluded that if the intersection of the two sets of covariates contained a non-categorical variable, and if there was at least one variable that belonged to one set but not the other, estimation was identifiable.

This debate about detection errors is particularly pertinent to the analysis of multi-survey data. For many situations, it is impossible or logistically unlikely that more than one visit to any location will take place, and frameworks must be developed to ensure that detection errors can be ameliorated by use of the habitat similarity in non-overlapping visits (Lele et al. 2012) . However, in the presence of overlapping surveys, progress can be made more directly to account for detection errors as part of inference (Bolker 2008). In repeated visits, the target species is recorded as being detected or not detected at each visit. At locations where the species is present, detection error will occasionally result in species not being detected even though it is present at the site. Assuming the true abundance does not change over the duration of repeated visits fluctuations in observed abundance at a particular site can be attributed to detection error and thus facilitate the estimation of the baseline detection probability in a survey. Although not widely available for abundance data, analytical expressions have emerged that describe how repeat visits increase the statistical power in the estimation of detection probabilities (Guillera-Arroita and Lahoz-Monfort 2012).

#### Variable observation effort

Heterogeneity in observation effort is a crucial driver of the point patterns presented by the raw data (Manly 2003, Miller et al. 2019). In particular, (Chakraborty et al. 2011) propose a distinction between the *potential* intensity surface (representing the biological process generating occurrences of a species in space) and the *realised* intensity surface (which is curtailed by the distribution of observation effort). This is more widely known as a thinned Poisson point process because it incorporates an observation model into the intensity function (Hefley and Hooten 2016), so that the point patterns recorded are sparser than what would be seen if observation effort saturated the whole of space. Such combinations of the observation model with the underlying biological model offer the potential of fully integrated inference, the idea of conducting the estimation of detection functions simultaneously with the habitat analysis. This would allow the forward and back propagation of errors as discussed in the previous sections, so that the estimation of detection distances would benefit from the information on habitat. A key assumption in distance sampling is that the distribution of the species in the detectible vicinity of the transect is uniform. Violations of this assumption could be prevented by modelling the distribution of the species (as driven by covariates) jointly with the detection function associated with the transect (along with its own covariates). This approach was taken by (Nelli et al. 2019) in their study of malaria incidence patterns. In that paper, a modification of point transect distance sampling (appropriately tailored to epidemiological data) was embedded in a model for the covariates of incidence, hence improving inference for both the observation and biological model.

### 5.2 Imperfect habitat data and multi-survey modelling

Imperfections in the covariate data can arise in different forms. The resolution of different variables or different regions of the same variable may be mismatched. Covariate data may be partly, or wholly missing. Alternatively, the values of known covariates may be measured with uncertainty, or be temporally fluctuating. The idea of data integration from multiple surveys and data types has a crucial role to play in using such imperfect data.

#### Mismatched scales

Mismatches between explanatory variables are informally addressed by some process of alignment, whereby a common reference grid is applied to all the covariates, usually involving a process of linear interpolation to shift the centre-points of existing cells to the new grid nodes, and also a process of coarsening of the resolution of maps to the lowest resolution available in the data (Kent et al. 2006). We will call this the *lowest common denominator *approach to mismatched scales. None of these steps are necessary if the data generating mechanism is modelled as a point process. In this case, covariate values are extracted from the available maps, in their native resolution, at the location of each point in the response data. Without a doubt, different scales will have characteristic impacts on the results of the SDM (Levin 1992, Paton and Matthiopoulos 2018, Pacifici et al. 2019), but at least in this way, explanatory data are used at the finest resolution available and in a minimally processed form.

Mismatches in different geographic regions for the same explanatory variable are a problem that requires treatment because it affects the consistency with which a variable is allowed to affect the response in the model. The key decision is between downgrading the high-quality areas (an easy but somewhat destructive solution), or upgrading the low-quality areas. Upgrading could be approached by geo-statistical interpolation, as a pre-modelling stage of the analysis. Specifically, a method like kriging could be used for the coarse resolution regions (e.g. (Monestiez et al. 2006)). Kriging relies on an estimated object (the semi-variogram) which captures the characteristic spatial autocorrelation in a variable. By estimating the semi-variogram in the fine resolution regions and applying it across space, the coarse scale region could be resolved. This will crucially depend on whether the coarse scale readings are localised (albeit sparse) measurements, or averages over coarse scale cells. The latter scenario would preclude the geo-statistical interpolation approach.

A much more general and flexible approach to scale mismatches involves the use of hierarchical Bayesian approaches to resolve coarser regions into finer scales (Keil et al. 2013). Such approaches could also be used to increase precision when multiple layers of information exist at different scales for the same environmental variable.

#### Partially missing covariates

There will be situations where the desired spatial extent of the region to be used for modelling is not spanned by available covariate layers. Similarly, cloud cover or other obscuring influences will result in gaps in remotely sensed layers. Once again, the most direct approach (the lowest common denominator approach for partially missing covariates) is to retain a minimal data set that comprises only cells with complete covariate entries. This invariably leads to heavy censoring of the data either via cells being dropped, or via covariates being judiciously removed to try and retain more cells. It is possible to formalise this process of censoring by starting with a reduced set of cells and a full set of covariates, begin model fitting and drop some variables through formal model selection. As the set of covariates in the model is reduced, the hope is that more cells will be able to be once again included in the analysis. This pragmatic approach is not ideal because it involves information loss through censoring, the degree of which is not entirely in the analyst's control. Instead, interpolation methods (i.e. density estimation methods, see Section 3.1 or geo-statistical methods, see previous subsection) can be used to reconstruct the expanses of missing data.

A particularly relevant problem in this category arises from Camphuysen et al. (2004) recommendation that to enhance the cost-effectiveness of ship-based surveys, vessels should be equipped with an Aquaflow (logging surface water characteristics including temperature, fluorescence (chlorophyll), and salinity information simultaneously with species abundance). The idea of contemporaneous recording of environmental variables at high resolution is certainly appealing and can lead to models of high explanatory power. However, this approach is limited when spatial predictions are required for areas not visited by the boats, because high-resolution environmental data are unlikely to be available for the whole of space. Interpolation of the environmental variables between transect lines can solve this problem, assuming that the spatial autocorrelation in the relevant environmental data does not decay rapidly compared to the separating distance between the transects. If that is not the case, then it may be necessary to combine the measurements of variables along the transects with synoptic raster data at coarser resolutions. That brings the problem back to the class of scale mismatches (see previous subsection), whereby integration of environmental data that differ in resolution and extent is used to reconstruct uninterrupted layers of explanatory variables. A key consideration in this is the ability to predict from such models. If the Aquaflow variables are dynamic, then these data will not be available outside the temporal window of measurement. In such cases, forecasts will be impossible (see subsection on Variability and Errors in Measurements, below).

Once again, all of these processing steps can be incorporated into the main inferential framework via hierarchical Bayesian approaches to resolve scale mismatches and data absence (Keil et al. 2013, Nelli et al. 2019).

#### Unknown covariates

Missing covariates is a widely recognised and difficult to diagnose source of estimation bias (Barry and Elith 2006, Fieberg et al. 2018). Although it may not be possible to reconstruct multiple missing covariates (such a data vacuum is beyond the reach of even the best statistical model), it is possible to capture the collective effect of missing covariates in the residual autocorrelation of the fitted model as in (Beale et al. 2014, Nelli et al. 2019), and also see discussion in Section 5.4.

#### Variability and errors in measurements

Dynamic explanatory variables may follow seasonal (e.g. monthly average temperature), diurnal (e.g. tide) or less predictable (e.g. weather) fluctuations. Although there is certainly a biological appetite to include such variables to explain the distribution of a species, data availability can be a constraining factor in this. At the model fitting stage, inclusion of dynamic variables means that each observation of species abundance must be synchronised with contemporaneous environmental data. This can be difficult because data absence becomes more pronounced as data are subdivided into smaller time frames. However, problems are more pronounced at the prediction stage. Forecasting species distribution into the future requires us to have complete layers of explanatory maps, which must also be forecasted, if they refer to dynamic variables. This calls for some consideration on which variables to include in a dynamic format. If the spatio-temporal availability of data for an explanatory variable (either from historical data or from available forecasts) is low, then even if there is biological confidence about their importance, it may still be necessary to exclude them from the analysis.

Explanatory variable data may contain measurement biases and imprecisions which may also be spatially correlated (Barry and Elith 2006). Known, consistent biases are often corrected at the stage of pre-processing of covariates. Imprecisions, especially if these vary across space, should be propagated through the analysis so that they are reflected in the final confidence intervals of the SDM parameters and spatial predictions on the distribution of the species (Barry and Elith 2006). This idea has its roots in type II linear regression (Sokal and Rohlf 1995), but can be considerably updated by using the flexible modelling structures of hierarchical Bayesian models. Spatially correlated errors can arise from reconstruction methods (see interpolation methods mentioned earlier) that are often used to generate complete layers of covariate information. Theoretically, the approach for propagating such errors to the final results is similar to the approach of propagating un-correlated uncertainty, with covariance structure for neighbouring locations. However, this is, as yet rarely done for computational reasons.

All these suggestions about uncertainty quantification eventually need to be visualised in conjunction with median predictions. Informatively mapping uncertainty is not straightforward however and communicating uncertainty to policy makers can cause confusion. Interesting approaches to using uncertainty in SDMs for informing policy decisions are actively developed in the field of ecological reserve design (Tulloch et al. 2013).

### 5.3 Accessibility and density dependence in seabird distribution modelling

For at least some parts of the year, species of seabirds are central place foragers. Their use of different marine locations is therefore likely to be affected by how accessible these locations are from the breeding colony. Accessibility might vary by species and season. The effects of accessibility on distribution have been anticipated theoretically (Matthiopoulos 2003a, 2003b) and found empirically in marine central place foragers such as seabirds (Lewis et al. 2001, Wakefield et al. 2011, Grecian et al. 2012, Thaxter et al. 2012, Waggitt et al. 2019) and pinnipeds (Matthiopoulos et al. 2004, Aarts et al. 2008, Jones et al. 2015). In addition, the coloniality of seabirds leads to foraging aggregations and potential resource depletion in the regions surrounding the colonies (Lewis et al. 2001). Ultimately, the use of particular locations at sea will be determined by the trade-off between commuting costs (as shaped by accessibility) and foraging benefits (as shaped by environmental resources and depletion).

Both accessibility and depletion/interference may be thought of as functions of distance from the colony, but they are complex, highly non-linear processes for distinct reasons. Accessibility is mainly complicated by the fact that different colonies will be placed in locations that are variably affected by the coastline. Hence, colonies on a small island are likely to be unconstrained in every direction, colonies on a relatively straight coastline will only have a semicircle of marine directions available for departure, while colonies in an inlet may be limited to a single water body route into the sea. To capture declines in accessibility with distance, it is possible, as a first approximation, to introduce a distance-decay function, parameterised identically for different colonies (Matthiopoulos et al. 2004, Grecian et al. 2012). However, the fact that the available area of water around each colony will depend on coastal morphology, means that the resulting marine distribution from such a function would not allocate equal numbers of birds at units of area that are the same distance from different colonies.

These behaviours will not be independent of age structure. Seabird populations include a high proportion of immatures that are less competitive than adults so may tend to distribute at sea in areas away from colonies (at least until they start to seek to recruit into a colony themselves). So, we can expect some 'infilling' of marine areas away from large colonies by immatures, especially younger age classes of immatures.

Density dependence is complicated initially by resource competition between colony members (Lewis et al. 2001). As the size of the colony grows, individuals need to travel further to escape the density dependent effects of depletion or interference. At the same time, there is evidence that usage from neighbouring colonies can saturate space leading to the appearance of home ranging behaviour at the colony level (Wakefield et al. 2013). This implies that even without the constraints of commuting costs, a colony might not extend its foraging range (and, consequently, its population size) indefinitely. In addition to inter-colony competition with conspecifics, it is possible that individuals from a colony are experiencing competition from neighbouring colonies of other species. In this case, the resulting asymmetries in range will not only be due to relative colony sizes, but also due to trophic niche overlap and competitive dominance between species. All of the above aspects of biology will interact with each other and with environmental productivity. For example, for a given colony size, colonies that are obscured by coastline formations may tend to have greater ranges than island colonies because density dependence is acting over smaller areas close to the obscured colonies.

Non-linearity in the relationship between abundance and its covariates is widely recognised in the statistical literature as well as in the seabird-related literature (e.g. Section 1.1.2 in Oedekoven et al. 2012b) and has prompted the development of semi-parametric, non-linear models based on splines or additive basis functions. These models, broadly known as Generalised Additive Models (Wood 2006) allow the data to "speak for themselves", i.e. to inform the shape of the relationship with explanatory variables. The greatest advantage of GAMs and their extensions (such as mixed-effects GAMS, or GAMMs), is that their estimation remains embedded in the frameworks of the linear model. Their greatest disadvantage is that they can result in spurious relationships via overfitting. Instead, in some cases we can argue for the derivation of a functional form from biological first principles (see discussion on mechanistic models in Section 3.1). The sophistication that is used for modelling accessibility and density dependence will determine the computational feasibility of the approach (see Section 5.6 below), but we can outline here an idealised approach that would correctly account for both effects. This hypothetical approach would have the following features:

**1) ****Accessibility** is treated as a distance-based function only. Simpler approaches would use Euclidean distance, but improved approaches would consider distance measures based on at-sea travel (see biological distance in (Matthiopoulos 2003a)). These distances would need to be calculated as spatial layers, specific to each colony. More elaborate measures of accessibility, incorporating landscape resistance (e.g. due to prevailing wind fields) could be calculated (Zeller et al. 2012, 2017). For any given marine location, these metrics could result in asymmetric outward and homeward distances.

**2) ****Intra-colony competition** may be treated by including seabird density as an auto-covariate (Augustin et al. 1996) in the model, effectively using the response variable *in the neighbourhood* of a point as an explanatory variable for the response variable *at the location* of the point. The apparent circularity of this step requires careful consideration of the model fitting stage, but it is mechanistically equivalent to density dependence (i.e. the feedback effect of a population onto its own distribution). An alternative approach is to introduce an explicit autocorrelation term in the response, but this would need to be made colony-specific (and ideally, colony-size specific).

**3) ****Inter-colony competition** may be accounted for via simultaneously using the expected usage of one colony as an explanatory variable (possibly with a negative coefficient) for the expected usage of another colony, and vice-versa. The need to couple the usage of different colonies, gives rise to another point of apparent circularity, which can be addressed via simultaneous regression techniques (i.e. modelling multiple response variables at the same time, to allow the use of each response variable as an explanatory variable for the others). The coefficients of these simultaneous regressions would need to be asymmetric, based on the size of each colony (so that large colonies have stronger negative effects on the marine use of smaller colonies).

**4) ****Inter-colony, interspecific competition** could be dealt with in exactly the same way as intraspecific competition between colonies (see 3, above), allowing for asymmetric effects due to differences in species, as well as differences in colony size.

As we will also discuss in Section 5.6, a model with the above specifications is just about possible to write and simulate from, but not computationally feasible for fitting to data. Instead, a more pragmatic approach would involve the following simplifications:

**1) ****Accessibility:** Write accessibility as a distance-decay function. Any measure of distance can be used (e.g. Euclidean, travel-time, landscape resistance), because these calculations are not part of model fitting.

**2) ****Intra-colony competition**: Expand the decay parameter of the previous step, so that it is larger for smaller colonies and vice-versa. This means that as colony size increases, the decay of expected usage with distance from the colony slows down, hence extending the range of the colony to take account of intra-colony density dependence. The function may be allowed to be non-monotonic, so that total usage initially increases with distance, and then eventually starts to decay.

**3) ****Inter-colony competition:** The formulation from the previous step, can be extended to account for the effects of other colonies but in this case, the effect on a the usage of a given marine point by a focal colony will depend on the distance of that point from the competing colony (as well as the competing colony's size).

**4) ****Inter-colony, interspecific competition: **Although biologically, the differences between species are important, from a mathematical point of view, all that is required to capture these effects is a re-parameterization of the model from Step 3, above.

In Section 2.2 of the Vignette that accompanies this report, we develop, graphically explore and spatially simulate from such a model. The resulting model is non-linear but it has modest parameter requirements. For example, for a four-species system, where each species is represented by several colonies, a total of 40 parameters would need to be estimated for all species. If the focus was on a single species with three competing species, then ten parameters would be required. If no inter-specific competition was considered, only four parameters are needed to capture effects of accessibility and density dependence (i.e. Steps 1-3, above). The important benefit from such an approach is that we can learn from the data about the strength of accessibility and density dependence for a particular species. Fitting this model simultaneously with environmental covariates allows us to account for the confounding between intrinsic and environmental regulation of spatial usage.

### 5.4 Autocorrelation and multi-survey modelling

Informally, residual autocorrelation (RAC) is the spatial or temporal similarity in the direction and magnitude of discrepancies between model and data. The assumption of all statistical regression models (including SDMs) is that all sources of autocorrelation (AC) in the data are the result of autocorrelation in the available explanatory variables^{[1]}, and, therefore, that *conditional* on the covariates included in the model, the residuals of the model are independent. This assumption, however, is often violated and the existence of spatial and temporal AC in the residuals casts doubts over the results of inference (Segurado et al. 2006, Dormann et al. 2007). Several methods exist for dealing with the problem (Dormann et al. 2007), but they differ in one crucial aspect: whether they correct the consequences of AC on model dispersion and parameter standard errors, or whether they model AC explicitly. In this section, we will argue that modelling AC explicitly can lead to capital information gains in the analysis of data from multiple surveys.

#### Spatial autocorrelation

RAC can result from three routes. First, if an influential covariate is missing from the analysis, the fitted values will under-/over-estimate true usage in a spatially aggregated way. Second, even if all relevant covariates are included, they may be misrepresented in the analysis. For example, covariates that have been created by interpolation may inherit over- or under- smoothing to the model's estimates. Over-smoothing is equivalent to using coarse level environmental data to explain responses that are happening at finer spatial scales (de Knegt et al. 2010). Alternatively, it may be that animals are responding to the broader context rather than the finely resolved environmental data offered as explanatory variables (Paton and Matthiopoulos 2018). Third, even if all relevant covariates have been included at the most appropriate scale, there is still the possibility that clustering in usage is caused by intrinsic processes, such as individual memory (Van Moorter et al. 2009), limitations in movement speed (Aarts et al. 2008), or social cues (Riotte-Lambert and Matthiopoulos 2019). In the broadest sense (Beale et al. 2010), spatial AC can be captured either as a covariate , or by introducing a spatially auto-correlated structure in the model's error. Including AC as a covariate could be done by introducing flexible functions of latitude and longitude (Mendel et al. 2019), or by using density as a local auto-covariate (Augustin et al. 1996). Including AC as an auto-correlated error can be one of several components of the error structure of a hierarchical model.

Often, one of the symptoms of un-modelled heterogeneity generated by RAC is over-dispersion in the residuals. For example, omission of influential covariates is a well-known source of over-dispersion. In seabird analyses, this is frequently addressed by use of over-dispersed distributions (e.g. Lieske et al. 2014) or zero-inflated mixture models (e.g. Oppel et al. 2012, Waggitt et al. 2019). Such approaches, however, assume that the residuals in nearby locations are independent, an assumption that is unlikely to hold. There are therefore arguments for the use of flexible spatial functions in capturing residual spatial complexity (e.g. Sections 1.1.3-4 in Oedekoven et al. 2012b) and, given that formal tests for autocorrelation in residuals can be weak, there is a view that modelling autocorrelation by one of the methods outlined in (Dormann et al. 2007, Beale et al. 2010) should be carried out pre-emptively to avoid biasing the estimated coefficients of available covariates (Fletcher et al. 2016).

However, although in most cases we can be certain that one or more sources of spatial autocorrelation are operating, it is not clear how to interpret the results of explicitly spatial models, and indeed it is not certain that the inclusion of such terms exactly counteracts the causes of the problem. Interpretation of autocorrelation terms is challenging because the processes generating autocorrelation operate at different characteristic scales (Levin 1992) and the observed residual autocorrelation in a species distribution model is the result of all of these influences acting together. It is, therefore, not always easy to interpret autocorrelation patterns biologically or to attribute them to estimation artefacts. The bias-correcting effect of AC terms is also in doubt, because they can be shown to be confounded (and hence, potentially in competition) with fixed-effects terms (Hodges and Reich 2010). In some cases, incorrect specification of spatial errors can lead to severe biases in parameter estimates for fixed effects, even if all the relevant covariates have been included in the model (Beale et al. 2010, Sinclair et al. 2010).The outcome of this competition between spatially structured random effects and covariates, depends on the penalty imposed on the flexibility of random effects. Low penalties encourage over-parameterised spatial terms that diminish (i.e. negatively bias) the coefficient estimates for covariates. This underlines the need for out-of-data validation of fitted models.

#### Temporal autocorrelation

Temporal autocorrelation causes residual similarity due to temporal proximity between observations, essentially two successive snapshots of a system are not very different from each other because the system has not had enough time to diverge from its original state. This is considered an important violation of the independence assumption, particularly for telemetry studies (Fieberg et al. 2010), but it is less of a concern for surveys, assuming that design recommendations are adhered to, to avoid double counting (see Section 3.2). However, temporal dependence offers us valuable opportunities for exploiting multi-survey data that have been collected at different times. Currently, investigation of multiannual trends and relative changes in usage is usually considered to lie beyond the scope of seabird distribution analyses (Perrow et al. 2015), but such features would be essential if multiannual survey data were to be correctly integrated. Assuming stationary values for the covariates, the counts from two surveys conducted over the same region at different points of time should be expected to be more similar, the closer the surveys were in time. We can add temporal autocorrelation via the fixed effects of a model (as flexible functions of time), or we can assume an autoregressive model (e.g. a random walk in the measurements of residuals). More complex error structures spanning several time lags or continuous time are also possible (Wood 2006).

#### Spatiotemporal autocorrelation for multi-survey data

In the setting of multiple surveys spatial and temporal autocorrelation can be a real asset. Although habitat and species distribution models are predominantly visualised as maps, most of these models are fitted in environmental space, and are, therefore, neither explicitly spatial nor temporal (Chakraborty et al. 2011). Hence, a classic SDM is unable to recognise the fact that two contemporary surveys that overlapped spatially share more than just the same values for the covariates. Furthermore, within the range of autocorrelation, a model should be able to acquire additional support from the fact that even when two surveys do not exactly coincide in time and space, they can share similar information depending on their spatiotemporal proximity. Therefore, particularly for multi-survey studies it is important to allow environmental, spatial, temporal and spatiotemporal dependencies (Hothorn et al. 2011). However, it should also be recognised that these will not be able to bridge over large distances and intervals. For this, we require transferrable models that can address the non-stationarity of animal responses to different environmental compositions.

### 5.5 Model transferability

SDMs based on survey models could be asked to perform one of three predictive tasks, presented below in increasing order of difficulty.

#### Spatial and temporal Interpolation

The most reliable type of SDM prediction deals with the spatial and temporal frames within which the survey data were collected (a task we will call spatio-temporal interpolation). These applications make two, relatively easily satisfied assumptions. First, that the resolution of the surveys is finer than the scales of all autocorrelations used to describe spatial and temporal dependencies in the model. Second, that the coefficients estimated by the models are stationary in space and time. When stationarity is in the temporal dimension, this later assumption is often called the pseudo-equilibrium assumption (Guisan and Zimmermann 2000). If the windows of observation are small and sufficiently densely surveyed, these assumptions are satisfied and the SDM can provide an accurate representation of density at any point in space and any instant in time within the observation ranges (Isojunno et al. 2012).

#### Environmental interpolation

The SDM models recommended in this report are empirical in nature (albeit with elements of mechanistic modelling – see Section 5.3). Robustness (i.e. precision and accuracy) of such models under prediction is subject to the dangers of extrapolation, particularly when regression is equipped with high functional flexibility (Bell and Schlaepfer 2016). Non-stationarity in environmental responses is a recognised source of variability (Hothorn et al. 2011) and a safe route to unsuccessful model predictions. The difference between explanatory and predictive models is a key area of interest in an discipline such as ecology, which focuses on the consequences of spatial non-stationarity and change over time (Yates et al. 2018). However, for habitat models, spatio-temporal extrapolation need not necessarily be equivalent to environmental extrapolation. A high priority for empirical modelling in general is that sampling effort covers as wide a range of covariate values, and combinations thereof (Oedekoven et al. 2012b). Conceivably, this can be achieved in a single survey, but may be difficult given the logistical constraints on spatial extent (i.e. it may be difficult to survey highly contrasting environments without covering prohibitively large distances). Nevertheless, it may be possible to achieve by pooling multiple surveys together. Matthiopoulos et al. (2011) discuss how a survey can be considered as a single *sampling instance* in a pooled data set. That paper first formalised a generalised model for functional responses in habitat selection (known as a Generalised Functional Response - GFR). Functional responses broadly describe changes in use of an environmental variable as a function of the value of that and other environmental variables (Arthur et al. 1996, Mysterud and Ims 1998, Boyce et al. 1999, Holbrook et al. 2019). For example, in trophic ecology, a functional response describes the consumption of prey by a single predator, as a function of the prey's abundance (Holling 1959). Generalisations of the Holling concept of functional responses to multiple prey give rise to multispecies functional responses (Asseburg et al. 2009, Smout et al. 2010). In a similar way, a GFR describes how preference for a particular habitat changes in response to the availability of *all* habitats in the environment of an individual, or a subpopulation (e.g. a seabird colony). Such models have been shown to bring considerable gains in predictive power for environmental scenarios that are within the range of environmental values recorded in the pooled data (Matthiopoulos et al. 2011, Paton and Matthiopoulos 2018, Holbrook et al. 2019).

#### Environmental extrapolation

As soon as we require predictions for scenarios found outside the previously observed spatiotemporal and environmental frames of reference (i.e. the environmental profiles used for model-training), several processes begin to interfere with our predictive ability. Sinclair et al. (2010) identify no fewer than ten major biological and methodological obstacles to the success of such predictions. Examples include the appearance of unprecedented environmental domains and concurrent alterations in the community context in which the focal species is embedded.

Although functional response frameworks have been found to increase the robustness of predictions outside the extremes of observed scenarios (Matthiopoulos et al. 2011), it is nevertheless not clear how far they can be pushed. Indeed, there is currently no formal method for measuring the degree to which a particular prediction scenario is one of environmental interpolation or extrapolation.

The challenge of environmental extrapolation is particularly serious for forecasting (Sinclair et al. 2010, Tuanmu et al. 2011), which happens to be the main objective of anticipatory modelling in modern ecology. Considering even more challenging problems such as evolution and adaptation makes it clear just how under-equipped statistical SDMs are in dealing with extrapolation. Arguably, increasing the mechanistic content of SDMs (see Section 3.1) increases predictive ability (at the risk of model misspecification). Hence, there is now a clear tendency in the literature to consider species distributions in the backdrop of the population dynamics of a species (McLoughlin et al. 2010, Ehrlén and Morris 2015a, Matthiopoulos et al. 2015, Turlure et al. 2019), and also in the context of whole ecological communities (Ovaskainen et al. 2010, Calabrese et al. 2014, Distler et al. 2015).

### 5.6 Computational efficiency

#### Non-linear and spatial effects

The use of spatially autoregressive models, particularly in combination with non-linear predictors in an MCMC context, is computationally very expensive and shortcuts are necessary (e.g. the "covariate" model in Pacifici et al. (2017), or the list of four methods cited in Section 5.1 of Chakraborty et al. (2011)). For the particular approaches discussed in this report the key challenge lies in estimating biologically interesting and important parameters (pertaining to density dependence, inter-colony effects and inter-specific interactions in distribution – see Section 5.3) at the same time as dealing with spatially and temporally structured residuals. The methodological literature on models with spatiotemporal structure has been revolutionised by approximate Bayesian methods that either deal with fully non-linear models (as is the case with Approximate Bayesian Computation - ABC - Beaumont 2010, Csilléry et al. 2010) or deal with linearised versions of these models (as is the case with Integrated Nested Laplace Approximation – INLA - Martino and Chopin 2007, Rue et al. 2009, Lindgren 2015, Bachl et al. 2019). ABC methods have yet to meet with broad application in SDMs, so there is limited understanding of how to implement them in that setting. On the other hand, INLA methodology was developed initially for spatial point process modelling, so it is ideal for the purposes of SDMs. The question therefore is how best to linearise the models for combined seabird surveys.

The models implemented in the joint survey R library accompanying this report illustrate the MCMC implementation of both of these features but are very difficult to use for extensive spatio-temporal predictions. Nevertheless, the ability to fit these models implies that we can get good posterior estimates for the parameters participating in the non-linear parts of the models, while correctly accounting for spatiotemporal autocorrelation. This suggests three steps forward, in terms of computation: First, a pilot model-fitting exercise can be used within a limited spatial range to obtain posteriors for parameters in the non-linear components of the model (Section 5.3). This would follow the protocol of model SPATIAL in the attached JointSurvey R package and described in the accompanying vignette. Second, having obtained the posteriors for their parameters, these non-linear functions can then be considered as constructed covariates in a linear model (see further details in package vignette). Third, this linearised model can be fitted in INLA to enable wider inference for non-stationary processes (e.g. under a GFR framework) but, most importantly, to allow the treatment of spatial and temporal structures for inference and prediction.

An interesting development in this area is the ongoing extension of INLA under the INLABRU library (Bachl et al. 2019) to enable users to fit non-linear responses very similar to the ones outlined in this report. Pending computational evaluation of this facility, it would allow the workflow, in its entirety, to be ported into INLA.

#### Model selection

Models with high computational demands become particularly cumbersome if they need to be fitted repeatedly for the purposes of model selection (e.g. in order to derive likelihood ratios or information criteria). A particularly computationally expedient approach recently suggested by Renner et al. (2019) combines i) likelihood maximisation with ii) localised models of spatial effects (such as area-interaction models) and iii) shrinkage methods in-lieu of model selection. Likelihood maximisation is generally faster than Bayesian methods because its task is to find optimal parameter values, rather than to describe full posterior distributions for the parameters. Of course, this goes counter to the requirement for uncertainty propagation to the final results, but approximate measures of uncertainty can be obtained. Localised models of spatial effects capture the spatial dependencies that matter most, and therefore avoid calculations pertaining to negligible long-distance effects. Finally, shrinkage estimators reduce the value of parameters for non-influential explanatory variables to zero, hence performing the task of model selection without stepwise forward addition or backward elimination of variables.

### Contact

Email: ScotMER@gov.scot

## There is a problem

**Thanks for your feedback**