# Develop best practice recommendations for combining seabird study data collected from different platforms: study

This study developed best practice guidance to combine seabird survey data collected from different platforms based on a literature review, expert knowledge and a bespoke model development including sensitivity analysis. This can be used in environmental assessments for planning and licensing.

## 4 Approaches to multi-survey modelling

There is a broader appetite in applied ecology for integrated analyses and adaptive resource management. Such ambitions (and, indeed, the terms "data integration" or "data pooling") are motivated by the statistical community (see Section 7.1 on extensions) but are also expressed in more descriptive papers (e.g. Perrow et al. 2015), indicating increasing dissatisfaction with piece-wise comparisons between surveys and species. Momentum behind these ideas is encouraging the incorporation of different sources of spatial information into a single, joint inference framework, greatly enhancing statistical power, even if the data themselves cannot be directly pooled because of their qualitative differences. For a fixed amount of effort, any survey must settle on a trade-off between spatial/temporal resolution and extent. Different surveys may have entirely different designs, and their overall effort may also differ. These discrepancies pose challenges, but also offer opportunities for complementary use of different surveys.

### 4.1 Indiscriminate pooling

The most naïve, but also the most prevalent, approach to dealing with multi-survey data (73% of the papers reviewed by Fletcher et al. 2019) is to pool them without considering their particular observation biases and imprecisions. The (rather wishful) expectation is that these errors will somehow cancel each other out to give unbiased estimates of distribution and habitat preferences.
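A minimal arithmetic sketch shows why this expectation can fail. It assumes two hypothetical platforms surveying a region with the same true density but different, known detection probabilities; all numbers and function names are illustrative and do not come from any of the cited studies.

```python
# Each survey: (birds counted, area covered in km^2, detection probability).
# All numbers are hypothetical, chosen so the true density is 10 birds/km^2.
surveys = [
    (180, 20.0, 0.9),  # a platform with high detectability
    (400, 80.0, 0.5),  # a platform that misses half of the birds present
]

def naive_pooled_density(surveys):
    """Pool raw counts and areas, ignoring differences in detectability."""
    total_count = sum(count for count, _, _ in surveys)
    total_area = sum(area for _, area, _ in surveys)
    return total_count / total_area

def detection_corrected_density(surveys):
    """Scale each count by its detection probability before pooling."""
    corrected_count = sum(count / p for count, _, p in surveys)
    total_area = sum(area for _, area, _ in surveys)
    return corrected_count / total_area

naive = naive_pooled_density(surveys)
corrected = detection_corrected_density(surveys)
print(naive, corrected)
```

In this toy case the naive pooled estimate is 5.8 birds/km², well below the true value of 10, because the errors of the two platforms act in the same direction rather than cancelling. The corrected estimate recovers the true density only because the detection probabilities are assumed known, which is rarely the case in practice.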

### 4.2 Ad-hoc Comparison

The most obvious approach to using data from multiple sources is to analyse each source individually and then compare the outputs of such analyses. In some cases, the comparison takes the form of validation and calibration (Munson et al. 2010), but this assumes the existence of a gold standard (i.e. a high-resolution, precise and accurate data set), which may not be available, particularly in the marine environment. Nevertheless, the core idea of calibrating one data set based on another (as featured in Munson et al. 2010 and elsewhere) need not require that either data set is perfect. Indeed, the idea of imperfect observations "borrowing strength" from each other has been widely applied elsewhere in spatial survey design (see double-observer methods in Buckland et al. 2010). This is a concept that we will rely on in Section 4.3 and beyond.

Staying, for now, with the idea of comparing maps derived from different (imperfect) data sets, such comparisons are carried out either visually (e.g. Bradbury et al. 2014, Perrow et al. 2015), or via some ad-hoc quantitative method (Sardà-Palomera et al. 2012, Sansom et al. 2018). For example, Sansom et al. (2018) used four distinct analyses carried out on data (both survey and telemetry) from four UK seabird species. Using as their starting point the utilisation maps generated from each analysis, for each species, they performed all possible pairwise comparisons. They focused on the overlap between each pair of maps, measured both as the extent (area) and the density (utilisation) shared by them at their core areas (defined using varying density contours). This allowed them to discuss patterns of similarity in these estimated snapshots of distribution. However, they were not in a position to draw combined inferences about parameter values relating the patterns of utilisation to their underlying covariates. Further, they were not in a position to share statistical power between surveys conducted on the same species, possibly at similar times or regions.
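The kind of pairwise overlap comparison described above can be sketched as follows. This is not the exact metric of Sansom et al. (2018); it is a simplified illustration that assumes two utilisation maps on the same grid, defines the core area as the smallest set of cells holding a chosen share of total utilisation, and uses hypothetical function names and data.

```python
def normalise(grid):
    """Rescale a grid of non-negative values so that it sums to one."""
    total = sum(sum(row) for row in grid)
    return [[value / total for value in row] for row in grid]

def core_cells(grid, level):
    """Smallest set of cells that holds at least `level` of total utilisation."""
    ranked = sorted(
        ((value, (i, j)) for i, row in enumerate(grid) for j, value in enumerate(row)),
        reverse=True,
    )
    core, cumulative = set(), 0.0
    for value, cell in ranked:
        if cumulative >= level:
            break
        core.add(cell)
        cumulative += value
    return core

def core_overlap(grid_a, grid_b, level=0.5):
    """Return (shared extent, shared utilisation) of the two core areas."""
    a, b = normalise(grid_a), normalise(grid_b)
    core_a, core_b = core_cells(a, level), core_cells(b, level)
    shared = core_a & core_b
    # Extent: shared cells as a fraction of all cells in either core area.
    extent = len(shared) / len(core_a | core_b)
    # Density: utilisation common to both maps within the shared cells.
    density = sum(min(a[i][j], b[i][j]) for i, j in shared)
    return extent, density

# Hypothetical 2 x 2 utilisation maps from a survey and a telemetry analysis.
survey_map = [[8, 4], [2, 2]]
telemetry_map = [[6, 6], [2, 2]]
extent, density = core_overlap(survey_map, telemetry_map, level=0.5)
print(extent, density)
```

Repeating the calculation for several values of `level` mimics the use of varying density contours. Note that the output is a pair of descriptive similarity scores: nothing in this calculation feeds information from one map back into the estimation of the other.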

Overall, therefore, such ad-hoc comparisons are biologically valuable because they inform intuition and motivate scientific hypotheses. However, methodologically, they are of limited utility because they do not facilitate the flow of information between data sets.

### 4.3 Post-hoc combination

An improved approach, which allows information from one data set to flow into another (but not the other way around), is a sequential analysis that deals completely with one data set first and then incorporates the second data set as a second stage of fitting. Such approaches are not widespread and they seem to be specific to the particular analyses at hand (Yamamoto et al. 2015). However, particularly in the context of Bayesian updating, where sequential analyses are possible, it is plausible to think of methods that use the results of one analysis (based on a single data set) to specify priors for the analysis of the next data set (Matthiopoulos 2003b, Talluto et al. 2016). Such ideas have been proposed, but not realised in spatial ecology, mainly because they require assigning parametric probability distributions (the priors) to space as a whole.
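The logic of Bayesian updating can be illustrated with a deliberately simple, non-spatial sketch. It assumes a single encounter rate (birds per unit effort) shared by two hypothetical surveys, Poisson counts and a conjugate Gamma prior; the function name and all numbers are illustrative. It sidesteps exactly the difficulty noted above, namely specifying priors over space as a whole.

```python
def update_gamma(shape, rate, counts, efforts):
    """Posterior Gamma(shape, rate) for a Poisson rate, given counts and effort."""
    return shape + sum(counts), rate + sum(efforts)

# Hypothetical data: counts and effort (e.g. km of transect) for two surveys.
survey_a = {"counts": [3, 5, 2], "efforts": [1, 1, 2]}
survey_b = {"counts": [4, 6], "efforts": [2, 2]}

prior = (1, 1)  # weakly informative Gamma(shape=1, rate=1), an assumption

# Stage 1: analyse survey A on its own.
posterior_a = update_gamma(*prior, survey_a["counts"], survey_a["efforts"])

# Stage 2: use the posterior from survey A as the prior for survey B.
posterior_ab = update_gamma(*posterior_a, survey_b["counts"], survey_b["efforts"])

# For comparison: a single analysis of both surveys at once.
posterior_joint = update_gamma(
    *prior,
    survey_a["counts"] + survey_b["counts"],
    survey_a["efforts"] + survey_b["efforts"],
)

print(posterior_a, posterior_ab, posterior_joint)
print("posterior mean rate:", posterior_ab[0] / posterior_ab[1])
```

In this conjugate toy case the sequential and the simultaneous analyses give identical posteriors. That equivalence relies on both surveys observing the same parameter without platform-specific bias; once the full posterior from the first stage has to be summarised or approximated, as in realistic spatial models, the two routes generally differ.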

An alternative idea, ensemble forecasting, examines a large (even infinite) number of models of a system (Araújo and New 2007). Instead of picking the best model from the ensemble, the forecasts from different models are combined. Assuming that each model carries some independent information, the combined forecast is characterised by a lower mean error than individual forecasts. This idea generalises the field of model averaging (Dormann et al. 2018) because ensembles can be created by examining different models, different parameterisations of the same models, different initial or boundary conditions and different stochastic realisations from each model (Fig. 1 in Araújo and New 2007). Post-hoc combinations from an ensemble can be unweighted (i.e. combination by committee) or weighted according to some measure of quality (e.g. based on assessments of data precision). Established model averaging methods adopted in ecology have previously used weights derived from information criteria (Burnham and Anderson 2004, Burnham et al. 2011), hence rewarding parsimony in the weighting.

The above post-hoc approaches seem to fall naturally into categories of parallel and sequential model fitting. The "wisdom of crowds", a pervasive idea represented here by ensemble modelling, achieves a pooling of predictions from a collection of *parallel* models. This leads to robust predictions, but the models are not allowed to inform each other. Less developed, but perhaps more powerful, ideas about *sequential* model-fitting (Matthiopoulos 2003b, Yamamoto et al. 2015) allow later models to be informed by earlier ones, but the information flow is unidirectional.
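The difference between unweighted (committee) and weighted post-hoc combination can be sketched as below. The weights follow the standard Akaike-weight formula, in which each model's weight is proportional to exp(-Δ/2) and Δ is its AIC difference from the best model. The AIC values, the predictions and the function names are hypothetical.

```python
import math

def akaike_weights(aic_values):
    """Akaike weights: proportional to exp(-0.5 * (AIC_i - min AIC))."""
    best = min(aic_values)
    raw = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(raw)
    return [value / total for value in raw]

def combine(predictions, weights=None):
    """Cell-by-cell weighted mean of model predictions (equal weights by default)."""
    n_models = len(predictions)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_cells = len(predictions[0])
    return [
        sum(w * model[cell] for w, model in zip(weights, predictions))
        for cell in range(n_cells)
    ]

# Hypothetical predicted densities for three grid cells from three models.
predictions = [
    [1.0, 2.0, 4.0],
    [3.0, 4.0, 2.0],
    [2.0, 3.0, 3.0],
]
aic_values = [100.0, 102.0, 110.0]  # hypothetical AIC scores

weights = akaike_weights(aic_values)
committee = combine(predictions)           # unweighted combination
weighted = combine(predictions, weights)   # AIC-weighted combination
print(weights, committee, weighted)
```

Each model here is fitted in isolation and only the predictions are pooled, which is the sense in which ensemble members are parallel and do not inform each other.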

Summarising some of the above ideas in their recent review on data integration, Fletcher et al. (2019) identify three distinct cases of ad-hoc combination: 1) ensemble modelling of independent models, 2) use of the maps produced by one model as a covariate in the linear predictor of the other model and 3) in a Bayesian context, use of one model to generate informative priors for the parameters or the predicted distributions of the other model. Together, these categories take up about 20% of the data-combination literature.

### 4.4 Spatial data integration

Integrated analyses of multiple data sources aim to enhance statistical power by greatly increasing the effective sample size of the data set but, also, by using data from different regions, different times and spatial resolutions in a complementary way. Developing the fundamentally useful idea of calibration (see Section 4.3), into the more general concept of integration, several papers (Fletcher et al. 2016, Pacifici et al. 2017, Koshkina et al. 2017, Peel et al. 2019) examined whether using presence-only (opportunistic) data in combination with the gold standard of presence-absence (survey) data could improve the descriptive and predictive ability of species distribution models. This is indeed likely to be the case, but the statistical method for achieving it must first be considered, so that the inferential platform, built from the perspective of calibration, can be used for integrated analyses that do not necessarily contain a gold standard.

Fletcher et al. (2016) pointed out that the first methodological decision in data integration is whether space should be treated as a nested hierarchy of grid resolutions (e.g. Keil et al. 2013, 2014) or as a continuous plane of coordinates. The former approach is possible by conditioning higher resolution grid cells on the observed/estimated contents of lower resolution grids, but this runs the risk of data mismatches between scales. The alternative, of treating space as continuous, is represented by the Inhomogeneous Poisson Process (IPP) approach, discussed in Section 3.1. The key conceptual advantage of IPPs is that they acknowledge that spatial processes occur at individual points in space and may remain unobserved (see thinned IPPs), be reported with some spatial error, or be aggregated into counts at coarser spatial resolutions. Hence, the IPP paradigm recognises that the data will have an underlying common scale, even if they are reported at coarser resolutions. The underlying IPP is considered *latent* or *unobserved*. Different data can then be considered to originate from it, subject to the span and detectability limitations of the particular survey scheme (see Section 3.2). This allows us to write a *joint likelihood* for multiple data sets, conditional on the latent IPP. The distinction between a latent biological process and the different data-collection processes that can be used to observe it gives us the ability to think more mechanistically about the origin of the data (Hefley and Hooten 2016, Fletcher et al. 2019).
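The structure of such a joint likelihood can be written schematically as follows. The notation is introduced here for illustration only and is not taken from the cited papers: $\lambda(s)$ is the latent intensity at location $s$, $x(s)$ a vector of covariates with coefficients $\beta$, and $p_k(s;\theta_k)$ the probability that survey $k$ detects an individual at $s$ (the thinning), with survey-specific parameters $\theta_k$ and surveyed region $S_k$.

$$
\lambda(s) = \exp\{x(s)^\top \beta\}
$$

For a survey $k$ that records the locations $s_{k,1}, \dots, s_{k,n_k}$ of detected individuals, the thinned IPP log-likelihood is

$$
\log L_k(\beta, \theta_k) = \sum_{i=1}^{n_k} \log\{p_k(s_{k,i};\theta_k)\,\lambda(s_{k,i})\} - \int_{S_k} p_k(s;\theta_k)\,\lambda(s)\,ds .
$$

For a survey that reports counts $N_{k,j}$ aggregated over cells $A_{k,j}$, the same latent surface implies

$$
N_{k,j} \sim \text{Poisson}\left(\int_{A_{k,j}} p_k(s;\theta_k)\,\lambda(s)\,ds\right).
$$

Assuming that the $K$ surveys are conditionally independent given the latent intensity, the joint log-likelihood is the sum of the survey-specific terms, all of which share the same $\beta$:

$$
\log L(\beta, \theta_1, \dots, \theta_K) = \sum_{k=1}^{K} \log L_k(\beta, \theta_k).
$$

Because $\beta$ appears in every term, each survey contributes to the estimation of the common habitat relationships, while differences in coverage and detectability are absorbed by $S_k$ and $\theta_k$. The conditional independence assumption is a simplification and may not hold where surveys overlap in space or time.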

Data integration must also be done in a way that does not misleadingly increase the apparent precision of the results and model predictions (Miller et al. 2019). For example, un-modelled spatial and temporal autocorrelation in the data (see Section 5.4) may artificially inflate the apparent sample size of the data, despite the recommended practice of spacing out observations and transects (see Section 3.2). These concerns about pseudo-replication apply particularly to multi-survey analyses because different surveys may have overlapped in space or in time. Similarly, uncertainty arising in the pre-analysis of transects (e.g. uncertainty in the detection function, see Section 5.1), if not propagated to the final results, may under-represent the uncertainty in distribution. All of these mechanisms could threaten the precautionary approach and have adverse implications for management and policy decisions. Therefore, uncertainty in the observation processes from different surveys and in the habitat data must be correctly propagated to the end-results, to give a reliable measure of precision. The current situation in the literature is far from ideal, given that most published marine SDM studies (94%) have failed to report the amount of uncertainty derived from data deficiencies and model parameters (Robinson et al. 2017).

Fletcher et al. (2016), Pacifici et al. (2017) and Peel et al. (2019) found that the combination of the data gave better explanatory and predictive performance than either of the two data sets on their own. Crucially, the use of opportunistic data improved the performance of the model based on formal survey data. The authors attributed these improvements to the sheer sample size of opportunistic data and their broader extent compared to survey data, both spatially, but also in terms of environmental variables.

A central theme in integrated SDMs is the idea of *complementarity* in achieving spatial breadth and depth. In most situations of data-collection, logistical and budgetary constraints mean that we need to settle on trade-offs between the resolution and the extent of surveys. For example, given a fixed amount of ship-time, covering a greater area at sea necessarily means using sparser transects (i.e. either increasing the distance between successive observation points, or increasing the spacing between transect lines). Similar trade-offs exist between different types of data. For example, opportunistic data tend to have greater sample sizes but lower accuracy and precision, compared to formal survey data. Several authors (Pacifici et al. 2017, Nelli et al. 2019) have now pointed out that, by integrating different surveys and different data types into one analysis, we do not merely achieve an increase in sample size, but a complementary use of the different spatial extents and resolutions that characterise these data. Complementarity means that detailed features of species distributions can be embedded in big-picture data, even where such details have not been directly observed.

Spatial data integration need not only be used with data that inform the same latent surface. For example, hurdle approaches that combine abundance conditional on occupancy (or occupancy conditional on abundance) have traditionally been implemented as two-stage analyses (Waggitt et al. 2019). However, these can be easily implemented as integrated analyses, as in Clark et al. (2019).

### Contact

Email: ScotMER@gov.scot
