Develop best practice recommendations for combining seabird study data collected from different platforms: study

This study developed best practice guidance for combining seabird survey data collected from different platforms, based on a literature review, expert knowledge and bespoke model development including sensitivity analysis. The guidance can be used in environmental assessments for planning and licensing.


1 Executive summary

Existing frameworks for the statistical analysis of spatial survey data offer a clear workflow towards the estimation of absolute and relative abundance of wildlife, in association with present and future environmental profiles (whether naturally or anthropogenically affected). At the same time, more broadly in applied ecology, there is keen interest in integrated analysis and adaptive resource management. Momentum behind these ideas is encouraging the incorporation of different sources of spatial information into a single, joint inference framework, so that statistical power can be greatly enhanced even if the data themselves cannot be directly pooled because of their qualitative differences. The present project used a systematic literature review, expert knowledge of survey methodology, bespoke model development and sensitivity analyses on realistic simulation data to derive methodological and quantitative guidelines for best practice in conducting such joint inference for multi-platform seabird survey data. We subdivide our recommendations into six distinct categories.

1.1 Appropriate response and explanatory variables

  • Keep the highest-grade form of data. Data collected in aggregated or thresholded form can be analysed, but these operations should be avoided on highly resolved data.
  • Analyse even low-grade data as if it originated from abundance. Use of latent abundance surfaces allows lower-grade data to be interfaced with high-resolution inference.
  • Avoid inflated error structures until the end of modelling. Modelling with covariates will generally explain some of the over-dispersion in the raw data, and use of spatially and temporally auto-correlated errors will account for unexplained hot- and cold-spots in distribution.
  • Partly missing covariates should not necessarily lead to data censoring. Censoring may ultimately prove necessary; however, it is worth first attempting to reconstruct the covariate, either as a separate interpolation step or as part of an integrated analysis with partially missing data.
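As an illustration of analysing low-grade data as if it originated from abundance, the sketch below inverts the Poisson presence probability to recover a latent abundance rate from presence/absence records. The function name and the numbers are purely illustrative and are not drawn from the project code.

```python
import math

def lambda_from_presence(n_present, n_cells):
    """Recover a latent Poisson abundance rate from presence/absence data.

    If counts per cell are Poisson(lam), then P(count > 0) = 1 - exp(-lam);
    inverting this lets even thresholded (detected / not detected) data
    inform an analysis on the abundance scale."""
    p = n_present / n_cells
    return -math.log(1.0 - p)

# e.g. birds recorded as present in 30 of 100 surveyed cells
lam_hat = lambda_from_presence(30, 100)   # roughly 0.36 birds per cell
```

The same latent-surface logic extends to interval-censored or aggregated counts, which is why even low-grade data can share a model with highly resolved surveys.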

1.2 Treatment of survey design attributes and observation errors

  • Use distance sampling. Distance sampling techniques facilitate the pooling of surveys with different protocols by reducing them to a common set of detectability characteristics. The extensions of distance sampling that deal with transect design and the incorporation of covariates facilitate error correction.
  • Prioritise cross-calibration between surveys. Joint analysis of multiple surveys allows high detectability to be combined with high spatial coverage. Surveys with well-characterised detectability errors are particularly valuable because, within a joint analysis, they can cross-calibrate less detailed surveys that happened close by in space and time.
  • Consider state-space approaches. Rather than correcting the observations for biases prior to the formal analysis, a statistical observation model is combined with the biological model to effect the necessary correction in an integrated way. Both the biological and the observation models are tuned with regard to each other, and uncertainty propagation from the observation model to the final predictions happens automatically.
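To illustrate how distance sampling reduces a survey to common detectability characteristics, here is a minimal sketch assuming a half-normal detection function; the value of sigma, the truncation distance and the counts are invented for illustration, not estimates from real surveys.

```python
import math

def halfnormal_detection(d, sigma):
    """Half-normal detection function g(d) = exp(-d^2 / (2 sigma^2))."""
    return math.exp(-d * d / (2.0 * sigma * sigma))

def effective_strip_halfwidth(sigma, w, n=10_000):
    """Trapezoid-rule integral of g(d) over [0, w]: the effective strip
    half-width mu, i.e. the half-width of a strip that would yield the
    same expected count under perfect detection."""
    h = w / n
    total = 0.5 * (halfnormal_detection(0.0, sigma) + halfnormal_detection(w, sigma))
    total += sum(halfnormal_detection(i * h, sigma) for i in range(1, n))
    return total * h

# Hypothetical survey: sigma = 100 m, truncation at 300 m,
# 45 birds detected along 10 km of transect (both sides searched).
mu = effective_strip_halfwidth(sigma=100.0, w=300.0)   # about 125 m
density = 45 / (2.0 * mu * 10_000.0)                   # birds per m^2
```

Once each platform's protocol is summarised by such a detection function, their corrected densities live on a common scale and can enter a joint analysis.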

1.3 Treatment of space-time

  • Use point process models. Point process models allow space-time to be modelled jointly and continuously; they subsume all other valid approaches to species distribution modelling and are compatible with other features of modelling developed to enhance predictive power.
  • Use auto-correlated structures. Spatially and temporally auto-correlated structures can account for missing covariates, can be used to impute gaps in covariate layers and, most importantly, are the best way to leverage information sharing between surveys according to their spatiotemporal overlap or proximity.
  • Take dynamics into account. If multi-survey data include before-after control-impact (BACI) designs, it is important to account for temporal non-stationarity.
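Point process models view bird locations as draws from a continuous intensity surface. The sketch below conveys that view by simulating an inhomogeneous Poisson process on a unit square via Lewis-Shedler thinning; the function names and intensity surface are illustrative assumptions, not project code.

```python
import math
import random

def _poisson_draw(lam, rng):
    """Knuth's algorithm for a Poisson(lam) random count."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def simulate_ipp(intensity, lam_max, rng=None):
    """Lewis-Shedler thinning on the unit square: propose points from a
    homogeneous Poisson process at rate lam_max, then keep each with
    probability intensity(x, y) / lam_max."""
    rng = rng or random.Random(42)
    points = []
    for _ in range(_poisson_draw(lam_max, rng)):
        x, y = rng.random(), rng.random()
        if rng.random() < intensity(x, y) / lam_max:
            points.append((x, y))
    return points

# illustrative surface: density increasing towards the east of the square
pts = simulate_ipp(lambda x, y: 100.0 * x, lam_max=100.0)
```

In an analysis the direction runs the other way: observed point patterns constrain the intensity surface, on which covariates and auto-correlated residuals are defined continuously in space and time.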

1.4 Accessibility and density dependence

  • Use realistic distance measures. If we are concerned that birds avoid flying over land, circumnavigate human structures, or, due to glide-flight, rely on the prevailing wind direction, it is important that these effects are accounted for in the measure of distance.
  • In the present, use abstracted models for density dependence. Currently, the computational demands of a fully spatially explicit model of intra-colony, inter-colony and interspecific competition are prohibitive. We have provided an illustration (in the project vignette) of how a pragmatic model for these processes can be developed and incorporated into joint modelling.
  • In the future, consider spatially explicit models for density dependence. As computational approaches become more widespread in the field of species distribution models (SDMs), it may become possible to model competition in a fully spatially explicit way.
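Realistic distance measures of the kind described above can be computed on a rasterised seascape. The sketch below uses breadth-first search over sea cells, so the distance between two points respects a land barrier rather than cutting straight across it; the grid, coordinates and cell coding are invented for illustration.

```python
from collections import deque

def at_sea_distance(grid, start, goal):
    """Shortest 4-neighbour path length through sea cells (value 0),
    treating land cells (value 1) as impassable, via breadth-first
    search. Returns None if no at-sea route exists."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None

# a vertical peninsula (1s) forces the path around, not across
sea = [[0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0]]
d = at_sea_distance(sea, (2, 0), (2, 4))   # 8 cells, versus 4 in a straight line
```

The same idea generalises to weighted graphs (e.g. Dijkstra with wind-dependent edge costs) when glide-flight anisotropy matters.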

1.5 Inferential Platforms

  • Use hierarchical models. These allow us to use features such as cross-calibration of observation models, covariate imputation and latency, and spatio-temporal proximity, so that predictions can borrow strength from multiple surveys.
  • Use Bayesian approaches. Computer-intensive Bayesian model-fitting accommodates state-space and hierarchical structures. More importantly, Bayesian inference permits expert opinion to be elicited and incorporated in the form of parameter priors.
  • Use data integration. Under joint inference, multiple data sets are analysed simultaneously to extract maximum power. These approaches are also particularly useful for extending the analyses to non-survey data.
  • Fully propagate uncertainty to the final predictions. There is always a limit to how much missing information can be imputed by statistical modelling. The multiple sources of uncertainty arising along data collection and estimation need to be translated into aggregate measures of precision in the final spatial predictions.
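The cross-calibration idea meshes naturally with these hierarchical, Bayesian recommendations. Below is a deliberately small sketch: two hypothetical surveys share one latent density D; survey A has known detectability, and the joint posterior (brute-force grid integration under flat priors) recovers survey B's unknown detectability. All counts, efforts and detectabilities are invented.

```python
import math

def poisson_loglik(n, mu):
    """Log Poisson probability of count n with mean mu."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)

# Survey A: calibrated (known detectability). Survey B: uncalibrated.
p_a, effort_a, n_a = 0.9, 50.0, 135     # consistent with density D near 3
effort_b, n_b = 80.0, 120               # implies p_b near 0.5 if D is 3

# Joint posterior over the shared density D and survey B's detectability
# p_b, on a coarse grid with flat priors:
D_grid = [0.5 + 0.05 * i for i in range(120)]
p_grid = [0.05 + 0.01 * i for i in range(95)]
weights, total = {}, 0.0
for D in D_grid:
    for p_b in p_grid:
        ll = (poisson_loglik(n_a, D * p_a * effort_a)
              + poisson_loglik(n_b, D * p_b * effort_b))
        w = math.exp(ll)
        weights[(D, p_b)] = w
        total += w

D_mean = sum(D * w for (D, _), w in weights.items()) / total
p_b_mean = sum(p * w for (_, p), w in weights.items()) / total
```

Because both likelihood terms involve the same D, information flows from the calibrated survey through the latent density into survey B's detectability, with the joint posterior carrying the uncertainty of both data sets.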

1.6 Computational platforms

  • Support open source. As a matter of process, all code developed with government funding should be made available to the scientific community.
  • Ensure a strong interface with Geographic Information Systems (GIS). Establishing stable protocols for data formatting and using the GIS functionality in platforms such as R would allow data processing on a single platform.
  • Parameterise non-linear model components with exact methods. The prototype models presented in the jointSurvey library are computationally demanding, but they have the best chance of retrieving the difficult parameters pertaining to density dependence and competition.
  • Implement large-scale predictions using fast approximate methods. It is imperative to move towards efficient methods, such as integrated nested Laplace approximation (INLA).
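The speed of methods such as INLA comes from replacing brute-force integration with a Gaussian (Laplace) approximation at the posterior mode. A toy sketch of that core idea, with invented counts and a flat prior, comparing the approximation against direct quadrature:

```python
import math

ys = [4, 6, 5, 7, 3]            # invented counts
S, n = sum(ys), len(ys)

def loglik(theta):
    """Poisson log-likelihood with log link (rate = exp(theta)), flat prior."""
    return theta * S - n * math.exp(theta)

theta_hat = math.log(S / n)          # posterior mode
curvature = n * math.exp(theta_hat)  # -d2/dtheta2 of loglik at the mode

# Laplace approximation to the normalising integral of exp(loglik):
laplace = math.exp(loglik(theta_hat)) * math.sqrt(2.0 * math.pi / curvature)

# Brute-force trapezoid quadrature over a wide window for comparison:
a, b, m = theta_hat - 3.0, theta_hat + 3.0, 2000
h = (b - a) / m
quad = (0.5 * (math.exp(loglik(a)) + math.exp(loglik(b)))
        + sum(math.exp(loglik(a + i * h)) for i in range(1, m))) * h
```

Here the Laplace result agrees with quadrature to well under one percent, at the cost of a single optimisation and one second derivative; scaled to latent fields with thousands of parameters, that trade is what makes large-scale spatial prediction feasible.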

In addition, we review future extensions of methods that could facilitate integration of single-species, multi-survey data with a variety of other data, including survey data from other species, shore-based vantage point data, citizen science data, telemetry (tracking) data, mark-recapture data and demographic data.


