Develop best practice recommendations for combining seabird study data collected from different platforms: study

This study developed best practice guidance to combine seabird survey data collected from different platforms based on a literature review, expert knowledge and a bespoke model development including sensitivity analysis. This can be used in environmental assessments for planning and licensing.

7 Future extensions

7.1 The promise of integrated hierarchical models

The use of hierarchical approaches in SDMs (Keil et al. 2013, Hefley and Hooten 2016, Pacifici et al. 2017, Fletcher et al. 2019) follows the principles set out in previous sections of this report. In particular, it assumes a true but unknown underlying distribution (usually, the continuous intensity surface of a heterogeneous Poisson process), which is observed by one or more methods that may be incomplete or imbalanced in their spatiotemporal coverage, consistently biased in some respects and imprecise in others. This separation between the biology (which is the objective of statistical inference) and the observation processes that generate data from it allows us to do two useful things. First, it allows us to allocate proportionate modelling effort to the formulation of the methodological imbalances, biases and imprecisions. This leads to error models that separate natural stochasticity (a biological source of uncertainty) from methodological artefacts. Second, it allows the use of multiple observation models for a single underlying truth. We have seen in this report that such complementary use of different surveys and, possibly, different methods can lead to benefits such as the cross-calibration of methodologies, the combination of high spatiotemporal extent and resolution, and the ability to reinforce predictions that are explicitly spatial and temporal.
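The idea of one latent intensity surface observed through several observation models can be sketched in a few lines. The following is a minimal illustration only, not the model developed in this report: it assumes a log-linear intensity in a single covariate, Poisson counts, and a known detectability for each survey. All function names, the boat and aerial records and the parameter values are hypothetical.

```python
import math

def intensity(beta0, beta1, covariate):
    """Latent (true) expected density in a cell: log-linear in one covariate."""
    return math.exp(beta0 + beta1 * covariate)

def poisson_log_pmf(count, mean):
    """Log-probability of `count` under a Poisson distribution with given mean."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)

def survey_log_likelihood(records, beta0, beta1, detectability):
    """Observation model for one survey.

    Each record is (covariate, effort, count). The expected count is
    effort * detectability * latent intensity (an assumed, simple form).
    """
    total = 0.0
    for covariate, effort, count in records:
        mean = effort * detectability * intensity(beta0, beta1, covariate)
        total += poisson_log_pmf(count, mean)
    return total

def joint_log_likelihood(surveys, beta0, beta1, detectabilities):
    """Sum of per-survey log-likelihoods that share the same latent intensity."""
    return sum(
        survey_log_likelihood(records, beta0, beta1, p)
        for records, p in zip(surveys, detectabilities)
    )

# Hypothetical data: (covariate, effort, count) for two surveys of one region.
boat = [(0.0, 1.0, 2), (1.0, 1.0, 5), (2.0, 0.5, 7)]
aerial = [(0.5, 2.0, 3), (1.5, 2.0, 9)]
detectabilities = [1.0, 0.5]  # assumed known here; in practice often estimated

plausible = joint_log_likelihood([boat, aerial], 0.7, 0.9, detectabilities)
implausible = joint_log_likelihood([boat, aerial], -3.0, 0.0, detectabilities)
```

Because the biological parameters (`beta0`, `beta1`) are shared while effort and detectability are survey-specific, both datasets inform the same intensity surface. Note that detectability and the intercept are confounded unless detectability is known or estimable for at least one survey.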

From the perspective of seabird SDMs, hierarchical models can perform three types of integration. They can bring together data from multiple line transect surveys (as explored in this report and the accompanying vignette); they can combine survey data with distribution data of fundamentally different types, such as occupancy data from citizen science records (e.g. Keil et al. 2013, Hefley and Hooten 2016, Pacifici et al. 2017, Fletcher et al. 2019); and they can combine distribution data with supporting information that underpins the analysis with more mechanistic principles. Precedents for such multi-data integration are considerably more developed in the area of population dynamics (Buckland et al. 2004, 2007, Newman et al. 2014, Zipkin and Saunders 2018). In this section, we briefly explore possibilities for data integration beyond the multi-survey context.

7.2 Multi-species surveys

Surveys at sea offer the opportunity to track multiple species. This is an alternative interpretation of the multi-survey idea, in the sense that the same platform provides multiple datasets. Interest in hotspots of biodiversity has led to the idea of stacking single-species SDMs (Calabrese et al. 2014, D'Amen et al. 2015, Distler et al. 2015). Although stacking is not an integrated analysis in the sense outlined in this report, it has been useful in demonstrating the magnitude and duration of seabird aggregations or partitioning in the open sea from both survey (Nur et al. 2011) and tracking (Jones et al. 2015, Grecian et al. 2016) data. However, an interesting research direction lies in allowing datasets from multiple species to gain strength from each other. We outlined earlier how spatiotemporal proximity can be used to borrow strength by jointly analysing a collection of surveys that have been carried out within a defined geographic region and time window. The same idea could be extended to develop hierarchical models using taxonomic or functional proximity (Kindsvater et al. 2018). Multispecies SDMs could be developed to quantify the (apparent) associations between species (Guisan and Zimmermann 2000, Ovaskainen et al. 2016, Thorson et al. 2016), and these could then be used to reconstruct and predict the distribution of any or all of the species participating in the model. This approach may also have potential as a cross-calibration method for correcting errors due to detectability or unknown observation effort (Chambert et al. 2018, Peel et al. 2019).
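As a point of comparison for the integrated approaches discussed here, stacking itself is simple to express. The sketch below shows one common way of stacking, namely summing per-species predicted occurrence probabilities in each grid cell to obtain expected species richness. The species names and probabilities are hypothetical, and each single-species SDM is assumed to have been fitted separately beforehand.

```python
def stack_richness(occurrence_by_species):
    """Expected richness per cell: sum of per-species occurrence probabilities.

    `occurrence_by_species` maps a species name to a list of predicted
    occurrence probabilities, one per grid cell, in a common cell order.
    """
    n_cells = len(next(iter(occurrence_by_species.values())))
    richness = [0.0] * n_cells
    for species, probabilities in occurrence_by_species.items():
        if len(probabilities) != n_cells:
            raise ValueError(f"{species}: expected {n_cells} cells")
        for cell, probability in enumerate(probabilities):
            richness[cell] += probability
    return richness

# Hypothetical predictions from three separately fitted single-species SDMs.
predictions = {
    "guillemot": [0.9, 0.4, 0.1],
    "kittiwake": [0.7, 0.5, 0.2],
    "gannet": [0.2, 0.6, 0.1],
}
richness = stack_richness(predictions)
hotspot_cell = max(range(len(richness)), key=richness.__getitem__)
```

Each species is modelled independently here, so no information is shared between them. This is the limitation that joint multispecies models aim to address.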

7.3 Combination with vantage point data

Several data types could come under this category. The most important is derived from on-shore observation stations (e.g. using a total station comprising a theodolite and a distance meter). These could be important sources of information for near-shore distribution. Their combination with line transect survey data is relatively straightforward since both data types belong to the broader class of transect methods (Buckland et al. 2001). Terrestrial habitat preferences for seabirds are a considerably less studied aspect of their biology, but one that is particularly pertinent for determining the placement of potential new colonies and for examining nest placement within colonies. For example, Clark et al. (2019) used integrated modelling of transect and burrow occupancy data to map out the distribution of a cryptic seabird on the colony. Of particular relevance for studying human-seabird interactions is the terrestrial distribution of scavenging species such as gulls, as it shifts away from marine foraging.

Other data could come from detection/non-detection methods such as acoustic stations or camera trapping (Ngoprasert et al. 2019). Their integration with survey data is equivalent to the combination of occupancy and abundance data (Keil et al. 2013, Hefley and Hooten 2016, Pacifici et al. 2017, Fletcher et al. 2019).
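One standard way to connect detection/non-detection records to the same latent intensity that generates survey counts is sketched below. It assumes that the number of detectable birds within range of a station is Poisson distributed, so that the probability of at least one detection is 1 - exp(-mean). This is an illustrative assumption rather than the formulation used in the papers cited above. The function names and station records are hypothetical.

```python
import math

def detection_probability(intensity, area, detectability=1.0):
    """Probability that a station records at least one bird.

    Assumes the number of detectable birds within the station's range is
    Poisson with mean intensity * area * detectability.
    """
    return -math.expm1(-intensity * area * detectability)

def station_log_likelihood(records, detectability=1.0):
    """Log-likelihood of detection/non-detection records.

    Each record is (intensity, area, detected), where `intensity` is the
    latent density predicted at the station and must be positive.
    """
    total = 0.0
    for intensity, area, detected in records:
        mean = intensity * area * detectability
        if detected:
            total += math.log(-math.expm1(-mean))
        else:
            total += -mean  # log of exp(-mean), the probability of no birds
    return total

# Hypothetical stations: (predicted intensity, monitored area, detected?).
stations = [(0.2, 1.0, False), (1.5, 1.0, True), (3.0, 0.5, True)]
log_lik = station_log_likelihood(stations)
```

Because this term depends on the same intensity as a count-based survey likelihood, the two can be added together in a joint analysis.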

7.4 Combination with citizen science data

Methods for coordination of citizen science programmes are flourishing in ecology (Bonney et al. 2009, Amano et al. 2014, Chase and Levine 2016, Giraud et al. 2016, Kosmala et al. 2016, Wald et al. 2016, La Sorte et al. 2018) and so is the development of statistical methodologies for dealing with the fundamental restrictions in the quality of such opportunistic data (Hochachka et al. 2012, Bird et al. 2014). The main issue with citizen science data is not so much the higher level of bias or imprecision in species identification that might arise in some cases, but rather the heterogeneity in these errors across individual observers and through space and time. Although it is possible in principle to account for such heterogeneities in analysis frameworks, the task is made difficult by the frequent absence of information on effort, precision and accuracy. Such gaps in knowledge then need to be filled by proxies (such as plausible assumptions about the behaviour and distribution of citizen observers, or more detailed models of these). It is also possible that integrated analysis of multispecies surveys (see Section 7.2) may help by allowing the collective detections of all species to act as an approximation of the effort distribution. The combination of citizen science data with survey data may happen either by using the opportunistic data to fine-tune the design of surveys (Reich et al. 2018) or by analysing them together in an integrated platform using different observation models (Nelli et al. 2019).
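The suggestion that collective detections of all species might approximate the effort distribution can be illustrated with a very simple calculation. The sketch below is a crude, hypothetical version of that idea and not a method taken from the cited studies: the share of all records falling in each cell is treated as relative effort, and a focal species' records are divided by it. The species names and counts are invented.

```python
def effort_proxy(records_by_species):
    """Relative effort per cell, approximated by the share of all records.

    `records_by_species` maps a species name to a list of record counts,
    one per grid cell, in a common cell order.
    """
    n_cells = len(next(iter(records_by_species.values())))
    totals = [0] * n_cells
    for counts in records_by_species.values():
        for cell, count in enumerate(counts):
            totals[cell] += count
    grand_total = sum(totals)
    return [total / grand_total for total in totals]

def effort_corrected_rate(species_counts, effort):
    """Records per unit of proxy effort; None where no effort was recorded."""
    return [
        count / e if e > 0 else None
        for count, e in zip(species_counts, effort)
    ]

# Hypothetical opportunistic records in three grid cells.
records = {
    "gull": [8, 2, 0],
    "tern": [2, 2, 0],
    "auk": [0, 4, 2],
}
effort = effort_proxy(records)
auk_rate = effort_corrected_rate(records["auk"], effort)
```

In this example the raw auk records peak in the second cell, whereas the effort-corrected rate is highest in the third, where few observers were present. The proxy includes the focal species' own records, so it is partly circular. A full analysis would model effort and occurrence jointly rather than by simple division.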

7.5 Combination with telemetry data

The idea of combining survey with tracking data is as old as the early days of satellite telemetry. This combination, however, has proved particularly challenging. There is a fundamental difference between these two data types: surveys focus on particular regions of space and can (in principle) observe any individual animal in the population, whereas telemetry studies focus on particular individuals and can (in principle) observe any region in space. Therefore, we have a situation of incompatibility, which (like many of the data-pooling problems discussed in this report) could be turned into a situation of complementarity, although as yet that has not been achieved. Studies that have attempted this marriage in the seabird literature have often tended to inflict heavy censoring on the data. For example, Louzao et al. (2009) found it necessary to convert survey data into occupancy and to select a single foraging trip from each tagged bird, achieving a form of indiscriminate pooling (see Section 4.1). There are papers (e.g. Carroll et al. 2019) that have taken an ad-hoc comparative approach (see Section 4.2) and papers (e.g. Yamamoto et al. 2015, Zipkin and Saunders 2018) that have followed more powerful approaches of post-hoc combination (see Section 4.3). However, none of the current approaches has achieved fully integrated inference. A major obstacle to joint inference is the incongruence between the frameworks used for these two data types. Telemetry data are most conveniently analysed via step selection functions (SSFs), while resource selection functions (RSFs) are most appropriate for survey data. A fundamental problem with these approaches is that they do not, by default, lead to the same results. Specifically, scaling up by simulation the microscopic model obtained via SSFs does not yield the same expected distribution as that generated by an RSF (Signer et al. 2017).

A promising development in this area is the convergence between the frameworks of resource selection and step selection analyses (Michelot et al. 2019b, 2019c). This work has established the conditions under which SSF and RSF frameworks agree, and has derived methods for joint inference (Michelot et al. 2019a).
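The mismatch between a scaled-up SSF and an RSF can be seen in a toy example. The sketch below is an illustration under strong simplifying assumptions, not the analysis of Signer et al. (2017) or Michelot et al. (2019): a walker moves on a line of five cells, choosing at each step among its current cell and its immediate neighbours with probability proportional to exp(beta * covariate). Its long-run distribution is then compared with the distribution obtained by applying the same coefficient directly as an RSF. The covariate values and `beta` are hypothetical.

```python
import math

def step_selection_matrix(covariate, beta):
    """Transition matrix for a walker on a line of cells.

    From cell i the walker may stay or move to an adjacent cell, choosing
    among these options with probability proportional to exp(beta * x).
    """
    weights = [math.exp(beta * x) for x in covariate]
    n = len(weights)
    matrix = []
    for i in range(n):
        options = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
        total = sum(weights[j] for j in options)
        row = [0.0] * n
        for j in options:
            row[j] = weights[j] / total
        matrix.append(row)
    return matrix

def long_run_distribution(matrix, n_steps=5000):
    """Scale up the step model by iterating it from a uniform start."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(n_steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist

def naive_rsf_distribution(covariate, beta):
    """Distribution proportional to exp(beta * x), using the same coefficient."""
    weights = [math.exp(beta * x) for x in covariate]
    total = sum(weights)
    return [w / total for w in weights]

covariate = [0.0, 1.0, 0.0, 2.0, 0.5]  # hypothetical habitat values
beta = 1.0                             # hypothetical selection coefficient
ssf_scaled_up = long_run_distribution(step_selection_matrix(covariate, beta))
rsf_naive = naive_rsf_distribution(covariate, beta)
largest_gap = max(abs(a - b) for a, b in zip(ssf_scaled_up, rsf_naive))
```

In this toy setting the long-run occupancy of a cell depends on the habitat of its neighbours as well as its own, so the two distributions differ even though they share the same coefficient. Any method for joint inference has to reconcile this kind of discrepancy.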

7.6 Combination with mark-recapture data

Mark-recapture data have rarely been used to map seabird distributions and fit habitat models (Camphuysen et al. 2004); however, they are a potentially valuable repository of spatial data that are also individually referenced. In a sense, therefore, mark-recapture data carry information intermediate between point transects and telemetry tracking and could, in the longer term, benefit from current developments in the integration of these two data types (see Section 7.5).

7.7 Combination with non-spatial data

Although they carry no explicit spatial information, non-spatial data can nevertheless be valuable for spatial prediction when integrated into SDMs. We have argued at several points above that SDMs can benefit from being embedded in the dynamics of the population and community that they refer to. There is now a consistent move to think more mechanistically about the constraints on species distributions by connecting them to those other aspects of ecology (Morales et al. 2010, Ehrlén and Morris 2015b, Matthiopoulos et al. 2015, Mcloughlin et al. 2018, Zipkin and Saunders 2018, Yen et al. 2019).


