Automated video identification of marine species (AVIMS) - new application: report

A commissioned report on the development of a web-based computer application for machine learning-based (semi-)automated analysis of underwater video footage obtained during the monitoring of aquatic environments.


Object Detection

Following the recommendation of Blowers, Evans and McNally (2020), our application supports instance segmentation in addition to object detection. Whereas an object detection algorithm detects the presence of an object in an image and estimates its position and size as a rectangular bounding box, an instance segmentation algorithm proposes an outline surrounding the region of the image that the object covers. This can be advantageous where the object density is very high, as the rectangular region proposed by an object detector can cover several objects, making the intended target ambiguous.

Blowers, Evans and McNally (2020) outline technical details of some of the components described below. They discussed the Google Object Detection API, which is part of TensorFlow, as a potential approach for object detection. TensorFlow is one of a number of open source deep learning systems that are available.

Many of the computer vision libraries provided by deep learning toolkits implement the Faster R-CNN object detection algorithm, which proved to be a strong contender in Blowers, Evans and McNally (2020). We expect the various implementations to offer similar accuracy. One such implementation, which we have used in the past, is provided by the torchvision library (Torchvision 0.16 documentation (pytorch.org)), part of the PyTorch ecosystem (PyTorch - an open source machine learning framework (pytorch.org)). PyTorch is an open source deep learning toolkit developed by Facebook AI Research (FAIR) whose goals are very similar to those of TensorFlow. Torchvision also provides an implementation of the Mask R-CNN instance segmentation algorithm, which we wished to utilise. Our choice of PyTorch over TensorFlow was motivated by the ease of development with PyTorch in comparison to other systems and the ease with which it can be incorporated into the final deliverable application. Other systems, including TensorFlow, often require the installation of several dependent packages, complicating both the installation process and subsequent maintenance.

Consequently, in this application we utilise the Faster R-CNN implementation provided by torchvision. Torchvision provides a model that combines an ImageNet (ImageNet (image-net.org)) pre-trained ResNet-50 (Deep Residual Learning for Image Recognition (arxiv.org)) backbone with an FPN head (Feature Pyramid Networks for Object Detection (arxiv.org)) and is fine-tuned for object detection using the COCO dataset (COCO - Common Objects in Context (cocodataset.org)). We adapted this network by removing the final class prediction and bounding box regression layers and replacing them with new layers whose outputs match the object classes defined by the labelling schema designed by the AVIMS site users. The network was then fine-tuned using the training dataset.

The AVIMS site offers the option, and in fact defaults to it, of detecting objects as oriented ellipses rather than axis-aligned bounding boxes. Rather than using a dedicated oriented object detection network, we train a Mask R-CNN (Mask R-CNN (arxiv.org)) instance segmentation model. We convert the oriented ellipse labels to masks with oriented elliptical shapes, which are used as Mask R-CNN training targets. Our algorithm takes the predicted instance mask and computes the closest fitting oriented ellipse using the regionprops function provided by the scikit-image library (scikit-image: Image processing in Python (scikit-image.org)).

Contact

Email: craig.robinson@gov.scot
