Automated video identification of marine species (AVIMS) - new application: report

A commissioned report on the development of a web-based computer application for machine learning-based (semi-)automated analysis of underwater video footage obtained during the monitoring of aquatic environments.


Datasets and Experiments

During the development of the web application, we were provided with several video datasets intended to aid the development and testing of the various features of the AVIMS application described above. The datasets were supplied throughout the project and uploaded through the evolving prototypes of the web application. They were partially annotated by biologists at the Marine Directorate using AVIMS tools. This allowed us to produce initial machine learning models and to test further functionality of the application, including model training, results reporting and inference.

The datasets provided were split according to survey type, as described in the Methodology section, and included: Sea Pens, Rocky Reef, River – Infrared, River – Daylight, Vaki Light Boxes, Smolt Trawl, Fan Mussels and Horse Mussels.

It is important to emphasise that the machine learning experiments performed here were not meant to measure the algorithm performance that the Marine Directorate might realistically expect for each survey type in future, but rather to help us develop and debug the machine learning components and to teach future Marine Directorate users the most appropriate ways of creating datasets, training models and analysing results.

The numbers of labelled images in each of the above datasets were very low by the standards of modern deep-learning object detectors such as those used here. In most cases a dataset included no more than 200 image frames, which often came from one or just a few videos. Further, some classes in the chosen schemas were very poorly represented in the training sets (heavily imbalanced datasets). It is therefore not surprising that the resulting machine learning models could not generalise adequately, and the performance of the system, measured as the mean average precision of the object detector, was below the level we would like to see in a practical application.
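Class imbalance of this kind can be detected before training by counting labelled instances per class. The sketch below shows one minimal way to do this; the annotation record format and class names are illustrative assumptions, not the actual AVIMS schema or export format.

```python
from collections import Counter

# Hypothetical annotation records as (frame_id, class_label) pairs,
# similar to what an annotation tool might export. Labels are
# illustrative only.
annotations = [
    ("frame_001", "sea_pen_tall"),
    ("frame_001", "sea_pen_tall"),
    ("frame_002", "sea_pen_phosphorescent"),
    ("frame_003", "burrow"),
    ("frame_003", "sea_pen_tall"),
]

def class_frequencies(records):
    """Count labelled instances per class to reveal imbalance."""
    return Counter(label for _, label in records)

def flag_rare_classes(records, min_samples=20):
    """Return classes with fewer than `min_samples` labelled instances."""
    freq = class_frequencies(records)
    return sorted(c for c, n in freq.items() if n < min_samples)
```

A report of rare classes produced this way can guide annotators towards the classes that most need additional examples before a model is retrained.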

The most satisfactory initial results were obtained for counting fish in overhead in-river counters using the River Daylight and River Infrared datasets. The machine learning models were trained on datasets comprising 81 labelled fish and 61 otters (River Daylight) and 208 fish (River Infrared). Here, the mean average precision ranged from 9%, under a very challenging condition in which the test camera was not part of the training set, to 55% under an easier (and unrealistic) condition in which some frames from the training videos (although not the same frames as in the training set) were included in the test set. Analysis of the inference results confirmed that the system was able to track fish and otters. Some overcounting took place, usually in places where unusual water turbulence (unseen during training) was present, but such errors could likely be eliminated if more images were available for training.
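For readers unfamiliar with the metric, average precision for a single class can be computed from a confidence-ranked list of detections; mean average precision is then the mean over classes. The following is a minimal sketch that assumes detections have already been matched to ground-truth boxes (e.g. by an IoU threshold), which is how detector evaluation is conventionally set up.

```python
def average_precision(scored_detections, num_ground_truth):
    """Average precision for one class.

    `scored_detections` is a list of (confidence, is_true_positive)
    pairs; matching detections to ground-truth boxes is assumed to
    have been done already (e.g. via an IoU threshold).
    """
    dets = sorted(scored_detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    ap = 0.0
    for rank, (_, is_tp) in enumerate(dets, start=1):
        if is_tp:
            true_positives += 1
            ap += true_positives / rank  # precision at each recall step
    return ap / num_ground_truth if num_ground_truth else 0.0
```

For example, two correct detections ranked first and third out of three, against two ground-truth objects, give an average precision of (1/1 + 2/3) / 2 ≈ 0.83.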

For the Smolt Trawl and Vaki Light Boxes datasets, we observed that the chosen class schema (which included sex differentiation for salmon) was most likely too ambitious for the limited size of the training set; it consequently resulted in low mean average precision of the object detector and, ultimately, significant overcounting of fish in the videos. This shows that, even though fish were clearly visible in the videos provided (visibility was better than in the overhead in-river counters discussed above), having too many similar classes may significantly degrade the performance of the detector and consequently the tracker, particularly when the training set contains few images.
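One practical remedy, when the schema is too fine-grained for the available data, is to coarsen it by merging similar classes before training. The sketch below assumes annotations stored as (frame_id, label) pairs; the label names and mapping are hypothetical examples, not the actual AVIMS schema.

```python
# Hypothetical coarsening of an over-ambitious schema: collapse the
# sex-specific salmon labels into a single "salmon" class before
# training. Label names are illustrative only.
CLASS_MAP = {
    "salmon_male": "salmon",
    "salmon_female": "salmon",
}

def coarsen_labels(records, mapping):
    """Rewrite (frame_id, label) records, merging mapped classes.

    Labels absent from `mapping` are kept unchanged.
    """
    return [(frame, mapping.get(label, label)) for frame, label in records]
```

The merged schema pools the training examples of the original classes, which typically improves detection at the cost of losing the finer distinction; the fine labels can be reinstated once enough annotated images are available.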

The Sea Pens dataset proved particularly challenging. The image features are arguably more difficult for a human observer to spot than in the previous applications. The chosen class schema contained 10 classes, of which 7 had no more than 20 samples in the dataset. Consequently, the reported mean average precision of the object detector was low and the object tracking resulted in noticeable overcounting. The overcounting was particularly significant in those parts of the videos where image features related to disturbed sediment (not seen in the training set) were present.

The machine learning experiments performed to this stage have demonstrated that the developed features of the web application work correctly for all datasets. The performance of the machine learning algorithms does vary between survey types, and achieving practical accuracy will require careful design of the class schema for each survey type and the annotation of a greater number of images than was available during the development of this application.

Contact

Email: craig.robinson@gov.scot
