
Research & Development

Posted by Ben Hughes

The 'Watches' (Springwatch, Autumnwatch, Winterwatch) are the home of birdwatching and UK wildlife at the BBC. However, alongside their team of naturalists, the programmes have started to rely on another kind of birdwatcher: artificial intelligence (AI). Machine learning tools provided by BBC R&D are now used to photograph and identify birds from remote cameras around the country. Now, as part of our work in explainable machine learning, we have created an interactive tool for identifying garden birds and finding out what a machine really knows about birdwatching.

The Intelligent Production Tools team have worked with the BBC's Spring/Autumn/Winterwatch, capturing images and videos of wildlife using artificially intelligent camera traps. Part of their work has also covered image classification: taking the captured image and using machine learning to automatically identify what species of animal it contains.

In our team, we’ve been researching different ways of explaining machine learning (ML) to non-specialists. Our work on wildlife and machine learning started as my placement project, but we quickly realised there was a more general opportunity to provide a useful machine learning-based bird identification tool, while also experimenting with explainable machine learning techniques.

Image classification is a well-studied area of explainable ML, and comes with the possibility of using accompanying visuals to provide insight into how a classification is made. There is also a human precedent for using specific visual features to identify birds - bird watching and bird identification guides. Garden birds are something the audience are familiar with, particularly with the focus on gardens and wildlife that nationwide home isolation prompted.

The end result was a collaboration between many of us in the team: Jess Bergs and I focused on creating the ML model and deploying it as an API, while working with David Man and Tristan Ferne on the design and words of the explainer. Alicia Grandjean and Jess worked on the design and implementation of the explainer interface.

 

A Machine’s Guide to Birdwatching


“A Machine’s Guide to Birdwatching” consists of two parts: a tool for identifying British garden birds in uploaded images, and an explainer to explore how and why the machine has made a given prediction.

The tool allows a user to upload a photograph of a bird, or use preloaded examples. The chosen image is processed and the user is provided with a prediction of what bird the image contains, along with other likely candidates.

Bird identification tool

So how are these predictions made?

The tool uses a convolutional neural network (CNN) - a machine learning algorithm that is commonly used in image classification problems. An image classification CNN takes a digital image as an input. The image is processed through multiple convolutional layers: statistical operations which are tuned to be activated by certain patterns and features in an image. Finally, the algorithm calculates which class of image has been activated most strongly.
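To make that concrete, here is a minimal, illustrative sketch in PyTorch of what an image classification CNN looks like in code. The toy network below is far smaller than the model we actually used and its layer sizes are arbitrary; it is only meant to show how convolutional layers feed into a final set of class scores.

```python
import torch
import torch.nn as nn

# A toy CNN classifier, purely for illustration - the real model is far deeper.
# Each convolutional layer is activated by increasingly complex image features;
# the final linear layer scores each bird class.
class TinyBirdCNN(nn.Module):
    def __init__(self, num_classes=70):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x):                  # x: a batch of 224x224 RGB images
        x = self.features(x)               # convolutional layers pick out patterns
        x = torch.flatten(x, start_dim=1)
        return self.classifier(x)          # one score ("activation") per class

scores = TinyBirdCNN()(torch.randn(1, 3, 224, 224))
print(scores.argmax(dim=1))                # index of the most strongly activated class
```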

The learning in machine learning comes from the fact that the model is trained to be activated by image features that lead to more correct predictions. In training, the algorithm is provided with images of different target classes, each labelled with a ground truth classification. Throughout the training the parameters in each convolutional layer are optimized to improve the accuracy of the classifications.

Rather than training a model from scratch, a common technique is to make use of transfer learning, when you take a model that has been pretrained on a general problem, swap the output classes with the ones you want, and retrain the model. This means that the model retains what it knows about shapes and patterns, but can quickly apply this to new subsets of data.

For our bird classifier, we used a pretrained ResNet34 architecture, a CNN model with 34 convolutional layers that is trained on ImageNet, a database of more than 14 million annotated images scraped from the web. The classes included in this database range from everyday objects to unusual plants and creatures.

To create the bird classifier, we removed the final layer of the pretrained ResNet and replaced it with the image classes we wanted to identify: 70 British garden bird species. In order to retrain the model we needed labeled data containing images of these garden bird species. This was provided by the iNaturalist dataset, a large collection of wildlife image data collected and identified by citizen scientists. Unlike many wildlife image collections, the iNaturalist dataset contains many labeled European birds, including nearly 300 British bird species.
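In code, swapping the output layer is a small change. The sketch below shows roughly how this can be set up with torchvision's pretrained ResNet34; freezing the earlier layers is one common transfer learning choice rather than a description of our exact configuration.

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet34 pretrained on ImageNet (1000 classes), then swap its final
# fully connected layer for one covering 70 garden bird species.
model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 70)

# A common transfer learning choice: freeze the earlier layers so that,
# at least initially, only the new output layer is tuned.
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True
```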

We then retrained the model using these bird-specific images, tuning the convolutional layer parameters of the pretrained ResNet34 network. Using transfer learning means that the layers that are already able to recognize useful features don’t have to be retrained from scratch. Instead, most of the optimization happens in the last few layers, where combinations of these features are weighted to differentiate between specific bird species. Transfer learning reduces training time and means that less bird-specific training data is needed.

The training process is controlled using PyTorch, a machine learning library for Python. Training an image classifier of this kind involves a large amount of graphics processing, so we opted to run training cycles in the cloud on virtual machines with high-performance GPUs. The classifier networks produced in this way took 1-2 hours to train fully.
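A fine-tuning loop in PyTorch might look something like the sketch below. The hyperparameters are illustrative, and `train_loader` is assumed to be a standard PyTorch DataLoader yielding batches of bird images and their ground truth labels.

```python
import torch
import torch.nn as nn

# Minimal fine-tuning loop; `model` is the modified ResNet34 from above and
# `train_loader` is assumed to yield (image batch, label batch) pairs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(10):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # compare predictions to ground truth
        loss.backward()                          # gradients for every trainable parameter
        optimizer.step()                         # nudge parameters to reduce the loss
```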

The result is a model that calculates the probability of an image containing any one of the possible bird species. In the first part of the tool, we relay the bird species with the highest probabilities for an image back to the user.
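In practice, the raw class scores are converted to probabilities with a softmax and the top few are returned. A minimal sketch, assuming `image_tensor` is a single preprocessed image:

```python
import torch
import torch.nn.functional as F

# Converting raw class scores into the probabilities shown to the user.
# `image_tensor` is assumed to be one preprocessed image of shape (1, 3, 224, 224).
model.eval()
with torch.no_grad():
    probabilities = F.softmax(model(image_tensor), dim=1)

top_probs, top_classes = probabilities.topk(5, dim=1)  # five most likely species
```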

Prototype software running on a Raspberry Pi

An early prototype, running on a Pi 3

 

Long-tailed birds

Machine learning algorithms of this kind operate like an opaque box. You see what data goes in and get a result back, but you’re not given any indication of how the decision is made, or whether it can be trusted. We explore this in the second part of the “Machine’s Guide to Birdwatching” prototype.

One approach to solving this problem is to try and explain why an individual image is given a certain classification. In the explainer, we use saliency maps to estimate which regions of the input image contribute most to the given classification. Class saliency maps are produced by calculating the gradient of the output class score back through the convolutional layers of the network. This can give an indication of the areas of the image that most strongly activate the network for that class.

In our explainer, these saliency maps can be interpreted as ‘where the machine was looking’ in order to predict a given species of bird. We used a PyTorch library called FlashTorch, created by BBC R&D’s Misa Ogura, to produce the image-specific class saliency maps described above.
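A rough sketch of generating a saliency map with FlashTorch's Backprop interface is shown below; the image path and class index are placeholders rather than values from our tool.

```python
from flashtorch.utils import load_image, apply_transforms
from flashtorch.saliency import Backprop

# 'bird.jpg' and the target class index are placeholders for illustration.
image = apply_transforms(load_image('bird.jpg'))          # load and preprocess the photo
backprop = Backprop(model)                                 # wrap the trained classifier
backprop.visualize(image, target_class=12, guided=True)    # plot the class saliency map
```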

When a correct prediction has been made, saliency maps are great for showing what elements of an image the model considers typical of the bird species. Take the correctly classified magpie below - the saliency map has picked out the overall body shape and its distinctive feather pattern.

Saliency map for a magpie

Image maps of this kind can also help explain an incorrect classification, or alert you to a mistake that has gone unnoticed. This image of a long-tailed tit was incorrectly classified as a house sparrow by our tool. The incorrect classification could be due to a number of factors - the busy background, reflections in the water, or blurry movement of the bird captured on camera.

But when we look at the FlashTorch map, we see that the main feature that distinguishes the long-tailed tit from other tits, its distinctively long tail feathers, has not been focused on at all. This provides another explanation of why the mistake was made, and an insight into what is going on inside the otherwise opaque machine learning model.

Saliency map of a long-tailed tit

Long-tailed data


Tracing how a model predicts an individual image is useful, but the rest of the “Machine’s Guide to Birdwatching” explainer deals with the consequences of the training data on overall predictions that the model makes.

The training data is the most important building block of a machine learning model. Because they have been collected and classified by domain experts, the images and ground truth labels of the iNaturalist dataset we used are of high quality. However, this citizen science approach leads to very different numbers of photos of each species in the dataset, depending on which birds are most commonly photographed.

The plot below shows all 283 species of British birds in the dataset and the number of images of each bird that the dataset contains. You can see that although there are some species with over 1000 images, most have far fewer. In fact, nearly half the bird species have fewer than 100 images available to train on, with some having as few as 20 samples.

Plot of frequency of birds in the dataset

The phenomenon of having a few classes with lots of training data and lots of classes with very little training data is known as a long-tail problem, and is a type of unbalanced data.

We explore how unbalanced data can lead to issues in the next section of the explainer. For our prototype, we restricted the classes used to the 70 most common UK garden birds, according to the RSPB's annual Big Garden Birdwatch survey. This eliminated some of the less common birds, but still maintained a dataset with a long-tail distribution.

In the explainer, we visualize this data as interactive bubbles. Each circle represents one of the 70 garden birds in our dataset. The area of each bird bubble is proportional to the number of training images for the given bird.

Bubble visualisation of the dataset

Ducks, starlings and sparrows feature heavily in the dataset

An important part of producing any statistical model is testing how well it performs. A common way to do this is to set aside a percentage of the dataset that won’t be used for training, and then examine how well the model performs on this unseen data. An overall accuracy can be calculated: the percentage of how many images in the test dataset are classified correctly.
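As a sketch, computing an overall accuracy on held-out data looks something like this, assuming `test_loader` yields batches of unseen test images and labels:

```python
import torch

# Overall accuracy on held-out data; `test_loader` yields unseen (image, label) batches.
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for images, labels in test_loader:
        predictions = model(images).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)

print(f"Overall accuracy: {correct / total:.1%}")
```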

To probe whether unbalanced data is an issue, we tested the model on 20 unseen images of each bird class, and calculated an accuracy for each one.

Sorted bubble visualisation of the dataset

The plot shown in the explainer reveals a trend: the model is more accurate at classifying birds with more training images. A model systematically performing differently for different classes is an example of algorithmic bias. Algorithmic bias can easily arise from unbalanced data and can reproduce and amplify social and real-world biases that are contained in a dataset.

Dealing with unbalanced data


So an important question for a machine learning developer should be: how do I avoid unbalanced data biasing my algorithm?

The first and most effective answer is to use a different, more balanced, dataset. Although there is no such thing as a perfect dataset, the decision of what data to use has the most impact on how biased a model ends up being. However, datasets are expensive and time-consuming to produce, so options are often limited for a given domain. In our case, the unbalanced iNaturalist dataset was the best dataset available to provide enough training images of British bird species.

During the project we experimented with resampling techniques to minimize the negative effect of unbalanced data on the final model. Undersampling and oversampling are approaches that aim to vary the amount of data used per class in each training cycle.

Undersampling - reducing the amount of data sampled from larger classes - can be achieved by setting a limit on how many images in each class are used. This random discarding of data may throw away useful information, but it can be a simple way to produce more balanced data.

Oversampling is the process of increasing the amount of useable training data in classes that contain fewer images. In the first instance, this can be done by duplicating images in the dataset; in effect, showing the algorithm the same data multiple times in each training cycle. Training on duplicates of an image can increase the performance on the duplicated class. However, duplicating small amounts of data can easily cause the model to over-fit and not generalize to unseen images of the smaller class.
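One common way to implement oversampling in PyTorch is with a WeightedRandomSampler, which draws images from small classes more often than their share of the dataset would suggest. The sketch below assumes `targets` is the list of class labels for the training images and `train_dataset` is the corresponding dataset; it illustrates the general technique rather than our exact setup.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Oversampling via weighted sampling: each image is drawn with a probability
# inversely proportional to its class size, so small classes appear more often
# (with repetition) in each training cycle.
class_counts = torch.bincount(torch.tensor(targets))   # images per bird species
sample_weights = 1.0 / class_counts[targets]           # one weight per training image

sampler = WeightedRandomSampler(sample_weights, num_samples=len(targets), replacement=True)
train_loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)
```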

To reduce over-fitting, data augmentation can be used. In the case of images, this can involve randomly cropping, resizing, rotating and changing the brightness or exposure of the training images. When oversampling, this means that the same image can be reused multiple times with slightly different visual transforms applied on each occasion.
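A typical torchvision augmentation pipeline might look like the sketch below; the specific transforms and their parameters are illustrative rather than the exact ones we used.

```python
from torchvision import transforms

# Each time an image is drawn for training it is randomly cropped, flipped,
# rotated and colour-jittered, so a duplicated image never looks identical twice.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
```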

There are also more advanced methods of data augmentation where completely new data can be synthesized from existing data in a class. Synthetic Minority Over-sampling Technique (SMOTE) is a common technique that generates new training examples by interpolating between existing pieces of under-represented real data. While training the model, we experimented with these resampling techniques and plan to implement them when creating classification models for production.
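As an aside, SMOTE operates on feature vectors rather than raw images, so in an image pipeline it would typically be applied to extracted feature embeddings rather than pixels. Using the imbalanced-learn library, the basic call looks like this, where `X` is an array of feature vectors and `y` the class labels:

```python
from imblearn.over_sampling import SMOTE

# X: (n_samples, n_features) array of feature vectors, y: class labels.
# SMOTE synthesizes new minority-class samples by interpolating between neighbours.
X_resampled, y_resampled = SMOTE().fit_resample(X, y)
```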

When assessing our machine learning model, attention also needed to be paid to the test dataset. By taking a percentage of the data to be used as a test dataset, we would end up testing the model on data with the same unbalanced distribution as seen in the training data. An overall accuracy on this test data would disproportionately measure how well the model classifies the birds that are most common in the dataset.

One way of compensating for this issue in testing is by using evaluation metrics that take into account precision and recall, like the F-score, or looking at per-class metrics, like an averaged per-class accuracy. However, the approach we took was to create a balanced test dataset, with every bird class having 20 images that the model is tested on. This was done by undersampling large classes and supplementing the test images of smaller classes with additional images sourced from alternative datasets. We could then use standard accuracy metrics on the balanced test set in order to compare different models and explore the class bias in the explainer.
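Computing per-class accuracy on such a balanced test set is straightforward; a sketch, reusing the hypothetical `test_loader` from above:

```python
import torch
from collections import defaultdict

# Per-class accuracy on the balanced test set (20 unseen images per species).
correct = defaultdict(int)
seen = defaultdict(int)

model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        predictions = model(images).argmax(dim=1)
        for label, prediction in zip(labels.tolist(), predictions.tolist()):
            seen[label] += 1
            correct[label] += int(label == prediction)

per_class_accuracy = {species: correct[species] / seen[species] for species in seen}
```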

Back to school


The “Machine’s Guide to Birdwatching” explainer has currently been shared internally in order to get feedback from those interested in machine learning, explainable AI and outreach at the BBC. In Spring 2021, we took the explainer into the (virtual) classroom, leading lessons with GCSE and A-Level Computer Science classes. We let the students try out the tool, and created a lesson plan around this to explain the basics of machine learning and its possible issues.

Despite being in their last week of working from home, the students were excited to hear from us and asked many fantastic questions. We got to see them wonder why the rare bird they picked wasn’t being recognized, and explain that the machine can only recognize what it is trained on. Later, we got them thinking about the different places that artificial intelligence is used by the BBC and elsewhere in their own lives. The feedback from the students and teachers was positive and we hope the explainer can be used for more outreach sessions in the future.

The explainer provided a way to explore many different ideas within the team's research topic of making machine learning technologies explainable and controllable. In a single prototype, we provided explanations for specific predictions, exposed properties of the training data, and described how ML works in the abstract. This work, and other projects, will help to inform how we try to explain and educate about machine learning in future projects.


This post is part of the Internet Research and Future Services section
