Using AI to Classify Bird Sounds Amid Noise
Generative AI helps identify bird calls in noisy environments for better conservation.
Anthony Gibbons, Emma King, Ian Donohue, Andrew Parnell
― 6 min read
Table of Contents
- The Challenge of Identifying Bird Sounds
- What is Data Augmentation?
- Enter Generative AI Models
- The Data Collection Dilemma
- Building a Bird Sound Dataset
- Creating Spectrograms
- Generating Artificial Sounds
- Evaluating the Synthetic Sounds
- Training the Classifiers
- Potential Impacts of This Research
- Future Directions
- Conclusion
- Original Source
- Reference Links
In today’s world, technology has a knack for helping us understand nature better. One cool innovation is using generative AI to help classify bird sounds. Think of this as a high-tech version of trying to recognize the call of a blue jay from an audio clip. The twist? Sometimes, the sounds come from noisy places, like wind farms, where turbines spin and rustle the leaves.
The Challenge of Identifying Bird Sounds
Bird monitoring is crucial for checking how our ecosystems are doing. The variety of bird species gives us clues about environmental health. Birds help manage pests, spread seeds, and even pollinate plants. But how do we tell one bird from another when they sound so similar? Enter audio monitoring!
Traditionally, researchers would use folks with sharp ears to listen to hours of recordings and identify bird calls. This method is not only time-consuming but also costly, as it requires expert knowledge. Nowadays, many researchers have turned to computer programs that can listen and classify bird calls for them. But there’s a catch. The accuracy of these programs can sometimes be shaky, especially when there’s a lot of background noise.
What is Data Augmentation?
Here’s where data augmentation steps in, like a friendly sidekick. Imagine you want to train a computer program to recognize bird sounds. You need lots of examples, or data. Since obtaining expert-annotated data can be tough, data augmentation helps by artificially increasing the variety of sounds available. It’s kind of like making a smoothie, where you mix fruits to create something deliciously different.
But here’s the rub: the techniques that work great for photos, like flipping or rotating, don’t always translate well to sound. After all, can you really flip a bird call?
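Audio-native augmentations do exist, though. Here is a minimal numpy sketch of two common ones, noise injection and time shifting, applied to a synthetic tone standing in for a bird-call clip (the SNR value and shift fraction are illustrative choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
# 1-second, 16 kHz synthetic tone standing in for a real bird-call clip
clip = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000, endpoint=False))

def add_noise(y, snr_db=20):
    # mix in white noise scaled to a chosen signal-to-noise ratio
    noise = rng.standard_normal(len(y))
    scale = np.sqrt(np.mean(y**2) / (10 ** (snr_db / 10) * np.mean(noise**2)))
    return y + scale * noise

def time_shift(y, max_frac=0.1):
    # circularly shift the clip by up to 10% of its length
    limit = int(max_frac * len(y))
    shift = int(rng.integers(-limit, limit + 1))
    return np.roll(y, shift)

augmented = [add_noise(clip), time_shift(clip)]
print(len(augmented), augmented[0].shape)
```

Each call produces a new variant of the same clip, so a small labelled set can be stretched into a larger training set without new field recordings.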
Enter Generative AI Models
To tackle this issue, scientists started using generative AI models. These models can create new sounds that mimic real ones. Two popular methods include Auxiliary Classifier Generative Adversarial Networks (ACGAN) and Denoising Diffusion Probabilistic Models (DDPMs).
Auxiliary Classifier Generative Adversarial Networks (ACGAN)
Think of ACGANs as a pair of rivals in a game. One part, the generator, tries to create convincing bird sounds, while the other part, the discriminator, tries to tell the real sounds from the fake ones. They get better through competition. By adding class information, or what kind of bird sound it is, ACGANs can make more realistic examples.
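As a rough sketch of the idea (not the paper’s actual architecture), here is a toy ACGAN in PyTorch: the generator is conditioned on a species label, and the discriminator has two heads, one scoring real-vs-fake and an auxiliary head predicting the species. The layer sizes and the 32x32 patch shape are illustrative:

```python
import torch
import torch.nn as nn

N_CLASSES = 27   # bird species, as in the study's dataset
LATENT = 100     # illustrative noise-vector size

class Generator(nn.Module):
    # maps (noise, species label) -> fake spectrogram patch
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_CLASSES, LATENT)
        self.net = nn.Sequential(
            nn.Linear(LATENT, 256), nn.ReLU(),
            nn.Linear(256, 32 * 32), nn.Tanh(),
        )

    def forward(self, z, labels):
        # condition the noise on the class embedding
        return self.net(z * self.embed(labels)).view(-1, 1, 32, 32)

class Discriminator(nn.Module):
    # two heads: real/fake score plus auxiliary species prediction
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 32, 256), nn.LeakyReLU(0.2),
        )
        self.adv = nn.Linear(256, 1)          # real vs fake
        self.aux = nn.Linear(256, N_CLASSES)  # which species

    def forward(self, x):
        h = self.body(x)
        return self.adv(h), self.aux(h)

G, D = Generator(), Discriminator()
z = torch.randn(4, LATENT)
labels = torch.randint(0, N_CLASSES, (4,))
fake = G(z, labels)
adv_score, class_logits = D(fake)
print(fake.shape, adv_score.shape, class_logits.shape)
```

Training alternates between the two networks: the discriminator is rewarded for catching fakes and naming species correctly, the generator for fooling it.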
Denoising Diffusion Probabilistic Models (DDPM)
On the other hand, DDPMs take a different approach. They start with random noise and gradually refine it. Picture it as starting with a rough draft of a drawing and slowly adding detail until it resembles the final masterpiece. Through a series of steps, they create high-quality images resembling spectrograms, which visually represent sound.
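The forward (noising) half of a DDPM has a simple closed form that a few lines of numpy can demonstrate. The linear noise schedule below follows the original DDPM paper; the random array is a stand-in for a spectrogram:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # standard linear noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def q_sample(x0, t, rng):
    # closed-form forward process: x_t = sqrt(a_bar_t)*x0 + sqrt(1-a_bar_t)*eps
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 64))   # stand-in "clean spectrogram"
x_mid = q_sample(x0, 500, rng)
x_end = q_sample(x0, 999, rng)

# correlation with the clean image decays as the step t grows
c_mid = np.corrcoef(x0.ravel(), x_mid.ravel())[0, 1]
c_end = np.corrcoef(x0.ravel(), x_end.ravel())[0, 1]
print(round(c_mid, 3), round(c_end, 3))
```

By the final step the sample is essentially pure noise; the learned model then runs this process in reverse, step by step, to draw new spectrograms from noise.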
The Data Collection Dilemma
For their research, scientists collected audio from five wind farm locations in Ireland. Since these sites can be noisy, separating the bird sounds from all that background racket is like trying to pick out a song on a crowded bus. The team recorded around 640 hours of audio. That’s a lot of listening!
They then fed the audio into BirdNET, a well-established bird call classification program, to identify the sounds. After running their analysis, they ended up with over 67,000 detections! The catch: they kept only detections that BirdNET made with a high level of confidence.
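Conceptually, that confidence filter is just a threshold over BirdNET’s per-detection scores. A toy sketch (the records and the 0.85 cutoff are made up for illustration):

```python
# Hypothetical detection records, in the spirit of BirdNET's per-clip output
detections = [
    {"species": "Eurasian Wren", "confidence": 0.92},
    {"species": "European Robin", "confidence": 0.45},
    {"species": "Common Chaffinch", "confidence": 0.88},
]

THRESHOLD = 0.85  # illustrative cutoff; keep only highly confident detections
confident = [d for d in detections if d["confidence"] >= THRESHOLD]
print([d["species"] for d in confident])  # ['Eurasian Wren', 'Common Chaffinch']
```

Trading recall for precision this way gives the generative models a cleaner set of labels to learn from.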
Building a Bird Sound Dataset
Using the identified sounds, the team filtered the data to include only those bird species with enough examples. Ultimately, they had 8,248 audio clips covering 27 different bird species. Those clips were then used to train the classification models, split into training and validation sets.
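That filter-and-split step can be sketched in a few lines. The species names, clip counts, 50-clip minimum, and 80/20 split below are all invented for illustration:

```python
from collections import Counter
import random

# hypothetical clip labels after BirdNET confidence filtering
clips = ["wren"] * 120 + ["robin"] * 80 + ["skylark"] * 3

MIN_CLIPS = 50  # drop species without enough examples to learn from
counts = Counter(clips)
kept = [c for c in clips if counts[c] >= MIN_CLIPS]

random.seed(42)
random.shuffle(kept)
split = int(0.8 * len(kept))           # 80/20 train/validation split
train, val = kept[:split], kept[split:]
print(sorted(set(kept)), len(train), len(val))
```

Rare species like the toy “skylark” above fall below the threshold, which is exactly the gap that synthetic samples are meant to fill.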
Creating Spectrograms
To turn these audio clips into something the generative models could handle, the team converted the sounds into mel spectrograms. This visual representation shows how the sound energy is distributed over time and frequency. It’s like turning music into a colorful wave painting.
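A mel spectrogram can be computed from scratch with numpy and scipy, as sketched below. Real pipelines would typically lean on a library such as librosa; the FFT size and mel-band count here are illustrative defaults, not the paper’s settings:

```python
import numpy as np
from scipy.signal import spectrogram

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # triangular filters evenly spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / (c - l)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / (r - c)
    return fb

def mel_spectrogram(y, sr, n_fft=1024, n_mels=64):
    _, _, S = spectrogram(y, fs=sr, nperseg=n_fft, noverlap=n_fft // 2)
    mel = mel_filterbank(sr, n_fft, n_mels) @ S   # weight power into mel bands
    return 10.0 * np.log10(mel + 1e-10)           # log scale, in dB

# 3-second synthetic rising chirp standing in for a bird-call clip
sr = 22050
t = np.linspace(0, 3, 3 * sr, endpoint=False)
y = np.sin(2 * np.pi * (2000 + 1000 * t) * t)
M = mel_spectrogram(y, sr)
print(M.shape)  # (mel bands, time frames)
```

The result is a 2-D array, which is why image-generating models like GANs and diffusion models can be applied to sound at all.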
Generating Artificial Sounds
Once the real data was set, the team set out to generate more samples using ACGANs and DDPMs. Initially, they found that while ACGAN generated samples with some recognizable features, they often focused too much on background noise. Meanwhile, the sounds created by the DDPMs were more varied and clear.
Evaluating the Synthetic Sounds
To determine how well each method performed, the scientists used two metrics: the Inception Score (IS) and the Fréchet Inception Distance (FID). A higher IS means the generated spectrograms are both recognizable and diverse, while a lower FID means their distribution more closely matches that of the real ones.
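FID in particular has a compact closed form over feature means and covariances. The sketch below computes it on synthetic Gaussian features; real pipelines first extract those features with a pretrained Inception network, which is omitted here:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a, feats_b):
    # FID = ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2*(C_a C_b)^(1/2))
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    c_a = np.cov(feats_a, rowvar=False)
    c_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(c_a @ c_b).real
    return float(((mu_a - mu_b) ** 2).sum() + np.trace(c_a + c_b - 2 * covmean))

rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, (500, 8))
close = rng.normal(0.1, 1.0, (500, 8))  # distribution near the real one
far = rng.normal(2.0, 1.0, (500, 8))    # distribution far from it
print(fid(real, close) < fid(real, far))  # True: lower FID = more similar
```

Because FID compares whole distributions rather than individual samples, it rewards generators that capture both the typical call and its natural variation.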
Training the Classifiers
After determining the quality of the generated sounds, the team then trained various classification models on the real and synthetic data. They used established architectures such as MobileNetV2 and ResNet18. The goal was to see how the addition of synthetic sounds influenced the models’ performance.
The results were promising! When they added synthetic DDPM samples to the training data, performance improved: the classifiers reached 92.6% accuracy on the validation set, up from 90.5% with the real data alone.
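The effect of augmentation can be illustrated with a toy stand-in: train one classifier on scarce “real” clips and another on real plus synthetic clips, then compare validation accuracy. Everything below (the 2-D features, cluster centres, and clip counts) is invented for illustration, and a plain logistic regression stands in for the paper’s CNN classifiers:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_clips(n_per_class, spread=1.0):
    # toy stand-in for spectrogram features: 3 "species" as Gaussian clusters
    X, y = [], []
    for label, centre in enumerate([(-2, 0), (0, 2), (2, 0)]):
        X.append(rng.normal(centre, spread, (n_per_class, 2)))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

X_real, y_real = make_clips(15)    # scarce real training clips
X_syn, y_syn = make_clips(200)     # plentiful synthetic clips
X_val, y_val = make_clips(300)     # held-out validation set

clf_real = LogisticRegression().fit(X_real, y_real)
clf_aug = LogisticRegression().fit(
    np.vstack([X_real, X_syn]), np.concatenate([y_real, y_syn]))

acc_real = accuracy_score(y_val, clf_real.predict(X_val))
acc_aug = accuracy_score(y_val, clf_aug.predict(X_val))
print(acc_real, acc_aug)
```

The sketch captures the paper’s premise: when real examples are scarce, convincing synthetic ones can give the classifier a more complete picture of each class.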
Potential Impacts of This Research
The implications of this research are exciting. By enhancing bird sound classification with synthetic data, researchers can improve conservation efforts. Better identification leads to more effective monitoring of bird species, aiding in biodiversity preservation.
Future Directions
While the study showed great promise, the scientists acknowledged some limitations. They noted the need for automatic data pruning to filter out less convincing synthetic samples. Furthermore, they wanted more controllable generation to create specific types of sounds based on different parameters.
Conclusion
In a nutshell, this study demonstrates that generative AI can significantly aid in the classification of bird sounds, particularly in challenging environments. By enhancing data collection methods with synthetic sounds, researchers can better understand and protect bird species.
And to bring it all back home—if computers can help us sort out the symphonies of nature, maybe the next time you hear a bird call in your backyard, you can be a little less bird-brained and a little more bird-wise!
Original Source
Title: Generative AI-based data augmentation for improved bioacoustic classification in noisy environments
Abstract: 1. Obtaining data to train robust artificial intelligence (AI)-based models for species classification can be challenging, particularly for rare species. Data augmentation can boost classification accuracy by increasing the diversity of training data and is cheaper to obtain than expert-labelled data. However, many classic image-based augmentation techniques are not suitable for audio spectrograms. 2. We investigate two generative AI models as data augmentation tools to synthesise spectrograms and supplement audio data: Auxiliary Classifier Generative Adversarial Networks (ACGAN) and Denoising Diffusion Probabilistic Models (DDPMs). The latter performed particularly well in terms of both realism of generated spectrograms and accuracy in a resulting classification task. 3. Alongside these new approaches, we present a new audio data set of 640 hours of bird calls from wind farm sites in Ireland, approximately 800 samples of which have been labelled by experts. Wind farm data are particularly challenging for classification models given the background wind and turbine noise. 4. Training an ensemble of classification models on real and synthetic data combined gave 92.6% accuracy (and 90.5% with just the real data) when compared with highly confident BirdNET predictions. 5. Our approach can be used to augment acoustic signals for more species and other land-use types, and has the potential to bring about a step-change in our capacity to develop reliable AI-based detection of rare species. Our code is available at https://github.com/gibbona1/SpectrogramGenAI.
Authors: Anthony Gibbons, Emma King, Ian Donohue, Andrew Parnell
Last Update: 2024-12-02
Language: English
Source URL: https://arxiv.org/abs/2412.01530
Source PDF: https://arxiv.org/pdf/2412.01530
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arxiv for use of its open access interoperability.
Reference Links
- https://github.com/gibbona1/SpectrogramGenAI
- https://doi.org/10.1111/j.1365-2664.2011.02094.x
- https://doi.org/10.1002/ecs2.2673
- https://doi.org/10.1111/2041-210X.12060
- https://doi.org/10.1111/2041-210X.13101
- https://doi.org/10.1007/s11284-017-1509-5
- https://doi.org/10.1111/2041-210X.14003
- https://doi.org/10.1111/2041-210X.13436
- https://doi.org/10.1111/2041-210X.14239
- https://doi.org/10.1016/j.ecoinf.2023.102321
- https://doi.org/10.1016/j.ifacol.2019.12.406
- https://doi.org/10.1016/j.neunet.2020.09.016
- https://doi.org/10.3390/biology12060854
- https://doi.org/10.1111/2041-210X.13334
- https://doi.org/10.1111/2041-210X.14125
- https://arxiv.org/abs/2006.11239
- https://doi.org/10.48550/arXiv.2210.04133
- https://doi.org/10.1016/j.imu.2024.101575
- https://arxiv.org/abs/1711.00937