This study examines how gestures affect learning from virtual agents.
― 6 min read
Learn about online speaker diarization and its significance in various applications.
― 6 min read
New benchmark tool assesses discrete audio tokens for various speech processing tasks.
― 8 min read
A new method for music generation using self-similarity matrices and attention systems.
― 7 min read
New techniques improve guitar amplifier modeling using unpaired data and GANs.
― 7 min read
A new method for understanding how audio models make predictions.
― 5 min read
Introducing spatial voice conversion to enhance audio realism and immersion.
― 6 min read
Research explores how speech analysis can predict suicide risk, considering gender differences.
― 5 min read
This paper presents a system to create visuals that respond to music.
― 7 min read
A new system helps robots learn tasks using audio from real-life demonstrations.
― 7 min read
New methods improve accuracy in recognizing overlapping sounds across diverse audio sources.
― 6 min read
A new method combines acoustic features and confidence scores for better error correction.
― 5 min read
SecureSpectra offers a new way to safeguard audio identity against deepfake threats.
― 5 min read
Combining physics and geometry for improved acoustic scattering predictions.
― 5 min read
A new system for accurate and fast speech translation across multiple languages.
― 6 min read
A simple method to create voices and control emotions in speech synthesis.
― 5 min read
Improving MMDenseNet for quick and efficient music separation.
― 5 min read
A new method improves machine dialogue through pseudo-stereo data.
― 6 min read
This study presents a dataset and method to enhance Chinese ASR accuracy using Pinyin.
― 7 min read
Innovative techniques improve loudspeaker design and sound direction.
― 4 min read
This study focuses on improving detection of deepfake audio using advanced methods.
― 5 min read
Using visual interfaces and models to enhance music generation.
― 5 min read
A new framework for creating synchronized sound effects in videos.
― 6 min read
A study on enhancing audio segmentation by integrating speaker embeddings.
― 5 min read
This article introduces a more efficient TTS system that adapts to speakers.
― 5 min read
New methods improve speech models for languages with limited data.
― 5 min read
Understanding uncertainty boosts the accuracy of emotion recognition in real-world scenarios.
― 6 min read
A new method enhances phoneme alignment accuracy for various speech applications.
― 5 min read
A study on translating Nigerian English for better accessibility in Nollywood films.
― 6 min read
This article presents a dual encoder system for effective speech representation learning.
― 6 min read
MelodyT5 offers a new approach to music creation and analysis using symbolic notation.
― 6 min read
GTZAN-synth dataset leverages synthetic music for better music tagging systems.
― 5 min read
MelodyLM simplifies music creation using text and voice inputs.
― 6 min read
SAVE model enhances audio-visual segmentation with efficiency and precision.
― 6 min read
New model improves speech-to-text translation using large language models.
― 6 min read
Research presents a model linking sound recordings to mouth movements for speech.
― 6 min read
This article discusses how Wav2Vec2.0 processes speech sounds using phonology.
― 5 min read
Improving speaker anonymization technology for nine languages to ensure privacy.
― 5 min read
Exploring technology's role in enhancing fish farming efficiency and welfare.
― 5 min read
A novel approach combines voice analysis with privacy protection for dementia detection.
― 6 min read