Latest Articles for Audio Processing

Sound Revolutionizing Audio Quality Measurement with PLCMOS

PLCMOS offers a new way to evaluate speech quality without human listeners.

2025-11-03T10:32:10+00:00 ― 5 min read

Sound Improving Speech Recognition with the Sidecar Approach

A new method combines speech recognition and speaker identification for overlapping speech.

2025-11-03T00:49:10+00:00 ― 5 min read

Audio and Speech Processing Advancements in Voice Conversion Technology

A new method for voice conversion improves clarity and adaptation.

2025-11-02T19:57:40+00:00 ― 6 min read

Computer Vision and Pattern Recognition Understanding Diffusion Models in Data Generation

Explore how diffusion models transform noise into valuable data outputs.

2025-11-02T15:17:28+00:00 ― 6 min read

Sound Advancements in Speech Separation with S4M

A new model improves voice isolation in noisy environments.

2025-11-02T10:14:40+00:00 ― 5 min read

Audio and Speech Processing DeCoR: A New Method for Audio Learning

DeCoR helps machines learn new sounds without forgetting old ones.

2025-11-01T21:17:20+00:00 ― 5 min read

Audio and Speech Processing Improving Speech Diversity in TTS Systems

A new method enhances the naturalness and variety of text-to-speech output.

2025-11-01T13:11:30+00:00 ― 5 min read

Audio and Speech Processing Advancements in Audio Classification with Treff Adapter

Treff adapter improves audio classification with limited labeled data.

2025-11-01T12:22:55+00:00 ― 5 min read

Audio and Speech Processing Advancements in Speech Emotion Recognition Using Speaker Embeddings

Research highlights effective methods for recognizing emotions in speech using embeddings.

2025-11-01T07:31:25+00:00 ― 6 min read

Computation and Language Analyzing Dialects Through Audio Processing

This research analyzes dialects using audio recordings to reveal their similarities.

2025-11-01T02:39:55+00:00 ― 6 min read

Sound Advancements in Audio Classification Techniques

A novel method enhances audio classification by learning new sounds efficiently.

2025-10-31T22:37:00+00:00 ― 4 min read

Audio and Speech Processing Improving Speech Disorder Alignment with New Techniques

A new method aligns disfluent speech with text efficiently.

2025-10-31T08:02:30+00:00 ― 5 min read

Sound Advancements in Weakly Supervised Keyword Spotting

A new method trains keyword spotting models with weak supervision in noisy environments.

2025-10-31T01:33:50+00:00 ― 6 min read

Sound MERT: A Self-Supervised Model for Music Understanding

MERT addresses music modeling challenges through innovative self-supervised learning techniques.

2025-10-30T23:56:40+00:00 ― 6 min read

Audio and Speech Processing AVLIT: Advancing Speech Separation in Noise

The AVLIT model combines sound and video for better speech clarity in noisy settings.

2025-10-30T18:16:35+00:00 ― 6 min read

Sound Advancing Voice Activity Detection with SVVAD

Discover how SVVAD improves voice activity detection for better speaker verification.

2025-10-30T09:22:10+00:00 ― 5 min read

Sound UnDiff: A New Approach to Audio Clarity

UnDiff enhances audio quality using innovative speech restoration techniques.

2025-10-29T16:21:55+00:00 ― 5 min read

Sound MW-MAE: A New Approach to Audio Learning

Discover the innovative Multi-Window Masked Autoencoder method for enhanced audio processing.

2025-10-29T11:30:25+00:00 ― 5 min read

Sound Improving Audio Restoration with Visual Cues

A novel method merges audio and visual data to repair missing speech.

2025-10-29T10:41:50+00:00 ― 6 min read

Audio and Speech Processing Real-Time Tracking of Singing Voices with SingNet

SingNet improves beat tracking in singing voices using past data.

2025-10-28T04:44:15+00:00 ― 6 min read

Audio and Speech Processing Reevaluating Speaker Anonymization and Vocoder Impact

A fresh look at speaker anonymization and the crucial role of vocoders.

2025-10-27T18:12:40+00:00 ― 5 min read

Sound Addressing the Challenge of Fake Audio Detection

A new method aims to improve fake audio detection without losing past knowledge.

2025-10-25T16:00:30+00:00 ― 6 min read

Sound LinDiff: A Leap Forward in Speech Synthesis

LinDiff, a new model, improves speech synthesis speed and quality.

2025-10-25T00:37:25+00:00 ― 4 min read

Sound Enhancing Speech Clarity in Noisy Environments

Techniques for improving speech recognition amid background noise.

2025-10-24T16:50:20+00:00 ― 5 min read

Audio and Speech Processing HiddenSinger: A New Era in Singing Voice Synthesis

HiddenSinger improves singing voice quality using advanced AI techniques.

2025-10-24T14:54:25+00:00 ― 5 min read

Sound Advancements in Electrolaryngeal Voice Conversion Technology

New methods improve speech clarity for electrolarynx users.

2025-10-24T13:17:15+00:00 ― 6 min read

Computation and Language Advancements in Automatic Speech Recognition for Norwegian Languages

Recent research improves ASR models for Norwegian, enhancing performance in Bokmål and Nynorsk.

2025-10-23T21:10:00+00:00 ― 4 min read

Sound Advancements in Speech Quality Improvement

Gesper framework enhances speech clarity in noisy environments.

2025-10-22T19:59:30+00:00 ― 5 min read

Sound A Simplified Approach to Hybrid HMM for ASR

This article discusses a new method for building efficient ASR systems.

2025-10-22T14:19:25+00:00 ― 5 min read

Sound Improving Audio Processing with SFI Layers

New algorithms enhance audio processing performance across varying sample rates.

2025-10-21T00:16:00+00:00 ― 5 min read

Sound Advances in Multitrack Music Transcription with Perceiver TF

A new model improves music transcription accuracy for multiple instruments.

2025-10-20T12:07:15+00:00 ― 5 min read

Sound Bringing AI to Music Creation on Bela

A guide to using AI models for music on the Bela platform.

2025-10-19T22:21:20+00:00 ― 5 min read

Sound Advancements in Voice Conversion Technology

A new model improves voice conversion by simplifying speech separation techniques.

2025-10-19T12:38:20+00:00 ― 6 min read

Sound Converting Mono Audio to Immersive Stereo

A new method transforms mono signals into engaging stereo experiences.

2025-10-17T01:31:45+00:00 ― 5 min read

Sound Addressing the Challenge of Audio Deepfakes

A new system enhances detection of manipulated audio through innovative techniques.

2025-10-16T15:00:10+00:00 ― 5 min read

Computation and Language Introducing LyricWhiz: Transforming Lyric Transcription

LyricWhiz combines advanced models to improve lyric transcription accuracy across languages.

2025-10-15T09:51:10+00:00 ― 5 min read

Machine Learning Addressing Dataset Imbalance in Audio Classification

This article discusses challenges and techniques for managing dataset imbalance in audio classification.

2025-10-15T00:08:10+00:00 ― 6 min read

Sound Advancements in Speech Recognition with Whisper-AT

Whisper-AT combines speech recognition and audio tagging for improved performance.

2025-10-12T08:10:05+00:00 ― 5 min read

Computation and Language Improving Speaker Diarization for Media Localization

A new method enhances speaker identification in film and TV localization.

2025-10-12T04:50:54+00:00 ― 5 min read

Sound Advancements in Automatic Piano Transcription

A new method improves the accuracy of turning piano audio into sheet music.

2025-10-11T14:21:15+00:00 ― 4 min read