A new model identifies funny moments in videos using visual, audio, and text data.

2025-08-30T23:09:25+00:00 ― 6 min read

Audio and Speech Processing

Advancements in Dielectric Elastomers for Technology

Dielectric elastomers convert electrical energy into mechanical motion, offering diverse applications.

2025-08-30T20:43:40+00:00 ― 7 min read

Computation and Language

Using ASR Technology to Aid Dementia Diagnosis

ASR transcripts with errors can help identify Alzheimer's more accurately.

2025-08-30T16:40:45+00:00 ― 7 min read

Computation and Language

Introducing ELLA-V: A New Chapter in Speech Synthesis

ELLA-V enhances text-to-speech quality and control, surpassing previous models.

2025-08-30T01:17:40+00:00 ― 5 min read

Quantitative Methods

New Method Enhances Acoustic Monitoring of Wildlife

A new approach improves animal call detection accuracy without arbitrary thresholds.

2025-08-29T23:46:39+00:00 ― 6 min read

Computation and Language

Advancing Speech Classification with Multimodal Data

A new model integrates audio and text for better speech classification.

2025-08-29T18:49:00+00:00 ― 6 min read

Sound

NOTSOFAR-1 Challenge: Advancing Meeting Transcription Technology

A new initiative to improve transcription technology for meetings in large rooms.

2025-08-29T16:23:15+00:00 ― 7 min read

Computation and Language

Advancements in Speech Recognition Error Correction

New methods enhance accuracy in noisy speech recognition using large language models.

2025-08-29T01:48:45+00:00 ― 6 min read

Sound

Understanding Laying Hen Vocalizations for Better Farming

Analyzing hen sounds helps improve their health and farm productivity.

2025-08-29T00:11:35+00:00 ― 7 min read

Human-Computer Interaction

Sound Unblending: A New Tool for Mixed Reality

A method to help the visually impaired recognize sounds in mixed reality.

2025-08-28T20:57:15+00:00 ― 5 min read

Audio and Speech Processing

Addressing Speech Technology Challenges for Under-Resourced Languages

This article discusses solutions for speech applications in languages with limited transcribed data.

2025-08-28T18:31:30+00:00 ― 6 min read

Machine Learning

Advancements in Sound Classification Using Projected Belief Networks

Researchers combine generative and discriminative methods for improved sound classification.

2025-08-28T16:05:45+00:00 ― 6 min read

Cryptography and Security

Strengthening Speaker Verification Against Spoofing Attacks

A new model improves voice identification security and resists voice spoofing.

2025-08-28T15:17:10+00:00 ― 5 min read

Machine Learning

Enhancing Attention Mechanisms with GAAM

A look at Gaussian Adaptive Attention for improved AI performance.

2025-08-28T14:28:35+00:00 ― 6 min read

Audio and Speech Processing

Deep Learning and Language Rhythm Analysis

Research shows deep learning improves our grasp of language rhythm.

2025-08-28T12:51:25+00:00 ― 6 min read

Audio and Speech Processing

Advancements in Multimodal Processing with CoAVT

CoAVT integrates audio, visual, and text data for enhanced understanding.

2025-08-28T12:02:50+00:00 ― 7 min read

Audio and Speech Processing

Advancements in Speaker Diarization with E-SHARC Method

E-SHARC improves speaker identification in various audio environments.

2025-08-28T06:22:45+00:00 ― 6 min read

Sound

MoodLoopGP: Crafting Emotions in Loopable Music

A new system generates music tailored to express happiness and sadness.

2025-08-28T04:45:35+00:00 ― 5 min read

Sound

Navigating Influences in Generative Music Models

A guide to understanding music similarity in generative models.

2025-08-27T22:16:55+00:00 ― 9 min read

Audio and Speech Processing

Techniques for Sound Reproduction and Evaluation

A study on sound synthesis and its evaluation in controlled environments.

2025-08-27T16:36:50+00:00 ― 4 min read

Audio and Speech Processing

Advancements in Sound Source Localization Techniques

A new method enhances accuracy in locating moving sound sources using microphone arrays.

2025-08-27T01:13:45+00:00 ― 6 min read

Audio and Speech Processing

A New Approach to Audio Quality Assessment with PAM

PAM offers a novel way to measure audio quality without needing reference recordings.

2025-08-26T21:10:50+00:00 ― 6 min read

Sound

Audio Flamingo: A New Model for Sound Understanding

Audio Flamingo excels in listening, conversing, and adapting to new audio tasks.

2025-08-26T16:19:20+00:00 ― 5 min read

Audio and Speech Processing

Advancing Spatial Sound Reasoning in Machines

A new model enhances machines' understanding of spatial audio.

2025-08-26T15:30:45+00:00 ― 5 min read

Computation and Language

Enhancing Real-Time Speech Recognition Systems

A new model improves speech-to-text efficiency in real-time applications.

2025-08-26T11:27:50+00:00 ― 6 min read

Computation and Language

Reevaluating the Role of Sounds in Language Relationships

This study assesses sounds versus words in reconstructing language family trees.

2025-08-26T03:22:00+00:00 ― 6 min read

Machine Learning

Advancements in AI Music Generation

A new model improves music creation using user feedback.

2025-08-25T21:41:55+00:00 ― 7 min read

Audio and Speech Processing

Reborn: A New Era in Unsupervised ASR

Reborn offers innovative solutions for automatic speech recognition without labeled data.

2025-08-25T19:16:10+00:00 ― 6 min read

Audio and Speech Processing

Transforming Sounds: The Listen, Chat, and Edit Tool

A new tool helps users modify sounds easily through simple text instructions.

2025-08-25T17:39:00+00:00 ― 8 min read

Computation and Language

Advancements in Language Technology

A new model merges spoken and written language for improved communication.

2025-08-25T03:53:05+00:00 ― 6 min read

Computation and Language

Advancements in Spoken Dialog Technology

A look at new models for natural spoken responses.

2025-08-25T03:04:30+00:00 ― 6 min read

Computation and Language

Enhancing Speech Recognition with Acoustic Data

A new method integrates acoustic information into language models for better speech recognition.

2025-08-25T02:15:55+00:00 ― 8 min read

Human-Computer Interaction

Transforming Cancer Understanding Through Music

Using music to explain cancer can enhance understanding and engagement.

2025-08-25T01:27:20+00:00 ― 6 min read

Sound

Understanding Sound Source Localization Techniques

Learn how sound localization identifies the source of sounds using advanced techniques.

2025-08-25T00:38:45+00:00 ― 4 min read

Sound

Capturing Speech Rhythm: A New Method

A new approach to synthesize voices with improved rhythm accuracy.

2025-08-24T23:50:10+00:00 ― 8 min read

Computation and Language

Enhancing Medical Transcription with AI

LLMs improve accuracy in medical transcriptions, benefiting patient care.

2025-08-24T16:32:55+00:00 ― 6 min read

Audio and Speech Processing

Adapting Melody Extraction for Diverse Music Styles

A method for improving melody extraction across different music styles with minimal human effort.

2025-08-24T15:44:20+00:00 ― 8 min read

Audio and Speech Processing

Improving Speaker Diarization with Multi-Microphone Approaches

New methods enhance voice activity and overlap detection in speaker diarization.

2025-08-24T13:18:35+00:00 ― 6 min read

Audio and Speech Processing

Improving Depression Detection with Speech Analysis

A new method integrates speech signals for enhanced depression detection.

2025-08-23T18:41:10+00:00 ― 4 min read

Audio and Speech Processing

Creating Even Sound Fields: Techniques and Insights

This article discusses methods to create immersive sound fields using various arrangements.

2025-08-23T17:04:00+00:00 ― 5 min read
