Electrical Engineering and Systems Science - Audio and Speech Processing

Audio and Speech Processing Addressing the Challenge of Audio Deepfakes

This study investigates the effectiveness of multilingual models in detecting audio deepfakes.

2025-08-14T00:40:35+00:00 ― 5 min read

Sound Measuring Adherence in Generative Music Models

A new approach to evaluating how well generated music follows audio prompts.

2025-08-13T23:03:25+00:00 ― 8 min read

Computer Vision and Pattern Recognition Introducing the 360+x Dataset for Enhanced Scene Understanding

A new dataset improves how robots interpret real-world environments.

2025-08-13T18:11:55+00:00 ― 6 min read

Sound New Approach to Audio Separation Using Language

This method improves audio separation by combining language descriptions with sound analysis.

2025-08-13T14:57:35+00:00 ― 6 min read

Computer Vision and Pattern Recognition Introducing UniAV: A Unified Approach to Video Localization

UniAV combines action localization, sound detection, and audio-visual event localization for better video understanding.

2025-08-13T10:06:05+00:00 ― 7 min read

Audio and Speech Processing CLaM-TTS: Advancing Text-to-Speech Technology

CLaM-TTS improves speech synthesis using advanced techniques for better efficiency and quality.

2025-08-13T08:28:55+00:00 ― 6 min read

Social and Information Networks Analyzing Music Through Graphs

Graphs allow for new insights into music structure and relationships.

2025-08-13T03:09:57+00:00 ― 5 min read

Audio and Speech Processing Improving Text-to-Speech with RALL-E

RALL-E enhances text-to-speech synthesis for clearer, more natural speech.

2025-08-13T01:11:40+00:00 ― 5 min read

Sound MuPT: Advancing Music Generation with ABC Notation

MuPT utilizes ABC notation for effective music generation with AI.

2025-08-12T09:00:00+00:00 ― 5 min read

Audio and Speech Processing Advancing Audio Learning with M2D and M2D-X

New methods improve audio representation through self-supervised learning techniques.

2025-08-12T07:22:50+00:00 ― 6 min read

Computer Vision and Pattern Recognition Introducing PEAVS: A New Way to Measure Audio-Visual Sync

PEAVS analyzes how well audio and video work together for better viewer experiences.

2025-08-12T03:19:55+00:00 ― 7 min read

Audio and Speech Processing Improving Sound Field Reconstruction with AI

A method using AI enhances sound representation in various environments.

2025-08-12T00:54:10+00:00 ― 6 min read

Classical Physics Understanding Spectral Moments in Electromagnetic Testing

Explore the role of spectral moments in reverberation chamber testing and the impact of noise.

2025-08-12T00:28:33+00:00 ― 5 min read

Audio and Speech Processing Efficient Real-Time Piano Transcription Model

A new system for accurate and lightweight real-time piano transcription.

2025-08-12T00:05:35+00:00 ― 5 min read

Computer Vision and Pattern Recognition Any2Point: Bridging 3D Understanding in AI Models

A new framework enhances AI's grasp of 3D spaces.

2025-08-11T19:14:05+00:00 ― 7 min read

Sound Advancements in Voice Attribute Editing Technology

New model allows precise control of voice qualities while retaining content.

2025-08-11T18:25:30+00:00 ― 4 min read

Audio and Speech Processing Evaluating Speech Processing Models with SUPERB

A new framework for assessing foundation models in speech tasks.

2025-08-11T09:31:05+00:00 ― 8 min read

Sound Advancing AI in Text-to-Audio Generation

A study on improving audio outputs from text prompts using preference optimization.

2025-08-11T07:05:20+00:00 ― 6 min read

Sound Advancements in Automated Music Generation Using AI

Exploring recent developments in AI tools for music creation.

2025-08-10T16:30:50+00:00 ― 5 min read

Signal Processing Combining Active and Passive Acoustic Sensing in Robotics

Research explores merging active and passive acoustic sensing for better robotic navigation and mapping.

2025-08-10T13:16:30+00:00 ― 8 min read

Sound Improving Music Tagging with Musical Word Embedding

A new approach enhances music tagging and retrieval by combining general language and music terms.

2025-08-10T06:47:50+00:00 ― 10 min read

Audio and Speech Processing FlashSpeech: A Leap in Speech Synthesis

FlashSpeech offers rapid, high-quality speech synthesis solutions.

2025-08-10T03:33:30+00:00 ― 6 min read

Sound Advancements in Deepfake Detection with RAD Framework

A new method improves detection of audio deepfakes using similar sample references.

2025-08-10T01:07:45+00:00 ― 6 min read

Sound Measuring Virtuosity in Electric Guitar Performance

This study analyzes sound signals to measure virtuosity among electric guitarists.

2025-08-09T18:39:05+00:00 ― 5 min read

Sound Navigating Vulnerabilities in Speech Emotion Recognition

This study examines the weaknesses of SER models against adversarial attacks across languages.

2025-08-08T21:35:55+00:00 ― 5 min read

Audio and Speech Processing Advancing Audio-Visual Target Speaker Extraction with SEANet

SEANet improves speaker isolation by reducing noise in audio processing.

2025-08-08T20:47:20+00:00 ― 6 min read

Sound SemantiCodec: The Next Step in Audio Technology

A new audio codec offering high-quality compression and rich semantic content.

2025-08-08T19:10:10+00:00 ― 6 min read

Sound New Tool Analyzes Audio and Video Content

A tool that combines audio and video analysis to identify events.

2025-08-08T12:41:30+00:00 ― 5 min read

Audio and Speech Processing Measuring Sound Absorption: A New Method

A method for effectively measuring how materials absorb sound.

2025-08-08T10:46:48+00:00 ― 5 min read

Audio and Speech Processing Advancing ASR: A New Learning Approach

A two-stage active learning method enhances speech recognition accuracy with less data.

2025-08-08T02:09:55+00:00 ― 5 min read

Audio and Speech Processing Advancements in Hearing Aid Technology with Deep Learning

New methods improve speech clarity in hearing aids through deep learning techniques.

2025-08-08T01:21:20+00:00 ― 6 min read

Sound Sound Source Localization: Techniques and Applications

Learn about sound localization techniques and their uses in various fields.

2025-08-07T23:44:10+00:00 ― 4 min read

Sound Addressing the Rise of Deepfake Audio Detection

New dataset and methods improve detection of audio deepfakes generated by audio language models (ALMs).

2025-08-07T06:43:55+00:00 ― 5 min read

Computation and Language Assessing ASR Systems for Stuttered Speech

This study evaluates how ASR systems perform on speech from individuals who stutter.

2025-08-07T04:18:10+00:00 ― 7 min read

Computation and Language New Attack Method Silences ASR Systems

A universal audio clip can mute advanced ASR models like Whisper.

2025-08-07T03:29:35+00:00 ― 6 min read

Sound New Device Enhances Conversation in Noisy Environments

A device helps focus on specific voices in crowded places.

2025-08-06T19:23:45+00:00 ― 6 min read

Sound Advancing Audio Editing with Diffusion Models

A new method improves audio editing using diffusion models for precise changes.

2025-08-06T16:09:25+00:00 ― 5 min read

Computation and Language Integrating Audio and Language Models: SpeechVerse

SpeechVerse bridges audio understanding and language processing for improved human-computer interaction.

2025-08-06T06:26:25+00:00 ― 6 min read

Sound Evaluating Bias in Voice Assistant Technology

New dataset highlights performance gaps among demographic groups using voice assistants.

2025-08-06T02:23:30+00:00 ― 6 min read

Computation and Language Examining the Safety of Speech Language Models

This article investigates vulnerabilities in speech models and ways to enhance their security.

2025-08-05T23:09:10+00:00 ― 5 min read