Electrical Engineering and Systems Science - Audio and Speech Processing

Machine Learning Challenges in Audio Watermarking Techniques

Investigating vulnerabilities in audio watermarking methods against real-world threats.

2025-07-30T13:18:20+00:00 ― 7 min read

Sound Introducing PianoMotion10M: A New Dataset for Piano Learning

PianoMotion10M provides detailed hand movements to aid piano learners.

2025-07-30T01:09:35+00:00 ― 6 min read

Computer Vision and Pattern Recognition Advancements in Action Sound Generation from Video

A new model improves sound matching with visual actions in videos.

2025-07-29T23:32:25+00:00 ― 11 min read

Sound Advancements in 3D Audio Rendering with AVGS

New model improves realistic audio experiences in virtual environments.

2025-07-29T20:18:05+00:00 ― 7 min read

Audio and Speech Processing Using Audio Technology for Pedestrian Tracking

This study examines audio methods for tracking pedestrian movement in urban areas.

2025-07-29T17:52:20+00:00 ― 7 min read

Audio and Speech Processing Advancing Foley Audio with the MINT Dataset

A new dataset improves the creation of foley audio for multimedia content.

2025-07-29T17:03:45+00:00 ― 6 min read

Audio and Speech Processing Advancements in Automatic Speech Recognition with Dynamic TTA

New methods enhance speech recognition in noisy environments using adaptive techniques.

2025-07-29T13:49:25+00:00 ― 6 min read

Sound SPEAR: A New Approach to Sound Analysis

SPEAR predicts sound behavior in 3D spaces using minimal data collection.

2025-07-29T10:35:05+00:00 ― 6 min read

Computation and Language Advancements in Code-Switching Speech Translation

A new method improves translating mixed-language speech into English.

2025-07-29T09:46:30+00:00 ― 5 min read

Sound Improving Speaker Verification in Radio Communications

A new method enhances speaker verification accuracy in challenging radio environments.

2025-07-29T08:57:55+00:00 ― 6 min read

Sound Improving Backdoor Attacks in Speech Recognition

New method targets rhythm changes for stealthy speech attacks.

2025-07-29T08:09:20+00:00 ― 5 min read

Sound GAMA: A New Model for Sound Understanding

GAMA improves audio processing by merging sound and language insights.

2025-07-29T04:55:00+00:00 ― 5 min read

Audio and Speech Processing AV-CrossNet: Improving Speech Recognition in Noise

A new system helps separate speech from noise for clearer communication.

2025-07-29T03:17:50+00:00 ― 6 min read

Audio and Speech Processing GigaSpeech 2: A New Dataset for Speech Recognition

GigaSpeech 2 offers a vast dataset for low-resource languages to improve speech recognition.

2025-07-29T02:29:15+00:00 ― 5 min read

Audio and Speech Processing Revolutionizing Text-to-Speech with DiTTo-TTS

A new model enhances text-to-speech technology with efficiency and adaptability.

2025-07-29T01:40:40+00:00 ― 6 min read

Audio and Speech Processing New Framework for Clear Speech Production

A novel method for optimizing speech analysis and synthesis using vocal tract movements.

2025-07-28T20:49:10+00:00 ― 7 min read

Human-Computer Interaction The Impact of Gestures in Virtual Explanations

This study examines how gestures affect learning from virtual agents.

2025-07-28T19:12:00+00:00 ― 6 min read

Audio and Speech Processing DExter: A New Approach to Expressive Piano Performance

DExter uses AI to create expressive piano music from written scores.

2025-07-28T10:17:35+00:00 ― 5 min read

Sound Real-Time Speaker Diarization: An Overview

Learn about online speaker diarization and its significance in various applications.

2025-07-28T06:14:40+00:00 ― 6 min read

Sound Evaluating Discrete Audio Tokens for Speech Tasks

New benchmark tool assesses discrete audio tokens for various speech processing tasks.

2025-07-28T04:37:30+00:00 ― 8 min read

Sound Advancements in Structured Music Generation with SING

A new method for music generation using self-similarity matrices and attention systems.

2025-07-28T01:23:10+00:00 ― 7 min read

Sound Advancements in Audio Modeling with GANs

New techniques improve guitar amplifier modeling using unpaired data and GANs.

2025-07-27T22:08:50+00:00 ― 7 min read

Audio and Speech Processing Advances in Cross-Lingual Voice Conversion

A new method improves voice conversion between languages while preserving speaker traits.

2025-07-27T15:40:10+00:00 ― 4 min read

Sound Analyzing Audio Models with Network Dissection

A new method for understanding how audio models make predictions.

2025-07-27T12:25:50+00:00 ― 5 min read

Sound Advancing Voice Conversion with Spatial Awareness

Introducing spatial voice conversion to enhance audio realism and immersion.

2025-07-27T01:54:15+00:00 ― 6 min read

Audio and Speech Processing WavRx: A New Model for Speech-Based Health Diagnostics

WavRx analyzes speech for health while protecting privacy, showing promising diagnostic results.

2025-07-26T21:51:20+00:00 ― 7 min read

Computation and Language Analyzing Speech to Assess Suicide Risk

Research explores how speech analysis can predict suicide risk, considering gender differences.

2025-07-26T13:45:30+00:00 ― 5 min read

Sound A New Tool for Music Visualization

This paper presents a system to create visuals that respond to music.

2025-07-26T10:31:10+00:00 ― 7 min read

Robotics Learning with Sound: A New Era for Robots

A new system helps robots learn tasks using audio from real-life demonstrations.

2025-07-26T09:42:35+00:00 ― 7 min read

Audio and Speech Processing Advancements in Sound Event Detection for 2024

New methods improve accuracy in recognizing overlapping sounds across diverse audio sources.

2025-07-26T07:16:50+00:00 ― 6 min read

Computation and Language Improving Speech Error Correction in ASR Systems

A new method combines acoustic features and confidence scores for better error correction.

2025-07-25T20:45:15+00:00 ― 5 min read

Cryptography and Security Protecting Voices in the Age of Deepfakes

SecureSpectra offers a new way to safeguard audio identity against deepfake threats.

2025-07-25T16:42:20+00:00 ― 5 min read

Machine Learning Advancements in Predicting Acoustic Scattering with PGI-DeepONet

Combining physics and geometry for improved acoustic scattering predictions.

2025-07-25T15:54:09+00:00 ― 5 min read

Computation and Language Advancements in Real-Time Speech Translation Systems

A new system for accurate and fast speech translation across multiple languages.

2025-07-25T15:05:10+00:00 ― 6 min read

Sound New Method for Voice Creation in Speech Synthesis

A simple method to create voices and control emotions in speech synthesis.

2025-07-25T14:16:35+00:00 ― 5 min read

Sound Advancements in Real-Time Music Source Separation

Improving MMDenseNet for quick and efficient music separation.

2025-07-25T12:39:25+00:00 ― 5 min read

Audio and Speech Processing New Method for Clearer Sound in Noisy Environments

A novel approach to enhance sound clarity using advanced deep learning techniques.

2025-07-25T11:02:15+00:00 ― 7 min read

Audio and Speech Processing Improving Speaker Detection with Audio and Visual Data

A system combines audio and video to enhance speaker detection accuracy.

2025-07-25T10:13:40+00:00 ― 5 min read

Computation and Language Advancements in Spoken Dialogue Systems

A new method improves machine dialogue through pseudo-stereo data.

2025-07-25T08:36:30+00:00 ― 6 min read

Computation and Language Improving Chinese Speech Recognition Through Pinyin Regularization

This study presents a dataset and method to enhance Chinese ASR accuracy using Pinyin.

2025-07-25T07:47:55+00:00 ― 7 min read