Learn about Frechet Music Distance and its role in evaluating AI-generated music.
Jan Retkowski, Jakub Stępniak, Mateusz Modrzejewski
― 8 min read
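Like the Fréchet Inception Distance for images and the Fréchet Audio Distance for audio, Frechet Music Distance compares the distribution of embeddings from reference music against embeddings from generated music by fitting a Gaussian to each set and computing the Fréchet distance between them. The sketch below is only an illustration of that standard computation, not the authors' exact pipeline; the embedding arrays (ref_emb, gen_emb) and their extraction are assumed to come from some pretrained music-embedding model.

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu_r, sigma_r, mu_g, sigma_g, eps=1e-6):
    """Squared Fréchet distance between two Gaussians fitted to embedding sets."""
    diff = mu_r - mu_g
    # Matrix square root of the product of the two covariance matrices
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    if not np.isfinite(covmean).all():
        # Fall back to a slightly regularized product for numerical stability
        offset = np.eye(sigma_r.shape[0]) * eps
        covmean, _ = linalg.sqrtm((sigma_r + offset) @ (sigma_g + offset), disp=False)
    covmean = covmean.real
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))

# Hypothetical usage: ref_emb and gen_emb are (num_pieces, embedding_dim) arrays
# of embeddings for reference and generated music from some embedding model.
# mu_r, sigma_r = ref_emb.mean(axis=0), np.cov(ref_emb, rowvar=False)
# mu_g, sigma_g = gen_emb.mean(axis=0), np.cov(gen_emb, rowvar=False)
# score = frechet_distance(mu_r, sigma_r, mu_g, sigma_g)
```

Lower scores indicate that the generated music's embedding distribution lies closer to the reference distribution.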
Latest Articles
Sudha Krishnamurthy
― 5 min read
Jianwei Cui, Yu Gu, Shihao Chen
― 5 min read
Evangelia Gkritzali, Panagiotis Kaliosis, Sofia Galanaki
― 6 min read
Quang-Anh N. D., Manh-Hung Ha, Thai Kim Dinh
― 6 min read
Hugo Flores García, Oriol Nieto, Justin Salamon
― 8 min read
Watermarking techniques shield artists' rights in AI-driven music generation.
Pascal Epple, Igor Shilov, Bozhidar Stevanoski
― 7 min read
Transforming mono audio into immersive binaural experiences with innovative techniques.
Alon Levkovitch, Julian Salazar, Soroosh Mariooryad
― 7 min read
Research explores how speech enhancement models maintain syllable stress amidst noise.
Rangavajjala Sankara Bharadwaj, Jhansi Mallela, Sai Harshitha Aluru
― 6 min read
A new framework enhances the alignment of sounds and visuals in videos.
Kexin Li, Zongxin Yang, Yi Yang
― 6 min read
Revolutionizing text-to-speech with improved efficiency and natural-sounding voices.
Haowei Lou, Helen Paik, Pari Delir Haghighi
― 6 min read
Discover how TTS systems are evolving to sound more human-like.
Haowei Lou, Helen Paik, Wen Hu
― 7 min read
New system transforms audio control through detailed text descriptions.
Sonal Kumar, Prem Seetharaman, Justin Salamon
― 7 min read
Combining video and audio for better emotion detection.
Antonio Fernandez, Suzan Awinat
― 9 min read
YingSound transforms video production by automating sound effects generation.
Zihao Chen, Haomin Zhang, Xinhan Di
― 6 min read
Researchers use echoes to watermark audio, ensuring creators' rights are protected.
Christopher J. Tralie, Matt Amery, Benjamin Douglas
― 8 min read
Robots can now navigate tricky environments using sound thanks to SonicBoom.
Moonyoung Lee, Uksang Yoo, Jean Oh
― 6 min read
MASV model enhances voice verification, ensuring security and efficiency.
Yang Liu, Li Wan, Yiteng Huang
― 5 min read
Exploring the impact of AI tools on music creation and composers' perspectives.
Eleanor Row, György Fazekas
― 7 min read
Speech recognition technology enhances digit recognition, especially in noisy environments.
Ali Nasr-Esfahani, Mehdi Bekrani, Roozbeh Rajabi
― 5 min read
Enhancing multilingual ASR performance for Japanese through targeted fine-tuning.
Mark Bajo, Haruka Fukukawa, Ryuji Morita
― 5 min read
Exploring how BCIs decode imagined speech for improved communication.
Byung-Kwan Ko, Jun-Young Kim, Seo-Hyun Lee
― 7 min read
SonicMesh uses sound to improve 3D human body modeling from images.
Xiaoxuan Liang, Wuyang Zhang, Hong Zhou
― 5 min read
Discover the latest breakthroughs in real-time speech recognition and how they improve our interactions.
Rongxiang Wang, Zhiming Xu, Felix Xiaozhu Lin
― 5 min read
Researchers improve speech processing using Libri2Vox and synthetic data techniques.
Yun Liu, Xuechen Liu, Xiaoxiao Miao
― 6 min read
Discover how emotional TTS changes communication with machines, making them more relatable.
Sho Inoue, Kun Zhou, Shuai Wang
― 6 min read
Learn how insect sounds can help monitor ecosystems and manage pests.
Yinxuan Wang, Sudip Vhaduri
― 7 min read
New methods help machines find key information from spoken content.
Yueqian Lin, Yuzhe Fu, Jingyang Zhang
― 6 min read
Discover how AI streamlines speech data collection through crowdsourcing.
Beomseok Lee, Marco Gaido, Ioan Calapodescu
― 5 min read
Explore the differences between spontaneous and scripted speech in audio processing.
Shahar Elisha, Andrew McDowell, Mariano Beguerisse-Díaz
― 6 min read
DAAN improves how machines learn from audio-visual data in zero-shot scenarios.
RunLin Yu, Yipu Gong, Wenrui Li
― 5 min read
New method improves detection of audio deepfakes using innovative learning techniques.
Yujie Chen, Jiangyan Yi, Cunhang Fan
― 6 min read
A new model from Singapore improves machine speech understanding.
Muhammad Huzaifah, Geyu Lin, Tianchi Liu
― 7 min read
As machines produce music, we must protect human creativity through effective detection methods.
Yupei Li, Qiyang Sun, Hanqian Li
― 8 min read
New models identify synthetic speech and combat misuse of voice technology.
Mahieyin Rahmun, Rafat Hasan Khan, Tanjim Taharat Aurpa
― 5 min read
TAME uses sound to detect drones, improving safety and monitoring.
Zhenyuan Xiao, Huanran Hu, Guili Xu
― 6 min read
Learn how CAMEL improves understanding of mixed-language conversations.
He Wang, Xucheng Wan, Naijun Zheng
― 6 min read
Research shows brain activity can help machines recognize music effectively.
Taketo Akama, Zhuohao Zhang, Pengcheng Li
― 6 min read
Audio technology offers a cost-effective way to track UAVs safely.
Allen Lei, Tianchen Deng, Han Wang
― 6 min read
A new AI method analyzes voices to detect laryngeal cancer risk.
Mary Paterson, James Moor, Luisa Cutillo
― 7 min read
Discover how video-to-audio synthesis is changing media experiences with perfect sound alignment.
Ho Kei Cheng, Masato Ishii, Akio Hayakawa
― 7 min read
A new system revolutionizes how sound designers create audio for videos.
Riccardo Fosco Gramaccioni, Christian Marinoni, Emilian Postolache
― 8 min read
A look at how speech enhancement improves communication through data characteristics.
Leying Zhang, Wangyou Zhang, Chenda Li
― 8 min read
New methods improve ASR systems for languages they haven't encountered before.
Shao-Syuan Huang, Kuan-Po Huang, Andy T. Liu
― 7 min read
Discover how TTA tech merges words and sounds for richer audio experiences.
Yuhang He, Yash Jain, Xubo Liu
― 7 min read
Researchers enhance Swiss German speech recognition through innovative data generation.
Vincenzo Timmel, Claudio Paonessa, Reza Kakooee
― 6 min read
A new method improves lip synchrony in dubbed videos for a natural viewing experience.
Lucas Goncalves, Prashant Mathur, Xing Niu
― 6 min read
Discover how Whisper improves speech recognition in multilingual conversations.
Jiahui Zhao, Hao Shi, Chenrui Cui
― 5 min read
Learn how SpeechRAG improves audio question answering without ASR errors.
Do June Min, Karel Mundnich, Andy Lapastora
― 6 min read
A fresh approach makes sound recognition more accessible and efficient.
Noriyuki Tonami, Wataru Kohno, Keisuke Imoto
― 7 min read
Learn how voice anonymization safeguards personal information in a tech-driven world.
Natalia Tomashenko, Emmanuel Vincent, Marc Tommasi
― 6 min read
Merging audio and visual cues to improve speech recognition in noisy environments.
Zhaofeng Lin, Naomi Harte
― 5 min read
Speech enhancement technology adapts to reduce noise and improve communication.
Riccardo Miccini, Clement Laroche, Tobias Piechowiak
― 5 min read
New tech combines sound and visuals for better drone detection.
Zhenyuan Xiao, Yizhuo Yang, Guili Xu
― 6 min read
A fresh approach combines speech and text for better dysarthria assessments.
Anuprabha M, Krishna Gurugubelli, Kesavaraj V
― 6 min read
Exploring new technology that detects sounds from invisible sources.
Yuhang He, Sangyun Shin, Anoop Cherian
― 5 min read
Discover how Smooth-Foley enhances video audio generation.
Yaoyun Zhang, Xuenan Xu, Mengyue Wu
― 6 min read
Innovative technique connects lyrics and melodies for better song creation.
Jiaxing Yu, Xinda Wu, Yunfei Xu
― 7 min read
Enhancing machine understanding of human dialogue turn-taking dynamics.
Hyunbae Jeon, Frederic Guintu, Rayvant Sahni
― 8 min read
Exploring how language affects DeepFake detection accuracy across various languages.
Bartłomiej Marek, Piotr Kawa, Piotr Syga
― 6 min read
VERSA evaluates speech, audio, and music quality effectively.
Jiatong Shi, Hye-jin Shim, Jinchuan Tian
― 9 min read
Discover how audio-language models are changing sound recognition technology.
Gongyu Chen, Haomin Zhang, Chaofan Ding
― 6 min read
New methods enhance natural dialogue in speech technology.
Zhenqi Jia, Rui Liu
― 6 min read
Discover how SpeechSSM transforms long-form speech generation for better interactions.
Se Jin Park, Julian Salazar, Aren Jansen
― 5 min read
Learn how real-time translation transforms communication across languages.
Sara Papi, Peter Polak, Ondřej Bojar
― 6 min read
A lightweight model designed to effectively separate mixed speech in noisy environments.
Shaoxiang Dang, Tetsuya Matsumoto, Yoshinori Takeuchi
― 6 min read
Researchers tackle audio spoofing to enhance voice recognition security.
Xuechen Liu, Junichi Yamagishi, Md Sahidullah
― 9 min read
Learn how AV-ASR combines audio and visuals for better speech recognition.
Yihan Wu, Yichen Lu, Yifan Peng
― 6 min read
A new method is transforming how machines learn from music.
Julien Guinot, Elio Quinton, György Fazekas
― 7 min read
New technology transforms silent murmurs into audible communication for those in need.
Neil Shah, Shirish Karande, Vineet Gandhi
― 6 min read
New methods in speech synthesis improve clarity and adaptability for diverse applications.
Neil Shah, Ayan Kashyap, Shirish Karande
― 8 min read
Discover the rich tradition of Ethiopian Orthodox Tewahedo Church chants.
Mequanent Argaw Muluneh, Yan-Tsung Peng, Li Su
― 7 min read
A new dataset highlights the beauty of Ethiopian Orthodox chants.
Mequanent Argaw Muluneh, Yan-Tsung Peng, Worku Abebe Degife
― 7 min read
New advances help speech-recognition technology better serve people with speech disorders.
Jimmy Tobin, Katrin Tomanek, Subhashini Venugopalan
― 6 min read
Discover how ETTA turns words into creative audio experiences.
Sang-gil Lee, Zhifeng Kong, Arushi Goel
― 6 min read
A fresh take on how music affects our emotions.
Dengming Zhang, Weitao You, Ziheng Liu
― 7 min read
A new framework for generating synchronized and natural group dances.
Kaixing Yang, Xulong Tang, Haoyu Wu
― 8 min read
New approach in emotion recognition focuses on mouth movements over sounds.
Shreya G. Upadhyay, Ali N. Salman, Carlos Busso
― 6 min read
Discover how Stable-TTS improves text-to-speech technology for a human-like experience.
Wooseok Han, Minki Kang, Changhun Kim
― 7 min read
Innovative sound wave technology offers new insights into indoor walking speed.
Sheng Lyu, Chenshu Wu
― 6 min read
Audio assistants are getting smarter with AQA-K, enhancing responses through knowledge.
Abhirama Subramanyam Penamakuri, Kiran Chhatre, Akshat Jain
― 6 min read
Researchers study how our brain controls speech and its implications for recovery.
Eric Easthope
― 6 min read
Discover how text can transform into audio with cutting-edge models.
Chia-Yu Hung, Navonil Majumder, Zhifeng Kong
― 3 min read