Computer Science - Sound

Sound

Advancements in Audio and Language Models

The CLAP model bridges audio and text processing for various applications.

2025-09-14T13:46:00+00:00 ― 4 min read

Computation and Language

Advancements in Self-Supervised Learning for French Speech Technologies

A project aims to improve French speech processing using self-supervised learning.

2025-09-14T12:57:25+00:00 ― 5 min read

Audio and Speech Processing

Advancements in Automatic Prosody Annotation

New methods improve how machines recognize speech rhythm and emotion.

2025-09-14T12:08:50+00:00 ― 6 min read

Sound

New Method for Sound Estimation in Scattered Environments

A new approach improves sound estimation in spaces with scattering objects.

2025-09-14T06:28:45+00:00 ― 6 min read

Sound

The Impact of Undecidability on Music Production

Examines how undecidability influences music composition and production today.

2025-09-14T05:40:10+00:00 ― 4 min read

Audio and Speech Processing

Improving Speaker Diarization with Language Models

Explores how language models can improve the accuracy of speaker diarization.

2025-09-14T03:14:25+00:00 ― 5 min read

Audio and Speech Processing

Advancements in Speech Recognition for Children

This study improves ASR systems' ability to recognize children's speech.

2025-09-14T02:25:50+00:00 ― 5 min read

Audio and Speech Processing

The Role of Audio in Pedestrian Detection

Researchers explore audio sensing technology for improved pedestrian detection in urban areas.

2025-09-14T00:48:40+00:00 ― 5 min read

Audio and Speech Processing

Advancements in Sound Field Recording Techniques

New method enhances sound source localization and field separation.

2025-09-13T20:45:45+00:00 ― 6 min read

Sound

Advancements in Synthesizing Percussive Sounds

A new method improves drum sound synthesis by focusing on sharp transient elements.

2025-09-13T19:57:10+00:00 ― 6 min read

Sound

Creating Privacy-Friendly Synthetic Voice Datasets

Researchers are developing synthetic voice data to protect privacy in voice recognition.

2025-09-13T15:05:40+00:00 ― 5 min read

Audio and Speech Processing

VoxtLM: A Unified Approach to Speech and Text

VoxtLM combines speech recognition, synthesis, text generation, and continuation in one model.

2025-09-13T11:02:45+00:00 ― 4 min read

Audio and Speech Processing

PromptASR: Next-Level Speech Recognition Technology

New system enhances speech recognition using context-aware prompts.

2025-09-13T10:14:10+00:00 ― 4 min read

Sound

Advancements in Universal Audio Models

EnCodecMAE combines self-supervised learning and audio codecs for improved audio task performance.

2025-09-13T09:25:35+00:00 ― 5 min read

Audio and Speech Processing

Advancing Autism Diagnosis Through Sound Recognition

A study on using machine learning to identify children's sounds for ASD assessment.

2025-09-13T07:48:25+00:00 ― 5 min read

Audio and Speech Processing

A New Approach to Keyword Spotting

Introducing a flexible method for recognizing keywords in speech across languages.

2025-09-13T06:11:15+00:00 ― 5 min read

Audio and Speech Processing

Assessing Speech Quality in Audio Communication

A look at how speech quality is tested using crowdsourcing.

2025-09-13T05:22:40+00:00 ― 5 min read

Sound

New Methods for Detecting AI-Generated Audio

Advanced techniques for ensuring audio authenticity in the age of voice cloning.

2025-09-13T03:40:24+00:00 ― 5 min read

Audio and Speech Processing

Advancements in Audio Captioning with Text-Only Training

A new method trains audio captioning systems using only text descriptions.

2025-09-13T02:56:55+00:00 ― 6 min read

Sound

Essential Steps for Writing Academic Papers

A guide to crafting clear and effective academic papers.

2025-09-13T01:19:45+00:00 ― 3 min read

Human-Computer Interaction

Erie: A New Tool for Data Sonification

Erie simplifies turning data into sound for better accessibility.

2025-09-13T00:22:54+00:00 ― 6 min read

Cryptography and Security

Backdoor Attacks: A Hidden Threat to Voice Verification

Examining the risks of backdoor attacks on speaker verification systems.

2025-09-12T22:54:00+00:00 ― 6 min read

Computer Vision and Pattern Recognition

Advancements in Audio-Visual Segmentation Techniques

A new method enhances audio-visual segmentation without detailed labels.

2025-09-12T20:28:15+00:00 ― 5 min read

Sound

New System Improves Voice Extraction from Unstable Head Positions

PIAVE helps machines extract voices clearly, even when speakers turn their heads.

2025-09-12T19:39:40+00:00 ― 6 min read

Audio and Speech Processing

Libriheavy: A New Dataset for Speech Recognition

Libriheavy offers 50,000 hours of spoken English to boost speech recognition technology.

2025-09-12T18:51:05+00:00 ― 5 min read

Audio and Speech Processing

Improving Speech Clarity with AV2Wav Technology

AV2Wav enhances speech quality using audio and visual cues.

2025-09-12T17:13:55+00:00 ― 5 min read

Audio and Speech Processing

EmoConv-Diff: A New Way to Change Emotions in Speech

A fresh method for machines to alter speech emotions naturally.

2025-09-12T16:25:20+00:00 ― 5 min read

Sound

Detecting AI-Generated Singing Voices

New methods are being developed to identify deepfake singing voices in the music industry.

2025-09-12T14:48:10+00:00 ― 6 min read

Sound

Optimizing Text-to-Speech with Core-Set Selection

Core-set selection improves text-to-speech models by focusing on diverse data.

2025-09-12T08:19:30+00:00 ― 5 min read

Sound

Advancements in Speech Emotion Recognition Systems

New models are transforming how we analyze emotions in speech.

2025-09-12T07:30:55+00:00 ― 6 min read

Computer Vision and Pattern Recognition

Privacy-First Action Recognition with Ultrasound Technology

A new method uses ultrasound to recognize actions while protecting privacy.

2025-09-12T06:42:20+00:00 ― 5 min read

Sound

A New Framework for Speaker Anonymization

Introducing a flexible framework to enhance voice privacy research.

2025-09-12T05:05:10+00:00 ― 7 min read

Sound

CiwaGAN: A New Model for Speech Learning

CiwaGAN combines control of articulatory movements with information exchange to model how speech is learned.

2025-09-12T04:16:35+00:00 ― 6 min read

Computation and Language

IntraVerbalPA: A New Approach to Pronunciation Assessment

A framework that blends verbal and non-verbal cues for better language learning.

2025-09-12T03:28:00+00:00 ― 5 min read

Computation and Language

Improving Explanations for Speech Models

A new method simplifies understanding of speech classification models.

2025-09-12T02:39:25+00:00 ― 6 min read

Computation and Language

Improving Language Learning with L1-MultiMDD

A new system enhances pronunciation skills by considering first language influences.

2025-09-12T01:50:50+00:00 ― 5 min read

Emerging Technologies

Quantum Computing Meets Music Composition

Discover how quantum tools change music creation and performance.

2025-09-12T00:31:30+00:00 ― 6 min read

Audio and Speech Processing

Advancements in Voice Conversion Technology

New method improves emotion preservation in voice conversion processes.

2025-09-12T00:13:40+00:00 ― 6 min read

Audio and Speech Processing

Emo-StarGAN: Advancing Voice Conversion Technology

New method preserves emotional tone in voice conversion for better human-computer interaction.

2025-09-11T23:25:05+00:00 ― 5 min read

Computation and Language

Advancements in Direct Text to Speech Translation

New systems improve translation from text to spoken language without intermediate steps.

2025-09-11T20:59:20+00:00 ― 4 min read