Learn about Frechet Music Distance and its role in evaluating AI-generated music.
Jan Retkowski, Jakub Stępniak, Mateusz Modrzejewski
― 8 min read
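Like the Fréchet Inception Distance for images and the Fréchet Audio Distance for audio, Frechet Music Distance compares the distribution of embeddings from reference music against embeddings from generated music by fitting a Gaussian to each set and computing the Fréchet distance between them. The sketch below is only an illustration of that standard computation, not the authors' exact pipeline; the embedding arrays (ref_emb, gen_emb) and their extraction are assumed to come from some pretrained music-embedding model.

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu_r, sigma_r, mu_g, sigma_g, eps=1e-6):
    """Squared Fréchet distance between two Gaussians fitted to embedding sets."""
    diff = mu_r - mu_g
    # Matrix square root of the product of the two covariance matrices
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    if not np.isfinite(covmean).all():
        # Fall back to a slightly regularized product for numerical stability
        offset = np.eye(sigma_r.shape[0]) * eps
        covmean, _ = linalg.sqrtm((sigma_r + offset) @ (sigma_g + offset), disp=False)
    covmean = covmean.real
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))

# Hypothetical usage: ref_emb and gen_emb are (num_pieces, embedding_dim) arrays
# of embeddings for reference and generated music from some embedding model.
# mu_r, sigma_r = ref_emb.mean(axis=0), np.cov(ref_emb, rowvar=False)
# mu_g, sigma_g = gen_emb.mean(axis=0), np.cov(gen_emb, rowvar=False)
# score = frechet_distance(mu_r, sigma_r, mu_g, sigma_g)
```

Lower scores indicate that the generated music's embedding distribution lies closer to the reference distribution.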
Latest Articles
Sudha Krishnamurthy
― 5 min read
Jianwei Cui, Yu Gu, Shihao Chen
― 5 min read
Evangelia Gkritzali, Panagiotis Kaliosis, Sofia Galanaki
― 6 min read
Quang-Anh N. D., Manh-Hung Ha, Thai Kim Dinh
― 6 min read
Hugo Flores García, Oriol Nieto, Justin Salamon
― 8 min read
Watermarking techniques shield artists' rights in AI-driven music generation.
Pascal Epple, Igor Shilov, Bozhidar Stevanoski
― 7 min read
Transforming mono audio into immersive binaural experiences with innovative techniques.
Alon Levkovitch, Julian Salazar, Soroosh Mariooryad
― 7 min read
Research explores how speech enhancement models maintain syllable stress amidst noise.
Rangavajjala Sankara Bharadwaj, Jhansi Mallela, Sai Harshitha Aluru
― 6 min read
A new framework enhances the alignment of sounds and visuals in videos.
Kexin Li, Zongxin Yang, Yi Yang
― 6 min read
Revolutionizing text-to-speech with improved efficiency and natural-sounding voices.
Haowei Lou, Helen Paik, Pari Delir Haghighi
― 6 min read
Discover how TTS systems are evolving to sound more human-like.
Haowei Lou, Helen Paik, Wen Hu
― 7 min read
New system transforms audio control through detailed text descriptions.
Sonal Kumar, Prem Seetharaman, Justin Salamon
― 7 min read
Combining video and audio for better emotion detection.
Antonio Fernandez, Suzan Awinat
― 9 min read
YingSound transforms video production by automating sound effects generation.
Zihao Chen, Haomin Zhang, Xinhan Di
― 6 min read
Researchers use echoes to watermark audio, ensuring creators' rights are protected.
Christopher J. Tralie, Matt Amery, Benjamin Douglas
― 8 min read
Robots can now navigate tricky environments using sound thanks to SonicBoom.
Moonyoung Lee, Uksang Yoo, Jean Oh
― 6 min read
MASV model enhances voice verification, ensuring security and efficiency.
Yang Liu, Li Wan, Yiteng Huang
― 5 min read
Exploring the impact of AI tools on music creation and composers' perspectives.
Eleanor Row, György Fazekas
― 7 min read
Speech recognition technology enhances digit recognition, especially in noisy environments.
Ali Nasr-Esfahani, Mehdi Bekrani, Roozbeh Rajabi
― 5 min read
Enhancing multilingual ASR performance for Japanese through targeted fine-tuning.
Mark Bajo, Haruka Fukukawa, Ryuji Morita
― 5 min read
Exploring how BCIs decode imagined speech for improved communication.
Byung-Kwan Ko, Jun-Young Kim, Seo-Hyun Lee
― 7 min read
SonicMesh uses sound to improve 3D human body modeling from images.
Xiaoxuan Liang, Wuyang Zhang, Hong Zhou
― 5 min read
Discover the latest breakthroughs in real-time speech recognition and how they improve our interactions.
Rongxiang Wang, Zhiming Xu, Felix Xiaozhu Lin
― 5 min read
Researchers improve speech processing using Libri2Vox and synthetic data techniques.
Yun Liu, Xuechen Liu, Xiaoxiao Miao
― 6 min read
Discover how emotional TTS changes communication with machines, making them more relatable.
Sho Inoue, Kun Zhou, Shuai Wang
― 6 min read
Learn how insect sounds can help monitor ecosystems and manage pests.
Yinxuan Wang, Sudip Vhaduri
― 7 min read
New methods help machines find key information from spoken content.
Yueqian Lin, Yuzhe Fu, Jingyang Zhang
― 6 min read
Discover how AI streamlines speech data collection through crowdsourcing.
Beomseok Lee, Marco Gaido, Ioan Calapodescu
― 5 min read
Explore the differences between spontaneous and scripted speech in audio processing.
Shahar Elisha, Andrew McDowell, Mariano Beguerisse-Díaz
― 6 min read
DAAN improves how machines learn from audio-visual data in zero-shot scenarios.
RunLin Yu, Yipu Gong, Wenrui Li
― 5 min read
New method improves detection of audio deepfakes using innovative learning techniques.
Yujie Chen, Jiangyan Yi, Cunhang Fan
― 6 min read
A new model from Singapore improves machine speech understanding.
Muhammad Huzaifah, Geyu Lin, Tianchi Liu
― 7 min read
As machines produce music, we must protect human creativity through effective detection methods.
Yupei Li, Qiyang Sun, Hanqian Li
― 8 min read
New models identify synthetic speech and combat misuse of voice technology.
Mahieyin Rahmun, Rafat Hasan Khan, Tanjim Taharat Aurpa
― 5 min read
TAME uses sound to detect drones, improving safety and monitoring.
Zhenyuan Xiao, Huanran Hu, Guili Xu
― 6 min read
Learn how CAMEL improves understanding of mixed-language conversations.
He Wang, Xucheng Wan, Naijun Zheng
― 6 min read
Research shows brain activity can help machines recognize music effectively.
Taketo Akama, Zhuohao Zhang, Pengcheng Li
― 6 min read
Audio technology offers a cost-effective way to track UAVs safely.
Allen Lei, Tianchen Deng, Han Wang
― 6 min read
A new AI method analyzes voices to detect laryngeal cancer risk.
Mary Paterson, James Moor, Luisa Cutillo
― 7 min read
Discover how video-to-audio synthesis is changing media experiences with perfect sound alignment.
Ho Kei Cheng, Masato Ishii, Akio Hayakawa
― 7 min read
A new system revolutionizes how sound designers create audio for videos.
Riccardo Fosco Gramaccioni, Christian Marinoni, Emilian Postolache
― 8 min read
A look at how speech enhancement improves communication through data characteristics.
Leying Zhang, Wangyou Zhang, Chenda Li
― 8 min read
New methods improve ASR systems for languages they haven't encountered before.
Shao-Syuan Huang, Kuan-Po Huang, Andy T. Liu
― 7 min read
Discover how TTA tech merges words and sounds for richer audio experiences.
Yuhang He, Yash Jain, Xubo Liu
― 7 min read
Researchers enhance Swiss German speech recognition through innovative data generation.
Vincenzo Timmel, Claudio Paonessa, Reza Kakooee
― 6 min read
A new method improves lip synchrony in dubbed videos for a natural viewing experience.
Lucas Goncalves, Prashant Mathur, Xing Niu
― 6 min read
Discover how Whisper improves speech recognition in multilingual conversations.
Jiahui Zhao, Hao Shi, Chenrui Cui
― 5 min read
Learn how SpeechRAG improves audio question answering without ASR errors.
Do June Min, Karel Mundnich, Andy Lapastora
― 6 min read
A fresh approach makes sound recognition more accessible and efficient.
Noriyuki Tonami, Wataru Kohno, Keisuke Imoto
― 7 min read
Learn how voice anonymization safeguards personal information in a tech-driven world.
Natalia Tomashenko, Emmanuel Vincent, Marc Tommasi
― 6 min read
Merging audio and visual cues to improve speech recognition in noisy environments.
Zhaofeng Lin, Naomi Harte
― 5 min read
Speech enhancement technology adapts to reduce noise and improve communication.
Riccardo Miccini, Clement Laroche, Tobias Piechowiak
― 5 min read
New tech combines sound and visuals for better drone detection.
Zhenyuan Xiao, Yizhuo Yang, Guili Xu
― 6 min read
A fresh approach combines speech and text for better dysarthria assessments.
Anuprabha M, Krishna Gurugubelli, Kesavaraj V
― 6 min read
Exploring new technology that detects sounds from invisible sources.
Yuhang He, Sangyun Shin, Anoop Cherian
― 5 min read
Discover how Smooth-Foley enhances video audio generation.
Yaoyun Zhang, Xuenan Xu, Mengyue Wu
― 6 min read
Innovative technique connects lyrics and melodies for better song creation.
Jiaxing Yu, Xinda Wu, Yunfei Xu
― 7 min read
Enhancing machine understanding of human dialogue turn-taking dynamics.
Hyunbae Jeon, Frederic Guintu, Rayvant Sahni
― 8 min read
Exploring how language affects DeepFake detection accuracy across various languages.
Bartłomiej Marek, Piotr Kawa, Piotr Syga
― 6 min read
VERSA evaluates speech, audio, and music quality effectively.
Jiatong Shi, Hye-jin Shim, Jinchuan Tian
― 9 min read
Discover how audio-language models are changing sound recognition technology.
Gongyu Chen, Haomin Zhang, Chaofan Ding
― 6 min read
New methods enhance natural dialogue in speech technology.
Zhenqi Jia, Rui Liu
― 6 min read
Discover how SpeechSSM transforms long-form speech generation for better interactions.
Se Jin Park, Julian Salazar, Aren Jansen
― 5 min read
Learn how real-time translation transforms communication across languages.
Sara Papi, Peter Polak, Ondřej Bojar
― 6 min read
A lightweight model designed to effectively separate mixed speech in noisy environments.
Shaoxiang Dang, Tetsuya Matsumoto, Yoshinori Takeuchi
― 6 min read
Researchers tackle audio spoofing to enhance voice recognition security.
Xuechen Liu, Junichi Yamagishi, Md Sahidullah
― 9 min read
Learn how AV-ASR combines audio and visuals for better speech recognition.
Yihan Wu, Yichen Lu, Yifan Peng
― 6 min read
A new method is transforming how machines learn from music.
Julien Guinot, Elio Quinton, György Fazekas
― 7 min read
New technology transforms silent murmurs into audible communication for those in need.
Neil Shah, Shirish Karande, Vineet Gandhi
― 6 min read
New methods in speech synthesis improve clarity and adaptability for diverse applications.
Neil Shah, Ayan Kashyap, Shirish Karande
― 8 min read
Discover the rich tradition of Ethiopian Orthodox Tewahedo Church chants.
Mequanent Argaw Muluneh, Yan-Tsung Peng, Li Su
― 7 min read
A new dataset highlights the beauty of Ethiopian Orthodox chants.
Mequanent Argaw Muluneh, Yan-Tsung Peng, Worku Abebe Degife
― 7 min read
New advances help speech-recognition technology better serve people with speech disorders.
Jimmy Tobin, Katrin Tomanek, Subhashini Venugopalan
― 6 min read
Discover how ETTA turns words into creative audio experiences.
Sang-gil Lee, Zhifeng Kong, Arushi Goel
― 6 min read
A fresh take on how music affects our emotions.
Dengming Zhang, Weitao You, Ziheng Liu
― 7 min read
A new framework for generating synchronized and natural group dances.
Kaixing Yang, Xulong Tang, Haoyu Wu
― 8 min read
New approach in emotion recognition focuses on mouth movements over sounds.
Shreya G. Upadhyay, Ali N. Salman, Carlos Busso
― 6 min read
Discover how Stable-TTS improves text-to-speech technology for a human-like experience.
Wooseok Han, Minki Kang, Changhun Kim
― 7 min read
Innovative sound wave technology offers new insights into indoor walking speed.
Sheng Lyu, Chenshu Wu
― 6 min read
Audio assistants are getting smarter with AQA-K, enhancing responses through knowledge.
Abhirama Subramanyam Penamakuri, Kiran Chhatre, Akshat Jain
― 6 min read
Researchers study how our brain controls speech and its implications for recovery.
Eric Easthope
― 6 min read
Discover how text can transform into audio with cutting-edge models.
Chia-Yu Hung, Navonil Majumder, Zhifeng Kong
― 3 min read