A look at the new single-stage TTS system improving speech generation.
Gerard I. Gállego, Roy Fejgin, Chunghsin Yeh
― 6 min read
Cutting-edge science explained simply
This study addresses challenges in audio language models for low-resource languages.
Potsawee Manakul, Guangzhi Sun, Warit Sirichotedumrong
― 5 min read
This study enhances emotion recognition systems for less common languages using high-resource data.
Hsi-Che Lin, Yi-Cheng Lin, Huang-Cheng Chou
― 6 min read
A model improves speech tasks in multilingual settings, addressing code-switching challenges.
Jing Xu, Daxin Tan, Jiaqi Wang
― 5 min read
DeFT-Mamba improves sound separation and classification in noisy environments.
Dongheon Lee, Jung-Woo Choi
― 5 min read
CADA-GAN enhances ASR systems' performance across various recording environments.
Chien-Chun Wang, Li-Wei Chen, Cheng-Kang Chou
― 6 min read
EVA combines audio and visual signals for better speech recognition accuracy.
Yihan Wu, Yifan Peng, Yichen Lu
― 4 min read
A new framework simplifies speech recognition in busy environments.
Jinhan Wang, Weiqing Wang, Kunal Dhawan
― 5 min read
Llama-AVSR merges audio and visual inputs for enhanced speech recognition accuracy.
Umberto Cappellazzo, Minsu Kim, Honglie Chen
― 6 min read
WMCodec enhances audio watermarking for better security and authenticity.
Junzuo Zhou, Jiangyan Yi, Yong Ren
― 5 min read
New models tackle sound classification with limited training data.
Jin Jie Sean Yeo, Ee-Leng Tan, Jisheng Bai
― 5 min read
A new approach improves fake audio detection using pretrained models.
Zhiyong Wang, Ruibo Fu, Zhengqi Wen
― 5 min read
New method improves speech generation quality and efficiency.
Xin Qi, Ruibo Fu, Zhengqi Wen
― 4 min read
A method combining labeled and unlabeled data enhances sound source detection.
Vadim Rozenfeld, Bracha Laufer Goldshtein
― 5 min read
Discover how audio cues aid players in table tennis.
Thomas Gossard, Julian Schmalzl, Andreas Ziegler
― 6 min read
A system prioritizing melody while offering control over orchestral music generation.
Dinh-Viet-Toan Le, Yi-Hsuan Yang
― 5 min read
A new method uses virtual shadowing to enhance language learners' pronunciation feedback.
Haopeng Geng, Daisuke Saito, Nobuaki Minematsu
― 6 min read
New methods improve binaural audio quality in challenging sound environments.
Ami Berger, Vladimir Tourbabin, Jacob Donley
― 8 min read
A new ASR method helps technology understand children's speech better.
Zhonghao Shi, Harshvardhan Srivastava, Xuan Shi
― 5 min read
Composer uses text prompts to create complex music compositions in MIDI format.
Jakub Poćwiardowski, Mateusz Modrzejewski, Marek S. Tatara
― 5 min read
A resource for studying singing patterns in Japanese idol music.
Hitoshi Suda, Shunsuke Yoshida, Tomohiko Nakamura
― 6 min read
ViolinDiff enhances the realism of computer-generated violin music.
Daewoong Kim, Hao-Wen Dong, Dasaem Jeong
― 5 min read
Combining features enhances underwater sound classification accuracy.
Amirmohammad Mohammadi, Irené Masabarakiza, Ethan Barnes
― 6 min read
Transfer learning improves audio classification for underwater sound detection.
Amirmohammad Mohammadi, Tejashri Kelhe, Davelle Carreiro
― 6 min read
A new model creates audio that matches video, enhancing media experiences.
Ilpo Viertola, Vladimir Iashin, Esa Rahtu
― 4 min read
A method to boost automatic speech recognition by blending keyword lists with language models.
Iuliia Thorbecke, Juan Zuluaga-Gomez, Esaú Villatoro-Tello
― 4 min read
A study on vocal imitation techniques using technology to enhance communication.
Matthew Caren, Kartik Chandra, Joshua B. Tenenbaum
― 5 min read
Learn how to effectively train speech models with fewer labeled resources.
Iuliia Thorbecke, Juan Zuluaga-Gomez, Esaú Villatoro-Tello
― 7 min read
An analysis of gender terminology in speech technology and its societal implications.
Ariadna Sanchez, Alice Ross, Nina Markl
― 7 min read
A new framework improves detection of overlapping sound events in complex audio environments.
Han Yin, Jisheng Bai, Yang Xiao
― 6 min read
Research on improving bird sound identification through machine learning techniques.
Burooj Ghani, Vincent J. Kalkman, Bob Planqué
― 6 min read
A new method improves automatic piano cover creation using existing music transcription technology.
Kazuma Komiya, Yoshihisa Fukuhara
― 6 min read
A look at the Codec-SUPERB challenge results and codec performance metrics.
Haibin Wu, Xuanjun Chen, Yi-Cheng Lin
― 5 min read
MultiMed project enhances automatic speech recognition for better healthcare communication.
Khai Le-Duc, Phuc Phan, Tan-Hanh Pham
― 5 min read
A fresh approach to audio quality assessment without needing clean references.
Jozef Coldenhoff, Milos Cernak
― 6 min read
ECHO framework improves sound classification accuracy using structured labels and a two-stage learning process.
Pranav Gupta, Raunak Sharma, Rashmi Kumari
― 5 min read
New method enhances speech clarity by integrating visual information.
Wenze Ren, Kuo-Hsuan Hung, Rong Chao
― 5 min read
A new approach enhances sound direction estimation for moving speakers in challenging settings.
Daniel A. Mitchell, Boaz Rafaely, Anurag Kumar
― 8 min read
Audio Moment Retrieval enables pinpointing specific moments in long recordings.
Hokuto Munakata, Taichi Nishimura, Shota Nakada
― 5 min read
Safe Guard detects hate speech in real-time during voice interactions in social VR.
Yiwen Xu, Qinyang Hou, Hongyu Wan
― 6 min read