Composer uses text prompts to create complex music compositions in MIDI format.
Jakub Poćwiardowski, Mateusz Modrzejewski, Marek S. Tatara
― 5 min read
Cutting-edge science explained simply
A resource for studying singing patterns in Japanese idol music.
Hitoshi Suda, Shunsuke Yoshida, Tomohiko Nakamura
― 6 min read
ViolinDiff enhances the realism of computer-generated violin music.
Daewoong Kim, Hao-Wen Dong, Dasaem Jeong
― 5 min read
Combining features enhances underwater sound classification accuracy.
Amirmohammad Mohammadi, Irené Masabarakiza, Ethan Barnes
― 6 min read
Transfer learning improves audio classification for underwater sound detection.
Amirmohammad Mohammadi, Tejashri Kelhe, Davelle Carreiro
― 6 min read
AI technology is changing the landscape of vishing scams, increasing risks for individuals.
João Figueiredo, Afonso Carvalho, Daniel Castro
― 5 min read
A new model creates audio that matches video, enhancing media experiences.
Ilpo Viertola, Vladimir Iashin, Esa Rahtu
― 4 min read
A method to boost automatic speech recognition by blending keyword lists with language models.
Iuliia Thorbecke, Juan Zuluaga-Gomez, Esaú Villatoro-Tello
― 4 min read
A study on vocal imitation techniques using technology to enhance communication.
Matthew Caren, Kartik Chandra, Joshua B. Tenenbaum
― 5 min read
Learn how to effectively train speech models with fewer labeled resources.
Iuliia Thorbecke, Juan Zuluaga-Gomez, Esaú Villatoro-Tello
― 7 min read
An analysis of gender terminology in speech technology and its societal implications.
Ariadna Sanchez, Alice Ross, Nina Markl
― 7 min read
A new framework improves detection of overlapping sound events in complex audio environments.
Han Yin, Jisheng Bai, Yang Xiao
― 6 min read
Research on improving bird sound identification through machine learning techniques.
Burooj Ghani, Vincent J. Kalkman, Bob Planqué
― 6 min read
A new method improves automatic piano cover creation using existing music transcription technology.
Kazuma Komiya, Yoshihisa Fukuhara
― 6 min read
A look at the Codec-SUPERB challenge results and codec performance metrics.
Haibin Wu, Xuanjun Chen, Yi-Cheng Lin
― 5 min read
MultiMed project enhances automatic speech recognition for better healthcare communication.
Khai Le-Duc, Phuc Phan, Tan-Hanh Pham
― 5 min read
A fresh approach to audio quality assessment without needing clean references.
Jozef Coldenhoff, Milos Cernak
― 6 min read
ECHO framework improves sound classification accuracy using structured labels and a two-stage learning process.
Pranav Gupta, Raunak Sharma, Rashmi Kumari
― 5 min read
New method enhances speech clarity by integrating visual information.
Wenze Ren, Kuo-Hsuan Hung, Rong Chao
― 5 min read
A new approach enhances sound direction estimation for moving speakers in challenging settings.
Daniel A. Mitchell, Boaz Rafaely, Anurag Kumar
― 8 min read
Audio Moment Retrieval enables pinpointing specific moments in long recordings.
Hokuto Munakata, Taichi Nishimura, Shota Nakada
― 5 min read
Safe Guard detects hate speech in real-time during voice interactions in social VR.
Yiwen Xu, Qinyang Hou, Hongyu Wan
― 6 min read
AI is evolving to engage in more natural conversations.
Bandhav Veluri, Benjamin N Peloquin, Bokai Yu
― 5 min read
A novel approach uses real-time MRI to visualize speech production movements.
Hong Nguyen, Sean Foley, Kevin Huang
― 5 min read
A new method to detect early room reflections improves audio experiences.
Yogev Hadadi, Vladimir Tourbabin, Zamir Ben-Hur
― 6 min read
A project developing speech and text datasets for languages with limited resources.
Nikola Ljubešić, Peter Rupnik, Danijel Koržinek
― 5 min read
A new framework enhances voice recognition and adapts to various speech tasks.
Junyi Peng, Ladislav Mošner, Lin Zhang
― 4 min read
New methods are needed to detect advanced deepfake speech technologies.
Lam Pham, Phat Lam, Dat Tran
― 5 min read
New methods boost accuracy in identifying animal sounds from limited data.
Yaxiong Chen, Xueping Zhang, Yunfei Zi
― 5 min read
New method improves virtual sound integration in AR environments.
Francesc Lluís, Nils Meyer-Kahlen
― 6 min read
A new method aims to preserve voice privacy while allowing for effective communication.
Jacob J Webber, Oliver Watts, Gustav Eje Henter
― 4 min read
New methods improve speech recognition for low-resource languages without text.
Krithiga Ramadass, Abrit Pal Singh, Srihari J
― 4 min read
New methods enhance accuracy in speech recognition systems using phonetic understanding.
Leonid Velikovich, Christopher Li, Diamantino Caseiro
― 5 min read
This framework improves real-time animations by synchronizing speech and gestures seamlessly.
Zixin Guo, Jian Zhang
― 5 min read
New acoustic features enhance ASR systems' performance in noisy environments.
Muhammad A. Shah, Bhiksha Raj
― 4 min read
A new loss function boosts audio quality by aligning phase and magnitude.
Pin-Jui Ku, Chun-Wei Ho, Hao Yen
― 6 min read
A new TTS model adds emotional depth to computer-generated speech.
Yunji Chu, Yunseob Shim, Unsang Park
― 5 min read
Evaluating speech recognition models for autism diagnostic sessions.
Aditya Ashvin, Rimita Lahiri, Aditya Kommineni
― 6 min read
Recent methods improve audio clarity and quality using advanced models.
Pin-Jui Ku, Alexander H. Liu, Roman Korostik
― 6 min read
A fresh approach improves detection of fake audio recordings.
Viola Negroni, Davide Salvi, Alessandro Ilic Mezza
― 5 min read