A new system targets hate speech in memes effectively.
Xuanyu Su, Yansong Li, Diana Inkpen
― 6 min read
Latest Articles
Liangdong Qiu, Chengxing Yu, Yanran Li
― 7 min read
Caolu Xu, Zhiyong Chen, Meixia Tao
― 6 min read
Eashan Adhikarla, Kai Zhang, John Nicholson
― 5 min read
Wei Zhou, Zhou Wang
― 7 min read
Tiancheng Shi, Yuanchen Wei, John R. Kender
― 5 min read
A new method streamlines 3D scene editing using just one 2D image.
Guan Luo, Tian-Xing Xu, Ying-Tian Liu
― 6 min read
A new method targets multiple face authentication systems efficiently.
Hanrui Wang, Shuo Wang, Cunjian Chen
― 8 min read
An innovative system automates sound generation for films and games.
Junwon Lee, Jaekwon Im, Dabin Kim
― 8 min read
Learn how Harmonizing Attention improves image blending by focusing on geometry and texture.
Eito Ikuta, Yohan Lee, Akihiro Iohara
― 6 min read
Enhancing image quality leads to better pupil size assessments.
Vijul Shah, Brian B. Moser, Ko Watanabe
― 5 min read
Current benchmarks misjudge models' ability to connect audio and visual data.
Liangyu Chen, Zihao Yue, Boshen Xu
― 5 min read
Automation in animation creation opens new pathways for storytelling and visuals.
Yunxin Li, Haoyuan Shi, Baotian Hu
― 6 min read
A look into the complexities of identifying mixed audio tracks.
Viola Negroni, Davide Salvi, Paolo Bestagini
― 6 min read
StyleSpeech advances TTS systems by capturing natural speech nuances.
Haowei Lou, Helen Paik, Wen Hu
― 6 min read
Cap2Sum uses dense video captions to improve video summarization efficiency and effectiveness.
Cairong Zhao, Chutian Wang, Zifan Song
― 7 min read
MaVEn enhances AI's ability to process multiple images for better reasoning.
Chaoya Jiang, Jia Hongrui, Haiyang Xu
― 5 min read
AI is reshaping how music is composed and experienced.
Sangjun Han, Jiwon Ham, Chaeeun Lee
― 6 min read
A new method improves emotion recognition in conversations using multiple data sources.
Cam-Van Thi Nguyen, The-Son Le, Anh-Tuan Mai
― 5 min read
Introducing RMARN: an innovative approach to connect text and 3D data.
Wenrui Li, Wei Han, Yandu Chen
― 5 min read
A new method transforms text into detailed 3D scenes seamlessly.
Wenrui Li, Fucheng Cai, Yapeng Mi
― 6 min read
A new approach to building accessible virtual spaces using WebXR and A-Frame.
Giuseppe Macario
― 6 min read
SynthDoc creates synthetic documents for machine learning in document reading.
Chuanghao Ding, Xuejing Liu, Wei Tang
― 6 min read
This study presents a model to analyze emotional reactions to video content.
Mingwei Sun, Kunpeng Zhang
― 7 min read
This article discusses the benefits of merging voice and facial recognition systems.
Aref Farhadipour, Masoumeh Chapariniya, Teodora Vukovic
― 5 min read
A new method for creating RGBA images easily and effectively.
Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli
― 7 min read
Kangaroo improves video analysis by integrating visuals, sounds, and text effectively.
Jiajun Liu, Yibing Wang, Hanghang Ma
― 5 min read
This paper presents a single-encoder model for improved image segmentation based on text descriptions.
Seonghoon Yu, Ilchae Jung, Byeongju Han
― 6 min read
New methods improve voice separation in noisy environments.
Tathagata Bandyopadhyay
― 5 min read
A new framework enhances image captioning accuracy and reduces errors.
Qian Cao, Xu Chen, Ruihua Song
― 5 min read
Improving how machines assist users through better interaction and response measures.
Dan Bohus, Sean Andrist, Yuwei Bao
― 5 min read
Exploring digital humans and haptic interfaces for immersive interactions.
Senthil Kumar Jagatheesaperumal, Praveen Sathikumar, Harikrishnan Rajan
― 5 min read
New methods enhance video transmission by predicting missing data effectively.
John Li, Shehab Sarar Ahmed, Deepak Nair
― 5 min read
A framework for real-time music adjustment in games and films.
Haoxuan Liu, Zihao Wang, Haorong Hong
― 5 min read
MRDAC improves face video quality and compression using multiple reference frames.
Goluck Konuko, Giuseppe Valenzise
― 6 min read
Researchers explore ultrasonic echoes for accurate distance measurements in quiet indoor settings.
Junpei Honma, Akisato Kimura, Go Irie
― 6 min read
Exploring shadow detection, removal, and generation in computer vision.
Xiaowei Hu, Zhenghao Xing, Tianyu Wang
― 7 min read
A new method enhances image quality during adverse weather using language and vision models.
Jiaqi Xu, Mengyang Wu, Xiaowei Hu
― 5 min read
This framework enhances multimedia app efficiency while protecting user privacy.
Zhongze Tang, Mengmei Ye, Yao Liu
― 7 min read
LongLLaVA improves multi-image understanding for various applications.
Xidong Wang, Dingjie Song, Shunian Chen
― 5 min read
SegTalker enhances talking face videos with realistic textures and easy editing.
Lingyu Xiong, Xize Cheng, Jintao Tan
― 5 min read
HiSC4D captures human movement using wearable sensors for better interaction analysis.
Yudi Dai, Zhiyong Wang, Xiping Lin
― 7 min read
Introducing a method to improve question-answering in videos with multiple events.
Hangyu Qin, Junbin Xiao, Angela Yao
― 6 min read
An overview of audio-visual speaker diarization methods, challenges, and systems.
Victoria Mingote, Alfonso Ortega, Antonio Miguel
― 5 min read
This work enhances vision-language models through improved data strategies and innovative techniques.
Yuan Liu, Zhongyin Zhao, Ziyuan Zhuang
― 7 min read
A new method improves object identification in images through tailored visual and text integration.
Ruilin Yao, Shengwu Xiong, Yichen Zhao
― 5 min read