Learn how Harmonizing Attention improves image blending by focusing on geometry and texture.
Eito Ikuta, Yohan Lee, Akihiro Iohara
― 6 min read
Cutting edge science explained simply
Latest Articles
Vijul Shah, Brian B. Moser, Ko Watanabe
― 5 min read
Liangyu Chen, Zihao Yue, Boshen Xu
― 5 min read
Yunxin Li, Haoyuan Shi, Baotian Hu
― 6 min read
Viola Negroni, Davide Salvi, Paolo Bestagini
― 6 min read
Haowei Lou, Helen Paik, Wen Hu
― 6 min read
Cap2Sum uses dense video captions to improve video summarization efficiency and effectiveness.
Cairong Zhao, Chutian Wang, Zifan Song
― 7 min read
MaVEn enhances AI's ability to process multiple images for better reasoning.
Chaoya Jiang, Jia Hongrui, Haiyang Xu
― 5 min read
AI is reshaping how music is composed and experienced.
Sangjun Han, Jiwon Ham, Chaeeun Lee
― 6 min read
A new method improves emotion recognition in conversations using multiple data sources.
Cam-Van Thi Nguyen, The-Son Le, Anh-Tuan Mai
― 5 min read
Introducing RMARN: an innovative approach to connect text and 3D data.
Wenrui Li, Wei Han, Yandu Chen
― 5 min read
A new method transforms text into detailed 3D scenes seamlessly.
Wenrui Li, Fucheng Cai, Yapeng Mi
― 6 min read
A new approach to building accessible virtual spaces using WebXR and A-Frame.
Giuseppe Macario
― 6 min read
SynthDoc creates synthetic documents for machine learning in document reading.
Chuanghao Ding, Xuejing Liu, Wei Tang
― 6 min read
This study presents a model to analyze emotional reactions to video content.
Mingwei Sun, Kunpeng Zhang
― 7 min read
This article discusses the benefits of merging voice and facial recognition systems.
Aref Farhadipour, Masoumeh Chapariniya, Teodora Vukovic
― 5 min read
A new method for creating RGBA images easily and effectively.
Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli
― 7 min read
Kangaroo improves video analysis by integrating visuals, sounds, and text effectively.
Jiajun Liu, Yibing Wang, Hanghang Ma
― 5 min read
This paper presents a single-encoder model for improved image segmentation based on text descriptions.
Seonghoon Yu, Ilchae Jung, Byeongju Han
― 6 min read
New methods improve voice separation in noisy environments.
Tathagata Bandyopadhyay
― 5 min read
A new framework enhances image captioning accuracy and reduces errors.
Qian Cao, Xu Chen, Ruihua Song
― 5 min read
Improving how machines assist users through better interaction and response measures.
Dan Bohus, Sean Andrist, Yuwei Bao
― 5 min read
Exploring digital humans and haptic interfaces for immersive interactions.
Senthil Kumar Jagatheesaperumal, Praveen Sathikumar, Harikrishnan Rajan
― 5 min read
New methods enhance video transmission by predicting missing data effectively.
John Li, Shehab Sarar Ahmed, Deepak Nair
― 5 min read
A framework for real-time music adjustment in games and films.
Haoxuan Liu, Zihao Wang, Haorong Hong
― 5 min read
MRDAC improves face video quality and compression using multiple reference frames.
Goluck Konuko, Giuseppe Valenzise
― 6 min read
Researchers explore ultrasonic echoes for accurate distance measurements in quiet indoor settings.
Junpei Honma, Akisato Kimura, Go Irie
― 6 min read
Exploring shadow detection, removal, and generation in computer vision.
Xiaowei Hu, Zhenghao Xing, Tianyu Wang
― 7 min read
A new method enhances image quality during adverse weather using language and vision models.
Jiaqi Xu, Mengyang Wu, Xiaowei Hu
― 5 min read
This framework enhances multimedia app efficiency while protecting user privacy.
Zhongze Tang, Mengmei Ye, Yao Liu
― 7 min read
LongLLaVA improves multi-image understanding for various applications.
Xidong Wang, Dingjie Song, Shunian Chen
― 5 min read
SegTalker enhances talking face videos with realistic textures and easy editing.
Lingyu Xiong, Xize Cheng, Jintao Tan
― 5 min read
HiSC4D captures human movement using wearable sensors for better interaction analysis.
Yudi Dai, Zhiyong Wang, Xiping Lin
― 7 min read
Introducing a method to improve question-answering in videos with multiple events.
Hangyu Qin, Junbin Xiao, Angela Yao
― 6 min read
An overview of audio-visual speaker diarization methods, challenges, and systems.
Victoria Mingote, Alfonso Ortega, Antonio Miguel
― 5 min read
This work enhances vision-language models through improved data strategies and innovative techniques.
Yuan Liu, Zhongyin Zhao, Ziyuan Zhuang
― 7 min read
A new method improves object identification in images through tailored visual and text integration.
Ruilin Yao, Shengwu Xiong, Yichen Zhao
― 5 min read
SimCLIP enhances meme analysis by effectively combining text and images.
Javier Huertas-Tato, Christos Koutlis, Symeon Papadopoulos
― 6 min read
MIP-GAF dataset helps analyze social dynamics in images.
Surbhi Madan, Shreya Ghosh, Lownish Rai Sookha
― 5 min read
A new approach refines the connection between images and text in VLMs.
Ali Abdollah, Amirmohammad Izadi, Armin Saghafian
― 5 min read
Research links paintings to music by interpreting emotions.
Tanisha Hisariya, Huan Zhang, Jinhua Liang
― 6 min read
A study reveals a new way to identify emotions using video, sound, and text.
Jiehui Jia, Huan Zhang, Jinhua Liang
― 5 min read
This article explores how varied inputs can boost speech recognition accuracy.
Yiwen Guan, Viet Anh Trinh, Vivek Voleti
― 5 min read
LLaQo offers detailed feedback for music performance assessment, enhancing student learning.
Huan Zhang, Vincent Cheung, Hayato Nishioka
― 5 min read
Exploring how Starlink influences video streaming globally.
Liz Izhikevich, Reese Enghardt, Te-Yuan Huang
― 5 min read
Artificial intelligence is reshaping music with new tools and approaches.
Megan Wei, Mateusz Modrzejewski, Aswin Sivaraman
― 6 min read