Hongsheng Li

Robotics Advancing Robotic Interaction with ManipVQA

New system enhances how robots understand and interact with objects.

2025-08-28T15:19:12+00:00 ― 6 min read

Computer Vision and Pattern Recognition Transforming Image Understanding with SPHINX-V

SPHINX-V enhances AI's ability to interpret images through user interaction.

2025-08-24T07:49:48+00:00 ― 6 min read

Computer Vision and Pattern Recognition Improving Camera Control in Video Generation

New method enhances camera movement control in text-to-video creation.

2025-08-23T07:44:06+00:00 ― 6 min read

Computer Vision and Pattern Recognition Advancements in Urban Scene Generation Techniques

A new method combines 3D layouts and text for better urban scene creation.

2025-08-20T16:08:24+00:00 ― 5 min read

Computer Vision and Pattern Recognition Lumina-T2X: A New Age in Media Creation

Transform text into images, videos, and audio seamlessly with Lumina-T2X.

2025-08-12T05:14:30+00:00 ― 6 min read

Computer Vision and Pattern Recognition Any2Point: Bridging 3D Understanding in AI Models

A new framework enhances AI's grasp of 3D spaces.

2025-08-11T19:14:05+00:00 ― 7 min read

Computation and Language Innovative Method for Character-Level Text Infilling

A new technique improves text generation in natural language processing.

2025-08-06T02:18:06+00:00 ― 6 min read

Machine Learning Introducing the Phased Consistency Model for AI Image Generation

A new model streamlines AI image and video creation with improved speed and quality.

2025-08-05T21:57:24+00:00 ― 4 min read

Machine Learning Advancing AI Decision-Making with UniZero

UniZero enhances AI's long-term memory and decision-making abilities.

2025-07-28T13:09:54+00:00 ― 7 min read

Computer Vision and Pattern Recognition Introducing MM-Instruct: A Step Forward in Instruction Following

MM-Instruct improves large multimodal models' ability to follow diverse instructions.

2025-07-22T17:43:48+00:00 ― 5 min read

Computation and Language Improving Language Models with Step-Controlled DPO

A new approach enhances reasoning in language models by generating controlled errors.

2025-07-22T05:13:18+00:00 ― 6 min read

Human-Computer Interaction Advancing Mobile AI with the AMEX Dataset

The AMEX dataset enhances AI understanding of mobile app interfaces.

2025-07-20T00:09:36+00:00 ― 7 min read

Computer Vision and Pattern Recognition Advancements in Text-to-Image Technology

A new model advances image generation from text descriptions, with applications across various industries.

2025-07-02T04:22:30+00:00 ― 5 min read

Computer Vision and Pattern Recognition Creating Realistic 3D Avatars with Text Inputs

A new method generates customizable 3D avatars from text descriptions.

2025-06-22T21:12:00+00:00 ― 7 min read

Computer Vision and Pattern Recognition LLaVA-MoD: A New Approach to Efficient Multimodal Models

LLaVA-MoD creates smaller multimodal models using knowledge from larger counterparts.

2025-06-20T22:35:24+00:00 ― 5 min read

Computer Vision and Pattern Recognition The Future of Multimodal Search Engines

Examining the role of LMMs in transforming search capabilities with text and images.

2025-06-09T12:35:30+00:00 ― 6 min read

Computer Vision and Pattern Recognition MedViLaM: A New Model for Medical Data Analysis

MedViLaM integrates multiple medical data types for improved analysis and decision-making.

2025-06-03T10:58:06+00:00 ― 5 min read

Computer Vision and Pattern Recognition TimeWalker: Your Personal Time-Traveling Avatar

Experience aging in 3D with TimeWalker technology!

2025-04-20T02:07:21+00:00 ― 5 min read

Computer Vision and Pattern Recognition StreamChat: Real-Time Video Interaction Revolution

StreamChat transforms how we engage with streaming video in real time.

2025-03-21T16:43:30+00:00 ― 7 min read