Can Huang

New model improves tracking and recognition of text in video content.

2025-09-18T07:18:30+00:00 ― 4 min read

The new Square-10M dataset significantly boosts open-source visual question answering capabilities.

2025-08-18T02:31:12+00:00 ― 6 min read

A new model efficiently combines text and layout for better document understanding.

2025-07-20T12:48:00+00:00 ― 5 min read

ParGo enhances understanding of images and text by balancing global and partial views.

2025-06-23T01:16:54+00:00 ― 7 min read

A new approach improves video analysis with dynamic token systems.

2025-03-16T21:09:54+00:00 ― 8 min read