Cihang Xie

New benchmark assesses how effectively video-language models handle inaccuracies.

2025-07-24T17:47:18+00:00 ― 6 min read

A model that improves segmentation of parts and objects in images.

2025-06-18T12:55:12+00:00 ― 5 min read

A framework using memory tokens improves video understanding and interaction.

2025-06-18T08:10:48+00:00 ― 7 min read