Guiming Hardy Chen

Research reveals significant biases in human and LLM evaluations of responses.

2025-09-07T05:12:36+00:00 ― 6 min read

New benchmarks reveal challenges for multimodal LLMs (MLLMs) in real-world tasks with long contexts.

2025-08-15T10:16:00+00:00 ― 7 min read