Kaijie Zhu

A new method balances model robustness to tricky inputs with generalization.

2025-10-13T16:16:00+00:00 ― 5 min read

Introducing a fresh approach to assessing large language models effectively.

2025-09-05T09:14:00+00:00 ― 6 min read

NPHardEval4V assesses reasoning capabilities of multimodal large language models.

2025-09-01T13:19:48+00:00 ― 7 min read

This study examines how LLMs handle reasoning in abstract and contextual scenarios.

2025-08-02T16:24:18+00:00 ― 5 min read

Examining the issues and potential improvements in academic peer review.

2025-07-27T05:49:42+00:00 ― 7 min read