Arman Cohan

This study evaluates when expansions improve or harm information retrieval performance.

2025-09-26T14:07:30+00:00 ― 3 min read

This study examines how accurately LLMs produce structured data.

2025-09-26T00:57:30+00:00 ― 5 min read

An in-depth look at how LLMs convert language into code across multiple tasks.

2025-09-20T06:58:18+00:00 ― 8 min read

A new open language model for research and innovation in natural language processing.

2025-09-12T09:14:24+00:00 ― 6 min read

Examining vulnerabilities and safety strategies for LLM-powered scientific agents.

2025-09-10T13:23:42+00:00 ― 6 min read

Study reveals significant data overlap affecting language model evaluations in code generation.

2025-09-01T02:16:12+00:00 ― 6 min read

A new dataset helps IR models adapt to complex instructions for better performance.

2025-08-26T18:49:00+00:00 ― 3 min read

Data contamination significantly affects the evaluation of large language models.

2025-07-26T10:12:36+00:00 ― 5 min read

Two methods enhance the accuracy of AI-generated text evaluations.

2025-05-29T22:25:03+00:00 ― 7 min read

A new benchmark assesses models for verifying financial claims in complex documents.

2025-05-27T17:33:54+00:00 ― 7 min read

ChemSafetyBench tests chatbots on chemical safety and knowledge.

2025-05-06T00:39:52+00:00 ― 6 min read