Xiangru Tang

A new framework simplifies creating autonomous language agents for diverse applications.

2025-09-26T19:55:06+00:00 ― 5 min read

This study examines LLM capabilities in producing structured data accurately.

2025-09-26T00:57:30+00:00 ― 5 min read

Examining vulnerabilities and safety strategies for LLM-powered scientific agents.

2025-09-10T13:23:42+00:00 ― 6 min read

A tool designed to improve data science tasks through dynamic planning and error checking.

2025-09-03T08:38:54+00:00 ― 4 min read

AI is changing how new drugs are developed, making the process faster and more efficient.

2025-09-01T18:06:48+00:00 ― 7 min read

This article discusses issues and best practices for evaluating language models.

2025-08-08T10:07:42+00:00 ― 7 min read

Data contamination significantly affects the evaluation of large language models.

2025-07-26T10:12:36+00:00 ― 5 min read

This article discusses new approaches to improving chemical reaction prediction.

2025-07-20T00:37:15+00:00 ― 8 min read

A new benchmark assesses models for verifying financial claims in complex documents.

2025-05-27T17:33:54+00:00 ― 7 min read

ChemSafetyBench tests chatbots on chemical safety and knowledge.

2025-05-06T00:39:52+00:00 ― 6 min read