Latest Articles for Technology Evaluation

A fresh look at how effectively AI answers medical questions.

2025-05-23T17:12:45+00:00 ― 6 min read

A new method improves how we assess image generation from text.

2025-05-23T08:56:51+00:00 ― 8 min read

An analysis of how effectively ChatGPT recommends movies.

2025-05-20T17:00:09+00:00 ― 5 min read

Learn how pairwise ranking helps in selecting the best language model.

2025-05-19T04:28:00+00:00 ― 8 min read

SpecTool brings clarity to the errors LLMs make when using tools.

2025-05-16T16:30:40+00:00 ― 4 min read

Assessing language models' effectiveness in coding tasks with new benchmarks.

2025-05-15T17:42:40+00:00 ― 5 min read

AbilityLens standardizes evaluation for multimodal large language models.

2025-05-15T13:54:40+00:00 ― 6 min read

Learn how SelfPrompt helps assess the strength of language models effectively.

2025-04-27T12:04:45+00:00 ― 3 min read

Evaluating language models' abilities in synthetic data creation using AgoraBench.

2025-04-17T19:33:09+00:00 ― 5 min read

Exploring evaluation issues in Explainable Artificial Intelligence and the quest for trust.

2025-04-03T20:54:36+00:00 ― 6 min read

A tool for evaluating the safety of responses from large language models in China.

2025-03-05T07:30:00+00:00 ― 5 min read

New methods assess the quality of AI-generated human faces for realism and appeal.

2025-02-25T12:31:03+00:00 ― 9 min read

MVTamperBench evaluates VLMs against video tampering techniques for improved reliability.

2025-01-23T08:26:15+00:00 ― 5 min read