Cutting-edge science explained simply
Sparse autoencoders enhance the interpretability of AI systems and their decision-making processes.
― 18 min read
A look at how AI models grasp essential knowledge of the world.
― 6 min read
New benchmark assesses toxicity in large language models across various languages.
― 7 min read
This article discusses the need for better evaluation practices in fuzzing research.
― 5 min read
This study assesses saliency methods in NLP through human evaluation.
― 8 min read
Introducing PQAH for better understanding of AI heatmaps and their evaluation.
― 7 min read
A new method enhances optimization in costly high-dimensional problems.
― 6 min read
A new method for assessing language models' alignment with human values.
― 7 min read
A new method improves image creation from multiple text prompts.
― 6 min read
An overview of behaviors in crowdsourcing communities and their impacts.
― 7 min read
This research highlights the need for better evaluation of dialogue systems' use of conversation history.
― 5 min read
AdvEval exposes weaknesses in Natural Language Generation evaluation metrics.
― 6 min read
New tool converts sketches into clear graphics programs for researchers.
― 6 min read
A new method enhances trustworthiness of AI outputs in blockchain environments.
― 9 min read
Participants tackle the restoration of degraded images in a competitive setting.
― 5 min read
A novel system tracks and recognizes dynamic 3D scenes using a single video.
― 6 min read
Evaluating algorithms for effective musical phrase segmentation and structure analysis.
― 5 min read
A new method improves how intelligence messages are assessed by prioritizing credibility.
― 5 min read
New resources enhance assessment of Korean language models.
― 4 min read
This article examines a new way to create algorithms with LLMs.
― 5 min read
Learn how seven-valued logic enhances decision-making with multiple criteria.
― 6 min read
A challenge focusing on deep generative models for realistic medical image generation.
― 8 min read
A model assesses the readability of Wikipedia articles across 14 languages.
― 7 min read
A new approach using LLMs to create distractors with minimal human input.
― 3 min read
A new approach to evaluating biases in automated AI evaluation metrics.
― 6 min read
New methods aim to enhance reasoning capabilities in language models.
― 6 min read
New metrics shed light on the limitations of language models in representing reality.
― 7 min read
A new system for assessing language models using real-world data streams.
― 5 min read
Introducing IrokoBench to improve LLM evaluation in African languages.
― 7 min read
The ULS23 Challenge aims to improve tumor segmentation in CT scans for better cancer care.
― 5 min read
A fresh approach improves detection of fake images created by AI.
― 6 min read
A new benchmark aims to assess MLLMs in video understanding across multiple topics.
― 6 min read
This study presents a new method for identifying key training images in AI-generated visuals.
― 7 min read
Exploring the significance of unlearning methods in modern machine learning.
― 5 min read
Examining the key issues in offline MARL and proposing standardized solutions.
― 6 min read
Learn about CGP, its function, advantages, applications, and challenges in programming.
― 5 min read
A new dataset improves coherence in image-text sequences for effective content creation.
― 5 min read
SciEx reveals strengths and challenges of LLMs in scientific evaluation.
― 6 min read
SEACrowd aims to improve AI representation for Southeast Asian languages and cultures.
― 7 min read
A study evaluates language models on handling multiple tasks simultaneously.
― 7 min read