Latest Articles for Evaluation

Computation and Language New Metric DEnsity Enhances Dialogue System Evaluation

DEnsity offers a fresh approach to evaluating dialogue systems based on human conversation patterns.

2025-11-19T08:03:48+00:00 ― 6 min read

Information Retrieval Enhancing Recommendations Through User Feedback Integration

This article discusses the benefits of using diverse user feedback for better recommendations.

2025-11-18T22:03:24+00:00 ― 6 min read

Computation and Language ArgU: A New Tool for Generating Arguments

ArgU creates structured arguments based on factual information for effective discussions.

2025-11-18T18:06:24+00:00 ― 5 min read

Computation and Language Evaluating GPT-3 in Medical Summarization

This study assesses GPT-3's ability to summarize medical literature effectively.

2025-11-18T08:06:00+00:00 ― 5 min read

Sound Measuring Beauty in Music: A New Approach

A mathematical method to evaluate the beauty of music performances.

2025-11-18T07:37:25+00:00 ― 5 min read

Dentistry and Oral Medicine Improving Dental Care in Brazil: A Study

This study evaluates periodontal care in Brazilian Dental Specialty Centers.

2025-11-18T03:45:30+00:00 ― 5 min read

Computation and Language Evaluating Machine-Generated Rationales for Human Users

This article examines the effectiveness of AI-generated explanations for users.

2025-11-17T22:53:00+00:00 ― 8 min read

Sound DCASE 2023: Advancing Automatic Foley Sound Synthesis

A competition to improve automated Foley sound creation for multimedia.

2025-11-17T14:37:10+00:00 ― 5 min read

Computation and Language Introducing C-Eval: A New Evaluation Tool for Chinese Language Models

C-Eval assesses reasoning and knowledge skills of LLMs in the Chinese language.

2025-11-17T02:52:12+00:00 ― 5 min read

Computer Vision and Pattern Recognition Advances in Document Understanding: A New Dataset

A new dataset improves how machines read and respond to documents.

2025-11-16T12:39:00+00:00 ― 5 min read

Computation and Language Evaluating the RACE Reading Comprehension Dataset

An analysis of the RACE dataset's strengths and weaknesses for reading comprehension.

2025-11-16T12:15:18+00:00 ― 8 min read

Computation and Language Evaluating Claims of Superhuman Performance in NLP

A critical look at language model benchmarks and their implications for human performance.

2025-11-16T11:51:36+00:00 ― 5 min read

Computation and Language Addressing Missing Scores in NLP Evaluations

This article presents a new method for handling missing scores in NLP system evaluations.

2025-11-15T11:45:54+00:00 ― 6 min read

Computation and Language Advancing Empathy in Chatbots

Learn how chatbots are being trained to respond with empathy.

2025-11-15T09:00:00+00:00 ― 5 min read

Computation and Language mLongT5: Advancing Multilingual Text Processing

mLongT5 efficiently manages longer texts across multiple languages.

2025-11-15T00:10:42+00:00 ― 4 min read

Computer Vision and Pattern Recognition Improving Evaluation of Text-to-Image Synthesis

A new method enhances how we evaluate AI-generated images from text descriptions.

2025-11-15T00:02:48+00:00 ― 6 min read

Computation and Language Advancements in Hierarchical Script Generation

A study on creating structured instructions through hierarchical task decomposition.

2025-11-14T19:26:18+00:00 ― 6 min read

Computation and Language IKDSumm: A New Approach to Summarizing Disaster Tweets

IKDSumm effectively summarizes tweets during disasters using disaster-specific knowledge.

2025-11-14T07:51:06+00:00 ― 5 min read

Artificial Intelligence Enhancing LLMs: The TELeR Taxonomy

A new taxonomy to improve LLM performance on complex tasks.

2025-11-14T04:49:24+00:00 ― 6 min read

Computation and Language Evaluating Arguments in a Misinformed World

A new method to assess argument quality by considering context.

2025-11-14T03:46:12+00:00 ― 5 min read

Computation and Language Evaluating Language Models: A Closer Look

A study assesses methods for evaluating how well language models understand language.

2025-11-13T03:08:54+00:00 ― 6 min read

Computation and Language Introducing Seahorse: A Multilingual Summarization Dataset

Seahorse provides a large collection of multilingual summaries with human ratings.

2025-11-13T01:42:00+00:00 ― 6 min read

Computation and Language Cultural Content in Machine Translation: New Insights

Research advancements in translating cultural references using machine translation systems.

2025-11-12T14:14:42+00:00 ― 8 min read

Artificial Intelligence Connecting Different Types of Data with LoReTTa

A new method to integrate various medical data types for better analysis.

2025-11-12T13:51:00+00:00 ― 9 min read

Computation and Language Evaluating Language Models: Bridging the Demographic Gap

Assessing language models' performance across various human demographics is crucial for effective usage.

2025-11-12T13:35:12+00:00 ― 6 min read

Computation and Language Challenges of Retrieval-Augmented Language Models

A study reveals limitations in retrieval-augmented language models for text generation.

2025-11-12T08:11:18+00:00 ― 5 min read

Computation and Language A New Way to Approach Long Document Reasoning

Introducing a structured framework for effective reasoning over long texts.

2025-11-12T05:41:12+00:00 ― 4 min read

Computation and Language Introducing MMSMR: A New Dataset for Evaluating Chatbots

The MMSMR dataset aims to improve chatbot conversation evaluation with diverse human responses.

2025-11-12T04:38:00+00:00 ― 5 min read

Computation and Language Cultural Norms: A Comparison of China and America

This study compares social norms between Chinese and American cultures through data analysis.

2025-11-12T03:50:36+00:00 ― 6 min read

Computation and Language Enhancing Table Summarization for User Queries

A new approach to summarizing tables based on user questions for better insights.

2025-11-12T00:48:54+00:00 ― 5 min read

Computation and Language A New Way to Evaluate Generated Text

Introducing a system that explains the evaluation of machine-generated text clearly.

2025-11-12T00:17:18+00:00 ― 5 min read

Computation and Language Advancements in Multilingual Language Models

A new dataset improves language models' ability to understand instructions across various languages.

2025-11-10T17:20:48+00:00 ― 5 min read

Computation and Language Improving Accuracy in Language Models

A new method addresses the challenges faced by language models in providing accurate answers.

2025-11-10T09:58:24+00:00 ― 6 min read

Machine Learning Evaluating Abstaining Classifiers with Counterfactual Scores

A method to assess abstaining classifiers by estimating their missing predictions.

2025-11-10T06:52:08+00:00 ― 8 min read

Information Retrieval The Role of Clarification Questions in Conversational Systems

Clarification questions are essential for effective communication in conversational systems.

2025-11-09T18:34:06+00:00 ― 6 min read

Computer Vision and Pattern Recognition Summarizing Sign Language Videos for Better Communication

A new method improves video summarization for sign language content.

2025-11-09T12:22:48+00:00 ― 4 min read

Computation and Language Advancing Diverse-Modal Entity Linking Techniques

Enhancing model capabilities for linking various data types effectively.

2025-11-09T09:44:48+00:00 ― 5 min read

Computation and Language Chain-of-Thought Hub: Evaluating Reasoning in Language Models

A tool to assess large language models' multi-step reasoning capabilities.

2025-11-09T08:41:36+00:00 ― 5 min read

Computation and Language A New Approach to Summarization Evaluation

Combining reference-based and reference-free methods for better summarization assessment.

2025-11-09T01:11:18+00:00 ― 6 min read

Computation and Language LLMs Outperform Traditional Systems in Translation

A study shows that LLMs provide more natural translations, especially for idiomatic phrases.

2025-11-08T23:12:48+00:00 ― 5 min read