Latest Articles for Evaluation

Computation and Language

Introducing SciEval: A New Standard for LLM Testing in Science

SciEval evaluates language models on their scientific research skills with diverse questioning.

2025-10-05T01:01:24+00:00 ― 5 min read

Human-Computer Interaction

Evaluating Guidance Systems in Visual Analytics

A practical approach to assess guidance systems for effective data analysis.

2025-10-04T23:02:54+00:00 ― 7 min read

Computer Vision and Pattern Recognition

Improving Document Classification for Real-World Applications

This article discusses the need for better document classification techniques.

2025-10-04T21:43:54+00:00 ― 6 min read

Optimization and Control

Advancing Airbrake Systems with AI Technology

Combining neural networks with traditional methods improves airbrake safety and performance.

2025-10-04T20:23:40+00:00 ― 5 min read

Computation and Language

Evaluating Machine Translation: Sentence vs. Paragraph Metrics

This article reviews how well current evaluation methods score paragraph-level translations.

2025-10-04T14:29:24+00:00 ― 5 min read

Computation and Language

Evaluating Language Models in Healthcare: A New Dataset

A new dataset aids in assessing language models for healthcare applications.

2025-10-03T22:25:36+00:00 ― 7 min read

Sound

Advancements in Speech Enhancement Using Spiking Neural Networks

A new method to improve speech quality using energy-efficient networks.

2025-10-03T21:44:15+00:00 ― 5 min read

Computation and Language

A New Dataset for Conversational Knowledge Generation

Introducing a dataset focused on factual question-answer conversations.

2025-10-03T00:26:18+00:00 ― 5 min read

Epidemiology

Evaluating One Health Surveillance Systems in Europe

A study assesses the effectiveness of One Health surveillance across eleven European systems.

2025-10-02T14:44:00+00:00 ― 5 min read

Computer Vision and Pattern Recognition

Evaluating Object Proposals in Vision-Language Tasks

A new method for evaluating object proposals more reliably in vision-language tasks.

2025-10-02T04:25:30+00:00 ― 6 min read

Computation and Language

Advancements in Multilingual Dialogue Evaluation Systems

Researchers use machine translation to enhance dialogue quality assessments in various languages.

2025-10-02T01:08:00+00:00 ― 6 min read

Computation and Language

Addressing Hallucination in Large Language Models

This article examines hallucination in AI language models and ongoing research.

2025-10-01T04:19:48+00:00 ― 6 min read

Databases

Challenges in Learned Query Optimization

Examining issues and solutions for learned query optimizers in database management.

2025-10-01T01:26:00+00:00 ― 5 min read

Computation and Language

New Dataset HAE-RAE Bench Evaluates Korean Language Models

HAE-RAE Bench focuses on assessing cultural knowledge in Korean language models.

2025-09-30T02:31:24+00:00 ― 6 min read

Computation and Language

Evaluating Reasoning in Vision-Language Models

This work assesses how well VLMs reason based on visual content.

2025-09-29T06:14:48+00:00 ― 6 min read

Computation and Language

Creating Effective Follow-up Questions

A study on generating meaningful follow-up questions to deepen understanding.

2025-09-28T14:03:06+00:00 ― 6 min read

Computation and Language

Advancing Expressive Speech Synthesis with New Dataset

A new dataset enhances speech synthesis by capturing emotional expression without relying on text.

2025-09-27T18:22:05+00:00 ― 5 min read

Human-Computer Interaction

A New Model for Understanding Emotions

A model integrating appraisal and reinforcement learning enhances emotional evaluation.

2025-09-27T18:10:12+00:00 ― 5 min read

Computation and Language

Classifying Revisions in Argumentative Essays

This study examines how to classify revisions for better argumentative writing.

2025-09-27T09:44:36+00:00 ― 5 min read

Computation and Language

Evaluating Language Models Across Diverse Languages

Exploring how LLMs can assess model outputs in multiple languages.

2025-09-27T00:00:00+00:00 ― 6 min read

Computation and Language

Enhancing Translation Quality with Contextual Evaluation

SLIDE improves machine translation assessments by incorporating broader context during evaluation.

2025-09-26T15:10:42+00:00 ― 5 min read

Robotics

New Method for Mobile Robot Navigation

This method enhances mobile robots' path planning in changing environments.

2025-09-26T02:56:00+00:00 ― 6 min read

Computation and Language

Evaluating Long-Form Question Answering in Language Models

This study compares performance across various language models in answering complex questions.

2025-09-26T02:24:24+00:00 ― 4 min read

Audio and Speech Processing

Evaluating an Automatic Sound Masker System in Urban Parks

A study examines the effectiveness of automated sound maskers in public spaces.

2025-09-25T18:35:40+00:00 ― 5 min read

Software Engineering

Directed Fuzzing: Targeted Software Testing for Bug Discovery

A focused approach to quickly identify software bugs through targeted testing.

2025-09-25T00:36:00+00:00 ― 5 min read

Computer Vision and Pattern Recognition

Improving Cancer Diagnosis with Causality Signals in Medical Images

A novel method enhances cancer diagnosis by integrating weak causality signals in medical imaging.

2025-09-24T21:02:42+00:00 ― 7 min read

Computation and Language

Advancements in Complex Text Style Transfer

New methods improve style transfer for text while maintaining meaning.

2025-09-24T17:53:06+00:00 ― 6 min read

Computation and Language

Tackling Hate Speech in the Algerian Dialect

A study on detecting hate speech in Algerian social media language.

2025-09-24T05:22:36+00:00 ― 7 min read

Computation and Language

Evaluating Healthcare Chatbots: Metrics for Success

This article discusses the evaluation metrics for effective healthcare chatbots.

2025-09-23T19:53:48+00:00 ― 6 min read

Artificial Intelligence

Evolving Deep Learning Models with Regularized Evolution

This study examines how deep learning models change during Neural Architecture Search.

2025-09-23T17:07:54+00:00 ― 7 min read

Logic in Computer Science

Enhancing Expression Evaluation in Lambda Calculus

Discover a new approach to improve evaluation efficiency in lambda calculus.

2025-09-23T16:28:24+00:00 ― 7 min read

Machine Learning

SALSA-CLRS: A New Benchmark for Algorithms

Introducing SALSA-CLRS to improve algorithm evaluation using sparse graphs.

2025-09-23T14:06:12+00:00 ― 6 min read

Computation and Language

A New Way to Evaluate Question Answering Systems

The SQuArE metric improves evaluation of QA systems by using multiple reference answers.

2025-09-23T13:58:18+00:00 ― 5 min read

Machine Learning

Improving Online Healthcare with Automatic Classification

A new system aims to connect users with medical professionals through automated classification.

2025-09-21T11:16:48+00:00 ― 5 min read

Computation and Language

The Future of Telemedicine: Summarizing Patient Interactions

Advancements in summarizing doctor-patient conversations improve telemedicine communication.

2025-09-21T04:18:06+00:00 ― 8 min read

Programming Languages

Proving Reliability in Simply Typed Lambda Calculus

Exploring proof techniques for evaluating functions in programming languages.

2025-09-21T03:46:30+00:00 ― 6 min read

Data Structures and Algorithms

The Importance of Individual Preference Stability in Clustering

Stability in clustering ensures groups are effective and meaningful.

2025-09-20T18:01:54+00:00 ― 6 min read

Machine Learning

Introducing GRANDE: A New Method for Tabular Data

GRANDE uses gradient descent to improve learning from tabular data.

2025-09-20T08:17:18+00:00 ― 5 min read

Computation and Language

Evaluating AI Models with Meta Features

A new method for assessing AI models through embeddings and meta features.

2025-09-19T10:10:06+00:00 ― 7 min read

Computation and Language

Automated Insights in Legal Text Analysis

A new method reveals patterns in legal decisions using automated text analysis.

2025-09-19T06:21:00+00:00 ― 8 min read