A new method for assessing how well LLMs align with human values.
― 6 min read
A new tool to evaluate biases in large vision-language models.
― 6 min read
A study evaluates how varied and creative machine-generated poetry is compared to human-written poetry.
― 6 min read
A new method improves how we assess counter-narratives to hate speech.
― 6 min read
InternLM-Law enhances responses to diverse Chinese legal questions with advanced training.
― 7 min read
Exploring how user profiles improve personalization in language models.
― 6 min read
Research shows models struggle with step dependencies in cooking recipes.
― 5 min read
This paper presents a method to assess language models across various prompts.
― 6 min read
New method addresses regional differences in gender bias evaluation.
― 6 min read
The M2Lingual dataset improves instruction-following capabilities across many languages.
― 5 min read
This article presents a new method for effectively assessing text-to-image models.
― 6 min read
This study benchmarks language model performance on Italian INVALSI tests.
― 7 min read
RAGBench introduces a comprehensive dataset for evaluating Retrieval-Augmented Generation systems.
― 6 min read
Dysca introduces a new way to assess LVLM performance using synthetic data.
― 6 min read
A look at modern engineering design methods for improving efficiency and performance.
― 7 min read
A new approach improves causal event extraction using human-centered evaluation.
― 5 min read
Assessing how deferring to human experts affects prediction accuracy in ML models.
― 8 min read
Introducing a new method for finding better solutions to complex engineering and robotics tasks.
― 6 min read
A study assessing the quality of datasets for identifying hate speech online.
― 7 min read
A new method measures how language models adapt their beliefs with new evidence.
― 9 min read
New benchmark improves evaluation of multimodal models by minimizing biases.
― 6 min read
GraphArena evaluates LLM performance on graph problems using real-world data.
― 6 min read
Explore a fair method for sharing credit in group projects.
― 6 min read
A new benchmark for assessing large language models in hypothesis testing.
― 6 min read
CRAB enhances testing for language models in real-world environments.
― 6 min read
This article examines the impact of temporal changes on information retrieval system evaluations.
― 5 min read
Introducing FairMedFM to evaluate the fairness of foundation models in healthcare.
― 6 min read
A new dataset enhances Arabic language model performance and supports more effective communication.
― 6 min read
A study of how quantization affects model performance across different languages.
― 5 min read
Exploring machine learning models and new datasets for improved security.
― 7 min read
A new benchmark addresses challenges in code retrieval for developers.
― 6 min read
New methods enhance the trustworthiness of text generated by language models.
― 4 min read
A tool to identify misleading answers from large language models.
― 6 min read
Discover the importance and challenges of assessing LLM performance effectively.
― 5 min read
A look into foundation model leaderboards and their evaluation issues.
― 6 min read
A study reveals that AI evaluation tools are biased toward longer responses.
― 4 min read
A new approach enhances the accuracy of language model evaluations.
― 7 min read
A new method for selecting diverse languages in natural language processing research.
― 6 min read
A new benchmark assesses the temporal reasoning abilities of large language models.
― 5 min read
An innovative approach to creating effective acquisition functions for Bayesian optimization.
― 6 min read