A new benchmark aims to improve uncertainty assessment in language models.
― 5 min read
A new method improves model reasoning through structured programming traces.
― 8 min read
Examining how fine-tuning affects safety in language models across various tasks.
― 5 min read
A fresh approach to evaluating ML models using Item Response Theory for better insights.
― 5 min read
Strong baseline models enhance the evaluation of ML systems in healthcare.
― 6 min read
A look at confidence intervals in few-shot learning and their impact on model evaluation.
― 6 min read
Examining language models' comprehension and the accuracy of their outputs.
― 5 min read
Research highlights how influence functions can enhance PINN performance in physics problems.
― 6 min read
A look into effective dimension and its impact on model training.
― 6 min read
This paper evaluates how well language models explain scientific concepts.
― 4 min read
This article examines GAMs as a solution for predictive performance and interpretability.
― 7 min read
Examining how hard samples affect model performance and the reliability of test accuracy.
― 9 min read
This article examines how different layers affect LLM performance.
― 5 min read
Soft labels can improve machine learning model performance in uncertain data scenarios.
― 6 min read
RepairBench provides a benchmark for comparing AI models at fixing software bugs.
― 5 min read
This method enhances the reliability of language model confidence scores.
― 5 min read
Learn how the applicability domain affects predictive model accuracy in various fields.
― 9 min read
A method to estimate reliability of responses from large language models.
― 4 min read
A new method for testing language models using randomized text.
― 6 min read
A method to improve steering vector effectiveness in language models.
― 5 min read
Explore the impact of shortcut learning on language models and their real-world applications.
― 4 min read
This paper examines methods to compare generative models through embedding-based representations.
― 6 min read
A framework to balance pseudo-label learning in machine learning.
― 5 min read
The new H-POPE benchmark evaluates the accuracy of vision-language models.
― 5 min read
A study of different models' in-context learning abilities.
― 6 min read
A new framework identifies when multimodal models use inappropriate training data.
― 5 min read
This article discusses the need for transparency in language model benchmarks.
― 7 min read
An overview of the strengths and flaws in today's Vision-Language Models.
― 6 min read
A comprehensive study comparing methods for estimating confidence intervals in machine learning models.
― 11 min read
A look at similarity networks to improve fairness in machine learning.
― 6 min read
Learn strategies to improve model performance on imbalanced datasets.
― 7 min read
A guide to understanding AI model performance using the FEET framework.
― 7 min read
A framework for comparing forecasting models using principal components.
― 5 min read
RLInspect helps analyze and improve reinforcement learning models.
― 7 min read
Examining how AI models handle text and images together.
― 7 min read
Exploring how model size affects performance in OOD detection.
― 4 min read
A new method enhances detection of unfamiliar data in deep learning models.
― 7 min read
Are NLI tasks still relevant for testing large language models?
― 6 min read
The ICER framework tests the effectiveness of safety measures in text-to-image models.
― 7 min read
A study reveals accuracy issues in AI-generated long texts.
― 6 min read