Latest Articles for Model Evaluation

Cryptography and Security Evaluating Large Language Models in Cybersecurity

A new benchmark for assessing LLMs in cybersecurity tasks.

2025-08-04T08:33:48+00:00 ― 7 min read

Machine Learning Revisiting Disentanglement in Machine Learning Models

This paper proposes new methods to evaluate information fragmentation in machine learning.

2025-08-04T01:50:54+00:00 ― 7 min read

Machine Learning A New Method for Interpretable AI Models

This paper introduces an approach for creating easy-to-understand AI classifiers.

2025-08-02T21:24:30+00:00 ― 4 min read

Machine Learning Evaluating Self-Supervised Learning in Clustering Tasks

This study examines how well pretrained models cluster unseen data.

2025-08-02T13:14:42+00:00 ― 6 min read

Machine Learning Advancing Machine Unlearning for Contrastive Learning

Introducing new methods to improve forgetting processes in contrastive learning models.

2025-08-02T03:06:24+00:00 ― 6 min read

Machine Learning Addressing Class Imbalance with Support Vector Machines

An overview of SVM techniques for handling class imbalance in machine learning.

2025-08-02T01:00:00+00:00 ― 6 min read

Machine Learning Overcoming Out-of-Distribution Challenges in Machine Learning

Tackling the issues of OOD generalization and feature contamination in AI models.

2025-08-02T00:12:36+00:00 ― 7 min read

Machine Learning Advancements in Sparse Autoencoders for Language Models

This article explores improvements in sparse autoencoders and their impact on language understanding.

2025-08-01T09:19:54+00:00 ― 7 min read

Computer Vision and Pattern Recognition Evaluating Lightweight Backbones for Image Classification

A study on the effectiveness of various lightweight models in image classification.

2025-07-31T17:08:12+00:00 ― 7 min read

Machine Learning New Framework for Assessing Data Poisoning Risks in Machine Learning

Introducing a method to evaluate model resilience against data poisoning attacks.

2025-07-31T07:39:24+00:00 ― 6 min read

Machine Learning Evaluating Java Programming Skills of LLMs

A new benchmark to assess LLMs for Java programming tasks.

2025-07-31T06:52:00+00:00 ― 6 min read

Machine Learning Evaluating Generalization in Machine Learning Models

This article explores strategies for improving model generalization and understanding gradient behavior.

2025-07-30T16:30:54+00:00 ― 7 min read

Computation and Language Evaluating Safety in Multimodal Language Models

A toolkit for assessing the safety of advanced language models.

2025-07-30T14:40:18+00:00 ― 5 min read

Computation and Language Comparing Fine-Tuned Models and Generative AI in Text Classification

This article analyzes the performance of fine-tuned models versus generative AI in text classification tasks.

2025-07-30T02:17:42+00:00 ― 4 min read

Computer Vision and Pattern Recognition Assessing the Robustness of Visual State Space Models

This article examines how Visual State Space Models handle visual challenges.

2025-07-29T11:48:42+00:00 ― 6 min read

Computer Vision and Pattern Recognition Evaluating Multi-Image Reasoning in AI Models

A new dataset assesses how LLMs reason over multiple images.

2025-07-29T07:35:54+00:00 ― 5 min read

Computation and Language Evaluating LLMs: Insights into Human Decision-Making

Investigating how LLM predictions align with human choices using statistical modeling.

2025-07-29T05:05:48+00:00 ― 9 min read

Machine Learning Evaluating Reasoning Shortcuts in AI Models

A new benchmark suite helps assess reasoning shortcuts in artificial intelligence.

2025-07-28T23:57:42+00:00 ― 6 min read

Artificial Intelligence Testing Language Models with Multiple Problems

A study evaluates language models on handling multiple tasks simultaneously.

2025-07-28T14:05:12+00:00 ― 7 min read

Computation and Language Evaluating Reasoning Skills in Large Language Models

A study highlights gaps in the reasoning abilities of LLMs on math problem solving.

2025-07-28T03:56:54+00:00 ― 6 min read

Artificial Intelligence New Approach to Evaluate Multilingual Models

A fresh method for testing language model safety and multilingual skills.

2025-07-28T02:37:54+00:00 ― 7 min read

Machine Learning Evaluating Feature Selection Methods in Noisy Data

Methods for identifying important features in low-quality data environments.

2025-07-28T00:47:18+00:00 ― 6 min read

Computation and Language Evaluating Unlearning in Language Models

New methods reveal challenges in unlearning knowledge from language models.

2025-07-27T17:24:54+00:00 ― 6 min read

Machine Learning Examining Decision Boundaries in Language Models

A study on the decision-making processes of large language models.

2025-07-27T12:24:42+00:00 ― 4 min read

Machine Learning The Importance of Model Calibration in Machine Learning

A look at how calibration impacts model predictions and reliability.

2025-07-27T09:09:04+00:00 ― 9 min read

Computation and Language The Impact of Long-Context Language Models

Long-context language models streamline complex tasks and improve interaction with AI.

2025-07-27T08:59:18+00:00 ― 7 min read

Computation and Language Assessing Knowledge in Language Models Without Generated Responses

A method to evaluate model knowledge through internal processing.

2025-07-27T05:26:00+00:00 ― 7 min read

Computation and Language Addressing Data Contamination in Language Models

Examining the impact of data contamination on language model performance and evaluation.

2025-07-26T14:25:24+00:00 ― 6 min read

Machine Learning Text-to-Image Models Struggle with Numerical Tasks

This study reveals the limits of text-to-image models in handling numbers.

2025-07-26T12:03:12+00:00 ― 5 min read

Computation and Language Evaluating Cross-Domain Text Classification with Depth

A new metric improves evaluation of text classification models across different domains.

2025-07-26T10:44:12+00:00 ― 7 min read

Computer Vision and Pattern Recognition Evaluating Image Processing Models for Multi-Object Understanding

A deep dive into how well vision models recognize and represent multiple objects.

2025-07-26T06:15:12+00:00 ― 5 min read

Cryptography and Security Detecting Adversarial Inputs in Deep Learning Models

A study on the effectiveness of OOD detectors against adversarial examples.

2025-07-25T18:40:24+00:00 ― 8 min read

Computation and Language Evaluating In-Context Learning in Language Models

New research highlights the in-context learning abilities of large language models.

2025-07-25T16:18:12+00:00 ― 6 min read

Information Retrieval Evaluating Retrieval Models with Improved Datasets

A study highlighting the importance of comprehensive annotations for retrieval evaluation.

2025-07-25T06:02:00+00:00 ― 6 min read

Computer Vision and Pattern Recognition Addressing Spurious Bias in Multimodal Models

A new benchmark highlights the risks of spurious bias in multimodal language models.

2025-07-25T01:25:30+00:00 ― 7 min read

Machine Learning Reevaluating Feedback Methods in Image Generation

Investigating fine-grained feedback for text-to-image models and its practical implications.

2025-07-24T23:34:54+00:00 ― 6 min read

Computer Vision and Pattern Recognition Evaluating Hallucinations in Video-Language Models

A new benchmark assesses how well video-language models handle hallucinations.

2025-07-24T17:47:18+00:00 ― 6 min read

Computation and Language APIGen: A Tool for Function-Calling Datasets

APIGen generates diverse, high-quality datasets for function-calling agents.

2025-07-24T00:24:30+00:00 ― 5 min read

Computation and Language Addressing Benchmark Contamination in Language Models

A new method to detect biases in language model training.

2025-07-23T22:49:42+00:00 ― 6 min read

Computer Vision and Pattern Recognition Introducing the SAVE Model for Audio-Visual Segmentation

The SAVE model improves the efficiency and precision of audio-visual segmentation.

2025-07-23T16:07:20+00:00 ― 6 min read