Latest Articles for Evaluation

Computer Vision and Pattern Recognition Improving Point Tracking in Videos

A new method enhances point tracking accuracy and efficiency in video processing.

2025-07-08T17:35:06+00:00 ― 5 min read

Software Engineering Enhancing Action Categorization for Developers

A tool improves action categorization, aiding developer efficiency in workflows.

2025-07-08T13:38:06+00:00 ― 4 min read

Optimization and Control Advancements in Stress Minimization Techniques

A new method improves structural design by minimizing stress.

2025-07-08T04:02:17+00:00 ― 5 min read

Computation and Language Tackling Hallucinations in Language Models

A new benchmark evaluates LLMs for factual accuracy.

2025-07-07T18:08:54+00:00 ― 6 min read

Computation and Language New Method for Evaluating Title Sets in Document Collections

A novel approach for faster title set evaluation without human references.

2025-07-07T16:26:12+00:00 ― 7 min read

Computation and Language Evaluating Persona Agents: A New Framework

A fresh approach to assess persona agents using language models.

2025-07-07T06:17:54+00:00 ― 6 min read

Machine Learning Addressing Fairness in Machine Learning Models

Evaluating machine learning models to ensure fairness across diverse populations.

2025-07-07T03:30:16+00:00 ― 5 min read

Computation and Language Dallah: A New Tool for Arabic Dialects

Dallah supports Arabic dialects, improving communication in text and images.

2025-07-07T01:33:30+00:00 ― 6 min read

Computation and Language Evaluating Language Models: A New Toolkit

A toolkit designed for better evaluation of human-bot interactions.

2025-07-06T18:11:06+00:00 ― 5 min read

Information Retrieval Evaluating Information Retrieval Systems with AI Annotations

Using AI-generated relevance annotations to evaluate information retrieval systems efficiently.

2025-07-06T13:19:08+00:00 ― 7 min read

Machine Learning New Benchmark Method for Evaluating Reinforcement Learning Algorithms

A novel approach enhances comparisons of reinforcement learning algorithms across diverse environments.

2025-07-06T13:03:00+00:00 ― 7 min read

Sound Assessing Music Understanding with MuChoMusic Benchmark

A new benchmark to evaluate models analyzing music and language.

2025-07-06T05:29:45+00:00 ― 6 min read

Artificial Intelligence Evaluating Large Language Models: A Comprehensive Approach

Explore different frameworks and methods for evaluating large language models effectively.

2025-07-06T02:07:18+00:00 ― 6 min read

Machine Learning Evaluating Interpretability Methods for AI Decisions

A new approach to assess the reliability of methods explaining AI decision-making.

2025-07-06T01:51:30+00:00 ― 7 min read

Multimedia AxiomVision: Transforming Video Analytics for Dynamic Environments

AxiomVision offers a new approach to video analysis, enhancing performance in changing conditions.

2025-07-05T14:40:00+00:00 ― 6 min read

Machine Learning Evaluating Explainable AI: The Rise of BEExAI

A new tool for assessing explainability methods in AI systems.

2025-07-05T10:43:00+00:00 ― 8 min read

Machine Learning Standardizing Backdoor Learning Evaluation: BackdoorBench

BackdoorBench offers a unified approach to assess backdoor learning methods in deep neural networks.

2025-07-05T09:47:42+00:00 ― 7 min read

Computation and Language Evaluating Zero-Shot Capabilities of Multimodal LLMs

An assessment of multimodal LLMs' zero-shot performance across various tasks.

2025-07-05T08:36:36+00:00 ― 5 min read

Human-Computer Interaction AI-Driven Tool Streamlines Questionnaire Translation

A new tool improves the process of translating questionnaires across languages.

2025-07-04T18:07:36+00:00 ― 4 min read

Computation and Language Evaluating Logical Reasoning in Large Language Models

A study assesses the reasoning skills of large language models using complex questions.

2025-07-04T17:20:12+00:00 ― 5 min read

Applications VIEWS Prediction Challenge: Forecasting Conflict Fatalities

A challenge to predict deaths in armed conflicts with a focus on uncertainty.

2025-07-04T02:04:24+00:00 ― 7 min read

Materials Science Harnessing LLMs for Structured Data in Materials Science

Discover how LLMs can streamline data extraction in materials science.

2025-07-04T01:40:18+00:00 ― 7 min read

Human-Computer Interaction Integrating LLMs into Knowledge Engineering Practices

Exploring the role and challenges of LLMs in knowledge engineering.

2025-07-03T12:45:54+00:00 ― 7 min read

Computation and Language Improving AI Responses with Retrieval-Augmented Generation

A new framework enhances language models by integrating external data for better accuracy.

2025-07-02T02:24:00+00:00 ― 5 min read

Cryptography and Security Comidds: A New Resource for Intrusion Detection Datasets

Comidds offers updated information on datasets for intrusion detection research.

2025-07-02T01:44:30+00:00 ― 5 min read

Information Retrieval Workshop on Large Language Models in Information Retrieval

Researchers discuss the impact of LLMs on evaluating information retrieval systems.

2025-06-30T04:26:54+00:00 ― 5 min read

Information Retrieval The Role of Coding Assistants in Modern Development

Learn how coding assistants help developers enhance coding efficiency.

2025-06-30T03:39:30+00:00 ― 5 min read

Computation and Language Improving Evaluation Methods for Machine Reading Comprehension

New methods offer better evaluation of language understanding in models.

2025-06-29T22:47:12+00:00 ― 6 min read

Computation and Language Improving Model Fusion with ProFuser

A new method to combine language models more effectively.

2025-06-29T22:23:30+00:00 ― 6 min read

Image and Video Processing Advancements in Early Detection of Oral Cancer

Utilizing deep learning to improve early detection of oral squamous cell carcinoma.

2025-06-29T18:01:45+00:00 ― 6 min read

Software Engineering Addressing Analysability in Hybrid Quantum Software

This research focuses on improving the quality of hybrid quantum software through analysability.

2025-06-28T16:32:21+00:00 ― 6 min read

Computer Vision and Pattern Recognition Evaluating MLLMs with MathScape

MathScape enhances evaluation of MLLMs with visual and textual math problems.

2025-06-28T00:02:42+00:00 ― 5 min read

Computation and Language Inductive Learning with Large Language Models

Exploring the use of LLMs in inductive logic programming.

2025-06-27T17:43:30+00:00 ― 6 min read

Computation and Language Generating Realistic Online Discussions with Synthetic Data

A structured method to create synthetic conversations using language models.

2025-06-27T13:46:30+00:00 ― 6 min read

Computation and Language New Benchmark Evaluates Legal Knowledge in Arabic Language Models

ArabLegalEval assesses LLMs' performance in handling Arabic legal information.

2025-06-27T05:52:30+00:00 ― 6 min read

Information Retrieval VERA: A Framework for Evaluating RAG Systems

Discover how VERA improves RAG system evaluation accuracy and efficiency.

2025-06-27T04:33:30+00:00 ― 10 min read

Machine Learning Evaluating Large Language Models for Real-World Use

A new approach to assess LLMs with diverse evaluation sets.

2025-06-26T22:53:48+00:00 ― 6 min read

Computation and Language Addressing Format Bias in Language Models

This article examines how format bias affects language model performance and suggests improvement strategies.

2025-06-26T20:23:42+00:00 ― 6 min read

Information Retrieval Hindi-BEIR: A Benchmark for Hindi Information Retrieval

Hindi-BEIR aims to improve information retrieval systems for Hindi content.

2025-06-26T06:50:00+00:00 ― 5 min read

Computation and Language Aligning Language Models with Online Communities

Exploring methods to align LLMs with online groups for better insights.

2025-06-26T04:59:24+00:00 ― 6 min read