Latest Articles for Evaluation

Computation and Language Combining Approaches for Effective Table-based Question Answering

A new method improves accuracy in answering questions from tables by merging two systems.

2025-06-05T14:34:54+00:00 ― 7 min read

Computation and Language Effective Distractors: Enhancing Multiple-Choice Questions

A new method for generating engaging distractors in educational assessments.

2025-06-05T07:44:06+00:00 ― 5 min read

Human-Computer Interaction Improving Accessibility with Automated Alt-Text Generation

A new method aims to enhance alt-text for mobile app icons to aid visually impaired users.

2025-06-05T04:10:48+00:00 ― 5 min read

Artificial Intelligence Introducing DREAMS: A New Framework for EEG Data Analysis

DREAMS simplifies deep learning for EEG data, promoting transparency and ethical practices.

2025-06-04T22:39:00+00:00 ― 7 min read

Computation and Language Evaluating Faithfulness in AI Explanations

A look into assessing the trustworthiness of AI explanations through adversarial sensitivity.

2025-06-04T21:27:54+00:00 ― 7 min read

Computer Vision and Pattern Recognition New Models Transforming Multimodal AI

Recent models enhance AI's ability to generate and understand various media.

2025-06-04T08:49:30+00:00 ― 5 min read

Machine Learning ARLBench: A New Approach to Hyperparameter Optimization in Reinforcement Learning

ARLBench simplifies hyperparameter tuning for reinforcement learning with efficient benchmarking tools.

2025-06-04T08:02:06+00:00 ― 7 min read

Image and Video Processing Evaluating Segmentation Quality in Medical Imaging

A model to assess segmentation quality without ground-truth benchmarks.

2025-06-03T22:35:30+00:00 ― 8 min read

Robotics Enhancing Autonomous Vehicle Safety Through Sensor Data Classification

A method to manage conflicting sensor data in autonomous vehicles for improved safety.

2025-06-03T04:15:12+00:00 ― 5 min read

Audio and Speech Processing Advancements in Neural Codecs with ESPnet-Codec

ESPnet-Codec enhances training and evaluation of neural codecs for audio and speech.

2025-06-03T03:09:30+00:00 ― 7 min read

Databases Safe Data Sharing: A New Approach

A three-step method for secure data sharing while protecting privacy.

2025-06-02T09:33:24+00:00 ― 6 min read

Health Informatics Evaluating Large Language Models in Healthcare: Introducing ClinicBench

New benchmark addresses gaps in assessing LLMs for clinical decision-making.

2025-06-01T19:51:00+00:00 ― 6 min read

Programming Languages Making Debugging Easier with Visualization

Visualizing functional programs can simplify the debugging process for programmers.

2025-06-01T18:40:42+00:00 ― 7 min read

Human-Computer Interaction Generative AI in Design: A New Approach

Exploring how Generative AI is influencing interaction design processes.

2025-05-31T19:46:06+00:00 ― 5 min read

Computation and Language Analyzing Values in Texts

This study examines values in human and AI-generated texts for better understanding.

2025-05-30T00:09:27+00:00 ― 3 min read

Bioinformatics Understanding the Impact of Network Biology

NetworkCommons is a new tool for studying molecular interactions.

2025-05-28T22:38:52+00:00 ― 7 min read

Machine Learning Improving Language Models Through Self-Training

A new framework enhances reasoning in language models with quality rationales.

2025-05-26T10:14:42+00:00 ― 7 min read

Computer Vision and Pattern Recognition Evaluating AI's Understanding of Spatial Relations

A study compares AI models in grasping spatial relationships.

2025-05-24T20:50:06+00:00 ― 6 min read

Cryptography and Security Navigating the Challenges of Vision Large Language Models

Examining the vulnerabilities and defenses of new AI models.

2025-05-23T22:25:57+00:00 ― 7 min read

Computation and Language Assessing Toxic Language Detection in Dialects

Examining how well models detect toxic comments across various language dialects.

2025-05-21T20:24:27+00:00 ― 7 min read

Computer Vision and Pattern Recognition MTFusion: A New Approach to 3D Modeling

MTFusion combines images and text for advanced 3D model creation.

2025-05-20T18:44:33+00:00 ― 6 min read

Medical Education Rethinking Medical School Admissions

A look at holistic admissions and their impact on future doctors.

2025-05-20T15:45:18+00:00 ― 6 min read

Graphics Innovative Material Synthesis for Digital Visuals

A new method for creating realistic materials enhances flexibility for artists and designers.

2025-05-20T13:57:27+00:00 ― 6 min read

Computer Vision and Pattern Recognition Addressing Bias in Vision-Language Models

A new approach tackles biases in image-text models effectively.

2025-05-18T13:16:00+00:00 ― 7 min read

Software Engineering Evaluating Language Models for Coding Assistance

Assessing language models' effectiveness in coding tasks with new benchmarks.

2025-05-15T17:42:40+00:00 ― 5 min read

Computation and Language Addressing Hallucinations in Language Models

Understanding how knowledge graphs can reduce false information in AI responses.

2025-05-14T12:34:40+00:00 ― 6 min read

Computer Vision and Pattern Recognition New Method Improves Attribution Map Evaluation

A fresh approach to evaluating AI decision-making models using attribution maps.

2025-05-12T12:26:40+00:00 ― 7 min read

Artificial Intelligence The Growing Importance of Human-AI Collaboration

Examining how humans and AI can work together effectively.

2025-05-11T14:04:00+00:00 ― 9 min read

Computation and Language Evaluating with Large Language Models: Pros and Cons

An overview of how LLMs enhance evaluation processes while addressing key challenges.

2025-05-11T11:57:20+00:00 ― 7 min read

Artificial Intelligence Can LLMs Judge Creativity Fairly?

This study examines how well LLMs assess creativity in the Alternative Uses Test.

2025-05-11T10:16:00+00:00 ― 5 min read

Machine Learning STAR: A New Approach to AI Model Design

STAR automates AI model building for smarter and faster results.

2025-05-07T04:30:40+00:00 ― 7 min read

Computation and Language ER2Score: A New Way to Evaluate Radiology Reports

ER2Score improves the quality assessment of automated radiology reports.

2025-05-05T22:57:20+00:00 ― 5 min read

Computer Vision and Pattern Recognition PhyT2V: Making Video Creation Real

Transforming text prompts into realistic videos by incorporating physical laws.

2025-04-30T02:21:20+00:00 ― 6 min read

Computation and Language Evaluating Language Models: Consistency Matters

Are large language models reliable evaluators? Exploring consistency in their assessments.

2025-04-29T21:17:20+00:00 ― 7 min read

Computation and Language ChemTEB: A New Benchmark for Chemical Text Embeddings

ChemTEB helps improve chemical text processing by evaluating specialized models.

2025-04-29T20:26:40+00:00 ― 8 min read

Computer Vision and Pattern Recognition AgriBench: The Future of Farming Technology

AgriBench evaluates AI tools to support smarter farming decisions.

2025-04-29T14:57:20+00:00 ― 8 min read

Computation and Language Evaluating Large Language Models: A New Approach

Learn how SelfPrompt helps assess the strength of language models effectively.

2025-04-27T12:04:45+00:00 ― 3 min read

Artificial Intelligence Unmasking Sandbagging: The Hidden Risks of AI

Learn how sandbagging affects AI assessments and ways to detect it.

2025-04-25T09:07:00+00:00 ― 6 min read

Computation and Language Making Sinhala Text Easier to Read

Learn how researchers simplify Sinhala texts for better understanding.

2025-04-23T08:02:30+00:00 ― 7 min read

Software Engineering Revolutionizing Software Testing with TDD-Bench

TDD-Bench enhances automated test generation for developers using TDD methods.

2025-04-21T20:10:45+00:00 ― 7 min read