Latest Articles for Benchmark

High Energy Physics - Experiment Adapting High Energy Physics to New Computing Platforms

High energy physics researchers are optimizing software for diverse computing resources.

2025-06-09T06:30:39+00:00 ― 8 min read

Computation and Language A New Method for Selecting Pretraining Data

This approach simplifies choosing effective pretraining datasets for language models.

2025-06-07T09:13:12+00:00 ― 8 min read

Artificial Intelligence Clarifying AI Benchmarks through Cognitive Models

A new approach to assess AI benchmarks for cultural understanding.

2025-06-05T18:24:00+00:00 ― 8 min read

Artificial Intelligence Advancing Simulation Generation for Intelligent Agents

New method generates complete simulations in code from natural language inputs.

2025-06-04T18:34:06+00:00 ― 8 min read

Software Engineering Evaluating LLMs in Software Test Case Generation

This article assesses how well LLMs generate test cases for Java programs.

2025-06-04T15:40:18+00:00 ― 7 min read

Computation and Language Bypassing Toxicity Detection with ASCII Art

Research reveals how ASCII art can be used to evade online toxicity detection.

2025-06-04T05:39:54+00:00 ― 6 min read

Computation and Language The Need for Specialized Embedding Models in Finance

Exploring the performance gap of general-purpose models on finance tasks.

2025-06-04T01:03:24+00:00 ― 6 min read

Computation and Language Advancements in Arabic Language Models

Discover the latest improvements in Arabic language processing technology and its impact.

2025-06-01T07:29:12+00:00 ― 6 min read

Computer Vision and Pattern Recognition Mastering Small Object Editing in Digital Images

Learn how technology helps edit tiny details in images effectively.

2025-06-01T01:41:36+00:00 ― 5 min read

Computation and Language Evaluating AI Agents in CRM Systems

A new benchmark tests AI agents in realistic CRM tasks.

2025-05-31T13:19:00+00:00 ― 6 min read

Computation and Language Understanding Data Contamination in Language Models

Data contamination impacts the performance of language models and evaluation methods.

2025-05-29T09:48:09+00:00 ― 6 min read

Computation and Language Improving Language Model Benchmarks for Better Clarity

This article discusses the need for transparency in language model benchmarks.

2025-05-28T21:11:15+00:00 ― 7 min read

Computer Vision and Pattern Recognition Advancing Technology with 3D Audio-Visual Segmentation

Machines learn to connect sound and visuals in 3D spaces.

2025-05-25T21:37:47+00:00 ― 7 min read

Human-Computer Interaction DBenVis: Simplifying Database Benchmarking

Transforming complex benchmark data into clear visual insights.

2025-05-22T00:58:30+00:00 ― 7 min read

Machine Learning Introducing Milabench: A New Benchmark Tool for AI Research

Milabench provides tailored benchmarks to improve AI performance evaluations.

2025-05-20T12:26:06+00:00 ― 5 min read

Computation and Language New Tests Aim to Enhance AI Language Understanding for Ukrainian

Researchers create tools to improve AI's grasp of the Ukrainian language.

2025-05-15T10:57:20+00:00 ― 6 min read

Computation and Language Reassessing the Value of NLI Tasks in Evaluating LLMs

Are NLI tasks still relevant for testing large language models?

2025-05-14T07:05:20+00:00 ― 6 min read

Bioinformatics New Benchmarks in Genetic Research: A Breakthrough in Somatic Mutations

Researchers develop a new benchmark for studying low-frequency somatic mutations in genetics.

2025-05-12T16:35:04+00:00 ― 8 min read

Computation and Language Understanding Causal Inference with Structural Causal Models

A look into causal inference methods and the role of Structural Causal Models.

2025-05-11T07:34:46+00:00 ― 6 min read

Data Structures and Algorithms Navigating the Online Matching Problem

A look into the challenges of matching servers with requests amid uncertainty.

2025-05-09T15:37:20+00:00 ― 6 min read

Computer Vision and Pattern Recognition Evaluating Video Models with VidHal

VidHal benchmarks video models' ability to accurately interpret content.

2025-05-09T03:22:40+00:00 ― 6 min read

High Energy Astrophysical Phenomena Understanding Marshak Waves in Physics

A look into the behavior of Marshak waves under complex conditions.

2025-05-06T20:21:08+00:00 ― 6 min read

Machine Learning Enhancing Offline Reinforcement Learning Through Action Decomposition

This article explores improvements in offline reinforcement learning by breaking down actions.

2025-05-05T01:45:50+00:00 ― 14 min read

Computer Vision and Pattern Recognition Revolutionizing Counting in AI: LVLM-Count

A new method improves counting in images using LVLMs.

2025-04-27T02:38:30+00:00 ― 5 min read

Portfolio Management Smart Choices in Investment Strategies

Learn how investors can make better payoff choices.

2025-04-26T10:21:12+00:00 ― 6 min read

Databases Efficient Database Query Synthesis: Experiment Results

A study on creating efficient document database queries from examples.

2025-04-02T14:24:54+00:00 ― 6 min read

Computer Vision and Pattern Recognition Testing 3D Spatial Reasoning in AI Models

A new benchmark reveals gaps in AI 3D spatial reasoning skills.

2025-03-26T05:51:54+00:00 ― 6 min read

Computation and Language Teaching Llamas to Speak Dutch: A Digital Approach

Researchers adapt language models to improve Dutch fluency, showcasing new techniques.

2025-03-25T09:48:09+00:00 ― 5 min read

Computation and Language Transforming Chart Comprehension in AI

A new benchmark aims to enhance AI's understanding of scientific charts.

2025-03-22T23:13:12+00:00 ― 7 min read

Computation and Language Enhancing Conversational Question Answering: A Clearer Path Ahead

Discover how new methods improve question answering systems for better user experience.

2025-03-13T04:05:15+00:00 ― 6 min read

Computer Vision and Pattern Recognition Machines Learning to See and Read Together

Discover how machines are improving their understanding of images and text.

2025-03-10T15:54:00+00:00 ― 7 min read

Computation and Language Data Laundering: AI's Hidden Tricks

How AI models can fake their intelligence through manipulation.

2025-03-07T18:53:06+00:00 ― 7 min read

Computation and Language Empowering Low-Resource Languages: A New Approach

A new framework boosts language models for low-resource languages.

2025-03-04T22:40:21+00:00 ― 4 min read

Computer Vision and Pattern Recognition New CG-Bench Sets Standard for Video Understanding

CG-Bench helps machines analyze long videos better with clue-based questions.

2025-03-03T10:33:36+00:00 ― 6 min read

Computation and Language QUENCH: Rethinking Machine Reasoning Through Cultural Context

A new benchmark to test LLM reasoning across cultural backgrounds.

2025-03-01T20:50:33+00:00 ― 7 min read

Computation and Language AI Agents: Can They Replace Humans in Work?

Examining the capabilities and limitations of AI agents in task automation.

2025-02-19T12:52:12+00:00 ― 5 min read

Software Engineering Navigating Faults in Deep Learning Systems

A guide to understanding and addressing faults in deep learning models.

2025-02-09T21:45:00+00:00 ― 5 min read

Software Engineering Boosting Software Issue Resolution with Visual Data

Combining visual data with language models improves the resolution of software issues.

2025-01-29T08:05:06+00:00 ― 5 min read

Artificial Intelligence Advancing Document Understanding: New Benchmarks Unveiled

Explore how new benchmarks are transforming document interpretation by AI models.

2025-01-27T18:22:03+00:00 ― 5 min read