Researchers analyze the predictability of language model performance as training compute scales.
― 6 min read
Cutting-edge science explained simply
A look at backdoor attacks and defenses in deep learning models.
― 6 min read
This paper assesses the efficiency of generated code from various models.
― 6 min read
This article presents a benchmark to assess large language models with complex tasks.
― 6 min read
This study assesses large language models' capabilities in complex planning scenarios.
― 6 min read
Research examines the use of VLMs to assess robot actions.
― 7 min read
Exploring the role of large language models in molecular science.
― 7 min read
Exploring methods to improve robot performance in unpredictable environments.
― 4 min read
AV-SUPERB evaluates audio and visual models across various tasks for better performance.
― 5 min read
New tools improve how systems retrieve information from long documents.
― 4 min read
This benchmark assesses the performance of medical language models in healthcare.
― 7 min read
A method to keep AI models updated based on real-world events.
― 6 min read
New benchmark tests MLLMs on social media tasks like misinformation and hate speech.
― 10 min read
RobotScript enhances how robots execute tasks from natural language.
― 7 min read
A fresh perspective on finding hidden threats in hardware design.
― 5 min read
New methods aim to better evaluate reasoning skills in AI language models.
― 6 min read
DyPyBench offers a diverse set of projects for dynamic analysis in Python.
― 6 min read
AI's capability to turn designs into code is reshaping web development.
― 8 min read
Study reveals significant data overlap affecting language model evaluations in code generation.
― 6 min read
Assessing LLM performance through a dedicated benchmark for bio-image analysis.
― 6 min read
A new method for assessing language processing tools shows promise for improvement.
― 5 min read
A method for assessing the transferability of pre-trained models for object detection.
― 4 min read
A resource designed to help robots learn everyday tasks effectively.
― 6 min read
A look at assessing the decision-making capabilities of large language models.
― 7 min read
A framework to improve NLP performance across various language dialects.
― 4 min read
A fresh benchmark uncovers strengths and weaknesses of VLLMs in multimodal tasks.
― 6 min read
Experts gather to discuss Monte Carlo simulations and GPU enhancements.
― 6 min read
New benchmarks reveal strengths and weaknesses of coding language models.
― 3 min read
Meerkat-7B sets a new standard for open-source medical language models.
― 6 min read
New methods improve video summarization using large datasets and advanced models.
― 7 min read
Research reveals challenges LLMs face in understanding long texts and proposes new benchmarks.
― 6 min read
Exploring the design and benefits of a PMU for RISC-V processors used in space.
― 5 min read
This study examines quality problems in prompts for code generation models.
― 4 min read
A new benchmark reveals gaps in visual understanding of large language models.
― 7 min read
A new benchmark improves how we assess LVLMs and their accuracy.
― 5 min read
The CHC competition showcased advances in solvers and their applications in program verification.
― 6 min read
This article explores how to improve the understanding of indirect answers.
― 5 min read
A study evaluating few-shot learning methods for Polish language classification.
― 4 min read
PatentGPT models are designed to address unique challenges in Intellectual Property.
― 4 min read
A study on the effectiveness of SAST tools for smart contracts.
― 8 min read