This benchmark assesses the performance of medical language models in healthcare.
― 7 min read
Cutting-edge science explained simply
A method to keep AI models updated based on real-world events.
― 6 min read
New benchmark tests MLLMs on social media tasks like misinformation and hate speech.
― 10 min read
RobotScript enhances how robots execute tasks from natural language.
― 7 min read
A fresh perspective on finding hidden threats in hardware design.
― 5 min read
New methods aim to better evaluate reasoning skills in AI language models.
― 6 min read
DyPyBench offers a diverse set of projects for dynamic analysis in Python.
― 6 min read
AI's capability to turn designs into code is reshaping web development.
― 8 min read
Study reveals significant data overlap affecting language model evaluations in code generation.
― 6 min read
Assessing LLM performance through a dedicated benchmark for bio-image analysis.
― 6 min read
A new method for assessing language processing tools shows promise for improvement.
― 5 min read
A method for assessing the transferability of pre-trained models for object detection.
― 4 min read
A resource designed to help robots learn everyday tasks effectively.
― 6 min read
A look at assessing the decision-making capabilities of large language models.
― 7 min read
A framework to improve NLP performance across various language dialects.
― 4 min read
A fresh benchmark uncovers strengths and weaknesses of VLLMs in multimodal tasks.
― 6 min read
Experts gather to discuss Monte Carlo simulations and GPU enhancements.
― 6 min read
New benchmarks reveal strengths and weaknesses of coding language models.
― 3 min read
Meerkat-7B sets a new standard for open-source medical language models.
― 6 min read
New methods improve video summarization using large datasets and advanced models.
― 7 min read
Research reveals challenges LLMs face in understanding long texts and proposes new benchmarks.
― 6 min read
Exploring the design and benefits of a PMU for RISC-V processors used in space.
― 5 min read
This study examines quality problems in prompts for code generation models.
― 4 min read
A new benchmark reveals gaps in visual understanding of large language models.
― 7 min read
A new benchmark improves how we assess LVLMs and their accuracy.
― 5 min read
The CHC competition showcased advances in solvers and their applications in program verification.
― 6 min read
This article explores how to improve the understanding of indirect answers.
― 5 min read
A study evaluating few-shot learning methods for Polish language classification.
― 4 min read
PatentGPT models are designed to address unique challenges in Intellectual Property.
― 4 min read
A study on the effectiveness of SAST tools for smart contracts.
― 8 min read
New benchmarks reveal challenges for MLLMs in real-world tasks with long contexts.
― 7 min read
This article explores the bias in code generation models across different languages.
― 8 min read
An overview of code hallucinations in LLMs and their impact on software development.
― 6 min read
Wake Vision enhances person detection for TinyML with a vast dataset.
― 7 min read
This paper discusses the need for explainability in AI text generation models.
― 6 min read
New benchmark assesses toxicity in large language models across various languages.
― 7 min read
Learn how second-order stochastic dominance can enhance your investment strategy.
― 6 min read
A new benchmark assesses LLMs' abilities in mathematical modeling processes.
― 5 min read
Exploring how GPUs enhance the efficiency of Differential Evolution algorithms.
― 5 min read
New benchmark aims to improve AI understanding of text and images.
― 7 min read