A new benchmark aims to improve uncertainty assessment in language models.
― 5 min read
A new method improves model reasoning through structured programming traces.
― 8 min read
Examining how fine-tuning affects safety in language models across various tasks.
― 5 min read
A fresh approach to evaluating ML models using Item Response Theory for better insights.
― 5 min read
Strong baseline models enhance the evaluation of ML systems in healthcare.
― 6 min read
A look at confidence intervals in few-shot learning and their impact on model evaluation.
― 6 min read
Examining language models' comprehension and the accuracy of their outputs.
― 5 min read
Research highlights how influence functions can enhance PINN performance in physics problems.
― 6 min read
A look into effective dimension and its impact on model training.
― 6 min read
This paper evaluates how well language models explain scientific concepts.
― 4 min read
This article examines GAMs as a solution for predictive performance and interpretability.
― 7 min read
Examining how hard samples affect model performance and the reliability of test accuracy.
― 9 min read
This article examines how different layers affect LLM performance.
― 5 min read
Soft labels can improve machine learning model performance in uncertain data scenarios.
― 6 min read
RepairBench provides a benchmark for comparing AI models at fixing software bugs.
― 5 min read
This method enhances the reliability of language model confidence scores.
― 5 min read
Learn how the applicability domain affects predictive model accuracy in various fields.
― 9 min read
A method to estimate reliability of responses from large language models.
― 4 min read
A new method for testing language models using randomized text.
― 6 min read
A method to improve steering vector effectiveness in language models.
― 5 min read
Explore the impact of shortcut learning on language models and their real-world applications.
― 4 min read
This paper examines methods to compare generative models through embedding-based representations.
― 6 min read
A framework to balance pseudo-label learning in machine learning.
― 5 min read
The new H-POPE benchmark evaluates the accuracy of vision-language models.
― 5 min read
A study of different models' in-context learning abilities.
― 6 min read
A new framework identifies when multimodal models use inappropriate training data.
― 5 min read
This article discusses the need for transparency in language model benchmarks.
― 7 min read
An overview of the strengths and flaws in today's Vision-Language Models.
― 6 min read
A comprehensive study comparing methods for estimating confidence intervals in machine learning models.
― 11 min read
A look at similarity networks to improve fairness in machine learning.
― 6 min read
Learn strategies to improve model performance on imbalanced datasets.
― 7 min read
A guide to understanding AI model performance using the FEET framework.
― 7 min read
A framework for comparing forecasting models using principal components.
― 5 min read
RLInspect helps analyze and improve reinforcement learning models.
― 7 min read
Examining how AI models handle text and images together.
― 7 min read
Exploring how model size affects performance in OOD detection.
― 4 min read
A new method enhances detection of unfamiliar data in deep learning models.
― 7 min read
Are NLI tasks still relevant for testing large language models?
― 6 min read
The ICER framework tests the effectiveness of safety measures in text-to-image models.
― 7 min read
A study reveals accuracy issues in AI-generated long texts.
― 6 min read