Introducing RePrompt for better language model performance through optimized prompts.
― 6 min read
Cutting edge science explained simply
A new benchmark evaluates how language models handle text changes.
― 6 min read
User traits shape how language models respond and how safe those responses are.
― 6 min read
A toolkit for assessing performance of retrieval-augmented models in specific domains.
― 9 min read
This study reveals how language models change behavior during training.
― 6 min read
This article examines ways to improve planning abilities in large language models.
― 7 min read
DetectBench evaluates LLMs on their ability to detect hidden evidence in reasoning tasks.
― 5 min read
Examining how neuron activation enhances arithmetic reasoning in large language models.
― 9 min read
A new model generates Czech poetry with improved rhyme and rhythm.
― 6 min read
A new benchmark evaluates reasoning skills in language models.
― 7 min read
A study on how language models generate persuasive rationales for argument evaluation.
― 5 min read
This study assesses the honesty of LLMs in three key areas.
― 5 min read
This article explores how adversaries impact teamwork among language models.
― 12 min read
A comprehensive study on language models’ performance across 10 Indic languages.
― 7 min read
A new method improves code repair for underused programming languages.
― 6 min read
Exploring how attention sinks impact language model performance and introducing a calibration technique.
― 5 min read
RankAdaptor optimizes fine-tuning for pruned AI models, enhancing performance efficiently.
― 8 min read
A study on PlagBench and its role in detecting plagiarism in LLM outputs.
― 4 min read
A new dataset assesses LLMs' ability to perform complex logical reasoning tasks.
― 6 min read
This research investigates how reasoning skills transfer across languages in language models.
― 8 min read
This article discusses how AI models learn from mistakes through self-correction.
― 6 min read
This study evaluates how well LLMs reason about cardinal directions.
― 5 min read
This study assesses how well LLMs handle decision-making in a game setting.
― 8 min read
Study reveals how user traits affect LLM responses and accuracy.
― 8 min read
CharED combines language models for improved performance without shared vocabularies.
― 4 min read
RAGBench introduces a comprehensive dataset for evaluating Retrieval-Augmented Generation systems.
― 6 min read
Exploring fairness issues in AI language models and their implications.
― 8 min read
Introducing a tool to enhance safety in language model interactions.
― 6 min read
This article explores the detection of errors in tools used by language models.
― 5 min read
This article analyzes repetitive structures in text generated by language models.
― 7 min read
A new benchmark assesses how well language models follow multiple instructions in sequence.
― 4 min read
The MalAlgoQA dataset evaluates large language models' reasoning in counterfactual scenarios.
― 5 min read
MathCAMPS offers a fresh way to assess mathematical reasoning in language models.
― 9 min read
This work improves number representation with digit embeddings for more accurate predictions.
― 7 min read
Exploring LLMs' effectiveness in decision-making through Dueling Bandits scenarios.
― 8 min read
A new benchmark for assessing large language models in hypothesis testing.
― 6 min read
CRAB enhances testing for language models in real-world environments.
― 6 min read
Fine-tuning large language models directly on smartphones while protecting user data.
― 6 min read
An overview of mechanistic interpretability in transformer-based language models.
― 7 min read
Exploring how reframing shifts opinions through community discussions.
― 4 min read