Automated tools such as LLMs help verify claims efficiently.
― 6 min read
Cutting-edge science explained simply
This approach uses self-evaluation to guard against harmful outputs in language models.
― 2 min read
Studying how quantization affects performance in different languages.
― 5 min read
DCoT enhances language model performance through multiple reasoning paths.
― 7 min read
Study reveals how word meanings shift with context and time using word embeddings.
― 5 min read
A new approach to training reward models that aligns with human preferences.
― 5 min read
Adapting prompts to specific models improves performance in language tasks.
― 7 min read
Examining the role of semantic graphs in simplifying sentences with large language models.
― 6 min read
Research explores improving citation text generation using large language models.
― 5 min read
A look into methods and challenges of generating counterfactuals in NLP.
― 5 min read
A study classifies tweets from parents about childhood disorders.
― 5 min read
A study reveals a bias in AI evaluation tools that favors longer responses.
― 4 min read
Examining how users shape toxic language in conversations with large language models.
― 5 min read
A new method improves summarization with limited training data.
― 4 min read
This paper evaluates LLM performance in a Theory of Computing course.
― 5 min read
Exploring how confidence levels are attributed to LLMs and their implications.
― 7 min read
We test language models' reasoning skills using various games, revealing significant limitations.
― 8 min read
A new method simplifies science communication using collaborative language models.
― 5 min read
A new method enhances the efficiency of language models using shared attention weights.
― 5 min read
This study examines how LLMs change information through interactions.
― 5 min read
This paper studies how training influences the predictions of large language models.
― 6 min read
New methods enhance cache management for large language models.
― 5 min read
A detailed look at the MMAU benchmark for language models.
― 5 min read
This article examines how embedding initialization affects transformer model performance.
― 6 min read
This article analyzes the effectiveness and reliability of steering vectors in language models.
― 6 min read
Analyzing the storytelling capabilities of large language models compared to human authors.
― 4 min read
A new benchmark assesses language models on scientific coding challenges across multiple fields.
― 5 min read
Research reveals vulnerabilities in watermarking methods for AI-generated text.
― 12 min read
An examination of how LLMs perform on the Abstraction and Reasoning Corpus.
― 5 min read
An analysis of LLM performance on grid puzzles to assess reasoning abilities.
― 6 min read
This article examines multi-prompt decoding to enhance text generation quality.
― 6 min read
MIBench tests multimodal models' performance on multiple images.
― 6 min read
A new method enhances LLM efficiency in creating complex hardware designs.
― 5 min read
Analyzing the effectiveness of RAG and long-context LLMs in processing text.
― 6 min read
A study on language agents' behavior in a social deduction game.
― 4 min read
A new method to detect and fix factual errors in storytelling.
― 10 min read
A new method enhances math-solving skills in smaller language models using DPO and self-training.
― 6 min read
New methods for personalizing AI language models are essential for serving diverse users.
― 6 min read
A look into how language models handle arithmetic tasks and their learning process.
― 6 min read
A toolkit designed for better evaluation of human-bot interactions.
― 5 min read