Introducing RePrompt for better language model performance through optimized prompts.
― 6 min read
Cutting edge science explained simply
A new benchmark evaluates how language models handle text changes.
― 6 min read
User traits shape how language models respond and how safe those responses are.
― 6 min read
A toolkit for assessing performance of retrieval-augmented models in specific domains.
― 9 min read
This study reveals how language models change behavior during training.
― 6 min read
This article examines ways to improve planning abilities in large language models.
― 7 min read
DetectBench evaluates LLMs on their ability to detect hidden evidence in reasoning tasks.
― 5 min read
Examining how neuron activation enhances arithmetic reasoning in large language models.
― 9 min read
A new model generates Czech poetry with improved rhyme and rhythm.
― 6 min read
A new benchmark evaluates reasoning skills in language models.
― 7 min read
A study on how language models generate persuasive rationales for argument evaluation.
― 5 min read
This study assesses the honesty of LLMs in three key areas.
― 5 min read
This article explores how adversaries impact teamwork among language models.
― 12 min read
A comprehensive study on language models’ performance across 10 Indic languages.
― 7 min read
A new method improves code repair for underused programming languages.
― 6 min read
Exploring how attention sinks impact language model performance and introducing a calibration technique.
― 5 min read
RankAdaptor optimizes fine-tuning for pruned AI models, enhancing performance efficiently.
― 8 min read
A study on PlagBench and its role in detecting plagiarism in LLM outputs.
― 4 min read
A new dataset assesses LLMs' ability to perform complex logical reasoning tasks.
― 6 min read
This research investigates how reasoning skills transfer across languages in language models.
― 8 min read
This article discusses how AI models learn from mistakes through self-correction.
― 6 min read
This study evaluates how well LLMs reason about cardinal directions.
― 5 min read
This study assesses how well LLMs handle decision-making in a game setting.
― 8 min read
Study reveals how user traits affect LLM responses and accuracy.
― 8 min read
CharED combines language models for improved performance without shared vocabularies.
― 4 min read
RAGBench introduces a comprehensive dataset for evaluating Retrieval-Augmented Generation systems.
― 6 min read
Exploring fairness issues in AI language models and their implications.
― 8 min read
Introducing a tool to enhance safety in language model interactions.
― 6 min read
This article explores the detection of errors in tools used by language models.
― 5 min read
This article analyzes repetitive structures in text generated by language models.
― 7 min read
A new benchmark assesses how well language models follow multiple instructions in sequence.
― 4 min read
The MalAlgoQA dataset evaluates large language models' reasoning in counterfactual scenarios.
― 5 min read
MathCAMPS offers a fresh way to assess mathematical reasoning in language models.
― 9 min read
This work improves number representation with digit embeddings for more accurate predictions.
― 7 min read
Exploring LLMs' effectiveness in decision-making through Dueling Bandits scenarios.
― 8 min read
A new benchmark for assessing large language models in hypothesis testing.
― 6 min read
CRAB enhances testing for language models in real-world environments.
― 6 min read
Fine-tuning large language models directly on smartphones while protecting user data.
― 6 min read
An overview of mechanistic interpretability in transformer-based language models.
― 7 min read
Exploring how reframing shifts opinions through community discussions.
― 4 min read