This article examines a new way to create algorithms with LLMs.
― 5 min read
Cutting-edge science explained simply
Learn how seven-valued logic enhances decision-making with multiple criteria.
― 6 min read
A challenge focusing on deep generative models for realistic medical image generation.
― 8 min read
A model assesses the readability of Wikipedia articles across 14 languages.
― 7 min read
A new approach using LLMs to create distractors with minimal human input.
― 3 min read
A new approach to evaluating biases in automated AI evaluation metrics.
― 6 min read
New methods aim to enhance reasoning capabilities in language models.
― 6 min read
New metrics shed light on the limitations of language models in representing reality.
― 7 min read
A new system for assessing language models using real-world data streams.
― 5 min read
Introducing IrokoBench to improve LLM evaluation in African languages.
― 7 min read
The ULS23 Challenge aims to improve tumor segmentation in CT scans for better cancer care.
― 5 min read
A fresh approach improves detection of fake images created by AI.
― 6 min read
A new benchmark aims to assess MLLMs in video understanding across multiple topics.
― 6 min read
This study presents a new method for identifying key training images in AI-generated visuals.
― 7 min read
Exploring the significance of unlearning methods in modern machine learning.
― 5 min read
Examining the key issues in offline MARL and proposing standardized solutions.
― 6 min read
Learn about CGP, its function, advantages, applications, and challenges in programming.
― 5 min read
A new dataset improves coherence in image-text sequences for effective content creation.
― 5 min read
SciEx reveals strengths and challenges of LLMs in scientific evaluation.
― 6 min read
SEACrowd aims to improve AI representation for Southeast Asian languages and cultures.
― 7 min read
A study evaluates language models on handling multiple tasks simultaneously.
― 7 min read
A new benchmark tests LLMs' abilities with structured data formats.
― 6 min read
VCEval offers an automated way to assess online course effectiveness.
― 5 min read
A new benchmark targets compositionality in video understanding and language models.
― 6 min read
A new method enhances testing for language models using real user data.
― 5 min read
The Nemotron-4 340B family delivers powerful models for diverse applications and synthetic data generation.
― 7 min read
Evaluating how language models handle cultural cues in real tasks.
― 7 min read
VideoVista offers a comprehensive evaluation for video question-answering models.
― 5 min read
This article explores methods to enhance the reliability of research artifacts in computing.
― 7 min read
GLM-4 models show improved capabilities in language understanding and generation.
― 8 min read
A study on using LLMs to judge other LLMs and its implications.
― 7 min read
A study on how language models generate persuasive rationales for argument evaluation.
― 5 min read
Two new models aim to improve technology access for Galician speakers.
― 5 min read
Examining the difficulties of translating metaphorical language in machine translation.
― 6 min read
DF40 offers a comprehensive approach to improving deepfake detection methods.
― 6 min read
This study assesses the honesty of LLMs in three key areas.
― 5 min read
Discover how companies enhance their question-answering systems for better user support.
― 4 min read
A study on how AI comprehends algorithms and their implications.
― 6 min read
A new metric improves evaluation of text classification models across different domains.
― 7 min read
Data contamination significantly affects the evaluation of large language models.
― 5 min read