A new approach to make language models smaller and faster using 1-bit quantization.
― 7 min read
Cutting edge science explained simply
ProSparse improves activation sparsity in LLMs for better efficiency and performance.
― 7 min read
Examining grokking, double descent, and emergent abilities in deep learning models.
― 6 min read
The Yi model family showcases strong language and multimodal processing capabilities.
― 4 min read
New model improves image processing in multimodal systems.
― 7 min read
EREN enhances the accuracy of language models through effective editing techniques.
― 5 min read
New models enhance reasoning skills across various tasks, improving AI performance.
― 6 min read
UltraMedical collections improve medical language models and address data shortages.
― 6 min read
GUICourse aims to improve interaction with digital interfaces through targeted datasets for GUI agents.
― 4 min read
This article discusses new technology-driven approaches to improving predictions about chemical reactions.
― 8 min read
Examining how LLMs can add numbers without explicit steps.
― 6 min read
Research aims to develop language models with unique personalities for better human-like interactions.
― 8 min read
A new framework enhances evaluation of RAG systems in specialized domains.
― 8 min read
MiniCPM-V makes AI on mobile devices more efficient and better performing.
― 6 min read
A new approach to tokenization enhances analysis of ancient scripts.
― 6 min read
Exploring the efficiency and adaptability of language models through modular design.
― 6 min read
Exploring activation sparsity to improve language model efficiency.
― 5 min read
KBAlign helps machines learn faster and more effectively through self-questioning techniques.
― 5 min read
Discover how reward models are changing the way machines learn and perform.
― 7 min read
Explore how large language models are becoming more efficient and accessible.
― 7 min read
A new method combines autoregressive and diffusion models for better media generation.
― 7 min read