Cutting-edge science, explained simply
Zamba is a hybrid language model combining state-space and transformer architectures.
― 6 min read
Zyda, a 1.3-trillion-token dataset, improves language model training.
― 5 min read
Tree Attention improves the efficiency of long-sequence processing in machine learning models.
― 5 min read
A study explores ways to enhance data sharing in transformer model training.
― 4 min read
New compression techniques speed up training for large language models while maintaining accuracy.
― 5 min read
RedPajama datasets aim to enhance language model training through transparency and high-quality data.
― 5 min read