Cutting edge science explained simply
Adam-mini reduces memory usage for training large language models while maintaining performance.
― 6 min read
A new approach to solving large-scale linear programming problems efficiently.
― 4 min read
MoFO helps large language models retain pre-trained knowledge during fine-tuning without degrading performance.
― 5 min read
Learn how PDQP-Net speeds up solving convex quadratic programs.
― 6 min read