A new approach that uses Sinkhorn distance to make knowledge distillation more effective.
― 5 min read
Cutting-edge science explained simply
Effective data selection is key to improving language model performance.
― 5 min read