Zhongzhi Yu

Exploring how attention sinks impact language model performance and introducing a calibration technique.

2025-07-25T11:02:12+00:00 ― 5 min read

A new framework improves how large language models run on edge devices.

2025-07-25T10:54:18+00:00 ― 7 min read

KVMerger reduces memory use in language models while maintaining performance through effective state merging.

2025-07-15T02:19:06+00:00 ― 6 min read

A new system enhances the adaptability of large language models across devices.

2025-05-22T21:38:15+00:00 ― 5 min read