WanJuan-CC: A New Dataset for Language Models
A high-quality dataset for training language models from English web content.
Computation and Language
2025-09-02T21:19:30+00:00