svanlin-tencent committed
Commit 85ae140
1 Parent(s): 6df742a

change rdm

Files changed (1): README.md +27 -10

README.md CHANGED
@@ -1,21 +1,38 @@
- ## Model Introduction
-
- With the rapid development of artificial intelligence technology, large language models (LLMs) have made significant progress in fields such as natural language processing, computer vision, and scientific tasks. However, as models grow larger, optimizing resource consumption while maintaining high performance has become a key challenge. To address this challenge, we have studied Mixture of Experts (MoE) models. The newly unveiled Hunyuan-Large (Hunyuan-MoE-A50B) model is currently the industry's largest Transformer-based MoE model awaiting open-source release, with 389 billion total parameters and 52 billion active parameters.
-
- By open-sourcing the technical achievements of Hunyuan-Large, we hope to inspire more researchers with innovative ideas and jointly advance the progress and application of AI technology. We welcome you to join our open-source community to explore and optimize future AI models together! The official version of Hunyuan-Large is expected to be open-sourced at the end of the month; for now, the Hunyuan-Large-Preview version is available on the Hunyuan one-stop platform for everyone to try.
-
- ### Introduction to Model Technical Advantages
-
- #### Model
- - **High-Quality Synthetic Data**: Through training enhanced with synthetic data, Hunyuan-Large learns richer representations, handles long-context inputs, and generalizes better to unseen data.
-
- - **KV Cache Compression**: Uses Grouped Query Attention (GQA) and Cross-Layer Attention (CLA) strategies to significantly reduce the memory footprint and computational overhead of the KV cache, improving inference throughput.
-
- - **Expert-Specific Learning Rate Scaling**: Sets different learning rates for different experts, ensuring that each sub-model learns effectively from the data and contributes to overall performance.
-
- - **Long-Context Processing Capability**: Supports text sequences of up to 128K, significantly improving the ability to handle long-context tasks.
-
- - **Extensive Benchmarking**: Conducts extensive experiments across multiple languages and tasks to validate the practical effectiveness and safety of Hunyuan-Large.
+ ### Model Introduction
+
+ With the rapid development of artificial intelligence technology, large language models (LLMs) have made significant progress in fields such as natural language processing, computer vision, and scientific tasks. However, as the scale of these models increases, optimizing resource consumption while maintaining high performance has become a key challenge. To address this challenge, we have explored Mixture of Experts (MoE) models. The currently unveiled Hunyuan-Large (Hunyuan-MoE-A50B) model is the largest open-source Transformer-based MoE model in the industry, featuring a total of 389 billion parameters and 50 billion active parameters.
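+
+ To make the total-versus-active distinction concrete, below is a minimal top-k routing sketch in PyTorch. It is illustrative only (toy sizes and invented expert counts, not the Hunyuan-Large implementation): every expert's weights count toward the total parameter count, but each token passes through only k experts, so only those weights are active for it.
+
+ ```python
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+ class TopKMoE(nn.Module):
+     """Toy MoE layer: all experts contribute to *total* parameters,
+     but each token is routed to only k experts (*active* parameters)."""
+
+     def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
+         super().__init__()
+         self.k = k
+         self.router = nn.Linear(d_model, n_experts)
+         self.experts = nn.ModuleList(
+             nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
+                           nn.Linear(d_ff, d_model))
+             for _ in range(n_experts)
+         )
+
+     def forward(self, x):  # x: (n_tokens, d_model)
+         gate = F.softmax(self.router(x), dim=-1)
+         weights, idx = gate.topk(self.k, dim=-1)       # pick k experts per token
+         weights = weights / weights.sum(-1, keepdim=True)
+         out = torch.zeros_like(x)
+         for slot in range(self.k):
+             for e, expert in enumerate(self.experts):
+                 mask = idx[:, slot] == e               # tokens routed to expert e
+                 if mask.any():
+                     out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
+         return out
+
+ moe = TopKMoE()
+ total = sum(p.numel() for p in moe.parameters())
+ active = (sum(p.numel() for p in moe.router.parameters())
+           + moe.k * sum(p.numel() for p in moe.experts[0].parameters()))
+ print(f"total: {total:,} params, active per token: {active:,}")
+ _ = moe(torch.randn(5, 64))  # run 5 tokens through the layer
+ ```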
+
+ By open-sourcing the Hunyuan-Large model and revealing related technical details, we hope to inspire more researchers with innovative ideas and collectively advance the progress and application of AI technology. We welcome you to join our open-source community to explore and optimize future AI models together!
+
+ ### Model Technical Advantages
+
+ #### Model
+ - **High-Quality Synthetic Data**: By enhancing training with synthetic data, Hunyuan-Large can learn richer representations, handle long-context inputs, and generalize better to unseen data.
+
+ - **KV Cache Compression**: Utilizes Grouped Query Attention (GQA) and Cross-Layer Attention (CLA) strategies to significantly reduce the memory usage and computational overhead of KV caches, improving inference throughput (see the sizing sketch after this list).
+
+ - **Expert-Specific Learning Rate Scaling**: Sets different learning rates for different experts to ensure each sub-model effectively learns from the data and contributes to overall performance (see the parameter-group sketch after this list).
+
+ - **Long-Context Processing Capability**: The pre-trained model supports text sequences up to 256K, and the Instruct model supports up to 128K, significantly enhancing the ability to handle long-context tasks.
+
+ - **Extensive Benchmarking**: Conducts extensive experiments across various languages and tasks to validate the practical effectiveness and safety of Hunyuan-Large.
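+
+ As a rough illustration of the KV-cache bullet above, the back-of-envelope sizing below compares a plain multi-head cache against GQA (several query heads share one KV head) and GQA combined with CLA (adjacent layers share one KV cache). Head, layer, and sequence numbers are invented for the example and are not Hunyuan-Large's actual configuration.
+
+ ```python
+ def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
+                    cla_share=1, bytes_per_elem=2):
+     """Bytes held by the K and V caches for one sequence. `cla_share` is
+     how many adjacent layers share one cache under CLA (1 = no sharing)."""
+     caching_layers = n_layers // cla_share
+     return 2 * seq_len * caching_layers * n_kv_heads * head_dim * bytes_per_elem
+
+ # Invented example numbers (fp16 cache, 128K-token sequence).
+ seq, layers, q_heads, head_dim = 128_000, 64, 80, 128
+
+ mha = kv_cache_bytes(seq, layers, n_kv_heads=q_heads, head_dim=head_dim)
+ gqa = kv_cache_bytes(seq, layers, n_kv_heads=8, head_dim=head_dim)  # 80 Q -> 8 KV heads
+ gqa_cla = kv_cache_bytes(seq, layers, n_kv_heads=8, head_dim=head_dim,
+                          cla_share=2)  # adjacent layer pairs share one cache
+
+ for name, b in [("MHA", mha), ("GQA", gqa), ("GQA+CLA", gqa_cla)]:
+     print(f"{name:8s} {b / 2**30:7.1f} GiB  ({mha / b:4.0f}x vs MHA)")
+ ```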
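+
+ For the expert-specific learning rate bullet, here is a minimal sketch using PyTorch optimizer parameter groups. The scale factor used (k / n_experts, the fraction of tokens an expert sees under top-k routing) is an assumption for demonstration, not the exact rule from the technical report.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ d_model, n_experts, k, base_lr = 64, 8, 2, 3e-4  # toy sizes, assumed values
+
+ shared = nn.Linear(d_model, d_model)             # dense weights see every token
+ experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
+
+ # Each expert only processes roughly k/n_experts of the tokens, so it gets
+ # a scaled-down learning rate (assumed rule; the report's formula may differ).
+ expert_lr = base_lr * (k / n_experts)
+
+ optimizer = torch.optim.AdamW([
+     {"params": shared.parameters(), "lr": base_lr},
+     {"params": experts.parameters(), "lr": expert_lr},
+ ])
+ for group in optimizer.param_groups:
+     print(f"lr = {group['lr']:.1e} for {len(group['params'])} tensors")
+ ```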
 
  &nbsp;
+
+ ### Benchmark
+
+ ### Citation
+ If you find our work helpful, feel free to cite it.
+
+ ```
+ @article{Tencent-Hunyuan-Large,
+   title={Hunyuan-Large Technical Report},
+   author={Xingwu Sun, Yanfeng Chen, Yiqing Huang, Ruobing Xie, Jiaqi Zhu, Kai Zhang, Shuaipeng Li, Xuemeng Huang, Zhen Yang, Jonny Han, Xiaobo Shu, Jiahao Bu, Zhongzhi Chen, Fengzong Lian, Saiyong Yang, Jianfeng Yan, Yuyuan Zeng, Xiaoqin Ren, Chao Yu, Lulu Wu, Yue Mao, Tao Yang, Kan Wu, Dengpeng Wu, Guanghui Xu, Shaohua Chen, Fusheng Xiang, Shuang Chen, Xiao Feng, Yigeng Hong, Junqiang Zheng, Chengcheng Xu, Zongwei Li, Suncong Zheng, Xiong Kuang, Jianglu Hu, Dian Jiao, Yiqi Chen, Jinbao Xue, Yangyu Tao, Chengzhong Xu, Winsony Hu, Feng Zhang, Jianshen Zhu, Zhanhui Kang, Di Wang, Jie Jiang},
+   journal={arXiv:},
+   year={2024}
+ }
+ ```