Llama 3 zhtw

We experiment with Chinese Continued Pretraining (CP) on Llama 3, training on a total of 800M tokens.

Because the quality of the Chinese pretraining corpus still has room for improvement, performance after CP does not surpass the original Llama 3; a comparison with several Chinese Llama 3 models trained by the open-source community shows a similar pattern.

On the English side, Llama 3 zhtw uses FineWeb, which keeps its MMLU score above the other Chinese CP models and on par with the original Llama 3.
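
Since this is a base (pretrained) model, the snippet below is a minimal text-completion sketch with Hugging Face `transformers`; the prompt and generation settings are illustrative, not part of this card.

```python
# Minimal usage sketch; prompt and generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "p208p2002/llama-3-zhtw-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

prompt = "台灣最高的山是"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```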

Benchmarks

All scores are 5-shot accuracy; TMMLU+, CMMLU and MMLU measure Traditional Chinese (TC), Simplified Chinese (CN) and English (EN) knowledge respectively.

| Model | Size | TMMLU+ (ACC, TC) | CMMLU (ACC, CN) | MMLU (ACC, EN) |
|---|---|---|---|---|
| Yi-6B | 6B | 49.63 | 75.53 | 65.35 |
| Qwen-7B | 7B | 42.84 | 73.1 | 61.00 |
| Meta-Llama-3-8B | 8B | 41.97 | 50.8 | 65.17 |
| p208p2002/llama-3-zhtw-8B | 8B | 41.84 | 50.6 | 65.31 |
| Breeze-7B-Base-v0_1 | 7B | 40.35 | 44.05 | 61.63 |
| hfl/llama-3-chinese-8b | 8B | 39.64 | 50.9 | 61.1 |
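
A hedged sketch of how such 5-shot numbers could be reproduced with EleutherAI's lm-evaluation-harness; the task names (`tmmluplus`, `cmmlu`, `mmlu`) and the batch size are assumptions that depend on the installed harness version, and this is not necessarily the exact setup used for the table above.

```python
# Hedged sketch: 5-shot accuracy via lm-evaluation-harness.
# Task names and availability depend on the installed harness version.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=p208p2002/llama-3-zhtw-8B,dtype=bfloat16",
    tasks=["tmmluplus", "cmmlu", "mmlu"],
    num_fewshot=5,
    batch_size=8,  # assumption
)
for task, metrics in results["results"].items():
    print(task, metrics)
```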

Recipe

Datasets

| Dataset | Lang | Weight |
|---|---|---|
| FineWeb | en | 0.35 |
| Wudao | zh-cn | 0.1 |
| C4Tw | zh-tw | 0.1 |
| WikiZhTw | zh-tw | 0.15 |
| NdltdT10 | zh-tw | 0.1 |
| GitHubMarkDown | code | 0.1 |
| GitHubPython | code | 0.1 |
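
The weights above are sampling ratios for the CP data mix. Below is a minimal sketch of weighted mixing with the `datasets` library; the `load_dataset()` identifiers are placeholders, since the exact sources and preprocessing are not specified on this card.

```python
# Hedged sketch: weighted sampling of the CP mixture with `datasets`.
# The load_dataset() names are placeholders, not the actual dataset sources.
from datasets import load_dataset, interleave_datasets

weights = {
    "fineweb": 0.35,
    "wudao": 0.10,
    "c4tw": 0.10,
    "wiki_zhtw": 0.15,
    "ndltd_t10": 0.10,
    "github_markdown": 0.10,
    "github_python": 0.10,
}

streams = [load_dataset(name, split="train", streaming=True) for name in weights]
mixed = interleave_datasets(
    streams,
    probabilities=list(weights.values()),
    seed=42,
    stopping_strategy="all_exhausted",
)
```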

Hyper Parameters

  • Learning Rate: 1e-7
  • Global Batch Size: 60
  • Sequence Length: 8192
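
A hedged sketch of how these hyperparameters might map onto `transformers.TrainingArguments`; only the learning rate, the global batch size of 60, and the 8192-token sequence length come from this card, while the per-device / gradient-accumulation split and all other fields are assumptions.

```python
# Hedged sketch: the listed hyperparameters expressed as TrainingArguments.
# Only learning_rate, the effective global batch size (60), and the
# 8192-token sequence length come from this card; everything else is assumed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama-3-zhtw-8B-cp",
    learning_rate=1e-7,
    per_device_train_batch_size=1,    # assumption
    gradient_accumulation_steps=60,   # assumed split giving a global batch of 60
    bf16=True,                        # weights are stored in BF16
    lr_scheduler_type="constant",     # assumption
    logging_steps=10,
)
MAX_SEQ_LEN = 8192  # sequence length used for CP
```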