DachengZhang committed
Commit cf2561e
1 Parent(s): 9fdce6f

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -45,7 +45,9 @@ pipeline_tag: text-generation
 
 # Model Introduction
 
-- Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI. The base model is trained on a 2.5T multilingual corpus, including Chinese, English, Japanese, Korean, etc., and it exhibits superior performance in these languages.
+- Orion-14B-Chat is fine-tuned from Orion-14B-Base on a high-quality corpus of approximately 850,000 entries (SFT only). It also supports Chinese, English, Japanese, and Korean, and it performs exceptionally well on the MT-Bench and AlignBench evaluation sets, significantly surpassing other models of the same parameter scale on multiple metrics.
+
+- The 850,000-entry fine-tuning corpus comprises two parts: approximately 220,000 manually curated high-quality entries and 630,000 entries selected from open-source data through model-based filtering and semantic deduplication. Of these, the 70,000 Japanese and Korean entries have only undergone basic cleaning and deduplication.
 
 - The Orion-14B series models exhibit the following features:
   - Among models with 20B-parameter scale level, the Orion-14B-Base model shows outstanding performance in comprehensive evaluations.
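The updated introduction describes a text-generation chat model, so a minimal loading sketch with Hugging Face transformers may help readers of the README. This is only an illustration, not part of the commit: the repository id `OrionStarAI/Orion-14B-Chat`, the need for `trust_remote_code`, and the plain `generate()` call are assumptions; the repository may instead expose its own chat helper.

```python
# Minimal sketch of loading and prompting the chat model described above.
# Assumptions (not confirmed by this commit): repo id "OrionStarAI/Orion-14B-Chat",
# custom modeling code requiring trust_remote_code, and a plain generate() interface.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OrionStarAI/Orion-14B-Chat"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # reduced precision to fit a ~14B model
    device_map="auto",            # spread layers across available devices
    trust_remote_code=True,       # assumed: repo ships custom modeling code
)

# Simple single-turn prompt; a real chat session would follow the repo's prompt format.
inputs = tokenizer("Hello, who are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```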