DachengZhang committed
Commit cf2561e
1 Parent(s): 9fdce6f

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -45,7 +45,9 @@ pipeline_tag: text-generation
 
 # Model Introduction
 
-- Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI. The base model is trained on a 2.5T multilingual corpus, including Chinese, English, Japanese, Korean, etc., and it exhibits superior performance in these languages.
+- Orion-14B-Chat is fine-tuned from Orion-14B-Base on a high-quality corpus of approximately 850,000 entries (SFT only). It also supports Chinese, English, Japanese, and Korean, and it performs exceptionally well on the MT-Bench and AlignBench evaluation sets, significantly surpassing other models of the same parameter scale on multiple metrics.
+
+- The 850,000-entry fine-tuning corpus comprises two parts: approximately 220,000 manually curated high-quality entries and 630,000 entries selected from open-source data through model-based filtering and semantic deduplication. Of these, the 70,000 Japanese and Korean entries have only undergone basic cleaning and deduplication.
 
 - The Orion-14B series models exhibit the following features:
   - Among models with 20B-parameter scale level, the Orion-14B-Base model shows outstanding performance in comprehensive evaluations.
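The updated introduction describes a text-generation chat model, so a minimal loading sketch with Hugging Face transformers may help readers of the README. This is only an illustration, not part of the commit: the repository id `OrionStarAI/Orion-14B-Chat`, the need for `trust_remote_code`, and the plain `generate()` call are assumptions; the repository may instead expose its own chat helper.

```python
# Minimal sketch of loading and prompting the chat model described above.
# Assumptions (not confirmed by this commit): repo id "OrionStarAI/Orion-14B-Chat",
# custom modeling code requiring trust_remote_code, and a plain generate() interface.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OrionStarAI/Orion-14B-Chat"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # reduced precision to fit a ~14B model
    device_map="auto",            # spread layers across available devices
    trust_remote_code=True,       # assumed: repo ships custom modeling code
)

# Simple single-turn prompt; a real chat session would follow the repo's prompt format.
inputs = tokenizer("Hello, who are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```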