characharm committed
Commit 84c3919
1 Parent(s): 392d642

Create README.md

Files changed (1): README.md (+70 lines)

README.md ADDED

converted via this PR:
https://github.com/ggerganov/llama.cpp/pull/8604

---
license: apache-2.0
language:
- en
- fr
- de
- es
- it
- pt
- ru
- zh
- ja
---

# Model Card for Mistral-Nemo-Instruct-2407

The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407). Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models of smaller or similar size.

For more details about this model, please refer to our release [blog post](https://mistral.ai/news/mistral-nemo/).
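
Since this repository hosts a GGUF conversion (via the llama.cpp PR noted above), a minimal usage sketch with llama-cpp-python might look like the following. The model file name, context length, and generation settings are assumptions for illustration, not values taken from this repository.

```python
# Minimal sketch: running a GGUF build of Mistral-Nemo-Instruct-2407 with llama-cpp-python.
# The .gguf file name below is hypothetical; substitute the actual file from this repository.
from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-Nemo-Instruct-2407-Q4_K_M.gguf",  # hypothetical file name
    n_ctx=8192,        # the model supports a 128k context window; kept small here to limit memory use
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

# llama-cpp-python applies the chat template stored in the GGUF metadata.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain grouped-query attention in two sentences."}],
    max_tokens=128,
    temperature=0.3,
)
print(result["choices"][0]["message"]["content"])
```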

## Key features
- Released under the **Apache 2 License**
- Pre-trained and instructed versions
- Trained with a **128k context window**
- Trained on a large proportion of **multilingual and code data**
- Drop-in replacement of Mistral 7B

## Model Architecture
Mistral Nemo is a transformer model with the following architecture choices:
- **Layers:** 40
- **Dim:** 5,120
- **Head dim:** 128
- **Hidden dim:** 14,436
- **Activation function:** SwiGLU
- **Number of heads:** 32
- **Number of kv-heads:** 8 (GQA)
- **Vocabulary size:** 2**17 ~= 128k
- **Rotary embeddings:** theta = 1M
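
As a rough sanity check on the hyperparameters above, the sketch below estimates the implied parameter count. It assumes a standard decoder-only transformer with untied input and output embeddings, no biases, and ignores normalization layers; none of these assumptions come from the card, so treat the result (about 12B parameters) as approximate.

```python
# Back-of-the-envelope parameter count from the hyperparameters listed above.
# Assumptions not taken from the card: untied input/output embeddings, no biases,
# and normalization parameters ignored.
dim = 5_120        # model dimension ("Dim")
hidden = 14_436    # feed-forward hidden dimension ("Hidden dim")
layers = 40
n_heads = 32
n_kv_heads = 8     # grouped-query attention (GQA)
head_dim = 128
vocab = 2**17      # ~128k tokens

# Attention: Q and O map dim <-> n_heads * head_dim; K and V map dim -> n_kv_heads * head_dim.
attn = 2 * dim * n_heads * head_dim + 2 * dim * n_kv_heads * head_dim
# SwiGLU feed-forward uses three weight matrices (gate, up, down).
ffn = 3 * dim * hidden
embeddings = 2 * vocab * dim  # input embedding plus output head (assumed untied)

total = layers * (attn + ffn) + embeddings
print(f"~{total / 1e9:.1f}B parameters")  # prints ~12.3B
```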

## Metrics

### Main Benchmarks

| Benchmark | Score |
| --- | --- |
| HellaSwag (0-shot) | 83.5% |
| Winogrande (0-shot) | 76.8% |
| OpenBookQA (0-shot) | 60.6% |
| CommonSenseQA (0-shot) | 70.4% |
| TruthfulQA (0-shot) | 50.3% |
| MMLU (5-shot) | 68.0% |
| TriviaQA (5-shot) | 73.8% |
| NaturalQuestions (5-shot) | 31.2% |

### Multilingual Benchmarks (MMLU)

| Language | Score |
| --- | --- |
| French | 62.3% |
| German | 62.7% |
| Spanish | 64.6% |
| Italian | 61.3% |
| Portuguese | 63.3% |
| Russian | 59.2% |
| Chinese | 59.0% |
| Japanese | 59.0% |