	converted via this PR
	https://github.com/ggerganov/llama.cpp/pull/8604

	original model https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407


	---
	license: apache-2.0
	language:
	- en
	- fr
	- de
	- es
	- it
	- pt
	- ru
	- zh
	- ja
	---

	# Model Card for Mistral-Nemo-Instruct-2407

	The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruction fine-tuned version of [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407). Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models of smaller or similar size.

	For more details about this model, please refer to our release [blog post](https://mistral.ai/news/mistral-nemo/).

	## Key features
	- Released under the Apache 2 License
	- Pre-trained and instruction-tuned versions
	- Trained with a 128k context window
	- Trained on a large proportion of multilingual and code data
	- Drop-in replacement for Mistral 7B

	## Model Architecture
	Mistral Nemo is a transformer model, with the following architecture choices:
	- Layers: 40
	- Dim: 5,120
	- Head dim: 128
	- Hidden dim: 14,336
	- Activation Function: SwiGLU
	- Number of heads: 32
	- Number of kv-heads: 8 (GQA)
	- Vocabulary size: 2**17 ~= 128k
	- Rotary embeddings (theta = 1M)
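
As a rough illustration of what the GQA setting above means in practice, the sketch below estimates the key/value cache size from the listed layer count, kv-head count, and head dim. The `AttentionShape` class and `kv_cache_bytes` function are hypothetical names introduced here, and the 16-bit cache values and the 2**17-token reading of "128k" are assumptions; actual memory use depends on the runtime and on any cache quantization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Attention dimensions taken from the architecture list above."""
    layers: int = 40
    kv_heads: int = 8
    head_dim: int = 128


def kv_cache_bytes(shape: AttentionShape, tokens: int, bytes_per_value: int = 2) -> int:
    """Estimate KV cache size in bytes for `tokens` cached positions.

    Assumes one key and one value vector per kv-head per layer, stored at
    `bytes_per_value` bytes each (2 bytes is an assumed 16-bit cache).
    """
    values_per_token = 2 * shape.layers * shape.kv_heads * shape.head_dim
    return values_per_token * bytes_per_value * tokens


gqa = AttentionShape()
# Hypothetical comparison point: the same model with one kv-head per query head.
mha = AttentionShape(kv_heads=32)

print(kv_cache_bytes(gqa, tokens=1) // 1024, "KiB per token")          # 160 KiB per token
print(kv_cache_bytes(gqa, tokens=2**17) / 2**30, "GiB at 2**17 tokens")  # 20.0 GiB
print(kv_cache_bytes(mha, tokens=1) // kv_cache_bytes(gqa, tokens=1), "x larger without GQA")  # 4 x
```

Under these assumptions, sharing 8 kv-heads across 32 query heads cuts the cache to a quarter of what a full multi-head layout would need, which matters most at long context lengths.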

	## Metrics

	### Main Benchmarks

	| Benchmark | Score |
	| --- | --- |
	| HellaSwag (0-shot) | 83.5% |
	| Winogrande (0-shot) | 76.8% |
	| OpenBookQA (0-shot) | 60.6% |
	| CommonSenseQA (0-shot) | 70.4% |
	| TruthfulQA (0-shot) | 50.3% |
	| MMLU (5-shot) | 68.0% |
	| TriviaQA (5-shot) | 73.8% |
	| NaturalQuestions (5-shot) | 31.2% |

	### Multilingual Benchmarks (MMLU)

	| Language | Score |
	| --- | --- |
	| French | 62.3% |
	| German | 62.7% |
	| Spanish | 64.6% |
	| Italian | 61.3% |
	| Portuguese | 63.3% |
	| Russian | 59.2% |
	| Chinese | 59.0% |
	| Japanese | 59.0% |
