edobobo committed
Commit 7145d69
1 Parent(s): 699c973

Update README.md

Files changed (1)
  1. README.md +12 -12
README.md CHANGED
@@ -53,7 +53,7 @@ This model is part of the Minerva LLM family:
 
 *This section identifies foreseeable harms and misunderstandings.*
 
- This is a chat foundation model, subject to some degree of alignment. However, the model may still:
+ This is a chat foundation model, subject to model alignment and safety risk mitigation strategies. However, the model may still:
 
 - Overrepresent some viewpoints and underrepresent others
 - Contain stereotypes
@@ -87,7 +87,7 @@ pipeline = transformers.pipeline(
 )
 
 # Input text for the model.
- input_conv = [{"role": "user", "content": "Qualle è la capitale dell'Italia?"}]
+ input_conv = [{"role": "user", "content": "Qual è la capitale dell'Italia?"}]
 
 # Compute the outputs.
 output = pipeline(
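
For context, the inference snippet this hunk corrects can be run end to end roughly as follows. This is a minimal sketch: the repository id and generation settings are assumptions, not values taken from the diff.

```python
import torch
import transformers

# Assumed repository id for the instruct/DPO model this card describes.
model_id = "sapienzanlp/Minerva-7B-instruct-v1.0"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Input text for the model, in chat format (the corrected line 90).
input_conv = [{"role": "user", "content": "Qual è la capitale dell'Italia?"}]

# Compute the outputs; max_new_tokens is an illustrative choice.
output = pipeline(input_conv, max_new_tokens=128)
print(output[0]["generated_text"])
```

With a chat-style input like this, recent versions of `transformers` apply the model's chat template automatically before generation.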
@@ -103,7 +103,7 @@ output
 
 ## Model Architecture
 
- Minerva-7B-base-v1.0 is a Transformer model based on the Mistral architecture.
+ Minerva-7B-base-v1.0 is a Transformer model based on the [Mistral](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) architecture.
 Please look at the configuration file for a detailed breakdown of the hyperparameters we chose for this model.
 
 The Minerva LLM family is composed of:
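
The paragraph above defers the hyperparameters to the configuration file; one quick way to inspect them programmatically is sketched below (the repository id is again an assumption).

```python
from transformers import AutoConfig

# Assumed repository id; use the repository this card is hosted under.
config = AutoConfig.from_pretrained("sapienzanlp/Minerva-7B-instruct-v1.0")

# Mistral-style hyperparameters: hidden size, layer count, attention heads, vocabulary size.
print(config.model_type, config.hidden_size, config.num_hidden_layers,
      config.num_attention_heads, config.vocab_size)
```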
@@ -132,23 +132,23 @@ The SFT model was trained using [Llama-Factory](https://github.com/hiyouga/LLaMA
 
 | Dataset | Source | Code | English | Italian |
 |--------------------------------------|------------------------------------------------------------------------|----------|---------|---------|
+ | Glaive-code-assistant | [Link](https://huggingface.co/datasets/glaiveai/glaive-code-assistant) | 100,000 | 0 | 0 |
+ | Alpaca-python | [Link](https://huggingface.co/datasets/Vezora/Tested-143k-Python-Alpaca) | 20,000 | 0 | 0 |
 | Alpaca-cleaned | [Link](https://huggingface.co/datasets/yahma/alpaca-cleaned) | 0 | 50,000 | 0 |
 | Databricks-dolly-15k | [Link](https://huggingface.co/datasets/databricks/databricks-dolly-15k) | 0 | 15,011 | 0 |
 | No-robots | [Link](https://huggingface.co/datasets/HuggingFaceH4/no_robots) | 0 | 9,499 | 0 |
 | OASST2 | [Link](https://huggingface.co/datasets/OpenAssistant/oasst2) | 0 | 29,000 | 528 |
- | Tower-blocks_it | [Link](https://huggingface.co/datasets/sapienzanlp/tower_blocks-v0.2_it) | 0 | 0 | 7,276 |
- | Glaive-code-assistant | [Link](https://huggingface.co/datasets/glaiveai/glaive-code-assistant) | 100,000 | 0 | 0 |
- | Alpaca-python | [Link](https://huggingface.co/datasets/Vezora/Tested-143k-Python-Alpaca) | 20,000 | 0 | 0 |
 | WizardLM | [Link](https://huggingface.co/datasets/WizardLMTeam/WizardLM_evol_instruct_70k) | 0 | 29,810 | 0 |
 | LIMA | [Link](https://huggingface.co/datasets/GAIR/lima?row=0) | 0 | 1,000 | 0 |
 | OPENORCA | [Link](https://huggingface.co/datasets/Open-Orca/OpenOrca) | 0 | 30,000 | 0 |
 | Ultrachat | [Link](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) | 0 | 50,000 | 0 |
 | MagpieMT | [Link](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-MT-300K-v0.1) | 0 | 30,000 | 0 |
 | Tulu-V2-Science | [Link](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture) | 0 | 7,000 | 0 |
+ | Aya_datasets | [Link](http://CohereForAI/aya_dataset) | 0 | 3,944 | 738 |
+ | Tower-blocks_it | [Link](https://huggingface.co/datasets/sapienzanlp/tower_blocks-v0.2_it) | 0 | 0 | 7,276 |
 | Bactrian-X | [Link](https://huggingface.co/datasets/MBZUAI/Bactrian-X) | 0 | 0 | 67,000 |
 | Magpie (*Translated by us*) | - | 0 | 0 | 60,000 |
 | Everyday-conversations (*Translated by us*) | - | 0 | 0 | 2,260 |
- | Aya_datasets | [Link](http://CohereForAI/aya_dataset) | 0 | 3,944 | 738 |
 | alpaca-gpt4-it | [Link](https://huggingface.co/datasets/efederici/alpaca-gpt4-it) | 0 | 0 | 15,000 |
 | capybara-claude-15k-ita | [Link](https://huggingface.co/datasets/efederici/capybara-claude-15k-ita) | 0 | 0 | 15,000 |
 | Wildchat | [Link](https://huggingface.co/datasets/allenai/WildChat-1M) | 0 | 0 | 5,000 |
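
As a rough illustration of how subsets of the sizes listed in this table can be drawn from the public sources, here is a sketch using the `datasets` library; the shuffling and seed are assumptions, since the card does not describe how the subsets were sampled.

```python
from datasets import load_dataset

# Two of the English SFT sources from the table, cut down to the listed row counts.
alpaca_cleaned = (
    load_dataset("yahma/alpaca-cleaned", split="train")
    .shuffle(seed=42)          # assumed sampling strategy
    .select(range(50_000))     # 50,000 rows, as in the table
)
dolly = load_dataset("databricks/databricks-dolly-15k", split="train")  # 15,011 rows

print(len(alpaca_cleaned), len(dolly))
```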
@@ -156,13 +156,13 @@ The SFT model was trained using [Llama-Factory](https://github.com/hiyouga/LLaMA
 | Safety Italian | - | 0 | 0 | 21,000 |
 | Handmade Italian | - | 0 | 0 | 2,000 |
 
- For more details please check [our tech report](https://nlp.uniroma1.it/minerva/blog#from-a-base-model-to-an-instruct-model).
+ For more details, please check [our tech report](https://nlp.uniroma1.it/minerva/blog#from-a-base-model-to-an-instruct-model).
 
 ### Online DPO Training
 
- This model card is for our DPO model. Direct Preference Optimization (DPO) is a method that refines models based on user feedback, similar to Reinforcement Learning from Human Feedback (RLHF), but without the complexity of reinforcement learning. Online DPO further improves this by allowing real-time adaptation during training, continuously refining the model with new feedback. For training this model, we used the [Hugging Face TRL](https://github.com/huggingface/trl) library and Online DPO, with the [Skywork/Skywork-Reward-Llama-3.1-8B-v0.2](https://huggingface.co/Skywork/Skywork-Reward-Llama-3.1-8B-v0.2) model as the judge to evaluate and guide optimization. For this stage we used just the prompts from HuggingFaceH4/ultrafeedback_binarized (English), efederici/evol-dpo-ita (Italian) and Babelscape/ALERT translated to Italian were used, with additional manually curated data for safety.
+ This model card is for our DPO model. Direct Preference Optimization (DPO) is a method that refines models based on user feedback, similar to Reinforcement Learning from Human Feedback (RLHF), but without the complexity of reinforcement learning. Online DPO further improves this by allowing real-time adaptation during training, continuously refining the model with new feedback. For training this model, we used the [Hugging Face TRL](https://github.com/huggingface/trl) library and Online DPO, with the [Skywork/Skywork-Reward-Llama-3.1-8B-v0.2](https://huggingface.co/Skywork/Skywork-Reward-Llama-3.1-8B-v0.2) model as the judge to evaluate and guide optimization. For this stage we used just the prompts from HuggingFaceH4/ultrafeedback_binarized (English), efederici/evol-dpo-ita (Italian) and Babelscape/ALERT translated to Italian, with additional manually curated data for safety.
 
- For more details please check [our tech report](https://nlp.uniroma1.it/minerva/blog#from-a-base-model-to-an-instruct-model).
+ For more details, please check [our tech report](https://nlp.uniroma1.it/minerva/blog#from-a-base-model-to-an-instruct-model).
 
 ## Model Evaluation
 
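
The Online DPO paragraph rewritten in this hunk describes training with the Hugging Face TRL library, prompts from HuggingFaceH4/ultrafeedback_binarized (among others), and Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 as the judge. A minimal sketch of that setup with TRL's `OnlineDPOTrainer` is shown below; the starting checkpoint path, dataset handling, and hyperparameters are assumptions, and argument names follow recent TRL releases rather than the authors' exact training code.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification, AutoTokenizer
from trl import OnlineDPOConfig, OnlineDPOTrainer

# Placeholder path for the SFT checkpoint used as the starting policy.
policy_id = "path/to/minerva-7b-sft-checkpoint"
model = AutoModelForCausalLM.from_pretrained(policy_id)
tokenizer = AutoTokenizer.from_pretrained(policy_id)

# Reward model named in the card, used to score sampled completions during training.
reward_id = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2"
reward_model = AutoModelForSequenceClassification.from_pretrained(reward_id, num_labels=1)
reward_tokenizer = AutoTokenizer.from_pretrained(reward_id)

# Prompt-only dataset; the card lists this among the English prompt sources.
prompts = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
prompts = prompts.select_columns(["prompt"])

training_args = OnlineDPOConfig(output_dir="minerva-7b-online-dpo", per_device_train_batch_size=1)
trainer = OnlineDPOTrainer(
    model=model,
    reward_model=reward_model,
    args=training_args,
    processing_class=tokenizer,
    reward_processing_class=reward_tokenizer,
    train_dataset=prompts,
)
trainer.train()
```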
 
@@ -275,5 +275,5 @@ Minerva-7B-base-v1.0 is a pretrained base model and, therefore, has no moderatio
 
 ## Acknowledgments
 
- This work was funded by the PNRR MUR project [PE0000013-FAIR](https://fondazione-fair.it).
- We acknowledge the [CINECA](https://www.cineca.it) award "IscB_medit" under the ISCRA initiative, for the availability of high performance computing resources and support.
+ This work was funded by the PNRR MUR project [PE0000013-FAIR](https://fondazione-fair.it) and the CREATIVE PRIN Project.
+ We acknowledge the [CINECA](https://www.cineca.it) award "IscB_medit" under the ISCRA initiative for the availability of high-performance computing resources and support.
 