---
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
language:
- en
base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- rlaif
- preference
- ultrafeedback
license: apache-2.0
model-index:
- name: notux-8x7b-v1
  results: []
---

<div align="center">
<h1>ExLlamaV2 3.5bpw version</h1>
</div>

<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/60f0608166e5701b80ed3f02/dj-spsk9eXMMXVGxK6jRz.png" alt="A banner representing Notus, the wind god of the south, in a mythical and artistic style. The banner features a strong, swirling breeze, embodying the warm, wet character of the southern wind. Gracefully flowing across the scene are several paper planes, caught in the gentle yet powerful gusts of Notus. The background is a blend of warm colors, symbolizing the heat of the south, with hints of blue and green to represent the moisture carried by this wind. The overall atmosphere is one of dynamic movement and warmth."/>
</div>

# Model Card for Notux 8x7B-v1

This model is a preference-tuned version of [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) on the [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned) dataset using DPO (Direct Preference Optimization).

As of December 26th, 2023, it outperforms `Mixtral-8x7B-Instruct-v0.1` and is the top-ranked MoE (Mixture of Experts) model on the [Hugging Face Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

This model is part of the Notus family of models and experiments, in which the Argilla team investigates data-first and preference-tuning methods such as dDPO (distilled DPO). It is the result of our first experiment in tuning a MoE model that had already been fine-tuned with DPO (i.e., Mixtral-8x7B-Instruct-v0.1).
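
Since this repository hosts a 3.5bpw ExLlamaV2 quantization, inference goes through the ExLlamaV2 runtime rather than plain `transformers`. Below is a minimal sketch patterned on ExLlamaV2's own example scripts; the local model path, prompt, and sampling values are illustrative assumptions, not settings recommended by the model authors.

```python
# Minimal ExLlamaV2 inference sketch (patterned on the library's examples).
# Assumption: this repo's weights were downloaded to ./notux-8x7b-v1-exl2.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./notux-8x7b-v1-exl2"  # local download of this repo
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # lazy cache enables autosplit loading
model.load_autosplit(cache)               # split layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7  # illustrative sampling values
settings.top_p = 0.9

# Mixtral-Instruct chat format: wrap the user turn in [INST] ... [/INST].
prompt = "[INST] Explain Direct Preference Optimization in two sentences. [/INST]"
print(generator.generate_simple(prompt, settings, 200))
```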

## Model Details

### Model Description

- **Developed by:** Argilla (building on previous efforts by Hugging Face H4 and Mistral AI)
- **Shared by:** Argilla
- **Model type:** Pretrained generative Sparse Mixture of Experts
- **Language(s) (NLP):** Mainly English
- **License:** Apache 2.0
- **Finetuned from model:** [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)

### Model Sources

- **Repository:** https://github.com/argilla-io/notus
- **Paper:** N/A

## Training Details

### Training Hardware

We used a VM with 8 × H100 80 GB GPUs hosted on runpod.io; training ran for 1 epoch (~10 hours).

### Training Data

We used a new iteration of the Argilla UltraFeedback preferences dataset named [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned).
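
The dataset can be pulled directly with `datasets` for inspection. A quick sketch; the split and the `prompt`/`chosen`/`rejected` columns are assumed from the dataset card:

```python
# Peek at the preference pairs used for DPO tuning.
from datasets import load_dataset

ds = load_dataset("argilla/ultrafeedback-binarized-preferences-cleaned", split="train")
print(ds)               # features and row count
row = ds[0]
print(row["prompt"])    # the instruction
print(row["chosen"])    # preferred response
print(row["rejected"])  # dispreferred response
```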

## Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training (the sketch after the list shows how they map onto a `trl` DPO setup):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
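
For orientation, here is a hypothetical sketch of how these hyperparameters would map onto a `trl` `DPOTrainer` run. This is not the authors' training script: `output_dir` and `bf16` are assumptions, `beta` is `trl`'s default (the card does not report it), and the dataset's `chosen`/`rejected` columns may first need flattening into plain strings.

```python
# Hypothetical DPO setup mirroring the reported hyperparameters.
# Not the authors' script; see the caveats in the paragraph above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

train_ds = load_dataset(
    "argilla/ultrafeedback-binarized-preferences-cleaned", split="train"
)

args = TrainingArguments(
    output_dir="notux-8x7b-v1",     # assumption
    per_device_train_batch_size=8,  # train_batch_size above
    per_device_eval_batch_size=4,   # eval_batch_size above
    learning_rate=5e-7,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,               # lr_scheduler_warmup_ratio above
    num_train_epochs=1,
    seed=42,
    bf16=True,                      # assumption for H100 training
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # trl builds a frozen reference copy when None
    args=args,
    beta=0.1,        # trl default; not reported in this card
    train_dataset=train_ds,
    tokenizer=tokenizer,
)
trainer.train()
```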

### Training Results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4384 | 0.22 | 200 | 0.4556 | -0.3275 | -1.9448 | 0.7937 | 1.6174 | -405.7994 | -397.8617 | -1.3157 | -1.4511 |
| 0.4064 | 0.43 | 400 | 0.4286 | -0.2163 | -2.2090 | 0.8254 | 1.9927 | -408.4409 | -396.7496 | -0.7660 | -0.6539 |
| 0.3952 | 0.65 | 600 | 0.4275 | -0.1311 | -2.1603 | 0.8016 | 2.0291 | -407.9537 | -395.8982 | -0.6783 | -0.7206 |
| 0.3909 | 0.87 | 800 | 0.4167 | -0.2273 | -2.3146 | 0.8135 | 2.0872 | -409.4968 | -396.8602 | -0.8458 | -0.7738 |

### Framework Versions

- Transformers 4.36.0
- PyTorch 2.1.0+cu118
- Datasets 2.14.6
- Tokenizers 0.15.0

### An experiment by

<img src="https://cdn-uploads.huggingface.co/production/uploads/62cdba3c7a036fc9941e9b65/G3uKA45G51GPuQ-jZ_gEz.png" width="720">