DavidAU committed on
Commit 91774f6 · verified · 1 Parent(s): 235e7d3

Update README.md

Files changed (1):
  1. README.md +44 -8
README.md CHANGED
@@ -4,7 +4,7 @@ license: mit
- As a result all compressions will be slightly larger in size than standard 13B compressions.
@@ -12,20 +12,56 @@ This method is applied across all compressions from IQ1 to Q8.
- In addition the Imatrix file used to "fix" the compressed files post compression resulted in
- over 2 whole points lower perplexity at IQ1_S vs some of the other "Imatrix" files currently in use.
-
- In addition the Imatrix process itself used a larger "calibration" file than standard to further enhance quality.
- The process added appoximately 310 MB to each compressed file.
- An additional enhancement added another 200 mb to each compressed file.
- Please see the orginal model card for specific details of use, additional credits and tips:

The updated README.md, with added lines marked "+":

Imatrix compressions of the FP merge of "D_AU-Mistral-7B-Instruct-v0.2-Bagel-DarkSapling-DPO-7B-v2.0".

"Imatrix Plus" is an upgraded form of Imatrix which uses full precision for specific parts of the compression.
+ As a result all compressions will be slightly larger than standard 7B compressions.

This method results in a higher quality model, especially at lower compressions.
This method is applied across all compressions from IQ1 to Q8.

Even IQ1_S - the most compressed version - works well; however, IQ4/Q4 are suggested as the minimum for quality.
Highest quality will be Q6/Q8.
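
As a rough illustration of picking one of these quantization levels (Q4 or higher, per the note above), the sketch below fetches a single GGUF file with `huggingface_hub`. The repository id and file name are assumptions for illustration, not taken from this card - check the repo's file list for the actual names.

```python
from huggingface_hub import hf_hub_download

# Hypothetical repo id and file name - verify against the repo's "Files and versions" tab.
REPO_ID = "DavidAU/D_AU-Mistral-7B-Instruct-v0.2-Bagel-DarkSapling-DPO-7B-v2.0-Imatrix-Plus"
FILENAME = "D_AU-Mistral-7B-Instruct-v0.2-Bagel-DarkSapling-DPO-7B-v2.0-Q4_K_M.gguf"

# Download into the local Hugging Face cache and return the path to the .gguf file.
gguf_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)
print(gguf_path)
```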

This merge was an experiment to test the already established Roleplay, Fiction and Story
generation of "DarkSapling" with some of "Bagel"'s qualities on a Mistral Instruct base.

For Imatrix Plus this was a test of high precision in specific areas of the model, leading to a slightly larger compressed file.
+ In addition, the Imatrix process itself used a larger "calibration" file than standard to further enhance quality.

+ The process added approximately 250 MB to each compressed file.
+ An additional enhancement added another 250 MB to each compressed file.

A blank or standard Alpaca Template for text generation will work.

Context length: 32768.
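
As a minimal usage sketch (assuming the `llama-cpp-python` bindings and the hypothetical file name from the download example above; the prompt and sampling settings are illustrative), a blank/standard Alpaca-style prompt with the full 32768-token context could look like this:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# n_ctx matches the model's 32768 context length; the file name is a placeholder.
llm = Llama(
    model_path="D_AU-Mistral-7B-Instruct-v0.2-Bagel-DarkSapling-DPO-7B-v2.0-Q4_K_M.gguf",
    n_ctx=32768,
)

# Standard Alpaca template with a blank system preamble.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite the opening scene of a dark fantasy short story.\n\n"
    "### Response:\n"
)

out = llm(prompt, max_tokens=400, temperature=0.8, stop=["### Instruction:"])
print(out["choices"][0]["text"])
```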

+ Please see the original model card for specific details of use, additional credits and tips under "Models Merged" below.
+
+ # merge
+
+ This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+
+ ## Merge Details
+ ### Merge Method
+
+ This model was merged using the SLERP merge method.
+
+ ### Models Merged
+
+ The following models were included in the merge:
+ * [TeeZee/DarkSapling-7B-v2.0](https://huggingface.co/TeeZee/DarkSapling-7B-v2.0)
+ * [MaziyarPanahi/bagel-dpo-7b-v0.1-Mistral-7B-Instruct-v0.2-slerp](https://huggingface.co/MaziyarPanahi/bagel-dpo-7b-v0.1-Mistral-7B-Instruct-v0.2-slerp)
+
+ ### Configuration
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+ slices:
+   - sources:
+       - model: MaziyarPanahi/bagel-dpo-7b-v0.1-Mistral-7B-Instruct-v0.2-slerp
+         layer_range: [0, 32]
+       - model: TeeZee/DarkSapling-7B-v2.0
+         layer_range: [0, 32]
+ merge_method: slerp
+ base_model: MaziyarPanahi/bagel-dpo-7b-v0.1-Mistral-7B-Instruct-v0.2-slerp
+ parameters:
+   t:
+     - filter: self_attn
+       value: [0, 0.5, 0.3, 0.7, 1]
+     - filter: mlp
+       value: [1, 0.5, 0.7, 0.3, 0]
+     - value: 0.5
+ dtype: bfloat16
+ ```
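
For intuition about what `merge_method: slerp` does with the `t` schedule above, here is a small conceptual sketch of spherical linear interpolation between two weight tensors in NumPy. This is an illustration of the technique, not mergekit's actual implementation; as I read the config, the lists under `t` are spread across the layer stack, so self_attn and mlp tensors get different blend ratios at different depths while everything else uses t = 0.5.

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors."""
    v0f, v1f = v0.ravel(), v1.ravel()
    # Angle between the two parameter vectors.
    cos_omega = np.dot(v0f, v1f) / (np.linalg.norm(v0f) * np.linalg.norm(v1f) + eps)
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.sin(omega) < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return (1.0 - t) * v0 + t * v1
    return (np.sin((1.0 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)

# t = 0 keeps the base model's tensor, t = 1 keeps the other model's tensor.
a = np.random.randn(4, 4).astype(np.float32)  # stand-in for a bagel-dpo-7b tensor
b = np.random.randn(4, 4).astype(np.float32)  # stand-in for a DarkSapling-7B tensor
merged = slerp(0.5, a, b)
print(merged.shape)
```

In practice a config like the one above is normally handed to mergekit's `mergekit-yaml` command line tool, which applies this kind of interpolation tensor by tensor and writes out the merged model.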