brucethemoose committed on
Commit c701d83 (1 parent: 2b543fb)

Update README.md

  1. README.md +72 -1
README.md CHANGED
@@ -1,5 +1,76 @@
1
  ---
2
  license: other
3
  license_name: yi-license
4
- license_link: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE
5
  ---
1
  ---
2
  license: other
3
  license_name: yi-license
4
+ license_link: https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE
5
+ language:
6
+ - en
7
+ library_name: transformers
8
+ pipeline_tag: text-generation
9
  ---
10
+
11
+ A merge of **NousResearch/Nous-Capybara-34B**, **migtissera/Tess-M-v1.2**, and **migtissera/Tess-M-v1.3**, made with a new, experimental implementation of "dare ties" via mergekit. See:
12
+
13
+ > Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
14
+
15
+ https://github.com/yule-BUAA/MergeLM
16
+
17
+ https://github.com/cg123/mergekit/tree/dare-tokenizer
18
+
19
+ This should yield a better merge than a typical linear/slerp merge.
20
+ ***
21
+
22
+ Merged with the following config, and the tokenizer from Yi Llamafied:
23
+ ```
24
+ models:
25
+   - model: /home/alpha/Storage/Models/Raw/larryvrh_Yi-34B-200K-Llamafied
26
+     # no parameters necessary for base model
27
+   - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-M-v1.3
28
+     parameters:
29
+       weight: 0.50
30
+       density: 0.56
31
+   - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-M-v1.2
32
+     parameters:
33
+       weight: 0.20
34
+       density: 0.50
35
+   - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
36
+     parameters:
37
+       weight: 0.50
38
+       density: 0.56
39
+ merge_method: dare_ties
40
+ base_model: /home/alpha/Storage/Models/Raw/larryvrh_Yi-34B-200K-Llamafied
41
+ parameters:
42
+   int8_mask: true
43
+ dtype: bfloat16
44
+ ```
45
+
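As a rough illustration of what `density` and `weight` control in the config above, here is a minimal sketch of the DARE drop-and-rescale step described in the linked paper. It is not mergekit's code: it works on plain Python lists instead of tensors, skips the TIES sign-election step that `dare_ties` adds, and does not normalize weights. The function names `dare_delta` and `merge` are made up for this example.

```python
import random


def dare_delta(base, tuned, density, rng):
    """Return the fine-tune's delta from base after random drop and rescale.

    Each delta entry is kept with probability `density`; kept entries are
    divided by `density` so the expected value of the delta is unchanged.
    """
    out = []
    for b, t in zip(base, tuned):
        delta = t - b
        keep = rng.random() < density
        out.append(delta / density if keep else 0.0)
    return out


def merge(base, tuned_models, rng):
    """Add each model's weighted, sparsified delta onto the base parameters.

    `tuned_models` is a list of (parameters, weight, density) tuples.
    Assumption: no sign election and no weight normalization, unlike a
    full dare_ties merge.
    """
    merged = list(base)
    for tuned, weight, density in tuned_models:
        for i, d in enumerate(dare_delta(base, tuned, density, rng)):
            merged[i] += weight * d
    return merged


if __name__ == "__main__":
    rng = random.Random(0)
    base = [0.10, -0.20, 0.30, 0.00]
    model_a = [0.15, -0.10, 0.30, 0.05]  # toy stand-in for one fine-tune
    model_b = [0.05, -0.20, 0.40, 0.02]  # toy stand-in for another
    print(merge(base, [(model_a, 0.50, 0.56), (model_b, 0.20, 0.50)], rng))
```

With `density: 0.56`, roughly 44% of a fine-tune's delta entries are dropped and the survivors are scaled by 1/0.56, which is what lets several fine-tunes of the same base be combined with less interference.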
46
+ Tess 1.2 and 1.3 were used because, according to the trainer, they were trained on different datasets: https://migel.substack.com/p/learnings-from-training-tess
47
+
48
+ ***
49
+
50
+ ## Prompt template: Orca-Vicuna
51
+
52
+ ```
53
+ SYSTEM: {system_message}
54
+ USER: {prompt}
55
+ ASSISTANT:
56
+
57
+ ```
58
+ Because this is a Yi model, try disabling the BOS token and/or running a lower temperature with MinP if the output doesn't seem right.
59
+
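The Orca-Vicuna template above can be filled in with a small helper like the one below. This is only a sketch: `build_prompt` is a hypothetical name, and ending the string right after `ASSISTANT:` (with no trailing space or newline) is an assumption, since the template does not say how the final line should end.

```python
def build_prompt(system_message: str, prompt: str) -> str:
    """Format a single-turn prompt in the Orca-Vicuna layout.

    Assumption: generation starts immediately after "ASSISTANT:".
    """
    return (
        f"SYSTEM: {system_message}\n"
        f"USER: {prompt}\n"
        "ASSISTANT:"
    )


if __name__ == "__main__":
    print(build_prompt("You are a helpful assistant.", "Summarize DARE in one line."))
```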
60
+ Like Capybara, the model sometimes "spells out" the stop token as the literal text `</s>`, so you may need to add `</s>` as an additional stopping condition.
61
+
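If the frontend in use has no setting for extra stop strings, the spelled-out `</s>` can also be trimmed after generation. A minimal sketch, assuming the raw completion is available as a Python string; `truncate_at_stop` is a hypothetical helper, not part of any library:

```python
def truncate_at_stop(text: str, stop_strings=("</s>",)) -> str:
    """Cut `text` at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


if __name__ == "__main__":
    raw = "The merge uses three models.</s>USER: and then"
    print(truncate_at_stop(raw))
```

Adding `"USER:"` to `stop_strings` is a common extra guard against the model continuing the conversation on its own, though whether it is needed here is untested.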
62
+ ***
63
+
64
+ Credits:
65
+
66
+ https://github.com/cg123/mergekit/tree/dare-tokenizer
67
+
68
+ https://huggingface.co/NousResearch/Nous-Capybara-34B/
69
+
70
+ https://huggingface.co/migtissera/Tess-M-v1.2
71
+
72
+ https://huggingface.co/migtissera/Tess-M-v1.3
73
+
74
+ https://huggingface.co/larryvrh/Yi-34B-200K-Llamafied
75
+
76
+ https://huggingface.co/01-ai/Yi-34B-200K