Blackroot committed on
Commit 20c2331 · verified · 1 Parent(s): d6feaef

Update README.md

Files changed (1)
  1. README.md +24 -6
README.md CHANGED
@@ -8,7 +8,10 @@ base_model:
  - PKU-Baichuan-MLSystemLab/Llama3-PBM-Nova-70B
  - tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
  - Bllossom/llama-3-Korean-Bllossom-70B
- - DISLab/SummLlama3-70B
  tags:
  - roleplay
  - experimental
@@ -19,12 +22,24 @@ tags:
  [![Discord](https://img.shields.io/discord/232596713892872193?logo=discord)](https://discord.gg/2JhHVh7CGu)
 
  # Previous Rendition:
- [v0.2](https://huggingface.co/Blackroot/Mirai-70B-0.2)
 
- This is the 34th evolution, some changes from the prior model:
 
- # Add DISLab/SummLlama3-70B
- The writing ability of the model seems to have gotten better, although responses got shorter overall.
 
  # Model Architecture
 
@@ -42,7 +57,10 @@ Stock:
  - tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
  - Bllossom/llama-3-Korean-Bllossom-70B
  - (Custom) MergedHistLlama-70B
- - DISLab/SummLlama3-70B
 
  The rest of this is going to go over both my rationalization for the approach, in the form of a totally unstructured series of rants, including my goal, and the observations I've made.
 
 
  - PKU-Baichuan-MLSystemLab/Llama3-PBM-Nova-70B
  - tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
  - Bllossom/llama-3-Korean-Bllossom-70B
+ - WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-70B
+ - TsinghuaC3I/Llama-3-70B-UltraMedical
+ - allenai/Llama-3.1-Tulu-3-70B
+ - hitachi-nlp/Llama-3.1-70B-FLDx2
  tags:
  - roleplay
  - experimental
 
  [![Discord](https://img.shields.io/discord/232596713892872193?logo=discord)](https://discord.gg/2JhHVh7CGu)
 
  # Previous Rendition:
+ [v0.2](https://huggingface.co/Blackroot/Mirai-70B-0.3)
 
+ This is the 43rd evolution, some changes from the prior model:
 
+ # Remove DISLab/SummLlama3-70B
+ This model is certainly responsible for the short responses, and I removed it because they were too short on average for my tastes. The writing style seemed to converge a bit as well, in a way that I found boring.
+
+ # Add WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-70B
+ Low-impact model; it seemed to contribute very slightly to better-behaved and more interesting conversation. Strangely, this model actually seemed to improve the instruct capabilities more than anything else, and I did not find any real evidence of downsides.
+
+ # Add TsinghuaC3I/Llama-3-70B-UltraMedical
+ I tried both this and aaditya/Llama3-OpenBioLLM-70B and found that this was the better performer; in fact, OpenBioLLM seemed to have a negative impact more than anything. I most strongly noticed a contribution to grammar and the general wordiness of the model. It also pretty clearly added some new info domains when asking raw knowledge questions. Likely a good addition.
+
+ # Add allenai/Llama-3.1-Tulu-3-70B
+ Tulu is back. I changed around how I do personality prompting, and with all the other changes, I'd say this model is a fine addition now. It's quite hard for me to really get a measure of the changes, but it does seem to have improved instruct and likely contributed a bit to a sunnier overall personality. Unfortunately, I'm certain this added a moralization element; it's not overly strong, but you can certainly pick up on it at times.
+
+ # Add hitachi-nlp/Llama-3.1-70B-FLDx2
+ Talk about unexpected. This might be the single best model in the entire merge. This model did so many things it's hard to count, but the impact is extreme in many areas. The lowest impact was to personality, which I did not notice a shift in at all. The logic, reasoning, and recall of the model improved dramatically, although it's somewhat more selective in its recall. The most substantial thing is that repetition is greatly reduced, to such a degree that I've entirely turned off all repetition penalties and have not noticed an issue. I think this model may have the most developed anti-copy heads of any model in the merge. This model was a total shock to me but a very welcome addition.
 
  # Model Architecture
 
 
  - tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
  - Bllossom/llama-3-Korean-Bllossom-70B
  - (Custom) MergedHistLlama-70B
+ - WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-70B
+ - TsinghuaC3I/Llama-3-70B-UltraMedical
+ - allenai/Llama-3.1-Tulu-3-70B
+ - hitachi-nlp/Llama-3.1-70B-FLDx2
 
  The rest of this is going to go over both my rationalization for the approach, in the form of a totally unstructured series of rants, including my goal, and the observations I've made.
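
The FLDx2 note in this change mentions turning off all repetition penalties. For readers unsure what that means at the sampler level, here is a minimal sketch of one common formulation of the penalty (positive logits divided by the penalty, negative logits multiplied by it). It is an assumption that the author's inference stack uses this exact form; the function name, the 4-token vocabulary, and the values are all hypothetical. The point is only that a penalty of 1.0 leaves the logits untouched, which is what "turned off" amounts to.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float], seen_token_ids: Iterable[int], penalty: float
) -> List[float]:
    """Return a copy of `logits` with already-seen tokens made less likely.

    Positive logits are divided by `penalty` and negative logits are
    multiplied by it, so any penalty > 1.0 lowers the score of a seen token.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    adjusted = list(logits)
    # Each seen token is penalized once, no matter how often it appeared.
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


# Hypothetical 4-token vocabulary; tokens 0 and 2 were already generated.
logits = [2.0, 1.0, -1.0, 0.5]
seen = [0, 2, 0]

penalized = apply_repetition_penalty(logits, seen, penalty=1.25)
disabled = apply_repetition_penalty(logits, seen, penalty=1.0)
```

With `penalty=1.0` the returned list equals the input, so a model that avoids repetition on its own loses nothing from having the penalty disabled, and it avoids the side effect of suppressing tokens that legitimately need to recur (names, punctuation, common words).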