Update README.md
README.md CHANGED
@@ -8,7 +8,10 @@ base_model:
  - PKU-Baichuan-MLSystemLab/Llama3-PBM-Nova-70B
  - tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
  - Bllossom/llama-3-Korean-Bllossom-70B
+ - WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-70B
+ - TsinghuaC3I/Llama-3-70B-UltraMedical
+ - allenai/Llama-3.1-Tulu-3-70B
+ - hitachi-nlp/Llama-3.1-70B-FLDx2
  tags:
  - roleplay
  - experimental
@@ -19,12 +22,24 @@ tags:
  [![Discord](https://img.shields.io/discord/232596713892872193?logo=discord)](https://discord.gg/2JhHVh7CGu)

  # Previous Rendition:
+ [v0.2](https://huggingface.co/Blackroot/Mirai-70B-0.3)
+
+ This is the 43rd evolution. Some changes from the prior model:
+
+ # Remove DISLab/SummLlama3-70B
+ This model was clearly responsible for short responses; I removed it because its outputs were, on average, too short for my taste. Its writing style also seemed to converge in a way that I found boring.
+
+ # Add WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-70B
+ A low-impact model; it seemed to contribute slightly to a better-behaved and more interesting conversation. Strangely, it primarily improved the instruct capabilities more than anything else, and I found no real evidence of downsides.
+
+ # Add TsinghuaC3I/Llama-3-70B-UltraMedical
+ I tried both this and aaditya/Llama3-OpenBioLLM-70B and found this the better performer; OpenBioLLM actually seemed to have a negative impact more than anything. I most strongly noticed a contribution to grammar and to the general wordiness of the model. It also clearly added some new knowledge domains when asked raw knowledge questions. Likely a good addition.
+
+ # Add allenai/Llama-3.1-Tulu-3-70B
+ Tulu is back. I changed how I do personality prompting, and with all the other changes I'd say this model is a fine addition now. It's quite hard for me to really get a measure on the changes, but it does seem to have improved instruct and likely contributed a bit to a sunnier overall personality. Unfortunately, I'm certain this added a moralization element; it's not overly strong, but you can certainly pick up on it at times.
+
+ # Add hitachi-nlp/Llama-3.1-70B-FLDx2
+ Talk about unexpected. This might be the single best model in the entire merge. It did so many things it's hard to count, and its impact is extreme in many areas. The lowest impact was on personality, where I noticed no shift at all. Logic, reasoning, and recall improved dramatically, though recall became somewhat more selective. Most substantially, repetition is greatly reduced, to such a degree that I turned off all repetition penalties entirely and noticed no issues. This model may have the most developed anti-copy heads of any model in the merge. A total shock to me, but a very welcome addition.
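Since the FLDx2 note above is about switching repetition penalties off entirely: the penalty most samplers apply is the CTRL-style one, where a setting of 1.0 is a no-op. A minimal sketch in plain Python (illustrative names, not the sampler from any particular backend):

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """CTRL-style repetition penalty: push down logits of already-seen tokens.

    penalty == 1.0 leaves every logit untouched, i.e. the "turned off"
    setting mentioned above.
    """
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty   # shrink positive logits of seen tokens
        else:
            out[tok] *= penalty   # make negative logits more negative
    return out

logits = [2.0, -1.0, 0.5]
seen = [0, 1]
penalized = apply_repetition_penalty(logits, seen, 1.2)   # tokens 0 and 1 discouraged
unchanged = apply_repetition_penalty(logits, seen, 1.0)   # no-op
```

With penalties disabled, any reduction in repetition has to come from the model itself, which is what makes the FLDx2 observation notable.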

  # Model Architecture

@@ -42,7 +57,10 @@ Stock:
  - tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
  - Bllossom/llama-3-Korean-Bllossom-70B
  - (Custom) MergedHistLlama-70B
+ - WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-70B
+ - TsinghuaC3I/Llama-3-70B-UltraMedical
+ - allenai/Llama-3.1-Tulu-3-70B
+ - hitachi-nlp/Llama-3.1-70B-FLDx2

  The rest of this goes over my rationale for the approach, my goal, and the observations I've made, in the form of a totally unstructured series of rants.
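The "Stock:" list above reads like a mergekit `model_stock` recipe. A hypothetical sketch of what such a config might look like; the base model, the dtype, and the handling of the custom MergedHistLlama-70B (listed here only as a placeholder path, since it is not a Hub model) are all assumptions, not the author's actual config:

```yaml
# Hypothetical mergekit config sketch -- the real recipe is not in this README.
merge_method: model_stock
base_model: meta-llama/Llama-3.1-70B-Instruct   # assumed base, not stated here
models:
  - model: tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
  - model: Bllossom/llama-3-Korean-Bllossom-70B
  - model: ./MergedHistLlama-70B                # custom local merge, placeholder path
  - model: WhiteRabbitNeo/Llama-3.1-WhiteRabbitNeo-2-70B
  - model: TsinghuaC3I/Llama-3-70B-UltraMedical
  - model: allenai/Llama-3.1-Tulu-3-70B
  - model: hitachi-nlp/Llama-3.1-70B-FLDx2
dtype: bfloat16
```

If this were the real recipe, it would be run with `mergekit-yaml config.yml ./merged`.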