Update README.md
README.md CHANGED
@@ -15,7 +15,7 @@ For a final model composed of:
 
 ----
 
-This was done for the sake of testing the theory of how long context tunes affect attention when merged with a model that has been trained for a different purpose, on a shorter context span.
+This was done for the sake of testing the theory of how 'long context' tunes affect attention when merged with a model that has been trained for a different purpose, on a shorter context span.
 
 Unlike the first merge [(which sports a 50/50 ratio)](https://huggingface.co/TehVenom/mpt-7b-InstructAndStorywriting-50_50-Merge), this one is lopsided towards the Instruct base model, to provide another comparison point for the effects of CTX span merging and to yield a model that is primarily focused on Instruct.
 
 There are two objectives for this merge: the first is to see how much of the 65k-Storywriter model is necessary to raise the ceiling of the final model's context size,
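Concretely, a ratio merge like this amounts to a parameter-wise weighted average of the two checkpoints. Below is a minimal sketch, assuming both models share the same MPT-7B architecture and tensor names; the function name, variable names, and the example ratio are illustrative (the exact ratio used for this merge is not stated above):

```python
import torch

def merge_state_dicts(base_sd, donor_sd, donor_weight):
    """Linearly interpolate two same-architecture state dicts.

    base_sd:      parameters of the model the merge is biased towards
                  (here, the Instruct model)
    donor_sd:     parameters of the long-context model (Storywriter-65k)
    donor_weight: fraction taken from the donor; 0.5 reproduces the
                  earlier 50/50 merge, values below 0.5 give a lopsided,
                  Instruct-leaning merge like this one
    """
    merged = {}
    for name, base_param in base_sd.items():
        # Hypothetical ratio merge: every tensor is the same weighted
        # average of the two source tensors.
        merged[name] = (1.0 - donor_weight) * base_param + donor_weight * donor_sd[name]
    return merged

# Example usage (0.25 is an illustrative donor fraction, not the
# confirmed ratio of this model):
# merged_sd = merge_state_dicts(instruct_sd, storywriter_sd, donor_weight=0.25)
# model.load_state_dict(merged_sd)
```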