Create README.md
README.md
ADDED
@@ -0,0 +1,27 @@
1 |
+
<h1 style="text-align: center">MPT-7b-InstructAndStorywriting-75_25-Merge </h1>
|
2 |
+
<h2 style="text-align: center">A merge between the long context Storywriting and the short context Instruct MPT-7b models.</h2>
|
3 |
+
|
4 |
+
## Model Details:
|
5 |
+
|
6 |
+
This model is a merge between the following two MPT-7b models:
|
7 |
+
|
8 |
+
- 2048 CTX MPT-7b Instruct: https://huggingface.co/spaces/mosaicml/mpt-7b-instruct
|
9 |
+
- 65k CTX MPT-7b Storywriter: https://huggingface.co/mosaicml/mpt-7b-storywriter
|
10 |
+
|
11 |
+
This merge was done using a weighted average merge strategy, and the end result is a model composed of:
|
12 |
+
|
13 |
+
MPT-7b Storywriter [75%] + MPT-7b Instruct [25%]
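
The merge script itself is not part of this card, so the following is only a minimal sketch of what a weighted average merge could look like. It assumes both checkpoints expose the same parameter names and shapes; the function name `weighted_average_merge` and the toy parameter names are hypothetical. The values can be anything that supports `*` and `+`, such as PyTorch tensors from a `state_dict()`, and plain floats are used here to keep the example self-contained.

```python
from typing import Any, Dict


def weighted_average_merge(
    primary: Dict[str, Any],
    secondary: Dict[str, Any],
    primary_weight: float = 0.75,
) -> Dict[str, Any]:
    """Blend two parameter dictionaries key by key.

    Hypothetical helper: each merged value is
    primary * primary_weight + secondary * (1 - primary_weight).
    """
    if not 0.0 <= primary_weight <= 1.0:
        raise ValueError("primary_weight must be between 0 and 1")
    if primary.keys() != secondary.keys():
        raise ValueError("both models must have the same parameter names")

    secondary_weight = 1.0 - primary_weight
    return {
        name: primary[name] * primary_weight + secondary[name] * secondary_weight
        for name in primary
    }


# Toy stand-ins for real checkpoints (assumption: real values would be tensors).
storywriter = {"layer.weight": 1.0, "layer.bias": 0.0}
instruct = {"layer.weight": 0.0, "layer.bias": 2.0}

# 75% of the first model, 25% of the second, as listed above.
merged = weighted_average_merge(storywriter, instruct, primary_weight=0.75)
print(merged)
```

With real checkpoints, the same function could be applied to the two models' `state_dict()` outputs, provided the tensors have matching shapes and dtypes.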
|
14 |
+
|
15 |
+
----
|
16 |
+
|
17 |
+
This was done to test the theory of how long context tunes affect attention when merged with a model that has been trained for a different purpose, on a shorter context span.
|
18 |
+
Unlike the [first merge](https://huggingface.co/TehVenom/mpt-7b-InstructAndStorywriting-50_50-Merge), which sports a 50/50 ratio, this one is lopsided towards the Instruct base model, both to have another comparison point for the effects of CTX span merging and to have a model that is primarily focused on Instruct.
|
19 |
+
|
20 |
+
The end result is intended to be a model that is capable of following the Instruct base's Assistant / Instruct / Helpful properties, while drawing some creativity for long prose.
|
21 |
+
|
22 |
+
Because of the influence of MPT-7b Storywriter and the wide array of books sampled for it, this model may generate content that is considered NSFW.
|
23 |
+
|
24 |
+
The specific prompt format is unknown, but try approaching it as a story / text completion style prompt first,
|
25 |
+
then a mix of that and Alpaca's instruct format to see what gives the most interesting results.
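
Since the right prompt format is unknown, the sketch below only shows one way to build the two styles for side-by-side comparison. The Alpaca template is the standard no-input variant; the helper names and the idea of seeding the response section with a story opening are assumptions for illustration, not a format this model is known to prefer.

```python
# Standard Alpaca instruct template (no-input variant).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def completion_prompt(story_opening: str) -> str:
    """Plain story / text completion: the model simply continues the text."""
    return story_opening


def mixed_prompt(instruction: str, story_opening: str) -> str:
    """Hypothetical mix: an Alpaca instruction whose response is pre-seeded
    with the first words of the story, so the model continues from there."""
    return ALPACA_TEMPLATE.format(instruction=instruction) + story_opening


opening = "The lighthouse had been dark for forty years when"
plain = completion_prompt(opening)
mixed = mixed_prompt("Write a long, atmospheric short story.", opening)

print(plain)
print("-----")
print(mixed)
```

Feeding both prompts to the model with the same sampling settings makes it easier to judge which style produces the more interesting output.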
|
26 |
+
|
27 |
+
Read the original model card to understand how to run inference on this model.
|