TehVenom / mpt-7b-InstructAndStorywriting-75_25-Merge


<h1 style="text-align: center">MPT-7b-InstructAndStorywriting-75_25-Merge</h1>
<h2 style="text-align: center">A merge between the long-context Storywriter and the short-context Instruct MPT-7b models.</h2>

## Model Details

This model is a merge of the following two MPT-7b models:

- MPT-7b Instruct (2048-token context): https://huggingface.co/TehVenom/MPT-7b-instruct-V
- MPT-7b Storywriter (65k-token context): https://huggingface.co/TehVenom/MPT-7b-storywriter-Apache-2.0/

The merge uses a weighted-average strategy, and the resulting model is composed of:

MPT-7b Storywriter [25%] + MPT-7b Instruct [75%]
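The card does not include the merge script. As a minimal sketch, assuming both checkpoints share the same parameter names and shapes and that the 75/25 weights are applied uniformly to every parameter, a weighted-average merge computes `0.75 * instruct + 0.25 * storywriter` for each weight. The function name `weighted_average_merge` and the list-of-floats stand-in for tensors are hypothetical; a real merge would operate on tensors loaded from the two checkpoints.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of floats.
# Real checkpoints hold tensors, but the per-element arithmetic is the same.
StateDict = Dict[str, List[float]]


def weighted_average_merge(a: StateDict, b: StateDict, weight_a: float) -> StateDict:
    """Return weight_a * a + (1 - weight_a) * b, parameter by parameter."""
    if a.keys() != b.keys():
        raise ValueError("models must have identical parameter names")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be in [0, 1]")
    weight_b = 1.0 - weight_a
    merged: StateDict = {}
    for name, values_a in a.items():
        values_b = b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise weighted average of the two models' values for this parameter.
        merged[name] = [weight_a * x + weight_b * y for x, y in zip(values_a, values_b)]
    return merged


if __name__ == "__main__":
    instruct = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
    storywriter = {"layer.weight": [3.0, 6.0], "layer.bias": [4.0]}
    # 75% Instruct + 25% Storywriter, matching this card's ratio.
    print(weighted_average_merge(instruct, storywriter, weight_a=0.75))
```

Running the script prints `{'layer.weight': [1.5, 3.0], 'layer.bias': [1.0]}`. Setting `weight_a=0.5` would correspond to the 50/50 merge mentioned below.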

----

This merge was made to test how long-context fine-tunes affect attention when merged with a model that was trained for a different purpose on a shorter context span.
Unlike the first merge ([which uses a 50/50 ratio](https://huggingface.co/TehVenom/mpt-7b-InstructAndStorywriting-50_50-Merge)), this one is weighted toward the Instruct base model, both to provide another comparison point for the effects of merging models with different context spans and to produce a model that is primarily focused on instruction following.

The end result is intended to be a model that keeps the Instruct base's Assistant / Instruct / Helpful properties while drawing some creativity for long prose from the Storywriter.

Because of the wide array of books sampled for MPT-7b Storywriter, this model may generate content that is considered NSFW.

The ideal prompting format is unknown. Try a story / text-completion style prompt first,
then a mix of that and Alpaca's instruct format, and see which produces the most interesting results.
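A plain completion prompt is just the raw story text for the model to continue. For the other two styles, the sketch below builds prompts using the standard Stanford Alpaca template (no-input variant). Whether this merge responds well to that template is an assumption to test, and the helper names (`alpaca_prompt`, `mixed_prompt`) and the mixed layout (an Alpaca instruction with a story opening pre-filled after `### Response:`) are hypothetical.

```python
# Standard Stanford Alpaca preamble (no-input variant).
ALPACA_PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def alpaca_prompt(instruction: str) -> str:
    """Alpaca instruct format; the model writes its answer after '### Response:'."""
    return f"{ALPACA_PREAMBLE}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


def mixed_prompt(instruction: str, story_opening: str) -> str:
    """Hypothetical mix: Alpaca header plus a pre-filled story opening for the model to continue."""
    return alpaca_prompt(instruction) + story_opening


if __name__ == "__main__":
    opening = "The lighthouse keeper had not seen a ship in forty years."
    print(mixed_prompt("Continue this story in a melancholic tone.", opening))
```

Pre-filling the response nudges the model toward prose continuation while still giving it an explicit instruction.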

Read the original model card to understand how to run inference on this model.