Update README.md

d2150ee verified 6 months ago

3.65 kB

	---
	license: other
	license_name: yi-license
	license_link: LICENSE
	datasets:
	- adamo1139/toxic-dpo-natural-v5
	- adamo1139/AEZAKMI_v3-7
	- adamo1139/rawrr_v2-2_stage1
	---

	## Model description



	Yi-34B 200K XLCTX base model fine-tuned on adamo1139/rawrr_v2-2_stage1 (DPO), adamo1139/AEZAKMI_v3-7 (SFT) and adamo1139/toxic-dpo-natural-v5 (ORPO) datasets. Training took around 7 (DPO) + 13 (SFT) + 3 (ORPO) = 23 hours total on RTX 3090 Ti, all finetuning was done locally. This is excluding failed attempts and issues I had with merging script, that basically made me run DPO and SFT stages 2 times over because I thought that my LoRAs were broken, but it turned out to be some bug with new transformers/peft versions.
	This model is tuned to use more natural language and also be very uncensored.
	Say goodbye to "It's important to remember"! \
	Prompt format is standard chatml. Don't expect it to be good at math, riddles or be crazy smart. My end goal with AEZAKMI is to create a cozy free chatbot.
	Cost of this fine-tune is about $5-$10 in electricity.
	Base model used for fine-tuning was Yi-34B-200K model shared by 01.ai, the newer version that has improved long context needle in a haystack retrieval. They didn't give it a new name, giving it numbers would mess up AEZAKMI naming scheme by adding a second number, so I will be calling it XLCTX.

	[You can see examples of responses to various prompts here (loaded with transformers load_in_4bit)](https://huggingface.co/datasets/adamo1139/misc/blob/main/benchmarks/yi-34b-200k-xlctx-aezakmi-raw-toxic-natural-orpo-0205/benchmark_prompts.txt)

	I had to lower max_positional_embeddings in config.json and model_max_length for training to start, otherwise I was OOMing straight away.
	This attempt had both max_position_embeddings and model_max_length set to 4096, which worked perfectly fine. I then reversed this to 200000 once I was uploading it.
	I think it should keep long context capabilities of the base model should be present here.

	If you want to see training scripts, let me know and I will upload them. LoRAs are uploaded [here adamo1139/yi-34b-200k-xlctx-aezakmi-raw-toxic-dpo-sft-orpo-lora-0205](https://huggingface.co/adamo1139/yi-34b-200k-xlctx-aezakmi-raw-toxic-dpo-sft-orpo-lora-0205)

	## Quants!

	EXL2 quant coming soon, I plan to make and upload something around 4.65bpw, it should work nicely with q4 cache in exllama2


	## Prompt Format

	I recommend using ChatML format, as this was used during fine-tune. \
	Here's a prompt format you should use, you can set a different system message, model was trained on SystemChat dataset, so it should respect system prompts fine.

	```
	<\|im_start\|>system
	A chat.<\|im_end\|>
	<\|im_start\|>user
	{prompt}<\|im_end\|>
	<\|im_start\|>assistant
	```

	## Intended uses & limitations

	Use is limited by Yi license. \
	Some datasets that were used prohibit commercial use (no_robots with CC-BY-NC-4.0), so I think you should use non-commercially only, unless you know law better and think it doesn't matter.

	## Known Issues

	I haven't found any yet.

	## Credits

	Thanks to unsloth and huggingface team for providing software packages used during fine-tuning. \
	Thanks to Jon Durbin, abacusai, huggingface, sandex, NobodyExistsOnTheInternet, Nous-Research, lmsys, PygmalionAI for open sourcing datasets I included in the AEZAKMI dataset. \
	AEZAKMI is basically a mix of open source datasets I found on HF, so without them this would not be possible at all.

	[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" alt="made with Unsloth" width="400" height="64"/>](https://github.com/unslothai/unsloth)