
	---
	library_name: peft
	base_model: mistralai/Mistral-7B-Instruct-v0.2
	license: apache-2.0
	language:
	- en
	---

	# Suri-I-ORPO
	Suri-I-ORPO is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 using instructional odds ratio preference optimization (I-ORPO). Please check [our paper](https://arxiv.org/abs/2406.19371) for more details on the method.

	## 📒 Model Details

	### Model Description

	- Language(s) (NLP): English
	- License: Apache-2.0
	- Finetuned from model: [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

	### Model Sources

	- Repository: [GitHub repository](https://github.com/chtmp223/suri), which contains code to reconstruct the Books3 subset.
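	- Paper: [Suri: Multi-constraint Instruction Following for Long-form Text Generation](https://arxiv.org/abs/2406.19371)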
	- Demo: [Website](https://chtmp223.github.io/suri)

	## ⚠️ Getting Started

	Use the code in [this repository](https://github.com/chtmp223/suri) for training and inference.


	## 💻 Training Details

	### Training Data

	[chtmp223/suri](https://huggingface.co/datasets/chtmp223/suri)

	### Training Procedure

	| Configuration                     | Value       |
	|-----------------------------------|-------------|
	| Hardware (training and inference) | 4xA100s     |
	| Tracking                          | wandb       |
	| lora_r                            | 16          |
	| lora_alpha                        | 16          |
	| lora_dropout                      | 0.05        |
	| beta                              | 0.4         |
	| gradient_accumulation_steps       | 1           |
	| gradient_checkpointing            | True        |
	| learning_rate                     | 5.0e-5      |
	| lr_scheduler_type                 | cosine      |
	| max_length                        | 15024       |
	| max_completion_length             | 15000       |
	| max_prompt_length                 | 5000        |
	| num_train_epochs                  | 2           |
	| optim                             | adamw_torch |
	| per_device_train_batch_size       | 1           |
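Two quantities follow directly from the values in the table above: the effective batch size and the LoRA scaling factor. The sketch below derives both in plain Python. The dictionary name `train_config`, the helper functions, and `NUM_GPUS` are hypothetical names used only for illustration, and the sketch assumes all four A100s take part in data-parallel training, which the table does not state explicitly.

```python
# Hyperparameters copied from the training table in this model card.
train_config = {
    "lora_r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "beta": 0.4,
    "gradient_accumulation_steps": 1,
    "learning_rate": 5.0e-5,
    "max_length": 15024,
    "max_prompt_length": 5000,
    "max_completion_length": 15000,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 1,
}

# Assumption: all 4 A100s are used as data-parallel workers.
NUM_GPUS = 4


def effective_batch_size(cfg, num_devices):
    """Sequences contributing to each optimizer step across all devices."""
    return (
        cfg["per_device_train_batch_size"]
        * cfg["gradient_accumulation_steps"]
        * num_devices
    )


def lora_scaling(cfg):
    """Standard LoRA scales the low-rank update by lora_alpha / r."""
    return cfg["lora_alpha"] / cfg["lora_r"]


print(effective_batch_size(train_config, NUM_GPUS))  # 4
print(lora_scaling(train_config))  # 1.0
```

Under these assumptions, each optimizer step sees four sequences, and the adapter update is applied with a scaling factor of 1.0.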

	#### Software

	Training code is adapted from the [Alignment Handbook](https://github.com/huggingface/alignment-handbook) and [TRL](https://github.com/huggingface/trl).

	## 🤗 Inference

	```python
	import os

	import torch
	from peft import PeftModel
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Disable tokenizer parallelism to avoid fork-related warnings.
	os.environ["TOKENIZERS_PARALLELISM"] = "false"
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	torch.cuda.empty_cache()

	model_name = "chtmp223/suri-i-orpo"
	base_model_name = "mistralai/Mistral-7B-Instruct-v0.2"

	# Load the base model, then attach the LoRA adapter weights on top of it.
	base_model = AutoModelForCausalLM.from_pretrained(base_model_name).to(device)
	model = PeftModel.from_pretrained(base_model, model_name).to(device)
	tokenizer = AutoTokenizer.from_pretrained(base_model_name)

	# Placeholder: replace with your own instruction.
	user_prompt = "YOUR INSTRUCTION HERE"
	prompt = [{"role": "user", "content": user_prompt}]

	# Render the chat template as text, then tokenize it without
	# adding special tokens a second time.
	input_context = tokenizer.apply_chat_template(
	    prompt, add_generation_prompt=True, tokenize=False
	)
	input_ids = tokenizer.encode(
	    input_context, return_tensors="pt", add_special_tokens=False
	).to(model.device)

	# max_length counts prompt tokens plus generated tokens.
	output = model.generate(
	    input_ids, max_length=10000, do_sample=True, use_cache=True
	).cpu()

	print(tokenizer.decode(output[0]))
	```
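In the snippet above, `max_length` caps the prompt and the generated tokens together, and the decoded output of a decoder-only model begins with the prompt itself. The following minimal sketch shows the resulting arithmetic on plain Python lists. The helper names and the token ids are hypothetical and chosen only for illustration; they are not part of the Suri codebase.

```python
def new_token_budget(max_length, prompt_len):
    """Tokens left for generation when max_length counts prompt + output."""
    return max(max_length - prompt_len, 0)


def continuation_ids(output_ids, prompt_len):
    """Drop the echoed prompt tokens, keeping only newly generated ids."""
    return list(output_ids[prompt_len:])


# Hypothetical token ids, for illustration only.
prompt_ids = [1, 733, 16289, 28793]
output_ids = prompt_ids + [851, 349, 2]

print(new_token_budget(10000, len(prompt_ids)))       # 9996
print(continuation_ids(output_ids, len(prompt_ids)))  # [851, 349, 2]
```

With real tensors, the same slicing would look like `tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)`, which prints only the model's continuation.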


	## 📜 Citation

	```
	@misc{pham2024surimulticonstraintinstructionfollowing,
	title={Suri: Multi-constraint Instruction Following for Long-form Text Generation},
	author={Chau Minh Pham and Simeng Sun and Mohit Iyyer},
	year={2024},
	eprint={2406.19371},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2406.19371},
	}
	```

	### ⚙️ Framework versions

	- PEFT 0.11.1