---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.2
license: apache-2.0
language:
- en
---

# Suri-I-ORPO

Suri-I-ORPO is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 using instructional odds ratio preference optimization (I-ORPO). Please check [our paper](https://arxiv.org/abs/2406.19371) for more details on the method.

## 📒 Model Details

### Model Description

- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

### Model Sources

- **Repository:** [GitHub repository](https://github.com/chtmp223/suri) -- contains code to reconstruct the books3 subset.
- **Paper:** [Suri: Multi-constraint Instruction Following for Long-form Text Generation](https://arxiv.org/abs/2406.19371)
- **Demo:** [Website](https://chtmp223.github.io/suri)

## ⚠️ Getting Started

Use the code in [this repository](https://github.com/chtmp223/suri) for training and inference.

## 💻 Training Details

### Training Data

[chtmp223/suri](https://huggingface.co/datasets/chtmp223/suri)

### Training Procedure

| **Configuration**                  | **Value**    |
|------------------------------------|--------------|
| Hardware (training and inference)  | 4xA100s      |
| Tracking                           | wandb        |
| lora_r                             | 16           |
| lora_alpha                         | 16           |
| lora_dropout                       | 0.05         |
| beta                               | 0.4          |
| gradient_accumulation_steps        | 1            |
| gradient_checkpointing             | True         |
| learning_rate                      | 5.0e-5       |
| lr_scheduler_type                  | cosine       |
| max_length                         | 15024        |
| max_completion_length              | 15000        |
| max_prompt_length                  | 5000         |
| num_train_epochs                   | 2            |
| optim                              | adamw_torch  |
| per_device_train_batch_size        | 1            |

#### Software

Training code is adapted from the [Alignment Handbook](https://github.com/huggingface/alignment-handbook) and [TRL](https://github.com/huggingface/trl). A hedged configuration sketch using these libraries appears at the end of this card.

## 🤗 Inference

```
import os

import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

os.environ["TOKENIZERS_PARALLELISM"] = "False"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.cuda.empty_cache()

# Load the base model and attach the Suri-I-ORPO LoRA adapter.
model_name = "chtmp223/suri-i-orpo"
base_model_name = "mistralai/Mistral-7B-Instruct-v0.2"
config = PeftConfig.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name).to(device)
model = PeftModel.from_pretrained(base_model, model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

user_prompt = "..."  # fill in your long-form instruction with constraints
prompt = [
    {
        "role": "user",
        "content": user_prompt,
    }
]
input_context = tokenizer.apply_chat_template(
    prompt, add_generation_prompt=True, tokenize=False
)
input_ids = tokenizer.encode(
    input_context, return_tensors="pt", add_special_tokens=False
).to(model.device)
output = model.generate(
    input_ids, max_length=10000, do_sample=True, use_cache=True
).cpu()
print(tokenizer.decode(output[0]))
```

## 📜 Citation

```
@misc{pham2024surimulticonstraintinstructionfollowing,
      title={Suri: Multi-constraint Instruction Following for Long-form Text Generation},
      author={Chau Minh Pham and Simeng Sun and Mohit Iyyer},
      year={2024},
      eprint={2406.19371},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.19371},
}
```

### ⚙️ Framework versions

- PEFT 0.11.1
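
### 📝 Configuration sketch

For convenience, the sketch below shows one way the LoRA and ORPO hyperparameters from the training table could be wired up with TRL's `ORPOTrainer` and a PEFT `LoraConfig`. It is a minimal, hypothetical illustration, not the authors' I-ORPO training script (I-ORPO modifies the ORPO objective; use the [GitHub repository](https://github.com/chtmp223/suri) for the actual code), and it assumes the dataset has already been mapped into the `prompt`/`chosen`/`rejected` columns that `ORPOTrainer` expects. The `output_dir` name is arbitrary.

```
# Hypothetical sketch only: a standard TRL ORPO setup using the hyperparameters
# listed in the training table above. It does not reproduce the I-ORPO objective.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model_name = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# LoRA adapter settings from the table above.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# ORPO hyperparameters from the table above.
args = ORPOConfig(
    output_dir="suri-i-orpo",
    beta=0.4,
    learning_rate=5.0e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    optim="adamw_torch",
    max_length=15024,
    max_prompt_length=5000,
    max_completion_length=15000,
    report_to="wandb",
)

# Assumes the dataset has been preprocessed into the prompt/chosen/rejected
# format that ORPOTrainer expects.
train_dataset = load_dataset("chtmp223/suri", split="train")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # renamed to processing_class in newer TRL releases
    peft_config=peft_config,
)
trainer.train()
```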