---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.2
license: apache-2.0
language:
- en
---

# Suri-I-ORPO

Suri-I-ORPO is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 using instructional odds ratio preference optimization (I-ORPO). Please check [our paper](https://arxiv.org/abs/2406.19371) for more details on the method.

## 📒 Model Details

### Model Description

- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

### Model Sources

- **Repository:** [GitHub repository](https://github.com/chtmp223/suri) -- contains code to reconstruct the books3 subset.
- **Paper:** [Suri: Multi-constraint Instruction Following for Long-form Text Generation](https://arxiv.org/abs/2406.19371)
- **Demo:** [Website](https://chtmp223.github.io/suri)

## ⚠️ Getting Started

Use the code in [this repository](https://github.com/chtmp223/suri) for training and inference.

## 💻 Training Details

### Training Data

[chtmp223/suri](https://huggingface.co/datasets/chtmp223/suri)

### Training Procedure

| **Configuration**                  | **Value**    |
|------------------------------------|--------------|
| Hardware (training and inference)  | 4xA100s      |
| Tracking                           | wandb        |
| lora_r                             | 16           |
| lora_alpha                         | 16           |
| lora_dropout                       | 0.05         |
| beta                               | 0.4          |
| gradient_accumulation_steps        | 1            |
| gradient_checkpointing             | True         |
| learning_rate                      | 5.0e-5       |
| lr_scheduler_type                  | cosine       |
| max_length                         | 15024        |
| max_completion_length              | 15000        |
| max_prompt_length                  | 5000         |
| num_train_epochs                   | 2            |
| optim                              | adamw_torch  |
| per_device_train_batch_size        | 1            |

#### Software

Training code is adapted from the [Alignment Handbook](https://github.com/huggingface/alignment-handbook) and [TRL](https://github.com/huggingface/trl). A hedged configuration sketch using these libraries appears at the end of this card.

## 🤗 Inference

```
import os

import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

os.environ["TOKENIZERS_PARALLELISM"] = "False"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.cuda.empty_cache()

# Load the base model and attach the Suri-I-ORPO LoRA adapter.
model_name = "chtmp223/suri-i-orpo"
base_model_name = "mistralai/Mistral-7B-Instruct-v0.2"
config = PeftConfig.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name).to(device)
model = PeftModel.from_pretrained(base_model, model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

user_prompt = "..."  # fill in your long-form instruction with constraints
prompt = [
    {
        "role": "user",
        "content": user_prompt,
    }
]
input_context = tokenizer.apply_chat_template(
    prompt, add_generation_prompt=True, tokenize=False
)
input_ids = tokenizer.encode(
    input_context, return_tensors="pt", add_special_tokens=False
).to(model.device)
output = model.generate(
    input_ids, max_length=10000, do_sample=True, use_cache=True
).cpu()
print(tokenizer.decode(output[0]))
```

## 📜 Citation

```
@misc{pham2024surimulticonstraintinstructionfollowing,
      title={Suri: Multi-constraint Instruction Following for Long-form Text Generation},
      author={Chau Minh Pham and Simeng Sun and Mohit Iyyer},
      year={2024},
      eprint={2406.19371},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.19371},
}
```

### ⚙️ Framework versions

- PEFT 0.11.1
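
### 📝 Configuration sketch

For convenience, the sketch below shows one way the LoRA and ORPO hyperparameters from the training table could be wired up with TRL's `ORPOTrainer` and a PEFT `LoraConfig`. It is a minimal, hypothetical illustration, not the authors' I-ORPO training script (I-ORPO modifies the ORPO objective; use the [GitHub repository](https://github.com/chtmp223/suri) for the actual code), and it assumes the dataset has already been mapped into the `prompt`/`chosen`/`rejected` columns that `ORPOTrainer` expects. The `output_dir` name is arbitrary.

```
# Hypothetical sketch only: a standard TRL ORPO setup using the hyperparameters
# listed in the training table above. It does not reproduce the I-ORPO objective.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model_name = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# LoRA adapter settings from the table above.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# ORPO hyperparameters from the table above.
args = ORPOConfig(
    output_dir="suri-i-orpo",
    beta=0.4,
    learning_rate=5.0e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    optim="adamw_torch",
    max_length=15024,
    max_prompt_length=5000,
    max_completion_length=15000,
    report_to="wandb",
)

# Assumes the dataset has been preprocessed into the prompt/chosen/rejected
# format that ORPOTrainer expects.
train_dataset = load_dataset("chtmp223/suri", split="train")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # renamed to processing_class in newer TRL releases
    peft_config=peft_config,
)
trainer.train()
```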