---
license: apache-2.0
pipeline_tag: text-generation
tags:
- finetuned
- lora
inference: true
widget:
- messages:
  - role: user
    content: What is your favorite condiment?
---

## Training Details

30k chat sessions, each totalling at most 1024 tokens, were selected from the [sarvamai/samvaad-hi-v1](https://huggingface.co/datasets/sarvamai/samvaad-hi-v1) dataset, with 2k sessions reserved for the test set. A LoRA adapter was fine-tuned on these sessions via supervised fine-tuning (SFT) with TRL (an illustrative sketch of this setup is included at the end of this card).

Test set loss:

| Model                     | Loss |
|---------------------------|------|
| Mistral-Hinglish-Instruct | 0.8  |
| Mistral-Instruct          | 1.8  |

## Instruction format

To leverage instruction fine-tuning, your prompt should be wrapped in `[INST]` and `[/INST]` tokens. The very first instruction should begin with the beginning-of-sentence (BOS) token id; subsequent instructions should not. The assistant's generation is terminated by the end-of-sentence (EOS) token id.

E.g.
```
text = "[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen! "
"[INST] Do you have mayonnaise recipes? [/INST]"
```

This format is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("arshadshk/Mistral-Hinglish-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("arshadshk/Mistral-Hinglish-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
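
## Fine-tuning sketch (illustrative)

The Training Details section above only summarizes the setup, so the sketch below shows what a comparable LoRA + TRL SFT run could look like. The base checkpoint, hyperparameters, the `messages` column name, and the omitted length filtering / test split are assumptions made for illustration, and `SFTTrainer` arguments differ across TRL versions; this is not the exact recipe used to train this model.

```python
# Illustrative sketch only -- hyperparameters, column names, and splits are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# samvaad-hi-v1 chat sessions; the <=1024-token filter and the 2k-session
# test split described above would be applied here (omitted for brevity).
dataset = load_dataset("sarvamai/samvaad-hi-v1", split="train")

# LoRA adapter configuration (rank/alpha/dropout values are placeholders).
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

def format_chat(batch):
    # Render each session with the tokenizer's [INST] ... [/INST] chat template.
    return [tokenizer.apply_chat_template(m, tokenize=False) for m in batch["messages"]]

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
    formatting_func=format_chat,
    args=TrainingArguments(output_dir="mistral-hinglish-lora", num_train_epochs=1),
)
trainer.train()
```

After training, the resulting LoRA adapter can either be loaded alongside the base model with PEFT or merged into the base weights before being used for inference as in the example above.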