---
license: llama2
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: llama-2-nl/Llama-2-7b-hf-lora-original-sft
datasets:
- BramVanroy/ultra_feedback_dutch
model-index:
- name: Llama-2-7b-hf-lora-original-it
  results: []
---

# ChocoLlama-2-7B-instruct

This model is a fine-tuned version of [ChocoLlama/ChocoLlama-2-7B-base](https://huggingface.co/ChocoLlama/ChocoLlama-2-7B-base) on the BramVanroy/ultra_feedback_dutch dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3536
- Rewards/chosen: 0.1143
- Rewards/rejected: -0.9295
- Rewards/accuracies: 0.9396
- Rewards/margins: 1.0437
- Logps/rejected: -547.4578
- Logps/chosen: -600.8353
- Logits/rejected: -0.8732
- Logits/chosen: -0.9594

# Use the model

```
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model; device_map="auto" places the weights on the available device(s).
tokenizer = AutoTokenizer.from_pretrained('ChocoLlama/ChocoLlama-2-7B-instruct')
model = AutoModelForCausalLM.from_pretrained('ChocoLlama/ChocoLlama-2-7B-instruct', device_map="auto")

# The prompts stay in Dutch because the model is tuned on Dutch data.
# System: "You are an AI assistant and give helpful, detailed and polite answers to the user's questions."
# User: "Jacques Brel, Willem Elsschot and Jan Jambon are sitting in a café. What would they chat about?"
messages = [
    {"role": "system", "content": "Je bent een artificiële intelligentie-assistent en geeft behulpzame, gedetailleerde en beleefde antwoorden op de vragen van de gebruiker."},
    {"role": "user", "content": "Jacques Brel, Willem Elsschot en Jan Jambon zitten op café. Waar zouden ze over babbelen?"},
]

# Render the conversation with the model's chat template and append the assistant turn marker.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on the tokenizer's EOS token or on "<|eot_id|>".
# The second lookup assumes the tokenizer defines that token.
new_terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

# Sample up to 512 new tokens.
outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=new_terminators,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)

# Drop the prompt tokens and decode only the generated answer.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5984        | 0.1327 | 100  | 0.5904          | 0.0549         | -0.1735          | 0.9030             | 0.2283          | -539.8975      | -601.4293    | -1.1606         | -1.1395       |
| 0.4622        | 0.2653 | 200  | 0.4581          | 0.1134         | -0.4980          | 0.9351             | 0.6113          | -543.1426      | -600.8441    | -1.2714         | -1.2180       |
| 0.3934        | 0.3980 | 300  | 0.3959          | 0.1263         | -0.7212          | 0.9366             | 0.8475          | -545.3747      | -600.7144    | -1.0528         | -1.0755       |
| 0.3629        | 0.5307 | 400  | 0.3674          | 0.1170         | -0.8608          | 0.9381             | 0.9777          | -546.7705      | -600.8080    | -1.1109         | -1.1154       |
| 0.3556        | 0.6633 | 500  | 0.3561          | 0.1136         | -0.9146          | 0.9388             | 1.0282          | -547.3090      | -600.8419    | -0.8266         | -0.9289       |
| 0.3488        | 0.7960 | 600  | 0.3540          | 0.1104         | -0.9310          | 0.9410             | 1.0415          | -547.4734      | -600.8737    | -1.0676         | -1.0877       |
| 0.3563        | 0.9287 | 700  | 0.3540          | 0.1166         | -0.9259          | 0.9396             | 1.0425          | -547.4224      | -600.8121    | -0.8736         | -0.9600       |

### Framework versions

- Transformers 4.40.1
- Pytorch 2.1.2+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1
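
The derived numbers reported in this card can be checked with a little arithmetic. The sketch below is illustrative and not part of the training code. It assumes the total batch sizes are the per-device batch size times the number of devices (times the gradient accumulation steps for training), and that `Rewards/margins` is the mean chosen reward minus the mean rejected reward, as is usual for DPO logging. All values are copied from the hyperparameter list and the evaluation results in this card; the variable names are chosen here for readability.

```python
# Hyperparameters copied from the "Training hyperparameters" list.
train_batch_size = 4             # per device
eval_batch_size = 4              # per device
num_devices = 4
gradient_accumulation_steps = 4

# Assumption: gradient accumulation only applies to the training batch.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Final evaluation metrics copied from the top of the card.
rewards_chosen = 0.1143
rewards_rejected = -0.9295
reported_margin = 1.0437

# Assumption: the margin is the chosen reward minus the rejected reward.
computed_margin = rewards_chosen - rewards_rejected

print(total_train_batch_size)   # 64
print(total_eval_batch_size)    # 16
# The computed margin matches the reported one up to rounding of the logged values.
print(abs(computed_margin - reported_margin) < 1e-3)  # True
```

Under these assumptions the reported totals (64 and 16) and the reported margin are consistent with the other values in the card.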