open_llama_13b_alpaca_clean_dutch_qlora
Model description
This adapter model is a fine-tuned version of openlm-research/open_llama_13b. Finetuning was performed on the Dutch BramVanroy/alpaca-cleaned-dutch dataset, which contains 52K records of instruction-following data translated from English to Dutch.
See openlm-research/open_llama_13b for all information about the base model.
Model usage
A basic example of how to use the finetuned model.
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Adapter repository on the Hugging Face Hub
model_name = "robinsmits/open_llama_13b_alpaca_clean_dutch_qlora"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast = False, add_eos_token = True)
# Load the base model named in the adapter config in 8-bit, then attach the adapter
config = PeftConfig.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, load_in_8bit = True, device_map = "auto")
model = PeftModel.from_pretrained(model, model_name)
# Prompt format: instruction header, the question, then the answer header
prompt = "### Instructie:\nWat zijn de drie belangrijkste softwareonderdelen die worden gebruikt bij webontwikkeling?\n\n### Antwoord:\n"
inputs = tokenizer(prompt, return_tensors = "pt", truncation = True).input_ids.cuda()
# Generate with beam search and decode the generated token ids
sample = model.generate(input_ids = inputs, max_new_tokens = 512, num_beams = 2, early_stopping = True, eos_token_id = tokenizer.eos_token_id)
output = tokenizer.decode(sample[0], skip_special_tokens = True)
# Print only the text that follows the prompt
print(output.split(prompt)[1])
The prompt and generated output for the example above are similar to the output shown below.
### Instructie:
Wat zijn de drie belangrijkste softwareonderdelen die worden gebruikt bij webontwikkeling?
### Antwoord:
1. HTML
2. CSS
3. JavaScript
For more extensive usage and many generated samples (both good and bad), see the Inference Notebook.
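The prompt format used in the example above can be wrapped in two small helpers. This is a minimal sketch: `build_prompt` and `extract_answer` are hypothetical names that are not part of the model repository, and the template is copied from the single-instruction prompt shown above.

```python
ANSWER_HEADER = "### Antwoord:"


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the prompt template from the usage example."""
    return f"### Instructie:\n{instruction}\n\n{ANSWER_HEADER}\n"


def extract_answer(decoded: str) -> str:
    """Return the text after the last answer header.

    If the header is missing, the whole decoded text is returned.
    """
    _, found, answer = decoded.rpartition(ANSWER_HEADER)
    return answer.strip() if found else decoded.strip()


prompt = build_prompt("Wat is HTML?")
print(repr(prompt))
print(extract_answer(prompt + "1. HTML\n2. CSS\n3. JavaScript"))
```

Splitting on the answer header rather than on the full prompt avoids an `IndexError` when the decoded text does not reproduce the prompt character for character.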
Intended uses & limitations
The open_llama_13b model was primarily trained on English text. Part of its training data was a Wikipedia dump containing pages in 20 languages, one of which was Dutch. Given the size of the total dataset and of the Wikipedia part, Dutch very likely made up less than 0.5% of the total data.
The generated output and performance of this model for the Dutch language are very likely not always comparable to those of the various Open-Llama models that have been finetuned on English Alpaca datasets.
The primary intention of this model is to explore and research the use of the Dutch language in combination with an open LLM.
Training and evaluation data
This model was trained on the BramVanroy/alpaca-cleaned-dutch dataset.
Based on the dataset license, only non-commercial use is allowed. Commercial use is strictly forbidden.
@misc{vanroy2023language,
title={Language Resources for {Dutch} Large Language Modelling},
author={Bram Vanroy},
year={2023},
eprint={2312.12852},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Training procedure
This model was finetuned with a QLoRA setup on a Google Colab A100 GPU in about 11 hours.
The notebook used for training can be found here: Training Notebook
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 64
- training_steps: 1536
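The relationship between these values can be checked with a little arithmetic. The sketch below assumes training on a single device (consistent with the single A100 mentioned above) and takes the steps per epoch from the Training results table; the examples-per-epoch figure is an approximation, since the last batch of an epoch may be partial.

```python
train_batch_size = 32
gradient_accumulation_steps = 2
training_steps = 1536
steps_per_epoch = 768  # epoch 1.0 ends at step 768 in the results table

# Effective batch size per optimizer step (single device assumed)
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# Approximate number of training examples seen per epoch
examples_per_epoch = steps_per_epoch * total_train_batch_size

# Number of epochs covered by the configured training steps
epochs = training_steps / steps_per_epoch

print(total_train_batch_size)
print(examples_per_epoch)
print(epochs)
```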
The following bitsandbytes quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
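For readers who want to reproduce these settings, the non-default values above map onto the `BitsAndBytesConfig` class in Transformers. The following is a sketch rather than the author's training code: `quant_kwargs` and `build_bnb_config` are hypothetical names, and the helper assumes that torch, transformers and bitsandbytes are installed.

```python
# 4-bit settings copied from the list above; the int8 options keep their defaults.
quant_kwargs = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
}


def build_bnb_config():
    """Build a BitsAndBytesConfig from quant_kwargs (imports are deferred)."""
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = dict(quant_kwargs)
    # Convert the dtype name into the corresponding torch dtype
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**kwargs)
```

The resulting object can be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`.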
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.0439 | 1.0 | 768 | 1.0496 |
0.9431 | 2.0 | 1536 | 0.9879 |
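The validation losses can be converted to perplexity for easier comparison. This sketch assumes the reported values are mean per-token cross-entropy losses in nats, which is the usual convention for causal language model training but is not stated in the source; `perplexity` is a hypothetical helper name.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Validation losses from the table above, keyed by epoch
validation_loss = {1: 1.0496, 2: 0.9879}

for epoch, loss in validation_loss.items():
    print(f"epoch {epoch}: perplexity {perplexity(loss):.2f}")
```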
Framework versions
- Transformers 4.30.2
- Pytorch 2.0.1+cu118
- Datasets 2.13.1
- Tokenizers 0.13.3
- PEFT 0.4.0.dev0