---
license: cc-by-nc-4.0
inference: false
datasets:
  - BramVanroy/alpaca-cleaned-dutch
base_model: DAMO-NLP-MT/polylm-1.7b
tags:
  - generated_from_trainer
  - alpaca
  - Transformers
  - PolyLM
  - text-generation-inference
model-index:
  - name: polylm_1.7b_ft_alpaca_clean_dutch
    results: []
language:
  - nl
library_name: peft
pipeline_tag: text-generation
---

# polylm_1.7b_ft_alpaca_clean_dutch

This adapter model is a fine-tuned version of DAMO-NLP-MT/polylm-1.7b. It achieves the following results on the evaluation set:

- Loss: 1.8483

Finetuning was performed on the Dutch BramVanroy/alpaca-cleaned-dutch dataset, which contains about 52K records of instruction-following data translated from English to Dutch.

See DAMO-NLP-MT/polylm-1.7b for all information about the base model.
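
As a usage illustration, below is a minimal sketch of loading this adapter on top of the base model for Dutch text generation. The adapter repository id and the Alpaca-style Dutch prompt template are assumptions, not confirmed by this card; see the base model card for any model-specific loading arguments.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "DAMO-NLP-MT/polylm-1.7b"
adapter_id = "robinsmits/polylm_1.7b_ft_alpaca_clean_dutch"  # assumed adapter repo id

# See the base model card for any tokenizer/model loading specifics.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

# Alpaca-style Dutch prompt template (assumption, not taken from this card).
prompt = "### Instructie:\nBeschrijf de hoofdstad van Nederland.\n\n### Antwoord:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```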

## Model description

More information needed

## Intended uses & limitations

The PolyLM-1.7B base model was trained on 18 languages, Dutch among them, with the primary goal of creating a multilingual open LLM. A diverse combination of multilingual datasets was used for training.

The generated output and performance of this model for Dutch are very likely not always comparable to those of the various Open-Llama models that have been finetuned on English Alpaca datasets.

The primary intention of this finetuned model is to explore and research the use of the Dutch language in combination with an open LLM.

## Training and evaluation data

This model was trained on the BramVanroy/alpaca-cleaned-dutch dataset.

The dataset is the Dutch translation of the English Alpaca Cleaned instruction dataset.

Based on the dataset license, only non-commercial use is allowed. Commercial use is strictly forbidden.
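
For reference, a short sketch of loading and inspecting the finetuning data with the datasets library; the split name and record schema are assumptions based on typical Alpaca-style datasets.

```python
from datasets import load_dataset

# Load the Dutch Alpaca Cleaned dataset; the "train" split name is assumed.
dataset = load_dataset("BramVanroy/alpaca-cleaned-dutch", split="train")

print(dataset)      # number of rows and column names
print(dataset[0])   # one Dutch instruction/input/output record (assumed schema)
```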

## Training procedure

This model was finetuned with a QLoRA setup on a Google Colab A100 GPU in about 1.5 hours.

The notebook used for training can be found here: Training Notebook

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 64
- num_epochs: 2
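
A minimal sketch of how these hyperparameters could map onto transformers.TrainingArguments; the output directory and the evaluation/logging settings are assumptions, not taken from this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="polylm_1.7b_ft_alpaca_clean_dutch",  # assumed output directory
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,   # effective train batch size: 8 * 8 = 64
    num_train_epochs=2,
    lr_scheduler_type="linear",
    warmup_steps=64,
    seed=42,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), eps=1e-8 (defaults)
    bf16=True,                       # assumed, matching bnb_4bit_compute_dtype=bfloat16
    evaluation_strategy="steps",     # assumed; the results table shows eval every 128 steps
    eval_steps=128,
    logging_steps=128,
)
```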

The following bitsandbytes quantization config was used during training:

- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
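
The quantization settings above correspond to a bitsandbytes 4-bit (QLoRA) configuration. Below is a hedged sketch of such a setup; the LoRA rank, alpha, dropout and target modules are not listed in this card, so the values shown are placeholders only (see the training notebook for the actual configuration).

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization config mirroring the values listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "DAMO-NLP-MT/polylm-1.7b",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Placeholder LoRA hyperparameters -- not from this card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```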

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.1248        | 0.16  | 128  | 2.1129          |
| 2.0512        | 0.33  | 256  | 2.0347          |
| 1.9983        | 0.49  | 384  | 1.9948          |
| 1.9557        | 0.66  | 512  | 1.9655          |
| 1.9583        | 0.82  | 640  | 1.9386          |
| 1.916         | 0.99  | 768  | 1.9177          |
| 1.8671        | 1.15  | 896  | 1.9019          |
| 1.8626        | 1.32  | 1024 | 1.8885          |
| 1.8321        | 1.48  | 1152 | 1.8762          |
| 1.8596        | 1.65  | 1280 | 1.8631          |
| 1.843         | 1.81  | 1408 | 1.8539          |
| 1.8333        | 1.98  | 1536 | 1.8483          |

### Framework versions

- Transformers 4.31.0
- Pytorch 2.0.1+cu118
- Datasets 2.13.1
- Tokenizers 0.13.3
- PEFT 0.4.0